NEWS.md
sashimiDataConstants()
is a subset of sashimiAppConstants()
that includes only the required data, and not the R-shiny app components such as the help text and widgets. It was updated to store all data inside an environment, which updates data in that environment during processing as needed.get_fn_envir()
is a helper function that returns a variable value from the calling function, or from one or more provided environments. The use case is to allow a user to define function variable values directly, then to use the environment values otherwise.launchSashimiApp()
was changed so that it uses a user-supplied environment to store intermediate data for the R-shiny app. By default this environment is globalenv()
for backward compatibility, however it is recommended to create a new environment to insulate data from the globalenv()
.
sashimiAppConstants()
was updated to use environment as input and the new function get_fn_envir()
. Note that previous input would ultimately use values in globalenv()
even when using a specific environment. The default was to use globalenv()
so that behavior was not problematic. The new behavior will strictly only use either the function argument, or the value in the specific environment provided.
import_juncs_from_bed()
was refactored completely, so it is able to handle BED12 input as well as "SJ.out.tab"
junction output directly from STAR alignments. This refactor will cause all previous memoise cache for junction files to be invalidated, and thus downloaded again. These files are small and fast to download, but it is worth noting. The underlying method now uses data.table::fread()
then it decides whether to convert from BED format with as(bed, "GRanges")
or converts from "SJ.out.tab"
when there are 9 columns.
Several functions had the verbose
output reduced, instead when verbose > 1
the more verbose output is printed.
getGRcoverageFromBw()
was updated to fix a bug that really originated from rtracklayer::import.bw()
, which returns output sorted by the bigWig file chromosome sort order, and not by the order of rtracklayer::BigWigSelection()
as documented in rtracklayer::import.bw()
. The workaround implemented in getGRcoverageFromBw()
is to return coverage in the exact order requested. Note this issue will not affect sashimi plots since each gene is contained within one chromosome, and in this case the coverage is returned by rtracklayer
in the order it is requested within a chromsome, but rtracklayer
bug returns results for each chromosome together regardless the order it was requested. Also this update was performed outside the memoise call, so the cache will store data in the order received from rtracklayer::import.bw()
but will re-order those results consistent with the order of input GRanges.sashimiAppUI()
was updated to change “Sample Selection” to use dataTable
output.sashimiAppServer()
was updated to handle dataTable
widget for “Sample Selection” and sample ordering, removing the previous method of drag-and-drop. The reasons: first, the drag-and-drop was not mobile-friendly; second, the widget apparently broke after the recent updates. The error was reproduced by changing the sample selection, then creating a new sashimi figure - in that case no sample selections were retained and the figure failed due to no selected samples.sashimiAppConstants()
was updated to include a text box with full sessionInfo()
output, inside a box that is collapsed and hidden by default. This output is on the "Guides"
tab below the section "Relevant R version info"
.sashimiAppUI()
was updated to correct some visual glitches in the R-shiny app, using the shinydashboard::box()
instead of shinydashboardPlus::box()
except where required. The shinyWidgets::sliderTextInput()
appears to detect the wrong color for title text and tick marks, so the sidebar background was changed to medium grey to try to account for those oddly inconsistent choices. It currently cannot be customized.gene2gg()
new argument geneSymbolColname
allows using a specific gene symbol column name, instead of the previous default "gene_name"
. The corresponding ggplot2::aes()
was changed to ggplot2::aes_()
to allow using a variable as a name.gene2gg()
argument hjust
was changed to hjust=-0.5
because somehow the apparent behavior of hjust
in ggrepel
has changed in recent versions. The previous value hjust=2
is roughly the same as the new value hjust=-0.5
which is… unexpected.Several updates were implemented to correct behavior seen only with long-running R-shiny apps, such as https://splicejam.vtc.vt.edu unfortunately. These changes should help make the R-shiny app more robust for all users, but are particularly targeted at the longer-running R-shiny servers.
The shinydashboardPlus
package version 2.0.0 included numerous of “breaking changes” that also required numerous updates to the R-shiny sashimi app. Functions were renamed (dashboardHeaderPlus()
, boxPlus()
) and introduced name conflicts with the existing shinydashboard
package - and so required package prefixing to ensure the correct function is being called.
flattenExonsBy()
was updated to warn on disjoint exons correctly this time.flattenExonsBy()
several updates to ensure proper handling of detectedTx
in edge cases. Code was somewhat cleaned up to make each step more clear.sashimiAppServer()
was updated to force several instances of default behavior for some form components that apparently do not return values upon the initial start-up of the R-shiny app. This behavior may be fixed properly in future, so that the values are initially available. For now, the fallback to use default values is chosen.sashimiAppServer()
fixed bug where detectedTx
was not showing even when check box “Show transcripts” and “Detected only” were both checked. In future unchecking “Detected only” should include more transcripts for genes that have a subset of detected transcripts, thus widening the field of view so to speak.sashimiAppConstants()
was updated to print more detail regarding detectedTx
entries during the preparation steps. When there are no detectedTx
entries that match tx2geneDF$transcript_id
then it prints the first 20 lines of tx2geneDF
for visual review. In these cases the resolution might be to remove the local file ".tx2geneDF.txt"
which will force the file to be re-created from the GTF source file, and may resolve the mismatch.Apparently for .tx2geneDF.txt
files that existed in older versions of splicejam, the stored file included rownames that were ignored upon load. When splicejam switched to use data.table::fread()
and data.table::fwrite()
it briefly broke the ability to keep the header line because header=TRUE
was not working as expected. Removing header=TRUE
causes data.table
to detect the rownames and add a new column which is ignored. All future versions of splicejam should not be affected.
"testthat"
package dependency, which requires version 3.0.0 or higher to avoid the bug associated with testthat_print() not found
.assignGRLexonNames()
was updated to include examples, and to remove a redundant step when checking for disjoint ranges. The process was updated to clarify how geneSymbol values are obtained, from values(GRL)[[geneSymbolColname]]
, values(GRL@unlistData)[[geneSymbolColname]]
, then names(GRL)
in order of the first one with valid values. It now also handles the case where names(GRL)
is not defined, but where values are present via geneSymbolColname
.jam_isDisjoint()
was updated to return a logical
vector for GRangesList
input, representing each GRanges
element in the GRangesList
. This change is consistent with GenomicRanges::isDisjoint()
without sacrificing speed.splicejam::gene2gg()
when only flatExonsByGene
is supplied without flatExonsByTx
the GRangesList
coersion no longer allows one object to be NULL
. Workaround is to ignore the missing object which keeps everything as GRangesList
without needing coersion.assignGRLexonNames()
that appears to be caused by a new Bioconductor data type FactorList
, which requires explicit conversion to list
for this function to work as expected.makeTx2geneFromGtf()
was updated to prevent edge cases in parsing GTF files, conversion to data.frame
with stringsAsFactors=FALSE
just to make sure, for R versions lower than 4.0
.defineDetectedTx()
was updated with slight change to the percent max isoform expression. The previous calculation used the higher of max expression or 1
in the percent maximum, to prevent divide by zero. However, it assumed expression should be 1
or higher, making the resulting percentages lower than 100% max for genes whose isoforms all had values less than 1
. Originally this was a feature, since genes with expression less than 1
were not beneficial, however when using TPM values it is useful to allow values less than 1
. New arguments: floorTPM
and floorCounts
allow custom minimum values, the defaults are 0.001.annotateGRfromGR()
was optimized for edge cases where there were a low number of multi-overlaps and large number of single-overlaps.grl2df()
was not properly stacking junctions in sashimi plots, the problem was caused by closestExonToJunctions()
which annotates junctions by the nearest compatible exon boundary, with no regard to distance from the exon. The stackJunctions()
function simply stacks based upon "nameFrom"
and "nameTo"
, so junctions that land between exons were being stacked together with junctions that properly align those exons. Now closestExonToJunctions()
has optional argument spliceBuffer
, when supplied a distance, exons beyond that distance are annotated by exonName.distance
. Novel junctions will stack only when they share the same distance from an annotated exon.jam_isDisjoint()
is a rapid alternative to GenomicRanges::isDisjoint()
, that tests GRanges
or GRangesList
for disjoint (non-overlapping) ranges. In one test with large GRangesList
, the duration was reduced from 100+ seconds to 0.5 seconds.makeTx2geneFromGtf()
was optimized again, substantially reducing the time and memory required. The rownames returned are also defined by values in colname "transcript_id"
if it or similar colname exists, passed through jamba::makeNames()
to ensure they are unique.
Updated help documentation for assignGRLexonNames()
.
Added \preformatted
tags to help docs for curateVtoDF()
and curateDFtoDF()
, otherwise the examples are not properly formatted with whitespace.
internal_junc_score()
new argument minScore=0
will return this minimum score, especially useful if for some reason the input data contains no valid score colname. Otherwise the minScore
is applied to the absolute value of the score, keeping the original sign. (Zero becomes positive.)
detectedTxInfo()
was updated to handle being supplied both Gene
and Tx
values, and a new optional argument Groups
to allow returning a subset of group colnames.
defineDetectedTx()
new argument applyTxPctTo
defines which expression data to use (count data or TPM abundance data) for the percent max calculation. The purpose of this new argument is to specify whether to apply the percent max threshold to TPM, counts, or a combination of the two:
"TPM"
uses only TPM data for percent max thresholding"counts"
uses only count data"either"
uses the higher percentage from TPM and count"both"
uses the lower percentage from TPM and countAs a result, the output is also expanded to include list elements:
"txPctMaxTxGrpAll"
"txPctMaxTxTPMGrpAll"
, in addition to"txPctMaxGrpAll"
still represents the data used for filtering, modified relative to applyTxPctTo
.Several changes were intended to help set up custom R-shiny apps, with specific default settings different than the Farris et al defaults.
sashimiAppConstants()
was updated to enhance the logic flow, and to print more verbose and hopefully more helpful output during the process.
sashimiAppConstants()
uses data.table::fread()
and data.table::fwrite()
. Fixed inconsistency in row.names and col.names.
sashimiAppServer()
was updated to clarify some verbose output, and define default gene more robustly.
launchSashimiApp()
, sashimiAppServer()
, sashimiAppUI()
now respect "gene"
from the environment to use as the first gene loaded, instead of using "Gria1"
which is intended only for the Farris et al data.
launchSashimiApp()
now properly handles server-configured options
such as width, server, port, and other R-shiny app options. Most important are server and port, which allows the R-shiny app to be run manually on ports above 8000 if necessary.
launchSashimiApp()
now properly initializes with variables from the active R environment. More variables will be converted, along with robust error checking. For now, these variables are available:
gene
which defines the first gene displayedsample_id
which defined the set of sample_id values to displaymin_junction_reads
- minimum junction read thresholduse_exon_names
- one of c("coordinates", "exon names")
exon_range_selected
- two gene_nameExon valuesexon_range_choices_default
- exon labels available for default genelayout_ncol
- number of columns in layoutinclude_strand
- one of c("+","-","both")
strands to displayshare_y_axis
- one of c(TRUE, FALSE)
whether to use shared y-axis rangeassignGRLexonNames()
was updated to be more tolerant of “problematic GTF file” data, specifically to allow optionally keeping genes that are found on multiple strands, which previously were removed by default. The idea of keeping a multi-strand gene is not ideal when thinking about sashimi plots, but can be useful when considering gene exons, especially when genome assembly may also be unfinished. Tools such as featureCounts will generate counts across all exons for a gene, regardless of strand and chromosome, so the output from assignGRLexonNames()
will be compatible. The argument filterTwoStrand=FALSE
will keep features present on multiple strands (or chromosomes.)
assignGRLexonNames()
now calls new function jam_isDisjoint()
to test for disjoin features. For a large test GRangesList
object, the duration was reduced from 60 seconds to 0.5 second.
flattenExonsBy()
was also updated with new argument filterTwoStrand
which is passed to assignGRLexonNames()
after exons have been flattened. Use filterTwoStrand=FALSE
to keep all gene features even when they are present on multiple strands or chromosomes.
flattenExonsBy()
new argument exon_method
which has two options:
exon_method="disjoin"
is the default, which combines transcript exons per gene and maintains distinct boundaries, in case an exon is longer in one isoform than another, it is subdivided into something like c("GENE_exon1a", "GENE_exon1b")
.exon_method="reduce"
is a new alternative, it uses GenomicRanges::reduce()
to combine exons across transcripts, and therefore ignores any subdivision of exons. In the same example above it would produce only "GENE_exon1"
. The main benefit from this option is that it is markedly faster for genome-wide data preparation, where "disjoin"
may take 2 to 5 minutes, "reduce"
may only take several seconds.Setting up a new sashimi plot is not crystal clear, and most preparatory steps can be done once and never again (for each version of Gencode.) Ideally, install an R package that contains:
txdb
(optional, but not provided by Bioconductor currently)groups2contrasts()
updated a bug that was not maintaining factor order in two-way contrasts. Slightly enhanced logic used in argument removePairs
so that it works with single factor contrasts (where that factor does not change.)defineDetectedTx()
now uses decimal values for the mean count and mean abundance group values, rounded to the 0.1 decimal place.defineDetectedTx()
default argument value cutoffTxTPMExpr=0.1
which was previously cutoffTxTPMExpr=2
. It appears that this threshold is somewhat dependent upon the input data, and that 2
was too stringent for the majority of datasets testes since the Farris et al publication. Therefore, a more suitable default value is less restrictive of low-TPM abundance values. This change also makes the count-based threshold the dominant filter for technical detection above background noise, which makes some sense because count noise is somewhat more of a hard boundary.Overall, the changes allow some coverage data to be “missing”, though it will generally try twice to retrieve coverage. When coverage is “missing” it is rendered empty, with no other error or message. This change may not be ideal, if the coverage URL is mis-typed for example, but it beneficial during a network outage.
sashimiAppServer()
was updated to repeat the call to prepareSashimi()
when the returned object contained a list element some_null=TRUE
, which indicates that one or more underlying elements of the data was returned as NULL
, indicating a failure. In this case, prepareSashimi()
is called in a way that does not re-use the existing cache. Note that prepareSashimi()
calls other functions, each of which caches data. If those functions return NULL
, they also assign attribute attr(x, "some_null") <- NULL
, which helps cascade this failure up the chain to calling functions.
import_juncs_from_bed()
was updated to use tryCatch()
to catch errors when the requested file cannot be retrieved. When memoise cache is being used, it will try to clear the cache then try again, or will try again without using memoise.
getGRcoverageFromBw()
was updated to enhance the memoise cache repair logic: If coverage is zero-length, it tries again. Zero-length coverage can happen from a zero-length memoise cache file, or when the rtracklayer::import()
function returns zero-length coverage. That tends to happen when the file is not accessible, network is down, or the path is invalid, etc. Nonetheless, memoise will happily store a zero-size cache file (!) with no clear way to remove or ignore it. So we try to call memoise::drop_cache()
which only exists in memoise version 1.1.0.9000
, currently on Github with no expected CRAN release timeline. If we cannot remove the cache file, we try to get coverage without using memoise. The overall summary:
combineGRcoverage()
was modified to be tolerant to missing coverage data. Typically when coverage is missing, the data returned from getGRcoverageFromBw()
returns a vector of 0
values, however sometimes there is no data returned for a single URL, perhaps due to network outage. Nonetheless, combineGRcoverage()
will now ignore any samples that are not present in the colnames of the GRanges
object supplied.
launchSashimiApp()
function which starts the R-shiny app sometimes encounters errors when the required files are not available, as has been the case with server or network outages. Since each requested file is stored in a memoise
local file cache, a previously-accessed gene should pull from the cache and avoid these errors. But newly accessed genes result in corrupted (or zero-size) cache files. Splicejam tries to detect corrupt cache files and reload from the source (referring to filesDF
) however this process is proving to be imperfect, and the sort of thing that is difficult to debug.prepareSashimi()
now positions junction labels at the maximum edge of the junction arc, in stranded fashion. Might consider adding text adjustment so the label is not centered at the edge.plotSashimi()
now applies scales::comma(..., accuracy=1)
by default, which removed the decimal values from junction labels. Some recent update to scales::comma()
seems to have caused decimal values to appear by default. A new argument junc_accuracy
is passed to scales::comma(..., accuracy=junc_accuracy)
to allow customization.plotSashimi()
fixed issue where junc_alpha
was only being applied when data contained junctions without coverage, now the junc_alpha
is applied in all cases.sashimiAppServer()
now displays the strand of the gene found in the top search pane, which allows someone to set the coverage strand consistent with the gene if desired."TypeError: postRenderHandlers is undefined"
. Upon deeper inspection, the uiOutput()
element being used to display both ggplot and ggplotly output, was somehow not providing the plotly javascript dependencies to include relevant postRenderHandlers
. The workaround was to add an empty plotlyOutput("plotly_blank")
to the UI, which appears to cause the required javascript dependencies to be loaded. Stackoverflow threads and Github issues suggest there exists a method to load dependencies when embedding htmlwidgets inside tagList context – for example when embedding a custom htmlwidget inside a table cell. Perhaps the proper remedy is related.annotateGRfromGR()
with numeric columns, which failed to initialize a new GRanges column with numeric(0) instead of using a numeric form of NA, which was accomplished with c(0, NA)[2]
.plotSashimi()
argument default ylab="read depth"
, previously it was "score"
. In future, normalized data might use ylab="normalized read depth"
but that would be a custom option for the analyst.plotSashimi()
arguments "xlabel"
and "xlabel_ref"
control the x-axis label, which by default xlabel_ref=TRUE
will use the reference (chromosome) as the x-axis label, which makes sense because the axis shows chromosome coordinates. The "xlabel"
argument is intended to allow a fully custom label.compressPolyM()
edge case, polygons with only two x values and y values all zero; threw an error Error in x[idx1, ] : only 0's may be mixed with negative subscripts
. New rule requires at least 5 x-axis values, otherwise compression doesn’t seem necessary anyway.compress_introns
added to exoncov2polygon()
and prepareSashimi()
, which determines whether to call compressPolygonM()
when creating polygon coordiantes for coverage data. Setting compress_introns=FALSE
saves about 20% of the time it takes to prepare sashimi data.compressPolyM()
argument from minRatio=3
to minRatio=5
, the threshold of compression above which polygon coordinates are adjusted.sashimiAppServer()
to create the memoise version of prepareSashimi()
only once, instead of re-creating it each time. Probably no noticeable effect, but it feels better.prepareSashimi()
to change the "name"
column to a factor, in order to force the drawing order of junction arcs, so junctions are drawn in order of most dominant, to least dominant, then shorter spans to longer spans. Smaller score, wider junctions are drawn last, since they are typically the most difficult to see otherwise. “Dominant”” refers to the junction_rank
calculated by having the highest score among junctions that start or end at the same coordinate: junction_rank=3
means the junction has the highest score on both sides, and has the darkest color; junction_rank=2
has the highest score on one side but not both sides, and is lighter in color; and junction_rank=1
does not have the highest score on either side of the junction, and has the lightest color. The junction_rank=1
entries are now drawn last, since they typically have the lowest scores, and are small and thus easily obscured.splicejam-sashimi-server()
was updated to handle changes in the gene search field properly, avoiding edge cases with un-detected genes that have no sashimi data.prepareSashimi()
was updated to handle absence of splice junction data, in a more robust way.spliceGR2junctionDF()
was updated to handle missing junction data, ultimately returning NULL which is easier handle consistently by other downstream functions. Also made small update to handle sample_id
as a factor when input data is supplied only as a GRanges object.bgaPlotly3d()
new argument sampleGroups
allows data to be re-grouped to define custom group centroids. Main driver is when running BGA on un-grouped data, for example where each replicate is its own group. Using sampleGroups
allows the proper sample group to be highlighted on the plot. This scenario is mainly only advised when needing to run standard PCA or COA, or when there are fewer than four groups, since BGA produces n-1
dimensions. Code was also slightly update to prepare for broader use outside of BGA.bgaPlotly3d()
was updated to handle the "textposition"
argument to plotly labels, allowing individual adjustment of each centroid and sample label.bgaPlotly3d()
new argument geneLabels
allows replacing gene identifier with a custom label, for example replacing assay ID with gene symbol.gene2gg()
updated to maintain a minimum y-axis range when labelExons=FALSE
. Previously the exons could be cropped.bgaPlotly3d()
bug fixed which prevented display of scaled coordinates from the bgaInfo
object, mainly because the scaled coordinates are not supplied for individual samples. Instead the scaling factor (adjustment used to convert centroid coordinates from raw to scaled) is applied to sample coordinates to produce equivalent scaled coordinates. That said, I recommend using un-scaled coordinates, which keeps the x, y, and z axis scales proportional to their relative contribution, which visually reinforces the relative strength of separation in each dimension.stat_diagonal_wide_arc()
which calls ggforce::StatBezier
. Version 3.1.0 of ggforce changed the required call to that function, breaking the dependency and causing junction arcs to be invisible as a result. The new interface mimics how ggforce::stat_diagonal_wide()
calls ggforce::StatBezier
. I hope the interface does not change in future.ggforce
version 0.3.1, which is the point where the API changed with ggforce::StatBezier
.org.Mm.eg.db
. Now if the package is not available, it skips the corresponding sections."Guides"
tab of the R-shiny Sashimi app. It now also prints the R version, and the package versions of jam packages.DESCRIPTION
to include higher specific version numbers for several packages, to force an update if needed.tximportData
package, making it optional. This change prevents the 250 MB package from being required only for the example vignette. In future, a random generated expression data matrix may be created.getGRcoverageFromBw()
that used default memoise path "memoise_coverage"
instead of "coverage_memoise"
as with other functions. Only affected directly calling getGRcoverageFromBw()
since other functions passed the directory as an argument, and thus probably only me."mm"
units, since it appears to ignore grid "pt"
units. The bast plot font size can be adjusted, affecting everything including exon labels. The exon labels can separately be adjusted relative to the base font size.scale_factor
values are applied outside the memoise cache strategy, which should allow modifying the scale_factor
values without invalidating the cache. It seems correct to cache the unadjusted data, then separately apply any adjustment."farrisdata"
package was added to the "Guides"
tab which lists the versions of relevant packages."wrench"
to "gear"
, thanks to @DivadNojnarg of the "RinteRface/shinydashboardPlus"
package on Github!"shinydashboardPlus"
and "farrisdata"
.sashimiAppConstants()
may become the standard way to prepare the dependencies for sashimi plots, including the gene-transcript-exon models, the filesDF data.frame
, the flattened exons, and color substitutions. This function will be able to take a GTF file, and prepare all the tx2geneDF cross-references, the exon and CDS exons, the flattened exons, using detectedTx
if available for enhanced exon structures.1.1.0.9000
which allows invalidating a single problem cache file. It has been available since October 2018 but not released to CRAN yet. This changes is a step toward recovering from network outages, which currently create empty cache files which cannot be recovered without emptying the entire cache.getGRcoverageFromBw()
now checks if memoise cached data is NULL
, which happens during network outage. If the data returned from the cache is NULL
it clears the cache for that key, then tries again.import_juncs_from_bed()
now checks memoise cached data for NULL
as described above.sashimiAppConstants()
was refactored, to define variables in a systematic way, by default updating global environment for convenience, but optionally creating a new environment inside which the required variables are populated. In future this technique may become the default here along with downstream functions, which may reference the environment rather than data objects themselves.groups2contrasts()
to detect when any factor level contains "-"
, and converts the "-"
to "."
prior to detecting contrasts. Previously the "-"
interfered with proper contrast names, and resulted in some contrasts not being returned by this function. Note that this function avoids using base::make.names()
because that function aggressively converts other valid characters to "."
. However, if downstream functions require base::make.names()
compliant names, run that function prior to calling groups2contrasts()
.sortSamples()
was updated to match "wildtype"
as a control term.groups2contrasts()
was updated to check for specific scenarios where iFactors and iSamples may be missing, to cover a variety of common scenarios. Error messages were updated to be specific to the expected input.flattenExonsBy()
removed the ...
dots argument, to try to appease the memoise caching logic, which is suspected to be comparing the ...
data when invalidating the cached files. Ironically, this change itself will invalidate existing cached files.internal_junc_score()
was updated to handle data with no sampleColname
column, which fixed an issue with the example code.stackJunctions()
was updated to handle multiple sample_id in the same operation, keeping each sample_id independent during stacking. This update should fix the edge cases where junctions appeared to be improperly stacked.to_basic.GeomShape()
is now exported, since it was causing problems during the call to plotly::ggplotly()
when converting ggforce::geom_shape()
to plotly format. Somehow it worked for the Sashimi coverage ggforce::geom_shape()
but not for gene exon ggforce::geom_shape()
. I am not always here to understand fully, but to make things work.prepareSashimi()
now returns list item "df"
which contains the merged coordinates for coverage, junctions, and labels. This data.frame
overrides the need for plotSashimi()
to merge this data on the fly, and takes some extra logic away regarding junction rank, label positions, etc. It also returns "ref2c"
as an attribute to the "df"
, to make sure it is available even when requesting only the "df"
format. The "ref2c"
contains the functions used to compress the introns on the x-axis scale.plotSashimi()
now expects the input sashimi
object to contain list element named "df"
to include the full coordinate data.frame
used for plotting. Note that plotSashimi()
will now require output from prepareSashimi()
in version 46 or higher. Fortunately, memoise already invalidates the cache for even the slightest change in any molecule in the world, so caching should not be problematic.aboutExtra
as optional text to describe the data used in the R-shiny app.plotSashimi()
new argument junc_alpha
to control the alpha transparency of junctions, to allow transparency in cases where the intron coverage may be obscured by the junction arc.plotSashimi()
and gene2gg()
have argument label_coords
for optional x-axis range to subset labels before displaying with ggrepel::geom_text_repel()
. This change solves the issue where zooming the x-axis range with coord_cartesian()
kept all labels which were then arranged at the edges of the plot. This change also necessitates re-creating the ggplot object, since it has to change the label coordinate data.gene
and sample_id
to the global environment."Samples and Data"
tab from R-shiny for now.sample_id
items in R-shiny "Sample Selection"
is now maintained, allowing custom sorting of samples.prepareSashimiData()
argument use_memoise
enables memoise caching of individual bigWig files per gene, and individual junction files per gene. The file paths are defined by memoise_coverage_path
and memoise_junction_path
. These options should greatly enhance the success rate of caching, at the expense of creating more cache files.getGRcoverageFromBw()
now has arguments use_memoise
and memoise_coverage_path
to cache at the level of each bigWig file and genomic range.gene
and same set of sample_id
will retrieve the full cached sashimi data. However, if any smaller argument changes, including changes to any bigWig coverage or junction BED file, each individual step is also cached to prevent retrieving the same coverage or junction data repeatedly. For practical R-shiny usage, where sample_id
is frequently changed, this update should be a substantial improvement.sashimiAppConstants()
to improve handling of memoise flatExonsByTx data.prepareSashimi()
.shinyjqui
to use orderInput()
for drag-and-drop selection and ordering of sample_id
values.shinycssloaders
.filesDF
."scale_factor"
values in normalizing coverage and junction scores.color_sub
to colorize the table, and will create colors for any undefined values in the "sample_id"
column. Other columns are colorized when all values match names(color_sub)
otherwise are not colorized. Columns are arranged so any extra columns (such as group, subgroup, batch, etc.) would appear beside sample_id
, and typically would be colorized also using color_sub
.grl2df()
now returns seqnames, and adjusts yBaseline within each seqname (chromosome). To plot multi-chromosome GRangesList, use +facet_wrap(~seqnames)
plotSashimi()
to merge together the junction, coverage, and label coordinate data.frames to enable plotly and crosstalk to highlight features. The end result looks amazing.stackJunctions()
now also returns “junction_rank” scored from 1 to 3, where 1=minor, 2=partial, 3=major, based upon the rank of junctions entering and leaving exons.junc_fill
is converted to gradient so a darker color is used for dominant junctions, to try to highlight the major isoform splice junctions per sample. Unfortunately, plotly does not honor the polygon render order (yet).geom_diagonal_wide_arc()
, extends ggforce::geom_diagonal_wide()
by connecting the left and right halves of a diagonal into one continuous ribbon arc.plotSashimi()
now uses geom_diagonal_wide_arc()
to display junction arcs instead of using two ggforce::geom_diagonal_wide()
. For now, prepareSashimi()
still provides coordinates to create two polygons, which are combined in the geom. Still todo: figure out how to add a label; and make the middle x position immune to x-axis rescaling.prepareSashimi()
that occurred when no junctions overlapped another, resulting in error “invalid ‘type’ (S4) of argument”.defineDetectedTx()
has a new argument zeroAsNA
to handle the special case where some expression values reported as zero should be treated as NA
(no data obtained) and therefore will not be included in group mean calculations. As transcript-exon models get more “comprehensive”, kmer quantitation tools such as Salmon and Kallisto sometimes need to assign abundance to one of several nearly identical isoforms, and in low count scenarios all counts may be assigned to one or another isoform, leaving zero in the alternate position. Excluding zero ensures that group mean values represent only the assigned quantitation.launchSashimiApp()
: shiny
, shinydashboard
, shinydashboardPlus
, shinyWidgets
.getGRcoverageFromBw()
now returns NULL when a BigWig file is not accessible. Previously any error caused "Error in seqinfo(con) : UCSC library operation failed"
which could mean the file does not exist, or any number of other validation checks failed. Since getGRcoverageFromBw()
used a vector of BigWig files, any error caused the entire set to fail.getFirstStrandedFromGRL()
added package prefix to the use of IRanges::heads()
which was not imported directly.bgaPlotly3d()
removed some unnecessary print functions.makeTx2geneFromGtf()
help docs recommend installing the R.utils
package when .gz
files are required for import, since this process uses data.table::fread()
.to_basic.GeomShape()
is a hidden function intended only to provide basic support of ggforce::geom_shape()
when converting ggplot objects to plotly.launchSashimiApp()
function.farrisdata
package for example data used for the Farris et all RNA-seq mouse hippocampus manuscript.codonUsage2df()
example data file path.test_exon_gr
, test_junc_gr
, test_cov_gr
for easy example data for exons, junctions, and coverage, respectively.flattenExonsByGene()
to flattenExonsBy()
to reflect that the function handles by="gene"
and by="tx"
.create-a-sashimi-plot.Rmd
which walks through a full example showing how to create a Sashimi plot.stackJunctions()
with schematics, including example on plotting junctions by themselves.bgaPlotly3d()
now properly handles ellipsoid colors, previously the colors were assigned but not honored by plotly::add_trace()
.combineGRcoverage()
determines strandedness by requiring all values to be at or below zero, with at least one negative value. Otherwise, data can have position and negative values and will be considered positive stranded.plotSashimi()
replaces jamSashimi()
because it is just more intuitive… Ah well.jamSashimi()
is used to plot Sashimi data prepared by prepareSashimi()
in order to separate the download and preparation of Sashimi data from the visualization.prepareSashimi()
is refactored to remove the plot functions, moving them to the new jamSashimi()
.flattenExonsByGene()
can now handle Tx data, mainly useful to add CDS regions to existing exon models.gene2gg()
is more robust to edge input cases.prepareSashimi()
is a workhorse function that prepares several types of Sashimi-like data output, including ggplot2-based Sashimi plots. This plot uses vanilla ggplot2 with custom x-axis scaling to compress genomic coordinates, while keeping data in proper genomic coordinate space.gene2gg()
is a lightweight function that creates gene and transcript exons models, and returns a ggplot2 object for plotting. It can optionally return the data.frame used by ggplot2.grl2df()
is similar to fortify()
from ggplot2, or the broom::tidy()
functions, that take a custom object and return a data.frame for downstream use. Here the function currently works with “rectangle” data (like exons, introns, peaks, etc.), and “junction” (like junction regions to be represented by ggforce::geom_diagonal_wide()
). In future it may also handle sequence coverage polygons.exoncov2polygon()
converts a GRanges object containing coverage in the form of NumericList for each exon or intron, into a data.frame suitable for use by geom_polygon()
or ggforce::geom_shape()
.getGRcoverageFromBw()
loads bigWig coverage data for a GRanges of exons and introns, returning a GRanges object with columns representing each sample_id. It calls combineGRcoverage()
to combine coverages by strand within each sample_id
.flattenExonsByGene()
is intended to provide a set of unique exons per gene, using all or only detected transcript exon models. It numbers exons by contiguous segment, then labels subsections of each exon with a letter suffix, for example “exon1”, “exon2a”, “exon2b”, “exon3”. Lastly, it can annotate exons as CDS or non-CDS, if given a set of cdsByTx
data.make_ref2compressed()
takes a GRanges object of exons (or any useful feature like ChIP-seq peaks, etc) and returns functions needed to compress x-axis coordinates, in order to shrink gaps/introns to a fixed width, suitable for use by ggplot2.spliceGR2junctionDF()
takes junctions GRanges data, and flatExonsByGene, and summarizes and annotates each junction by the gene_exon connected, and combines scores when multiple junctions are within “spliceBuffer” distance of an exon boundary, by default 3 bases. It can accept a GRanges object that was derived from multiple sample_id, and will return data summarized within each sample_id. It calls stackJunctions()
to combine junctions per replicate.simplifyXY()
takes a vector of coordinates, and simplifies them to the minimal set of points to represent the polygon. For sequence coverage data, that process can reduce individual points by 90%, but it works with any coordinate data. When two or more consecutive line segments have identical angle (with non-zero distance), they are combined using only the first and last points.getGRgaps()
, getGRLgaps()
, addGRgaps()
, addGRLgaps()
are functions that take GRanges input, and return either just the gaps between non-overlapping regions, or the original features with gaps inserted between them. Useful to convert a set of exons, to a set of exons and introns.sample_id
.prepareSashimi()
, namely BAM alignment files, from which coverage and junctions can be derived.ref2c
chromosome-wide, however it would require x-axis coordinates to know their context, in terms of chromosome/seqname.annotateGRLfromGRL()
to add annotations, etc.curateVtoDF()
, curateDFtoDF()
are data curation functions to curate a vector, or a data.frame, into a data.frame with consistent, usable nomenclature for downstream analysis. They use a flexible yaml format that should help automate analysis pipelines that start with raw data file import.exoncov2polygon()
and compressPolygonM()
are basic functions for sashimi plots, efficiently converting exon/intron coverages to multi-polygons, then optionally compressing introns and reducing the information content to roughly similar resolution as uncompressed regions.spliceGR2junctionDF()
is the summary function to convert a set of splice junction read counts to gene-annotated junctions, grouped and summed where needed.closestExonToJunctions()
called by spliceGR2junctionDF()
is used to annotate junction ends near compatible exon boundaries, with some buffer distance allowed to “snap” junctions to the edge.numTxs
was not getting populated in runDiffSplice()
, otherwise the stats summary is not changed.runDiffSplice()
includes examples using groups2contrasts()
, also allows txColname,geneColname
to be defined and therefore custom.verbose=TRUE
to verbose=FALSE
by default.makeTx2geneFromGtf()
to use data.table::fread()
ability to uncompress .gz files, hopefully making it cross-platform.groups2contrasts()
was updated to handle single-factor experiments, and be more robust to two-factor experiments with missing groups in the full design table.runDiffSplice()
is a wrapped around limma::diffSplice()
intended to capture several steps of pre- and post-processing, optionally applying limma::voom()
.groups2contrasts()
takes a vector or data.frame of sample groups, and determines the relevant pairwise and two-way contrasts, returning a design matrix and contrast matrix.sortSamples()
sorts biological samples so that known patterns of control group terms are returned before non-control groups, in order to help provide useful defaults when setting up sample group contrasts.strsplitOrdered()
provides base::strsplit()
but returns factor output whose factor levels are influenced either by factor input, or by other arguments.limma
package was added to Imports.runDiffSplice()
has an argument spliceTest
which defines test
when calling limma::topSplice()
. Only the “t” (t-test) returns fold change, therefore hits are not filtered by fold change for “F” or “simes”.ale2violin()
takes output from tx2ale()
with some arguments, and produces a ggplot2 violin plot object, as well as the underlying data. It allows a custom filtering function, which allows filtering gene lists for relevant regions of expression.shrinkMatrix()
minor fix to remove the shadow x
in the resulting colnames.tx2ale()
modified to be tolerant of NA values in the expression matrix.sortGRL()
sorts GRangesList objects by chromosome and position.getFirstStrandedFromGRL()
return the first GRanges feature per GRangesList, ordered properly by strand.annotateGRfromGR()
which annotates one GRanges object based upon overlaps with another GRanges object.annotateGRLfromGRL()
which annotates one GRangesList object based upon overlaps only with the same index entries in a second GRangesList object.findOverlapsGRL()
runs GenomicRanges::findOverlaps()
for the case of two GRangesList objects, matching at the GRanges level but restricting overlaps to those matching the same GRangesList index.assignGRLexonNames()
assigns exon names and numbers to a disjoint set of exons per gene model.lengths()
have been moved there. Another reason using a package prefix is clunky at best. If methods have dispatch, and if they did not conflict between S3 and S4 methods, that would be the better strategy. It is not a solution to hardcode package names into function bodies, since every package maintainer has to stay updated on the source package of all other dependent functions. Those details should be irrelevant to other package maintainers.list2im()
function. A slower workaround could be written, but ultimately the arules package is preferred.factor2label()
will convert a factor to a factor label, with the same order of levels as the input factor, but including summary stats like the number of items for each factor level. Useful for ggplot2 visualizations, to include counts in the color legend for example.system.file("extdata", "Mouse_codon_usage.txt", package="farrisdata")
codonUsage2df()
imports a text codon usage file, and returns a data.frame. Support function dna2codon()
simply takes a character vector and returns vector whose elements all have three characters, e.g. all(nchar(x) == 3).jamGeomean()
is preferred by the Jam packages, but geomean()
is provided for direct comparison to the classical approach.detectedTxInfo()
summarizes the data used to define detected transcripts for a given gene, or for a given set of transcripts.makeTx2geneFromGtf()
to remove requirement for col.names
during import.data.table
package required a specific #' @import data.table
entry in the roxygen2 entry for the shrinkMatrix
function. This issue prevented defineDetectedTx()
from working within an R package.defineDetectedTx()
is a core RNA-seq function to determine which transcript isoforms are considered “detected” based upon counts (or pseudocounts), TPM values if available, and the relative abundance of isoforms per gene. When “everything with over 10 counts” is not sufficient.tx2ale()
takes a set of transcript exon models, and returns a set of alternate 3’UTR regions, similar to ALE (alternative last exon.) It differs slightly from other methods in that it aggregates transcript quantities by shared ALE regions, producing a matrix based upon unique ALEs. It does not use counts in the ALE region itself, but the aggregate abundance of all isoforms that contain each ALE. This method allows tools like Salmon or Kallisto to quantify isoforms using the best available kmers per isoform, without being restricted to the ALE regions which have a very wide distribution of lengths.makeTx2geneFromGtf()
takes a GTF file, and produces a data.frame
with tx (transcript) to gene associations.list2im()
converts a list to an incidence matrix. It uses the arules
package fast methods for creating transactions
objects, which are sparse binary matrices with linked data.frames used to describe rows and columns. Similar to SummarizedExperiment
but with optimization specifically for list-to-matrix and matrix-to-list conversion. Their implementation is memory efficient as well.bgaPlotly3d()
takes a "bga"
class, as produced by the made4::bga()
function, and produces a 3-D plotly visualization. It adds ability to group centroids into “supergroups” which are connected by a spline 3-D curve.spline3d()
is the supporting function to draw interpolated 3-D curves through a set of coordinates.shrinkMatrix()
shrinks a numeric matrix by groups of rows, essentially a wrapper function for the data.table
package functions.