Prepare Sashimi plot data

prepareSashimi(
  flatExonsByGene = NULL,
  filesDF = NULL,
  gene,
  sample_id = NULL,
  minJunctionScore = 10,
  gapWidth = 200,
  addGaps = TRUE,
  baseline = 0,
  compressGR = TRUE,
  compress_introns = TRUE,
  ref2c = NULL,
  gap_feature_type = "intron",
  default_feature_type = "exon",
  feature_type_colname = "feature_type",
  exon_label_type = c("none", "repel", "mark"),
  junc_label_type = c("repel", "mark", "none"),
  return_data = c("df", "ref2c"),
  include_strand = c("both", "+", "-"),
  junc_color = alpha2col("goldenrod3", 0.7),
  junc_fill = alpha2col("goldenrod1", 0.4),
  doStackJunctions = TRUE,
  coord_method = c("coord", "scale", "none"),
  scoreFactor = 1,
  scoreArcFactor = 0.2,
  scoreArcMinimum = 100,
  covGR = NULL,
  juncGR = NULL,
  use_memoise = FALSE,
  memoise_coverage_path = "coverage_memoise",
  memoise_junction_path = "junctions_memoise",
  do_shiny_progress = FALSE,
  verbose = FALSE,
  ...
)

Arguments

flatExonsByGene

GRangesList named by gene, whose GRanges elements are flattened, disjoint, non-overlapping genomic ranges per gene.

filesDF

data.frame with columns url, sample_id, type, where: url is any valid file path or URL compatible with base::read.table(); sample_id is an identified representing a biological sample, used to group common files together; type is one of "bw" for bigWig coverage, "junction" for BED12 format splice junctions.

gene

character string of the gene to prepare, which must be present in names(flatExonsByGene).

gapWidth

numeric value of the fixed width to use for gaps (introns) between exon features. If NULL then getGRgaps() will use the default based upon the median exon width.

addGaps

logical indicating whether to include gap regions in the coverage plot, for example including introns or intergenic regions. When compressGR=TRUE then gaps regions are down-sampled using running maximum signal with roughly the same x-axis resolution as uncompressed regions.

baseline

numeric vector named by names(flatExonsByGene) where baseline is used to adjust the y-axis baseline position above or below zero.

compressGR

logical indicating whether to compress GRanges coordinates in the output data, where gaps/introns are set to a fixed width. When ref2c is not supplied, and compressGR=TRUE, then ref2c is created using make_ref2compressed().

compress_introns

logical indicating whether to compress the coverage polygon coordinates to approximately the same number of pixels per inch as the exon polygons. This option greatly reduces the size of the polygon, since introns are already about 50 to 100 times wider than exons, and when compressGR is TRUE, the introns are visibly compressed to a fixed width on the x-axis. The data has many more x-axis coordinates than the data visualization, this argument is intended to reduce the intron coordinates accordingly.

ref2c

list object output from make_ref2compressed() used to compress axis coordinates, to compress polygon coverage data in compressed regions, and to adjust splice junction arcs using compressed coordinates.

gap_feature_type

the default feature_type value to use for gaps when addGaps=TRUE.

include_strand

character value, one of "both", "+", "-" indicating the strandedness of coverage and junctions to display. The default "both" shows coverage on both strands, otherwise coverage is filtered either by filename (presence of "pos", "+", or "plus" indicates positive strand), or by detecting strandedness by positive/negative coverage scores. Detecting by filename is intended to avoid retrieving coverage in the R-shiny app, to help efficiency.

doStackJunctions

logical indicating whether to stack junction arcs at each end, this argument is passed to grl2df() which calls stackJunctions().

covGR

GRanges object containing coverage data in columns stored as NumericList class, where colnames(values(covGR)) are present in filesDF$url when files$type %in% "coverage_gr".

juncGR

GRanges object containing splice junctions, where "score" is used for the abundance of splice junction reads, and "sample_id" is used to define the biological sample_id.

do_shiny_progress

logical indicating whether to send progress updates to a running shiny app, using the shiny::withProgress() and shiny::setProgress() methods. This function only calls shiny::setProgress() and assumes the shiny::withProgress() has already been initialized.

verbose

logical indicating whether to print verbose output.

...

additional arguments are passed to make_ref2compressed(), getGRcoverageFromBw(), exoncov2polygon().

Value

list containing ggSashimi a ggplot2 graphical object containing a full Sashimi plot; ggCov the RNA-seq coverage subset of the Sashimi plot; ggJunc the splice junction subsset of the Sashimi plot; ref2c the output of make_ref2compressed() used for ggplot2 coordinate visualization; covDF, juncDF data.frame objects with the raw data used to create ggplot2 objects; covGR, juncGR the GRanges objects used to create the data.frames; gr the GRanges object representing the exons for the gene of interest; juncLabelDF the data.frame containing exon label coordinates used to add labels to the splice junction arcs.

Details

This function is the workhorse function used to produce Sashimi plots, and is intended to be a convenient wrapper function for several other individual functions.

At a minimum, a Sashimi plot requires three things:

  1. Exons, usually from a gene of interest.

  2. RNA-seq coverage data.

  3. Splice junction data.

There is some required pre-processing before running prepareSashimi():

  • Prepare flattened exons by gene using flattenExonsByGene() and corresponding data, including exonsByGene, cdsByGene, and tx2geneDF. Verify the gene exon model data using gene2gg().

  • Find file paths, or web URLs, for a set of bigWig coverage files, representing RNA-seq coverage for each strand, for the samples of interest. Test the coverage data using getGRcoverageFromBw() for a small set of GRanges data.

  • Find file paths, or web URLs, for a set of BED6 or BED12 format files, note that it cannot currently use bigBed format due to limitations in the rtracklayer package. Test the splice junction data using rtracklayer::import() for a small range of GRanges features, then send the data to spliceGR2junctionDF() to prepare a data.frame summary.

The basic input for coverage and junction data is a data.frame, which defines each file path or url, the type of data "bw" or "junction", and the biological sample "sample_id". Any file path compatible with rtracklayer::import() will work, including web URLs and local files. When using a web URL you may need to use "https://" format to force the use of secure web requests, but this requirement varies by country.

See also

Examples

# The active example below uses sample data suppressPackageStartupMessages(library(GenomicRanges)); data(test_exon_gr); data(test_junc_gr); data(test_cov_gr); filesDF <- data.frame(url="sample_A", type="coverage_gr", sample_id="sample_A"); sh1 <- prepareSashimi(GRangesList(TestGene1=test_exon_gr), filesDF=filesDF, gene="TestGene1", covGR=test_cov_gr, juncGR=test_junc_gr);
#> Warning: invalid factor level, NA generated
#> Warning: invalid factor level, NA generated