Sashimi Shiny app constants

sashimiAppConstants(
  ...,
  filesDF = NULL,
  color_sub = NULL,
  aboutExtra = NULL,
  envir = NULL,
  assign_global = TRUE,
  use_memoise = TRUE,
  empty_uses_farrisdata = TRUE,
  gtf = NULL,
  txdb = NULL,
  tx2geneDF = NULL,
  exonsByTx = NULL,
  cdsByTx = NULL,
  detectedTx = NULL,
  detectedGenes = NULL,
  flatExonsByGene = NULL,
  flatExonsByTx = NULL,
  verbose = FALSE
)

Arguments

...

additional arguments are passed to sashimiDataConstants()

filesDF

data.frame that contains at least these colnames: "sample_id", "url", "type". This data.frame defines the source data used to create sashimi plots.

color_sub

character vector of R colors, whose names match values in filesDF$sample_id. If not supplied, or if not all names are present in color_sub, the remaining names are converted to colors using colorjam::group2colors().

aboutExtra

character string or html tag from "htmltools" suitable for use in an R-shiny app. This text is displayed in the Help tab, and is intended to describe the data content shown in the R-shiny app.

envir

environment in which the data should be loaded, which takes priority over argument assign_global. When envir=NULL and assign_global=TRUE the default environment is globalenv(). When assign_global=FALSE and envir=NULL a new environment is created using new.env(parent=emptyenv()) so there is no parent environment, thereby preventing it from searching globalenv() for variables not defined in its own environment.

assign_global

logical indicating whether the default environment should be globalenv(). Using the global environment is not typically recommended; however, it can be convenient to operate using only the user global environment, and it is the default approach.

use_memoise

logical indicating whether to use memoise to cache intermediate data files for exons, flattened exons, transcript-gene data, and so on. This mechanism reduces time to render sashimi plots that re-use the same gene. All memoise cache folders are named with "_memoise".

empty_uses_farrisdata

logical indicating whether to use data from the Github R package "jmw86069/farrisdata" if no data is supplied to this function. This behavior is intended to make it easy to use farrisdata to recreate the Sashimi plots in the Farris et al. publication.

gtf, txdb, tx2geneDF, exonsByTx, cdsByTx

arguments passed to sashimiDataConstants().

detectedTx, detectedGenes, flatExonsByGene, flatExonsByTx

arguments passed to sashimiDataConstants().

verbose

logical indicating whether to print verbose output.

Value

environment that contains the data required for the splicejam R-shiny app. It also includes data returned by sashimiDataConstants(). Note that if envir is supplied, the data will be updated inside that environment.

Details

This function defines several constant values used by the R-shiny Splicejam Sashimi viewer. The required coverage and junction data is prepared and defined by sashimiDataConstants(). The remaining items used in the R-shiny app are defined for inline documentation in the R-shiny app, including aboutExtra which is included in the "About" tab, intended to describe the source of data included in the R-shiny app. Data is returned in an environment which by default is the global environment globalenv(). However it is recommended to use a custom environment, for example: shiny_envir <- new.env().
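The recommendation above can be sketched as follows. This is a minimal, hedged example that uses only arguments listed in the usage section. It assumes the "splicejam" and "farrisdata" packages are installed, and it is guarded so the call only runs in an interactive session, because preparing the default data may take time and may download files.

```r
# Custom environment, as recommended above, instead of globalenv()
shiny_envir <- new.env()

# Guarded sketch: only runs interactively and when both packages are available
if (interactive() &&
    requireNamespace("splicejam", quietly = TRUE) &&
    requireNamespace("farrisdata", quietly = TRUE)) {
  # No data is supplied, so with empty_uses_farrisdata=TRUE the documented
  # fallback is the farrisdata package.
  shiny_envir <- splicejam::sashimiAppConstants(
    envir = shiny_envir,
    assign_global = FALSE,
    empty_uses_farrisdata = TRUE,
    verbose = TRUE
  )
  # List the objects that were defined inside the custom environment
  print(ls(shiny_envir))
}
```

Because envir takes priority over assign_global, supplying envir alone should be sufficient; assign_global=FALSE is included here only to make the intent explicit.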

When the R-shiny app is defined by launchSashimiApp(), it calls shiny::shinyApp() using arguments server, ui, and options. This function sashimiAppConstants() prepares the environment which is assigned to the ui and server objects. The process therefore makes data inside this environment available to the ui and server functions.

The following values will be used from the environment, searching up the chain of parent environments until a match is found or the global environment has been searched. Similarly, this function also defines variables in the environment using the <<- operator, which by default also searches up the environment chain until it finds a match, and otherwise populates the global environment.
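The lookup and assignment rules described above are standard R behavior and can be illustrated with base R alone. This is a minimal sketch; the names demo_value, counter_envir, n_calls, and bump are hypothetical and are not part of splicejam.

```r
# A variable defined in the calling environment
demo_value <- "defined in the calling environment"

# Isolated environment: parent is emptyenv(), so lookup never reaches
# the calling or global environment.
isolated_envir <- new.env(parent = emptyenv())
found_isolated <- exists("demo_value", envir = isolated_envir)

# Default new.env(): parent is the calling environment, so lookup
# walks up the parent chain and finds demo_value.
chained_envir <- new.env()
found_chained <- exists("demo_value", envir = chained_envir)

# Assigning an environment to a function controls where <<- writes.
counter_envir <- new.env()
assign("n_calls", 0, envir = counter_envir)
bump <- function() {
  # <<- searches the enclosing environments for n_calls and updates
  # the first match, here the copy inside counter_envir.
  n_calls <<- n_calls + 1
}
environment(bump) <- counter_envir
bump()
n_calls_after <- get("n_calls", envir = counter_envir)
```

The last part mirrors, in simplified form, how data in a prepared environment becomes visible to functions whose environment has been set to it.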

If a variable is not found, the corresponding data will be derived from relevant source data. If no data is provided, the default argument empty_uses_farrisdata=TRUE means the data and filesDF will use data from the publication Farris et al., from the Github package "jmw86069/farrisdata".

The filesDF object should be a data.frame with at least three colnames:

  • "sample_id"

  • "type" (with values either "bw" or "junction")

  • "url" (a URL or file path to each file.)

It can optionally include colname "scale_factor" with numeric values used to multiply the coverage or junction values; the default is scale_factor=1.
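As an illustration of the layout described above, a filesDF could be built as follows. This is a minimal sketch: the sample names and file paths are hypothetical placeholders, and the checks at the end are only a convenience, not something the package is known to require.

```r
# Hypothetical samples and file paths, for illustration only
filesDF <- data.frame(
  sample_id = c("sampleA", "sampleA", "sampleB", "sampleB"),
  type = c("bw", "junction", "bw", "junction"),
  url = c(
    "path/to/sampleA.bw",
    "path/to/sampleA.junctions.bed",
    "path/to/sampleB.bw",
    "path/to/sampleB.junctions.bed"
  ),
  # Optional column: multiplies coverage or junction values
  scale_factor = c(1, 1, 0.8, 0.8),
  stringsAsFactors = FALSE
)

# Simple sanity checks against the documented requirements
required_colnames <- c("sample_id", "url", "type")
missing_colnames <- setdiff(required_colnames, colnames(filesDF))
valid_types <- all(filesDF$type %in% c("bw", "junction"))
```

Each sample here has one coverage file and one junction file, which is a common arrangement but an assumption of this example.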

Other data derived by this function or by sashimiDataConstants():

  • color_sub: character vector of R colors, whose names are used to match filesDF$sample_id. When not supplied, colors are defined by colorjam::group2colors() and unique(filesDF$sample_id).

  • txdb: TxDb object used to derive exonsByTx and cdsByTx if either object does not already exist. If txdb is not supplied, it is derived from gtf using GenomicFeatures::makeTxDbFromGFF().

  • tx2geneDF: data.frame with colnames: "transcript_id" and "gene_name".

  • gtf: character path to a GTF/GFF/GFF3 file, suitable for GenomicFeatures::makeTxDbFromGFF(). The gtf is only used if tx2geneDF or exonsByTx are not supplied. Note that when gtf points to a remote server, the file is copied to the current working directory for more rapid use. If the file already exists in the local directory, it is re-used.

  • exonsByTx: GRangesList object, named by "transcript_id", containing all exons for each transcript. It is derived from txdb if not supplied; and names should match tx2geneDF$transcript_id.

  • cdsByTx: GRangesList object, named by "transcript_id", containing only CDS (protein-coding) exons for each transcript. It is derived from txdb if not supplied; and names should match tx2geneDF$transcript_id.

  • detectedTx: character vector of tx2geneDF$transcript_id values, representing a subset of transcripts detected above background. See defineDetectedTx() for one strategy to define detected transcripts. If detectedTx does not exist, it is defined by all transcripts present in tx2geneDF$transcript_id. Note this step can be the rate-limiting step in the preparation of flatExonsByTx.

  • detectedGenes: character vector of values that match tx2geneDF$gene_name. If it is not supplied, it is inferred from detectedTx and tx2geneDF$transcript_id.

  • flatExonsByGene: GRangesList object containing non-overlapping exons for each gene, whose names match tx2geneDF$gene_name. If not supplied, it is derived using flattenExonsBy() and objects exonsByTx, cdsByTx, detectedTx, and tx2geneDF. This step is the key step for using a subset of detected transcripts, in order to produce a clean gene-exon model.

  • flatExonsByTx: GRangesList object containing non-overlapping exons for each transcript. If not supplied, it is derived using flattenExonsBy() and objects exonsByTx, cdsByTx, detectedTx, and tx2geneDF. This step is the key step for using a subset of detected transcripts, in order to produce a clean transcript-exon model.
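Two of the fallbacks listed above, filling in missing color_sub entries and inferring detectedGenes from detectedTx, can be sketched in base R. This is an illustration of the described logic, not the package's implementation: the identifiers and colors are hypothetical, and grDevices::rainbow() is used only as a stand-in for colorjam::group2colors().

```r
# Hypothetical transcript-to-gene table with the documented colnames
tx2geneDF <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3", "tx4"),
  gene_name = c("GeneA", "GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# Subset of transcripts considered detected above background
detectedTx <- c("tx1", "tx3")

# Infer detected genes from detected transcripts
detectedGenes <- unique(
  tx2geneDF$gene_name[tx2geneDF$transcript_id %in% detectedTx]
)

# Partial color_sub: only one of three samples has a color assigned
sample_ids <- c("sampleA", "sampleB", "sampleC")
color_sub <- c(sampleA = "firebrick")

# Fill the remaining names; rainbow() is a stand-in for
# colorjam::group2colors(), which this sketch does not call.
missing_ids <- setdiff(sample_ids, names(color_sub))
fill_colors <- stats::setNames(
  grDevices::rainbow(length(missing_ids)),
  missing_ids
)
color_sub <- c(color_sub, fill_colors)[sample_ids]
```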

When use_memoise=TRUE, several R objects are cached using memoise::memoise(), so that prepared R objects can be re-used and the R-shiny app can load the same data more quickly:

  • flatExonsByGene

  • flatExonsByTx

  • exonsByTx

  • cdsByTx
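The caching idea can be sketched with memoise directly. This is a minimal, hedged example and does not reproduce how the package configures its caches: slow_flatten() is a hypothetical stand-in for an expensive preparation step, the cache folder name only imitates the "_memoise" naming convention, and the code assumes memoise version 2.0.0 or later together with the cachem package.

```r
if (requireNamespace("memoise", quietly = TRUE) &&
    requireNamespace("cachem", quietly = TRUE) &&
    utils::packageVersion("memoise") >= "2.0.0") {
  # Fresh cache folder whose name ends with "_memoise"
  cache_dir <- file.path(tempfile("cache_"), "flatExons_memoise")
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

  # Counter to observe how often the underlying function really runs
  n_computed <- 0

  # Hypothetical stand-in for an expensive step
  slow_flatten <- function(gene) {
    n_computed <<- n_computed + 1
    paste0(gene, "_flat")
  }

  # Memoised version backed by an on-disk cache
  flatten_m <- memoise::memoise(
    slow_flatten,
    cache = cachem::cache_disk(dir = cache_dir)
  )

  # The second call with the same argument is served from the cache
  first_result <- flatten_m("GeneA")
  second_result <- flatten_m("GeneA")
}
```

An on-disk cache persists between R sessions, which is what makes repeated rendering of the same gene faster after the first request.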

To include a description of data used in your R-shiny app, define the variable aboutExtra either using character text, or as html tags created with htmltools::tags, suitable to be displayed in the R-shiny UI. The content is displayed in the tab "About Sashimi Plots" at the top of the app.
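Both forms of aboutExtra can be sketched as follows. The wording of the description is hypothetical, and the html version assumes the "htmltools" package is installed.

```r
# Option 1: plain character text
aboutExtra_text <- paste(
  "Coverage and junction data shown in this app are example data.",
  "Replace this text with a description of your own samples."
)

# Option 2: html tags, when htmltools is available
if (requireNamespace("htmltools", quietly = TRUE)) {
  aboutExtra <- htmltools::tags$div(
    htmltools::tags$h4("About this dataset"),
    htmltools::tags$p(aboutExtra_text)
  )
} else {
  # Fall back to the plain text form
  aboutExtra <- aboutExtra_text
}
```

Either object could then be supplied through the aboutExtra argument of sashimiAppConstants().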

See also

Other splicejam R-shiny functions: launchSashimiApp(), sashimiAppServer(), sashimiAppUI(), sashimiDataConstants()