Prepare sashimi plot required data, deriving data objects as needed
sashimiDataConstants( gtf = NULL, txdb = NULL, tx2geneDF = NULL, exonsByTx = NULL, cdsByTx = NULL, detectedTx = NULL, detectedGenes = NULL, flatExonsByGene = NULL, flatExonsByTx = NULL, envir = NULL, empty_uses_farrisdata = TRUE, use_memoise = TRUE, verbose = FALSE, ... )
Argument | Description |
---|---|
gtf, txdb, tx2geneDF, exonsByTx, cdsByTx | objects used to define the overall set of genes, transcripts, and associated exons and CDS exons. See the function description for more detail. |
detectedTx, detectedGenes, flatExonsByGene, flatExonsByTx | objects used to derive a specific subset of gene-exon models using only detected transcripts or genes. See the function description for more detail. |
envir | environment in which the data objects are stored; it is updated in place during processing. |
empty_uses_farrisdata | |
use_memoise | logical indicating whether to cache prepared R objects using memoise::memoise(). |
verbose | logical indicating whether to print verbose output. |
... | additional arguments are ignored. |
default_gene | |
An environment that contains the required data objects for splicejam sashimi plots. Note that the environment itself is updated during processing, so it does not need to be returned for the data contained inside it to be updated by this function.
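
The in-place update described above follows from R's reference semantics for environments. The following is a minimal base R sketch of that behavior, not the splicejam implementation: `derive_if_missing()` is a hypothetical helper, and the assumption that objects are derived only when absent from the environment is taken from the phrase "deriving data objects as needed".

```r
# Hypothetical helper (not part of splicejam): store a derived value in
# `envir` only when no object of that name already exists there.
derive_if_missing <- function(envir, name, derive_fn) {
  if (!exists(name, envir = envir, inherits = FALSE)) {
    assign(name, derive_fn(), envir = envir)
  }
  invisible(NULL)  # nothing useful is returned; the environment is modified in place
}

# Start with an environment that already holds tx2geneDF.
data_env <- new.env()
assign(
  "tx2geneDF",
  data.frame(transcript_id = "tx1", gene_name = "GeneA", stringsAsFactors = FALSE),
  envir = data_env
)

# detectedTx is absent, so it is derived and stored.
derive_if_missing(data_env, "detectedTx", function() data_env$tx2geneDF$transcript_id)

# tx2geneDF is present, so the derive function is never called.
derive_if_missing(data_env, "tx2geneDF", function() stop("should not be called"))

# The caller sees both objects without capturing any return value.
ls(data_env)
```

Because the caller and the function share the same environment object, there is no need to reassign the result, which matches the note that the environment does not need to be returned.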
This function performs a subset of the steps performed by sashimiAppConstants(), focusing only on the data required for gene-exon structure. sashimiAppConstants() defines color_sub and validates filesDF, then calls sashimiDataConstants() to prepare and validate the gene-exon data.
Data derived by sashimiDataConstants():
txdb: TxDb (formerly TranscriptDb) object used to derive exonsByTx and cdsByTx if either object does not already exist. If txdb is not supplied, it is derived from gtf using GenomicFeatures::makeTxDbFromGFF().
tx2geneDF: data.frame with colnames "transcript_id" and "gene_name".
gtf: character path to a GTF/GFF/GFF3 file, suitable for GenomicFeatures::makeTxDbFromGFF(). The gtf is only used if tx2geneDF or exonsByTx is not supplied. Note that when gtf points to a remote server, the file is copied to the current working directory for more rapid use. If the file already exists in the local directory, it is re-used.
exonsByTx: GRangesList object, named by "transcript_id", containing all exons for each transcript. It is derived from txdb if not supplied, and its names should match tx2geneDF$transcript_id.
cdsByTx: GRangesList object, named by "transcript_id", containing only CDS (protein-coding) exons for each transcript. It is derived from txdb if not supplied, and its names should match tx2geneDF$transcript_id.
detectedTx: character vector of tx2geneDF$transcript_id values, representing a subset of transcripts detected above background. See defineDetectedTx() for one strategy to define detected transcripts. If detectedTx does not exist, it is defined by all transcripts present in tx2geneDF$transcript_id. Note that this step can be the rate-limiting step in the preparation of flatExonsByTx.
detectedGenes: character vector of values that match tx2geneDF$gene_name. If it is not supplied, it is inferred from detectedTx and tx2geneDF$transcript_id.
flatExonsByGene: GRangesList object containing non-overlapping exons for each gene, whose names match tx2geneDF$gene_name. If not supplied, it is derived using flattenExonsBy() and the objects exonsByTx, cdsByTx, detectedTx, and tx2geneDF. This is the key step for using a subset of detected transcripts in order to produce a clean gene-exon model.
flatExonsByTx: GRangesList object containing non-overlapping exons for each transcript. If not supplied, it is derived using flattenExonsBy() and the objects exonsByTx, cdsByTx, detectedTx, and tx2geneDF. This is the key step for using a subset of detected transcripts in order to produce a clean transcript-exon model.
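
The relationships among tx2geneDF, detectedTx, and detectedGenes described above can be sketched in base R. This is an illustration only: the transcript and gene identifiers are made up, and the lookup shown is one plausible way to infer genes from detected transcripts, not necessarily how splicejam does it.

```r
# Toy tx2geneDF with the two required columns (identifiers are hypothetical).
tx2geneDF <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3", "tx4"),
  gene_name = c("GeneA", "GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# When detectedTx is not supplied, it defaults to every known transcript.
detectedTx <- NULL
if (is.null(detectedTx)) {
  detectedTx <- tx2geneDF$transcript_id
}

# A supplied subset of transcripts detected above background.
detectedTx_subset <- c("tx1", "tx3")

# Infer detectedGenes by looking up the gene of each detected transcript.
is_detected <- tx2geneDF$transcript_id %in% detectedTx_subset
detectedGenes <- unique(tx2geneDF$gene_name[is_detected])

# Sanity check that every detected transcript is known to tx2geneDF.
all_known <- all(detectedTx_subset %in% tx2geneDF$transcript_id)
```

With the subset above, only the genes that own at least one detected transcript remain, which is what keeps the flattened gene-exon model free of exons from undetected transcripts.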
When use_memoise=TRUE, several R objects are cached using memoise::memoise(), to help re-use prepared R objects and to speed the re-use of data within the R-shiny app.
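
As a rough illustration of what this caching buys, the sketch below wraps a stand-in function with memoise::memoise() and counts how often the underlying function actually runs. `slow_prepare()` and the key "flatExonsByGene" are hypothetical placeholders; which objects splicejam caches, and where the cache is stored, are not shown here.

```r
# Counter kept in an environment so it can be updated in place.
counter <- new.env()
counter$n <- 0L

# Hypothetical stand-in for an expensive data preparation step.
slow_prepare <- function(key) {
  counter$n <- counter$n + 1L
  paste0("prepared:", key)
}

first <- NULL
second <- NULL
if (requireNamespace("memoise", quietly = TRUE)) {
  # memoise() returns a wrapper that stores results keyed by the arguments.
  cached_prepare <- memoise::memoise(slow_prepare)

  first <- cached_prepare("flatExonsByGene")   # computed
  second <- cached_prepare("flatExonsByGene")  # served from the cache

  message("calls to slow_prepare(): ", counter$n)
}
```

The second call with identical arguments returns the stored result without re-running the function, which is the behavior that speeds up repeated use within the R-shiny app.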
Other splicejam R-shiny functions: launchSashimiApp(), sashimiAppConstants(), sashimiAppServer(), sashimiAppUI()