vignettes/shiny-server-for-sashimi-plots.Rmd
This vignette describes how to create a Shiny server that displays interactive Sashimi plots from RNA-seq data.
Splicejam uses memoise to cache data, so it is recommended to start the Splicejam Shiny app in its own directory. Each type of data is cached in its own sub-directory with “_memoise” in the name, so you can recognize and delete these directories to clear the cache as needed.
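Clearing the cache as described above can be sketched in base R. This is a minimal sketch, not part of Splicejam: the directory names are hypothetical, and it runs inside a temporary directory so that no real data is touched.

```r
# Work in a temporary directory so this sketch does not touch real data.
app_dir <- file.path(tempdir(), "splicejam_app_demo")
dir.create(app_dir, showWarnings=FALSE)

# Hypothetical directory names; the names Splicejam creates may differ,
# the only assumption is that cache directories contain "_memoise".
dir.create(file.path(app_dir, "coverage_memoise"), showWarnings=FALSE)
dir.create(file.path(app_dir, "other_data"), showWarnings=FALSE)

# Find sub-directories with "_memoise" in the name
all_dirs <- list.dirs(app_dir, full.names=TRUE, recursive=FALSE)
cache_dirs <- all_dirs[grepl("_memoise", basename(all_dirs), fixed=TRUE)]

# Delete only the cache directories, leaving everything else in place
unlink(cache_dirs, recursive=TRUE)
remaining <- basename(list.dirs(app_dir, full.names=TRUE, recursive=FALSE))
```

In a real app directory, replace `app_dir` with the directory where the Splicejam Shiny app was started, and review `cache_dirs` before calling `unlink()`.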
There are two basic requirements for a Sashimi plot:
Gene-exon structure, usually provided by a GTF file.
Source data, with sequence coverage and junction read counts, for example bigWig coverage files and STAR "SJ.out.tab" junction files.
Briefly:
library(splicejam)
# define files with coverage and junctions
filesDF <- data.frame(sample_id=c("sample_A", "sample_A"),
   url=c("https://server/sample_A.bw", "https://server/sample_A/SJ.out.tab"),
   type=c("bw", "junction"));
# provide path to genes GTF
gtf <- "path/to/genes.gtf"
# launch Splicejam Shiny app
launchSashimiApp()
Ideally, the GTF used by splicejam will be the same GTF file used in upstream processing, for example with Salmon, Kallisto, or featureCounts. The benefit is that the GTF will display gene-exon structure consistent with your overall analysis work.
That said, any GTF file for your genome will work fine. The GTF is used to determine gene-exon structure, and to display transcript isoforms per gene. Splicejam then displays RNA-seq coverage and junction reads over these exons.
Splicejam will derive several objects from this GTF file:
A data.frame with transcript-to-gene association. Specifically it looks for "gene_name" and "transcript_id".
Flattened gene-exon structure per gene. When detectedTx is provided, it will include only exons from the detected transcripts.
Most of the workflow was designed for the Gencode GTF format, which includes "gene_name" with gene symbols, and "transcript_id" with transcript identifiers.
Other GTF files should work even if they have custom gene and transcript attributes. When in doubt, use a Gencode GTF file.
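One way to check whether a GTF file carries the "gene_name" and "transcript_id" attributes is to inspect its attribute column (column 9). The sketch below uses made-up attribute strings and a hypothetical helper, `get_gtf_attribute()`; it is not how Splicejam itself parses the file.

```r
# Two made-up GTF attribute strings (column 9 of a GTF file)
gtf_attributes <- c(
   'gene_id "G1"; transcript_id "T1"; gene_name "GeneA";',
   'gene_id "G2"; transcript_id "T2";')

# Hypothetical helper: extract the value of one attribute,
# returning NA when the attribute is absent from the string
get_gtf_attribute <- function(x, key) {
   pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
   ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

gene_names <- get_gtf_attribute(gtf_attributes, "gene_name")
transcript_ids <- get_gtf_attribute(gtf_attributes, "transcript_id")

# Entries with NA are records lacking the attribute
missing_gene_name <- is.na(gene_names)
```

If many records lack "gene_name", the GTF probably uses custom attribute names, which is the situation where a Gencode GTF is the safer choice.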
Some GTF files that have been tested and confirmed with Splicejam are listed below.
## Mouse mm10 as used by Farris et al
ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M12/gencode.vM12.annotation.gtf.gz
## Mouse mm10
http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/gencode.vM27.annotation.gtf.gz
## Human hg38
http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz
## Human hg19
http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/GRCh37_mapping/gencode.v38lift37.annotation.gtf.gz
Source data, with sequence coverage and junction read counts, is supplied as a data.frame with these columns:
"url": The web URL or file path to each file.
"sample_id": The name of each sample as it should appear in each panel.
"type": A character value with either "bw" for bigWig coverage, or "junction" for junction read counts.
"scale_factor": An optional numeric column, used to scale numeric values for each file. This column is used for dynamic normalization, for example size factors from DESeq2 can be used here. The default is scale_factor=1.
When there are multiple replicate files for a sample_id, the scores are added together and the total score is displayed in the sashimi plot for each sample_id. When scale_factor is also defined, it is applied to each file before values are summed across sample_id replicates. In this way, files can be individually normalized as needed.
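The order of operations, scaling each file first and then summing replicates, can be illustrated with a small base R sketch. The scores and scale factors below are made up, and the column names `score` and `scaled` are used only for this illustration.

```r
# Made-up scores at a single genomic position, one row per file.
# sample_A has two replicate files, sample_B has one.
files <- data.frame(
   sample_id=c("sample_A", "sample_A", "sample_B"),
   score=c(10, 20, 8),
   scale_factor=c(0.5, 1.5, 1))

# Step 1: apply scale_factor to each file individually
files$scaled <- files$score * files$scale_factor

# Step 2: sum the scaled values across replicates of each sample_id
totals <- tapply(files$scaled, files$sample_id, sum)
```

Here sample_A totals 10 * 0.5 + 20 * 1.5 = 35, which differs from scaling the summed raw scores by a single factor.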
An example of filesDF is provided in the R package "farrisdata".
if (jamba::check_pkg_installed("farrisdata")) {
   farrisdata::farris_sashimi_files_df[,1:4]
}
#>    sample_id
#> 1     CA1_CB
#> 2     CA1_DE
#> 3     CA2_CB
#> 4     CA2_DE
#> 5     CA3_CB
#> 6     CA3_DE
#> 7      DG_CB
#> 8      DG_DE
#> 21    CA1_CB
#> 17    CA1_CB
#> 41    CA1_DE
#> 31    CA1_DE
#> 61    CA2_CB
#> 51    CA2_CB
#> 81    CA2_DE
#> 71    CA2_DE
#> 10    CA3_CB
#> 9     CA3_CB
#> 12    CA3_DE
#> 11    CA3_DE
#> 14     DG_CB
#> 13     DG_CB
#> 16     DG_DE
#> 15     DG_DE
#>    url
#> 1  https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/CA1_CB.STAR_mm10.combinedJunctions.bed.gz
#> 2  https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/CA1_DE.STAR_mm10.combinedJunctions.bed.gz
#> 3  https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/CA2_CB.STAR_mm10.combinedJunctions.bed.gz
#> 4  https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/CA2_DE.STAR_mm10.combinedJunctions.bed.gz
#> 5  https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/CA3_CB.STAR_mm10.combinedJunctions.bed.gz
#> 6  https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/CA3_DE.STAR_mm10.combinedJunctions.bed.gz
#> 7  https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/DG_CB.STAR_mm10.combinedJunctions.bed.gz
#> 8  https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/DG_DE.STAR_mm10.combinedJunctions.bed.gz
#> 21 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA1_CB.union.neg.bw
#> 17 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA1_CB.union.pos.bw
#> 41 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA1_DE.union.neg.bw
#> 31 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA1_DE.union.pos.bw
#> 61 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA2_CB.union.neg.bw
#> 51 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA2_CB.union.pos.bw
#> 81 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA2_DE.union.neg.bw
#> 71 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA2_DE.union.pos.bw
#> 10 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA3_CB.union.neg.bw
#> 9  https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA3_CB.union.pos.bw
#> 12 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA3_DE.union.neg.bw
#> 11 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/CA3_DE.union.pos.bw
#> 14 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/DG_CB.union.neg.bw
#> 13 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/DG_CB.union.pos.bw
#> 16 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/DG_DE.union.neg.bw
#> 15 https://orio.niehs.nih.gov/ucscview/farrisHub/mm10/union_bigwig/DG_DE.union.pos.bw
#>        type scale_factor
#> 1  junction    0.8796491
#> 2  junction    1.1064974
#> 3  junction    0.8461586
#> 4  junction    1.6993700
#> 5  junction    0.8615462
#> 6  junction    1.1941632
#> 7  junction    0.7705711
#> 8  junction    1.2941457
#> 21       bw    0.9891766
#> 17       bw    0.9891766
#> 41       bw    1.2362753
#> 31       bw    1.2362753
#> 61       bw    0.9817312
#> 51       bw    0.9817312
#> 81       bw    1.0678433
#> 71       bw    1.0678433
#> 10       bw    1.0058324
#> 9        bw    1.0058324
#> 12       bw    1.4419289
#> 11       bw    1.4419289
#> 14       bw    0.8449864
#> 13       bw    0.8449864
#> 16       bw    0.8218802
#> 15       bw    0.8218802
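Before launching the app with your own filesDF, it can help to confirm that it has the columns described above. The following is a minimal base R sketch; the checks are an assumption about what is useful to verify, not a validation routine provided by Splicejam, and the URLs are placeholders.

```r
# Placeholder filesDF without the optional scale_factor column
filesDF <- data.frame(
   sample_id=c("sample_A", "sample_A"),
   url=c("https://server/sample_A.bw", "https://server/sample_A/SJ.out.tab"),
   type=c("bw", "junction"),
   stringsAsFactors=FALSE)

# Confirm the required columns are present
required_cols <- c("url", "sample_id", "type")
missing_cols <- setdiff(required_cols, colnames(filesDF))
if (length(missing_cols) > 0) {
   stop("filesDF is missing columns: ", paste(missing_cols, collapse=", "))
}

# Confirm type contains only "bw" or "junction"
bad_type <- !filesDF$type %in% c("bw", "junction")
if (any(bad_type)) {
   stop("Unrecognized type values: ",
      paste(unique(filesDF$type[bad_type]), collapse=", "))
}

# Make the documented default explicit when scale_factor is absent
if (!"scale_factor" %in% colnames(filesDF)) {
   filesDF$scale_factor <- 1
}
```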
RNA-seq coverage is provided as bigWig files, and can be accessed via HTTP web hyperlink, or a direct file path.
Splice junctions are provided in one of two formats:
BED format: for example the ".bed.gz" junction files listed in the farrisdata example.
"SJ.out.tab" format: a tab-delimited file produced by the STAR alignment tool. This file must have 9 columns, and column 7 is used to define read counts because it contains uniquely mapped reads. Junctions with zero uniquely mapped reads are removed.
As a positive control for the Splicejam Shiny server, you can use the Farris data that supports Farris et al (2019).
if (!jamba::check_pkg_installed("farrisdata")) {
   remotes::install_github("jmw86069/farrisdata")
}
library(splicejam)
launchSashimiApp()
Note this workflow will use filesDF from the “farrisdata” package, and will download the Gencode mouse GTF used for that publication. It takes about 3 minutes to prepare data and create the first Shiny plot for the gene Gria1.
This workflow will by default populate the R environment globalenv() with variables used in the farrisdata Splicejam Shiny app. See Advanced Options for ways to specify a new environment.
library(splicejam)
# filesDF should already be defined
filesDF2 <- subset(farrisdata::farris_sashimi_files_df,
   sample_id %in% c("CA1_DE", "CA2_CB"));
# gtf should be a file path or web URL
gtf <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M12/gencode.vM12.annotation.gtf.gz";
launchSashimiApp(filesDF=filesDF2,
   gtf=gtf)
The following options can be invoked by defining the variable name inside the environment used for the Splicejam Shiny app. By default, this environment is globalenv(), however the examples below show how to use a custom environment. The advantage of using an environment is that the data contained inside is not copied in memory during function calls, and can be shared by the Shiny UI and Shiny Server.
detectedTx: character vector of transcript_id values that were “detected” by your experiment. You can define these however you prefer, or you can leave this value blank and include every transcript in the GTF file. Supplying a subset of detectedTx is beneficial because it presents a simpler gene-exon structure per gene. It also speeds up the initial Splicejam Shiny server start up time.
detectedGenes: character vector of gene_name values that were “detected” by your experiment. The main effect of detectedGenes is that it limits the number of flat gene-exons prepared by Splicejam Shiny, and limits the number of genes that can be searched. However, the user can still change “Genes to Search” to “All genes” to search all genes defined by the GTF file. In that case, each new gene will have exons flattened during load time. This process usually takes only about 1 second longer than normal per gene, in return for a faster initial Splicejam Shiny server start up time.
A character vector of colors, with names that match your sample_id values. This vector will define colors used in the Sashimi plots.
For example, to define detectedTx you would simply assign a value in the globalenv():
detectedTx <- rownames(tx2geneDF);
Or if using a custom environment:
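A minimal sketch using base R, assuming the custom environment is named `splicejam_env` as in the custom environment example in this vignette. The transcript identifiers are made up for illustration; in practice they would come from your own data.

```r
# Create the custom environment (skip this if it already exists)
splicejam_env <- new.env()

# Made-up transcript identifiers, for illustration only
my_detected_tx <- c("tx_1", "tx_2")

# Assign the variable inside the custom environment instead of globalenv()
assign("detectedTx", my_detected_tx, envir=splicejam_env)

# Equivalent shorthand:
# splicejam_env$detectedTx <- my_detected_tx

# Confirm which variables the environment now holds
env_contents <- ls(splicejam_env)
```

The same environment is then passed to the app with the `envir` argument.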
One of the most common ways to set up a Shiny server is to run it on a custom port, and listen on a specific address.
Note in the example below, host="0.0.0.0" will instruct the Shiny app to respond to requests directed at any host or IP address. If you used host="127.0.0.1" the server would accept connections only on the loopback address, for example http://127.0.0.1:8080, and would not be reachable from other machines.
launchSashimiApp(options=list(
   port=8080,
   host="0.0.0.0"))
By default the data preparation uses the global environment defined by globalenv(). This process will create objects in your user session, and will update those objects during the preparation step.
However, you can create a custom environment to keep the data encapsulated, and separate from your user session.
It is intended to be straightforward to use a custom environment. First define a new environment.
library(splicejam)
splicejam_env <- new.env();
# filesDF should already be defined
filesDF2 <- subset(farrisdata::farris_sashimi_files_df,
   sample_id %in% c("CA1_DE", "CA2_CB"));
# gtf should be a file path or web URL
gtf <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M12/gencode.vM12.annotation.gtf.gz";
launchSashimiApp(
   empty_uses_farrisdata=FALSE,
   envir=splicejam_env,
   filesDF=filesDF2,
   gtf=gtf)
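The reason an environment keeps data encapsulated without extra copies is that R passes environments by reference. The base R sketch below illustrates that general behavior with hypothetical function names; it does not call Splicejam.

```r
# Hypothetical function that stores an object in the environment it receives
add_to_env <- function(env) {
   env$prepared <- TRUE
   invisible(NULL)
}

# Hypothetical function that tries the same with a list
add_to_list <- function(x) {
   x$prepared <- TRUE
   invisible(NULL)
}

my_env <- new.env()
add_to_env(my_env)
# The caller's environment was modified in place
env_has_it <- exists("prepared", envir=my_env, inherits=FALSE)

my_list <- list()
add_to_list(my_list)
# The caller's list is unchanged, because the function modified a copy
list_has_it <- "prepared" %in% names(my_list)
```

Here `env_has_it` is TRUE and `list_has_it` is FALSE, which is why objects prepared inside a custom environment remain available after the function returns.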