This package is under active development, these functions make up a core set of methods currently used across active Omics analysis projects. A summary of goals and relevant features are described below.
Goals of jamses
The core goal is to make data analysis and visualization of SummarizedExperiment objects straightforward for common scenarios. It also accepts SingleCellExperiment and Seurat objects.
Define a design matrix using
~ 0 + groupsyntax, see below.Define statistical contrasts to compare one or more factors, see
groups_to_sedesign().Plot contrasts as defined, see
plot_sedesign().Confirm design,contrast matrices are always in sync, see
SEDesign-class.Automate analysis of multiple contrasts, using limma/limmavoom/DEqMS, see
se_contrast_stats().-
Convenient “short hand” for contrasts, see
contrasts2comp(). -
Integrate with other tools.
-
venndir::venndir()- see Github"jmw86069/venndir"to create directional Venn diagrams.
-
Approach for Statistical Contrasts
The Limma User’s Guide (LUG) is an amazing resource which describes alternate approaches for one-way and two-way contrasts, which are mathematically equivalent. For a more thorough discussion please review these approaches to confirm that ~0 + x is mathematically identical to ~ x, and only differs in how estimates are reported.
The approach used in
jamsesuses the~0 + xstrategy.
Each experiment group is defined using independent replicates.
This approach does not imply that there is “no intercept” during the model fit, see LUG for details.-
One-way contrasts must compare only one factor change per contrast.
For example, a valid one-way contrast is shown below:It compares
(treated-control)with factor levelAunchanged.
However, an invalid one-way contrast is shown below:It is invalid because allows two factor changes:
treated-controlandA-B. -
A two-way contrast is defined as the fold change of two compatible one-way fold changes, and in the proper order.
Consider the following: Caveat: Specific contrasts can be added manually when necessary.
Design and contrasts - SEDesign
-
groups_to_sedesign()takes by default adata.framewhere each column represents an experiment factor, and creates the following:-
designmatrix -
contrastmatrix using only appropriate changes in single experiment factor levels for one-way contrasts, and equivalent contrasts across secondary factor levels where appropriate for two-way contrasts, up tomax_depthfactor depth. -
samplesvector representing individual sample columns used from the input data.
-
-
Output is
SEDesignas an S4 object with slot names:-
sedesign@design- the design matrix. -
sedesign@contrasts- the contrasts matrix. -
sedesign@samples- vector of samples.
-
SEDesign S4 Object Details
SEDesign is an S4 object that contains the following slots:
-
samples: acharactervector of sample names, derived fromcolnames()of the SummarizedExperiment` object.- When
samplesare subset, it has a cascade effect ondesignandcontrasts. When groups are no longer represented, they are removed fromdesignandcontrastsas relevant.
- When
-
design: amatrixrepresenting the design matrix, with additional constraints:-
rownames()are always maintained, so they can be aligned withcolnames()of theSummarizedExperimentobject. This requirement helps ensure that the design and assay data are always in proper sync. -
colnames()are defined as sample groups, equivalent to usingmodel.matrix( ~ 0 + groups )for example from the The Limma User’s Guide from thelimmaBioconductor package. - All values are always
1or0with no fractions. More complicated designs require manual adjustment, beyond the scope ofjamses.
-
-
contrasts: amatrixrepresenting the contrast matrix used for statistical comparisons, with additional constraints:-
rownames(contrasts)are always defined, and always equalcolnames(contrasts)to ensure contrasts and design are always properly synchronized.
-
Example SEDesign object
The example below uses a character vector of group names per sample, with two factors separated by underscore "_". The same data can be provided as a data.frame with two columns.
library(jamses)
library(kableExtra)
igroups <- jamba::nameVector(paste(rep(c("WT", "KO"), each=6),
rep(c("Control", "Treated"), each=3),
sep="_"),
suffix="_rep");
igroups <- factor(igroups, levels=unique(igroups));
# jamba::kable_coloring(color_cells=FALSE,
# format="markdown",
# caption="Sample to group association",
# data.frame(groups=igroups))
knitr::kable(data.frame(groups=igroups))| groups | |
|---|---|
| WT_Control_rep1 | WT_Control |
| WT_Control_rep2 | WT_Control |
| WT_Control_rep3 | WT_Control |
| WT_Treated_rep1 | WT_Treated |
| WT_Treated_rep2 | WT_Treated |
| WT_Treated_rep3 | WT_Treated |
| KO_Control_rep1 | KO_Control |
| KO_Control_rep2 | KO_Control |
| KO_Control_rep3 | KO_Control |
| KO_Treated_rep1 | KO_Treated |
| KO_Treated_rep2 | KO_Treated |
| KO_Treated_rep3 | KO_Treated |
The resulting design and contrasts matrices are shown below:
sedesign <- groups_to_sedesign(igroups);
jamba::kable_coloring(
# knitr::kable(
colorSub=c(`-1`="dodgerblue", `1`="firebrick"),
caption="Design matrix output from design(sedesign).",
data.frame(check.names=FALSE, design(sedesign)));| WT_Control | WT_Treated | KO_Control | KO_Treated | |
|---|---|---|---|---|
| WT_Control_rep1 | 1 | 0 | 0 | 0 |
| WT_Control_rep2 | 1 | 0 | 0 | 0 |
| WT_Control_rep3 | 1 | 0 | 0 | 0 |
| WT_Treated_rep1 | 0 | 1 | 0 | 0 |
| WT_Treated_rep2 | 0 | 1 | 0 | 0 |
| WT_Treated_rep3 | 0 | 1 | 0 | 0 |
| KO_Control_rep1 | 0 | 0 | 1 | 0 |
| KO_Control_rep2 | 0 | 0 | 1 | 0 |
| KO_Control_rep3 | 0 | 0 | 1 | 0 |
| KO_Treated_rep1 | 0 | 0 | 0 | 1 |
| KO_Treated_rep2 | 0 | 0 | 0 | 1 |
| KO_Treated_rep3 | 0 | 0 | 0 | 1 |
# knitr::kable(
jamba::kable_coloring(
colorSub=c(`-1`="dodgerblue", `1`="firebrick"),
caption="Contrast matrix output from contrasts(sedesign).",
data.frame(check.names=FALSE, contrasts(sedesign)));| KO_Control-WT_Control | KO_Treated-WT_Treated | WT_Treated-WT_Control | KO_Treated-KO_Control | (KO_Treated-WT_Treated)-(KO_Control-WT_Control) | |
|---|---|---|---|---|---|
| WT_Control | -1 | 0 | -1 | 0 | 1 |
| WT_Treated | 0 | -1 | 1 | 0 | -1 |
| KO_Control | 1 | 0 | 0 | -1 | -1 |
| KO_Treated | 0 | 1 | 0 | 1 | 1 |
For convenience, SEDesign can be visualized using plot_sedesign():
# plot the design and contrasts
plot_sedesign(sedesign);
title(main="plot_sedesign(sedesign)\noutput:")
- Two-way contrasts are indicated by the “squiggly curved line” which connects the end of one contrast to the beginning of the next contrast. This connection describes the first contrast, subtracted by the second contrast.
Data normalization
SummarizedExperiment objects are normalized using:
-
se_normalize()- lightweight wrapper to several normalization functions:-
method="jammanorm": callsjamma::jammanorm()which normalizes the median log fold change to zero. This method is also used inDESeq2::estimateSizeFactors()for example. When usingjamma::jammaplot()for MA-plots, thejammanorm()method is conceptually equivalent to shifting the y-axis values to zero, so the median log fold change is zero. (Assumptions apply.) -
method="quantile: callslimma::normalizeQuantiles()for quantile normalization. -
method="limma_batch_adjust"ormethod="lba": callslimma::removeBatchEffect()to adjust of batch effects. This normalization is only recommended for visualization and clustering, not recommended prior to statistical comparisons. Instead, the recommendation is to define a blocking factor passed tose_contrast_stats()with argumentblock.
-
matrix_normalize()- is the core function forse_normalize()and operates on individual numeric data matrices.
Normalization can be reviewed with MA-plots, recommended by using
jamma::jammaplot(). See Github repository"jmw86069/jamma".
Statistical comparisons
se_contrast_stats() is the central function
Applies
SEDesign(design and contrasts) toSummarizedExperimentdata.-
Calls the appropriate
limmafunctions one or moreassaysin theSummarizedExperimentobject.-
use_voom=TRUEwill enablelimmavoommethodology for count data. -
posthoc_test="DEqMS"will enablelimma-DEqMSmethodology for proteomics mass spec data, which defines an error model based upon PSM counts per row. -
blockwill define blocking factor forlimma. When also applyinglimmavoom, it performs the appropriate two-step process as described by Dr. Smyth.
-
-
Applies statistical thresholds to mark statistical hits:
-
adjp_cutoff: adjusted P-value threshold -
fold_cutoff: normal space fold change threshold -
mgm_cutoff: “mgm” is an abbreviation for “max group mean”. This threshold requires at least one experiment group involved in the contrast to have group mean at or above this threshold in order to mark a statistical hit. It does not subset data prior to testing.
-
-
Additional threshold options:
-
p_cutoff: unadjusted P-value threshold -
ave_cutoff: average expression threshold, using the equivalent oflimmacolumnAveExprwith the mean group expression. Note thisAveExprvalue is calculated mean per row across all groups, and may be subject to skewing. -
int_adjp_cutoff,int_fold_cutoff,int_mgm_cutoff: optional thresholds only used for interaction contrasts, intended for data mining purposes. For example, data mining may filter for two-way contrast results with no fold change threshold, or slightly relaxedint_adjp_cutoff=0.1.
-
-
Options:
-
use_voom=TRUE: enable limmavoom workflow for count data -
posthoc_test="DEqMS": enable DEqMS post-hoc adjustment to the limma empirical Bayes model fit. -
floor_min,floor_value: logic to handle numeric values below a defined noise threshold, values below this threshold becomefloor_value. This filtering can also convert values from0zero toNA, where appropriate. For data with substantial missing measurements, this option may be beneficial. -
handle_na,na_value: logic to handleNAvalues, particularly when an entire group isNA. For proteomics, or platforms with a defined “noise floor”, it may be useful to usehandle_na="full1"andna_value=noise_floor. This specific scenario assignsnoise_floorto one value in any completelyNAgroup tonoise_floorso that its variability is not used, however the fold change can still be calculated relative to the platform minimum signal -
block: optional blocking factor, which enables thelimma::duplicateCorrelation()calculation, which is applied to the model fit per the The Limma User’s Guide.
-
-
Returns a
listreferred to assestats:-
"hit_array":arrayof named numeric vectors indicating direction of statistical hits after applying the relevant threshold cutoffs:-
1up-regulated -
-1down-regulated
-
"stats_dfs":listofdata.framefor each contrast, where column headers also include the statistical contrast to help confirm the identity of each analysis result."stats_df": one super-widedata.framewith results across all contrasts, assay data matrices, and hit thresholds. This matrix is intended to help filter for hits across the various different contrasts and thresholds.
-
-
Save to excel with
save_sestats():- This function helps automate saving every statistical contrast in table format, into individual Excel worksheets. It may include additional
rowData()annotations. - Each worksheet is styled consistently, making it easy to flip through each worksheet.
- This function helps automate saving every statistical contrast in table format, into individual Excel worksheets. It may include additional
Heatmaps
heatmap_se() is a wrapper for the amazing ComplexHeatmap::Heatmap(), intended to automate frequently-used options that represent repetitive work.
Common rules:
- All data is centered by default. It can be customized.
- All data is assumed to be
log2or otherwise “appropriately transformed”. (The typical transform islog2(1 + x).) - The heatmap can also be displayed as sample correlations (
correlation=TRUE) which re-uses the same data centering options (critical aspect of these plots). - The heatmap can be subset using
rowsorsestatsto show stat hit annotations on the left side. - Column and row annotations can be added using
colData(se)androwData(se) - Columns and rows can be split using
colData(se)androwData(se)colnames. - The heatmap title includes a summary of data.
Common arguments:
se: theSummarizedExperimentdata for the heatmapassay_name: define a specific data matrix to displayrows: choose a specific subset of rows (optional)-
sestats: optional stat hits in one of these formats:- output from
se_contrast_stats() - incidence
matrixwith valuesc(-1, 0, 1) -
listof numeric values usingc(-1, 0, 1), with names representingrownames(se) -
listofrownames(se)
- output from
top_colnames: column annotations incolData(se)to display across the top.rowData_colnames: optional row annotations to display on the left.sample_color_list:listnamed by colnames fromcolData(se)orrowData(se)to define colors.-
centerby_colnames: optional sub-groups for data centering- Useful to center multiple tissue types, or sample classes, batches, etc.
- Set
centerby_colnames=FALSEto turn off data centering altogether.
controlSamples: custom subset of control samplesfor data centering.row_split: split rows using colnames inrowData(se), or by number of dendrogram sub-tree clusters.column_split: split columns using colnames incolData(se), or by number of dendrogram sub-tree clusters.mark_rows: optional subset of rows for callout labels.