Prepare MultiEnrichMap data from enrichList
Usage
multiEnrichMap(
enrichList,
geneHitList = NULL,
geneHitIM = NULL,
colorV = NULL,
nrow = NULL,
ncol = NULL,
byrow = FALSE,
enrichLabels = NULL,
subsetSets = NULL,
overlapThreshold = deprecated(),
p_cutoff = 0.05,
cutoffRowMinP = deprecated(),
enrichBaseline = -log10(p_cutoff),
enrichLens = 0,
enrichNumLimit = 4,
nEM = 500,
min_count = 3,
topEnrichN = 20,
topEnrichSources = c("gs_cat", "gs_subcat"),
topEnrichCurateFrom = NULL,
topEnrichCurateTo = NULL,
topEnrichSourceSubset = NULL,
topEnrichDescriptionGrep = NULL,
topEnrichNameGrep = NULL,
keyColname = c("ID", "Name", "pathway", "itemsetID", "Description"),
nameColname = c("Name", "pathway", "Description", "itemsetID", "ID"),
geneColname = c("geneID", "geneNames", "Genes"),
countColname = c("gene_count", "count", "geneHits"),
pvalueColname = c("padjust", "p.adjust", "adjp", "padj", "qvalue", "qval", "q.value",
"pvalue", "p.value", "pval", "FDR"),
descriptionColname = c("Description", "Name", "Pathway", "ID"),
descriptionCurateFrom = c("^Genes annotated by the GO term "),
descriptionCurateTo = c(""),
directionColname = c("activation.z.{0,1}score", "z.{0,1}score"),
direction_cutoff = 0,
pathGenes = c("setSize", "pathGenes", "Count"),
geneHits = c("Count", "geneHits", "gene_count"),
geneDelim = "[,/ ]+",
GmtTname = NULL,
msigdbGmtT = NULL,
returnType = c("Mem", "list"),
verbose = FALSE,
...
)
Arguments
- enrichList
list of enrichResult or data.frame objects. The names(enrichList) are used in subsequent results.
Note that data.frame objects are converted to enrichResult using enrichDF2enrichResult(x, ...), where the '...' arguments are used to recognize colnames. It is recommended to confirm upfront that each data.frame is properly converted to enrichResult.
- geneHitList
list of character vectors, or list of numeric vectors whose names represent genes, or NULL. When NULL, the gene hit list for each enrichment result is inferred from the enrichment results themselves; however, this option may incompletely represent which genes were statistical hits. Note that geneHitList and geneHitIM serve the same purpose, and either can be supplied.
- geneHitIM
numeric matrix with gene rows, enrichment columns, and numeric values indicating the presence and/or direction of change for each gene. Note that geneHitList and geneHitIM serve the same purpose, and either can be supplied.
- colorV
character vector of colors, with length equal to length(enrichList), used to assign specific colors to each enrichment result.
- nrow, ncol, byrow
optional arguments used to customize the igraph node shape "coloredrectangle", useful when the number of enrichList results is larger than around 4. They define the number of columns and rows used for each node to display enrichment result colors, and whether to fill colors by row when byrow=TRUE, or by column when byrow=FALSE.
- enrichLabels
character vector of enrichment labels to use, as an optional alternative to names(enrichList).
- subsetSets
character vector of optional set names to use in the analysis, useful to analyze only a specific subset of known pathways.
- overlapThreshold
numeric (deprecated), value between 0 and 1, used to define the Enrichment Map, which is only created for legacy output enabled with returnType="list". To create a Multi-Enrichment Map igraph object, see mem2emap().
overlapThreshold is the Jaccard overlap score above which two pathways will be linked in the resulting network.
- p_cutoff
numeric value between 0 and 1, default 0.05, the enrichment P-value threshold that a pathway must meet in at least one enrichment result to be retained in downstream analyses. This P-value can be reviewed in enrichIM(Mem) of the output, which is a matrix of P-values by pathway and enrichment. The column header assigned as P-value is stored in headers(Mem)$pvalueColname.
- cutoffRowMinP
(deprecated in favor of 'p_cutoff'). When it is non-NULL, it is used for backward compatibility.
- enrichBaseline
numeric value, default -log10(p_cutoff), the -log10 P-value threshold required for the color gradient to assign a color associated with enrichment. In other words, P-values that do not meet this threshold are not colored in the color gradient for enrichment P-values.
To color all P-values, use enrichBaseline=0.
- enrichLens
numeric value, default 0, indicating the "lens" to enhance the intensity of color gradients. Numbers above 0 make the color ramp more compressed, and more vivid at lower numeric values.
- enrichNumLimit
numeric value indicating the -log10(P-value) above which each color gradient is considered the maximum color, useful to apply a fixed threshold for each color gradient.
- nEM
integer (deprecated) maximum pathways to include in the Enrichment Map, which is only created for legacy output enabled with returnType="list". To create a Multi-Enrichment Map igraph object, see mem2emap().
- topEnrichN
integer maximum rows to retain from each enrichResult in 'enrichList', for each source when supplied. Set topEnrichN=0 or topEnrichN=NULL to retain all rows. Only rows where the pathway met other filtering criteria in at least one enrichment in 'enrichList' will be retained.
- topEnrichSources, topEnrichCurateFrom, topEnrichCurateTo, topEnrichSourceSubset, topEnrichDescriptionGrep, topEnrichNameGrep
arguments passed to topEnrichListBySource() when topEnrichN is greater than 0. The default values are used only when input data matches these patterns.
- keyColname, nameColname, geneColname, pvalueColname, descriptionColname
character vector in each case with text strings or patterns to use when matching or prioritizing colnames to assign to each type:
  - key: The primary unique key for each pathway.
  - name: The short name for each pathway.
  - gene: column with delimited gene symbols or identifiers tested for enrichment of each pathway.
  - pvalue: column with enrichment P-value, typically prioritizing either 'qvalue' or 'p.adjust'. Per enrichResult convention, the selected column is renamed to 'pvalue'.
  - description: longer description associated with the pathway.
Each vector is passed to find_colname() to find a suitable matching colname for each entry in enrichList. That function prioritizes full colname matches, then leading or trailing matches, then substring matches.
- descriptionCurateFrom, descriptionCurateTo
character vectors with patterns and replacements, passed to gsubs(), intended to help curate common descriptions to shorter, perhaps more user-friendly labels. One example is removing the prefix "Genes annotated by the GO term " from Gene Ontology pathways. These labels can be manually curated later using the Mem-class methods, specifically sets()<-, which allows assignment of custom names.
- pathGenes, geneHits
character values indicating the colnames that contain the number of pathway genes, and the number of gene hits, respectively. These values are optional, and not specifically used by multienrichjam.
- geneDelim
character pattern used with strsplit() to split multiple gene values into a list of vectors. The default (required) delimiter for enrichResult objects is '/'. A common alternative is ',' (comma-delimited).
- verbose
logical indicating whether to print verbose output. For verbose to cascade to internal functions, use verbose=2.
- ...
additional arguments are passed to various internal functions.
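A few of the argument defaults above can be illustrated with base R alone. The following is a minimal sketch, not the internal implementation of multiEnrichMap(). The toy table enrich_df is invented for illustration, base gsub() stands in for gsubs(), and the >= comparison against enrichBaseline is an assumption about how "meeting the threshold" is evaluated.

```r
# Toy enrichment table (hypothetical data, for illustration only).
enrich_df <- data.frame(
  ID = c("GO:1", "GO:2"),
  Description = c(
    "Genes annotated by the GO term GO:1 cell cycle",
    "Genes annotated by the GO term GO:2 apoptosis"),
  p.adjust = c(0.001, 0.2),
  geneID = c("CDK1/CCNB1/PLK1", "CASP3, BAX"),
  stringsAsFactors = FALSE
)

# geneDelim: split delimited gene strings into a list of character vectors.
# The default pattern accepts '/', ',' and whitespace as delimiters.
geneDelim <- "[,/ ]+"
gene_list <- strsplit(enrich_df$geneID, geneDelim)
names(gene_list) <- enrich_df$ID

# descriptionCurateFrom / descriptionCurateTo: strip a common prefix.
# Assumption: base gsub() is used here as a stand-in for gsubs().
short_desc <- gsub(
  "^Genes annotated by the GO term ", "",
  enrich_df$Description)

# enrichBaseline: defaults to -log10(p_cutoff).
# Assumption: a P-value "meets" the baseline when -log10(P) >= baseline.
p_cutoff <- 0.05
enrichBaseline <- -log10(p_cutoff)
is_colored <- -log10(enrich_df$p.adjust) >= enrichBaseline
```

In this toy table only the first pathway would receive a gradient color under the default baseline, and both rows are split into gene vectors regardless of which delimiter was used.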
Value
list object containing various result formats:
- colorV: named vector of colors assigned to each enrichment, where names match the names of each enrichment in enrichList.
Details
This function performs most of the work of comparing multiple
enrichment results.
This function takes a list of enrichResult objects,
generates an overall pathway-gene incidence matrix, assembles
a pathway-to-Pvalue matrix, creates EnrichMap igraph network
objects, and CnetPlot igraph network objects. It also applies
node shapes and colors consistent with the colors used for
each enrichment result.
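As a rough illustration of the two matrices described above, the following base R sketch builds a gene-by-pathway incidence matrix and a pathway-by-enrichment P-value matrix from two toy tables. It is not the package's implementation: the object names gene_set_im and pvalue_im, the toy data, and the choice to fill missing P-values with 1 are all assumptions made for this example.

```r
# Two toy enrichment tables, named as they would be in enrichList.
# All pathway names, genes, and P-values are hypothetical.
enrichList <- list(
  treatA = data.frame(
    ID = c("pathway1", "pathway2"),
    pvalue = c(0.001, 0.04),
    geneID = c("GENE1/GENE2", "GENE2/GENE3"),
    stringsAsFactors = FALSE),
  treatB = data.frame(
    ID = c("pathway2", "pathway3"),
    pvalue = c(0.01, 0.03),
    geneID = c("GENE3/GENE4", "GENE5"),
    stringsAsFactors = FALSE)
)
geneDelim <- "[,/ ]+"

# Collect every pathway and every gene seen in any table.
all_sets <- sort(unique(unlist(lapply(enrichList, function(df) df$ID))))
all_genes <- sort(unique(unlist(lapply(enrichList, function(df) {
  unlist(strsplit(df$geneID, geneDelim))
}))))

# Incidence matrix: 1 when a gene is listed for a pathway in any table.
gene_set_im <- matrix(0,
  nrow = length(all_genes), ncol = length(all_sets),
  dimnames = list(all_genes, all_sets))
for (df in enrichList) {
  for (i in seq_len(nrow(df))) {
    genes_i <- strsplit(df$geneID[i], geneDelim)[[1]]
    gene_set_im[genes_i, df$ID[i]] <- 1
  }
}

# P-value matrix: pathways by enrichment.
# Assumption: a pathway absent from a table is given P-value 1.
pvalue_im <- matrix(1,
  nrow = length(all_sets), ncol = length(enrichList),
  dimnames = list(all_sets, names(enrichList)))
for (nm in names(enrichList)) {
  df <- enrichList[[nm]]
  pvalue_im[df$ID, nm] <- df$pvalue
}
```

Note that pathway2 accumulates genes from both tables, which is why an incidence matrix built across all enrichments can differ from the genes reported by any single result.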
By default, each enrichment result table is subset to the
top n=20 pathways for each pathway source, as defined by the
colnames in topEnrichSources. For data without a source
column, the overall enrichment results are sorted to take the
top 20. Once the top 20 from each enrichment table are selected,
the combined set of pathways is retained
in all enrichment tables. In this way, a significant enrichment
result from one table can still be compared to a non-significant
result from another table.
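The union step in this filtering can be sketched in base R as follows. This is a simplified illustration under stated assumptions: it ignores pathway sources, uses top_n = 2 instead of 20, and works on invented toy tables. It does not call topEnrichListBySource().

```r
# Toy enrichment tables with the same five pathways (hypothetical values).
enrichList <- list(
  treatA = data.frame(
    ID = c("p1", "p2", "p3", "p4", "p5"),
    pvalue = c(0.001, 0.002, 0.2, 0.5, 0.9),
    stringsAsFactors = FALSE),
  treatB = data.frame(
    ID = c("p1", "p2", "p3", "p4", "p5"),
    pvalue = c(0.6, 0.3, 0.004, 0.001, 0.8),
    stringsAsFactors = FALSE)
)
top_n <- 2

# Step 1: take the top pathways from each table, ranked by P-value.
top_ids <- lapply(enrichList, function(df) {
  head(df$ID[order(df$pvalue)], top_n)
})

# Step 2: combine the selections across tables.
keep_ids <- unique(unlist(top_ids))

# Step 3: retain the combined pathways in every table, so a pathway that
# ranks highly in one table keeps its row in the others.
filtered <- lapply(enrichList, function(df) {
  df[df$ID %in% keep_ids, , drop = FALSE]
})
```

Here p1 is retained in treatB with its non-significant P-value because it was a top pathway in treatA, while p5 is dropped from both tables because it was not a top pathway in either.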
The default values for topEnrichN and related arguments
are intended for use with enrichment results from MSigDB,
which has colnames c("Source","Category") and represents
100 or more combinations of sources and categories. The
default values will select the top 20 entries from the
canonical pathways, after curating the canonical pathway
categories to one "CP" source value.
To disable the top pathway filtering, set topEnrichN=0.
Colors can be defined for each enrichment result using the
argument colorV, otherwise colors are assigned using
colorjam::rainbowJam().
See also
Other multienrichjam core functions:
jam_igraph(),
mem_plot_folio()