Prepare MultiEnrichMap data from enrichList
Usage
multiEnrichMap(
  enrichList,
  geneHitList = NULL,
  geneHitIM = NULL,
  colorV = NULL,
  nrow = NULL,
  ncol = NULL,
  byrow = FALSE,
  enrichLabels = NULL,
  subsetSets = NULL,
  overlapThreshold = 0.1,
  cutoffRowMinP = 0.05,
  enrichBaseline = -log10(cutoffRowMinP),
  enrichLens = 0,
  enrichNumLimit = 4,
  nEM = 500,
  min_count = 3,
  topEnrichN = 20,
  topEnrichSources = c("gs_cat", "gs_subat"),
  topEnrichCurateFrom = NULL,
  topEnrichCurateTo = NULL,
  topEnrichSourceSubset = NULL,
  topEnrichDescriptionGrep = NULL,
  topEnrichNameGrep = NULL,
  keyColname = c("ID", "Name", "pathway", "itemsetID", "Description"),
  nameColname = c("Name", "pathway", "Description", "itemsetID", "ID"),
  geneColname = c("geneID", "geneNames", "Genes"),
  countColname = c("gene_count", "count", "geneHits"),
  pvalueColname = c("padjust", "p.adjust", "adjp", "padj", "qvalue", "qval", "q.value",
    "pvalue", "p.value", "pval", "FDR"),
  descriptionColname = c("Description", "Name", "Pathway", "ID"),
  descriptionCurateFrom = c("^Genes annotated by the GO term "),
  descriptionCurateTo = c(""),
  directionColname = c("activation.z.{0,1}score", "z.{0,1}score"),
  direction_cutoff = 0,
  pathGenes = c("setSize", "pathGenes", "Count"),
  geneHits = c("Count", "geneHits", "gene_count"),
  geneDelim = "[,/ ]+",
  GmtTname = NULL,
  msigdbGmtT = NULL,
  returnType = c("list", "Mem"),
  verbose = FALSE,
  ...
)
Arguments
- enrichList
list of enrichResult or data.frame objects. The names(enrichList) are used in subsequent results. Note that data.frame objects are converted to enrichResult using enrichDF2enrichResult(x, ...), where the '...' arguments are used to recognize colnames. It is recommended to confirm upfront that each data.frame is properly converted to enrichResult.
- geneHitList
list of character vectors, or list of numeric vectors whose names represent genes, or NULL. When NULL, the gene hit list for each enrichment result is inferred from the enrichment results themselves; however, this option may incompletely represent which genes were statistical hits. Note that geneHitList and geneHitIM serve the same purpose, and either can be supplied.
- geneHitIM
numeric matrix with gene rows, enrichment columns, and numeric values indicating the presence and/or direction of change for each gene. Note that geneHitList and geneHitIM serve the same purpose, and either can be supplied.
- colorV
character vector of colors, length equal to length(enrichList), used to assign specific colors to each enrichment result.
- nrow, ncol, byrow
optional arguments used to customize the igraph node shape "coloredrectangle", useful when the number of enrichList results is larger than around 4. They define the number of columns and rows used for each node to display enrichment result colors, and whether to fill colors by row when byrow=TRUE, or by column when byrow=FALSE.
- enrichLabels
character vector of enrichment labels to use, as an optional alternative to names(enrichList).
- subsetSets
character vector of optional set names to use in the analysis, useful to analyze only a specific subset of known pathways.
- overlapThreshold
numeric value between 0 and 1, indicating the Jaccard overlap score above which two pathways will be linked in the EnrichMap igraph network. By default, pathways whose genes overlap with a Jaccard score above 0.1 are connected, which is roughly equivalent to a 10% overlap. Note that the Jaccard coefficient is adversely affected when pathway sets differ in size by more than about 5-fold.
- cutoffRowMinP
numeric value between 0 and 1, indicating the enrichment P-value that a pathway must meet in at least one enrichment result in order to be retained in downstream analyses. This P-value can be confirmed in the returned list element "enrichIM", which is a matrix of P-values by pathway and enrichment.
- enrichBaseline
numeric value indicating the -log10(P-value) at which colors are defined as non-blank in color gradients. This value is typically derived from cutoffRowMinP to ensure that colors are only applied when a pathway meets this significance threshold.
- enrichLens
numeric value indicating the "lens" to apply to color gradients, where numbers above 0 make the color ramp more compressed, so colors are more vivid at lower numeric values.
- enrichNumLimit
numeric value indicating the -log10(P-value) above which each color gradient is considered the maximum color, useful to apply a fixed threshold for each color gradient.
- nEM
integer number defining the maximum number of pathway nodes to include in the EnrichMap igraph network. This argument is passed to enrichMapJam().
- topEnrichN
integer value with the maximum rows to retain from each enrichList table, by source. Set topEnrichN=0 or topEnrichN=NULL to disable subsetting for the top rows.
- topEnrichSources, topEnrichCurateFrom, topEnrichCurateTo, topEnrichSourceSubset, topEnrichDescriptionGrep, topEnrichNameGrep
arguments passed to topEnrichListBySource() when topEnrichN is greater than 0. The default values are used only when input data matches these patterns.
- keyColname, nameColname, geneColname, pvalueColname, descriptionColname
character vector in each case indicating the colnames for key, name, gene, pvalue, and description, respectively. Each vector is passed to find_colname() to find a suitable matching colname for each data.frame in enrichList.
- descriptionCurateFrom, descriptionCurateTo
character vectors with patterns and replacements, passed to gsubs(), intended to help curate common descriptions to shorter, perhaps more user-friendly labels. One example is removing the prefix "Genes annotated by the GO term " from Gene Ontology pathways. These labels can be manually curated later, but it is often more convenient to curate them upfront in order to keep the different result objects consistent.
- pathGenes, geneHits
character values indicating the colnames that contain the number of pathway genes, and the number of gene hits, respectively.
- geneDelim
character pattern used with strsplit() to split multiple gene values into a list of vectors. The default for enrichResult objects is "/", but the default for other sources is often ",". The default pattern "[,/ ]+" splits by "/", ",", or whitespace " ".
- verbose
logical indicating whether to print verbose output. For verbose to cascade to internal functions, use verbose=2.
- ...
additional arguments are passed to various internal functions.
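The geneDelim and overlapThreshold arguments can be illustrated with a minimal base R sketch. It is not the package's internal code: the gene strings, pathway names, and the jaccard() helper are hypothetical, and only the default values "[,/ ]+" and 0.1 are taken from the usage above.

```r
# Hypothetical gene strings, mixing the delimiters that the default
# geneDelim pattern is described as handling ("/", ",", and whitespace).
gene_strings <- c(
  pathwayA = "TP53/MDM2/CDKN1A/BAX",
  pathwayB = "TP53, MDM2, ATM",
  pathwayC = "EGFR KRAS")

# Split each string into a character vector of genes.
geneDelim <- "[,/ ]+"
gene_sets <- strsplit(gene_strings, geneDelim)

# Illustrative helper (not a package function): Jaccard coefficient,
# the number of shared genes divided by the size of the union.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

overlapThreshold <- 0.1
jac_AB <- jaccard(gene_sets$pathwayA, gene_sets$pathwayB)
jac_AC <- jaccard(gene_sets$pathwayA, gene_sets$pathwayC)

# Two pathways are linked when the score is above the threshold.
linked_AB <- jac_AB > overlapThreshold
linked_AC <- jac_AC > overlapThreshold

# Effect of size imbalance: a 10-gene set fully contained in a 50-gene
# set (a 5-fold difference) cannot score higher than 10/50.
small_set <- paste0("g", 1:10)
large_set <- paste0("g", 1:50)
jac_nested <- jaccard(small_set, large_set)
```

In this sketch pathwayA and pathwayB share 2 of 5 distinct genes, so they would be linked, while pathwayA and pathwayC share none. The nested example shows why sets of very different sizes score low even when one is fully contained in the other.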
Value
list object containing various result formats:
- colorV: named vector of colors assigned to each enrichment, where names match the names of each enrichment in enrichList.
Details
This function performs most of the work of comparing multiple enrichment results.
This function takes a list of enrichResult objects, generates an overall pathway-gene incidence matrix, assembles a pathway-to-P-value matrix, and creates EnrichMap and CnetPlot igraph network objects. It also applies node shapes and colors consistent with the colors used for each enrichment result.
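The two matrices described above can be sketched in base R. This is only an illustration of the idea, not the function's implementation: the tables, the column names ID, geneID, and p.adjust, the gene-row by pathway-column orientation, and the choice to fill missing P-values with 1 are all assumptions.

```r
# Hypothetical enrichment tables, one per enrichment result.
enrich_tables <- list(
  treatmentA = data.frame(
    ID = c("path1", "path2"),
    geneID = c("TP53/MDM2", "EGFR/KRAS/TP53"),
    p.adjust = c(0.001, 0.2),
    stringsAsFactors = FALSE),
  treatmentB = data.frame(
    ID = c("path2", "path3"),
    geneID = c("EGFR/KRAS", "ATM"),
    p.adjust = c(0.01, 0.04),
    stringsAsFactors = FALSE))

# All pathways seen in any table.
pathways <- unique(unlist(lapply(enrich_tables, `[[`, "ID")))

# Pathway-by-enrichment P-value matrix. Pathways absent from a table
# are assigned P = 1 here (an assumption made for this sketch).
p_matrix <- sapply(enrich_tables, function(df) {
  p <- df$p.adjust[match(pathways, df$ID)]
  p[is.na(p)] <- 1
  p
})
rownames(p_matrix) <- pathways

# Gene-by-pathway incidence matrix: 1 when a gene is reported for a
# pathway in at least one table, 0 otherwise.
all_rows <- do.call(rbind, enrich_tables)
genes_by_row <- strsplit(all_rows$geneID, "[,/ ]+")
pair_pathway <- rep(all_rows$ID, lengths(genes_by_row))
pair_gene <- unlist(genes_by_row)
genes <- sort(unique(pair_gene))
incidence <- matrix(0,
  nrow = length(genes), ncol = length(pathways),
  dimnames = list(genes, pathways))
incidence[cbind(pair_gene, pair_pathway)] <- 1
```

Note that path2 lists TP53 only in treatmentA, yet the combined incidence matrix still records it, which is the sense in which the matrix is "overall" across enrichment results.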
By default, each enrichment result table is subset to the top n=20 pathways for each pathway source, defined by colnames c("Source", "Category"). For data without a source column, the overall enrichment results are sorted to take the top 20. Once the top 20 from each enrichment table are selected, the combined set of pathways is retained in all enrichment tables. In this way, a significant enrichment result from one table will still be compared to a non-significant result from another table.
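The selection logic in the preceding paragraph can be sketched as follows, for the simpler case of data without a source column. The tables, the column names ID and pvalue, and the use of top_n = 2 in place of 20 are assumptions chosen to keep the example small; the actual filtering is delegated to topEnrichListBySource().

```r
top_n <- 2

# Hypothetical enrichment tables that test the same five pathways.
enrich_tables <- list(
  A = data.frame(
    ID = c("p1", "p2", "p3", "p4", "p5"),
    pvalue = c(0.001, 0.002, 0.03, 0.5, 0.9),
    stringsAsFactors = FALSE),
  B = data.frame(
    ID = c("p1", "p2", "p3", "p4", "p5"),
    pvalue = c(0.6, 0.4, 0.001, 0.002, 0.8),
    stringsAsFactors = FALSE))

# Step 1: take the top pathways from each table independently.
top_ids <- lapply(enrich_tables, function(df) {
  head(df$ID[order(df$pvalue)], top_n)
})

# Step 2: combine the selections across all tables.
keep_ids <- unique(unlist(top_ids))

# Step 3: retain the combined set in every table, so a pathway that is
# significant in one table is kept alongside its non-significant
# result in another.
filtered <- lapply(enrich_tables, function(df) {
  df[df$ID %in% keep_ids, , drop = FALSE]
})
```

Here table A contributes p1 and p2, table B contributes p3 and p4, and both filtered tables keep all four. Table A therefore retains p4 (P = 0.5) because it ranked in the top rows of table B, while p5 is dropped from both.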
The default values for topEnrichN and related arguments are intended for enrichment results from MSigDB, which has colnames c("Source","Category") and represents 100 or more combinations of sources and categories. The default values will select the top 20 entries from the canonical pathways, after curating the canonical pathway categories to one "CP" source value.
To disable the top pathway filtering, set topEnrichN=0.
Colors can be defined for each enrichment result using the argument colorV, otherwise colors are assigned using colorjam::rainbowJam().
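As a rough illustration of how a per-enrichment color might combine with enrichBaseline and enrichNumLimit, the sketch below maps P-values to a gradient. It is one possible reading of those argument descriptions, not the package's gradient code: the p_to_color() helper, the n_steps parameter, the white "blank" color, and the linear ramp built with grDevices::colorRampPalette() are all assumptions, and enrichLens is ignored.

```r
cutoffRowMinP <- 0.05
enrichBaseline <- -log10(cutoffRowMinP)
enrichNumLimit <- 4

# Named color vector, with names matching hypothetical enrichment names.
colorV <- c(treatmentA = "firebrick", treatmentB = "dodgerblue")

# Hypothetical helper: convert P-values to colors for one enrichment.
p_to_color <- function(p, color, n_steps = 50) {
  # Cap the score so anything beyond enrichNumLimit gets the full color.
  score <- pmin(-log10(p), enrichNumLimit)
  ramp <- grDevices::colorRampPalette(c("white", color))(n_steps)
  # Position of the score between the baseline and the limit.
  frac <- (score - enrichBaseline) / (enrichNumLimit - enrichBaseline)
  # Scores at or below the baseline fall on the first (blank) step.
  idx <- pmax(1, ceiling(frac * n_steps))
  ramp[idx]
}

colors_A <- p_to_color(c(0.5, 0.01, 1e-4, 1e-10), colorV[["treatmentA"]])
```

With these assumptions, P = 0.5 stays blank because it does not meet cutoffRowMinP, P = 0.01 receives an intermediate shade, and both P = 1e-4 and P = 1e-10 receive the same maximum color because they reach or exceed enrichNumLimit.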
See also
Other jam enrichment functions: add_pathway_direction(), multienrichjam(), topEnrichBySource()