Prepare MultiEnrichMap data from enrichList
multiEnrichMap(
  enrichList,
  geneHitList = NULL,
  geneHitIM = NULL,
  colorV = NULL,
  nrow = NULL,
  ncol = NULL,
  byrow = FALSE,
  enrichLabels = NULL,
  subsetSets = NULL,
  overlapThreshold = 0.1,
  cutoffRowMinP = 0.05,
  enrichBaseline = -log10(cutoffRowMinP),
  enrichLens = 0,
  enrichNumLimit = 4,
  nEM = 500,
  min_count = 1,
  topEnrichN = 20,
  topEnrichSources = c("gs_cat", "gs_subat"),
  topEnrichCurateFrom = NULL,
  topEnrichCurateTo = NULL,
  topEnrichSourceSubset = NULL,
  topEnrichDescriptionGrep = NULL,
  topEnrichNameGrep = NULL,
  keyColname = c("ID", "Name", "pathway", "itemsetID", "Description"),
  nameColname = c("Name", "pathway", "Description", "itemsetID", "ID"),
  geneColname = c("geneID", "geneNames", "Genes"),
  countColname = c("gene_count", "count", "geneHits"),
  pvalueColname = c("padjust", "p.adjust", "adjp", "padj", "qvalue", "qval", "q.value",
    "pvalue", "p.value", "pval", "FDR"),
  descriptionColname = c("Description", "Name", "Pathway", "ID"),
  descriptionCurateFrom = c("^Genes annotated by the GO term "),
  descriptionCurateTo = c(""),
  directionColname = c("activation.z.{0,1}score", "z.{0,1}score"),
  direction_cutoff = 0,
  pathGenes = c("setSize", "pathGenes", "Count"),
  geneHits = c("Count", "geneHits", "gene_count"),
  geneDelim = "[,/ ]+",
  GmtTname = NULL,
  msigdbGmtT = NULL,
  verbose = FALSE,
  ...
)
enrichList: list of enrichResult objects, whose names are used in
subsequent derived results.
geneHitList: list of character vectors, or list of numeric vectors
whose names represent genes, or NULL. When NULL, the gene hit list for
each enrichment result is inferred from the enrichment results
themselves; however, this option may incompletely represent which genes
were statistical hits. Note that geneHitList and geneHitIM serve the
same purpose and either can be supplied.
geneHitIM: numeric matrix with gene rows, enrichment columns, and
numeric values indicating the presence and/or direction of change for
each gene. Note that geneHitList and geneHitIM serve the same purpose
and either can be supplied.
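
Because geneHitList and geneHitIM carry the same information, one can be
derived from the other. The following is a minimal base R sketch of that
conversion, not the package's own code; the gene symbols and the names
gene_hit_list and gene_hit_im are made up for illustration.

```r
# Hypothetical hit lists for two enrichment results (made-up gene symbols)
gene_hit_list <- list(
  treatmentA = c("TP53", "EGFR", "MYC"),
  treatmentB = c("EGFR", "BRCA1"))

# Union of all genes, sorted so rows appear in a stable order
all_genes <- sort(unique(unlist(gene_hit_list)))

# One column per enrichment: 1 when the gene is a hit, otherwise 0
gene_hit_im <- sapply(gene_hit_list, function(genes) {
  as.numeric(all_genes %in% genes)
})
rownames(gene_hit_im) <- all_genes
gene_hit_im
```

A directional matrix would use values such as 1 and -1 in place of the
plain presence flag.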
colorV: character vector of colors, with length equal to
length(enrichList), used to assign specific colors to each enrichment
result.
nrow, ncol, byrow: optional arguments used to customize the igraph node
shape "coloredrectangle", useful when the number of enrichList results
is larger than around 4. They define the number of rows and columns used
for each node to display enrichment result colors, and whether to fill
colors by row when byrow=TRUE, or by column when byrow=FALSE.
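
The fill order can be pictured with base R matrix(), which uses the same
nrow, ncol, and byrow argument names. This is only an analogy, assuming
the node layout follows the usual matrix() convention; the color values
are arbitrary.

```r
# Six hypothetical enrichment colors laid out in a 2 x 3 grid
colors <- c("red", "orange", "gold", "green", "blue", "purple")

by_row <- matrix(colors, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(colors, nrow = 2, ncol = 3, byrow = FALSE)

by_row[1, ]  # first row holds the first three colors
by_col[, 1]  # first column holds the first two colors
```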
enrichLabels: character vector of enrichment labels to use, as an
optional alternative to names(enrichList).
subsetSets: character vector of optional set names to use in the
analysis, useful to analyze only a specific subset of known pathways.
overlapThreshold: numeric value between 0 and 1, indicating the Jaccard
overlap score above which two pathways will be linked in the EnrichMap
igraph network. By default, pathways whose Jaccard score exceeds 0.1
are connected, which means the shared genes make up roughly 10% of the
combined genes of both pathways. Note that the Jaccard coefficient is
adversely affected when pathway sets differ in size by more than about
5-fold.
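
To make the size effect concrete, here is a small base R sketch of the
Jaccard coefficient (shared genes divided by the union of genes). The
helper name jaccard and the gene sets are hypothetical and are not part
of the package.

```r
# Jaccard coefficient: size of the intersection over size of the union
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

small_set   <- paste0("gene", 1:20)
similar_set <- paste0("gene", 11:30)   # same size, half the genes shared
large_set   <- paste0("gene", 1:200)   # 10-fold larger, contains small_set

jaccard(small_set, similar_set)  # 10 shared genes out of 30 in the union
jaccard(small_set, large_set)    # 20 shared genes out of 200 in the union
```

In this sketch the small set is fully contained in the large set, yet
its score of 20/200 does not exceed the default threshold of 0.1, so the
two pathways would not be linked.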
cutoffRowMinP: numeric value between 0 and 1, indicating the enrichment
P-value that a pathway must meet in at least one enrichment result in
order to be retained in downstream analyses. This P-value can be
confirmed in the returned list element "enrichIM", which is a matrix of
P-values by pathway and enrichment.
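
The row filter can be sketched in base R on a small matrix shaped like
"enrichIM". The values are made up, and the use of <= rather than < at
the cutoff is an assumption, not something confirmed by this page.

```r
# Hypothetical P-value matrix: pathways in rows, enrichments in columns
enrich_im <- matrix(
  c(0.001, 0.20,
    0.30,  0.04,
    0.50,  0.60),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("pathwayA", "pathwayB", "pathwayC"),
    c("treatmentA", "treatmentB")))

cutoff_row_min_p <- 0.05

# Keep a pathway when its best (minimum) P-value meets the cutoff
row_min_p <- apply(enrich_im, 1, min, na.rm = TRUE)
kept <- enrich_im[row_min_p <= cutoff_row_min_p, , drop = FALSE]
rownames(kept)
```

Here pathwayC is dropped because neither enrichment reaches 0.05, while
pathwayB is kept on the strength of treatmentB alone.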
enrichBaseline: numeric value indicating the -log10(P-value) at which
colors are defined as non-blank in color gradients. This value is
typically derived from cutoffRowMinP to ensure that colors are only
applied when a pathway meets this significance threshold.
enrichLens: numeric value indicating the "lens" to apply to color
gradients, where numbers above 0 make the color ramp more compressed, so
colors are more vivid at lower numeric values.
enrichNumLimit: numeric value indicating the -log10(P-value) above which
each color gradient is considered the maximum color, useful to apply a
fixed threshold for each color gradient.
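
As a rough illustration of how a baseline and an upper limit bound a
color gradient, the sketch below rescales -log10(P-value) to a 0 to 1
intensity. The linear scaling is an assumption made for illustration
only; it ignores enrichLens and is not the package's color function.

```r
baseline  <- -log10(0.05)  # default enrichBaseline when cutoffRowMinP = 0.05
num_limit <- 4             # default enrichNumLimit

p_values <- c(0.5, 0.05, 0.001, 1e-6)
score <- -log10(p_values)

# Assumed linear ramp: 0 at the baseline, 1 at the limit, clamped outside
intensity <- (score - baseline) / (num_limit - baseline)
intensity <- pmin(pmax(intensity, 0), 1)
round(intensity, 2)
```

P-values weaker than the baseline map to 0 (blank), and anything at or
beyond 1e-4 maps to the maximum color.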
nEM: integer number, to define the maximum number of pathway nodes to
include in the EnrichMap igraph network. This argument is passed to
enrichMapJam().
topEnrichN: integer value with the maximum rows to retain from each
enrichList table, by source. Set topEnrichN=0 or topEnrichN=NULL to
disable subsetting for the top rows.
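topEnrichSources, topEnrichCurateFrom, topEnrichCurateTo,
topEnrichSourceSubset, topEnrichDescriptionGrep, topEnrichNameGrep:
arguments passed to topEnrichListBySource() when topEnrichN is greater
than 0. The default values are used only when input data matches these
patterns.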
keyColname, nameColname, geneColname, pvalueColname, descriptionColname:
character vector in each case indicating the colnames for key, name,
gene, pvalue, and description, respectively. Each vector is passed to
find_colname() to find a suitable matching colname for each data.frame
in enrichList.
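
The idea of trying candidate column names in order can be sketched in
base R. The helper first_matching_colname below is hypothetical and uses
exact, case-insensitive matching as an assumption; the actual matching
rules of find_colname() may differ.

```r
# Hypothetical helper: return the first candidate found among colnames(df)
first_matching_colname <- function(candidates, df) {
  for (candidate in candidates) {
    hit <- which(tolower(colnames(df)) == tolower(candidate))
    if (length(hit) > 0) {
      return(colnames(df)[hit[1]])
    }
  }
  NULL  # no candidate matched
}

# Made-up one-row enrichment table
enrich_df <- data.frame(
  ID = "hsa04110",
  Description = "Cell cycle",
  p.adjust = 0.002,
  geneID = "CDK1/CCNB1",
  check.names = FALSE)

first_matching_colname(c("padjust", "p.adjust", "adjp"), enrich_df)
first_matching_colname(c("geneID", "geneNames", "Genes"), enrich_df)
```

Ordering the candidates from most to least preferred lets adjusted
P-value columns win over raw P-value columns when both are present.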
descriptionCurateFrom, descriptionCurateTo: character vectors with
patterns and replacements, passed to gsubs(), intended to help curate
common descriptions to shorter, perhaps more user-friendly labels. One
example is removing the prefix "Genes annotated by the GO term " from
Gene Ontology pathways. These labels can be manually curated later, but
it is often more convenient to curate them upfront in order to keep the
different result objects consistent.
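
The effect of the default pattern and replacement can be reproduced with
base R gsub() applied in a loop. This is a sketch of the behavior
described above, not a call to gsubs() itself, and the descriptions are
made up.

```r
# Made-up pathway descriptions
descriptions <- c(
  "Genes annotated by the GO term GO:0007049 cell cycle",
  "KEGG cell cycle")

curate_from <- c("^Genes annotated by the GO term ")
curate_to   <- c("")

# Apply each pattern/replacement pair in order
curated <- descriptions
for (i in seq_along(curate_from)) {
  curated <- gsub(curate_from[i], curate_to[i], curated)
}
curated
```

Descriptions that do not match any pattern pass through unchanged.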
pathGenes, geneHits: character values indicating the colnames that
contain the number of pathway genes, and the number of gene hits,
respectively.
geneDelim: character pattern used with strsplit() to split multiple gene
values into a list of vectors. The default for enrichResult objects is
"/", but the default for other sources is often ",". The default pattern
"[,/ ]+" splits by either "/", ",", or whitespace " ".
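
A short base R example shows how the default pattern handles the three
delimiter styles. The gene strings are made up for illustration.

```r
gene_delim <- "[,/ ]+"

# One string per pathway, each using a different delimiter style
gene_strings <- c("CDK1/CCNB1/TP53", "EGFR, MYC", "BRCA1 BRCA2")

# strsplit() returns one character vector of genes per input string
gene_list <- strsplit(gene_strings, gene_delim)
gene_list
```

The trailing "+" treats a run such as ", " as a single delimiter, so no
empty gene names are produced between the comma and the space.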
verbose: logical indicating whether to print verbose output. For verbose
to cascade to internal functions, use verbose=2.
...: additional arguments are passed to various internal functions.
list object containing various result formats:
colorV: named vector of colors assigned to each enrichment, where names
match the names of each enrichment in enrichList.
This function performs most of the work of comparing multiple
enrichment results.
This function takes a list of enrichResult objects, generates an overall
pathway-gene incidence matrix, assembles a pathway-to-Pvalue matrix,
creates EnrichMap igraph network objects, and CnetPlot igraph network
objects. It also applies node shapes and colors consistent with the
colors used for each enrichment result.
By default, each enrichment result table is subsetted for the top n=20
pathways sorted by pathway source, defined by colnames
c("Source", "Category"). For data without a source column, the overall
enrichment results are sorted to take the top 20. Once the top 20 from
each enrichment table are selected, the combined set of pathways is used
to retain these pathways from all enrichment tables. In this way, a
significant enrichment result from one table will still be compared to a
non-significant result from another table.
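
The select-then-union logic described above can be sketched in base R.
The tables, the helper top_by_source, and the choice to rank by P-value
within each source are assumptions for illustration; this is not the
implementation of topEnrichListBySource().

```r
top_n <- 2

# Two made-up enrichment tables covering the same five pathways
tableA <- data.frame(
  Name   = c("p1", "p2", "p3", "p4", "p5"),
  Source = c("CP", "CP", "CP", "GO", "CP"),
  pvalue = c(0.001, 0.01, 0.02, 0.03, 0.9))
tableB <- data.frame(
  Name   = c("p1", "p2", "p3", "p4", "p5"),
  Source = c("CP", "CP", "CP", "GO", "CP"),
  pvalue = c(0.5, 0.04, 0.0001, 0.2, 0.8))

# Hypothetical helper: names of the top n rows per source, by P-value
top_by_source <- function(df, n) {
  df <- df[order(df$pvalue), , drop = FALSE]
  rank_in_source <- ave(seq_len(nrow(df)), df$Source, FUN = seq_along)
  df$Name[rank_in_source <= n]
}

# Union of top pathways across tables, then retained in every table
keep <- union(top_by_source(tableA, top_n), top_by_source(tableB, top_n))
filtered <- lapply(list(A = tableA, B = tableB), function(df) {
  df[df$Name %in% keep, , drop = FALSE]
})
filtered
```

In this sketch p3 is kept in tableA because it ranks in the top two for
tableB, and p5 is dropped from both tables because it is a top entry in
neither.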
The default values for topEnrichN and related arguments are intended for
use with enrichment results from MSigDB, which has colnames
c("Source","Category") and represents 100 or more combinations of
sources and categories. The default values will select the top 20
entries from the canonical pathways, after curating the canonical
pathway categories to one "CP" source value.
To disable the top pathway filtering, set topEnrichN=0.
Colors can be defined for each enrichment result using the argument
colorV, otherwise colors are assigned using colorjam::rainbowJam().
Other jam enrichment functions: add_pathway_direction(),
topEnrichBySource()
## See the Vignette for a full walkthrough example