Multienrichment folio of summary plots
mem_plot_folio(
mem,
do_which = NULL,
p_cutoff = NULL,
p_floor = 1e-10,
main = "",
use_raster = TRUE,
min_gene_ct = 1,
min_set_ct = 1,
min_set_ct_each = 4,
column_method = "euclidean",
row_method = "euclidean",
exemplar_range = c(1, 2, 3),
pathway_column_split = NULL,
pathway_column_title = LETTERS,
gene_row_split = NULL,
gene_row_title = letters,
edge_color = NULL,
cex.main = 2,
cex.sub = 1.5,
row_cex = 1,
column_cex = 1,
max_labels = 4,
max_nchar_labels = 25,
include_cluster_title = TRUE,
repulse = 4,
use_shadowText = FALSE,
color_by_column = FALSE,
style = "dotplot",
enrich_im_weight = 0.3,
gene_im_weight = 0.5,
colorize_by_gene = TRUE,
cluster_color_min_fraction = 0.4,
byCols = c("composite_rank", "minp_rank", "gene_count_rank"),
edge_bundling = "connections",
apply_direction = NULL,
do_plot = TRUE,
verbose = TRUE,
...
)
mem
list object created by multiEnrichMap(). Specifically, the object is expected to contain colorV, enrichIM, memIM, and geneIM.
do_which
integer vector of plots to produce. When do_which is NULL, all plots are produced. This argument is intended to help produce one plot from a folio, therefore each plot is referred to by its number, in order.
p_cutoff
numeric value indicating the enrichment P-value threshold used for multiEnrichMap(). When NULL, this value is taken from the mem input, or 0.05 is used by default.
p_floor
numeric value indicating the lowest enrichment P-value used in the color gradient on the Enrichment Heatmap.
main
character string used as a title on Cnet plots.
use_raster
logical indicating whether to use raster heatmaps, passed to ComplexHeatmap::Heatmap().
min_gene_ct, min_set_ct
integer values passed to mem_gene_path_heatmap(). The min_gene_ct requires each gene to be present in at least min_gene_ct sets, and min_set_ct requires each set to contain at least min_set_ct genes.
min_set_ct_each
minimum number of genes required for each set in at least one enrichment test.
column_method, row_method
arguments passed to ComplexHeatmap::Heatmap() which indicate the distance method used to cluster columns and rows, respectively.
exemplar_range
integer vector (or NULL) used to create Cnet exemplar plots, using this many exemplars per cluster.
pathway_column_split, gene_row_split
integer values passed as column_split and row_split, respectively, to mem_gene_path_heatmap(), indicating the number of pathway clusters and gene clusters to create in the gene-pathway heatmap. When either value is NULL, auto-split logic is used.
pathway_column_title, gene_row_title
character vectors passed to mem_gene_path_heatmap() as column_title and row_title, respectively. When one value is supplied, it is displayed and centered across all the respective splits. When multiple values are supplied, values are used up to the number of splits, and recycled as needed. In that case, repeated values are made unique by jamba::makeNames().
cex.main, cex.sub
numeric values passed to title() which size the default title and sub-title in Cnet plots.
row_cex, column_cex
numeric character expansion factors, used to adjust the relative size of row and column labels, respectively. For example, row_cex=1.1 makes the row font size 10% larger.
color_by_column
logical indicating whether to colorize the enrichment heatmap columns using colorV in the input mem. This argument is only relevant when do_which includes 1.
enrich_im_weight, gene_im_weight
numeric values between 0 and 1, passed to mem_gene_path_heatmap(), used to apply relative weight to clustering columns and rows, respectively, when combining the gene-pathway incidence matrix with either column enrichment P-values, or row gene incidence matrix data.
colorize_by_gene
logical passed to mem_gene_path_heatmap() indicating whether the heatmap body for the gene-pathway heatmap will be colorized using the enrichment colors for each gene.
cluster_color_min_fraction
numeric value passed to collapse_mem_clusters() used to determine which enrichment colors to associate with each Cnet cluster.
byCols
character vector describing how to sort the pathways within Cnet clusters. This argument is passed to rank_mem_clusters().
edge_bundling
character string passed to jam_igraph() to control edge bundling. The default edge_bundling="connections" will bundle Cnet plot edges for genes that share the same pathway connections.
apply_direction
logical or NULL indicating whether to show directionality in mem_enrichment_heatmap(), which is the first plot in the series. The default apply_direction=NULL will auto-detect whether directionality is present in the data, and will set apply_direction=TRUE only when there are non-NA values that differ from zero.
do_plot
logical indicating whether to render each plot. When do_plot=FALSE the plot objects will be created and returned, but the plots themselves will not be rendered. This option may be useful to generate the full set of figures in one step, then review each figure one by one in an interactive session.
verbose
logical indicating whether to print verbose output.
...
additional arguments are passed to downstream functions. Notably, sets is passed to mem_gene_path_heatmap(), which allows one to define a specific subset of sets to use in the gene-pathway heatmap.
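The auto-detection described for apply_direction=NULL can be sketched in base R. This is only an illustration of the stated rule (TRUE only when there are non-NA values that differ from zero); the helper name detect_direction() and the example matrices are hypothetical, and the sketch does not show where the package stores its directional data.

```r
# Hypothetical helper, not part of the package: returns TRUE only when
# at least one value is neither NA nor zero.
detect_direction <- function(direction_matrix) {
  any(!is.na(direction_matrix) & direction_matrix != 0)
}

# Illustrative inputs: one with no usable direction, one with direction.
no_direction <- matrix(c(NA, 0, 0, NA), nrow = 2)
has_direction <- matrix(c(NA, 0, -1.5, 2), nrow = 2)

apply_direction_a <- detect_direction(no_direction)
apply_direction_b <- detect_direction(has_direction)
```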
list is returned via invisible(), which contains each plot object enabled by the argument do_which:
enrichment_hm is a Heatmap object from ComplexHeatmap that contains the enrichment P-value heatmap. Note that this data is not used directly in subsequent plots; the pathway clusters shown here are based upon -log10(Pvalue) and not the underlying gene content of each pathway. This plot is a useful overview that answers the question "How many pathways are significantly enriched across the different enrichment tests?"
gp_hm is a Heatmap object from ComplexHeatmap with the gene-pathway incidence matrix heatmap. This heatmap and the column/pathway clusters are the subject of subsequent Cnet plots.
gp_hm_caption is a text caption that describes the gene and set filter criteria, and the row and column distance methods used for clustering. Because the filtering and clustering options have substantial impact on clustering, and the pathway clusters are the key for all subsequent plots, these values are important to keep associated with the output of this function.
clusters_mem is a list with the pathways contained in each pathway cluster shown by the gene-pathway heatmap, obtained by heatmap_column_order(gp_hm). The pathway names should also be present in colnames(mem$memIM) and rownames(mem$enrichIM), for follow-up inspection.
cnet_collapsed is an igraph object with Cnet plot data, where the pathways have been collapsed by cluster, using the gene-pathway heatmap clusters defined in clusters_mem. Each pathway cluster is labeled by cluster name, and the first few pathway names. This data can be plotted using jam_igraph(cnet_collapsed).
cnet_collapsed_set is the same as cnet_collapsed except the pathways are labeled by the cluster name only, for example c("A", "B", "C", "D"). This data can be plotted using jam_igraph(cnet_collapsed_set).
cnet_collapsed_set2 is the same as cnet_collapsed_set except the gene labels are hidden, useful when there are too many genes to label clearly. The gene symbols are still stored in V(g)$name but the labels in V(g)$label are updated to hide the genes. This data can be plotted using jam_igraph(cnet_collapsed_set2).
cnet_exemplars is a list of igraph Cnet objects, each of which contains only the number of exemplar pathways per cluster defined by argument exemplar_range. By default it uses 1 exemplar per cluster, then 2 exemplars per cluster, then 3 exemplars per cluster. A number of published figures use 1 exemplar per pathway cluster. This data can be plotted using jam_igraph(cnet_exemplars[[1]]), which will plot only the first igraph object from the list.
cnet_clusters is a list of igraph Cnet objects, each of which contains all the pathways in one pathway cluster. This data can be plotted using jam_igraph(cnet_clusters[[1]]), or for a specific cluster by name using jam_igraph(cnet_clusters[["A"]]).
This function is intended to create multiple summary plots using the output data from multiEnrichMap(). By default it creates all plots one by one, sufficient for including in a multi-page PDF document with cairo_pdf(..., onefile=TRUE) or pdf(..., onefile=TRUE).
The data for each plot object can be created and visualized later with argument do_plot=FALSE.
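The multi-page PDF workflow mentioned above can be sketched with base R graphics alone. The placeholder plot() calls stand in for the folio plots, and the file name comes from tempfile(); both are assumptions for illustration. With the package loaded and a mem object available, the loop would be replaced by a call such as mem_plot_folio(mem).

```r
# Open one PDF device; each new plot starts a new page when onefile=TRUE.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file, onefile = TRUE, width = 8, height = 8)
for (i in 1:3) {
  # Placeholder plots only; these are not the folio plots themselves.
  plot(seq_len(10), seq_len(10)^i, main = paste("Placeholder plot", i))
}
invisible(dev.off())

# Confirm that the file was written and is not empty.
pdf_written <- file.exists(pdf_file) && file.size(pdf_file) > 0
```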
Note: Since version 0.0.76.900 the first step in the workflow is to cluster the underlying gene-pathway incidence matrix. This step defines a consistent dendrogram driven by underlying gene content in each pathway. The dendrogram is used by each subsequent plot including the enrichment heatmap.
There are two recommended strategies for visualizing multienrichment results:
Pathway clusters viewed as a concept network (Cnet) plot.
Given numerous statistically enriched pathways, this process defines pathway clusters using the underlying gene-pathway incidence matrix.
Within each pathway cluster, the pathways typically share a high proportion of the same genes, and therefore are expected to represent very similar functions. Ideally, each cluster represents some distinct biological function, or a functional theme.
Benefit: Reducing a large number of pathways to a small number of clusters greatly improves the options for visualization, while retaining a comprehensive view of all genes and pathways involved.
Benefit: This option is recommended when there are numerous pathways, and when including more pathways is beneficial to understanding the overall functional effects of the experimental study.
Limitation: This comprehensive content can sometimes be too much detail to interpret in one figure, overshadowing individual pathways in each cluster.
Limitation: It may be difficult to recognize a functional theme for each pathway cluster. That process is not (yet) automated and requires some domain expertise of the pathways and functions involved.
Limitation: It may not be possible for one Cnet plot to represent all functional effects of an experimental study.
Exemplar pathways are viewed as a Cnet plot.
As described above, given numerous statistically enriched pathways, pathways are clustered using the gene-pathway incidence matrix. One "exemplar" pathway is selected from each cluster to represent the typical pathway content in each cluster, usually the most significant pathway in the cluster, but optionally the pathway containing the most total genes.
Benefit: This process can produce a cleaner figure than the pathway cluster strategy above, because fewer pathways and their associated genes are included in the figure.
Limitation: This cleaner figure is understandably somewhat less comprehensive, and may be subject to bias when selecting exemplar pathways. However, the selection of relevant pathways may be very effective within the context of the experimental study.
Benefit: The resulting Cnet plot can often improve focus on specific genes and pathways, which can be advantageous when including numerous "synonyms" for the same or similar pathways is not beneficial.
Benefit: This strategy also works particularly well when there are relatively few enriched pathways, or when argument topEnrichN used with multiEnrichMap() was relatively small.
The folio of plots includes:
Enrichment Heatmap, using enrichment P-values via mem_enrichment_heatmap(). Plot #1.
Gene-Pathway Incidence Matrix Heatmap using mem_gene_path_heatmap(). This step visualizes the pathway clustering to be used by all other plots in the folio. Plot #2.
Cnet Cluster Plot representing Gene-Pathway clusters as a network, created using collapse_mem_clusters(), then plotted with jam_igraph(). Plots #3, #4, and #5.
Cnet Exemplar Plots using exemplar pathways from each gene-pathway cluster, with an increasing number of exemplars included from each cluster (n per cluster). Cnet igraph objects are created using subsetCnetIgraph(), then plotted with jam_igraph(). Plots #6, #7, and #8.
Cnet Individual Cluster Plots with one plot for each gene-pathway cluster defined above, including all pathways within the cluster. These plots are mostly useful when a particular cluster may have multiple sub-clusters included together. The plots can be useful to understand the relationship between pathways in each cluster. Plots #9, #10, and so on, with the number of plots equal to pathway_column_split.
The specific plots to be created are controlled with do_which:
do_which=1 will create the enrichment heatmap.
do_which=2 will create the gene-pathway heatmap.
do_which=3 will create the Cnet Cluster Plot using pathway cluster labels for each pathway node, by default it uses LETTERS: "A", "B", "C", "D", etc.
do_which=4 will create the Cnet Cluster Plot using abbreviated pathway labels for each pathway cluster node.
do_which=5 will create the Cnet Cluster Plot with no node labels.
do_which=6 begins the series of Cnet Exemplar Plots for each value in argument exemplar_range, whose default is c(1, 2, 3).
do_which=9 (by default) begins the series of Cnet individual cluster plots, which includes all pathways from each cluster.
The most frequently used plots are do_which=2 for the gene-pathway heatmap, and do_which=4 for the collapsed Cnet plot, where Cnet clusters are based upon the gene-pathway heatmap.
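The plot numbering described for do_which can be tabulated with a small base R helper. This is a sketch that only restates the numbering given in this section. The function folio_plot_numbers() and its n_clusters argument are hypothetical, and it assumes that the individual cluster plots start immediately after the exemplar plots, so their numbers shift when exemplar_range has a different length.

```r
# Hypothetical helper, not part of the package: lists which plot each
# do_which number refers to, given exemplar_range and a cluster count.
folio_plot_numbers <- function(exemplar_range = c(1, 2, 3), n_clusters = 4) {
  plots <- c(
    "Enrichment heatmap",
    "Gene-pathway heatmap",
    "Cnet cluster plot, pathway cluster labels",
    "Cnet cluster plot, abbreviated pathway labels",
    "Cnet cluster plot, no node labels",
    sprintf("Cnet exemplar plot, %s per cluster", exemplar_range),
    sprintf("Cnet individual cluster %s", LETTERS[seq_len(n_clusters)]))
  data.frame(do_which = seq_along(plots), plot = plots,
    stringsAsFactors = FALSE)
}

# Default arguments: exemplar plots are 6 to 8, cluster plots start at 9.
default_numbers <- folio_plot_numbers()

# With one exemplar value, the cluster plots are assumed to start at 7.
one_exemplar <- folio_plot_numbers(exemplar_range = 1)
```

Printing default_numbers gives a quick lookup table before choosing values for do_which.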
Arguments p_cutoff and min_set_ct_each can be used to apply more stringent thresholds than the original mem data. For example, applying p_cutoff=0.05 during multiEnrichMap() will colorize pathways in mem$enrichIMcolors; however, calling mem_plot_folio() with p_cutoff=0.001 will use a blank color in the color gradient for pathways that do not have a mem$enrichIM value at or below 0.001.
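The effect of a stricter p_cutoff can be illustrated on a small matrix. The enrichIM values below are invented for the example, and the masking with NA is only a stand-in for the blank color; the package's own color handling is not shown here.

```r
# Hypothetical enrichment P-values: pathways in rows, tests in columns.
enrichIM <- matrix(
  c(0.04, 0.0005, 0.2,
    0.0008, 0.03, 0.00001),
  nrow = 3,
  dimnames = list(c("PathwayA", "PathwayB", "PathwayC"),
    c("TestX", "TestY")))

# Entries passing the original threshold and the stricter threshold.
passes_original <- enrichIM <= 0.05
passes_strict <- enrichIM <= 0.001

# Entries failing the stricter cutoff are masked (NA), standing in for
# the blank color; passing entries keep their -log10 P-value.
strict_display <- ifelse(passes_strict, -log10(enrichIM), NA)
```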
Our experience is that the pathway clustering does not need to be perfect to be useful and valid. The pathway clusters are valid based upon the parameters used for clustering, and provide insight into the genes that help define each cluster distinct from other clusters. Sometimes the clustering results are more or less effective based upon the type of pattern observed in the data, so it can be helpful to adjust parameters to drill down to the most effective patterns.
The clustering is performed by combining the gene-pathway incidence matrix mem$memIM with the -log10(mem$enrichIM) enrichment P-values. The relative weight of each matrix is controlled by enrich_im_weight, where enrich_im_weight=0 assigns weight=0 to the enrichment P-values, and thus clusters only using the gene-pathway matrix. Similarly, enrich_im_weight=1 will assign full weight to the enrichment P-value matrix, and will ignore the gene-pathway matrix data.
The corresponding weight for genes (rows) is controlled by gene_im_weight, which balances row clustering between the mem$geneIM matrix and the gene-pathway matrix mem$memIM.
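One way to picture the weighting is a base R sketch that rescales both matrices, multiplies them by complementary weights, and clusters the combined data. The rescaling and the (1 - weight) / weight scheme are assumptions made for illustration and are not confirmed to match the package internals. The helper cluster_pathways() and both example matrices are hypothetical.

```r
# Hypothetical gene-pathway incidence matrix: 6 genes by 4 pathways.
memIM <- matrix(
  c(1, 1, 1, 0, 0, 0,
    1, 1, 0, 0, 0, 0,
    0, 0, 0, 1, 1, 1,
    0, 0, 0, 0, 1, 1),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("pathway", 1:4)))

# Hypothetical enrichment P-values: 4 pathways by 2 enrichment tests.
enrichIM <- matrix(
  c(1e-4, 1, 1e-3, 1,
    1, 1e-5, 1, 1e-3),
  nrow = 4,
  dimnames = list(colnames(memIM), c("testX", "testY")))

# Assumed weighting scheme, for illustration only.
cluster_pathways <- function(memIM, enrichIM, enrich_im_weight = 0.3, k = 2) {
  gene_part <- t(memIM)            # pathways in rows
  enrich_part <- -log10(enrichIM)  # pathways in rows
  # Rescale each block to a maximum of 1 so the weights are comparable.
  gene_part <- gene_part / max(gene_part)
  enrich_part <- enrich_part / max(enrich_part)
  combined <- cbind((1 - enrich_im_weight) * gene_part,
    enrich_im_weight * enrich_part)
  cutree(hclust(dist(combined, method = "euclidean")), k = k)
}

# Weight 0 clusters by gene content only; weight 1 by enrichment only.
by_genes <- cluster_pathways(memIM, enrichIM, enrich_im_weight = 0)
by_enrichment <- cluster_pathways(memIM, enrichIM, enrich_im_weight = 1)
```

In this toy data, pathway1 and pathway2 share genes, while pathway1 and pathway3 share an enrichment test, so the two extremes of the weight produce different groupings.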
The argument column_method defines the distance method, for example "euclidean" and "binary" are two common choices. The argument also accepts "correlation", provided by amap::hcluster(), which can be very useful especially with large datasets.
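The difference between the "euclidean" and "binary" distance methods can be seen on a small incidence matrix with stats::dist(). The matrix is invented for the example, and the sketch does not show how the package applies the method internally; "correlation" is omitted because it requires the amap package.

```r
# Hypothetical incidence matrix with pathways in rows and genes in columns.
pathway_genes <- rbind(
  small_a = c(1, 1, 0, 0, 0, 0, 0, 0),
  small_b = c(0, 0, 1, 1, 0, 0, 0, 0),
  large_c = c(1, 1, 1, 1, 1, 1, 0, 0))

d_euclidean <- as.matrix(dist(pathway_genes, method = "euclidean"))
d_binary <- as.matrix(dist(pathway_genes, method = "binary"))

# Euclidean distance counts mismatched genes, so small_a is equally far
# from small_b (no shared genes) and large_c (contains both of its genes).
# Binary distance is the proportion of mismatches among genes present in
# either pathway, so small_a is closer to large_c than to small_b.
```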
The number of pathway clusters is controlled by pathway_column_split. By default, when pathway_column_split=NULL and auto_cluster=TRUE, the number of clusters is defined based upon the total number of pathways. In practice, pathway_column_split=4 or pathway_column_split=3 is recommended, as this number of clusters is most convenient to visualize as a Cnet plot.
To define your own pathway cluster labels, define pathway_column_title as a vector with length equal to pathway_column_split. These labels become network node labels in subsequent plots, and in the resulting igraph object.
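The relationship between the number of splits and the cluster labels can be sketched with stats::hclust() and stats::cutree(). The matrix and the label names are invented, and the cluster order produced by the heatmap may differ from cutree() numbering; this is only meant to show one label per cluster yielding a named list of pathways, similar in shape to clusters_mem.

```r
# Hypothetical incidence matrix with pathways in rows.
pathway_genes <- rbind(
  p1 = c(1, 1, 1, 0, 0, 0, 0),
  p2 = c(1, 1, 0, 0, 0, 0, 0),
  p3 = c(0, 0, 0, 1, 1, 0, 0),
  p4 = c(0, 0, 0, 1, 1, 1, 0),
  p5 = c(0, 0, 0, 0, 0, 1, 1))

pathway_column_split <- 3
# Illustrative labels, one per cluster; the default would be LETTERS.
pathway_column_title <- c("Immune", "Metabolism", "Signaling")

# Cluster pathways, cut the tree into the requested number of clusters,
# then group pathway names under each cluster label.
hc <- hclust(dist(pathway_genes, method = "binary"))
cluster_id <- cutree(hc, k = pathway_column_split)
clusters <- split(names(cluster_id), pathway_column_title[cluster_id])
```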
The pathway clusters are dependent upon the genes and pathways used during clustering, which are also controlled by min_set_ct and min_gene_ct.
min_set_ct filters the matrix by the number of times a set is represented in the matrix, which can be helpful when some pathways contain a large number of genes and others contain very few.
min_gene_ct filters the matrix by the number of times a gene is represented in the matrix. It can be helpful for requiring a gene to be represented in more than one enriched pathway.
min_set_ct_each filters the matrix to require each set to contain at least this many entries from one enrichment result, rather than using the combined incidence matrix. It is mostly helpful for applying a higher threshold than the multiEnrichMap() argument min_count, which already filters pathways by the minimum number of genes involved.
Note: These filters are only recommended when the gene-pathway matrix is very large, perhaps 100 pathways, or 500 genes.
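The two count filters can be sketched on a toy incidence matrix, reading min_gene_ct as the minimum number of sets per gene and min_set_ct as the minimum number of genes per set, as described in this section. The helper filter_incidence() is hypothetical, and computing both counts from the unfiltered matrix in a single pass is an assumption; the package may apply the filters in a different order.

```r
# Hypothetical helper, not part of the package.
filter_incidence <- function(memIM, min_gene_ct = 1, min_set_ct = 1) {
  # Number of sets containing each gene, and number of genes in each set.
  keep_genes <- rowSums(memIM != 0) >= min_gene_ct
  keep_sets <- colSums(memIM != 0) >= min_set_ct
  memIM[keep_genes, keep_sets, drop = FALSE]
}

# Hypothetical incidence matrix: 5 genes by 3 sets.
memIM <- matrix(
  c(1, 1, 1, 1, 0,
    1, 1, 0, 0, 0,
    0, 0, 0, 0, 1),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("setA", "setB", "setC")))

# Keep genes found in at least 2 sets, and sets with at least 2 genes.
filtered <- filter_incidence(memIM, min_gene_ct = 2, min_set_ct = 2)
```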
The resulting Cnet pathway clusters are single nodes in the network, and these nodes are colorized based upon the enrichment tests involved. The threshold for including the color for each enrichment test is defined by cluster_color_min_fraction, which requires that at least this fraction of pathways in a pathway cluster meet the significance criteria for that enrichment test.
To adjust the coloration filter to include any enrichment test with at least one significant result, use cluster_color_min_fraction=0.01. In the gene-pathway heatmap, these colors are shown across the top of the heatmap. The default cluster_color_min_fraction=0.4 requires at least 40% of pathways in a cluster to be significant for an enrichment test in order to include its color.
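The fraction rule behind cluster_color_min_fraction can be worked through in base R. The P-values, cluster membership, and p_cutoff below are invented for the example, and the calculation is a sketch of the stated rule rather than the code used by collapse_mem_clusters().

```r
# Hypothetical enrichment P-values: 5 pathways by 2 enrichment tests.
enrichIM <- matrix(
  c(0.001, 0.01, 0.2, 0.5, 0.3,
    0.4, 0.6, 0.01, 0.02, 0.03),
  nrow = 5,
  dimnames = list(paste0("p", 1:5), c("testX", "testY")))

# Hypothetical pathway clusters.
clusters <- list(A = c("p1", "p2", "p3"), B = c("p4", "p5"))
p_cutoff <- 0.05
cluster_color_min_fraction <- 0.4

# Fraction of pathways in each cluster that are significant per test.
fraction_significant <- t(sapply(clusters, function(pathways) {
  colMeans(enrichIM[pathways, , drop = FALSE] <= p_cutoff)
}))

# A test contributes its color when the fraction meets the threshold.
use_color <- fraction_significant >= cluster_color_min_fraction
use_color_any <- fraction_significant >= 0.01
```

Cluster A has one of three pathways significant in testY, which falls below 0.4 but passes the 0.01 setting.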
Note: Prior to version 0.0.76.900 the enrichment heatmap was clustered only using enrichment P-values, transformed with -log10(Pvalue). The clustering was inconsistent with other plots in the folio, and was not effective at clustering pathways based upon similar content, which is the primary goal of the multienrichjam R package.
Other jam plot functions: adjust_polygon_border(), grid_with_title(), jam_igraph(), mem_enrichment_heatmap(), mem_gene_path_heatmap(), mem_legend(), mem_multienrichplot(), plot_layout_scale()