Multienrichment folio of summary plots
Usage
mem_plot_folio(
mem,
do_which = NULL,
p_cutoff = NULL,
p_floor = 1e-10,
main = "",
use_raster = FALSE,
min_gene_ct = 1,
min_set_ct = 1,
min_set_ct_each = 4,
column_method = "euclidean",
row_method = "euclidean",
exemplar_range = c(1, 2, 3),
pathway_column_split = NULL,
pathway_column_title = LETTERS,
gene_row_split = NULL,
gene_row_title = letters,
edge_color = NULL,
cex.main = 2,
cex.sub = 1.5,
row_cex = 1,
column_cex = 1,
max_labels = 4,
max_nchar_labels = 25,
include_cluster_title = TRUE,
repulse = 4,
use_shadowText = FALSE,
color_by_column = FALSE,
style = "dotplot_inverted",
enrich_im_weight = 0.3,
gene_im_weight = 0.5,
colorize_by_gene = TRUE,
cluster_color_min_fraction = 0.4,
byCols = c("composite_rank", "minp_rank", "gene_count_rank"),
edge_bundling = "connections",
apply_direction = NULL,
rotate_heatmap = FALSE,
row_anno_padding = grid::unit(3, "mm"),
column_anno_padding = grid::unit(3, "mm"),
do_plot = TRUE,
verbose = FALSE,
...
)
Arguments
- mem
list object created by multiEnrichMap(). Specifically, the object is expected to contain colorV, enrichIM, memIM, geneIM.
- do_which
integer vector of plots to produce. When do_which is NULL, all plots are produced. This argument is intended to help produce one plot from a folio, therefore each plot is referenced by its number, in order.
- p_cutoff
numeric value indicating the enrichment P-value threshold used for multiEnrichMap(); when NULL, this value is taken from the mem input, or 0.05 is used by default.
- p_floor
numeric value indicating the lowest enrichment P-value used in the color gradient on the Enrichment Heatmap.
- main
character string used as a title on Cnet plots.
- use_raster
logical, default FALSE, deprecated; whether to use raster heatmaps, passed to ComplexHeatmap::Heatmap(). Note that use_raster=TRUE may produce visual artifacts, and changing this argument is no longer supported.
- min_gene_ct, min_set_ct
integer values passed to mem_gene_path_heatmap(). The min_gene_ct requires each set to contain min_gene_ct genes, and min_set_ct requires each gene to be present in at least min_set_ct sets.
- min_set_ct_each
integer minimum genes required for each set, required for at least one enrichment test.
- column_method, row_method
character arguments passed to ComplexHeatmap::Heatmap() which indicate the distance method used to cluster columns and rows, respectively.
- exemplar_range
integer vector (or NULL) used to create Cnet exemplar plots, using this many exemplars per cluster.
- pathway_column_split, gene_row_split
integer value passed as column_split and row_split, respectively, to mem_gene_path_heatmap(), indicating the number of pathway clusters and gene clusters to create in the gene-pathway heatmap. When either value is NULL, auto-split logic is used.
- pathway_column_title, gene_row_title
character vectors passed to mem_gene_path_heatmap() as column_title and row_title, respectively. When one value is supplied, it is displayed and centered across all the respective splits. When multiple values are supplied, they are assigned to the splits in order and recycled as needed; in that case, repeated values are made unique by jamba::makeNames().
- cex.main, cex.sub
numeric values passed to title() which size the default title and subtitle in Cnet plots.
- row_cex, column_cex
numeric character expansion factor, used to adjust the relative size of row and column labels, respectively. A value of 1.1 will make the row font size 10% larger.
- color_by_column
logical indicating whether to colorize the enrichment heatmap columns using colorV in the input mem. This argument is only relevant when do_which includes 1.
- enrich_im_weight, gene_im_weight
numeric value between 0 and 1, passed to mem_gene_path_heatmap(), used to apply relative weight when clustering columns and rows, respectively, by combining the gene-pathway incidence matrix with either the column enrichment P-values or the row gene incidence matrix data.
- colorize_by_gene
logical passed to mem_gene_path_heatmap() indicating whether the heatmap body of the gene-pathway heatmap will be colorized using the enrichment colors for each gene.
- cluster_color_min_fraction
numeric value passed to collapse_mem_clusters() used to determine which enrichment colors to associate with each Cnet cluster.
- byCols
character vector describing how to sort the pathways within Cnet clusters. This argument is passed to rank_mem_clusters().
- edge_bundling
character string passed to jam_igraph() to control edge bundling. The default edge_bundling="connections" will bundle Cnet plot edges for genes that share the same pathway connections.
- apply_direction
logical or NULL indicating whether to indicate directionality in mem_enrichment_heatmap(), which is the first plot in the series. The default apply_direction=NULL will auto-detect whether directionality is present in the data, and will set apply_direction=TRUE only when there are non-NA values that differ from zero.
- rotate_heatmap
logical passed to mem_gene_path_heatmap() and only this function, default FALSE. It indicates whether to rotate the heatmap to have gene columns and pathway rows. If readers tend to tilt their heads to read the pathway labels, rotating the heatmap may be preferable.
- row_anno_padding, column_anno_padding
grid::unit or numeric, which will be converted to "mm" units. These values control the space between the heatmap body and row/column annotations, respectively, and are only relevant for mem_gene_path_heatmap(). The value is only applied during draw() and cannot be defined in the Heatmap object itself, which is why it is included here and not in mem_gene_path_heatmap().
- do_plot
logical indicating whether to render each plot. When do_plot=FALSE, the plot objects will be created and returned, but the plots will not be rendered. This option may be useful to generate the full set of figures in one step, then review each figure one by one in an interactive session.
- verbose
logical indicating whether to print verbose output.
- ...
additional arguments are passed to downstream functions. Some useful examples:
  - sets is passed to mem_gene_path_heatmap(), which allows one to define a specific subset of sets to use in the gene-pathway heatmap.
  - cell_size is passed to mem_enrichment_heatmap() with the option to define square cell size in the heatmap dotplot. However, the resulting heatmap will be at least ncol * cell_height width, and nrow * cell_size[2] height, in addition to the heights of the title and column labels, and the widths of the color key and dendrogram.
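The auto-detection rule described for apply_direction=NULL can be illustrated in base R. This is a minimal sketch, assuming the directional scores are available as a numeric matrix; the helper detect_direction() is hypothetical and not part of multienrichjam.

```r
# Hypothetical helper, not part of multienrichjam: mirrors the documented
# rule for apply_direction=NULL, returning TRUE only when at least one
# non-NA value differs from zero.
detect_direction <- function(direction_matrix) {
  vals <- as.numeric(direction_matrix)
  any(!is.na(vals) & vals != 0)
}

# Toy direction scores (pathways x enrichment tests), illustrative only.
no_dir <- matrix(c(0, NA, 0, 0), nrow = 2,
                 dimnames = list(c("path1", "path2"), c("testA", "testB")))
with_dir <- no_dir
with_dir["path1", "testB"] <- -1.5

detect_direction(no_dir)    # FALSE
detect_direction(with_dir)  # TRUE
```

Note that NA values never count as directional, so a matrix of only zeros and NA values leaves directionality off.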
Value
list returned using invisible(), containing each
plot object enabled by the argument do_which:
- enrichment_hm is a Heatmap object from ComplexHeatmap that contains the enrichment P-value heatmap. Note that this data is not used directly in subsequent plots; the pathway clusters shown here are based upon -log10(Pvalue) and not the underlying gene content of each pathway. This plot is a useful overview that answers the question "How many pathways are significantly enriched across the different enrichment tests?"
- gp_hm is a Heatmap object from ComplexHeatmap with the gene-pathway incidence matrix heatmap. This heatmap and its column/pathway clusters are the subject of subsequent Cnet plots.
- gp_hm_caption is a text caption that describes the gene and set filter criteria, and the row and column distance methods used for clustering. Because the filtering and clustering options have a substantial impact on clustering, and the pathway clusters are the key for all subsequent plots, these values are important to keep associated with the output of this function.
- clusters_mem is a list with the pathways contained in each pathway cluster shown by the gene-pathway heatmap, obtained by heatmap_column_order(gp_hm). The pathway names should also be present in colnames(mem$memIM) and rownames(mem$enrichIM), for follow-up inspection.
- cnet_collapsed is an igraph object with Cnet plot data, where the pathways have been collapsed by cluster, using the gene-pathway heatmap clusters defined in clusters_mem. Each pathway cluster is labeled by cluster name and the first few pathway names. This data can be plotted using jam_igraph(cnet_collapsed).
- cnet_collapsed_set is the same as cnet_collapsed except the pathways are labeled by the cluster name only, for example c("A", "B", "C", "D"). This data can be plotted using jam_igraph(cnet_collapsed_set).
- cnet_collapsed_set2 is the same as cnet_collapsed_set except the gene labels are hidden, useful when there are too many genes to label clearly. The gene symbols are still stored in V(g)$name, but the labels in V(g)$label are updated to hide the genes. This data can be plotted using jam_igraph(cnet_collapsed_set2).
- cnet_exemplars is a list of igraph Cnet objects, each containing only the specified number of exemplar pathways from each cluster, as defined by argument exemplar_range. By default it uses 1 exemplar per cluster, then 2 exemplars per cluster, then 3 exemplars per cluster. A number of published figures use 1 exemplar per pathway cluster. This data can be plotted using jam_igraph(cnet_exemplars[[1]]), which will plot only the first igraph object from the list.
- cnet_clusters is a list of igraph Cnet objects, each containing all the pathways in one pathway cluster. This data can be plotted using jam_igraph(cnet_clusters[[1]]), or by referencing a specific cluster, jam_igraph(cnet_clusters[["A"]]).
Details
This function is intended to create multiple summary plots
using the output data from multiEnrichMap(). By default
it creates all plots one by one, suitable for inclusion
in a multi-page PDF document with cairo_pdf(..., onefile=TRUE)
or pdf(..., onefile=TRUE).
The data for each plot object can be created and visualized later
with argument do_plot=FALSE.
Note: Since version 0.0.76.900 the first step in the workflow is
to cluster the underlying gene-pathway incidence matrix.
This step defines a consistent dendrogram driven by underlying
gene content in each pathway.
The dendrogram is used by each subsequent plot
including the enrichment heatmap.
There are two recommended strategies for visualizing multienrichment results:
Option 1: Pathway clusters viewed as a concept network (Cnet) plot.
Given numerous statistically enriched pathways, this process defines pathway clusters using the underlying gene-pathway incidence matrix.
Within each pathway cluster, the pathways typically share a high proportion of the same genes, and therefore are expected to represent very similar functions. Ideally, each cluster represents some distinct biological function, or a functional theme.
Benefit: Reducing a large number of pathways to a small number of clusters greatly improves the options for visualization, while retaining a comprehensive view of all genes and pathways involved.
Benefit: This option is recommended when there are numerous pathways, and when including more pathways is beneficial to understanding the overall functional effects of the experimental study.
Limitation: The downside with this approach is that sometimes this comprehensive content can be too much detail to interpret in one figure, overshadowing individual pathways in each cluster.
Limitation: It may be difficult to recognize a functional theme for each pathway cluster; unfortunately, that process is not (yet) automated and requires some domain expertise in the pathways and functions involved.
Limitation: It may not be possible for one Cnet plot to represent all functional effects of an experimental study.
Option 2: Exemplar pathways viewed as a Cnet plot.
As described above, given numerous statistically enriched pathways, pathways are clustered using the gene-pathway incidence matrix. One "exemplar" pathway is selected from each cluster to represent the typical pathway content in each cluster, usually the most significant pathway in the cluster, but optionally the pathway containing the most total genes.
Benefit: This process can produce a cleaner figure than Option 1 (pathway clusters), because fewer pathways and their associated genes are included in the figure.
Limitation: This cleaner figure is understandably somewhat less comprehensive, and may be subject to bias when selecting exemplar pathways. However, the selection of relevant pathways may be very effective within the context of the experimental study.
Benefit: The resulting Cnet plot can often improve focus on specific genes and pathways, which can be advantageous when including numerous "synonyms" for the same or similar pathways is not beneficial.
Benefit: This strategy also works particularly well when there are relatively few enriched pathways, or when argument topEnrichN used with multiEnrichMap() was relatively small.
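The exemplar idea can be sketched in base R: within each pathway cluster, pathways are ranked and the top n are kept, once for each n in exemplar_range. This sketch assumes ranking by enrichment P-value with gene count as a tie-breaker; the ranking actually used is controlled by byCols and rank_mem_clusters(), and pick_exemplars() is a hypothetical helper.

```r
# Toy pathway table: cluster assignment, best P-value, and gene count.
pathways <- data.frame(
  pathway = c("p1", "p2", "p3", "p4", "p5", "p6"),
  cluster = c("A", "A", "A", "B", "B", "C"),
  minp = c(1e-6, 1e-4, 1e-8, 0.01, 0.001, 0.02),
  gene_count = c(20, 35, 12, 8, 15, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n best-ranked pathways in each cluster,
# ranking by minp (ascending), then by gene_count (descending).
pick_exemplars <- function(df, n) {
  ranked <- df[order(df$cluster, df$minp, -df$gene_count), ]
  kept <- lapply(split(ranked, ranked$cluster), head, n)
  unlist(lapply(kept, `[[`, "pathway"), use.names = FALSE)
}

exemplar_range <- c(1, 2, 3)
exemplars <- lapply(exemplar_range, function(n) pick_exemplars(pathways, n))
exemplars[[1]]  # "p3" "p5" "p6": one exemplar per cluster
```

Clusters with fewer pathways than n simply contribute all of their pathways, as cluster C does here.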
The folio of plots includes:
- Enrichment Heatmap (Plot #1), enrichment P-values created using mem_enrichment_heatmap(). Note that by default, the Gene-Pathway incidence matrix is also created (invisibly) in order to define consistent pathway clusters. Output list name: "enrichment_hm"
- Gene-Pathway Incidence Matrix Heatmap (Plot #2) is created using mem_gene_path_heatmap(). This step defines and visualizes the pathway clustering used by all plots in the folio. Output list name: "gp_hm"
- Cnet Cluster Plots (Plots #3, #4, #5) create a collapsed Concept network (Cnet) of genes with pathway clusters, using collapse_mem_clusters(), then plotted with jam_igraph().
  - Plot #3 labels the pathway clusters with the first N pathways. Output list name: "cnet_collapsed"
  - Plot #4 labels the pathway clusters with LETTERS. This file is typically used for other plots. Output list name: "cnet_collapsed_set"
  - Plot #5 hides all gene labels. Output list name: "cnet_collapsed_set2"
- Cnet Exemplar Plots (Plots #6, #7, #8) create smaller pathway Cnet plots, as opposed to the pathway-cluster Cnets in #3, #4, #5 above, using exemplar pathways from each gene-pathway cluster. Output list name: "cnet_exemplars", a list of igraph objects:
  - Plot #6 includes one exemplar pathway per pathway cluster.
  - Plot #7 includes two exemplar pathways per pathway cluster.
  - Plot #8 includes three exemplar pathways per pathway cluster.
- Cnet Individual Cluster Plots (Plots #9, #10, #11, etc.) create one pathway Cnet plot per pathway cluster, showing only the pathways in that cluster. The number of plots is defined by the number of pathway clusters, usually pathway_column_split. These plots may be useful to explore pathways in detail within each pathway cluster, for example when there are many pathways which are not well-defined for a particular pathway cluster in the Gene-Pathway heatmap. Output list name: "cnet_clusters"
The specific plots to be created are controlled with do_which:
- do_which=1 will create the enrichment heatmap.
- do_which=2 will create the gene-pathway heatmap.
- do_which=3 will create the Cnet Cluster Plot using pathway cluster labels for each pathway node; by default it uses LETTERS: "A", "B", "C", "D", etc.
- do_which=4 will create the Cnet Cluster Plot using abbreviated pathway labels for each pathway cluster node.
- do_which=5 will create the Cnet Cluster Plot with no node labels.
- do_which=6 begins the series of Cnet Exemplar Plots, one for each value in argument exemplar_range, whose default is c(1, 2, 3).
- do_which=9 (by default) begins the series of Cnet individual cluster plots, each of which includes all pathways from one cluster.
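Because the exemplar and cluster series have variable length, the plot number for each series can be computed from the length of exemplar_range and the number of pathway clusters. The helper folio_plot_numbers() below is hypothetical, a sketch assuming plots are numbered consecutively in the order listed above and named after the output list names; it is not a multienrichjam function.

```r
# Hypothetical helper: map each folio plot to its do_which number,
# assuming consecutive numbering in the documented order.
folio_plot_numbers <- function(exemplar_range = c(1, 2, 3), n_clusters = 4) {
  fixed <- c(enrichment_hm = 1, gp_hm = 2,
             cnet_collapsed = 3, cnet_collapsed_set = 4,
             cnet_collapsed_set2 = 5)
  n_ex <- length(exemplar_range)
  exemplar_ids <- 5 + seq_len(n_ex)
  names(exemplar_ids) <- paste0("cnet_exemplars_", exemplar_range)
  # Cluster plots follow the exemplar plots; LETTERS mimics default labels.
  cluster_ids <- 5 + n_ex + seq_len(n_clusters)
  names(cluster_ids) <- paste0("cnet_clusters_", LETTERS[seq_len(n_clusters)])
  c(fixed, exemplar_ids, cluster_ids)
}

nums <- folio_plot_numbers()
nums["cnet_clusters_A"]  # 9, matching the default described above
```

With exemplar_range=NULL, the cluster series would start right after plot #5 under this assumption.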
The most frequently used plots are do_which=2 for the
gene-pathway heatmap, and do_which=4 for the collapsed Cnet
plot, where Cnet clusters are based upon the gene-pathway heatmap.
Arguments p_cutoff and min_set_ct_each can be used to
apply more stringent thresholds than the original mem data.
For example, applying p_cutoff=0.05 during multiEnrichMap()
will colorize pathways in mem$enrichIMcolors; however,
calling mem_plot_folio() with p_cutoff=0.001 will use a blank
color in the color gradient for pathways that do not
have an mem$enrichIM value at or below 0.001.
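The combined effect of p_cutoff and p_floor on an enrichment color gradient can be sketched as follows. This assumes a simple linear gradient on -log10(P) between the cutoff and the floor, with values above the cutoff left blank; the color function used by multienrichjam may differ, and p_to_color() is a hypothetical helper.

```r
# Hypothetical helper: map enrichment P-values to colors, blanking values
# above p_cutoff and capping very small values at p_floor.
p_to_color <- function(p, p_cutoff = 0.05, p_floor = 1e-10,
                       low = "white", high = "firebrick") {
  score <- -log10(pmax(p, p_floor))            # cap at p_floor
  lo <- -log10(p_cutoff)
  hi <- -log10(p_floor)
  frac <- (score - lo) / (hi - lo)             # 0 at cutoff, 1 at floor
  frac <- pmin(pmax(frac, 0), 1)
  ramp <- grDevices::colorRamp(c(low, high))   # returns RGB on 0-255 scale
  rgb_vals <- ramp(ifelse(is.na(frac), 0, frac))
  cols <- grDevices::rgb(rgb_vals / 255)
  cols[is.na(p) | p > p_cutoff] <- "transparent"  # blank color
  cols
}

p <- c(0.04, 0.0005, 1e-20, 0.2)
cols1 <- p_to_color(p, p_cutoff = 0.05)
cols2 <- p_to_color(p, p_cutoff = 0.001)  # 0.04 is now blank as well
```

Any P-value below p_floor receives the strongest color, so 1e-20 and 1e-10 are indistinguishable in the gradient.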
Our experience is that the pathway clustering does not need to be perfect to be useful and valid. The pathway clusters are valid based upon the parameters used for clustering, and provide insight into the genes that help define each cluster distinct from other clusters. Sometimes the clustering results are more or less effective based upon the type of pattern observed in the data, so it can be helpful to adjust parameters to drill down to the most effective patterns.
Gene-Pathway clustering
The clustering is performed by combining the gene-pathway incidence
matrix mem$memIM with the -log10(mem$enrichIM) enrichment P-values.
The relative weight of each matrix is controlled by
enrich_im_weight, where enrich_im_weight=0 assigns weight=0
to the enrichment P-values, and thus clusters only using the
gene-pathway matrix. Similarly, enrich_im_weight=1 will assign
full weight to the enrichment P-value matrix, and will ignore
the gene-pathway matrix data.
The corresponding weight for genes (rows) is controlled by
gene_im_weight, which balances row clustering with the
mem$geneIM matrix, and the gene-pathway matrix mem$memIM.
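The weighting can be illustrated with a small base R sketch. It assumes each pathway is described by two blocks of features, its gene membership from the incidence matrix and its -log10 P-values across enrichment tests (rescaled to 0-1), multiplied by (1 - enrich_im_weight) and enrich_im_weight respectively before hierarchical clustering. The rescaling and concatenation are illustrative assumptions, not the exact computation in mem_gene_path_heatmap(), and combine_for_clustering() is hypothetical; the same pattern would apply to gene rows with gene_im_weight.

```r
set.seed(1)
# Toy gene-pathway incidence matrix (genes x pathways), like mem$memIM.
mem_im <- matrix(rbinom(40, 1, 0.4), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("path", 1:4)))
# Toy enrichment P-values (pathways x tests), like mem$enrichIM.
enrich_im <- matrix(c(1e-5, 0.01, 1e-3, 0.2, 0.03, 1e-6, 0.5, 1e-4),
                    nrow = 4,
                    dimnames = list(colnames(mem_im), c("testA", "testB")))

# Hypothetical helper: weighted feature matrix with one row per pathway.
combine_for_clustering <- function(mem_im, enrich_im, enrich_im_weight = 0.3) {
  gene_part <- t(mem_im)                  # pathways x genes, 0/1
  p_part <- -log10(enrich_im)
  p_part <- p_part / max(p_part)          # rescale to 0-1
  cbind((1 - enrich_im_weight) * gene_part,
        enrich_im_weight * p_part[rownames(gene_part), , drop = FALSE])
}

features <- combine_for_clustering(mem_im, enrich_im, enrich_im_weight = 0.3)
hc <- hclust(dist(features, method = "euclidean"))
cutree(hc, k = 2)
```

At enrich_im_weight=0 the P-value columns are all zero, so distances come only from gene content; at 1 the gene columns vanish, matching the two extremes described above.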
The argument column_method defines the distance method,
for example "euclidean" and "binary" are two immediate choices.
The "correlation" method, provided by amap::hcluster(), can also
be very useful, especially with large datasets.
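To compare distance methods on a small incidence matrix, base R's dist() provides "euclidean" and "binary", and a correlation distance can be approximated as 1 minus the Pearson correlation between pathway columns. The correlation variant here is a base R stand-in, assumed to behave similarly to, but not necessarily identically with, the "correlation" option in amap::hcluster().

```r
# Toy incidence matrix (genes x pathways): path1 and path2 share most genes.
im <- cbind(
  path1 = c(1, 1, 1, 0, 0, 0),
  path2 = c(1, 1, 0, 0, 0, 0),
  path3 = c(0, 0, 0, 1, 1, 1)
)
rownames(im) <- paste0("gene", 1:6)

# Distances between pathways (columns), so transpose first.
d_euc <- dist(t(im), method = "euclidean")
d_bin <- dist(t(im), method = "binary")
d_cor <- as.dist(1 - cor(im))  # correlation distance between columns

round(as.matrix(d_bin), 2)
```

The binary distance ignores genes absent from both pathways, which is why path1 and path2 come out close (1/3) even though both lack half of the genes.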
The number of pathway clusters is controlled by
pathway_column_split. By default, when pathway_column_split=NULL
and auto_cluster=TRUE, the number of clusters is defined based
upon the total number of pathways. In practice, pathway_column_split=4
or pathway_column_split=3 is recommended, as this number of
clusters is most convenient to visualize as a Cnet plot.
To define your own pathway cluster labels, define pathway_column_title
as a vector with length equal to pathway_column_split. These labels
become network node labels in subsequent plots, and in the
resulting igraph object.
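Cutting a pathway dendrogram into pathway_column_split clusters and attaching pathway_column_title labels can be sketched with hclust() and cutree(). This sketch assumes clusters are labeled in dendrogram (left-to-right) order and that repeated titles are recycled and made unique; base R's make.unique() stands in for jamba::makeNames(), which uses its own suffix style.

```r
set.seed(2)
# Toy pathway features (pathways x genes); in practice these would come
# from the gene-pathway incidence matrix.
feat <- matrix(rbinom(8 * 12, 1, 0.5), nrow = 8,
               dimnames = list(paste0("path", 1:8), paste0("gene", 1:12)))
hc <- hclust(dist(feat, method = "euclidean"))

pathway_column_split <- 3
pathway_column_title <- c("Immune", "Immune", "Metabolism")  # user labels

cl <- cutree(hc, k = pathway_column_split)
# Renumber clusters in dendrogram (left-to-right) order.
cl_ordered <- match(cl, unique(cl[hc$order]))
titles <- make.unique(rep_len(pathway_column_title, pathway_column_split))
cluster_labels <- setNames(titles[cl_ordered], names(cl))
split(names(cluster_labels), cluster_labels)
```

The repeated "Immune" title becomes "Immune" and "Immune.1" here, so each cluster keeps a distinct node label.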
The pathway clusters are dependent upon the genes and pathways
used during clustering, which are also controlled by
min_set_ct and min_gene_ct.
- min_set_ct filters the matrix by the number of times a Set is represented in the matrix, which can be helpful when there are pathways with a large number of genes alongside some pathways with a very low number of genes.
- min_gene_ct filters the matrix by the number of times a gene is represented in the matrix. It can be helpful for requiring a gene to be represented in more than one enriched pathway.
- min_set_ct_each filters the matrix to require each Set to contain at least this many entries from one enrichment result, rather than using the combined incidence matrix. It is mostly helpful for increasing the value used in the multiEnrichMap() argument min_count, which already filters pathways for a minimum number of genes involved.
Note: These filters are only recommended when the gene-pathway matrix is very large, perhaps 100 pathways, or 500 genes.
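The count filters can be sketched on a toy incidence matrix. This sketch follows the argument descriptions in the Arguments section (min_gene_ct as genes required per set, min_set_ct as sets required per gene, min_set_ct_each as genes required per set within at least one enrichment test). It also assumes per-test gene hits are taken from a genes x tests matrix like mem$geneIM. apply_count_filters() is a hypothetical helper performing a single filtering pass; the actual order of filtering in mem_gene_path_heatmap() may differ.

```r
# Toy data: mem_im is genes x pathways; gene_im is genes x tests
# (1 when a gene is a hit in that enrichment test).
mem_im <- cbind(
  pathA = c(1, 1, 1, 1, 0, 0),
  pathB = c(1, 1, 0, 0, 0, 0),
  pathC = c(0, 0, 1, 1, 1, 1)
)
gene_im <- cbind(
  test1 = c(1, 1, 1, 0, 0, 0),
  test2 = c(0, 0, 1, 1, 1, 1)
)
rownames(mem_im) <- paste0("g", 1:6)
rownames(gene_im) <- paste0("g", 1:6)

# Hypothetical helper: one pass of count-based filtering.
apply_count_filters <- function(mem_im, gene_im,
                                min_gene_ct = 1, min_set_ct = 1,
                                min_set_ct_each = 1) {
  # genes per set, counted within each enrichment test (sets x tests)
  per_test <- crossprod(mem_im, gene_im)
  keep_sets <- colSums(mem_im) >= min_gene_ct &
    apply(per_test, 1, max) >= min_set_ct_each
  m <- mem_im[, keep_sets, drop = FALSE]
  # then require each remaining gene to appear in enough sets
  keep_genes <- rowSums(m) >= min_set_ct
  m[keep_genes, , drop = FALSE]
}

dim(apply_count_filters(mem_im, gene_im, min_set_ct_each = 3))
```

Here pathB is dropped by min_set_ct_each=3 because it has at most 2 genes from any single test, even though it contains 2 genes overall.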
Cnet pathway clusters
The resulting Cnet pathway clusters are single nodes in the
network, and these nodes are colorized based upon the enrichment
tests involved. The threshold for including the color for
each enrichment test is defined by cluster_color_min_fraction,
which requires at least this fraction of pathways in a
pathway cluster to meet the significance criteria for that
enrichment test.
To adjust the coloration filter to include any enrichment
test with at least one significant result, use
cluster_color_min_fraction=0.01.
In the gene-pathway heatmap,
these colors are shown across the top of the heatmap.
The default cluster_color_min_fraction=0.4 requires 40%
of pathways in a cluster to be significant in a given enrichment test.
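The cluster coloring rule can be sketched in base R: for each pathway cluster, compute the fraction of its pathways meeting p_cutoff in each enrichment test, and keep that test's color only when the fraction is at least cluster_color_min_fraction. The helper cluster_colors() and the toy colors are hypothetical; collapse_mem_clusters() implements the actual logic.

```r
# Toy enrichment P-values (pathways x tests) and test colors (like colorV).
enrich_im <- matrix(c(0.001, 0.02, 0.3, 0.04, 0.5, 0.3,
                      0.2, 0.01, 0.001, 0.6, 0.7, 0.9),
                    ncol = 2,
                    dimnames = list(paste0("p", 1:6), c("testA", "testB")))
color_v <- c(testA = "firebrick", testB = "dodgerblue")
clusters <- list(A = c("p1", "p2", "p3"), B = c("p4", "p5", "p6"))

# Hypothetical helper: per-cluster colors passing the fraction threshold.
cluster_colors <- function(enrich_im, clusters, color_v, p_cutoff = 0.05,
                           cluster_color_min_fraction = 0.4) {
  lapply(clusters, function(paths) {
    # fraction of pathways in this cluster significant in each test
    frac <- colMeans(enrich_im[paths, , drop = FALSE] <= p_cutoff)
    unname(color_v[names(frac)[frac >= cluster_color_min_fraction]])
  })
}

res <- cluster_colors(enrich_im, clusters, color_v)
res2 <- cluster_colors(enrich_im, clusters, color_v,
                       cluster_color_min_fraction = 0.01)
```

With the default 0.4, cluster B receives no color because only 1 of its 3 pathways is significant in testA; lowering the threshold to 0.01 admits that color.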
Note: Prior to version 0.0.76.900
the enrichment heatmap was clustered only using enrichment
P-values, transformed with -log10(Pvalue). The clustering was
inconsistent with other plots in the folio, and was not effective
at clustering pathways based upon similar content, which is the
primary goal of the multienrichjam R package.
See also
Other jam plot functions:
adjust_polygon_border(),
grid_with_title(),
jam_igraph(),
mem_enrichment_heatmap(),
mem_gene_path_heatmap(),
mem_legend(),
mem_multienrichplot(),
plot_layout_scale()