MultiEnrichment Heatmap of Genes and Pathways

mem_gene_path_heatmap(
  mem,
  genes = NULL,
  sets = NULL,
  min_gene_ct = 1,
  min_set_ct = 1,
  min_set_ct_each = 4,
  column_fontsize = NULL,
  column_cex = 1,
  row_fontsize = NULL,
  row_cex = 1,
  row_method = "binary",
  column_method = "binary",
  enrich_im_weight = 0.3,
  gene_im_weight = 0.5,
  gene_annotations = c("im", "direction"),
  annotation_suffix = c(im = "hit", direction = "dir"),
  simple_anno_size = grid::unit(6, "mm"),
  cluster_columns = NULL,
  cluster_rows = NULL,
  cluster_row_slices = TRUE,
  cluster_column_slices = TRUE,
  name = NULL,
  p_cutoff = mem$p_cutoff,
  p_floor = 1e-10,
  row_split = NULL,
  column_split = NULL,
  auto_split = TRUE,
  column_title = LETTERS,
  row_title = letters,
  row_title_rot = 0,
  colorize_by_gene = TRUE,
  na_col = "white",
  rotate_heatmap = FALSE,
  colramp = "Reds",
  column_names_max_height = grid::unit(18, "cm"),
  column_names_rot = 90,
  show_gene_legend = FALSE,
  show_pathway_legend = TRUE,
  show_heatmap_legend = 8,
  use_raster = FALSE,
  seed = 123,
  verbose = FALSE,
  ...
)

Arguments

mem

list object created by multiEnrichMap(). Specifically the object is expected to contain colorV, enrichIM, memIM, geneIM.

genes

character vector of genes to include in the heatmap, all other genes will be excluded.

sets

character vector of sets (pathways) to include in the heatmap, all other sets will be excluded.

min_gene_ct

minimum number of occurrences of each gene across the pathways; genes with fewer occurrences are excluded.

min_set_ct

minimum number of genes required for each set; sets with fewer genes are excluded.

min_set_ct_each

minimum number of genes required for each set in at least one enrichment test; sets that do not reach this count in any single enrichment test are excluded.

column_fontsize, row_fontsize

numeric passed as fontsize to ComplexHeatmap::Heatmap() to define a specific fontsize for column and row labels. When NULL the nrow/ncol of the heatmap are used to infer a reasonable starting point fontsize, which can be adjusted with column_cex and row_cex.

row_method, column_method

character string of the distance method to use for row and column clustering. The clustering is performed by amap::hcluster().

enrich_im_weight

numeric value between 0 and 1 (default 0.3), the relative weight of enrichment -log10 P-value and overall gene-pathway incidence matrix when clustering pathways.

  • When enrich_im_weight=0 then only the gene-pathway incidence matrix is used for pathway clustering.

  • When enrich_im_weight=1 then only the pathway significance (-log10 P-value) is used for pathway clustering.

  • The default enrich_im_weight=0.3 balances the combination of the enrichment P-value matrix, with the gene-pathway incidence matrix.

gene_im_weight

numeric value between 0 and 1 (default 0.5), the relative weight of the mem$geneIM gene incidence matrix, and overall gene-pathway incidence matrix when clustering genes.

  • When gene_im_weight=0 then only the gene-pathway incidence matrix is used for gene clustering.

  • When gene_im_weight=1 then only the gene incidence matrix (mem$geneIM) is used for gene clustering.

  • The default gene_im_weight=0.5 balances the gene incidence matrix with the gene-pathway incidence matrix, giving each matrix equal weight (since values are typically all 0 or 1).
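
The weighting described for enrich_im_weight and gene_im_weight can be pictured with a small base R sketch. This is only an illustration of the idea of blending two matrices before clustering; the helper blend_for_clustering(), the toy matrices, and the use of stats::hclust() with Euclidean distance are all assumptions made here, and the function itself may combine and cluster the data differently (it documents amap::hcluster() with a "binary" method by default).

```r
# Toy gene-by-pathway incidence matrix (stand-in for mem$memIM).
mem_im <- matrix(
  c(1, 0, 1,
    1, 1, 0,
    0, 1, 1,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("set", 1:3)))

# Toy gene-by-enrichment incidence matrix (stand-in for mem$geneIM).
gene_im <- matrix(
  c(1, 0,
    1, 0,
    0, 1,
    1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("testA", "testB")))

# Hypothetical helper: scale each matrix by its share of the weight,
# then place them side by side so one clustering sees both.
blend_for_clustering <- function(x_im, path_im, weight) {
  stopifnot(weight >= 0, weight <= 1,
            identical(rownames(x_im), rownames(path_im)))
  cbind(x_im * weight, path_im * (1 - weight))
}

# weight = 0.5 gives both matrices equal influence;
# weight = 0 zeroes out gene_im, weight = 1 zeroes out mem_im.
blended <- blend_for_clustering(gene_im, mem_im, weight = 0.5)
hc <- stats::hclust(stats::dist(blended, method = "euclidean"))
```

With weight = 0 the first two columns of the blended matrix are all zero, so only the gene-pathway incidence matrix contributes to the distances, which mirrors the behavior described for gene_im_weight=0.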

gene_annotations

character vector indicating which annotation(s) to display alongside the gene axis of the heatmap. The default is c("im", "direction"); "direction" is removed when mem$geneIMdirection is not available.

  • "im" displays the gene incidence matrix mem$geneIM using categorical colors defined by mem$colorV.

  • "direction" displays the gene directionality mem$geneIMdirection using colors defined by colorjam::col_div_xf(1.2).

  • When no values are given, the gene annotation is not displayed.

  • When two values are given, the annotations are displayed in the order they are provided.

annotation_suffix

character vector named by values permitted by gene_annotations, with optional suffix to add to the annotation labels. For example it may be helpful to add "hit" or "dir" to distinguish the enrichment labels.

name

character value passed to ComplexHeatmap::Heatmap(), used as a label above the heatmap color legend.

p_cutoff

numeric value of the enrichment P-value cutoff, above which P-values are not colored, and are therefore white. The enrichment P-values are displayed as an annotated heatmap at the top of the main heatmap. Any cell that has a color meets at least the minimum P-value threshold. This value by default is taken from input mem, using mem$p_cutoff, for consistency with the input multienrichment analysis.
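
A minimal base R sketch of how a cutoff like this can blank out non-significant cells. The toy matrix and the variable names are made up for illustration. The p_floor argument appears in the usage but is not described in this page, so treating it as a lower bound applied before the -log10 transform is an assumption, not a statement of what the function does.

```r
p_cutoff <- 0.05
p_floor <- 1e-10   # assumed meaning: smallest P-value used for coloring

# Toy pathway-by-enrichment P-value matrix (stand-in for mem$enrichIM).
enrich_p <- matrix(
  c(0.001, 0.2, 1e-15, 0.04),
  nrow = 2,
  dimnames = list(c("setA", "setB"), c("testA", "testB")))

# Clamp very small P-values so one extreme value does not dominate
# the color scale; pmax() keeps the matrix dimensions.
p_clamped <- pmax(enrich_p, p_floor)

# Convert to -log10 scale, then blank cells that miss the cutoff;
# NA cells would be drawn without color (white).
score <- -log10(p_clamped)
score[enrich_p > p_cutoff] <- NA
```

Here setB in testA (P = 0.2) becomes NA, while setA in testB is capped at a score of 10 because of the assumed floor.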

column_split, row_split

optional arguments passed to ComplexHeatmap::Heatmap() to split the heatmap by columns or rows, respectively.

  • when row_split is NULL and auto_split=TRUE, it will determine an appropriate number of clusters based upon the number of rows. To turn off row split, use row_split=0 or row_split=1, or use row_split=NULL together with auto_split=FALSE; likewise for column_split.

  • when row_split or column_split are supplied as a named vector, the names are aligned with the labels to be displayed in the heatmap, and the intersect() of the two is used. When data is clustered in this case, cluster_row_slices=FALSE and cluster_column_slices=FALSE are applied so that the dendrogram is broken into separate pieces.
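
The alignment of a named split vector can be sketched in base R as follows. The set names, group labels, and variable names are hypothetical, and the order in which the function itself applies intersect() is an assumption.

```r
# Sets that would be displayed in the heatmap (hypothetical names).
displayed_sets <- c("setA", "setB", "setC", "setD")

# User-supplied named split: names are sets, values are group labels.
# "setX" is not displayed, and "setC" has no group assigned.
column_split <- c(setB = "group1", setD = "group2",
                  setX = "group2", setA = "group1")

# Keep only names present in both, in display order.
keep <- intersect(displayed_sets, names(column_split))
aligned_split <- column_split[keep]
```

Only setA, setB, and setD survive the alignment, so a set missing from the named vector would drop out of the display under this reading.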

column_title

optional character string with title to display above the heatmap.

row_title

optional character string with title to display beside the heatmap. Note when row_split is defined, the row_title is applied to each heatmap section.

row_title_rot

numeric value indicating the rotation of row_title text, where 0 is not rotated, and 90 is rotated 90 degrees.

colorize_by_gene

logical indicating whether to color the main heatmap body using the colors from geneIM which represents each enrichment in which a given gene is involved. Colors are blended using colorjam::blend_colors(), using colors from mem$colorV, applied to mem$geneIM.

na_col

character string indicating the color to use for NA or missing values. Typically this argument is only used when colorize_by_gene=TRUE, where entries with no color are recognized as NA by ComplexHeatmap::Heatmap().

rotate_heatmap

logical indicating whether the entire heatmap should be rotated so that pathway names are displayed as rows, and genes as columns. Notes on how arguments are applied to rows and columns:

  • Column arguments applied to rows: column_split, column_title, cluster_columns, column_fontsize, column_cex are applied to rows since they refer to pathway data;

  • Row arguments applied to columns: row_split, row_title, cluster_rows, row_fontsize, row_cex are applied to columns since they refer to gene data;

  • Arguments applied directly to columns: column_method, column_title_rot are applied directly to heatmap columns since they refer to the output heatmap options.

  • Arguments applied directly to rows: row_method, row_title_rot are applied directly to heatmap rows since they refer to the output heatmap options.

colramp

character name of color, color gradient, or a vector of colors, anything compatible with input to jamba::getColorRamp().

seed

numeric value passed to set.seed() to allow reproducible results, typically with clustering operations.

verbose

logical indicating whether to print verbose output.

...

additional arguments are passed to ComplexHeatmap::Heatmap() for customization. However, if ... causes an error, the same ComplexHeatmap::Heatmap() function is called without ..., which is intended to allow overloading ... for different functions.

Value

Heatmap object defined in ComplexHeatmap::Heatmap(), with two additional attributes:

  • "caption" - a character string with important clustering settings.

  • "draw_caption" - a function that will draw the caption in the bottom-left corner of the heatmap, calling ComplexHeatmap::grid.textbox(). This function should be called with no parameters, for example:

    attr(hm, "draw_caption")()

In addition, the returned object can be interrogated with two helper functions that help define the row and column clusters, and the exact order of labels as they appear in the heatmap.

  1. jamba::heatmap_row_order() - returns a list of vectors of rownames in the order they appear in the heatmap, with list names defined by row split.

  2. jamba::heatmap_column_order() - returns a list of vectors of colnames in the order they appear in the heatmap, with list names defined by column split.
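
The pieces of the return value can be used together as in the sketch below. The wrapper name draw_gene_path_heatmap() is hypothetical, mem is assumed to be the list returned by multiEnrichMap(), and the package that provides mem_gene_path_heatmap() is assumed to be attached. Only the function definition runs here; calling it requires a real mem object.

```r
# Hypothetical convenience wrapper around the documented return value.
draw_gene_path_heatmap <- function(mem, ...) {
  hm <- mem_gene_path_heatmap(mem, ...)

  # Draw the heatmap first so the caption lands on the same page.
  ComplexHeatmap::draw(hm)

  # The caption function is called with no parameters.
  attr(hm, "draw_caption")()

  # Return the caption text and the label order as displayed.
  invisible(list(
    caption = attr(hm, "caption"),
    row_order = jamba::heatmap_row_order(hm),
    column_order = jamba::heatmap_column_order(hm)
  ))
}
```

The returned row_order and column_order lists are convenient for pulling out the genes or pathways in a particular cluster by its split name.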

Details

This function takes the mem list output from multiEnrichMap() and creates a gene-by-pathway incidence matrix heatmap, using ComplexHeatmap::Heatmap(). It uses three basic sources of data to annotate the heatmap:

  1. mem$memIM the gene-set incidence matrix

  2. mem$geneIM the gene incidence matrix by dataset

  3. mem$enrichIM the pathway enrichment P-value matrix by dataset

It will try to estimate a reasonable number of column and row splits in the dendrogram, based solely upon the number of columns and rows. These guesses can be overridden with arguments column_split and row_split, respectively.

When genes and pathways are filtered by min_gene_ct, min_set_ct, and min_set_ct_each, the order of operations is as follows:

  1. min_set_ct_each, min_set_ct - these filters are applied before filtering genes, in order to ensure all genes are present from the start.

  2. min_gene_ct - genes are filtered after pathway filtering, in order to remove pathways which were not deemed "significant" based upon the required number of genes. Only after those pathways are removed can the number of occurrences of each gene be judged appropriately.
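
The order of operations above can be sketched in base R with a toy incidence matrix. The data and thresholds are invented for illustration, and min_set_ct_each is left out because it needs per-enrichment gene counts that this toy example does not model.

```r
# Toy gene-by-set incidence matrix (stand-in for mem$memIM).
mem_im <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    1, 0, 0, 0,
    0, 0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), paste0("s", 1:4)))

min_set_ct <- 2
min_gene_ct <- 2

# Step 1: filter sets while all genes are still present.
set_ct <- colSums(mem_im > 0)
filtered <- mem_im[, set_ct >= min_set_ct, drop = FALSE]

# Step 2: count gene occurrences only across the sets that remain.
gene_ct <- rowSums(filtered > 0)
filtered <- filtered[gene_ct >= min_gene_ct, , drop = FALSE]
```

Gene g4 occurs in two sets of the original matrix, but both of those sets contain a single gene and are removed in step 1, so g4 has no remaining occurrences and is dropped in step 2. Filtering genes first would have kept it.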