Heatmap for SummarizedExperiment data

Usage

heatmap_se(
  se,
  sestats = NULL,
  hm_name = NULL,
  hm_title = NULL,
  rows = NULL,
  row_type = "rows",
  column_type = "samples",
  data_type = "expression",
  correlation = FALSE,
  assay_name = NULL,
  contrast_names = NULL,
  contrast_suffix = "",
  cutoff_name = NULL,
  alt_sestats = NULL,
  alt_assay_name = assay_name,
  alt_contrast_names = NULL,
  alt_contrast_suffix = "",
  alt_cutoff_name = NULL,
  isamples = colnames(se),
  normgroup_colname = NULL,
  centerby_colnames = NULL,
  controlSamples = NULL,
  control_label = "",
  controlFloor = NA,
  naControlAction = c("na", "row", "floor", "min"),
  naControlFloor = 0,
  top_colnames = NULL,
  top_annotation = NULL,
  top_annotation_name_gp = grid::gpar(),
  rowData_colnames = NULL,
  left_annotation = NULL,
  left_annotation_name_gp = grid::gpar(),
  left_annotation_name_rot = 90,
  right_annotation = NULL,
  simple_anno_size = grid::unit(8, "mm"),
  legend_title_gp = grid::gpar(fontsize = 10),
  legend_labels_gp = grid::gpar(fontsize = 10),
  legend_grid_cex = 1,
  row_names_gp = NULL,
  row_split = NULL,
  row_subcluster = NULL,
  row_title_rot = 0,
  sample_color_list = NULL,
  legend_at = NULL,
  legend_labels = NULL,
  subset_legend_colors = TRUE,
  row_cex = 0.8,
  column_cex = 1,
  row_anno_fontsize = 11,
  useMedian = FALSE,
  show_row_names = NULL,
  show_row_dend = length(rows) < 2000,
  mark_rows = NULL,
  mark_labels_gp = grid::gpar(),
  column_title = character(0),
  apply_hm_column_title = FALSE,
  hm_title_buffer = 0,
  show_heatmap_legend = TRUE,
  show_top_legend = TRUE,
  show_left_legend = TRUE,
  legend_border_color = "black",
  show_top_annotation_name = TRUE,
  show_left_annotation_name = TRUE,
  row_label_colname = NULL,
  cluster_columns = FALSE,
  cluster_column_slices = FALSE,
  cluster_rows = function(x, ...) {
     amap::hcluster(jamba::rmNA(naValue = 0, x), ...,
    method = "euclidean", link = "ward")
 },
  cluster_row_slices = FALSE,
  column_names_gp = NULL,
  column_split = NULL,
  column_split_sep = ",",
  color_max = 3,
  color_floor = 0,
  lens = 2,
  rename_contrasts = TRUE,
  rename_alt_contrasts = TRUE,
  use_raster = TRUE,
  verbose = FALSE,
  debug = FALSE,
  ...
)

Arguments

se

SummarizedExperiment by default, or one of the following:

SummarizedExperiment with accessor functions rowData(), colData(), and assays(). It will use values(rowRanges()) if no slot rowData exists.
SingleCellExperiment with accessor functions rowData(), colData(), and assays(). It will use values(rowRanges()) if no slot rowData exists.
Seurat object, which is coerced to SingleCellExperiment and handled accordingly
ExpressionSet or compatible object with accessor functions featureData(), phenoData(), and assayData().

sestats

one of the following types of data:

list output from se_contrast_stats(), which specifically contains hit_array as a 3-dimensional array of hits with dimensions "Cutoffs", "Contrasts", "Signal".
numeric matrix intended to represent an incidence matrix, where a value 0 indicates absence, and non-zero indicates presence. This format is useful for supplying any incidence matrix, such as gene-by-pathway (for example Github package "jmw86069/multienrichjam" provides mem$memIM with gene-by-pathway matrix), or gene-by-class (see Github package "jmw86069/pajam" for examples using ProteinAtlas protein classification, including membrane-bound, secreted, transcription factors, etc.), or any incidence matrix defined by Github "jmw86069/venndir" function list2im_value() or list2im() which converts input to a Venn diagram into an incidence matrix.
When sestats is supplied, data is converted to incidence matrix, then columns are matched with contrast_names. All rows with non-zero entry in those columns are included in the heatmap. When rows is also supplied, then the intersection of incidence matrix rows and rows is displayed in the heatmap.
Note that alt_sestats does not subset rows displayed in the heatmap.
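
The row selection described above can be sketched in base R. This is only an illustration of the stated rules, not the internal code of heatmap_se(); the matrix hit_im, its gene and contrast names, and the rows vector are hypothetical.

```r
# Hypothetical incidence matrix: rows are genes, columns are contrasts.
# 0 indicates no hit, any non-zero value indicates a hit.
hit_im <- matrix(
   c( 1,  0,
      0,  0,
     -1,  1,
      0, -1),
   ncol=2, byrow=TRUE,
   dimnames=list(
      c("geneA", "geneB", "geneC", "geneD"),
      c("B-A", "C-A")))

# keep only the columns matched by contrast_names
contrast_names <- "B-A"
use_im <- hit_im[, contrast_names, drop=FALSE]

# rows with at least one non-zero entry in those columns
hit_rows <- rownames(use_im)[rowSums(use_im != 0) > 0]
hit_rows
#> [1] "geneA" "geneC"

# when rows is also supplied, the intersection is displayed
rows <- c("geneC", "geneD")
display_rows <- intersect(rows, hit_rows)
display_rows
#> [1] "geneC"
```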

hm_name

character string, or NULL (default) which uses the data_type value. Note that the legend title uses the data_type, and is also used for hm_name when hm_name=NULL. The hm_name is most useful to customize because this string is used as the prefix for grid graphical components, for example seen with ComplexHeatmap::list_components(). When two heatmaps or a HeatmapList is drawn, the names can be used to define specific grid regions of each heatmap. If the heatmaps share the same hm_name then the regions will also have identical name and cannot be addressed distinctly.

hm_title

character string, or NULL (default) which generates a heatmap title using the dimensions, assay_name, data_type, and a string which describes the data centering. When provided as a character string, it is used as-is. (In future this value may accept variable names.)

rows

character vector of rownames(se) to define a specific set of rows to display. When sestats is supplied, then the intersection of rows with genes defined by sestats is displayed. Note that rows are required to be in rownames(se), all other rows are dropped.

row_type

character string used in the title of the heatmap which indicates how many rows are displayed. For example "1,234 genes detected above background" or "1,234 DEGs by limma-voom". When row_type="" or row_type=NULL this information is not included in the heatmap title.

column_type

character string used in the title of the heatmap which indicates how many columns are displayed. For example "12 samples" or "12 biological replicates". When column_type="" or column_type=NULL this information is not included in the heatmap title.

data_type

character string used as title of the heatmap color gradient legend, for example "expression" indicates the data contains gene expression measurements. Notes:

The prefix "centered" is automatically added whenever the data is also centered for the heatmap. Set centerby_colnames=FALSE to display data that is not centered.
The prefix "correlation of" is automatically added when correlation=TRUE, which displays the correlation of whatever data is included in the heatmap.

correlation

logical indicating whether to calculate sample correlation, and plot a sample-by-sample correlation heatmap. This option is included here since many of the same arguments are required for data centering, and sample annotations. Note that color_max is forced to a maximum value of 1.0, representing the maximum correlation value.

assay_name

character string indicating the name in assays(se) to use for data to be displayed in the heatmap.

When multiple assay_name values are supplied, the first assay_name that matches names(assays(se)) will be used in the heatmap. In this way, multiple assay_names can be supplied to define statistical hits in sestats, which calls hit_array_to_list() to combine hits across assay_name entries; but only the first assay_name found in se is used for the heatmap values.
When there is only one value for assayNames(se), then assay_name will default to this value, instead of acting like it couldn't possibly know what was intended. Haha.
Lastly, assay_name can be a numeric index, helpful in case assays(se) contains no names - not recommended but it can happen.

contrast_names

character vector of contrasts in sestats$hit_array to use for the heatmap. When contrast_names=NULL then all contrasts are displayed, which is the default.

contrast_suffix

character string with optional suffix to append to the end of each contrast name label for sestats hit incidence matrix beside the heatmap. This suffix may be useful when comparing two methods for the same set of contrast names, with sestats and alt_sestats.

cutoff_name

character or integer index used to define the specific statistical cutoffs to use from sestats$hit_array. This argument is passed to hit_array_to_list() as cutoff_names.

alt_sestats, alt_assay_name, alt_contrast_names, alt_contrast_suffix, alt_cutoff_name

arguments analogous to those described above for sestats which are used when alt_sestats is supplied.

isamples

character vector of colnames(se) that defines the subset of samples to display in the heatmap. Note that data centering uses all columns supplied in se, and after centering, the subset of columns defined in isamples is displayed in the heatmap. This distinction makes it possible to center data by some control group, then optionally not display the control group data.

normgroup_colname

character vector of colnames in colData(se) used during data centering. When supplied, samples are centered independently within each normgroup grouping. These values are equivalent to using centerby_colnames.

centerby_colnames

either:

character vector of colnames in colData(se) used during data centering. When supplied, samples are centered independently within each centerby grouping. It is typically used for things like cell lines, to center each cell line by a time point control, or untreated control.
NULL to perform centering across all columns in se.
FALSE to disable centering.

controlSamples

optional character vector of samples to use as the reference during data centering. Note that samples are still centered within each normgroup and centerby grouping, and within that grouping samples are centered to the controlSamples which are present in that grouping. Any center group for which no samples are defined in controlSamples will use all samples in that center group. Typically, controlSamples is used to define a specific group as the reference for centering, so changes are displayed relative to that group. Make sure to define control_label to include an appropriate label in the heatmap title.

control_label

character string used in heatmap title to describe the control used during data centering, relevant when controlSamples is also supplied. Recommended format: "versus Wildtype" or "vs. Wildtype". The heatmap title will include data centering and control_label in this format: "centered within {centerby_colnames}, {control_label}", for example "centered within Genotype/Time, versus Vehicle".

controlFloor, naControlAction, naControlFloor

passed to jamma::centerGeneData() to customize data centering.

controlFloor imposes an optional noise floor to control group mean/median values, so the summary value during centering is at least controlFloor. Useful for defining an effective noise floor for a platform technology.
naControlAction defines the action taken only when values for all control samples are NA.
naControlFloor is a numeric value used when naControlAction="floor", which causes the group reference value to use the value provided in naControlFloor.
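
As a rough base R sketch of the two floor options described above: the control matrix ctrl and its values are hypothetical, and the order of operations is an assumption made for illustration. jamma::centerGeneData() remains the authoritative reference for the actual behavior.

```r
# Hypothetical log2 values for three control samples (columns)
ctrl <- matrix(
   c(0.5, 1.0, 1.5,
     6.0, 7.0, 8.0,
      NA,  NA,  NA),
   ncol=3, byrow=TRUE,
   dimnames=list(c("row1", "row2", "row3"), c("c1", "c2", "c3")))

controlFloor <- 2
naControlFloor <- 0

# per-row control summary, ignoring NA values
ref <- rowMeans(ctrl, na.rm=TRUE)

# controlFloor: the summary value is at least controlFloor
ref <- pmax(ref, controlFloor)

# naControlAction="floor": rows whose control values are all NA
# use naControlFloor as the reference value
all_na <- rowSums(!is.na(ctrl)) == 0
ref[all_na] <- naControlFloor
ref
#> row1 row2 row3 
#>    2    7    0 
```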

top_colnames

one of the following types:

character vector of colnames to use from colData(se) as annotations to display in top_annotation above the heatmap.
NULL, will call choose_annotation_colnames() to detect reasonable colnames: columns with more than one unique value; columns with at least one duplicated value.
FALSE will hide the top_colnames, which also occurs when colData(se) is empty.

top_annotation

specific heatmap annotation as defined by ComplexHeatmap::HeatmapAnnotation(). When supplied, the top_colnames described above is not used.

top_annotation_name_gp

grid::gpar object to customize the annotation name displayed beside the top annotation.

rowData_colnames

character vector of colnames in rowData(se) to use for heatmap annotations displayed on the left side of the heatmap. Specific colors can be included in sample_color_list as a named list of color vectors or color functions. The names of this list must match colnames to be displayed, otherwise ComplexHeatmap::Heatmap() will define its own color function.

left_annotation

specific heatmap annotation as defined by ComplexHeatmap::rowAnnotation(). When supplied, the rowData_colnames and sestats row annotations are not displayed. In order to supply custom row annotations and not lose left_annotation defined above, supply the row annotations as right_annotation.

left_annotation_name_gp

grid::gpar object to customize the annotation name displayed beside the left annotation.

left_annotation_name_rot

numeric rotation of left annotation label, in degrees, where 0 indicates normal text, and 90 is rotated vertically.

right_annotation

specific heatmap annotation as defined by ComplexHeatmap::HeatmapAnnotation(). This element is created automatically when mark_rows is supplied.

simple_anno_size

grid::unit size used to define heatmap annotation sizes (height or width of each line) for any simple annotations.

legend_title_gp

grid::gpar to customize the legend title fonts, applied to each legend: top annotation, left annotation, main heatmap.

legend_labels_gp

grid::gpar to customize the legend label fonts, applied to each legend: top annotation, left annotation, main heatmap.

legend_grid_cex

numeric multiplied to adjust the relative size of each legend grid unit, applied to each relevant metric.

row_names_gp

gpar to define custom row name settings. When "fontsize" is not defined, the automatic font size calculation is added to the row_names_gp supplied.

row_split

is used to define heatmap split by row, ultimately passed to ComplexHeatmap::Heatmap() argument row_split. However, the input type can vary:

integer number of row splits based upon row clustering. If row_split is greater than the number of rows, it will be set to the number of rows.
character value or values in colnames of rowData(se) to split using row annotation in se.
data.frame whose rownames() must contain all rows to be displayed in the heatmap. This argument is passed directly to ComplexHeatmap::Heatmap() to apply the split appropriately.
character or factor vector named by rownames(se) with another custom row split, passed directly to ComplexHeatmap::Heatmap() argument row_split, with proper order for rows being displayed

row_subcluster

integer or character vector representing one or more elements returned by row_split to use as a drill-down sub-cluster heatmap. This argument is experimental, and is intended to make it easy to "drill down" into specific row clusters.

The process internally creates a full heatmap using all arguments as defined, then extracts the jamba::heatmap_row_order() which contains row split data in a list of rownames vectors. The list elements that match row_subcluster are extracted and used again for a subsequent heatmap, and are displayed in the same order in which they appear in the original full heatmap - which means cluster_rows=FALSE is defined at this point. However row_split is retained for this subset of rows, to indicate the original row split annotation.
Note that row_subcluster must match the names() returned by jamba::heatmap_row_order() for the full heatmap, or should include a numeric index for the list element or elements to use.
In principle this process would be run in two stages: First, view a heatmap with row_split=6, then re-run the same heatmap with row_subcluster=4 to see cluster number 4 from the full heatmap.

row_title_rot

numeric value indicating text rotation in degrees to use for row titles.

sample_color_list

named list of color vectors or color functions, where names correspond to colnames in either colData(se) or rowData(se), and which are passed to corresponding left or top annotation functions. When colors are not defined, ComplexHeatmap::Heatmap() will define colors using its own internal function.

legend_at, legend_labels

numeric and character, respectively, to define custom values for the heatmap color gradient legend.

When legend_at is supplied, it is used as provided.
When legend_labels is supplied, it is used only when its length equals length(legend_at), in which case it is used as provided.
When centerby_colnames=FALSE and the matrix data does not contain negative values, legend_at uses integers from 0 to color_max, to avoid presenting a color legend with unnecessary negative values. However, when color_max <= 1 it uses pretty(c(0, color_max)), removing extraneous values, then ensuring the maximum value is color_max. For example when color_max=0.85, the legend_at is likely to be c(0, 0.2, 0.4, 0.6, 0.8, 0.85).
When centerby_colnames is not FALSE, and/or data contains negative values, the legend_at is symmetric above and below zero. When color_max <= 1 the label is created using pretty(c(-color_max, color_max)), as described above, so color_max is used as the minimum and maximum value. When color_max > 1 the legend_at uses integer steps.
When color_max <= 1 the legend_labels are presented as-is with no transformation.
When color_max > 1 the legend_labels are transformed with exp2signed(x) which is the inverse of log2(1 + x). This inverse transform displays normal space values; in the case of centered data, the values represent normal space fold changes. For example the legend_at=c(-2, -1, 0, 1, 2) would result in legend_labels=c("-4", "-2", "1", "2", "4").
When correlation=TRUE the legend_labels by default use legend_at, following rules for color_max <= 1 above. Otherwise, legend_labels values are inverse transformed from log2(1 + x) in order to display normal space fold change values.
To override any of this behavior, supply both legend_at and corresponding legend_labels.
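
The two worked examples in the rules above can be reproduced with a short base R sketch. The helper log2_to_fold_label() is hypothetical and written only to reproduce the fold change labels quoted above; it is not exp2signed(), and the code is not taken from heatmap_se().

```r
# Case 1: color_max <= 1, data not centered, no negative values
color_max <- 0.85
legend_at <- pretty(c(0, color_max))
# drop values at or beyond color_max, then end with color_max itself
legend_at <- legend_at[legend_at < color_max]
legend_at <- c(legend_at, color_max)
legend_at
#> [1] 0.00 0.20 0.40 0.60 0.80 0.85

# Case 2: color_max > 1, centered log2 data, symmetric integer steps
color_max2 <- 2
legend_at2 <- seq(from=-color_max2, to=color_max2, by=1)

# hypothetical helper: signed fold change labels from log2 differences
log2_to_fold_label <- function(x) {
   fold <- sign(x) * 2^abs(x)
   # a log2 difference of zero is a fold change of 1
   fold[x == 0] <- 1
   as.character(fold)
}
legend_labels2 <- log2_to_fold_label(legend_at2)
legend_labels2
#> [1] "-4" "-2" "1"  "2"  "4" 
```

The resulting vectors could then be supplied together as legend_at and legend_labels to override the automatic legend.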

subset_legend_colors

logical indicating whether to subset colors shown in the color key defined by sample_color_list, which is useful when the heatmap only represents a subset of categorical color values.

When subset_legend_colors == TRUE, the color key will only include colors shown in the top_annotation.
When subset_legend_colors == FALSE all colors defined in sample_color_list will be included for each relevant column.

row_cex, column_cex

numeric values used to adjust the row and column name font size, relative to the automatic adjustment that is already done based upon the number of rows and columns being displayed.

row_anno_fontsize

numeric base font size for row annotation labels. This value is only used when left_annotation_name_gp is not supplied. Note these labels appear underneath row annotations, alongside column labels, and therefore they are also adjusted by multiplying by column_cex so these labels are adjusted together.

useMedian

logical passed to jamma::centerGeneData() during data centering.

show_row_names, show_row_dend

logical indicating whether to display row names, and row dendrogram, respectively. With more than 2,000 rows this step can become somewhat slow.

mark_rows

character vector of values in rownames(se) that should be labeled using ComplexHeatmap::anno_mark() in call-out style. Usually this argument is used when show_row_names=FALSE, hiding the row labels, but is not required. Values in mark_rows are intersected with rows displayed in the heatmap, therefore only matching entries will be labeled.

mark_labels_gp

grid::gpar to customize the font used by labels when mark_rows is supplied.

column_title

character optional title to include at the top of the heatmap. It can include a single value, or multiple values representing each column_split in the order they appear.

Note: This argument is ignored when apply_hm_column_title=TRUE.
When column_title=character(0) (default) or column_title="", the ComplexHeatmap::Heatmap() uses its usual default behavior, which is to assign column_title using column_split values when they are being used.

apply_hm_column_title

logical (default FALSE) whether to apply the heatmap title to column_title. This option makes it convenient to display the title atop the heatmap without additional effort, however it hides any other column_title created by using column_split. When using both column_split and apply_hm_column_title=TRUE it may be useful to call heatmap_column_group_labels().

hm_title_buffer

numeric number of whitespace lines to add to the heatmap title (attr(hm, "hm_title")) between the title and the heatmap below it. This whitespace can be useful when also calling heatmap_column_group_labels(), to provide enough space to draw the additional annotations.

show_heatmap_legend, show_left_legend, show_top_legend

logical indicating whether each legend should be displayed. Sometimes there are too many annotations, and the color legends can overwhelm the figure. Note that show_left_legend is applied in a specific order, with these rules:

show_left_legend is extended to at least length 2, then values are used in order for: sestats, rowData_colnames, in order, using whichever is defined.
If sestats is defined, the first value in show_left_legend is used for this annotation, then the remaining values are used for rowData_colnames. Setting the first show_left_legend value to FALSE will ensure the legend for sestats is not displayed.
If rowData_colnames is defined, then the remaining values in show_left_legend are recycled for all columns in rowData_colnames, and applied in order. In this way, individual columns can have the legend displayed or hidden.
If alt_sestats is defined, the legend is always hidden, in favor of showing only the legend for sestats without duplicating this legend.
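
One way to picture the recycling rules above, assuming sestats is defined, is the following base R sketch. The column names and variable names are hypothetical, and the code only mirrors the rules as written; it is not the internal implementation.

```r
show_left_legend <- c(FALSE, TRUE, FALSE)
rowData_colnames <- c("Class", "Biotype", "Chromosome", "Cluster")

# extend to at least length 2
show_left_legend <- rep(show_left_legend,
   length.out=max(2, length(show_left_legend)))

# the first value applies to the sestats hit annotation
show_sestats_legend <- show_left_legend[1]

# the remaining values are recycled across rowData_colnames
show_rowData_legend <- rep(show_left_legend[-1],
   length.out=length(rowData_colnames))
names(show_rowData_legend) <- rowData_colnames
show_rowData_legend
#>      Class    Biotype Chromosome    Cluster 
#>       TRUE      FALSE       TRUE      FALSE 
```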

legend_border_color

character color used as the border color for the various legend colors. Note this argument recognizes only the first color provided, and does not recycle different colors across the various legend borders.

show_top_annotation_name, show_left_annotation_name

logical indicating whether to display the annotation name beside the top and left annotations, respectively.

row_label_colname

character string used as a row label, where this value is a colname in rowData(se). It is useful when rownames are some identifier that is not user-friendly, and where another column in the data may provide a more helpful label, for example "SYMBOL" to display gene symbol instead of accession number.

cluster_columns, cluster_rows

logical indicating whether to cluster columns or rows, respectively, by hierarchical clustering; or function that produces hclust or dendrogram output, given a numeric matrix. Note that the cluster_rows default replaces NA values with zero to avoid errors with missing data, and uses amap::hcluster(), which is a one-step compiled process to perform distance calculation and hierarchical clustering.
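
As an illustration of the function form, here is a minimal sketch of a custom clustering function that uses only the stats package. The name cluster_rows_stats is hypothetical, and stats::hclust() with method="ward.D2" is a substitute chosen for the example, not a guaranteed equivalent of the amap::hcluster() default.

```r
# hypothetical alternative to the default, using only base R packages
cluster_rows_stats <- function(x, ...) {
   # replace NA with zero so distances can be computed
   x[is.na(x)] <- 0
   stats::hclust(
      stats::dist(x, method="euclidean"),
      method="ward.D2")
}

# quick check on a small matrix that contains a missing value
set.seed(1)
m <- matrix(rnorm(40), nrow=10,
   dimnames=list(paste0("row", 1:10), paste0("s", 1:4)))
m[2, 3] <- NA
hc <- cluster_rows_stats(m)
class(hc)
#> [1] "hclust"
```

It could then be supplied as heatmap_se(se, cluster_rows=cluster_rows_stats).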

column_names_gp

gpar to define custom column name settings. When "fontsize" is not defined, the automatic font size calculation is added to the column_names_gp supplied.

column_split

character or integer vector used to define heatmap column split.

column_split_sep

character string used as delimiter when column_split defines multiple split levels.

color_max

numeric value passed to colorjam::col_div_xf() which defines the upper limit of color gradient used in the heatmap.

color_floor

numeric value passed to colorjam::col_div_xf() argument floor which defines the minimum non-zero numeric value for a color to be applied. This option is available to prevent coloring values below the color_floor which can be useful in some circumstances.

lens

numeric value passed to colorjam::col_div_xf() to control the intensity of color gradient applied to the numeric range.

rename_contrasts, rename_alt_contrasts

logical (default TRUE) whether to rename long contrast names in sestats and alt_sestats using contrast2comp().

use_raster

logical passed to ComplexHeatmap::Heatmap() to determine whether heatmaps should be converted to raster images, which effectively turns each heatmap panel into a single graphical object. Recommend use_raster=TRUE and also installing R package magick which greatly enhances speed and quality of rasterized heatmap output. When magick is not available, it may be best to use use_raster=FALSE. When use_raster=FALSE each pixel square of a heatmap is its own graphical object. For heatmaps with very large dimensions, having each pixel as an object can make the heatmap extremely large in memory, and sometimes pixels can overlap others because the minimum pixel size of the output graphics device does not reflect the actual size of each pixel.

verbose

logical indicating whether to print verbose output.

debug

logical indicating debug mode, data is returned in a list:

hm object ComplexHeatmap::Heatmap
top_annotation object ComplexHeatmap::HeatmapAnnotation for columns
left_annotation object ComplexHeatmap::HeatmapAnnotation for rows
hm_title object character string with the heatmap title.

...

additional arguments are passed to supporting functions.

Details

Note: Still a work in progress. This function is the basis for the majority of heatmaps created for Omics data.

This function is a bold attempt to simplify the intricate task of creating an expression heatmap, using ComplexHeatmap::Heatmap(), given a SummarizedExperiment object.

It attempts to enable:

selection of assays(se) to use in the heatmap
use of rowData(se) or colData(se) to produce row and column annotations, respectively.
re-use of defined colors for annotations, see platjam::design2colors()
define and adjust heatmap color gradient and scale
data centering by row: versus all columns, or specific controls, optionally within independent centering groups
filtering rows to show only the statistical hits
display annotation of statistical hits beside the heatmap
split rows or columns using rowData(se) and colData(se), respectively
heatmap title to display key options used, for easy reference

Additional Features

data centering can be disabled with centerby_colnames=FALSE.
alternative hits can be displayed using alt_sestats. It does not subset heatmap rows, it inherits rows from sestats.
display a subset of columns after row centering, useful to hide the control group for certain figures.
option to display correlation heatmap, using the same data centering, then calculates Pearson correlation across sample columns.
labels and legend grids can be customized to exact sizes with grid::gpar() and grid::unit() definitions, for manuscript figures.
mark annotations option to label a subset of rows
row subclusters can be visualized using row_subcluster to drill down into specific subclusters from hierarchical clustering, k-means clustering, or any row_split.
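
The correlation option listed above can be approximated in base R as centering followed by Pearson correlation across sample columns. This is a sketch with simulated data and hypothetical names, assuming global centering by the row mean; it is not the function's own code.

```r
set.seed(123)
x <- matrix(rnorm(60), nrow=10,
   dimnames=list(paste0("row", 1:10), paste0("sample", 1:6)))

# center each row by its mean across all columns
x_ctr <- x - rowMeans(x)

# Pearson correlation across sample columns,
# which produces a sample-by-sample matrix
x_cor <- cor(x_ctr, method="pearson")
dim(x_cor)
#> [1] 6 6
```

Because correlation values cannot exceed 1, this is consistent with color_max being limited to 1.0 when correlation=TRUE.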

Data Centering

The intent is to display expression values from assays(se), centered across all columns, or with customization defined by centerby_colnames and normgroup_colname. The resulting centered data can be subsetted by argument isamples, which occurs after centering in order to decouple the centering step from the display of resulting data. To subset samples involved in centering itself, either subset the input se data, or supply controlSamples to define a subset of samples used as the baseline in centering. See jamma::centerGeneData() for more details.

Paired data, also called repeated measures data, can be visualized by including the pairing as centerby_colnames so that centering is calculated within each pairing subgroup. In this case if also using controlSamples to define a "time zero" or "baseline", then all baseline samples will have exactly zero, if there is only one replicate per pairing group at the baseline. In this case, it may be useful to create the full heatmap once to confirm the centering is performing as intended, then create a second heatmap using isamples to show only the non-baseline samples - thus removing the large chunk of values with 0.

Note: data centering can be disabled with centerby_colnames=FALSE.
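
The centering logic described in this section can be sketched in base R as follows. The data, sample names, and the center_group vector are hypothetical, and the loop is a simplified stand-in for jamma::centerGeneData(), using the mean of control samples within each centering group.

```r
# hypothetical data: two cell lines, each with control and treated
x <- matrix(
   c(5, 6, 8, 9,
     2, 2, 3, 5),
   nrow=2, byrow=TRUE,
   dimnames=list(c("gene1", "gene2"),
      c("L1_ctrl", "L1_trt", "L2_ctrl", "L2_trt")))
center_group <- c("L1", "L1", "L2", "L2")
controlSamples <- c("L1_ctrl", "L2_ctrl")

x_ctr <- x
for (g in unique(center_group)) {
   gcols <- colnames(x)[center_group == g]
   # use controls present in this group, otherwise all group samples
   ctrl_cols <- intersect(gcols, controlSamples)
   if (length(ctrl_cols) == 0) {
      ctrl_cols <- gcols
   }
   ref <- rowMeans(x[, ctrl_cols, drop=FALSE], na.rm=TRUE)
   x_ctr[, gcols] <- x[, gcols, drop=FALSE] - ref
}

# display only the non-control samples, after centering
isamples <- setdiff(colnames(x), controlSamples)
x_show <- x_ctr[, isamples, drop=FALSE]
x_show
#>       L1_trt L2_trt
#> gene1      1      1
#> gene2      0      2
```

With one control replicate per group, the control columns of x_ctr are exactly zero, which is why hiding them with isamples can make the figure easier to read.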

Heatmap Title

A heatmap title is returned as an attribute attr(hm, "hm_title"), which describes:

total rows displayed, with row_type indicating the measured entity (gene, probe, DEGs, etc.)
total columns displayed, with column_type indicating the sampled entity (samples, total replicates, etc.)
the assay_name for the data being displayed
relevant options for data centering, for example "global-centered" (by default) or "Centered within Cell Line, versus Wildtype"

To include the heatmap title:

ComplexHeatmap::draw(hm, column_title=attr(hm, "hm_title"))

Top and Left Annotations

The top heatmap annotations use colData(se) with user-supplied top_colnames or by auto-detecting those colnames that apply to multiple colnames(se). Colors can be supplied using argument sample_color_list, as described below.

An incidence matrix of statistical hits can be displayed on the left of the heatmap, using arguments sestats and alt_sestats. These arguments can accept either the output of se_contrast_stats(), or they can be a numeric matrix with values c(-1, 0, 1), indicating statistical hits down, no change, and up, respectively. The contrasts can optionally be subset with contrast_names, which corresponds to columns in the matrix if supplied in that format.

When sestats is supplied, it will subset all heatmap rows to include only rows with at least one non-zero value in the incidence matrix. If argument rows is also supplied, then the intersection of rows with the statistical hits is displayed, as described for arguments sestats and rows.

For comparison across other sestats results, argument alt_sestats is treated similarly to sestats except that the heatmap is not subset based upon these values. That means the heatmap will be subset to match hits defined by sestats but not alt_sestats. The alt_sestats incidence matrix is displayed to the far left of the sestats incidence matrix. For clarity, it can be useful to supply alt_contrast_suffix to add a suffix to each contrast label, for example if sestats represents limma hits, use contrast_suffix=" limma", and if alt_sestats represents limma-voom hits, use alt_contrast_suffix=" limmavoom".

Argument rowData_colnames can be supplied, which enables display of rowData(se) annotations in the left_annotation of the heatmap. Colors can be supplied using argument sample_color_list.

Argument sample_color_list is a list named by each annotation column to be displayed as top or left annotation. Each list element is either:

a character vector of R colors named by character value, or
a function defined by circlize::colorRamp2() to be applied for numeric column values. In this case the breaks used to define the color function are used to define the color legend.
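
A minimal sketch of both element types is shown below. The column names "group" and "score", the group labels, and the colors are hypothetical; the color function is only created when the circlize package is installed.

```r
# named color vector for a categorical column (hypothetical "group")
group_colors <- c(
   groupA="gold",
   groupB="firebrick",
   groupC="dodgerblue",
   groupD="darkorchid")

sample_color_list <- list(group=group_colors)

# color function for a numeric column (hypothetical "score");
# the breaks also define the values shown in the color legend
if (requireNamespace("circlize", quietly=TRUE)) {
   sample_color_list$score <- circlize::colorRamp2(
      breaks=c(0, 5, 10),
      colors=c("white", "orange", "red"))
}
names(sample_color_list)
```

The list names need to match the annotation colnames being displayed, otherwise the colors are not applied to those annotations.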

The function platjam::design2colors() can be used to create sample_color_list starting with a data.frame of annotations, and will soon be moved into this package.

A custom left_annotation can be supplied, but this method currently prevents the other annotations described above from being displayed. To display automated annotations with rowData_colnames and custom row annotations, supply custom annotations with right_annotation. Note that annotations must be supplied in exact row order, which is usually easiest when supplying rows with specific set of rows.

Compatible Input Formats

Data provided in se is expected to be SummarizedExperiment, however other Bioconductor data types are accepted that provide accessor functions: featureData(), phenoData(), and assayData(), including for example the "MethyLumiSet" class.

Note that matrix input is currently not supported, however it can be converted to SummarizedExperiment like this:

se <- SummarizedExperiment::SummarizedExperiment(
   assays=list(data=matrix),
   rowData=data.frame(Gene=rownames(matrix)),
   colData=data.frame(Sample=colnames(matrix)))

Examples

se <- make_se_test(nrow=1000, ngroups=4, nreps=8)

# optionally define factor levels to force the order of labels
SummarizedExperiment::rowData(se)$Class <- factor(
   sample(head(LETTERS, 5), size=nrow(se), replace=TRUE))

# basic heatmap
hm <- heatmap_se(se, rowData_colnames="Class")

# draw by printing hm, or call draw() to add useful options
ComplexHeatmap::draw(hm,
   column_title=attr(hm, "hm_title"),
   merge_legends=TRUE)


# define specific colors
sample_color_list <- list(
   group=colorjam::group2colors(
      unique(SummarizedExperiment::colData(se)$group)),
   Class=colorjam::group2colors(
      unique(SummarizedExperiment::rowData(se)$Class)))

heatmap_se(se,
   rowData_colnames="Class",
   sample_color_list=sample_color_list)


# split rows by "Class"
heatmap_se(se,
   rowData_colnames="Class",
   row_split="Class",
   sample_color_list=sample_color_list)


# let's have some fun now
hm2 <- heatmap_se(se,
   column_split=c("group"),
   column_title_rot=90,
   row_split=c("Class"),
   rowData_colnames=c("Class"),
   cluster_row_slices=FALSE,
   sample_color_list=sample_color_list)
hm2drawn <- ComplexHeatmap::draw(hm2, merge_legends=TRUE)


# as an example, extract the row order
# technically you should use hm2drawn, but usually hm2 is enough
hro <- jamba::heatmap_row_order(hm2drawn);
jamba::sdim(hro)
#>   rows     class
#> A  196 character
#> B  210 character
#> C  199 character
#> D  215 character
#> E  180 character
lapply(hro, head, 7)
#> $A
#>   row_0620   row_0676   row_0417   row_0582   row_0858   row_0730   row_0847 
#> "row_0620" "row_0676" "row_0417" "row_0582" "row_0858" "row_0730" "row_0847" 
#> 
#> $B
#>   row_0546   row_0935   row_0996   row_0091   row_0172   row_0636   row_0225 
#> "row_0546" "row_0935" "row_0996" "row_0091" "row_0172" "row_0636" "row_0225" 
#> 
#> $C
#>   row_0966   row_0152   row_0324   row_0863   row_0535   row_0342   row_0450 
#> "row_0966" "row_0152" "row_0324" "row_0863" "row_0535" "row_0342" "row_0450" 
#> 
#> $D
#>   row_0959   row_0934   row_0162   row_0336   row_0911   row_0274   row_0603 
#> "row_0959" "row_0934" "row_0162" "row_0336" "row_0911" "row_0274" "row_0603" 
#> 
#> $E
#>   row_0331   row_0657   row_0930   row_0200   row_0378   row_0497   row_0133 
#> "row_0331" "row_0657" "row_0930" "row_0200" "row_0378" "row_0497" "row_0133" 
#> 
# (the names will differ from values when `row_labels` are customized)

# center by control samples, here "groupA"
# - controlSamples
# - control_label
hm2 <- heatmap_se(se,
   controlSamples=rownames(subset(
      SummarizedExperiment::colData(se), group %in% "groupA")),
   control_label="vs groupA",
   column_split=c("group"),
   column_title_rot=90,
   row_split=c("Class"),
   rowData_colnames=c("Class"),
   cluster_row_slices=FALSE,
   sample_color_list=sample_color_list)
hm2drawn <- ComplexHeatmap::draw(hm2,
   column_title=attr(hm2, "hm_title"),
   merge_legends=TRUE)


# add "callout" labels for a subset of rows
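# five rows from the first row slice, three from the second,
# which avoids sampling the same row twice
mark_rows <- c(sample(jamba::heatmap_row_order(hm2drawn)[[1]], size=5),
   sample(jamba::heatmap_row_order(hm2drawn)[[2]], size=3));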

# turn off ComplexHeatmap warning when using RStudio
ComplexHeatmap::ht_opt(message=FALSE)

hm3 <- heatmap_se(se,
   mark_rows=mark_rows,
   controlSamples=rownames(
      subset(SummarizedExperiment::colData(se), group %in% "groupA")),
   control_label="vs groupA",
   column_split=c("group"),
   column_title_rot=90,
   row_split=c("Class"),
   rowData_colnames=c("Class"),
   cluster_row_slices=FALSE,
   sample_color_list=sample_color_list)
ComplexHeatmap::draw(hm3,
   column_title=attr(hm3, "hm_title"),
   merge_legends=TRUE)


# sestats can accept list, incidence matrix, hit_array, or sestats
# this example defines a random set of hits
sestats_list <- list(
   contrast1=setNames(sample(c(1, -1), replace=TRUE, size=50),
      sample(rownames(se), size=50)),
   contrast2=setNames(sample(c(1, -1), replace=TRUE, size=50),
      sample(rownames(se), size=50)))
hm4 <- heatmap_se(se,
   controlSamples=rownames(
      subset(SummarizedExperiment::colData(se), group %in% "groupA")),
   control_label="vs groupA",
   sestats=sestats_list,
   column_split=c("group"),
   row_split=c("Class"),
   rowData_colnames=c("Class"),
   cluster_row_slices=FALSE,
   sample_color_list=sample_color_list)
ComplexHeatmap::draw(hm4,
   column_title=attr(hm4, "hm_title"),
   merge_legends=TRUE)


# it doesn't take much effort to run stats really quick
sedesign <- groups_to_sedesign(SummarizedExperiment::colData(se)[, "group", drop=FALSE])
contrast_names(sedesign) <- jamba::vigrep("-groupA", contrast_names(sedesign))
sestats <- se_contrast_stats(se=se,
   fold_cutoff=4,
   sedesign=sedesign, assay_name="counts")
hm4s <- heatmap_se(se,
   controlSamples=rownames(
      subset(SummarizedExperiment::colData(se), group %in% "groupA")),
   control_label="vs groupA",
   sestats=sestats,
   column_split=c("group"),
   row_split=6,
   rowData_colnames=c("Class"),
   cluster_row_slices=FALSE,
   sample_color_list=sample_color_list)
ComplexHeatmap::draw(hm4s,
   column_title=attr(hm4s, "hm_title"),
   merge_legends=TRUE)


# for fun, "drill down" into cluster 4
hm4s_4 <- heatmap_se(se,
   controlSamples=rownames(
      subset(SummarizedExperiment::colData(se), group %in% "groupA")),
   control_label="vs groupA",
   sestats=sestats,
   column_split=c("group"),
   row_split=6,
   row_subcluster=4,
   rowData_colnames=c("Class"),
   cluster_row_slices=FALSE,
   sample_color_list=sample_color_list)
#> Warning: The heatmap has not been initialized. You might have different results
#> if you repeatedly execute this function, e.g. when row_km/column_km was
#> set. It is more suggested to do as `ht = draw(ht); row_order(ht)`.
ComplexHeatmap::draw(hm4s_4,
   column_title=attr(hm4s_4, "hm_title"),
   merge_legends=TRUE)



# sestats can be provided as an incidence matrix
if (jamba::check_pkg_installed("venndir")) {
# convert sestats to list
sestats_hitlist <- hit_array_to_list(sestats)
# convert sestats hitlist to incidence matrix
# - for fun, use only the first two contrasts
sestats_hitim <- venndir::list2im_value(sestats_hitlist[1:2])
print(head(sestats_hitim));

# convert sestats_list to signed incidence matrix
sestats_im <- venndir::list2im_value(sestats_list)
print(head(sestats_im, 10));
# if the list has items (no direction) use venndir::list2im_opt()

hm5 <- heatmap_se(se,
   controlSamples=rownames(
      subset(SummarizedExperiment::colData(se), group %in% "groupA")),
   control_label="vs groupA",
   sestats=sestats_hitim,
   column_split=c("group"),
   rowData_colnames=c("Class"),
   cluster_row_slices=FALSE,
   sample_color_list=sample_color_list)
ComplexHeatmap::draw(hm5,
   column_title=attr(hm5, "hm_title"),
   merge_legends=TRUE)
}
#>          groupB-groupA groupC-groupA
#> row_0022            -1            -1
#> row_0030            -1             0
#> row_0066             1             0
#> row_0075             1             0
#> row_0080            -1             0
#> row_0087             1             0
#>          contrast1 contrast2
#> row_0154        -1         0
#> row_0149         1         0
#> row_0267        -1         0
#> row_0839        -1         0
#> row_0753         1         0
#> row_0818         1         0
#> row_0032         1         0
#> row_0545        -1         0
#> row_0052        -1         0
#> row_0851        -1         0



# customize column label fonts using column_names_gp
column_bold <- ifelse(
   SummarizedExperiment::colData(se)$group %in% "groupA",
   2, 1);
hm6 <- heatmap_se(se,
   controlSamples=rownames(
      subset(SummarizedExperiment::colData(se), group %in% "groupA")),
   control_label="vs groupA",
   column_names_gp=grid::gpar(col=sample_color_list$group[
      as.character(SummarizedExperiment::colData(se)$group)],
      font=column_bold),
   column_split=c("group"),
   row_split=c("Class"),
   rowData_colnames=c("Class"),
   cluster_row_slices=FALSE,
   sample_color_list=sample_color_list)
ComplexHeatmap::draw(hm6,
   column_title=attr(hm6, "hm_title"),
   merge_legends=TRUE)


# with correlation=TRUE, any heatmap becomes a sample correlation heatmap
hm6corr <- heatmap_se(se,
   correlation=TRUE,
   apply_hm_column_title=TRUE,
   controlSamples=rownames(
      subset(SummarizedExperiment::colData(se), group %in% "groupA")),
   control_label="vs groupA",
   column_names_gp=grid::gpar(col=sample_color_list$group[
      as.character(SummarizedExperiment::colData(se)$group)],
      font=rep(c(1, 2, 1), c(3, 5, 24))),
   column_split=c("group"),
   sample_color_list=sample_color_list)
ComplexHeatmap::draw(hm6corr,
   merge_legends=TRUE)


## Final heatmap:
# 1. Applies heatmap title automatically.
# 2. Hides the top_colnames
# 3. Adds fancy grouped labels above the heatmap.
#
# apply_hm_column_title=TRUE
#    convenient way to define a title,
#    but it does not also display column_split labels
#
# hm_title_buffer=3
#    convenient way to insert some whitespace lines
#
# heatmap_column_group_labels()
#    adds to a drawn heatmap - it must already be drawn
#
SummarizedExperiment::colData(se)$Genotype <- rep(c("WT", "KO"), each=16);
SummarizedExperiment::colData(se)$Treatment <- rep(c("Control", "Dex"), each=8, times=2);
hm7 <- heatmap_se(se,
   apply_hm_column_title=TRUE,
   hm_title_buffer=3,
   controlSamples=rownames(
      subset(SummarizedExperiment::colData(se), group %in% "groupA")),
   control_label="vs groupA",
   sestats=sestats_list,
   top_colnames=FALSE,
   column_split=c("group"),
   row_split=c("Class"),
   rowData_colnames=c("Class"),
   cluster_row_slices=FALSE,
   sample_color_list=sample_color_list)
hm7_drawn <- ComplexHeatmap::draw(hm7,
   merge_legends=TRUE)

# now add fancy labels
heatmap_column_group_labels(
   hm_group_list=c("Treatment", "Genotype"),
   se=se,
   hm_drawn=hm7_drawn)

# Note: heatmap_column_group_labels() does not work consistently inside
# the RStudio plot pane. In that case call dev.new(), re-run the draw()
# step above to create hm7_drawn, then repeat heatmap_column_group_labels().
#
# adjust the height of labels with argument y_offset_lines
# with positive values (upward), or negative values (downward).