Find recommended overlap threshold for EnrichMap, experimental
Source: R/jamenrich-utils.R
Usage
mem_find_overlap(
  mem,
  overlap_range = c(0.1, 0.99),
  max_cutoff = 0.4,
  adjust = -0.01,
  debug = FALSE,
  ...
)
Arguments
- mem
list output from multiEnrichMap()
- overlap_range
numeric range of Jaccard overlap values, default c(0.1, 0.99), using step 0.01.
- max_cutoff
numeric value between 0 and 1, to define the maximum fraction of nodes in the largest connected component, compared to the total number of non-singlet nodes.
- adjust
numeric used to adjust the final overlap, default -0.01 uses the overlap one step before the max O score.
- debug
logical indicating whether to return full debug data, which is used internally to determine the best overlap cutoff to use.
- ...
additional arguments are passed to mem2emap().
Details
This function implements a straightforward approach to determine a reasonable Jaccard overlap threshold for Enrichment Map data. It is experimental, and is still very much open to improvement after more experience using it on varied datasets.
The premise is that two pathways that have Jaccard overlap above a threshold are connected by a network "edge".
With an extremely low threshold, most pathways would be connected, even if they have only one gene in common.
With an extremely high threshold, pathways would only be connected if nearly all genes were in common.
A moderate threshold is intended to balance the two extremes.
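The premise above can be illustrated with a minimal sketch in base R. The gene sets, the helper names jaccard() and count_edges(), and the use of ">=" for the comparison are assumptions made for illustration; they are not taken from mem2emap() or multiEnrichMap().

```r
# Hypothetical toy gene sets; names and members are illustrative only.
gene_sets <- list(
  pathA = c("g1", "g2", "g3", "g4"),
  pathB = c("g1", "g2", "g3", "g5"),
  pathC = c("g4", "g6"),
  pathD = c("g7", "g8")
)

# Jaccard overlap: size of the intersection divided by size of the union.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Pairwise Jaccard matrix across all pathways.
n <- length(gene_sets)
jac <- matrix(0, n, n,
  dimnames = list(names(gene_sets), names(gene_sets)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    jac[i, j] <- jaccard(gene_sets[[i]], gene_sets[[j]])
  }
}

# Count edges (unordered pathway pairs) at or above a threshold.
# Assumption: ">=" is used here; the package may compare differently.
count_edges <- function(jac, threshold) {
  sum(jac[upper.tri(jac)] >= threshold)
}

count_edges(jac, 0.05)  # low threshold: a single shared gene is enough
count_edges(jac, 0.5)   # moderate threshold
count_edges(jac, 0.9)   # high threshold: nearly identical gene sets only
```

In this toy data, pathA and pathC share one gene out of five, so they are connected only at a low threshold, while pathA and pathB share three of five genes and remain connected at a moderate threshold.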
The aesthetically and biologically interesting threshold appears to depend upon the type and number of pathways returned from enrichment analysis. For example, immunology pathways may favor a different threshold than metabolic pathways. (Purely hypothetical.)
As a result, this function is intended to find a middle ground based upon the pathway data used for analysis at the time, where some but not all pathways are connected.
The method finds the overlap threshold at which the largest connected
component is no more than max_cutoff fraction of the whole
network. This fraction is defined by the number of nodes in the
largest connected component, divided by the total number of
non-singlet nodes.
We found that max_cutoff=0.4, the point at which the
largest connected component contains no more than 40% of all
non-singlet nodes, seems to be a reasonably good threshold.
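The scan described above can be sketched in base R as follows. This is not the implementation of mem_find_overlap(): the helper names component_labels(), largest_fraction(), and scan_overlap() are hypothetical, the toy Jaccard matrix is invented, the sketch takes the lowest threshold that satisfies max_cutoff, and it ignores the adjust argument and the internal score used by the function.

```r
# Label connected components of an undirected graph, given a logical
# adjacency matrix, using a simple breadth-first search.
component_labels <- function(adj) {
  n <- nrow(adj)
  labels <- integer(n)
  current <- 0L
  for (start in seq_len(n)) {
    if (labels[start] != 0L) next
    current <- current + 1L
    labels[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      nbrs <- which(adj[node, ] & labels == 0L)
      labels[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  labels
}

# Fraction of non-singlet nodes found in the largest connected component.
# Assumption: returns 0 when no edges remain at this threshold.
largest_fraction <- function(jac, threshold) {
  adj <- jac >= threshold
  diag(adj) <- FALSE
  sizes <- as.integer(table(component_labels(adj)))
  non_singlet <- sum(sizes[sizes > 1])
  if (non_singlet == 0) return(0)
  max(sizes) / non_singlet
}

# Scan thresholds in steps of 0.01 and return the lowest threshold
# whose largest-component fraction is no more than max_cutoff.
scan_overlap <- function(jac, overlap_range = c(0.1, 0.99),
                         max_cutoff = 0.4) {
  thresholds <- round(
    seq(overlap_range[1], overlap_range[2], by = 0.01), 2)
  fractions <- vapply(thresholds,
    function(th) largest_fraction(jac, th), numeric(1))
  ok <- which(fractions <= max_cutoff)
  if (length(ok) == 0) return(NA_real_)
  thresholds[ok[1]]
}

# Hypothetical Jaccard matrix: three tight pairs of pathways (0.505)
# joined to each other by weaker links (0.205).
jac_toy <- matrix(0, 6, 6)
jac_toy[1, 2] <- jac_toy[3, 4] <- jac_toy[5, 6] <- 0.505
jac_toy[2, 3] <- jac_toy[4, 5] <- 0.205
jac_toy <- jac_toy + t(jac_toy)

largest_fraction(jac_toy, 0.15)  # weak links kept: one component
largest_fraction(jac_toy, 0.30)  # weak links dropped: three pairs
scan_overlap(jac_toy)
```

With the weak links present, all six nodes form one component and the fraction is 1. Once the threshold exceeds the weak links, the network splits into three pairs, each holding one third of the non-singlet nodes, which satisfies max_cutoff=0.4.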
See also
Other jam utility functions:
ashape(),
avg_angles(),
avg_colors_by_list(),
cell_fun_bivariate(),
collapse_mem_clusters(),
colorRamp2D(),
curateIPAcolnames(),
deconcat_df2(),
display_colorRamp2D(),
enrichList2geneHitList(),
find_colname(),
find_enrich_colnames(),
get_hull_data(),
get_igraph_layout(),
gsubs_remove(),
handle_igraph_param_list(),
isColorBlank(),
make_legend_bivariate(),
make_point_hull(),
order_colors(),
rank_mem_clusters(),
rotate_coordinates(),
summarize_node_spacing(),
with_ht_opts(),
xyAngle()