R/jamenrich-topenrich.R
topEnrichBySource.Rd
Subset enrichList for top enrichment results by source
topEnrichBySource(
enrichDF,
n = 15,
min_count = 1,
p_cutoff = 1,
sourceColnames = c("gs_cat", "gs_subcat"),
sortColname = c("P-value", "pvalue", "qvalue", "padjust", "-GeneRatio", "-Count",
"-geneHits"),
countColname = c("gene_count", "count", "geneHits"),
pvalueColname = c("P.Value", "pvalue", "FDR", "adj.P.Val", "qvalue"),
directionColname = c("activation.z.{0,1}score", "z.{0,1}score"),
direction_cutoff = 1,
newColname = "EnrichGroup",
curateFrom = NULL,
curateTo = NULL,
sourceSubset = NULL,
sourceSep = "_",
subsetSets = NULL,
descriptionColname = c("Description", "Name", "Pathway"),
nameColname = c("ID", "Name"),
descriptionGrep = NULL,
nameGrep = NULL,
verbose = FALSE,
...
)
topEnrichListBySource(
enrichList,
n = 15,
min_count = 1,
p_cutoff = 1,
sourceColnames = c("gs_cat", "gs_subcat"),
sortColname = c("P-value", "pvalue", "qvalue", "padjust", "-GeneRatio", "-Count",
"-geneHits"),
countColname = c("gene_count", "count", "geneHits"),
pvalueColname = c("P.Value", "pvalue", "FDR", "adj.P.Val", "qvalue"),
directionColname = c("activation.z.{0,1}score", "z.{0,1}score"),
direction_cutoff = 1,
newColname = "EnrichGroup",
curateFrom = NULL,
curateTo = NULL,
sourceSubset = NULL,
sourceSep = "_",
subsetSets = NULL,
descriptionColname = c("Description", "Name", "Pathway"),
nameColname = c("ID", "Name"),
descriptionGrep = NULL,
nameGrep = NULL,
verbose = FALSE,
...
)
enrichDF: data.frame containing enrichment results.
n: integer maximum number of pathways to retain within each source, after applying the min_count and p_cutoff thresholds, if relevant.
min_count: integer minimum number of genes involved in an enrichment result for it to be retained, based upon values in countColname.
p_cutoff: numeric enrichment P-value threshold. Pathways with enrichment P-value at or below this threshold are retained, based upon values in pvalueColname.
sourceColnames: character vector of colnames in enrichDF to consider as the "Source". Multiple columns are combined using the delimiter sourceSep. When sourceColnames is NULL or contains no colnames(enrichDF), all data is considered a single source, "All".
sortColname: character vector indicating the colnames used to sort data prior to selecting the top n results by source. This argument is passed to jamba::mixedSortDF(x, byCols=sortColname). Columns can be sorted in reverse order by using the prefix "-", as described in jamba::mixedSortDF().
countColname: character vector of possible colnames in enrichDF that should contain the integer number of genes involved in enrichment. This vector is passed to find_colname() to find an appropriate matching colname in enrichDF.
pvalueColname: character vector of possible colnames in enrichDF that should contain the enrichment P-value used for filtering by p_cutoff.
newColname: new column name to use when sourceColnames matches multiple colnames in enrichDF. Values for each row are combined using jamba::pasteByRow().
curateFrom, curateTo: character vectors of pattern and replacement values, passed to gsubs() to allow some editing of source values. The default values convert MSigDB canonical pathways from the prefix "CP:" to "CP", which has the effect of combining all canonical pathways before selecting the top n results.
sourceSubset: character vector with a subset of sources to retain. If there are multiple colnames in sourceColnames, column values are combined using jamba::pasteByRow() and the delimiter sourceSep prior to filtering.
sourceSep: character string used as a delimiter when sourceColnames contains multiple colnames.
descriptionColname, nameColname: character vectors indicating the colnames to consider as description and name, resolved using find_colname(). These arguments are used only when descriptionGrep or nameGrep is supplied.
descriptionGrep, nameGrep: character vector of patterns used to filter pathways to those matching one or more patterns. These arguments help extract a specific subset of pathways of interest using keywords. The descriptionGrep argument searches only descriptionColname; the nameGrep argument searches only nameColname.
verbose: logical indicating whether to print verbose output.
...: additional arguments are ignored.
enrichList: list of enrichDF entries, each passed to topEnrichBySource().
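The column-name arguments above accept vectors of candidate colnames, and sortColname accepts a "-" prefix for descending order. The base R sketch below illustrates both ideas under stated assumptions. first_matching_colname() and sort_by_cols() are hypothetical helpers, not package functions. The matching rule (exact match first, then case-insensitive match) is an assumption and may differ from find_colname(). Base order() stands in for the mixed alphanumeric sorting done by jamba::mixedSortDF().

```r
# Hypothetical helper: return the first candidate colname present in df.
# Assumption: exact match first, then case-insensitive match; the
# package's find_colname() may apply different rules.
first_matching_colname <- function(candidates, df) {
  cn <- colnames(df)
  hit <- candidates[candidates %in% cn]
  if (length(hit) > 0) {
    return(hit[1])
  }
  idx <- match(tolower(candidates), tolower(cn))
  idx <- idx[!is.na(idx)]
  if (length(idx) > 0) {
    return(cn[idx[1]])
  }
  NULL
}

# Hypothetical helper: sort df by every entry of sort_cols that names an
# existing column; a leading "-" requests descending order. Uses base
# order(), not the mixed sort of jamba::mixedSortDF().
sort_by_cols <- function(df, sort_cols) {
  desc <- startsWith(sort_cols, "-")
  cols <- sub("^-", "", sort_cols)
  keep <- cols %in% colnames(df)
  if (!any(keep)) {
    return(df)
  }
  # Convert each key to a numeric sort key, negated for descending order
  keys <- lapply(which(keep), function(i) {
    x <- df[[cols[i]]]
    if (desc[i]) -xtfrm(x) else xtfrm(x)
  })
  df[do.call(order, unname(keys)), , drop = FALSE]
}
```

In this sketch, candidates that match no column are silently skipped, which mirrors how the default vectors list several alternative colnames.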
Returns a data.frame subset to at most n rows per source, after applying the optional min_count and p_cutoff filters.
This function takes one enrichResult object, or a data.frame of enrichment results, and determines the top n pathways sorted by P-value within each pathway source. It can optionally require at least min_count genes in each pathway, and a maximum enrichment P-value of p_cutoff, prior to taking the top n entries. The default values (min_count=1, p_cutoff=1) effectively apply no filtering.
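A minimal base R sketch of the filter-then-top-n step for a single source follows. The helper top_enrich_one_source() and its default column names "pvalue" and "Count" are hypothetical. The real function resolves columns through countColname and pvalueColname and sorts using sortColname.

```r
# Hypothetical helper (not the package implementation): keep rows with at
# least min_count genes and P-value <= p_cutoff, sort by ascending P-value,
# and return at most n rows. Column names are assumptions.
top_enrich_one_source <- function(df, n = 15, min_count = 1, p_cutoff = 1,
                                  pvalue_col = "pvalue", count_col = "Count") {
  keep <- df[[count_col]] >= min_count & df[[pvalue_col]] <= p_cutoff
  # which() drops NA comparisons instead of producing NA rows
  df <- df[which(keep), , drop = FALSE]
  df <- df[order(df[[pvalue_col]]), , drop = FALSE]
  head(df, n)
}
```

With min_count = 1 and p_cutoff = 1, every row with at least one gene passes, so only the top n truncation has any effect.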
When the enrichment data represents pathways from multiple sources, filtering and sorting are applied to each source independently. The intent is to retain the top entries from each source, so that each source is represented consistently even when one source contains many more pathways, and especially when the range of enrichment P-values differs greatly between sources. For example, a database of small canonical pathways would generally yield less statistically significant P-values than a database of dysregulated genes from gene expression experiments, where each set contains a large number of genes.
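The per-source logic can be sketched in base R with split() and lapply(). In the hypothetical helper top_enrich_by_source(), each source is sorted by P-value and truncated to n rows on its own, so a source with weaker P-values still contributes its top entries. The single "Source" column and the "pvalue" column are assumptions for illustration.

```r
# Hypothetical helper: take the top n rows by ascending P-value within
# each source independently. Column names are assumptions.
top_enrich_by_source <- function(df, n = 15, source_col = "Source",
                                 pvalue_col = "pvalue") {
  # No usable source column: treat every row as one source, "All"
  if (!source_col %in% colnames(df)) {
    df[[source_col]] <- rep("All", nrow(df))
  }
  # Sort and truncate each source separately, then recombine
  pieces <- split(df, df[[source_col]])
  top <- lapply(pieces, function(x) {
    head(x[order(x[[pvalue_col]]), , drop = FALSE], n)
  })
  out <- do.call(rbind, top)
  rownames(out) <- NULL
  out
}
```

In the usage below, source "B" has far weaker P-values than source "A", yet both keep their top two rows.

```r
df <- data.frame(
  ID = c("a1", "a2", "a3", "b1", "b2"),
  Source = c("A", "A", "A", "B", "B"),
  pvalue = c(1e-10, 1e-12, 1e-8, 0.02, 0.01))
top_enrich_by_source(df, n = 2)$ID
```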
This function can optionally apply basic curation of pathway source names, and can combine multiple source columns. This feature is intended for sources like MSigDB (see http://software.broadinstitute.org/gsea/msigdb/index.jsp), which contains columns "Source" and "Category", and where canonical pathways are represented either with "CP" or with a prefix "CP:". The default parameters recognize this case and curate every "CP:.*" prefix down to just "CP", so that all canonical pathways are considered the same source. MSigDB also contains numerous other sources, each of which is independently filtered and sorted to the top n entries.
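Source combination and curation can be approximated in base R as shown below. curate_sources() is a hypothetical helper. Here, paste() stands in for jamba::pasteByRow() and a loop over gsub() stands in for gsubs(). The pattern "CP:.*" with replacement "CP" is an assumed pair that reproduces the curation described above. Blank or NA values receive no special handling in this sketch.

```r
# Hypothetical helper: combine source columns row-wise with sep, then
# apply each pattern/replacement pair in order using gsub().
# The default curate_from/curate_to pair is an assumption.
curate_sources <- function(df, source_cols = c("gs_cat", "gs_subcat"),
                           sep = "_", curate_from = "CP:.*",
                           curate_to = "CP") {
  src <- do.call(paste, c(unname(as.list(df[source_cols])), sep = sep))
  for (i in seq_along(curate_from)) {
    src <- gsub(curate_from[i], curate_to[i], src)
  }
  src
}
```

After curation, "C2_CP:KEGG" and "C2_CP:REACTOME" both become "C2_CP", so they would share one top n allotment. Other subcategories keep their own source.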
Finally, this function can subset enrichment results by keyword, using descriptionGrep or nameGrep.
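Keyword subsetting can be sketched as keeping rows whose description matches any one of the supplied patterns. grep_subset() is hypothetical. Case-insensitive matching is an assumption, not documented behavior.

```r
# Hypothetical helper: keep rows where colname matches at least one
# pattern. Assumption: matching ignores case. NULL or empty patterns
# return df unchanged.
grep_subset <- function(df, patterns, colname) {
  if (length(patterns) == 0) {
    return(df)
  }
  hits <- Reduce(`|`, lapply(patterns, function(p) {
    grepl(p, df[[colname]], ignore.case = TRUE)
  }))
  df[hits, , drop = FALSE]
}
```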
topEnrichListBySource() extends topEnrichBySource() by applying the filters to each enrichList entry, then retaining, in every enrichList entry, each pathway that meets the filter criteria in at least one entry. It is most useful in the context of multiEnrichMap(), where a pathway must meet all criteria in at least one enrichment, and that pathway should then be included for all enrichments for the purpose of comparative analysis.
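The "passes in at least one entry" rule can be sketched by pooling the passing pathway IDs and subsetting every entry to that pool. In the hypothetical top_enrich_list_union(), only a P-value filter is applied per entry for brevity. The real function applies the full topEnrichBySource() logic, and the "ID" and "pvalue" column names are assumptions.

```r
# Hypothetical helper: filter each entry independently (P-value only, as
# a simplification), pool the passing IDs, then keep those IDs in every
# entry so all enrichments can be compared on the same pathways.
top_enrich_list_union <- function(enrich_list, p_cutoff = 0.05,
                                  id_col = "ID", pvalue_col = "pvalue") {
  passing <- lapply(enrich_list, function(df) {
    as.character(df[[id_col]])[which(df[[pvalue_col]] <= p_cutoff)]
  })
  keep_ids <- unique(unlist(passing, use.names = FALSE))
  lapply(enrich_list, function(df) {
    df[as.character(df[[id_col]]) %in% keep_ids, , drop = FALSE]
  })
}
```

For example, a pathway significant only in "up" and another significant only in "down" are both retained in both entries.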
Other jam enrichment functions:
multiEnrichMap()