SummarizedExperiment heuristics to define detected rows
Source: R/jam_se_detectrows.R
Usage
se_detected_rows(
  se,
  assay_name = 1,
  group_colnames,
  normgroup_colname = NULL,
  detect_mincounts = 0,
  detect_totalreps = 1,
  detect_minreps = 2,
  detect_minpct = 0.65,
  detect_mingroups = 1,
  isamples = colnames(se),
  verbose = FALSE,
  ...
)
Arguments
- se
  SummarizedExperiment object.
- assay_name
  character or integer index, referring to the entry in assays(se) to use when determining valid measurements.
- group_colnames
  character vector of colnames in colData(se) which define sample grouping.
- normgroup_colname
  character string with optional colname in colData(se) to use for normgroups.
- detect_mincounts
  numeric value at or above which a measurement is considered "valid".
- detect_totalreps
  numeric minimum total number of replicates that must contain "valid" measurements.
- detect_minreps
  numeric minimum replicates which must contain "valid" measurements in any given sample group.
- detect_minpct
  numeric minimum fraction of available replicates in a sample group that must contain "valid" measurements.
- detect_mingroups
  numeric minimum number of sample groups that are considered "valid" based upon other criteria.
- isamples
  character optional vector of colnames(se) to use during this analysis. This vector is useful, for example, when excluding outlier samples that were defined by other methods.
- verbose
  logical indicating whether to print verbose output.
- ...
  additional arguments are ignored.
Value
list with the following elements:
- detected_rows is a character vector of detected rownames(se).
- detected_normgroup is a list of logical vectors for each normgroup, where the vectors encode whether a row is detected within each normgroup.
- detected_df is a data.frame with summary information for each normgroup.
Details
This function is intended to help apply common logical rules to define valid, "detected" rows for downstream analysis.
The rules:
- minimum value at or above which a measurement is "valid" (detect_mincounts)
- minimum total replicates with "valid" measurement, across all sample columns (detect_totalreps)
- minimum replicates with "valid" measurement required in any sample group (detect_minreps)
- minimum fraction of replicates with "valid" measurement required in any sample group (detect_minpct)
- minimum number of sample groups required to meet the per-group criteria above (detect_mingroups)
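The rules can be sketched in base R on a plain numeric matrix. This is only an illustration and not the jamses implementation: the helper detect_rows_sketch() and its argument names are hypothetical, and it assumes the two per-group rules are combined with a logical AND and that NA values are never "valid".

```r
# Hypothetical helper, not part of jamses: applies the five rules
# to a numeric matrix `x` whose columns are samples.
detect_rows_sketch <- function(x, groups,
   mincounts = 0, totalreps = 1, minreps = 2, minpct = 0.65, mingroups = 1)
{
   groups <- factor(groups)
   # Rule 1: a measurement is "valid" at or above mincounts (NA is not valid)
   valid <- !is.na(x) & x >= mincounts
   # Rule 2: total "valid" replicates across all sample columns
   total_ok <- rowSums(valid) >= totalreps
   # Count "valid" replicates per row (rows) and per sample group (columns)
   valid_per_group <- t(rowsum(t(valid) * 1, group = groups))
   n_per_group <- as.vector(table(groups)[colnames(valid_per_group)])
   # Rule 3: enough "valid" replicates within a group
   reps_ok <- valid_per_group >= minreps
   # Rule 4: a large enough fraction of the group's replicates is "valid"
   pct_ok <- sweep(valid_per_group, 2, n_per_group, "/") >= minpct
   # Rule 5: enough groups pass rules 3 and 4 (assumed to be combined with AND)
   groups_ok <- rowSums(reps_ok & pct_ok) >= mingroups
   total_ok & groups_ok
}

# Toy data: two groups with three replicates each
x <- rbind(
   gene1 = c(5, 7, 6, 0, 0, 0),
   gene2 = c(5, 0, 0, 6, 0, 0),
   gene3 = c(0, 0, 0, 0, 0, 0))
colnames(x) <- c("A1", "A2", "A3", "B1", "B2", "B3")
groups <- c("A", "A", "A", "B", "B", "B")

detected <- detect_rows_sketch(x, groups, mincounts = 1)
print(detected)
```

In this toy data only gene1 passes: it has three "valid" replicates in group A, while gene2 has a single "valid" replicate in each group and gene3 has none. Note that mincounts = 1 is used here because a threshold of 0 would treat every non-negative value as "valid".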
Example
Consider an experiment with 7 groups of n=3 replicates each, for 21 total samples.
Assume one row of data that contains 6 "valid" measurements.
If these 6 "valid" measurements are found in only 2 groups, both groups contain n=3 "valid" measurements. This row may have sufficient data for a statistical comparison across these two groups.
However, if the 6 "valid" measurements are instead spread across 6 different groups, one per group, the row may not be suitable for statistical testing.
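The arithmetic of this example can be checked with a few lines of base R. The group labels and the two logical vectors are hypothetical, and the thresholds are the default values of detect_minreps and detect_minpct shown in Usage.

```r
# 7 groups x 3 replicates = 21 samples (group labels are hypothetical)
groups <- rep(paste0("group", 1:7), each = 3)

# Row A: 6 "valid" measurements concentrated in the first 2 groups
row_a <- rep(c(TRUE, FALSE), times = c(6, 15))

# Row B: 6 "valid" measurements spread over 6 groups, one per group
row_b <- rep(c(TRUE, FALSE, FALSE), times = 7)
row_b[19] <- FALSE   # group7 has no "valid" measurement

# "Valid" replicates per group
valid_a <- tapply(row_a, groups, sum)
valid_b <- tapply(row_b, groups, sum)

# Number of groups with at least 2 "valid" replicates and at least 65% "valid"
groups_a <- sum(valid_a >= 2 & valid_a / 3 >= 0.65)
groups_b <- sum(valid_b >= 2 & valid_b / 3 >= 0.65)
print(c(row_a = groups_a, row_b = groups_b))
```

Both rows contain the same total of 6 "valid" measurements, but row A has two groups that pass the per-group criteria while row B has none, so only row A would satisfy detect_mingroups = 1.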
Use of normgroups
Detection can be carried out within "normgroups", which are
independent subsets of sample columns. In most cases this method
is not necessary, but is intended when the detected rows should
be independently calculated for two or more subsets of sample
columns.
A specific example might be an experiment that measures treatment
effects in two very different tissue types, like lung and muscle.
The detected genes in lung may well not be the same as detected
genes in muscle. And in fact, statistical comparisons may not be intended
to compare muscle and lung directly. (That judgement is left
to the analyst.) One may define a column in colData(se) that represents
tissue type, with values "muscle" and "lung", then supply
that column name as argument normgroup_colname. The detection will
be done within each independent normgroup, returned as a list
named "detected_normgroup". The detected rows are also combined
into "detected_rows" which returns rows detected across
all normgroups.
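The idea of independent detection per normgroup can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the jamses implementation: the data, the helper detect_in(), and the single rule used (at least one sample group with 2 or more values at or above 1) are all hypothetical, and the step that combines normgroups into "detected_rows" is left out.

```r
# Hypothetical data: 2 tissues x 2 treatment groups x 2 replicates
tissue <- rep(c("lung", "muscle"), each = 4)
group <- paste(tissue, rep(c("ctrl", "trt"), each = 2, times = 2), sep = "_")
x <- rbind(
   geneL = c(9, 8, 7, 9, 0, 0, 0, 0),   # measured in lung only
   geneM = c(0, 0, 0, 0, 8, 9, 9, 7),   # measured in muscle only
   geneB = c(5, 6, 0, 0, 0, 0, 6, 5))   # measured in both tissues
colnames(x) <- paste0(group, "_rep", rep(1:2, times = 4))

# Simplified rule for this sketch: a row is detected when at least one
# sample group has 2 or more "valid" replicates (value >= 1).
detect_in <- function(cols) {
   valid <- x[, cols, drop = FALSE] >= 1
   per_group <- t(rowsum(t(valid) * 1, group = group[cols]))
   rowSums(per_group >= 2) >= 1
}

# Apply the rule separately to the sample columns of each normgroup
detected_normgroup <- lapply(split(seq_len(ncol(x)), tissue), detect_in)
print(detected_normgroup)
```

Each element of the resulting list is a logical vector named by row, one per tissue, which mirrors the description of "detected_normgroup" in the Value section. Here geneL is detected only in lung, geneM only in muscle, and geneB in both.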
See also
Other jamses SE utilities:
make_se_test(),
se_collapse_by_column(),
se_collapse_by_row(),
se_normalize(),
se_rbind(),
se_to_rowcoldata()