SummarizedExperiment heuristics to define detected rows
Source: R/jam_se_detectrows.R
se_detected_rows.Rd
Usage
se_detected_rows(
  se,
  assay_name = 1,
  group_colnames,
  normgroup_colname = NULL,
  detect_mincounts = 0,
  detect_totalreps = 1,
  detect_minreps = 2,
  detect_minpct = 0.65,
  detect_mingroups = 1,
  isamples = colnames(se),
  verbose = FALSE,
  ...
)
Arguments
- se
SummarizedExperiment object.
- assay_name
character or integer index, referring to the entry in assays(se) to use when determining valid measurements.
- group_colnames
character vector of colnames in colData(se) which defines sample grouping.
- normgroup_colname
character string with optional colname in colData(se) to use for normgroups.
- detect_mincounts
numeric value at or above which a measurement is considered "valid".
- detect_totalreps
numeric minimum total number of replicates that must contain "valid" measurements.
- detect_minreps
numeric minimum replicates which must contain "valid" measurements in any given sample group.
- detect_minpct
numeric minimum fraction of available replicates in a sample group that must contain "valid" measurements.
- detect_mingroups
numeric minimum number of sample groups that are considered "valid" based upon other criteria.
- isamples
character optional vector of colnames(se) to use during this analysis. This vector is useful, for example, when excluding outlier samples that were defined by other methods.
- verbose
logical indicating whether to print verbose output.
- ...
additional arguments are ignored.
Value
list with the following elements:
detected_rows is a character vector of detected rownames(se).
detected_normgroup is a list of logical vectors for each normgroup, where the vectors encode whether a row is detected within each normgroup.
detected_df is a data.frame with summary information for each normgroup.
Details
This function is intended to help apply common logical rules to define valid, "detected" rows for downstream analysis.
The rules:
- minimum value at or above which a measurement is "valid" (detect_mincounts)
- minimum total replicates with "valid" measurement, across all sample columns (detect_totalreps)
- minimum replicates with "valid" measurement required in any sample group (detect_minreps)
- minimum fraction of replicates with "valid" measurement required in any sample group (detect_minpct)
- minimum number of sample groups required to meet the "valid" criteria above (detect_mingroups)
Example
Consider an experiment with 7 groups and n=3 replicates per group, for 21 total samples.
Assume one row of data that contains 6 "valid" measurements.
If these 6 "valid" measurements are found in only 2 groups, both groups contain n=3 "valid" measurements. This row may have sufficient data for a statistical comparison across these two groups.
However, if the 6 "valid" measurements are instead spread across 6 different groups, each group contains only one "valid" measurement, and the row may not be suitable for statistical testing.
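The two cases above can be checked with a minimal base R sketch. It is not the package implementation: the helper count_valid_groups() is hypothetical, and the way the per-group rules are combined (a group counts as valid only when it meets both detect_minreps and detect_minpct) is an assumption made for illustration.

```r
# Hypothetical helper, not part of jamses: count the sample groups in
# which one row of data has enough "valid" measurements.
count_valid_groups <- function(x, groups,
   detect_mincounts = 0,
   detect_minreps = 2,
   detect_minpct = 0.65) {
   # a measurement is "valid" when it is not NA and at or above the threshold
   is_valid <- !is.na(x) & x >= detect_mincounts
   # valid replicates and available replicates per group
   n_valid <- tapply(is_valid, groups, sum)
   n_reps <- tapply(is_valid, groups, length)
   # assumption: a group must satisfy both replicate rules
   sum(n_valid >= detect_minreps & (n_valid / n_reps) >= detect_minpct)
}

# 7 groups with n=3 replicates each, 21 samples in total
groups <- rep(paste0("group", 1:7), each = 3)

# case 1: 6 valid measurements concentrated in 2 groups
row_two_groups <- rep(c(10, NA), c(6, 15))
# case 2: 6 valid measurements, one in each of 6 groups
row_six_groups <- rep(c(10, NA, NA), 7)
row_six_groups[19] <- NA

count_valid_groups(row_two_groups, groups)  # 2
count_valid_groups(row_six_groups, groups)  # 0
```

Under these assumptions and the default detect_mingroups = 1, the first row would be kept and the second would not, even though both contain 6 "valid" measurements.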
Use of normgroups
Detection can be carried out within "normgroups", which are
independent subsets of sample columns. In most cases this method
is not necessary, but it is intended for cases where the detected rows
should be calculated independently for two or more subsets of sample
columns.
A specific example might be an experiment that measures treatment
effects in two very different tissue types, like lung and muscle.
The detected genes in lung may well not be the same as the detected
genes in muscle. In fact, statistical comparisons may not be intended
to compare muscle and lung directly. (That judgement is left
to the analyst.) One may define a column in colData(se) that represents
tissue type, with values "muscle" and "lung", then supply
this column with argument normgroup_colname. The detection will
be done within each independent normgroup, and returned as a list
named "detected_normgroup". The detected rows are also combined
into "detected_rows", which returns rows detected across
all normgroups.
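The idea of running detection independently per normgroup can be illustrated with a plain matrix. This is a rough sketch rather than the se_detected_rows() implementation: the toy data, the helper detect_rows(), and its simplified rule (at least minreps valid values in at least one sample group) are all assumptions. The step that combines normgroups into "detected_rows" is deliberately left out.

```r
# Hypothetical helper: a row is detected when any sample group has at
# least `minreps` values at or above `mincounts`.
detect_rows <- function(m, groups, mincounts = 1, minreps = 2) {
   is_valid <- !is.na(m) & m >= mincounts
   # rows x groups matrix holding the count of valid replicates
   n_valid <- sapply(split(seq_len(ncol(m)), groups), function(i) {
      rowSums(is_valid[, i, drop = FALSE])
   })
   rowSums(n_valid >= minreps) >= 1
}

# toy data: 3 genes, 2 tissues x 2 treatments x 2 replicates
m <- rbind(
   geneA = c(9, 8, 7, 9,  0, 0, 0, 0),
   geneB = c(0, 0, 0, 0,  6, 7, 5, 6),
   geneC = c(0, 4, 0, 0,  0, 0, 3, 0))
colnames(m) <- paste0("s", 1:8)

# stand-ins for columns of colData(se): tissue plays the role of the
# normgroup, tissue plus treatment defines the sample group
tissue <- rep(c("lung", "muscle"), each = 4)
treatment <- rep(rep(c("ctrl", "trt"), each = 2), 2)
group <- paste(tissue, treatment, sep = "_")

# run detection separately within each normgroup
detected_normgroup <- lapply(split(seq_len(ncol(m)), tissue), function(i) {
   detect_rows(m[, i, drop = FALSE], group[i])
})
detected_normgroup$lung
detected_normgroup$muscle
```

In this toy data geneA is detected only in lung and geneB only in muscle, while geneC is detected in neither tissue because it never reaches two valid replicates within a group.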
See also
Other jamses SE utilities: make_se_test(), se_collapse_by_column(), se_collapse_by_row(), se_normalize(), se_rbind(), se_to_rowcoldata()