Apply curation to colData in a SummarizedExperiment object
SummarizedExperiment object.
data.frame (or equivalent) which contains columns of
data annotation to be applied.
The first column is assumed to be the column used for
patterns to be matched with identifiers in the se object.
The pattern column can be defined with pattern_colname.
character value indicating which
column in df contains patterns to be matched with identifiers
in se. The default uses the first column in df.
This value is passed to curate_to_df_by_pattern().
character or NULL (default) indicating
which column(s) represent experimental groups, used
only to create a corresponding column with unique label
for each entry. When NULL (default), no action is taken.
character used only when group_colname is
defined and present in colnames(df), used to create a
unique label for each row in colData(se).
By default group_colname=NULL so no action is taken.
character string indicating the data to use as
the identifiers when applying curation logic.
The default is to use colnames(se), however it can use one
or more columns from SummarizedExperiment::colData(se).
Some options are described below:
"colnames": uses colnames(se), which should be equivalent
to using rownames(SummarizedExperiment::colData(se)).
"rownames": uses rownames(SummarizedExperiment::colData(se)),
which as stated above should be equivalent to colnames(se).
one or more character values that match colnames(colData(se)).
character string used as a delimiter when
use is supplied as a vector with multiple colnames.
The values in each column are concatenated using this delimiter,
by calling jamba::pasteByRow().
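The concatenation of multiple columns can be sketched in base R as follows. This is only an approximation for illustration: the function itself calls jamba::pasteByRow(), which may treat blank or NA values differently than paste(). The column names and values here are hypothetical.

```r
# Hypothetical sample annotation, standing in for colData(se)
coldata <- data.frame(
   genotype = c("WT", "KO"),
   treatment = c("veh", "drug"),
   stringsAsFactors = FALSE)

# Columns to combine, and the delimiter placed between their values
use <- c("genotype", "treatment")
sep <- "_"

# Paste the chosen columns together row by row (base R approximation)
ids <- do.call(paste,
   c(unname(as.list(coldata[, use, drop = FALSE])), sep = sep))
ids
```

The resulting values, one per row, are the identifiers that the patterns in df would be matched against.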
logical indicating whether the se object columns should
be subset when not all identifiers match the patterns in df.
When subset_se=FALSE, any entry in se whose identifier
did not match a pattern in df is retained, and
the corresponding rows of SummarizedExperiment::colData()
will contain NA values.
When subset_se=TRUE, any entry in se whose identifier
did not match a pattern in df will
be removed from the se object. This option is sometimes
a convenient way to subset a large dataset to use only
user-defined samples.
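The difference between the two settings can be illustrated with a small base R sketch. It mimics the described logic only and is not the function's implementation; the identifiers, patterns, and group labels are hypothetical.

```r
# Hypothetical identifiers and a two-row pattern table
ids <- c("one_sample_3", "two_sample_3", "ctrl_sample_9")
patterns <- c("one_sample_3$", "two_sample_3$")
groups <- c("treated", "untreated")

# For each identifier, index of the first matching pattern, or NA
hit <- vapply(ids, function(id) {
   w <- which(vapply(patterns, grepl, logical(1), x = id))
   if (length(w) == 0) NA_integer_ else w[1]
}, integer(1))

annot <- data.frame(id = ids, group = groups[hit],
   stringsAsFactors = FALSE)

# Analogous to subset_se=FALSE: all entries kept, unmatched rows are NA
keep_all <- annot

# Analogous to subset_se=TRUE: unmatched entries are dropped
subset_only <- annot[!is.na(hit), , drop = FALSE]
```

Here "ctrl_sample_9" matches no pattern, so it carries NA in keep_all and is absent from subset_only.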
logical indicating whether to print a warning
when any one pattern matches two or more identifiers.
Sometimes this behavior is intended, however it may indicate
that the patterns are not specific enough to match one unique
identifier. See Details.
numeric value used when verbose=TRUE, passed to
jamba::printDebug().
logical indicating whether to print verbose output.
additional arguments are passed to curate_to_df_by_pattern().
SummarizedExperiment::SummarizedExperiment object.
When subset_se=FALSE (default), the output will contain
the same dimensions and column order as the input se.
When subset_se=TRUE the output object may contain fewer columns
based upon the number of identifiers that matched the patterns
supplied in df.
Given a SummarizedExperiment object, this function is intended
to augment the SummarizedExperiment::colData() annotation associated
with columns, which are typically biological or experimental
samples.
Measurements within each sample are typically stored as rows.
This function is a convenient wrapper to curate_to_df_by_pattern(),
and applies the result directly to SummarizedExperiment::colData(),
which is stored as an S4Vectors::DataFrame-class object.
Note that colnames present in both colData(se) and df will
take the value from df as replacement, including the presence of NA
values.
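The replacement rule can be illustrated with plain data.frame objects. This is a minimal sketch of the described behavior, not the function's own code; the column names and values are hypothetical.

```r
# Hypothetical existing annotation, standing in for colData(se)
old <- data.frame(
   group = c("A", "B"),
   batch = c("x", "y"),
   row.names = c("s1", "s2"),
   stringsAsFactors = FALSE)

# Hypothetical curated annotation sharing the "group" column
new <- data.frame(
   group = c("ctrl", NA),
   row.names = c("s1", "s2"),
   stringsAsFactors = FALSE)

# Shared columns take the curated values, NA included;
# columns present only in the existing annotation are untouched
for (nm in colnames(new)) {
   old[[nm]] <- new[[nm]]
}
old
```

Note that the NA in the curated "group" column overwrites the previous value "B" rather than leaving it in place.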
The patterns are used to match identifiers using regular expressions,
and the argument warn_multimatch=TRUE (default) will print a
warning when one pattern matches two or more identifiers.
This behavior may be intended, or may indicate that some patterns
are not specific enough to match only one intended identifier.
For example pattern="sample_3" will match identifiers:
c("one_sample_3", "two_sample_3", "one_sample_31").
To overcome this type of issue, use regular expressions to
limit matching to the end, for example pattern="sample_3$"
will only match c("one_sample_3", "two_sample_3") and
will not match "one_sample_31".
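The matching behavior described above can be checked directly with base R grepl(), using the same identifiers. This only demonstrates the regular expressions and does not call the function itself.

```r
ids <- c("one_sample_3", "two_sample_3", "one_sample_31")

# Unanchored pattern: matches anywhere in the identifier
loose <- grepl("sample_3", ids)

# Anchored pattern: "$" requires the match to end the identifier
anchored <- grepl("sample_3$", ids)

ids[loose]
ids[anchored]
```

The unanchored pattern selects all three identifiers, while the anchored pattern excludes "one_sample_31".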
It can be helpful to name the pattern column "Pattern" so that
the pattern used is clearly defined in the output
colData(se), and can be compared to the intended identifiers.
Other jam utility functions:
cardinality(),
color_complement(),
convert_PD_df_to_SE(),
convert_imputed_assays_to_na(),
curate_to_df_by_pattern(),
design2layout(),
get_numeric_transform(),
handle_df_args(),
merge_proteomics_se(),
rowNormScale()