Apply curation to colData in a SummarizedExperiment object
`SummarizedExperiment` object.
`data.frame` (or equivalent) containing the annotation columns to be applied. By default the first column is assumed to contain the patterns to be matched with identifiers in the `se` object. A different pattern column can be specified with `pattern_colname`.
`character` value indicating which column in `df` contains patterns to be matched with identifiers in `se`. The default uses the first column in `df`. This value is passed to `curate_to_df_by_pattern()`.
`character` or `NULL` (default) indicating which column(s) represent experimental groups, used only to create a corresponding column with a unique label for each entry. When `NULL`, no action is taken.
`character` value used only when `group_colname` is defined and present in `colnames(df)`, in which case it is used to create a unique label for each row in `colData(se)`. By default `group_colname=NULL`, so no action is taken.
`character` string indicating the data to use as the identifiers when applying curation logic. The default is to use `colnames(se)`, however it can use one or more columns from `SummarizedExperiment::colData(se)`. Some options are described below:

`"colnames"`: uses `colnames(se)`, which should be equivalent to using `rownames(SummarizedExperiment::colData(se))`.

`"rownames"`: uses `rownames(SummarizedExperiment::colData(se))`, which as stated above should be equivalent to `colnames(se)`.

one or more `character` values that match `colnames(colData(se))`.
`character` string used as a delimiter when `use` is supplied as a vector with multiple colnames. The values in each column are concatenated using this delimiter, by calling `jamba::pasteByRow()`.
`logical` indicating whether the columns of the `se` object should be subset when not all identifiers match the patterns in `df`.

When `subset_se=FALSE`, for any entries in `se` whose identifier did not match a pattern in `df`, the corresponding rows of `SummarizedExperiment::colData()` will contain `NA` values.

When `subset_se=TRUE`, any entries in `se` whose identifier did not match a pattern in `df` will be removed from the `se` object. This option is sometimes a convenient way to subset a large dataset to only user-defined samples.
`logical` indicating whether to print a warning when any one pattern matches two or more identifiers. Sometimes this behavior is intended, however it may indicate that the patterns are not specific enough to match one unique identifier. See Details.
`numeric` value used when `verbose=TRUE`, passed to `jamba::printDebug()`.
`logical` indicating whether to print verbose output.
additional arguments are passed to `curate_to_df_by_pattern()`.
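When `use` names several `colData(se)` columns, the identifier is built by joining the values in each row with the delimiter. The following is a minimal sketch of that idea in base R, not the function's implementation: the data, the column names, and the `delim` variable are hypothetical, and base `paste()` stands in for `jamba::pasteByRow()`, whose handling of `NA` and blank values may differ.

```r
# Hypothetical sample annotation, standing in for colData(se).
coldata <- data.frame(
   genotype = c("WT", "WT", "KO"),
   treatment = c("veh", "dex", "dex"),
   stringsAsFactors = FALSE)

# Hypothetical choice of identifier columns and delimiter.
use_cols <- c("genotype", "treatment")
delim <- "_"

# Join the selected columns row by row; base paste() is used here
# only as an approximation of jamba::pasteByRow().
ids <- do.call(paste,
   c(unname(as.list(coldata[, use_cols, drop = FALSE])), sep = delim))
print(ids)
```

The resulting identifiers are what the patterns in `df` would then be matched against.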
`SummarizedExperiment::SummarizedExperiment` object. When `subset_se=FALSE` (default), the output will have the same dimensions and column order as the input `se`. When `subset_se=TRUE`, the output object may contain fewer columns, based upon the number of identifiers that matched the patterns supplied in `df`.
Given a `SummarizedExperiment` object, this function is intended to augment the `SummarizedExperiment::colData()` annotation associated with columns, which are typically biological or experimental samples. Measurements within each sample are typically stored as rows.
This function is a convenient wrapper around `curate_to_df_by_pattern()`, which applies the result directly to `SummarizedExperiment::colData()`, which is stored as an `S4Vectors::DataFrame-class` object.
Note that colnames present in both `colData(se)` and `df` will take the value from `df` as replacement, including any `NA` values.
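The replacement rule can be illustrated with plain data frames. This is a sketch under stated assumptions rather than the function's own code: `coldata` and `curated` are hypothetical stand-ins for `colData(se)` and for the curated annotation after pattern matching, already aligned one row per sample.

```r
# Hypothetical existing annotation, standing in for colData(se).
coldata <- data.frame(
   row.names = c("s1", "s2", "s3"),
   group = c("A", "B", "C"),
   batch = c(1, 2, 3),
   stringsAsFactors = FALSE)

# Hypothetical curated annotation, one row per sample, same order.
curated <- data.frame(
   row.names = c("s1", "s2", "s3"),
   group = c("control", "treated", NA),
   stringsAsFactors = FALSE)

# Columns present in both are overwritten by the curated values,
# including NA; columns found only in coldata are left unchanged.
for (nm in colnames(curated)) {
   coldata[[nm]] <- curated[[nm]]
}
print(coldata)
```

Here the original `group` value for `s3` is replaced by `NA`, while `batch` is untouched, so it is worth checking for shared colnames before applying curation.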
The patterns are used to match identifiers using regular expressions, and the argument `warn_multimatch=TRUE` (default) will print a warning when one pattern matches two or more identifiers. It may be intended, or may indicate that some patterns are not specific enough to match only one intended identifier.
For example, `pattern="sample_3"` will match all three of these identifiers: `c("one_sample_3", "two_sample_3", "one_sample_31")`. To overcome this type of issue, use regular expressions to limit matching to the end of the identifier. For example, `pattern="sample_3$"` will only match `c("one_sample_3", "two_sample_3")` and will not match `"one_sample_31"`.
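The difference between the two patterns can be checked directly with base R regular expression matching. This sketch uses `grepl()` on the identifiers from the example above; it only demonstrates the matching behavior and is not assumed to be how the function itself performs matching.

```r
# Identifiers taken from the example in the text.
ids <- c("one_sample_3", "two_sample_3", "one_sample_31")

# Unanchored pattern: matches anywhere, so all three identifiers match.
loose <- ids[grepl("sample_3", ids)]

# Anchored pattern: "$" requires the match to end the identifier.
strict <- ids[grepl("sample_3$", ids)]

print(loose)
print(strict)
```

Testing patterns this way against `colnames(se)` before curation is a quick way to confirm whether a multi-match is intended.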
It can be helpful to name the pattern column `"Pattern"` so that the pattern used is clearly defined in the output `colData(se)`, and can be compared to the intended identifiers.
Other jam utility functions: `cardinality()`, `color_complement()`, `convert_PD_df_to_SE()`, `convert_imputed_assays_to_na()`, `curate_to_df_by_pattern()`, `design2layout()`, `get_numeric_transform()`, `handle_df_args()`, `merge_proteomics_se()`, `rowNormScale()`