merge proteomics SE objects
merge_proteomics_se(
  SE1,
  SE2,
  rowname1 = "SYMBOL",
  rowname2 = "SYMBOL",
  rowData_colnames_intersect = TRUE,
  colData_colnames_intersect = TRUE,
  rowData_colnames_unique = c("percentCoverage", "numPepsUnique", "scoreUnique"),
  assay_names = NULL,
  se_names = c("A", "B"),
  startN = 2,
  verbose = TRUE,
  ...
)
SE1, SE2: SummarizedExperiment objects to be merged into one output object.
rowname1, rowname2: character string that describes which SummarizedExperiment::rowData() annotation to use to create appropriate rownames to be merged. This approach is useful when merging data by gene symbol instead of by protein accession or peptide sequence. The intent is to allow "equivalent" rows to be combined across SE1 and SE2, while non-equivalent rows unique to SE1 or SE2 are represented on their own row.
The default values assume each proteomics SE object contains a rowData column "SYMBOL" with the official gene symbol represented on each row. This column is appropriate if the proteomics data already represents abundance measurements aggregated to the protein level (i.e. gene locus level). The data will therefore be merged based upon the gene symbol. In the event that multiple rows represent the same gene symbol, they will be renamed using jamba::makeNames(..., renameFirst=FALSE) so that the entries are merged in the order they appear in each dataset.
However, if the input data contains peptide-level measurements, the appropriate column should contain the peptide sequence, so that the data is merged based upon equivalent peptide sequences.
If rowname1 or rowname2 contain multiple values, and/or are not equal to each other, a new column "merge_key" is created in both SE1 and SE2, and populated with relevant values. When multiple columns are indicated, they are concatenated using jamba::pasteByRow() to fill the column "merge_key". Then both rowname1 and rowname2 are redefined to "merge_key". Note that any pre-existing "merge_key" column will be overwritten.
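
The key construction described above can be sketched in base R. This is an illustration only, not the function's implementation: the helper make_merge_key(), the example annotations, and the "_" separator are all assumptions made for the sketch, and base paste() stands in for jamba::pasteByRow().

```r
# Hypothetical row annotations for two datasets (illustrative values only).
rd1 <- data.frame(SYMBOL = c("ACTB", "GAPDH"), site = c("S1", "S2"))
rd2 <- data.frame(GENE = c("ACTB", "TP53"), site = c("S1", "S9"))

# Concatenate the requested columns row by row into a single key.
# The "_" separator is an assumption for this sketch.
make_merge_key <- function(rd, key_cols, sep = "_") {
  cols <- unname(as.list(rd[, key_cols, drop = FALSE]))
  do.call(paste, c(cols, sep = sep))
}

# Different column names on each side are reconciled by writing the
# result to the same "merge_key" column in both tables.
rd1$merge_key <- make_merge_key(rd1, c("SYMBOL", "site"))
rd2$merge_key <- make_merge_key(rd2, c("GENE", "site"))

# Keys present in both tables identify rows that would be combined.
shared_keys <- intersect(rd1$merge_key, rd2$merge_key)
```

Here only the key for ACTB at site S1 occurs in both tables, so that row would be combined while the remaining rows stay on their own.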
A combination of "rownames" and colnames(rowData()) can be used.
The argument value should contain one value from either:
- colnames(rowData()) for the relevant object SE1 or SE2, representing a row annotation to use as the merge key. Note that any empty values (NA or blank string "") will be replaced by existing rownames().
- "rownames" to indicate that existing rownames() of the relevant object SE1 or SE2 should be used as the merge key. Note that if a column "rownames" already exists in rowData() it will be used as-is.
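
The two options above can be sketched in base R as follows. The helper resolve_merge_key() and the example table are hypothetical and only mirror the documented rules; they are not taken from the package source.

```r
# Hypothetical helper mirroring the documented rules for one key column.
resolve_merge_key <- function(rd, key_col) {
  # "rownames" means the existing rownames, unless a column with that
  # name already exists, in which case the column is used as-is.
  if (identical(key_col, "rownames") && !("rownames" %in% colnames(rd))) {
    return(rownames(rd))
  }
  key <- as.character(rd[[key_col]])
  # Empty values (NA or "") fall back to the existing rownames.
  is_empty <- is.na(key) | key == ""
  key[is_empty] <- rownames(rd)[is_empty]
  key
}

# Illustrative annotation table with two empty gene symbols.
rd <- data.frame(
  SYMBOL = c("ACTB", NA, "", "TP53"),
  row.names = c("row1", "row2", "row3", "row4")
)

keys_symbol <- resolve_merge_key(rd, "SYMBOL")
keys_rownames <- resolve_merge_key(rd, "rownames")
```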
rowData_colnames_intersect, colData_colnames_intersect: logical indicating whether to retain only the intersection of colnames(rowData()) and colnames(colData()) in the output rowData and colData, respectively.
- TRUE: only the intersection is retained in the output data, default.
- FALSE: not yet implemented.
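
As a minimal base R sketch of the intersection behavior, assuming two hypothetical annotation tables, only the column names shared by both are kept:

```r
# Hypothetical row annotations; "batch" and "instrument" each occur in
# only one table.
rd1 <- data.frame(SYMBOL = c("ACTB", "GAPDH"),
                  scoreUnique = c(10, 20),
                  batch = c("x", "y"))
rd2 <- data.frame(SYMBOL = c("ACTB", "TP53"),
                  scoreUnique = c(11, 30),
                  instrument = c("p", "q"))

# Keep only the column names present in both tables.
shared_cols <- intersect(colnames(rd1), colnames(rd2))
rd1_kept <- rd1[, shared_cols, drop = FALSE]
rd2_kept <- rd2[, shared_cols, drop = FALSE]
```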
rowData_colnames_unique: character vector with optional colnames(rowData()) which should be retained in a uniquely-named output column, to keep its values distinct between SE1 and SE2. This argument is useful for something like "score" where independent datasets are expected to have unique values, and which may be important to compare. Note that columns not already being retained will be ignored.
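
The idea of keeping one output column per dataset can be sketched in base R. The output naming used here (the column name followed by the dataset label) is an assumption for illustration; the names actually produced by the function are not stated in this page.

```r
# Hypothetical row annotations sharing a "scoreUnique" column, with the
# rows of the second table in a different order.
rd1 <- data.frame(SYMBOL = c("ACTB", "GAPDH"), scoreUnique = c(10, 20))
rd2 <- data.frame(SYMBOL = c("GAPDH", "ACTB"), scoreUnique = c(25, 11))

se_names <- c("A", "B")
unique_col <- "scoreUnique"

# Start from the shared key, then add one column per dataset so the
# values stay distinct. Column naming is assumed for this sketch.
merged_rd <- data.frame(SYMBOL = rd1$SYMBOL)
merged_rd[[paste(unique_col, se_names[1], sep = "_")]] <-
  rd1[[unique_col]][match(merged_rd$SYMBOL, rd1$SYMBOL)]
merged_rd[[paste(unique_col, se_names[2], sep = "_")]] <-
  rd2[[unique_col]][match(merged_rd$SYMBOL, rd2$SYMBOL)]
```

Keeping the two score columns side by side makes it possible to compare the values per gene after the merge.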
assay_names: character vector with one or more specific assay names to retain in the output data. By default, all assay names are retained.
se_names: character vector length=2 to define the output labels used to indicate which rows and columns were present in SE1 and SE2.
startN: integer number passed to jamba::makeNames() to define the suffix number for the first versioned output. Note that renameFirst=FALSE so the first occurrence of a character string will not be renamed. When startN=2, subsequent repeated entries will have suffix "_v2", then "_v3" and so on.
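
The versioning rule can be illustrated with a small base R stand-in. The helper version_names() is hypothetical and is not jamba::makeNames(); it only reproduces the behavior described above for renameFirst=FALSE.

```r
# Hypothetical stand-in: the first occurrence keeps its name, and later
# occurrences receive a numbered suffix starting at startN.
version_names <- function(x, startN = 2, suffix = "_v") {
  # Running occurrence count per value: 1, 2, 3, ...
  occurrence <- ave(seq_along(x), x, FUN = seq_along)
  is_repeat <- occurrence > 1
  x[is_repeat] <- paste0(x[is_repeat], suffix,
                         occurrence[is_repeat] + startN - 2)
  x
}

symbols <- c("ACTB", "GAPDH", "ACTB", "ACTB")
versioned <- version_names(symbols, startN = 2)
```

With this input, the first "ACTB" and "GAPDH" are unchanged, and the second and third "ACTB" become "ACTB_v2" and "ACTB_v3".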
...: additional arguments are passed to jamba::makeNames().
See notes for specific arguments for a description of how data is merged relative to rows and rowData(), columns and colData().
The general strategy is to merge equivalent rows to integrate rows across SE1 and SE2, but to force columns (sample measurements) to be unique across SE1 and SE2.
This process is somewhat similar to calling cbind(), in that the sample columns are extended. However, the rows are merged where possible.
No assay measurement values are lost during this process.
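
The strategy can be sketched on two plain matrices in base R. This is a simplified illustration under stated assumptions, not the function's implementation: the helper merge_assay_matrices() is hypothetical, the sample names are assumed to be distinct already, and measurements absent from one dataset are filled with NA.

```r
# Hypothetical assay matrices: rows are merge keys, columns are samples.
m1 <- matrix(c(1, 2, 3, 4), nrow = 2,
             dimnames = list(c("ACTB", "GAPDH"), c("s1", "s2")))
m2 <- matrix(c(5, 6, 7, 8), nrow = 2,
             dimnames = list(c("ACTB", "TP53"), c("s3", "s4")))

merge_assay_matrices <- function(m1, m2) {
  # Sample columns must be unique across the two inputs.
  stopifnot(length(intersect(colnames(m1), colnames(m2))) == 0)
  # Rows are the union of keys: shared keys occupy a single row.
  all_rows <- union(rownames(m1), rownames(m2))
  all_cols <- c(colnames(m1), colnames(m2))
  out <- matrix(NA_real_, nrow = length(all_rows), ncol = length(all_cols),
                dimnames = list(all_rows, all_cols))
  # Copy each input into its own block of columns.
  out[rownames(m1), colnames(m1)] <- m1
  out[rownames(m2), colnames(m2)] <- m2
  out
}

merged <- merge_assay_matrices(m1, m2)
```

The result has three rows (ACTB, GAPDH, TP53) and four columns. All eight input values are present, and the only NA entries are for samples in which a row was not measured.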
Other jam utility functions: cardinality(), color_complement(), convert_PD_df_to_SE(), convert_imputed_assays_to_na(), curate_se_colData(), curate_to_df_by_pattern(), design2layout(), get_numeric_transform(), handle_df_args(), rowNormScale()