Collapse SummarizedExperiment data by column
Source: R/jam_collapse_by_column.R, se_collapse_by_column.Rd
Usage
se_collapse_by_column(
  se,
  columns = colnames(se),
  column_groups,
  assay_names = NULL,
  colDataColnames = colnames(SummarizedExperiment::colData(se)),
  keepNULLlevels = FALSE,
  groupFunc = jamba::rowGroupMeans,
  noise_floor = 0,
  noise_floor_value = 0,
  rmOutliers = FALSE,
  madFactor = 5,
  useMedian = FALSE,
  verbose = FALSE,
  ...
)
Arguments
- se
  SummarizedExperiment object.
- columns
  character vector of colnames(se) to include in the process.
- column_groups
  character vector of column groupings, or character vector of colnames(colData(se)) used to define the column groupings.
- assay_names
  character vector with one or more assayNames(se) to which the column grouping calculation defined in groupFunc is applied. By default, all assay names in assayNames(se) are used.
- colDataColnames
  character vector of colData(se) colnames to be included in the returned SummarizedExperiment after the column grouping. This argument is used to subset the columns, in cases where some columns do not need to be combined and returned in the output data.
- keepNULLlevels
  logical indicating whether to return empty columns for factor levels that are not present in the data. This option is intended for cases where column_groups references a factor whose levels are not all present in the current data, for example after subsetting with columns. When keepNULLlevels=TRUE, any missing levels will be present with NA values, which can be helpful for generating consistent output.
- groupFunc
  function used to perform row group calculations on a numeric matrix. The default is jamba::rowGroupMeans(), but it can be substituted with another row-based function. It must accept arguments x and groups; the other arguments are passed only if groupFunc permits these argument names, or accepts ...:
  - x: a numeric matrix (required).
  - groups: a character vector of column groups, in order of colnames(x) (required).
  - rmOutliers: a logical indicating whether to apply outlier removal, though the function can ignore this value (optional).
  - madFactor: a numeric value indicating the MAD threshold used when rmOutliers=TRUE, though again the function can ignore this value (optional).
  - useMedian: a logical; when useMedian=FALSE the group mean() value is taken instead of the group median() value.
  - ...: additional arguments in ... will be passed only if permitted by groupFunc.
- noise_floor
  numeric value indicating the minimum numeric value permitted; values at or below this value are replaced with noise_floor_value. With the defaults noise_floor=0 and noise_floor_value=0, all values at or below zero are therefore changed to 0. An alternative is to change abnormally low values, such as zero, to NA so these values are not treated as actual measurements during the group summary calculation. This value and the replacement should be adjusted with caution. Use noise_floor=NULL or noise_floor=-Inf to disable this step.
- noise_floor_value
  numeric or NA used as a replacement for numeric values at or below noise_floor. The replacement occurs prior to calling the groupFunc summary calculation.
- rmOutliers, madFactor
  logical and numeric, respectively, passed to groupFunc, which by default is jamba::rowGroupMeans().
- useMedian
  logical passed to groupFunc, intended for use by jamba::rowGroupMeans(). The default useMedian=FALSE takes the mean and not the median value per row group.
- verbose
  logical indicating whether to print verbose output.
- ...
  additional arguments are passed to groupFunc.
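The interaction between noise_floor, noise_floor_value and a custom groupFunc can be sketched in base R. This is only an illustration of the documented rules, not the jamses implementation: apply_noise_floor() and row_group_medians() are hypothetical helpers, the matrix values are made up, and the assumption that a groupFunc returns one column per group is not stated in the argument descriptions above.

```r
# Toy matrix: 2 rows (genes) by 4 columns (samples); values are made up.
x <- matrix(
  c(0, 4, 8, 10,
    2, 0, 6, 6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3", "s4"))
)
groups <- c("ctrl", "ctrl", "trt", "trt")

# Hypothetical helper (not a jamses function): replaces values at or
# below noise_floor with noise_floor_value; noise_floor = NULL skips it.
apply_noise_floor <- function(x, noise_floor = 0, noise_floor_value = 0) {
  if (is.null(noise_floor)) {
    return(x)
  }
  x[!is.na(x) & x <= noise_floor] <- noise_floor_value
  x
}

# Hypothetical custom groupFunc: per-row median within each column group.
# It accepts the required arguments `x` and `groups`; `...` absorbs
# optional arguments such as rmOutliers or madFactor, which it ignores.
# Assumption: the result has one column per group.
row_group_medians <- function(x, groups, ...) {
  groups <- factor(groups, levels = unique(groups))
  cols <- lapply(levels(groups), function(g) {
    apply(x[, groups == g, drop = FALSE], 1, median, na.rm = TRUE)
  })
  out <- do.call(cbind, cols)
  dimnames(out) <- list(rownames(x), levels(groups))
  out
}

# Zeros kept as measurements (the documented defaults).
med_zero <- row_group_medians(apply_noise_floor(x, 0, 0), groups)

# Zeros replaced with NA, so they are skipped by the group summary.
med_na <- row_group_medians(apply_noise_floor(x, 0, NA), groups)

print(med_zero)
print(med_na)
```

In this toy data the ctrl summary for geneA changes depending on whether the zero is kept or replaced with NA, which is why the replacement value should be chosen with care.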
Value
SummarizedExperiment object with these changes:
- columns will be collapsed by column_groups, for each assays(se) numeric matrix defined by assay_names.
- colData(se) will also be collapsed by shrinkDataFrame() to combine unique values from each column annotation.
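As a rough base R illustration of what "combine unique values from each column annotation" can mean, the sketch below pastes the unique values per group. It does not call shrinkDataFrame(); the helper collapse_annotations(), the comma delimiter, and the output layout are all assumptions made for the example.

```r
# Made-up column annotations for four samples.
col_annot <- data.frame(
  group = c("ctrl", "ctrl", "trt", "trt"),
  batch = c("b1", "b2", "b1", "b1"),
  row.names = c("s1", "s2", "s3", "s4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the jamses implementation): one output row
# per group, with the unique values of each annotation pasted together.
collapse_annotations <- function(df, groups, sep = ",") {
  groups <- factor(groups, levels = unique(groups))
  rows <- lapply(split(df, groups), function(d) {
    as.data.frame(
      lapply(d, function(v) paste(unique(as.character(v)), collapse = sep)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

collapsed <- collapse_annotations(col_annot, col_annot$group)
print(collapsed)
```

Here the ctrl group spans two batches, so its collapsed batch annotation carries both values, while the trt group keeps a single value.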
Details
The purpose is to collapse columns of a SummarizedExperiment object, where measurements for a given entity, usually a gene, are split across multiple rows in the source data. The output of this function should be measurements appropriately summarized to the gene level.
The driving use case is slightly different than with se_collapse_by_row(): in this case the function is mostly a convenient method to calculate group mean values in the context of a SummarizedExperiment object, so it can be used with jamses::heatmap_se(), for example.
This function retains the associated column annotations in colData(se), after combining multiple values in an appropriate manner.
Optionally, this function will detect and remove individual outlier values before calculating the group mean.
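A minimal usage sketch follows, assuming the SummarizedExperiment and jamses packages are installed. The toy matrix, the assay name "counts" and the colData column "group" are made up for illustration; only the argument names come from the Usage section, and the exact column names of the result are not asserted here.

```r
# Run only when the required packages are available.
if (requireNamespace("SummarizedExperiment", quietly = TRUE) &&
    requireNamespace("jamses", quietly = TRUE)) {
  # Toy data: 3 rows by 4 samples, two samples per group.
  m <- matrix(
    c(5, 7, 9, 11,
      2, 4, 6, 8,
      1, 1, 3, 3),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("gene", 1:3), paste0("s", 1:4))
  )
  se <- SummarizedExperiment::SummarizedExperiment(
    assays = list(counts = m),
    colData = S4Vectors::DataFrame(
      group = c("ctrl", "ctrl", "trt", "trt"),
      row.names = colnames(m)
    )
  )

  # Collapse sample columns into group means, grouping by the
  # colData column "group" (a made-up annotation for this example).
  se_grouped <- jamses::se_collapse_by_column(
    se,
    column_groups = "group",
    assay_names = "counts"
  )
  print(dim(se_grouped))
  print(SummarizedExperiment::colData(se_grouped))
}
```

Setting rmOutliers=TRUE in the same call would request the optional outlier removal before the group summary, using madFactor as the MAD threshold.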
See also
Other jamses SE utilities: make_se_test(), se_collapse_by_row(), se_detected_rows(), se_normalize(), se_rbind(), se_to_rowcoldata()