Collapse SummarizedExperiment data by column
Source: R/jam_collapse_by_column.R
se_collapse_by_column.Rd
Collapse SummarizedExperiment data by column
Usage
se_collapse_by_column(
  se,
  columns = colnames(se),
  column_groups,
  assay_names = NULL,
  colDataColnames = colnames(SummarizedExperiment::colData(se)),
  keepNULLlevels = FALSE,
  groupFunc = jamba::rowGroupMeans,
  noise_floor = 0,
  noise_floor_value = 0,
  rmOutliers = FALSE,
  madFactor = 5,
  useMedian = FALSE,
  verbose = FALSE,
  ...
)
Arguments
- se
SummarizedExperiment object.
- columns
character vector of colnames(se) to include in the process.
- column_groups
character vector of column groupings, or character vector of colnames(colData(se)) used to define the column groupings.
- assay_names
character vector with one or more assayNames(se) to which the column grouping calculation defined in groupFunc is applied. By default, all assay names in assayNames(se) are used.
- colDataColnames
character vector of colData(se) colnames to be included in the returned SummarizedExperiment after the column grouping. This argument is used to subset the colData columns, in cases where some columns do not need to be combined and returned in the output data.
- keepNULLlevels
logical indicating whether to return empty columns for factor levels that are not present in the data. This option is intended for cases where column_groups references a factor whose levels are not all present in the current data, after subsetting with columns. When keepNULLlevels=TRUE any missing levels will be present with NA values, which can be helpful for generating a consistent output.
- groupFunc
function used to perform row group calculations on a numeric matrix. The default is jamba::rowGroupMeans(), but it can be substituted with another row-based function. It must accept arguments x and groups; the other arguments are passed only if groupFunc accepts these argument names, or accepts ...:
  - x: a numeric matrix (required).
  - groups: a character vector of column groups, in order of colnames(x) (required).
  - rmOutliers: a logical indicating whether to apply outlier removal, though the function can ignore this value (optional).
  - madFactor: a numeric value indicating the MAD threshold used when rmOutliers=TRUE, though again the function can ignore this value (optional).
  - useMedian: a logical; when useMedian=FALSE the group mean() value is calculated instead of the median() value per group.
  - ...: additional arguments in ... will be passed only if permitted by groupFunc.
- noise_floor
numeric value indicating the minimum numeric value permitted; values at or below this value are replaced with noise_floor_value. With the defaults noise_floor=0 and noise_floor_value=0, all values at or below zero are changed to 0. An alternative is to change abnormally low values such as 0 to NA, using noise_floor_value=NA, so these values are not treated as actual measurements during the group summary calculation. This value and the replacement should be adjusted with caution. Use noise_floor=NULL or noise_floor=-Inf to disable this step.
- noise_floor_value
numeric or NA used as a replacement for numeric values at or below noise_floor. The replacement occurs prior to calling the groupFunc summary calculation.
- rmOutliers, madFactor
logical and numeric, respectively, passed to groupFunc, which by default is jamba::rowGroupMeans().
- useMedian
logical passed to groupFunc, intended to be used by jamba::rowGroupMeans() to specify whether to take the median (TRUE) or the mean (FALSE, the default) value per row group.
- verbose
logical indicating whether to print verbose output.
- ...
additional arguments are passed through to groupFunc.
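The noise floor step and the groupFunc contract described in the arguments can be sketched in base R as follows. This is only an illustration under stated assumptions, not the jamses or jamba implementation: the helper names apply_noise_floor() and row_group_medians() are hypothetical, and the example matrix is made up.

```r
# Hypothetical helper: replace values at or below noise_floor
# with noise_floor_value, mirroring the documented behavior.
apply_noise_floor <- function(x, noise_floor = 0, noise_floor_value = 0) {
  # noise_floor=NULL or -Inf disables the step
  if (is.null(noise_floor) || identical(noise_floor, -Inf)) {
    return(x)
  }
  x[!is.na(x) & x <= noise_floor] <- noise_floor_value
  x
}

# Hypothetical groupFunc: accepts the required arguments x and groups,
# and absorbs rmOutliers, madFactor, useMedian through `...`.
row_group_medians <- function(x, groups, ...) {
  groups <- factor(groups, levels = unique(groups))
  out <- lapply(levels(groups), function(g) {
    # per-row median across the columns in group g
    apply(x[, groups == g, drop = FALSE], 1, median, na.rm = TRUE)
  })
  out <- do.call(cbind, out)
  colnames(out) <- levels(groups)
  out
}

# Made-up example data: 2 rows, 4 columns in 2 groups
m <- rbind(geneA = c(0, 2, 4, 6), geneB = c(1, 3, 5, 9))
colnames(m) <- c("s1", "s2", "s3", "s4")
groups <- c("ctl", "ctl", "trt", "trt")

# Treat zero as missing rather than as a measurement
m_floor <- apply_noise_floor(m, noise_floor = 0, noise_floor_value = NA)
collapsed <- row_group_medians(m_floor, groups)
print(collapsed)
```

A function with this shape could in principle be supplied as groupFunc, since it takes x and groups and tolerates the optional arguments, though that pairing is untested here.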
Value
SummarizedExperiment object with these changes:
- columns will be collapsed by column_groups, for each assays(se) numeric matrix defined by assay_names.
- colData(se) will also be collapsed by shrinkDataFrame() to combine unique values from each column annotation.
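To illustrate what combining unique values from each column annotation can look like, here is a minimal base R sketch. It does not call shrinkDataFrame() and makes no claim about how that function formats its output; the helper collapse_annotations(), the separator, and the example annotations are all assumptions for illustration.

```r
# Made-up column annotations for four samples
col_data <- data.frame(
  group = c("ctl", "ctl", "trt", "trt"),
  batch = c("b1", "b2", "b1", "b1"),
  row.names = c("s1", "s2", "s3", "s4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per group, where each
# annotation holds the unique values seen within that group.
collapse_annotations <- function(df, groups, sep = ",") {
  groups <- factor(groups, levels = unique(groups))
  out <- lapply(df, function(column) {
    vapply(
      split(as.character(column), groups),
      function(v) paste(unique(v), collapse = sep),
      character(1),
      USE.NAMES = FALSE
    )
  })
  as.data.frame(out, row.names = levels(groups), stringsAsFactors = FALSE)
}

collapsed_cd <- collapse_annotations(col_data, col_data$group)
print(collapsed_cd)
```

In this sketch the "ctl" group spans two batches, so its batch annotation becomes a delimited combination of both values, while "trt" keeps the single value it had.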
Details
Purpose is to collapse columns of a SummarizedExperiment object,
where measurements for a given column group are split
across multiple columns in the source data. The output of this function
should be measurements appropriately summarized to the column group level.
The driving use case is slightly different from that of se_collapse_by_row().
In this case the function is mostly a convenient method to calculate
group mean values in the context of a SummarizedExperiment object,
so the result can be used with jamses::heatmap_se(), for example.
This function retains associated column annotations colData(se),
after combining multiple values in an appropriate manner.
Optionally, this function will detect and remove individual outlier values before calculating the group mean.
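The idea of removing individual outliers before taking a group mean can be sketched in base R as below. This is an assumption-laden illustration, not the algorithm used by jamba::rowGroupMeans(): it uses the raw median absolute deviation without a scaling constant, treats a value as an outlier when it lies more than madFactor times the MAD from the group median, and the helper names are hypothetical.

```r
# Hypothetical helper: set values far from the median to NA.
remove_mad_outliers <- function(v, madFactor = 5) {
  center <- median(v, na.rm = TRUE)
  spread <- median(abs(v - center), na.rm = TRUE)  # raw MAD, no scaling
  if (is.na(spread) || spread == 0) {
    return(v)  # no usable spread estimate, leave values unchanged
  }
  v[!is.na(v) & abs(v - center) > madFactor * spread] <- NA
  v
}

# Hypothetical helper: per-row group means after outlier removal.
group_means_rm_outliers <- function(x, groups, madFactor = 5) {
  groups <- factor(groups, levels = unique(groups))
  out <- lapply(levels(groups), function(g) {
    sub <- x[, groups == g, drop = FALSE]
    apply(sub, 1, function(v) {
      mean(remove_mad_outliers(v, madFactor), na.rm = TRUE)
    })
  })
  out <- do.call(cbind, out)
  colnames(out) <- levels(groups)
  out
}

# Made-up data: the fifth "ctl" value is an obvious outlier
x <- rbind(geneA = c(10, 11, 9, 10, 100, 20, 22, 21))
colnames(x) <- paste0("s", 1:8)
groups <- rep(c("ctl", "trt"), times = c(5, 3))

res <- group_means_rm_outliers(x, groups, madFactor = 5)
print(res)
```

Without the outlier step, the plain mean of the "ctl" values would be pulled strongly toward the single extreme value, which is the situation the optional outlier removal is meant to address.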
See also
Other jamses SE utilities:
make_se_test(),
se_collapse_by_row(),
se_detected_rows(),
se_normalize(),
se_rbind(),
se_to_rowcoldata()