Calculate row group means, or other statistics
Source: R/jamba-rowgroupmeans-madoutliers.R, rowGroupMeans.Rd
Calculate row group means, or other statistics, where: rowGroupMeans() calculates row summary stats; and rowGroupRmOutliers() is a convenience function to call rowGroupMeans(..., rmOutliers=TRUE, returnType="input").
Usage
rowGroupMeans(
  x,
  groups,
  na.rm = TRUE,
  useMedian = TRUE,
  rmOutliers = FALSE,
  crossGroupMad = TRUE,
  madFactor = 5,
  returnType = c("output", "input"),
  rowStatsFunc = NULL,
  groupOrder = c("same", "sort"),
  keepNULLlevels = FALSE,
  includeAttributes = FALSE,
  verbose = FALSE,
  ...
)

rowGroupRmOutliers(
  x,
  groups,
  na.rm = TRUE,
  rmOutliers = TRUE,
  crossGroupMad = TRUE,
  madFactor = 5,
  returnType = c("input"),
  groupOrder = c("same", "sort"),
  keepNULLlevels = FALSE,
  includeAttributes = FALSE,
  verbose = FALSE,
  ...
)
Arguments
- x
numeric data matrix.
- groups
character or factor vector of group labels. See the parameter groupOrder for ordering of group labels in the output data matrix.
- na.rm
logical, default TRUE, passed to the stats function to ignore NA values.
- useMedian
logical, default TRUE, indicating whether the default summary stat is the median (TRUE) or the mean (FALSE).
- rmOutliers
logical, default FALSE, indicating whether to apply outlier detection and removal.
- crossGroupMad
logical indicating whether to calculate row MAD values using the median across groups for each row. The median is calculated using non-NA and non-zero row group MAD values. When crossGroupMad=TRUE it also calculates the non-NA, non-zero median row MAD across all rows, which defines the minimum difference from the median required for any value to be considered an outlier.
- madFactor
numeric value indicating the multiple of the MAD value used to define outliers. For example madFactor=5 multiplies the MAD value for a group by 5, and uses 5*MAD as the outlier threshold: any point more than 5*MAD from its group median is an outlier.
- returnType
character, default "output", the return data type: "output" returns one summary stat value per group, per row; "input" is useful when rmOutliers=TRUE in that it returns a matrix with the same dimensions as the input, except with outlier points replaced with NA.
- rowStatsFunc
function, default NULL, which takes a numeric matrix as input, and returns a numeric vector with length equal to the number of rows of the input data matrix. When supplied, useMedian is ignored. Examples: base::rowMeans(), matrixStats::rowMedians(), matrixStats::rowMads().
- groupOrder
character string indicating how character group labels are ordered in the final data matrix, when returnType="output". Note that when groups is a factor, the factor levels are kept in that order. Otherwise, "same" keeps groups in the same order they appear in the input matrix; "sort" applies jamba::mixedSort() to the labels.
- keepNULLlevels
logical, default FALSE, whether to keep factor levels even when there are no corresponding columns in x. When TRUE and returnType="output" the output matrix will contain one colname for each factor level, with NA values used to fill empty factor levels. This mechanism can be helpful to ensure that output matrices have consistent colnames.
- includeAttributes
logical, default FALSE, whether to include attributes with "n" number of replicates per group, and "nLabel" with replicate label in n=# form.
- verbose
logical indicating whether to print verbose output.
- ...
additional parameters are passed to rowStatsFunc, and if rmOutliers=TRUE to jamba::rowRmMadOutliers().
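The ordering rules described for groupOrder and keepNULLlevels can be illustrated with a small base R sketch. It does not call jamba: base sort() stands in for jamba::mixedSort() (an assumption that only holds for simple labels like these), and all variable names are hypothetical.

```r
# Character labels: "same" keeps first-appearance order
groups_chr <- c("b", "b", "a", "a")
order_same <- unique(groups_chr)

# "sort" reorders the labels; base sort() is only a stand-in
# for jamba::mixedSort() here
order_sort <- sort(unique(groups_chr))

# Factor labels: level order is kept; level "c" has no columns
groups_fac <- factor(groups_chr, levels=c("a", "b", "c"))

# keepNULLlevels=FALSE would correspond to the levels actually used
levels_used <- levels(droplevels(groups_fac))

# keepNULLlevels=TRUE would correspond to every level, including "c"
levels_all <- levels(groups_fac)
```

With keepNULLlevels=TRUE, the empty level "c" would appear as an output column filled with NA, as described above.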
Value
numeric matrix based upon returnType:
- When returnType="output" the output is a numeric matrix with the same number of columns as the number of unique groups labels. When groups is a factor and keepNULLlevels=TRUE, the number of columns will be the number of factor levels, otherwise it will be the number of factor levels used in groups.
- When returnType="input" the output is a numeric matrix with the same dimensions as the input data. This output is intended for use with rmOutliers=TRUE which will replace outlier points with NA values. Therefore, this matrix can be used to see the location of outliers.
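To make the shape of the "output" return type concrete, here is a minimal base R sketch that computes one median per row, per group. It is an illustration of the idea only, not jamba's implementation, and the toy data and variable names are made up.

```r
# Toy matrix: 2 rows, 6 columns, two groups of three replicates
x <- matrix(c(1, 2, 3, 10, 20, 30,
              4, 5, 6, 40, 50, 60),
   nrow=2, byrow=TRUE,
   dimnames=list(NULL, LETTERS[1:6]))
groups <- rep(c("a", "b"), each=3)

# Column indices per group, in first-appearance order
group_cols <- split(seq_len(ncol(x)),
   factor(groups, levels=unique(groups)))

# One median per row, per group: rows match the input,
# columns match the unique group labels
out <- sapply(group_cols, function(i) {
   apply(x[, i, drop=FALSE], 1, stats::median, na.rm=TRUE)
})
```

Here `out` has two rows and the columns "a" and "b", holding the medians 2 and 20 for the first row, and 5 and 50 for the second.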
The function also returns attributes when includeAttributes=TRUE, although the default is FALSE. The attributes describe the number of samples per group overall:
- attr(out, "n")
The attribute "n" is used to describe the number of replicates per group.
- attr(out, "nLabel")
The attribute "nLabel" is a simple text label in the form "n=3".
Note that when rmOutliers=TRUE the number of replicates per group will vary depending upon the outliers removed. In that case, remember that the reported "n" is always the total possible columns available prior to outlier removal.
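Because "n" reports the columns available before outlier removal, the number of values actually used can be counted separately from a returnType="input" result. The base R sketch below assumes a hand-made masked matrix in place of real jamba output; the names x_masked, n_total, and n_used are hypothetical.

```r
# Hypothetical returnType="input" result: one outlier replaced with NA
x_masked <- matrix(c(10, 11, NA, 20, 21, 22,
                      5,  6,  7, 30, 31, 32),
   nrow=2, byrow=TRUE)
groups <- rep(c("a", "b"), each=3)
group_cols <- split(seq_len(ncol(x_masked)), groups)

# Total columns per group, comparable to the "n" attribute as described
n_total <- lengths(group_cols)

# Non-NA values remaining per row and group after outlier removal
n_used <- sapply(group_cols, function(i) {
   rowSums(!is.na(x_masked[, i, drop=FALSE]))
})
```

In this toy case n_total is 3 for both groups, while n_used drops to 2 for group "a" in the first row.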
Details
This function by default calculates group median values per row in a numeric matrix (useMedian=TRUE); set useMedian=FALSE to calculate group means. The stat function can also be changed with rowStatsFunc to calculate other row statistics such as row MADs.
An added purpose of this function is optional outlier filtering, via calculation of MAD values and applying a MAD threshold cutoff. The intention is to identify technical outliers that otherwise adversely affect the calculated group mean or median values. To inspect the data after outlier removal, use the parameter returnType="input" which will return the input data matrix with NA substituted for outlier points. Outlier detection and removal is performed by jamba::rowRmMadOutliers().
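The MAD threshold idea can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the logic of jamba::rowRmMadOutliers(): it uses the raw MAD (no scaling constant), works within each group only, and omits the crossGroupMad minimum-difference step. The helper name mask_group_outliers is hypothetical.

```r
# Replace values more than madFactor * MAD from the group median with NA.
# Assumption: raw MAD per row and group; no cross-group adjustment.
mask_group_outliers <- function(x, groups, madFactor=5) {
   out <- x
   for (g in unique(groups)) {
      cols <- which(groups == g)
      sub <- x[, cols, drop=FALSE]
      # Per-row median and absolute deviation within this group
      med <- apply(sub, 1, stats::median, na.rm=TRUE)
      dev <- abs(sub - med)
      # Per-row raw MAD; a MAD of zero flags any non-zero deviation
      row_mad <- apply(dev, 1, stats::median, na.rm=TRUE)
      sub[dev > madFactor * row_mad] <- NA
      out[, cols] <- sub
   }
   out
}

x <- matrix(c(10, 11, 90, 20, 21, 22,
               5,  6,  7, 30, 31, 32),
   nrow=2, byrow=TRUE)
groups <- rep(c("a", "b"), each=3)
masked <- mask_group_outliers(x, groups, madFactor=5)

# Group "a" means before and after masking the outlier
mean_before <- rowMeans(x[, 1:3])
mean_after <- rowMeans(masked[, 1:3], na.rm=TRUE)
```

In the first row, group "a" has median 11 and raw MAD 1, so the threshold is 5 and the value 90 is masked. The group mean moves from 37 to 10.5, which shows how a single technical outlier can distort a group summary.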
See also
Other jam numeric functions: deg2rad(), noiseFloor(), normScale(), rad2deg(), rowRmMadOutliers(), warpAroundZero()
Examples
x <- matrix(ncol=9, stats::rnorm(90));
colnames(x) <- LETTERS[1:9];
use_groups <- rep(letters[1:3], each=3)
rowGroupMeans(x, groups=use_groups)
#> a b c
#> [1,] 0.3297912 -0.006198262 -0.66518864
#> [2,] -1.1655448 0.634362125 0.45203019
#> [3,] -0.8185157 -0.279333528 0.30027912
#> [4,] 0.2865486 0.793585308 0.07485682
#> [5,] -0.3200564 -0.241689768 0.20637270
#> [6,] -0.4321298 -0.374800093 1.76365303
#> [7,] 0.8001769 -0.772978228 0.03768285
#> [8,] -0.1294107 0.084543768 -0.04691673
#> [9,] 0.8867361 -1.334353628 0.15161137
#> [10,] -0.9343851 0.495870480 1.29230591
# rowGroupRmOutliers returns the input data after outlier removal
rowGroupRmOutliers(x, groups=use_groups, returnType="input")
#> A B C D E F
#> [1,] -0.1453936 0.3297912 0.39370865 -0.5208693 1.2339762 -0.006198262
#> [2,] -1.1655448 -3.2273228 0.40363146 1.6232025 0.6343621 -0.685706846
#> [3,] -0.8185157 -0.7717918 -0.88643672 -1.0700682 0.4120223 -0.279333528
#> [4,] 0.6849361 0.2865486 -1.31893760 1.6858872 0.7935853 -0.782730275
#> [5,] -0.3200564 NA 0.02884391 -0.2416898 -0.1524106 -0.778997240
#> [6,] NA NA -0.43212979 -0.4682005 -0.2288958 -0.374800093
#> [7,] -0.5996083 0.8001769 1.68987252 -0.7729782 -0.9007918 -0.319393809
#> [8,] -0.1294107 -0.1639310 1.22839278 NA -0.7350262 0.084543768
#> [9,] 0.8867361 1.2429188 0.27602348 -1.3343536 -1.4276858 -0.768473603
#> [10,] -0.1513960 -0.9343851 -1.04897550 0.4958705 0.6192835 NA
#> G H I
#> [1,] -0.90087086 -0.46355650 -0.6651886
#> [2,] 0.66372867 0.30546323 0.4520302
#> [3,] 0.30027912 -0.08398871 0.5268557
#> [4,] 0.07485682 0.41036345 -0.2302622
#> [5,] 0.20637270 0.18367824 NA
#> [6,] NA 1.77874162 1.7636530
#> [7,] -0.62795166 0.03768285 0.4856014
#> [8,] -0.04691673 1.17622012 -0.2657389
#> [9,] 0.16261812 NA 0.1516114
#> [10,] 1.29230591 NA 1.3766098
# rowGroupMeans(..., returnType="input") also returns the input data
rowGroupMeans(x, groups=use_groups, rmOutliers=TRUE, returnType="input")
#> A B C D E F
#> [1,] -0.1453936 0.3297912 0.39370865 -0.5208693 1.2339762 -0.006198262
#> [2,] -1.1655448 -3.2273228 0.40363146 1.6232025 0.6343621 -0.685706846
#> [3,] -0.8185157 -0.7717918 -0.88643672 -1.0700682 0.4120223 -0.279333528
#> [4,] 0.6849361 0.2865486 -1.31893760 1.6858872 0.7935853 -0.782730275
#> [5,] -0.3200564 NA 0.02884391 -0.2416898 -0.1524106 -0.778997240
#> [6,] NA NA -0.43212979 -0.4682005 -0.2288958 -0.374800093
#> [7,] -0.5996083 0.8001769 1.68987252 -0.7729782 -0.9007918 -0.319393809
#> [8,] -0.1294107 -0.1639310 1.22839278 NA -0.7350262 0.084543768
#> [9,] 0.8867361 1.2429188 0.27602348 -1.3343536 -1.4276858 -0.768473603
#> [10,] -0.1513960 -0.9343851 -1.04897550 0.4958705 0.6192835 NA
#> G H I
#> [1,] -0.90087086 -0.46355650 -0.6651886
#> [2,] 0.66372867 0.30546323 0.4520302
#> [3,] 0.30027912 -0.08398871 0.5268557
#> [4,] 0.07485682 0.41036345 -0.2302622
#> [5,] 0.20637270 0.18367824 NA
#> [6,] NA 1.77874162 1.7636530
#> [7,] -0.62795166 0.03768285 0.4856014
#> [8,] -0.04691673 1.17622012 -0.2657389
#> [9,] 0.16261812 NA 0.1516114
#> [10,] 1.29230591 NA 1.3766098
# rowGroupMeans with outlier removal
rowGroupMeans(x, groups=use_groups, rmOutliers=TRUE)
#> a b c
#> [1,] 0.3297912 -0.006198262 -0.66518864
#> [2,] -1.1655448 0.634362125 0.45203019
#> [3,] -0.8185157 -0.279333528 0.30027912
#> [4,] 0.2865486 0.793585308 0.07485682
#> [5,] -0.1456063 -0.241689768 0.19502547
#> [6,] -0.4321298 -0.374800093 1.77119732
#> [7,] 0.8001769 -0.772978228 0.03768285
#> [8,] -0.1294107 -0.325241194 -0.04691673
#> [9,] 0.8867361 -1.334353628 0.15711474
#> [10,] -0.9343851 0.557577007 1.33445786
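The examples above do not show includeAttributes. A minimal sketch, assuming the jamba package is installed and that the attributes behave as described in the Value section, might look like the following. The seed is an addition for reproducibility and no output is shown because it was not run here.

```r
# Runs only when jamba is available
if (requireNamespace("jamba", quietly=TRUE)) {
   set.seed(123)
   x <- matrix(ncol=9, stats::rnorm(90))
   colnames(x) <- LETTERS[1:9]
   use_groups <- rep(letters[1:3], each=3)

   # Request the "n" and "nLabel" attributes alongside the group summary
   out <- jamba::rowGroupMeans(x,
      groups=use_groups,
      includeAttributes=TRUE)

   # Replicates per group, and the corresponding text labels
   print(attr(out, "n"))
   print(attr(out, "nLabel"))
}
```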