R/jamba-rowgroupmeans-madoutliers.R
rowGroupMeans.Rd
Calculate row group means, or other statistics, where: rowGroupMeans() calculates row summary stats; and rowGroupRmOutliers() is a convenience function to call rowGroupMeans(..., rmOutliers=TRUE, returnType="input").
rowGroupMeans(
  x,
  groups,
  na.rm = TRUE,
  useMedian = TRUE,
  rmOutliers = FALSE,
  crossGroupMad = TRUE,
  madFactor = 5,
  returnType = c("output", "input"),
  rowStatsFunc = NULL,
  groupOrder = c("same", "sort"),
  keepNULLlevels = FALSE,
  verbose = FALSE,
  ...
)
rowGroupRmOutliers(
  x,
  groups,
  na.rm = TRUE,
  crossGroupMad = TRUE,
  madFactor = 5,
  rmOutliers = TRUE,
  returnType = c("input"),
  groupOrder = c("same", "sort"),
  keepNULLlevels = FALSE,
  verbose = FALSE,
  ...
)
x: numeric data matrix.
groups: character or factor vector of group labels. See the parameter groupOrder for the ordering of group labels in the output data matrix.
useMedian: logical indicating whether the default stat should be the median (TRUE) or the mean (FALSE).
rmOutliers: logical indicating whether to apply outlier detection and removal.
crossGroupMad: logical indicating whether to calculate row MAD values using the median across groups for each row. The median is calculated using non-NA and non-zero row group MAD values. When crossGroupMad=TRUE it also calculates the non-NA, non-zero median row MAD across all rows, which defines the minimum difference from the median that any value must exceed to be considered an outlier.
madFactor: numeric value indicating the multiple of the MAD value used to define outliers. For example, madFactor=5 multiplies the MAD value for a group by 5 and uses 5*MAD as the outlier threshold, so any point more than 5*MAD from the group median is an outlier.
returnType: character value indicating the return data type. "output" returns one summary stat value per group, per row; "input" is useful when rmOutliers=TRUE, in that it returns a matrix with the same dimensions as the input, except with outlier points replaced with NA.
rowStatsFunc: optional function which takes a numeric matrix as input and returns a numeric vector with length equal to the number of rows of the input data matrix. Examples: base::rowMeans(), matrixStats::rowMedians(), matrixStats::rowMads().
groupOrder: character string indicating how character group labels are ordered in the final data matrix when returnType="output". Note that when groups is a factor, the factor levels are kept in that order. Otherwise, "same" keeps groups in the same order they appear in the input matrix; "sort" applies jamba::mixedSort() to the labels.
keepNULLlevels: logical indicating whether to keep factor levels even when there are no corresponding columns in x. When TRUE and returnType="output", the output matrix will contain one colname for each factor level, with NA values used to fill empty factor levels. This mechanism can be helpful to ensure that output matrices have consistent colnames.
verbose: logical indicating whether to print verbose output.
...: additional parameters are passed to rowStatsFunc, and if rmOutliers=TRUE to jamba::rowRmMadOutliers().
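The contract described for rowStatsFunc (numeric matrix in, one value per row out) can be illustrated with a small base R sketch. The helper name row_group_stat_sketch() and its stat_func argument are hypothetical and are not part of jamba; the sketch only mimics the "same" group ordering and ignores outlier handling, factor levels, and attributes.

```r
# Hypothetical helper (not jamba's implementation): apply a row statistic
# within each column group and bind the results, one column per group.
row_group_stat_sketch <- function(x, groups, stat_func = rowMeans, ...) {
  groups <- as.character(groups)
  # unique() keeps labels in order of first appearance
  group_names <- unique(groups)
  out <- do.call(cbind, lapply(group_names, function(g) {
    # drop = FALSE keeps a matrix even when a group has one column
    stat_func(x[, groups == g, drop = FALSE], ...)
  }))
  colnames(out) <- group_names
  out
}

x_demo <- matrix(1:12, nrow = 2)             # 2 rows, 6 columns
groups_demo <- rep(c("a", "b"), each = 3)

# Default statistic: row means per group (a: 3, 4; b: 9, 10)
res_mean <- row_group_stat_sketch(x_demo, groups_demo)

# Any function with the same contract works, for example a row maximum
row_max <- function(m, ...) apply(m, 1, max)
res_max <- row_group_stat_sketch(x_demo, groups_demo, stat_func = row_max)
```

A function such as matrixStats::rowMads() fits the same contract, provided the matrixStats package is installed.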
When returnType="output" the output is a numeric matrix with the same number of columns as the number of unique groups labels. When groups is a factor and keepNULLlevels=TRUE, the number of columns will be the number of factor levels, otherwise it will be the number of factor levels used in groups.
When returnType="input" the output is a numeric matrix with the same dimensions as the input data. This output is intended for use with rmOutliers=TRUE, which will replace outlier points with NA values. Therefore, this matrix can be used to see the location of outliers.
The function also returns attributes that describe the number of samples per group overall:
The attribute "n" is used to describe the number of replicates per group.
The attribute "nLabel" is a simple text label in the form "n=3".
Note that when rmOutliers=TRUE the number of replicates per group will vary depending upon the outliers removed. In that case, remember that the reported "n" is always the total possible columns available prior to outlier removal.
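As a rough illustration of what the "n" and "nLabel" attributes hold, the sketch below builds equivalent values in base R and attaches them to a placeholder matrix. The variable names are hypothetical, and the counts are taken from the group labels alone, which mirrors the note that "n" reflects the columns available before outlier removal.

```r
# Illustrative only: count columns per group label and build "n=3" style labels.
groups_demo <- c("a", "a", "a", "b", "b", "c")

n_per_group <- lengths(split(groups_demo, groups_demo))
n_label <- setNames(paste0("n=", n_per_group), names(n_per_group))

# Placeholder result matrix with one column per group
res_demo <- matrix(0, nrow = 2, ncol = length(n_per_group),
                   dimnames = list(NULL, names(n_per_group)))
attr(res_demo, "n") <- n_per_group
attr(res_demo, "nLabel") <- n_label

# Attributes are read back with base::attr()
attr(res_demo, "n")
attr(res_demo, "nLabel")
```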
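This function calculates one summary value per group for each row of a numeric matrix. With the defaults shown in the usage above (useMedian=TRUE, rowStatsFunc=NULL) the summary is the group median; set useMedian=FALSE to calculate group means, or supply rowStatsFunc to calculate another row statistic such as row MADs.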
An added purpose of this function is optional outlier filtering, via calculation of MAD values and applying a MAD threshold cutoff. The intention is to identify technical outliers that otherwise adversely affect the calculated group mean or median values. To inspect the data after outlier removal, use the parameter returnType="input", which will return the input data matrix with NA substituted for outlier points. Outlier detection and removal is performed by jamba::rowRmMadOutliers().
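The MAD threshold idea can be sketched in base R as follows. This is a simplified illustration, not the logic of jamba::rowRmMadOutliers(): the helper name mad_outliers_to_na() is hypothetical, the MAD is computed with stats::mad() and constant = 1 (an assumption, the package may scale it differently), and the crossGroupMad behaviour is not reproduced.

```r
# Hypothetical helper: within each group, replace values further than
# mad_factor * MAD from the row median with NA. Returns a matrix with the
# same dimensions as the input, similar in spirit to returnType="input".
mad_outliers_to_na <- function(x, groups, mad_factor = 5) {
  groups <- as.character(groups)
  for (g in unique(groups)) {
    cols <- which(groups == g)
    sub <- x[, cols, drop = FALSE]
    row_median <- apply(sub, 1, median, na.rm = TRUE)
    # constant = 1 gives the unscaled median absolute deviation (assumption)
    row_mad <- apply(sub, 1, mad, constant = 1, na.rm = TRUE)
    # vectors recycle down columns, so each row uses its own median and MAD
    is_outlier <- abs(sub - row_median) > mad_factor * row_mad
    sub[which(is_outlier)] <- NA
    x[, cols] <- sub
  }
  x
}

x_demo <- rbind(
  c(1.0, 1.1, 9.0, 5.0, 5.2, 5.1),   # 9.0 is far from its group median
  c(2.0, 2.2, 2.1, 7.0, 7.3, 7.1)
)
groups_demo <- rep(c("a", "b"), each = 3)

cleaned <- mad_outliers_to_na(x_demo, groups_demo, mad_factor = 5)
which(is.na(cleaned), arr.ind = TRUE)   # location of the flagged value
```

One limitation of this per-group sketch is that a group whose MAD is zero flags every non-identical value, which is the kind of situation a cross-group MAD or a minimum difference is meant to soften.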
Other jam numeric functions: deg2rad(), fix_matrix_ratio(), noiseFloor(), normScale(), rad2deg(), rowRmMadOutliers(), warpAroundZero()
x <- matrix(ncol=9, rnorm(90));
colnames(x) <- LETTERS[1:9];
rowGroupMeans(x, groups=rep(letters[1:3], each=3))
#> a b c
#> [1,] -0.005767173 0.83204713 0.24841265
#> [2,] 0.046726172 -0.27934628 0.06528818
#> [3,] -0.224267885 0.35872890 0.01915639
#> [4,] -0.542888255 -0.01104548 0.25733838
#> [5,] -0.433310317 -0.45278397 0.00837096
#> [6,] -0.289461574 -0.79533912 -0.21951563
#> [7,] -0.057106774 -0.81496871 0.59625902
#> [8,] 0.503607972 0.24226348 0.11971764
#> [9,] 0.992160365 -1.42509839 0.14377148
#> [10,] -0.690953840 0.36594112 -0.11775360
#> attr(,"n")
#> a b c
#> 3 3 3
#> attr(,"nLabel")
#> a b c
#> "n=3" "n=3" "n=3"
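A further example, using only the calls documented above, shows the outlier-removal route. It is guarded so that it runs only when the jamba package is installed; the seed and the artificial value placed in x[1, 1] are assumptions made for illustration, and no output is shown because the flagged positions depend on the data.

```r
if (requireNamespace("jamba", quietly = TRUE)) {
  set.seed(123)
  x <- matrix(ncol = 9, rnorm(90))
  colnames(x) <- LETTERS[1:9]
  groups <- rep(letters[1:3], each = 3)

  # Artificial extreme value so there is a candidate outlier to inspect
  x[1, 1] <- 100

  # Convenience form of rowGroupMeans(..., rmOutliers=TRUE, returnType="input")
  x_clean <- jamba::rowGroupRmOutliers(x, groups = groups)

  # Same dimensions as the input; any flagged points appear as NA
  print(dim(x_clean))
  print(which(is.na(x_clean), arr.ind = TRUE))
}
```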