Calculate MA-plot data
jammacalc(
  x,
  na.rm = TRUE,
  controlSamples = NULL,
  centerGroups = NULL,
  controlFloor = NA,
  naControlAction = c("row", "floor", "min", "na"),
  naControlFloor = 0,
  groupedX = TRUE,
  useMedian = TRUE,
  useMean = NULL,
  whichSamples = NULL,
  noise_floor = -Inf,
  noise_floor_value = noise_floor,
  naValue = NA,
  mad_row_min = 0,
  grouped_mad = TRUE,
  centerFunc = centerGeneData,
  useRank = FALSE,
  returnType = c("ma_list", "tidy"),
  verbose = FALSE,
  ...
)
x: numeric matrix, typically containing log-normal measurements,
with measurements in rows and samples in columns.
na.rm: logical indicating whether to ignore NA values
during numeric summary functions.
controlSamples: character vector containing values in colnames(x)
that define the control samples used during centering.
These values are passed to centerGeneData().
centerGroups: character vector with length equal to ncol(x)
which defines the group for each column in x. Data will
be centered within each group.
groupedX: logical indicating how to calculate the x-axis
value when centerGroups contains multiple groups. When
groupedX=TRUE, the mean of each group median is used, which
has the effect of representing each group equally. When
groupedX=FALSE, the median across all columns is used, which
can have the effect of preferring sample groups with a larger
number of columns.
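The difference between the two groupedX settings can be illustrated with a small base R sketch. It does not call jammacalc(); the toy matrix, the group labels, and the helper objects are assumptions made only for illustration.

```r
# Toy matrix: 3 rows, 5 columns. Group A has 4 samples, group B has 1.
x <- matrix(
  c(1, 1, 1, 1, 9,
    2, 2, 2, 2, 6,
    5, 5, 5, 5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("row", 1:3), c("A1", "A2", "A3", "A4", "B1")))
centerGroups <- c("A", "A", "A", "A", "B")

# Median per row within each group (rows x groups)
group_medians <- sapply(split(seq_len(ncol(x)), centerGroups), function(i) {
  apply(x[, i, drop = FALSE], 1, median, na.rm = TRUE)
})

# Analogue of groupedX=TRUE: mean of the group medians,
# so each group contributes equally
x_grouped <- rowMeans(group_medians, na.rm = TRUE)

# Analogue of groupedX=FALSE: median across all columns,
# which here is driven by the larger group A
x_ungrouped <- apply(x, 1, median, na.rm = TRUE)

print(cbind(x_grouped, x_ungrouped))
```

In the first row, the single group B sample pulls the grouped x-axis value toward the middle of the two groups, while the ungrouped median stays at the group A value.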
useMedian: logical indicating whether to use the median
values when calculating the x-axis and during data centering.
The median naturally reduces the effect of outlier points on
the resulting MA-plots, compared to using the mean.
When useMedian=FALSE, the mean value is used.
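As a small illustration of why the median is less sensitive to outliers, consider one row of values with a single extreme point. The values are made up for this sketch.

```r
# One measurement row with a single outlier
row_values <- c(1, 2, 3, 100)

# The median is barely affected by the outlier
row_median <- median(row_values, na.rm = TRUE)

# The mean is pulled strongly toward the outlier
row_mean <- mean(row_values, na.rm = TRUE)

print(c(median = row_median, mean = row_mean))
```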
useMean: (deprecated) logical indicating whether to use the
mean instead of the median value. This argument is being removed
in order to improve consistency with other Jam package functions.
whichSamples: character vector containing colnames(x), or
integer vector referencing column numbers in x. This argument
specifies which columns to return, but does not change the columns
used to define the group centering values. For example, the
group medians are calculated using all the data, but only the
samples in whichSamples are centered to produce MA-plot data.
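The following base R sketch illustrates the distinction described for whichSamples: the centering values come from all columns, and only the requested columns are returned. It is a simplified stand-in and not the package code; all object names are illustrative.

```r
x <- matrix(c(2, 4, 9,
              1, 3, 8),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("s1", "s2", "s3")))
whichSamples <- c("s1", "s2")

# Row medians use every column, including s3
row_med_all <- apply(x, 1, median, na.rm = TRUE)

# Only the requested columns are centered and returned
y_subset <- x[, whichSamples, drop = FALSE] - row_med_all

# For contrast: medians computed from the subset alone
# would give different centered values
row_med_subset <- apply(x[, whichSamples, drop = FALSE], 1, median, na.rm = TRUE)
y_subset_only <- x[, whichSamples, drop = FALSE] - row_med_subset

print(y_subset)
print(y_subset_only)
```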
noise_floor: numeric value indicating the minimum numeric value
allowed in the input matrix x. When NULL or -Inf, no noise
floor is applied. It is common to set noise_floor=0 to limit
MA-plot data to use values zero and above.
noise_floor_value: single numeric value used to replace numeric
values at or below noise_floor when noise_floor is not NULL.
By default, noise_floor_value=noise_floor, which means values at or below
the noise floor are set to the floor. Another useful option is
noise_floor_value=NA, which has the effect of removing the point
from the MA-plot altogether. This option is recommended for sparse
data matrices where values at or below zero indicate
missing data (zero-inflated data) and do not
necessarily reflect an actual value of zero.
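The two replacement behaviours described for noise_floor and noise_floor_value can be sketched in base R as follows. This mimics the described effect on a toy matrix and is not taken from the package source.

```r
x <- matrix(c(-1, 0, 2.5,
               3, 0, 4.0),
            nrow = 2, byrow = TRUE)
noise_floor <- 0

# Analogue of the default noise_floor_value=noise_floor:
# values at or below the floor are set to the floor
x_floor <- x
x_floor[!is.na(x_floor) & x_floor <= noise_floor] <- noise_floor

# Analogue of noise_floor_value=NA:
# values at or below the floor are removed from the plot data
x_na <- x
x_na[!is.na(x_na) & x_na <= noise_floor] <- NA

print(x_floor)
print(x_na)
```

With zero-inflated data the second form keeps zeros from piling up as a visible stripe of points on the MA-plot.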
naValue: single numeric value used to replace any NA values in
the input matrix x. This argument can be useful to replace
NA values with something like zero.
mad_row_min: numeric value defining the minimum group
value, corresponding to the x-axis position on the MA-plot,
required for a row to be included in the MAD calculation.
This threshold is useful to filter outlier data below a noise
threshold, so that the MAD calculation will include only the
data above that value. For example, with count data, it is
useful to filter out counts below roughly 8, where Poisson
noise is a more dominant component than real count data.
Remember that count data should already be log2-transformed,
so the threshold should also be identically transformed,
for example using log2(1 + 8) to set a minimum count
threshold of at least 8.
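A short sketch of the threshold transformation: the counts and the threshold go through the same log2(1 + x) transform before they are compared. Whether jammacalc() treats the threshold as inclusive is an assumption here; the sketch uses >=.

```r
# Hypothetical group summary values on the count scale
counts <- c(0, 3, 8, 20, 100)

# Same transform applied to the data and to the threshold
log_counts <- log2(1 + counts)
mad_row_min <- log2(1 + 8)

# Rows at or above the threshold would enter the MAD calculation
# (inclusive comparison is an assumption of this sketch)
keep <- log_counts >= mad_row_min

print(data.frame(counts, log_counts, keep))
```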
grouped_mad: logical indicating whether the MAD value
should be calculated per group when centerGroups is
supplied, from which the MAD factor values are derived.
When TRUE, it has the effect of highlighting outliers
within each group using the variability in that group.
When FALSE, the overall MAD is calculated, and a
particularly high variability group may have all its
group members labeled with a high MAD factor.
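The effect of grouped versus overall MAD can be sketched as below. The exact MAD and MAD factor definitions used by jammacalc() are not stated here, so this sketch assumes a per-sample MAD equal to the median absolute centered value, and a MAD factor equal to the sample MAD divided by the median MAD of its reference set. Both definitions are assumptions for illustration.

```r
# Centered y-values: group B is far more variable than group A
y <- cbind(a1 = c(1, -1, 1, -1),
           a2 = c(1, -1, 1, -1),
           b1 = c(4, -4, 4, -4),
           b2 = c(4, -4, 4, -4))
groups <- c("A", "A", "B", "B")

# Assumed per-sample MAD: median absolute centered value
sample_mad <- apply(abs(y), 2, median, na.rm = TRUE)

# Analogue of grouped_mad=FALSE: compare each sample to the overall median MAD
mad_factor_overall <- sample_mad / median(sample_mad)

# Analogue of grouped_mad=TRUE: compare each sample to its own group's median MAD
mad_factor_grouped <- sample_mad / ave(sample_mad, groups, FUN = median)

print(rbind(mad_factor_overall, mad_factor_grouped))
```

Under these assumptions, every group B sample receives a high overall factor, while the grouped factor is 1 for all samples because none is unusual within its own group.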
centerFunc: function used for centering data, one of
centerGeneData() (the default) or centerGeneData_v1().
This argument will be removed in the near future and is mainly
intended to allow testing the two centering functions.
The following arguments are passed to this function:
  x: the input numeric data matrix
  na.rm: logical whether to ignore NA values. Always use na.rm=TRUE.
  controlSamples: character optional subset of colnames(x) to
    use as reference controls during centering
  centerGroups: character vector of groups for colnames(x)
  controlFloor: numeric optional minimum allowed value for control
    summary prior to centering
  naControlAction: character string for how to handle entirely NA
    control groups during centering
  naControlFloor: numeric used when naControlAction="floor" and
    all control values are NA. One numeric value is inserted into
    the control group.
  useMedian: logical whether to use median (TRUE) or mean (FALSE)
  returnGroups: logical whether to return summary of group assignment
    in attribute "center_df"
  returnGroupedValues: logical whether to return group summary values
    in attribute "x_group"
  ...: other arguments are passed along via the ... argument.
returnType: character string indicating the format of data
to return: "ma_list" is a list of MA-plot two-column
numeric matrices with colnames c("x","y"); "tidy"
returns a tall data.frame suitable for use in ggplot2.
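The relationship between the two return formats can be sketched by reshaping a hand-built list of two-column matrices into a tall data.frame. The column names "name" and "item" are hypothetical; the actual columns returned for returnType="tidy" are not listed here.

```r
# Hand-built stand-in for the "ma_list" format: one x/y matrix per sample
ma_list <- list(
  s1 = matrix(c(5, 6, -0.5, 0.2), ncol = 2,
              dimnames = list(c("geneA", "geneB"), c("x", "y"))),
  s2 = matrix(c(5, 6, 0.5, -0.2), ncol = 2,
              dimnames = list(c("geneA", "geneB"), c("x", "y"))))

# Stack the matrices into one tall data.frame, one row per sample and item
# (column names "name" and "item" are illustrative assumptions)
tidy <- do.call(rbind, lapply(names(ma_list), function(nm) {
  m <- ma_list[[nm]]
  data.frame(name = nm, item = rownames(m),
             x = m[, "x"], y = m[, "y"],
             row.names = NULL, stringsAsFactors = FALSE)
}))

print(tidy)
```

A tall layout like this allows one ggplot2 call to facet by sample, whereas the list form suits plotting one panel at a time.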
verbose: logical indicating whether to print verbose output.
...: additional arguments are ignored.
This function takes a numeric matrix as input, and calculates
data sufficient to produce MA-plots. The default output is a
list of two-column numeric matrices with "x" and "y" coordinates,
representing the group median and difference from median,
respectively.
The mean value can be used by setting useMedian=FALSE.
Samples can be grouped using the argument centerGroups.
In this case the y-axis value will be "difference from
group median."
Control samples can be specified for centering using the
argument controlSamples. In this case, the y-axis value will
be "difference from control median".
Sample grouping and control samples can be combined, in which
case the y-axis values will be "difference from the control
median within the centering group."
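The core calculation described above can be sketched in base R without the package. This is a simplified illustration of "median on the x-axis, difference from median on the y-axis", plus the control-centered variant. It is not the jammacalc() implementation, and the toy data are assumptions.

```r
x <- matrix(c(4, 5, 6, 9,
              2, 2, 3, 7),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "t1", "t2")))

# Default view: x-axis is the row median, y-axis is the difference from it
row_med <- apply(x, 1, median, na.rm = TRUE)
ma_list <- lapply(colnames(x), function(j) {
  cbind(x = row_med, y = x[, j] - row_med)
})
names(ma_list) <- colnames(x)

# Control-centered view: y-axis is the difference from the control median
controlSamples <- c("c1", "c2")
ctrl_med <- apply(x[, controlSamples, drop = FALSE], 1, median, na.rm = TRUE)
y_ctrl <- x - ctrl_med

print(ma_list$t2)
print(y_ctrl)
```

Each element of the list is a two-column matrix with colnames "x" and "y", matching the shape described for the default output.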
Other jam matrix functions:
centerGeneData(), jammanorm(), matrix_to_column_rank()