Normalize SummarizedExperiment data
Usage
se_normalize(
se,
method = c("quantile", "jammanorm", "limma_batch_adjust", "TMM", "TMMwsp", "RLE"),
assay_names = NULL,
output_method_prefix = NULL,
output_assay_names = NULL,
genes = NULL,
samples = NULL,
params = list(
  quantile = list(ties = TRUE),
  jammanorm = list(controlGenes = NULL, minimum_mean = 0, controlSamples = NULL,
    centerGroups = NULL, useMedian = FALSE, noise_floor = NULL,
    noise_floor_value = NULL),
  limma_batch_adjust = list(batch = NULL, group = NULL),
  TMM = list(refColumn = NULL, logratioTrim = 0.3, sumTrim = 0.05,
    doWeighting = TRUE, Acutoff = NULL),
  TMMwsp = list(refColumn = NULL, logratioTrim = 0.3, sumTrim = 0.05,
    doWeighting = TRUE, Acutoff = NULL),
  RLE = list(refColumn = NULL, logratioTrim = 0.3, sumTrim = 0.05,
    doWeighting = TRUE, Acutoff = NULL)),
normgroup = NULL,
floor = 0,
enforce_norm_floor = TRUE,
output_sep = "_",
override = TRUE,
populate_mcols = TRUE,
verbose = FALSE,
...
)
Arguments
- se
SummarizedExperiment object
- method
character vector indicating which normalization method(s) to apply.
  - "quantile": quantile normalization via limma::normalizeQuantiles()
  - "jammanorm": log-ratio normalization via jamma::jammanorm()
  - "limma_batch_adjust": batch adjustment via limma::removeBatchEffect(), recommended for data visualization, but not recommended for downstream statistical comparisons.
  - "TMM": trimmed mean of M-values via edgeR::calcNormFactors()
  - "TMMwsp": TMM with singleton pairing via edgeR::calcNormFactors()
  - "RLE": relative log expression via edgeR::calcNormFactors()
- assay_names
character vector of one or more names(assays(se)) that indicates which numeric matrix to use during normalization. When multiple values are provided, each matrix is normalized independently by each method.
- output_method_prefix
character vector (optional) with custom method prefix values to use when creating the new assay name for each normalization. It must have length equal to length(method), to be applied to each method in order. Note that output_assay_names takes priority, and when it is defined the output_method_prefix entries are ignored.
Consider these arguments:
assay_names="counts", method="limma_batch_adjust", output_method_prefix="lba"
The assay name created during normalization will be "lba_counts".
- output_assay_names
character vector (optional) which overrides the default method for defining assay names for normalized data. This vector length must equal length(method) * length(assay_names), and will be applied in the order data is normalized:
  1. assay_names are iterated.
  2. For each value in assay_names, each normalization in method is applied.
Therefore output_assay_names should follow this order: method1_assay1, method2_assay1, method1_assay2, method2_assay2.
- genes
character vector (optional) used to define a subset of gene rows in se to use for normalization. Values must match rownames(se).
- samples
character vector (optional) used to define a subset of sample columns in se to use for normalization. Values must match colnames(se).
- params
list (optional) of parameters specific to each normalization method, passed to matrix_normalize(). Any value not defined in the supplied params uses the default value in matrix_normalize(). For example, params=list(jammanorm=list(minimum_mean=2)) uses minimum_mean=2 and keeps the other default values relevant to the jammanorm normalization method.
- normgroup
character or equivalent vector that defines subgroups of samples, where each normgroup is normalized independently of the others. When NULL, all data is normalized together, which is the default. The normgroup vector is expected to be in the same order as samples, or names(normgroup) must contain all samples.
- output_sep
character string used as a delimiter between the method and the assay_names to define the output assay name. For example, when assay_names="counts", method="quantile", and output_sep="_", the new assay name will be "quantile_counts".
- override
logical indicating whether to override any pre-existing matrix values with the same output assay name. When override=FALSE and the output assay name already exists, the normalization will not be performed.
- populate_mcols
logical indicating whether to populate normalization details into mcols(assays(se)), including the normalization method, the source assay_name used during normalization, and values from params.
- verbose
logical indicating whether to print verbose output.
- ...
additional arguments are passed to matrix_normalize().
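The way partially specified params fall back to defaults can be pictured with base R. The following is a minimal sketch only: default_params is an abridged copy of the defaults shown in Usage, and using modifyList() is an assumption for illustration, not necessarily how matrix_normalize() merges values internally.

```r
# Abridged defaults copied from the Usage section (illustrative subset only)
default_params <- list(
   quantile=list(ties=TRUE),
   jammanorm=list(minimum_mean=0, useMedian=FALSE))

# User supplies only the value they want to change
user_params <- list(jammanorm=list(minimum_mean=2))

# modifyList() merges nested lists recursively, so entries the user
# did not mention keep their default values (assumed merge strategy)
merged_params <- modifyList(default_params, user_params)

merged_params$jammanorm$minimum_mean
merged_params$jammanorm$useMedian
merged_params$quantile$ties
```

In this sketch only minimum_mean changes, while useMedian and the quantile settings keep their defaults.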
Value
SummarizedExperiment object where the normalized output
is added to assays(se) using the naming format method_assayname.
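The naming format can be sketched in base R. The helper make_output_names() below is hypothetical and is not part of jamses. It assumes, following the argument descriptions, that assay names are iterated in the outer loop and methods in the inner loop, and that output_method_prefix replaces the method name when supplied.

```r
# Hypothetical helper (not a jamses function): compose output assay names
make_output_names <- function(assay_names, method,
   output_method_prefix=NULL, output_sep="_")
{
   # use the custom prefix when given, otherwise the method name itself
   prefix <- if (is.null(output_method_prefix)) method else output_method_prefix
   stopifnot(length(prefix) == length(method))
   # assumed order: for each assay name, apply every method in turn
   unlist(lapply(assay_names, function(assay_name) {
      paste(prefix, assay_name, sep=output_sep)
   }))
}

# one method with a custom prefix
make_output_names("counts", "limma_batch_adjust", output_method_prefix="lba")

# two assays and two methods give four output names
make_output_names(c("raw_counts", "counts"), c("quantile", "jammanorm"))
```

Under these assumptions the first call gives "lba_counts", matching the example described for output_method_prefix.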
Details
This function applies one or more data normalization methods
to an input SummarizedExperiment object. The normalization is
applied to one or more matrices stored in assays(se),
and each matrix is normalized independently.
Note that supplying genes and samples will apply normalization
to only those genes and samples, and this data will be
stored in the full SummarizedExperiment object se with
NA values used to fill any values not present in genes
or samples.
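The NA-filling behavior described above can be illustrated with a small base R sketch. The matrix, the gene and sample subsets, and the median-centering step are all made up for illustration; the centering is a stand-in and is not one of the normalization methods offered by se_normalize().

```r
# Toy matrix standing in for one assay (values are simulated)
set.seed(1)
full_matrix <- matrix(rnorm(20, mean=8), nrow=5,
   dimnames=list(paste0("gene", 1:5), paste0("sample", 1:4)))

# Subsets analogous to the genes and samples arguments
genes <- c("gene1", "gene2", "gene3")
samples <- c("sample1", "sample2")

# Stand-in normalization (assumption): subtract each column median
sub_matrix <- full_matrix[genes, samples, drop=FALSE]
sub_norm <- sweep(sub_matrix, 2, apply(sub_matrix, 2, median), "-")

# Store the result in a matrix with the full dimensions,
# leaving NA wherever a gene or sample was not included
norm_matrix <- matrix(NA_real_, nrow=nrow(full_matrix), ncol=ncol(full_matrix),
   dimnames=dimnames(full_matrix))
norm_matrix[genes, samples] <- sub_norm
norm_matrix
```

Keeping the full dimensions is what allows the normalized matrix to sit alongside the other assays in the same SummarizedExperiment object.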
For example, if assay_names contains two assay names
and method contains two methods, the output will include
four normalizations, where each assay name is normalized two ways.
The output assay names will be something like "method1_assay1",
"method2_assay1", "method1_assay2", "method2_assay2".
It is not always necessary to normalize data by multiple different
methods. However, when two methods are similar and need to be
compared, the SummarizedExperiment object is a convenient
place to store different normalization results for downstream
comparison. Further, the function se_contrast_stats() is able
to apply equivalent statistical contrasts to each normalization,
and returns an array of statistical hits which is convenient
for direct comparison of results.
This function calls matrix_normalize() to perform each normalization
step; see that function's description for details on each method.
See also
Other jamses SE utilities:
make_se_test(),
se_collapse_by_column(),
se_collapse_by_row(),
se_detected_rows(),
se_rbind(),
se_to_rowcoldata()
Examples
if (jamba::check_pkg_installed("farrisdata")) {
   # se_normalize
   # suppressPackageStartupMessages(library(SummarizedExperiment))
   GeneSE <- farrisdata::farrisGeneSE
   samples <- colnames(GeneSE)
   genes <- rownames(GeneSE)
   GeneSE <- se_normalize(GeneSE,
      genes=genes,
      samples=samples,
      assay_names=c("raw_counts", "counts"),
      method="jammanorm",
      params=list(jammanorm=list(minimum_mean=5)))
   SummarizedExperiment::mcols(SummarizedExperiment::assays(GeneSE))
   names(SummarizedExperiment::assays(GeneSE))

   # review normalization factor values
   round(digits=3, attr(
      SummarizedExperiment::assays(GeneSE)$jammanorm_raw_counts, "nf"))

   # the data in "counts" was already normalized
   # so the normalization factors are very near 0 as expected
   round(digits=3,
      attr(SummarizedExperiment::assays(GeneSE)$jammanorm_counts, "nf"))

   # note that housekeeper genes are supplied in params
   # also this demonstrates output_method_prefix
   set.seed(123)
   hkgenes <- sample(rownames(GeneSE), 1000)
   GeneSE <- se_normalize(GeneSE,
      genes=genes,
      samples=samples,
      assay_names=c("raw_counts"),
      method="jammanorm",
      output_method_prefix="hkjammanorm",
      params=list(jammanorm=list(minimum_mean=5,
         controlGenes=hkgenes)))
   SummarizedExperiment::mcols(SummarizedExperiment::assays(GeneSE))

   # example showing quantile normalization
   GeneSE <- se_normalize(GeneSE,
      assay_names=c("raw_counts"),
      method="quantile")
   SummarizedExperiment::mcols(SummarizedExperiment::assays(GeneSE))

   # example showing quantile normalization with custom output_assay_names
   GeneSE <- se_normalize(GeneSE,
      assay_names=c("raw_counts"),
      method="quantile",
      output_assay_names="newquantile_raw_counts")
   SummarizedExperiment::mcols(SummarizedExperiment::assays(GeneSE))
}
#> DataFrame with 7 rows and 6 columns
#>                                    assay_name normalization_method
#>                                   <character>          <character>
#> counts                                 counts                   NA
#> raw_counts                         raw_counts                   NA
#> jammanorm_raw_counts     jammanorm_raw_counts            jammanorm
#> jammanorm_counts             jammanorm_counts            jammanorm
#> hkjammanorm_raw_counts hkjammanorm_raw_counts            jammanorm
#> quantile_raw_counts       quantile_raw_counts             quantile
#> newquantile_raw_counts newquantile_raw_counts             quantile
#>                        source_assay_name minimum_mean              controlGenes
#>                              <character>    <numeric>                    <list>
#> counts                                NA           NA                        NA
#> raw_counts                            NA           NA                        NA
#> jammanorm_raw_counts          raw_counts            5                        NA
#> jammanorm_counts                  counts            5                        NA
#> hkjammanorm_raw_counts        raw_counts            5 Aldh3b1,Gm8194,Gm7776,...
#> quantile_raw_counts           raw_counts           NA                        NA
#> newquantile_raw_counts        raw_counts           NA                        NA
#>                             ties
#>                        <logical>
#> counts                        NA
#> raw_counts                    NA
#> jammanorm_raw_counts          NA
#> jammanorm_counts              NA
#> hkjammanorm_raw_counts        NA
#> quantile_raw_counts         TRUE
#> newquantile_raw_counts      TRUE