Sort biological sample labels for experimental design

sortSamples(
  x,
  controlTerms = c("WT|wildtype", "(^|[-_ ])(NT|NTC)($|[-_ ]|[0-9])", "ETOH",
    "control|ctrl|ctl", "Vehicle|veh", "none|empty|blank", "scramble", "ttx", "PBS",
    "knockout", "mutant"),
  sortFunc = jamba::mixedSort,
  preControlTerms = NULL,
  postControlTerms = NULL,
  ignore.case = TRUE,
  boundary = TRUE,
  perl = boundary,
  keepFactorsAsIs = TRUE,
  ...
)

Arguments

x

character vector or factor

controlTerms

character vector of regular expression patterns that identify control terms. Patterns are matched in order, and matching entries are returned in that same order.

sortFunc

function used for sorting; the default jamba::mixedSort() performs alphanumeric sorting.

preControlTerms

vector or NULL, optional control terms or regular expressions to use before the controlTerms above. This argument is used as a convenient prefix to the default terms.

postControlTerms

vector or NULL, optional control terms or regular expressions to use after the controlTerms above. This argument is used as a convenient suffix to the default terms.

ignore.case

logical passed to jamba::provigrep() indicating whether to ignore case when matching patterns.

boundary

logical indicating whether to require a word boundary at either the start or the end of each control term. When TRUE, perl defaults to TRUE, and the boundary may be either a perl word boundary or an underscore "_".

perl

logical indicating whether to use Perl regular expression pattern matching.

keepFactorsAsIs

logical indicating whether to maintain the existing factor level order when x is supplied as a factor. When keepFactorsAsIs=TRUE and x is a factor, only sort(x) is returned.

...

additional arguments are ignored.
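
The effect of the boundary argument can be illustrated with base R alone. The sketch below is only an assumption about the kind of pattern involved, not the pattern the function actually builds: it requires a perl word boundary or an underscore on either side of the term "WT". The underscore needs explicit handling because perl treats "_" as a word character, so "\\b" does not match between "_" and "W".

```r
labels <- c("WT_1", "sample_WT", "NEWTON_1")

# Without any boundary requirement, "WT" also matches inside "NEWTON"
plain <- grepl("WT", labels, ignore.case = TRUE)

# Illustrative pattern only (an assumption, not the package's own):
# require a perl word boundary or "_" before or after the term
bounded <- grepl("(\\b|_)WT|WT(\\b|_)", labels,
  ignore.case = TRUE, perl = TRUE)

print(plain)
print(bounded)
```

Here the unbounded pattern flags all three labels, while the bounded pattern skips "NEWTON_1".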

Value

character vector ordered so that entries matching control terms are placed before non-control entries.

Details

This function sorts a vector of sample labels using common heuristics that place typical control group terms before test groups. For example, "Vehicle" would be returned before "Treatment" since "Vehicle" is a recognized control term.
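
The control-first heuristic can be sketched in base R. This is a simplified illustration, not the package's implementation (which relies on jamba::provigrep()): the hypothetical helper rankByControl() assigns each label the index of the first pattern it matches, and labels are then ordered by that rank.

```r
# Hypothetical helper, not part of the package: rank each label by the
# first control pattern it matches; non-matching labels rank last.
rankByControl <- function(x, patterns, ignore.case = TRUE) {
  rank <- rep(length(patterns) + 1L, length(x))
  # loop from last pattern to first so earlier patterns take priority
  for (i in rev(seq_along(patterns))) {
    hit <- grepl(patterns[i], x, ignore.case = ignore.case)
    rank[hit] <- i
  }
  rank
}

labels <- c("Treatment", "Vehicle", "WT", "Drug")
patterns <- c("WT|wildtype", "Vehicle|veh")
ranks <- rankByControl(labels, patterns)

# order by control rank first, then by label (plain C-locale order)
sorted <- labels[order(ranks, labels, method = "radix")]
print(sorted)
#> [1] "WT" "Vehicle" "Drug" "Treatment"
```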

It also uses jamba::mixedSort() for alphanumeric sorting, so that, for example, "Time_5hr" is sorted before "Time_12hr".
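
To see why alphanumeric sorting matters, the base R sketch below compares a plain character sort with an ordering by the embedded number. The numeric extraction is a rough stand-in chosen for illustration and assumes one number per label; it is not how jamba::mixedSort() works internally.

```r
x <- c("Time_12hr", "Time_5hr")

# Plain character sort compares "1" with "5", so 12hr comes first
plainSorted <- sort(x, method = "radix")

# Rough stand-in for alphanumeric sorting: order by the embedded number
num <- as.numeric(gsub("[^0-9]", "", x))
naturalSorted <- x[order(num)]

print(plainSorted)
print(naturalSorted)
```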

See also

Examples

# the defaults perform well for clear descriptors
sortSamples(c("Trt_12h", "Trt_9h", "Trt_1h", "Trt_9h", "Vehicle"))
#> [1] "Vehicle" "Trt_1h" "Trt_9h" "Trt_9h" "Trt_12h"

# custom terms can be added before the usual control terms
sortSamples(c("Trt_12h", "Trt_9h", "Trt_1h", "Trt_9h", "Fixated", "Vehicle"),
  preControlTerms="fixate")
#> [1] "Fixated" "Vehicle" "Trt_1h" "Trt_9h" "Trt_9h" "Trt_12h"

# custom terms can be added after the usual control terms
sortSamples(c("Trt_12h", "Trt_9h", "Trt_1h", "Trt_9h", "Fixated", "Vehicle"),
  postControlTerms="fixate")
#> [1] "Vehicle" "Fixated" "Trt_1h" "Trt_9h" "Trt_9h" "Trt_12h"