Sort biological sample labels for experimental design

Usage

sort_samples(
  x,
  control_terms = c("WT|wildtype", "normal|healthy|healthycontrol|^hc$",
    "control|ctrl|ctl", "(^|[-_ ])(NT|NTC)($|[-_ ]|[0-9])", "none|empty|blank",
    "untreated|untrt|untreat", "Vehicle|veh", "ETOH|ethanol", "scramble|mock|sham",
    "ttx|PBS", "knockout", "mutant"),
  sortFunc = jamba::mixedSort,
  pre_control_terms = NULL,
  post_control_terms = NULL,
  ignore.case = TRUE,
  boundary = TRUE,
  perl = boundary,
  keep_factor_order = TRUE,
  ...
)

Arguments

x

character vector or factor

control_terms

character vector of regular expression patterns that identify control terms. Patterns are matched in the order given, and matching entries are returned in that same order.

pre_control_terms

vector or NULL, optional control terms or regular expressions applied before the control_terms above. This is a convenient way to prepend custom terms to the defaults.

post_control_terms

vector or NULL, optional control terms or regular expressions applied after the control_terms above. This is a convenient way to append custom terms to the defaults.

ignore.case

logical passed to jamba::provigrep() indicating whether pattern matching should ignore case.

boundary

logical indicating whether to require a word boundary at either the start or the end of each control term. When TRUE, perl=TRUE is used by default, and either a Perl word boundary or an underscore "_" is accepted as the boundary.
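The underscore allowance matters because "_" counts as a word character in regular expressions, so a plain word boundary does not match between "_" and a following letter. The base R sketch below illustrates that point only; the pattern "(\\b|_)WT" and the example labels are assumptions made for illustration, and the exact pattern that sort_samples() builds is not shown on this page.

```r
# Illustration only: not the pattern sort_samples() constructs internally.
labels <- c("KO_WT", "WT_rep1", "KOWT")

# "_" is a word character, so \b does not match between "_" and "W"
plain_boundary <- grepl("\\bWT", labels, perl = TRUE)
plain_boundary
#> [1] FALSE  TRUE FALSE

# accepting an underscore as an alternative boundary also catches "KO_WT",
# while "KOWT" (no boundary at all) is still rejected
underscore_ok <- grepl("(\\b|_)WT", labels, perl = TRUE)
underscore_ok
#> [1]  TRUE  TRUE FALSE
```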

perl

logical indicating whether to use Perl regular expression pattern matching.

keep_factor_order

logical indicating whether to maintain factor level order when x is supplied as a factor. When x is a factor and keep_factor_order=TRUE, only sort(x) is returned.
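As a reminder of what sort() does with a factor, the base R sketch below shows that values are ordered by factor level rather than alphabetically or by control term. The labels and level order are made up for illustration.

```r
# Hypothetical labels; levels are deliberately not alphabetical
x_factor <- factor(
  c("Vehicle", "Trt_1h", "Vehicle", "Trt_9h"),
  levels = c("Trt_9h", "Trt_1h", "Vehicle"))

# sort() on a factor follows the level order, so "Vehicle" stays last here
sorted_labels <- as.character(sort(x_factor))
sorted_labels
#> [1] "Trt_9h"  "Trt_1h"  "Vehicle" "Vehicle"
```

To have control terms take priority for such input, one option is to pass as.character(x) or set keep_factor_order=FALSE.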

...

additional arguments are ignored.

Value

character vector ordered so that entries matching control terms are placed before non-control entries.

Details

This function sorts a vector of sample labels using common heuristics that place typical control group terms before test groups. For example, "Vehicle" would be returned before "Treatment" since "Vehicle" is a recognized control term.

It also uses jamba::mixedSort() for proper alphanumeric sorting, so that, for example, "Time_5hr" is sorted before "Time_12hr".
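The two ideas above, control terms first and then alphanumeric order, can be sketched in base R as follows. This is a minimal illustration under stated assumptions and not the package implementation: control_first() is a hypothetical helper, its two default patterns are a small subset chosen for the example, and the crude "first number" ordering stands in for jamba::mixedSort().

```r
# Hypothetical helper, not part of the package: control-first ordering in base R
control_first <- function(x, control_terms = c("vehicle|veh", "control|ctrl")) {
  # rank each label by the first control pattern it matches;
  # labels matching no pattern get the last rank
  pattern_rank <- rep(length(control_terms) + 1L, length(x))
  for (i in rev(seq_along(control_terms))) {
    pattern_rank[grepl(control_terms[i], x, ignore.case = TRUE)] <- i
  }
  # crude stand-in for alphanumeric sorting: compare the text with digits
  # removed, then the first number found in the label (0 when there is none)
  text_part <- gsub("[0-9]+", "", x)
  first_num <- suppressWarnings(
    as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", x)))
  first_num[!grepl("[0-9]", x)] <- 0
  x[order(pattern_rank, text_part, first_num)]
}

sketch_result <- control_first(
  c("Trt_12h", "Trt_9h", "Trt_1h", "Trt_9h", "Vehicle"))
sketch_result
#> [1] "Vehicle" "Trt_1h"  "Trt_9h"  "Trt_9h"  "Trt_12h"
```

A plain sort() on the same input would place "Trt_12h" before "Trt_1h" and "Vehicle" last, which is the behavior the function is designed to avoid.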

Examples

# the defaults perform well for clear descriptors
sort_samples(c("Trt_12h", "Trt_9h", "Trt_1h", "Trt_9h", "Vehicle"));
#> [1] "Vehicle" "Trt_1h"  "Trt_9h"  "Trt_9h"  "Trt_12h"

# custom terms can be added before the usual control terms
sort_samples(c("Trt_12h", "Trt_9h", "Trt_1h", "Trt_9h", "Fixated", "Vehicle"),
   pre_control_terms="fixate");
#> [1] "Fixated" "Vehicle" "Trt_1h"  "Trt_9h"  "Trt_9h"  "Trt_12h"

# custom terms can be added after the usual control terms
sort_samples(c("Trt_12h", "Trt_9h", "Trt_1h", "Trt_9h", "Fixated", "Vehicle"),
   post_control_terms="fixate");
#> [1] "Vehicle" "Fixated" "Trt_1h"  "Trt_9h"  "Trt_9h"  "Trt_12h"