Find colname by string or pattern

Find colname by string or pattern, with option to require non-NA values.

find_colname(
  pattern,
  x,
  max = 1,
  index = FALSE,
  require_non_na = TRUE,
  col_types = NULL,
  exclude_pattern = NULL,
  verbose = FALSE,
  ...
)

Arguments

pattern: character vector of text strings and/or regular expression patterns.
x: data.frame or other object that contains colnames(x).
max: integer maximum number of entries to return.
index: logical indicating whether to return the column index (the integer column number) instead of the colname.
require_non_na: logical indicating whether to require at least one non-NA value in the matching colname. When require_non_na=TRUE and all values in a column are NA, that colname is not returned by this function.
exclude_pattern: character vector of colnames or patterns to exclude from returned results.
verbose: logical indicating whether to print verbose output.
...: additional arguments are ignored.
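
As a rough illustration of how require_non_na and exclude_pattern narrow the candidate colnames, the following base R sketch applies both filters. The helper filter_candidates() is hypothetical and is not part of the package; in particular, treating exclude_pattern as case-insensitive is an assumption made only for this sketch.

```r
# Hypothetical helper (not the package's implementation): keep only
# candidate colnames with at least one non-NA value, then drop any
# colname that matches an exclude pattern.
filter_candidates <- function(x, candidates,
   require_non_na=TRUE, exclude_pattern=NULL) {
   if (require_non_na) {
      # TRUE when the column has at least one non-NA value
      has_value <- vapply(candidates, function(i) {
         any(!is.na(x[[i]]))
      }, logical(1));
      candidates <- candidates[has_value];
   }
   # Assumption: exclude patterns are matched case-insensitively
   for (p in exclude_pattern) {
      candidates <- candidates[!grepl(p, candidates, ignore.case=TRUE)];
   }
   candidates;
}

df <- data.frame(
   `P.Value`=c(NA_real_, NA_real_),
   `adj.P.Val`=c(0.01, 0.2),
   `pvalue`=c(0.5, NA),
   check.names=FALSE);
kept <- filter_candidates(df, colnames(df), exclude_pattern="adj");
print(kept);
#> [1] "pvalue"
```

Here "P.Value" is dropped because all of its values are NA, and "adj.P.Val" is dropped by the exclude pattern.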

Details

This simple utility function helps find the most appropriate matching colname given one or more character strings or patterns.

By default (max=1) it returns only the best match. With max=Inf it returns all matches in order of preference.

The order of matching:

1. Match the exact colname.
2. Match case-insensitive colname.
3. Match the beginning of each colname.
4. Match the end of each colname.
5. Match anywhere in each colname.

The goal is to use something like c("p.value", "pvalue", "pval") and be able to find colnames with these variations:

P.Value
P.Value Group-Control
Group-Control P.Value
pvalue

Even if the data contains c("P.Value", "adj.P.Val"), as returned by limma::topTable() for example, the pattern c("p.val") will preferentially match "P.Value" and not "adj.P.Val", because a match at the beginning of a colname ranks ahead of a match at the end.
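
The order of matching described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper match_in_order() is hypothetical, and several choices are assumptions, namely that each tier is tried for every pattern before moving to the next tier, that tiers 3 to 5 are case-insensitive, and that an empty result is returned as character(0) where find_colname() returns NULL.

```r
# Hypothetical sketch of tiered matching (not the package's code).
# Assumption: tiers are the outer loop, patterns the inner loop.
match_in_order <- function(pattern, cn, max=1) {
   tiers <- list(
      function(p) cn[cn == p],                                     # 1. exact
      function(p) cn[tolower(cn) == tolower(p)],                   # 2. case-insensitive
      function(p) cn[grepl(paste0("^", p), cn, ignore.case=TRUE)], # 3. beginning
      function(p) cn[grepl(paste0(p, "$"), cn, ignore.case=TRUE)], # 4. end
      function(p) cn[grepl(p, cn, ignore.case=TRUE)]);             # 5. anywhere
   found <- character(0);
   for (tier in tiers) {
      for (p in pattern) {
         found <- c(found, tier(p));
      }
   }
   # keep the first occurrence of each colname, in order of preference
   found <- unique(found);
   found[seq_len(min(max, length(found)))];
}

# colnames in the style returned by limma::topTable()
limma_cn <- c("logFC", "AveExpr", "t", "P.Value", "adj.P.Val", "B");
best <- match_in_order("p.val", limma_cn);
print(best);
#> [1] "P.Value"

# the colname variations listed above, all matches in order of preference
variant_cn <- c("Group-Control P.Value", "P.Value Group-Control",
   "P.Value", "pvalue");
all_hits <- match_in_order(c("p.value", "pvalue", "pval"), variant_cn, max=Inf);
print(all_hits);
#> [1] "pvalue"                "P.Value"               "P.Value Group-Control"
#> [4] "Group-Control P.Value"
```

In the first call, "P.Value" is found at tier 3 (beginning of the colname), while "adj.P.Val" would only be reached at tier 4 (end of the colname), so it is not returned when max=1.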

Examples

x <- data.frame(
   `Gene`=paste0("gene", LETTERS[1:25]),
   `log2fold Group-Control`=rnorm(25)*2,
   `P.Value Group-Control`=10^-rnorm(25)^2,
   check.names=FALSE);
x[["fold Group-Control"]] <- log2fold_to_fold(x[["log2fold Group-Control"]]);
x[["adj.P.Val Group-Control"]] <- x[["P.Value Group-Control"]];

print(head(x));
#>    Gene log2fold Group-Control P.Value Group-Control fold Group-Control
#> 1 geneA              1.5163421          8.252273e-01           2.860648
#> 2 geneB              0.9225730          6.387370e-01           1.895493
#> 3 geneC              0.9680046          2.502431e-01           1.956133
#> 4 geneD             -1.6330128          1.375335e-02          -3.101600
#> 5 geneE             -1.7015228          3.603060e-06          -3.252441
#> 6 geneF              1.2107613          6.554372e-01           2.314597
#>   adj.P.Val Group-Control
#> 1            8.252273e-01
#> 2            6.387370e-01
#> 3            2.502431e-01
#> 4            1.375335e-02
#> 5            3.603060e-06
#> 6            6.554372e-01
find_colname(c("p.val", "pval"), x);
#> [1] "P.Value Group-Control"
find_colname(c("fold", "fc", "ratio"), x);
#> [1] "fold Group-Control"
find_colname(c("logfold", "log2fold", "lfc", "log2ratio", "logratio"), x);
#> [1] "log2fold Group-Control"

## use exclude_pattern to avoid matching "adj.P.Val"
## when the input data has no "P.Value" but has "adj.P.Val";
## with no remaining match, the result is NULL
y <- x[,c(1,2,4,5)];
print(head(y));
#>    Gene log2fold Group-Control fold Group-Control adj.P.Val Group-Control
#> 1 geneA              1.5163421           2.860648            8.252273e-01
#> 2 geneB              0.9225730           1.895493            6.387370e-01
#> 3 geneC              0.9680046           1.956133            2.502431e-01
#> 4 geneD             -1.6330128          -3.101600            1.375335e-02
#> 5 geneE             -1.7015228          -3.252441            3.603060e-06
#> 6 geneF              1.2107613           2.314597            6.554372e-01
find_colname(c("p.val"), y, exclude_pattern=c("adj"))
#> NULL
