Find colname by string or pattern, with option to require non-NA values.
find_colname(
  pattern,
  x,
  max = 1,
  index = FALSE,
  require_non_na = TRUE,
  col_types = NULL,
  exclude_pattern = NULL,
  verbose = FALSE,
  ...
)
pattern: character vector of text strings and/or regular expression patterns.
x: data.frame or other object that contains colnames(x).
max: integer maximum number of entries to return.
index: logical indicating whether to return the column index, that is, the column number.
require_non_na: logical indicating whether to require at least one non-NA value in the matching colname. When require_non_na=TRUE and all values in a column are NA, that colname is not returned by this function.
exclude_pattern: character vector of colnames or patterns to exclude from returned results.
verbose: logical indicating whether to print verbose output.
...: additional arguments are ignored.
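The require_non_na and exclude_pattern arguments narrow the set of candidate colnames. The following is a minimal base-R sketch of one way to do that filtering, not the package implementation. It assumes x is a data.frame and that exclude patterns are matched as case-insensitive regular expressions; the helper candidate_colnames_sketch() is hypothetical.

```r
# Hypothetical sketch: filter candidate colnames before pattern matching.
# Assumes x is a data.frame; not the package's own code.
candidate_colnames_sketch <- function(x,
                                      require_non_na = TRUE,
                                      exclude_pattern = NULL) {
  cn <- colnames(x)
  if (require_non_na) {
    # keep only columns that contain at least one non-NA value
    has_value <- vapply(seq_along(cn),
                        function(i) any(!is.na(x[[i]])),
                        logical(1))
    cn <- cn[has_value]
  }
  if (length(exclude_pattern) > 0) {
    # drop colnames matching any exclude pattern (case-insensitive, an assumption)
    drop <- Reduce(`|`, lapply(exclude_pattern, grepl,
                               x = cn, ignore.case = TRUE))
    cn <- cn[!drop]
  }
  cn
}

df <- data.frame(a = c(1, NA), b = c(NA, NA), adj.p = c(0.1, 0.2))
print(candidate_colnames_sketch(df))                           # "a" "adj.p"
print(candidate_colnames_sketch(df, exclude_pattern = "adj"))  # "a"
```

Column b is dropped because every value is NA, mirroring the documented require_non_na=TRUE behavior.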
This simple utility function helps find the most appropriate matching colname given one or more character strings or patterns.
By default it returns only the single best match, but it can return multiple results in order of preference, for example with max=Inf.
The order of matching:
1. Match the exact colname.
2. Match the colname case-insensitively.
3. Match the beginning of each colname.
4. Match the end of each colname.
5. Match anywhere in each colname.
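A minimal base-R sketch of the matching order above. The helper match_colname_sketch() is hypothetical and is not the package implementation. It assumes each step is tried with every pattern before moving to the next step, that the beginning, end, and anywhere steps treat each pattern as a case-insensitive regular expression, and that duplicate hits keep their first, most preferred position.

```r
# Hypothetical sketch of the five-step matching order; not find_colname() itself.
match_colname_sketch <- function(pattern, colnames_x, max = 1) {
  tiers <- list(
    # 1. exact colname
    function(p) colnames_x[colnames_x == p],
    # 2. case-insensitive exact colname
    function(p) colnames_x[tolower(colnames_x) == tolower(p)],
    # 3. beginning of colname
    function(p) colnames_x[grepl(paste0("^", p), colnames_x, ignore.case = TRUE)],
    # 4. end of colname
    function(p) colnames_x[grepl(paste0(p, "$"), colnames_x, ignore.case = TRUE)],
    # 5. anywhere in colname
    function(p) colnames_x[grepl(p, colnames_x, ignore.case = TRUE)]
  )
  hits <- character(0)
  for (tier in tiers) {
    for (p in pattern) {
      hits <- c(hits, tier(p))
    }
  }
  # keep the first (most preferred) occurrence of each colname
  hits <- unique(hits)
  if (length(hits) == 0) {
    return(NULL)
  }
  hits[seq_len(min(length(hits), max))]
}

cn <- c("Gene", "log2fold Group-Control", "P.Value Group-Control",
        "fold Group-Control", "adj.P.Val Group-Control")
print(match_colname_sketch(c("p.val", "pval"), cn))
print(match_colname_sketch("p.val", cn, max = Inf))
```

With these colnames, the first call returns "P.Value Group-Control", and max = Inf also returns "adj.P.Val Group-Control" after it, because that colname is only reached by the match anywhere in the colname.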
The goal is to supply patterns such as c("p.value", "pvalue", "pval") and find colnames with variations such as:
- P.Value
- P.Value Group-Control
- Group-Control P.Value
- pvalue
Even when the data contains c("P.Value", "adj.P.Val"), as returned by limma::topTable() for example, the pattern c("p.val") preferentially matches "P.Value" and not "adj.P.Val".
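The preference for "P.Value" over "adj.P.Val" can be seen with base R grepl(), assuming the pattern is used as a case-insensitive regular expression, in which "." matches any character:

```r
# Column names as they might appear in limma::topTable() output
cn <- c("P.Value", "adj.P.Val")

# Beginning of colname: only "P.Value" starts with "p.val"
starts <- grepl("^p.val", cn, ignore.case = TRUE)

# Anywhere in colname: both contain "p.val", since "." is a regex
# wildcard and case is ignored
anywhere <- grepl("p.val", cn, ignore.case = TRUE)

print(cn[starts])    # "P.Value"
print(cn[anywhere])  # "P.Value" "adj.P.Val"
```

Because the match at the beginning of the colname is tried before the match anywhere in the colname, "P.Value" is found first.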
Other jam utility functions: blockArrowMargin(), fold_to_log2fold(), get_se_assaydata(), gradient_rect(), handle_highlightPoints(), log2fold_to_fold(), logAxis(), outer_legend(), points2polygonHull(), update_function_params(), update_list_elements()
x <- data.frame(
`Gene`=paste0("gene", LETTERS[1:25]),
`log2fold Group-Control`=rnorm(25)*2,
`P.Value Group-Control`=10^-rnorm(25)^2,
check.names=FALSE);
x[["fold Group-Control"]] <- log2fold_to_fold(x[["log2fold Group-Control"]]);
x[["adj.P.Val Group-Control"]] <- x[["P.Value Group-Control"]];
print(head(x));
#> Gene log2fold Group-Control P.Value Group-Control fold Group-Control
#> 1 geneA 1.5163421 8.252273e-01 2.860648
#> 2 geneB 0.9225730 6.387370e-01 1.895493
#> 3 geneC 0.9680046 2.502431e-01 1.956133
#> 4 geneD -1.6330128 1.375335e-02 -3.101600
#> 5 geneE -1.7015228 3.603060e-06 -3.252441
#> 6 geneF 1.2107613 6.554372e-01 2.314597
#> adj.P.Val Group-Control
#> 1 8.252273e-01
#> 2 6.387370e-01
#> 3 2.502431e-01
#> 4 1.375335e-02
#> 5 3.603060e-06
#> 6 6.554372e-01
find_colname(c("p.val", "pval"), x);
#> [1] "P.Value Group-Control"
find_colname(c("fold", "fc", "ratio"), x);
#> [1] "fold Group-Control"
find_colname(c("logfold", "log2fold", "lfc", "log2ratio", "logratio"), x);
#> [1] "log2fold Group-Control"
## use exclude_pattern so that "adj.P.Val" is not returned
## when the input data has no "P.Value" column;
## no colname remains to match, so the result is NULL
y <- x[,c(1,2,4,5)];
print(head(y));
#> Gene log2fold Group-Control fold Group-Control adj.P.Val Group-Control
#> 1 geneA 1.5163421 2.860648 8.252273e-01
#> 2 geneB 0.9225730 1.895493 6.387370e-01
#> 3 geneC 0.9680046 1.956133 2.502431e-01
#> 4 geneD -1.6330128 -3.101600 1.375335e-02
#> 5 geneE -1.7015228 -3.252441 3.603060e-06
#> 6 geneF 1.2107613 2.314597 6.554372e-01
find_colname(c("p.val"), y, exclude_pattern=c("adj"))
#> NULL