Choose interesting annotation colnames from a data.frame
Source: R/jam_choose_annotation_colnames.R
choose_annotation_colnames.Rd
Usage
choose_annotation_colnames(
df,
min_reps = 2,
min_values = 2,
max_values = Inf,
keep_numeric = FALSE,
simplify = TRUE,
max_colnames = 20,
...
)
Arguments
- df
data.frame with annotations that could be interesting to display at the top or side of a heatmap.
- min_reps
numeric minimum number of replicates required for a column to be considered interesting. For example, min_reps=3 requires at least one value in a column to be repeated at least 3 times for that column to be interesting. This filter is intended to remove columns whose values are all unique, such as row identifiers.
- min_values
numeric minimum number of unique values required for a column to be considered interesting.
- max_values
numeric maximum number of unique values allowed for a column to be considered interesting. With too many values the column stops being informative, and the color key becomes unwieldy with too many labels.
- keep_numeric
logical indicating whether to keep columns with numeric values. When keep_numeric == TRUE it overrides the rules above for numeric columns.
- simplify
logical indicating whether to filter out columns whose data already matches another column with 1:1 cardinality. This step requires platjam::cardinality() until that function is moved into the jamba package.
- max_colnames
numeric maximum number of colnames to return. Note that columns are not sorted for priority, so they are returned in the order they appear in df after applying the relevant criteria.
- ...
additional arguments are ignored.
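The replicate and unique-value filters described above can be approximated in base R. The following is a minimal sketch and not the jamses implementation: the helper name `sketch_interesting_cols` is hypothetical, the `simplify` and `max_colnames` steps are omitted, and it assumes that numeric columns are kept or dropped solely according to `keep_numeric`.

```r
# Hypothetical helper: approximates the documented filters in base R.
# Not the jamses implementation; `simplify` and `max_colnames` are omitted.
sketch_interesting_cols <- function(df,
   min_reps=2,
   min_values=2,
   max_values=Inf,
   keep_numeric=FALSE)
{
   keep <- vapply(df, function(x) {
      if (is.numeric(x)) {
         # assumption: numeric columns depend only on keep_numeric
         return(isTRUE(keep_numeric))
      }
      n_unique <- length(unique(x))
      # largest number of times any single value occurs (0 if none)
      max_reps <- max(c(0, table(x)))
      max_reps >= min_reps &&
         n_unique >= min_values &&
         n_unique <= max_values
   }, logical(1))
   cols <- names(df)[keep]
   if (length(cols) == 0) {
      return(NULL)
   }
   cols
}

# same layout as the data.frame used in the Examples section,
# with fixed values in `num` instead of sample(1:7)
df <- data.frame(
   threereps=paste0("threereps_", letters[c(1,1,1,3,5,7,7)]),
   time=paste0("time_", letters[c(1:7)]),
   tworeps=paste0("tworeps_", letters[c(12,12,14,14,15,15,16)]),
   num=c(5, 4, 7, 2, 3, 1, 6),
   class=paste0("class_", LETTERS[c(1,1,1,3,5,7,7)]),
   blah=rep("blah", 7),
   maxvalues=c("one", "two", "three", "four", "five", "six", "six"))

sketch_interesting_cols(df)
# expected: "threereps" "tworeps" "class" "maxvalues"
sketch_interesting_cols(df, min_reps=3)
# expected: "threereps" "class"
```

Because this sketch skips the `simplify` step, `class` is retained here even though it mirrors `threereps`.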
Value
character vector of colnames in df that meet the criteria.
If no colnames meet the criteria, this function returns NULL.
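Since the result may be NULL or a single colname, code that subsets `df` with it may want to guard both cases. The sketch below uses a hypothetical helper, `subset_annotation_df`, and hard-coded stand-ins for the return value rather than calling the function itself. It relies on the base R behavior that `df[, cols]` with one column returns a vector unless `drop=FALSE` is given.

```r
# Hypothetical helper: subset annotation columns, always returning a data.frame.
subset_annotation_df <- function(df, cols) {
   if (length(cols) == 0) {
      # NULL or character(0): zero-column data.frame with the same rows
      return(df[, 0, drop=FALSE])
   }
   # drop=FALSE keeps a data.frame even when only one colname is selected
   df[, cols, drop=FALSE]
}

df <- data.frame(
   group=c("a", "a", "b", "b"),
   id=c("s1", "s2", "s3", "s4"))

# stand-ins for possible return values of choose_annotation_colnames()
dim(subset_annotation_df(df, NULL))
# expected: 4 0
dim(subset_annotation_df(df, "group"))
# expected: 4 1
```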
See also
Other jamses utilities: contrast2comp_dev(), fold_to_log2fold(), intercalate(), list2im_opt(), log2fold_to_fold(), make_block_arrow_polygon(), mark_stat_hits(), matrix_normalize(), point_handedness(), point_slope_intercept(), shortest_unique_abbreviation(), shrinkDataFrame(), shrink_df(), shrink_matrix(), sort_samples(), strsplitOrdered(), sub_split_vector(), update_function_params(), update_list_elements()
Examples
df <- data.frame(
   threereps=paste0("threereps_", letters[c(1,1,1,3,5,7,7)]),
   time=paste0("time_", letters[c(1:7)]),
   tworeps=paste0("tworeps_", letters[c(12,12,14,14,15,15,16)]),
   num=sample(1:7),
   class=paste0("class_", LETTERS[c(1,1,1,3,5,7,7)]),
   blah=rep("blah", 7),
   maxvalues=c("one", "two", "three", "four", "five", "six", "six"))
df
#>     threereps   time   tworeps num   class blah maxvalues
#> 1 threereps_a time_a tworeps_l   5 class_A blah       one
#> 2 threereps_a time_b tworeps_l   4 class_A blah       two
#> 3 threereps_a time_c tworeps_n   7 class_A blah     three
#> 4 threereps_c time_d tworeps_n   2 class_C blah      four
#> 5 threereps_e time_e tworeps_o   3 class_E blah      five
#> 6 threereps_g time_f tworeps_o   1 class_G blah       six
#> 7 threereps_g time_g tworeps_p   6 class_G blah       six
choose_annotation_colnames(df)
#> [1] "threereps" "tworeps" "maxvalues"
df[,choose_annotation_colnames(df)]
#>     threereps   tworeps maxvalues
#> 1 threereps_a tworeps_l       one
#> 2 threereps_a tworeps_l       two
#> 3 threereps_a tworeps_n     three
#> 4 threereps_c tworeps_n      four
#> 5 threereps_e tworeps_o      five
#> 6 threereps_g tworeps_o       six
#> 7 threereps_g tworeps_p       six
choose_annotation_colnames(df, max_values=5)
#> [1] "threereps" "tworeps"
df[,choose_annotation_colnames(df, max_values=5)]
#>     threereps   tworeps
#> 1 threereps_a tworeps_l
#> 2 threereps_a tworeps_l
#> 3 threereps_a tworeps_n
#> 4 threereps_c tworeps_n
#> 5 threereps_e tworeps_o
#> 6 threereps_g tworeps_o
#> 7 threereps_g tworeps_p
choose_annotation_colnames(df, simplify=FALSE)
#>   threereps     tworeps       class   maxvalues
#> "threereps"   "tworeps"     "class" "maxvalues"
df[,choose_annotation_colnames(df, simplify=FALSE)]
#>     threereps   tworeps   class maxvalues
#> 1 threereps_a tworeps_l class_A       one
#> 2 threereps_a tworeps_l class_A       two
#> 3 threereps_a tworeps_n class_A     three
#> 4 threereps_c tworeps_n class_C      four
#> 5 threereps_e tworeps_o class_E      five
#> 6 threereps_g tworeps_o class_G       six
#> 7 threereps_g tworeps_p class_G       six
choose_annotation_colnames(df, min_reps=3)
#> [1] "threereps"
choose_annotation_colnames(df, min_reps=1)
#> [1] "threereps" "time" "tworeps" "maxvalues"
choose_annotation_colnames(df, keep_numeric=TRUE)
#> [1] "threereps" "tworeps" "num" "maxvalues"
choose_annotation_colnames(df, min_reps=1, keep_numeric=TRUE)
#> [1] "threereps" "time" "tworeps" "num" "maxvalues"
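In the examples above, `class` is returned only when simplify=FALSE, which is consistent with it mirroring `threereps` value for value. The idea of 1:1 cardinality can be illustrated in base R as follows. This is a sketch only: `is_one_to_one` is a hypothetical helper and is not platjam::cardinality(), whose interface is not shown on this page.

```r
# Hypothetical helper: TRUE when two vectors map to each other 1:1,
# meaning each value of x pairs with exactly one value of y and vice versa.
# Illustration only; this is not platjam::cardinality().
is_one_to_one <- function(x, y) {
   # number of distinct (x, y) combinations observed
   n_pairs <- nrow(unique(data.frame(x=x, y=y)))
   n_pairs == length(unique(x)) && n_pairs == length(unique(y))
}

threereps <- paste0("threereps_", letters[c(1,1,1,3,5,7,7)])
class_col <- paste0("class_", LETTERS[c(1,1,1,3,5,7,7)])
tworeps <- paste0("tworeps_", letters[c(12,12,14,14,15,15,16)])

is_one_to_one(threereps, class_col)
# expected: TRUE
is_one_to_one(threereps, tworeps)
# expected: FALSE
```

Under this definition a column that is 1:1 with an earlier column adds no new grouping information, which is presumably why only the first such column is kept.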