Shrink data.frame by row groups
Usage
shrink_df(
  x,
  by,
  string_func = function(x) jamba::cPasteSU(x, na.rm = TRUE),
  num_func = function(x) mean(x, na.rm = TRUE),
  add_string_cols = NULL,
  num_to_string_func = as.character,
  keep_na_groups = TRUE,
  include_num_reps = FALSE,
  extra_funcs = NULL,
  do_test = FALSE,
  use_new_method = FALSE,
  verbose = FALSE,
  ...
)
Arguments
- by
character vector of one or more colnames(df), used to define the row grouping.
- string_func
function used for character and other non-numeric column types. For efficiency, string_func by default is applied to the entire column, with list input, expecting vector output. It is not applied using data.table.
- num_func
function used for numeric column types. This function is applied using data.table and should expect a vector input, and provide a single atomic value output.
- extra_funcs
list, default NULL, containing function objects. The list names should match colnames(x), in order to apply a function to a specific column in x. These functions will therefore override the default functions defined by string_func and num_func. Only one function is applied per column.
- do_test
logical, default FALSE, indicating whether to perform an internal test with internally-generated argument values.
- use_new_method
logical, default FALSE, whether to call newer tidy/data.table methods (TRUE), or call shrinkDataFrame() (FALSE). Currently shrinkDataFrame() is remarkably faster. More research necessary.
- verbose
logical indicating whether to print verbose output.
- ...
additional arguments are ignored.
- df
data.frame or compatible input class.
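The two input conventions described above (num_func receives one vector per group and returns a single value, while string_func receives a list for the whole column and returns a vector) can be illustrated with a minimal base R sketch. It does not reproduce the internals of shrink_df() or shrinkDataFrame(). The helper collapse_by(), the toy data, and the simplified string function (a rough stand-in for jamba::cPasteSU()) are assumptions made for illustration, and the sketch assumes the grouping columns contain no NA values.

```r
# Toy data, illustrative only: two rows per gene.
df <- data.frame(
  gene = c("A", "A", "B", "B"),
  score = c(1, 3, 10, NA),
  probe = c("p1", "p2", "p3", "p3"),
  stringsAsFactors = FALSE
)

# num_func: vector in, single value out (same as the documented default).
num_func <- function(x) mean(x, na.rm = TRUE)

# string_func: list in, vector out. Assumed stand-in for jamba::cPasteSU():
# sorted, unique, comma-delimited values with NA removed.
string_func <- function(x) {
  vapply(x, function(v) {
    paste(sort(unique(v[!is.na(v)])), collapse = ",")
  }, character(1))
}

# Hypothetical helper, not part of jamses.
# Assumes no NA values in the `by` columns.
collapse_by <- function(df, by, num_func, string_func) {
  grp <- interaction(df[by], drop = TRUE)
  keep <- !duplicated(grp)
  # One row per group, ordered to match the order produced by split().
  out <- df[keep, by, drop = FALSE]
  out <- out[order(grp[keep]), , drop = FALSE]
  for (col in setdiff(colnames(df), by)) {
    pieces <- split(df[[col]], grp)
    if (is.numeric(df[[col]])) {
      # Applied once per group.
      out[[col]] <- unname(vapply(pieces, num_func, numeric(1)))
    } else {
      # Applied once to the whole column, as a list of groups.
      out[[col]] <- unname(string_func(pieces))
    }
  }
  rownames(out) <- NULL
  out
}

result <- collapse_by(df, by = "gene",
  num_func = num_func, string_func = string_func)
print(result)
```

Each gene ends up on one row: score holds the per-group mean with NA values removed, and probe holds the unique probe names joined by commas.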
Details
This function is currently a wrapper for shrinkDataFrame(). It was formerly a simplified version of shrinkDataFrame(), intended to use more modern methods from the R package data.table.
The general idea is to collapse numeric columns using num_func, and collapse character and all other columns using string_func.
Any exceptions, where a different function should be applied, are passed via argument extra_funcs, which is a list of functions named by values in colnames(df).
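The override rule for extra_funcs can be sketched as a per-column lookup: a column named in extra_funcs takes that function, otherwise numeric columns take num_func and everything else takes string_func. The helper choose_funcs() and the toy data below are hypothetical and only illustrate the rule as described here. They are not the selection code used by shrink_df().

```r
# Toy data, illustrative only.
df <- data.frame(
  gene = c("A", "A", "B", "B"),
  score = c(1, 3, 10, 20),
  pvalue = c(0.01, 0.2, 0.5, 0.03),
  probe = c("p1", "p2", "p3", "p4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of jamses: returns exactly one function
# per non-grouping column, with extra_funcs taking priority.
choose_funcs <- function(df, by, num_func, string_func, extra_funcs = NULL) {
  value_cols <- setdiff(colnames(df), by)
  funcs <- lapply(value_cols, function(col) {
    if (col %in% names(extra_funcs)) {
      extra_funcs[[col]]
    } else if (is.numeric(df[[col]])) {
      num_func
    } else {
      string_func
    }
  })
  names(funcs) <- value_cols
  funcs
}

funcs <- choose_funcs(df, by = "gene",
  num_func = mean,
  string_func = toString,
  extra_funcs = list(pvalue = min))

# score uses the numeric default, pvalue uses the override.
mean_score <- tapply(df$score, df$gene, funcs$score)
min_pvalue <- tapply(df$pvalue, df$gene, funcs$pvalue)
print(mean_score)
print(min_pvalue)
```

Keeping the lookup separate from the collapsing step makes the "only one function per column" behavior explicit: an entry in extra_funcs replaces the default for that column and is never combined with it.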
See also
Other jamses utilities: choose_annotation_colnames(), contrast2comp_dev(), fold_to_log2fold(), intercalate(), list2im_opt(), log2fold_to_fold(), make_block_arrow_polygon(), mark_stat_hits(), matrix_normalize(), point_handedness(), point_slope_intercept(), shortest_unique_abbreviation(), shrinkDataFrame(), shrink_matrix(), sort_samples(), strsplitOrdered(), sub_split_vector(), update_function_params(), update_list_elements()