Shrink data.frame by row groups

Usage

shrink_df(
  x,
  by,
  string_func = function(x) jamba::cPasteSU(x, na.rm = TRUE),
  num_func = function(x) mean(x, na.rm = TRUE),
  add_string_cols = NULL,
  num_to_string_func = as.character,
  keep_na_groups = TRUE,
  include_num_reps = FALSE,
  extra_funcs = NULL,
  do_test = FALSE,
  use_new_method = FALSE,
  verbose = FALSE,
  ...
)

Arguments

by: character vector of one or more colnames(x), used to define the row grouping.
string_func: function used for character and other non-numeric column types. For efficiency, string_func by default is applied to the entire column, with list input, expecting vector output. It is not applied using data.table.
num_func: function used for numeric column types. This function is applied using data.table and should expect a vector input, and provide a single atomic value output.
extra_funcs: list, default NULL, containing function objects. The list names should match colnames(x), in order to apply a function to a specific column in x. These functions will therefore override the default functions defined by string_func and num_func. Only one function is applied per column.
do_test: logical, default FALSE, indicating whether to perform an internal test with internally-generated argument values.
use_new_method: logical, default FALSE, indicating whether to call the newer tidy/data.table methods (TRUE) or shrinkDataFrame() (FALSE). Currently shrinkDataFrame() is remarkably faster; more research is necessary.
verbose: logical indicating whether to print verbose output.
...: additional arguments are ignored.
x: data.frame or compatible input class.

Details

This function is currently a wrapper for shrinkDataFrame(). It was formerly a simplified version of shrinkDataFrame(), intended to use more modern methods from the R package data.table.

The general idea is to collapse numeric columns using num_func, and collapse character and all other columns using string_func.
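
The collapsing idea can be illustrated with a small base R sketch. This is not the shrink_df() or shrinkDataFrame() implementation: shrink_sketch(), num_fn and string_fn are hypothetical names used only here, the sketch handles a single grouping column, and string_fn is a stand-in for jamba::cPasteSU() that assumes a comma delimiter. As described for string_func, the string function receives a list with one element per row group and returns a vector, while the numeric function is called once per group.

```r
# Toy input: one grouping column, one numeric column, one character column
df <- data.frame(
  gene = c("A", "A", "B", "B", "B"),
  score = c(1, 3, 10, NA, 20),
  source = c("s1", "s2", "s1", "s1", NA),
  stringsAsFactors = FALSE
)

# Numeric collapse: one vector in, one value out
num_fn <- function(x) mean(x, na.rm = TRUE)

# String collapse: list in, character vector out
# (stand-in for jamba::cPasteSU(); sorted unique non-NA values, comma-joined)
string_fn <- function(x) {
  vapply(x, function(v) paste(sort(unique(v[!is.na(v)])), collapse = ","),
    character(1))
}

# Hypothetical helper, single grouping column only
shrink_sketch <- function(df, by, num_fn, string_fn) {
  row_groups <- split(seq_len(nrow(df)), df[[by]])
  out <- data.frame(names(row_groups), stringsAsFactors = FALSE)
  colnames(out) <- by
  for (col in setdiff(colnames(df), by)) {
    # values of this column, split into one list element per group
    per_group <- lapply(row_groups, function(i) df[[col]][i])
    out[[col]] <- if (is.numeric(df[[col]])) {
      vapply(per_group, num_fn, numeric(1), USE.NAMES = FALSE)
    } else {
      unname(string_fn(per_group))
    }
  }
  out
}

res <- shrink_sketch(df, by = "gene", num_fn = num_fn, string_fn = string_fn)
print(res)
```

The result has one row per value of gene, with score averaged and source collapsed to a delimited string.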

Any exceptions, where a different function should be applied to a specific column, are passed via the extra_funcs argument, which is a list of functions named by values in colnames(x).
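
A minimal base R sketch of the per-column override follows. The helper pick_func() and the toy data are hypothetical and only illustrate the lookup order described above (a function named in extra_funcs takes precedence over the type-based default); this sketch calls every function once per group and is not how shrink_df() is implemented.

```r
df <- data.frame(
  gene = c("A", "A", "B", "B"),
  score = c(1, 3, 10, 20),
  pvalue = c(0.05, 0.01, 0.2, 0.5),
  stringsAsFactors = FALSE
)

# Type-based defaults
num_func <- function(x) mean(x, na.rm = TRUE)
string_func <- function(x) paste(unique(x), collapse = ",")

# Override: keep the smallest pvalue per group instead of the mean
extra_funcs <- list(pvalue = function(x) min(x, na.rm = TRUE))

# Hypothetical helper: a column-specific function wins, else fall back by type
pick_func <- function(col, vals) {
  if (col %in% names(extra_funcs)) {
    return(extra_funcs[[col]])
  }
  if (is.numeric(vals)) num_func else string_func
}

groups <- split(seq_len(nrow(df)), df$gene)
res <- data.frame(gene = names(groups), stringsAsFactors = FALSE)
for (col in setdiff(colnames(df), "gene")) {
  f <- pick_func(col, df[[col]])
  res[[col]] <- unlist(
    lapply(groups, function(i) f(df[[col]][i])),
    use.names = FALSE
  )
}
print(res)
```

Here score is still collapsed with the numeric default, while pvalue uses the function supplied in extra_funcs.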
