Shrink data.frame by row groups

Usage

shrink_df(
  x,
  by,
  string_func = function(x) jamba::cPasteSU(x, na.rm = TRUE),
  num_func = function(x) mean(x, na.rm = TRUE),
  add_string_cols = NULL,
  num_to_string_func = as.character,
  keep_na_groups = TRUE,
  include_num_reps = FALSE,
  extra_funcs = NULL,
  do_test = FALSE,
  use_new_method = FALSE,
  verbose = FALSE,
  ...
)

Arguments

by

character vector of one or more colnames(x), used to define the row grouping.

string_func

function used for character and other non-numeric column types. For efficiency, string_func is by default applied to the entire column at once, taking list input and returning vector output. It is not applied using data.table.
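
The list-in, vector-out contract can be sketched in base R as follows. This is only an illustration: collapse_unique() is a hypothetical name, not part of this package, and it assumes each list element holds the values for one row group. It approximates the default jamba::cPasteSU(x, na.rm = TRUE), which may sort values differently.

```r
# Hypothetical stand-in for a string_func (not part of this package):
# takes a list of vectors, returns one collapsed string per list element.
collapse_unique <- function(x, sep = ",") {
  vapply(x, function(v) {
    v <- as.character(v)
    v <- v[!is.na(v)]                      # drop NA values
    paste(sort(unique(v)), collapse = sep) # sorted, unique, delimited
  }, character(1))
}

# Assumed input shape: one list element per row group.
groups <- list(g1 = c("b", "a", "b", NA), g2 = c("z"))
collapse_unique(groups)
```

A function of this shape could be supplied as string_func, assuming the list input described above.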

num_func

function used for numeric column types. This function is applied using data.table; it should expect a vector as input and return a single atomic value.
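
As a small illustration of the vector-in, single-value-out contract, the following sketch defines an alternative to the default mean. The name num_median is hypothetical and chosen only for this example.

```r
# Hypothetical num_func alternative: vector in, single atomic value out.
num_median <- function(x) median(x, na.rm = TRUE)

# NA values are ignored, and the result has length one.
num_median(c(1, 5, NA, 3))
```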

extra_funcs

list, default NULL, containing function objects. The list names should match colnames(x), in order to apply a function to a specific column in x. These functions will therefore override the default functions defined by string_func and num_func. Only one function is applied per column.

do_test

logical, default FALSE, indicating whether to perform an internal test with internally-generated argument values.

use_new_method

logical, default FALSE, indicating whether to call newer tidy/data.table methods (TRUE) or shrinkDataFrame() (FALSE). Currently shrinkDataFrame() is considerably faster; more research is necessary.

verbose

logical, default FALSE, indicating whether to print verbose output.

...

additional arguments are ignored.

x

data.frame or compatible input class.

Details

This function is currently a wrapper for shrinkDataFrame(). It was formerly a simplified version of shrinkDataFrame(), intended to use more modern methods from the R package data.table.

The general idea is to collapse numeric columns using num_func, and collapse character and all other columns using string_func.
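
The general idea can be sketched in base R. This is a conceptual illustration only, not the package implementation: shrink_sketch() and the example data are hypothetical, and the string function here is a simple base R approximation of the default.

```r
# Base R sketch of the idea only; this is not the package implementation.
df <- data.frame(
  gene = c("A", "A", "B"),
  score = c(1, 3, 10),
  source = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

num_func <- function(x) mean(x, na.rm = TRUE)
string_func <- function(x) paste(sort(unique(x[!is.na(x)])), collapse = ",")

# Hypothetical helper: collapse each row group to a single row.
shrink_sketch <- function(x, by) {
  value_cols <- setdiff(colnames(x), by)
  # Split rows by the grouping column(s).
  row_groups <- split(x, x[by], drop = TRUE)
  pieces <- lapply(row_groups, function(g) {
    out <- g[1, by, drop = FALSE]
    for (col in value_cols) {
      # Numeric columns use num_func, everything else uses string_func.
      f <- if (is.numeric(g[[col]])) num_func else string_func
      out[[col]] <- f(g[[col]])
    }
    out
  })
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

shrink_sketch(df, by = "gene")
```

In this sketch the two rows for gene "A" collapse to one row, with the numeric column averaged and the character column pasted into a delimited string.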

Any exceptions, where a different function should be applied to a specific column, are passed via the argument extra_funcs, which is a list of functions named by values in colnames(x).
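
The override logic can be sketched as a lookup by column name. The helper pick_func() below is hypothetical and exists only to illustrate the precedence: a function named in extra_funcs is used first, otherwise the choice falls back to the numeric or string default.

```r
# Sketch of per-column overrides; pick_func() is a hypothetical helper,
# not a function provided by this package.
extra_funcs <- list(score = function(x) max(x, na.rm = TRUE))

pick_func <- function(col, values, extra_funcs,
                      num_func = function(x) mean(x, na.rm = TRUE),
                      string_func = function(x) paste(unique(x), collapse = ",")) {
  if (col %in% names(extra_funcs)) {
    extra_funcs[[col]]   # a named override takes precedence
  } else if (is.numeric(values)) {
    num_func             # default for numeric columns
  } else {
    string_func          # default for all other columns
  }
}

pick_func("score", c(1, 3), extra_funcs)(c(1, 3))  # uses the max override
pick_func("other", c(1, 3), extra_funcs)(c(1, 3))  # falls back to the mean
```

Only one function is selected per column, consistent with the description of extra_funcs.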