Perform SALSA steps for threshold detection

do_salsa_steps(x, n_vector = NULL, n_start = NULL, max_step = NULL,
  step_size = NULL, count_vector = NULL, dists = c("frechet",
  "frechet-weibull"), cache_fr = cache_filesystem("./cache_fr"),
  cache_fr_wei = cache_filesystem("./cache_fr_wei"),
  param_fr_wei = NULL, verbose = FALSE, ...)

Arguments

x

numeric vector of counts, either the number of UMIs per cell or the number of UMIs per gene.

n_vector, n_start, max_step, step_size, count_vector

arguments passed to get_salsa_steps(), which returns count_vector. If count_vector is supplied, it is used without modification.

dists

character vector determining which distribution fit functions to calculate: "frechet" fits the Frechet distribution; "frechet-weibull" fits the five-parameter combined Frechet and Weibull distribution. Typically the Frechet parameters are used to define the upper bound, and the Frechet-Weibull parameters are used to define the lower bound.

cache_fr, cache_fr_wei

list objects output from memoise::cache_filesystem() used to store cached distribution fit results. When either is NULL, the memoise cache step for that distribution is disabled, and the fit function is called directly.

param_fr_wei

data.frame output from params_fr_wei(), which defines the lower and upper limits and the start value for each parameter. If NULL, the default values from params_fr_wei() are used.

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.
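
As an illustration of what the cache_fr and cache_fr_wei arguments provide, the following minimal sketch memoises a toy function with a filesystem cache. It assumes the memoise package is installed; toy_fit(), the counter n_calls, and the temporary cache directory are hypothetical names used only for this illustration and are not part of salsa.

```r
library(memoise)

# Toy stand-in for an expensive distribution fit (hypothetical, not a
# salsa function); the counter records how often it actually runs.
n_calls <- 0
toy_fit <- function(x) {
  n_calls <<- n_calls + 1
  c(mean = mean(x), sd = sd(x))
}

# Filesystem cache in a temporary directory (assumed location).
cache_demo <- cache_filesystem(file.path(tempdir(), "cache_fr_demo"))
toy_fit_cached <- memoise(toy_fit, cache = cache_demo)

x <- c(16, 32, 128)
first <- toy_fit_cached(x)   # computed, then written to the cache
second <- toy_fit_cached(x)  # same arguments: read back from the cache

n_calls                      # number of times toy_fit() really ran
identical(first, second)
```

Passing cache_fr=NULL or cache_fr_wei=NULL, as described above, skips this layer so that every call recomputes its fit.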

Value

list with one element for each value in count_vector. Each element is itself a list with one entry for each value in argument dists, containing the fit parameters for that distribution, plus an entry "min_count" containing the minimum count used in each fit. When dists contains "frechet-weibull", each list includes "lower_bound". When dists contains "frechet", each list includes "upper_bound". The output is intended to be passed to get_salsa_table().
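
The nested structure described above can be pictured with a hand-built mock. This is only a sketch: the entry names "frechet" and "frechet-weibull" are assumed to mirror the values of dists, the mapping of parameters to entries is a guess, and the numbers are copied from the example output at the end of this page. In practice get_salsa_table() does this flattening.

```r
# Mock of the documented return shape (assumed entry names, see above).
mock_salsa <- list(
  "32" = list(
    min_count = 32,
    frechet = c(shape = 2.038619, scale = 138.2035),
    `frechet-weibull` = c(fr_weight = 0.99, fr_shape = 1.5,
      fr_scale = 178.24735, wei_shape = 1.500003, wei_scale = 1700.841),
    lower_bound = 480.9103,
    upper_bound = 555.284
  ),
  "128" = list(
    min_count = 128,
    frechet = c(shape = 2.067224, scale = 265.6946),
    `frechet-weibull` = c(fr_weight = 0.99, fr_shape = 1.8,
      fr_scale = 180.50922, wei_shape = 1.500004, wei_scale = 1699.818),
    lower_bound = 410.7675,
    upper_bound = 1054.265
  )
)

# Collect the bounds for each step into one matrix, one row per step.
bounds <- do.call(rbind, lapply(mock_salsa, function(step) {
  c(min_count = step$min_count,
    lower_bound = step$lower_bound,
    upper_bound = step$upper_bound)
}))
bounds
```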

Details

This function is a wrapper around fitdist_fr() and fitdist_fr_wei(). It iterates through a wide range of possible thresholds and, for each threshold, determines the fit parameters and the associated lower and upper bounds. The results are intended to be plotted to determine appropriate thresholds to use when calculating the lower and upper bounds for barcodes and genes in a single-cell RNA-seq dataset.
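
The idea of refitting at each candidate threshold can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: fit_frechet_sketch() is a hypothetical maximum-likelihood stand-in for fitdist_fr(), the counts are simulated, and the Frechet-Weibull fit and the bound calculations are omitted.

```r
# Frechet negative log-likelihood; parameters are optimised on the log
# scale so that shape and scale stay positive.
frechet_nll <- function(log_par, x) {
  shape <- exp(log_par[1])
  scale <- exp(log_par[2])
  z <- x / scale
  -sum(log(shape) - log(scale) - (1 + shape) * log(z) - z^(-shape))
}

# Hypothetical stand-in for a single fit; NOT the package's fitdist_fr().
fit_frechet_sketch <- function(x) {
  opt <- optim(c(0, log(median(x))), frechet_nll, x = x)
  c(shape = exp(opt$par[1]), scale = exp(opt$par[2]))
}

# Simulated counts: Frechet(shape = 2, scale = 200) draws obtained by
# inverse transform sampling.
set.seed(1)
x <- 200 * (-log(runif(2000)))^(-1 / 2)

# One fit per candidate threshold, each using only counts >= threshold.
count_vector <- c(16, 32, 128)
fits <- lapply(count_vector, function(min_count) {
  c(min_count = min_count, fit_frechet_sketch(x[x >= min_count]))
})
fit_table <- do.call(rbind, fits)
fit_table
```

Comparing the rows shows how the fitted parameters change as the threshold removes more of the low-count values, which is the kind of trend the plotted results are meant to reveal.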

See also

Other SALSA core functions: get_salsa_table

Examples

library(salsa)
data(oz2_numi_per_cell)
x <- oz2_numi_per_cell$count[oz2_numi_per_cell$count >= 16]
x_salsa <- do_salsa_steps(x, count_vector=c(16,32,128), cache_fr=NULL, cache_fr_wei=NULL)
x_df <- get_salsa_table(x_salsa)
x_df
#>     count    shape    scale fr_weight fr_shape  fr_scale wei_shape wei_scale
#> 16     16       NA       NA      0.99      1.5  25.24369  1.514120  1775.742
#> 32     32 2.038619 138.2035      0.99      1.5 178.24735  1.500003  1700.841
#> 128   128 2.067224 265.6946      0.99      1.8 180.50922  1.500004  1699.818
#>     lower_bound upper_bound
#> 16     161.1956          NA
#> 32     480.9103     555.284
#> 128    410.7675    1054.265