Collapse an incidence matrix by row groups, for example when converting probe-level, transcript-level, or peptide-level data to gene-level data.

Usage

collapse_im(
  im,
  row_groups = NULL,
  logic = c("majority-hit", "majority"),
  verbose = FALSE,
  ...
)

Arguments

im

numeric matrix with one column per set and one row per measurement, for example per probe, transcript, or peptide.

row_groups

character or factor vector of row groupings, with one entry per row of im.

logic

character logic to use, default 'majority-hit'.

  • "majority-hit": uses the most frequent value among the non-zero values in each group, when any non-zero value is present.

  • "majority": uses the most frequent value among all values in each group, including zero.

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Value

numeric matrix with one row per unique value in row_groups, and the same columns as im.

Details

This function is a simple converter for incidence matrix data, which by default takes the "majority-hit" for each row grouping. The most common scenario is to group rows by gene in order to summarize the observed changes at the gene level, when the original data may contain multiple measurements for each gene.

The default logic assumes that any observed statistical hit for a gene is sufficient evidence to implicate that gene as a "hit", even if other potential measurements for the same gene did not meet the statistical criteria used, as relevant to the platform technology.
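
The two logic modes can be illustrated with base R. The following is a minimal sketch of the idea, not the implementation used by collapse_im(): the helpers majority_value() and collapse_sketch() are hypothetical names introduced here, and the tie handling (the lowest value in sorted order wins) is an assumption of the sketch, since tie behavior is not described above.

```r
# Hypothetical helper, not part of the package: return the most frequent
# value in x, optionally ignoring zeros when any non-zero value exists.
majority_value <- function(x, ignore_zero = TRUE) {
   if (ignore_zero && any(x != 0)) {
      x <- x[x != 0]
   }
   counts <- table(x)
   # which.max() returns the first maximum, so ties resolve to the
   # lowest value in sorted order (an assumption of this sketch)
   as.numeric(names(counts)[which.max(counts)])
}

# Hypothetical wrapper: apply the helper to each column of im,
# separately within each row group.
collapse_sketch <- function(im, row_groups, ignore_zero = TRUE) {
   apply(im, 2, function(col) {
      tapply(col, row_groups, majority_value, ignore_zero = ignore_zero)
   })
}

# same data as used in the Examples section
im <- cbind(A = c(-1, -1, 0, 1, 1, 1, -1, 0, 0, 1, 1, 0),
   B = c(-1, -1, -1, 1, 1, 0, -1, 0, 0, 1, 1, 1),
   C = c(-1, -1, -1, 1, 1, 0, -1, 0, 0, 0, 0, 0))
row_groups <- rep(c("a", "b", "c"), c(6, 3, 3))

# analogous to logic="majority-hit": zeros are ignored when a hit exists
hit <- collapse_sketch(im, row_groups, ignore_zero = TRUE)

# analogous to logic="majority": zeros compete with non-zero values
maj <- collapse_sketch(im, row_groups, ignore_zero = FALSE)
```

In this sketch the two results differ only for group "b", where a single -1 is outvoted by two zeros when zeros are counted. That difference is why the default treats any observed hit as sufficient evidence for the gene.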

Examples

im <- cbind(A=c(-1, -1, 0, 1, 1, 1, -1, 0, 0, 1, 1, 0),
   B=c(-1, -1, -1, 1, 1, 0, -1, 0, 0, 1, 1, 1),
   C=c(-1, -1, -1, 1, 1, 0, -1, 0, 0, 0, 0, 0))
row_groups <- rep(c("a", "b", "c"), c(6, 3, 3))

# default logic returns the majority non-zero value when present
new_im <- collapse_im(im, row_groups)
new_im
#>    A  B  C
#> a  1 -1 -1
#> b -1 -1 -1
#> c  1  1  0

# majority logic will prioritize "0" when it is the majority
# (not recommended for most gene-based data)
new_im2 <- collapse_im(im, row_groups, logic="majority")
new_im2
#>   A  B  C
#> a 1 -1 -1
#> b 0  0  0
#> c 1  1  0

# more detail
imdf <- data.frame(im, row_groups,
   new_im[match(row_groups, rownames(new_im)), ])
split(imdf, imdf$row_groups)
#> $a
#>      A  B  C row_groups A.1 B.1 C.1
#> a   -1 -1 -1          a   1  -1  -1
#> a.1 -1 -1 -1          a   1  -1  -1
#> a.2  0 -1 -1          a   1  -1  -1
#> a.3  1  1  1          a   1  -1  -1
#> a.4  1  1  1          a   1  -1  -1
#> a.5  1  0  0          a   1  -1  -1
#> 
#> $b
#>      A  B  C row_groups A.1 B.1 C.1
#> b   -1 -1 -1          b  -1  -1  -1
#> b.1  0  0  0          b  -1  -1  -1
#> b.2  0  0  0          b  -1  -1  -1
#> 
#> $c
#>     A B C row_groups A.1 B.1 C.1
#> c   1 1 0          c   1   1   0
#> c.1 1 1 0          c   1   1   0
#> c.2 0 1 0          c   1   1   0
#>