Optimized conversion of list to incidence matrix

list2im_opt(setlist, empty = 0, do_sparse = FALSE, ...)

Arguments

setlist

list of vectors

empty

single value used for entries that are not present in the input. The default empty=0 uses zero; NA is a common alternative. Be warned that providing a character value converts the output to a character matrix.

do_sparse

logical indicating whether to coerce the output to sparse matrix class "CsparseMatrix" from the Matrix package. The default is FALSE as of version 0.0.33.900, since the most common use case requires a regular matrix. For extremely large data, consider using a sparse matrix.

...

additional arguments are ignored.

Value

matrix object with values 0 and 1 when do_sparse=FALSE (default). When do_sparse=TRUE and the Matrix package is available, it returns a Matrix object of class "CsparseMatrix" with logical values.

Details

This function rapidly converts a list of vectors into an incidence matrix whose rownames are the items, and whose colnames are the names of the input list. With the default do_sparse=FALSE the output is a regular matrix with numeric values 0 and 1. When do_sparse=TRUE the output is a logical sparse matrix of class ngCMatrix from the Matrix package.
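To make the shape of the output concrete, the conversion can be sketched in base R as follows. This is only an illustration: list2im_sketch() is a hypothetical helper written for this example, not the optimized implementation used by list2im_opt(), and it covers only the dense do_sparse=FALSE case.

```r
# Hypothetical base-R sketch of a list-to-incidence-matrix conversion.
# Not the package implementation; intended only to show the output layout.
list2im_sketch <- function(setlist, empty = 0) {
  # Unique items in order of first appearance (rows are not sorted)
  items <- unique(unlist(setlist, use.names = FALSE))
  # Start with a matrix filled with the 'empty' value
  im <- matrix(empty,
    nrow = length(items),
    ncol = length(setlist),
    dimnames = list(items, names(setlist)))
  # Mark each item present in each list element with 1
  for (i in seq_along(setlist)) {
    im[as.character(setlist[[i]]), i] <- 1
  }
  im
}

setlist <- list(A = c("one", "two", "three"),
  b = c("two", "one", "four", "five"))
im <- list2im_sketch(setlist)
im
```

Rows follow the order in which items first appear across the list, which is why "three" precedes "four" in this example.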

Note that the rows in the output matrix are not sorted, since the step of sorting item names may take several seconds when working with a list whose vectors contain millions of items. For sorted rows, the best remedy is to run this function, then re-order the rows by rowname afterward.
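Re-ordering the rows afterward might look like the sketch below. To keep it self-contained, the matrix im is built by hand to mirror the dense output shown in the Examples; in practice it would come from list2im_opt(setlist). The variable names im and im_sorted are illustrative only.

```r
# Stand-in for the result of list2im_opt(setlist); built by hand
# so that this sketch runs without the package installed.
im <- matrix(c(1, 1, 1, 0, 0,
               1, 1, 0, 1, 1),
  ncol = 2,
  dimnames = list(c("one", "two", "three", "four", "five"),
    c("A", "b")))

# Re-order rows by rowname; drop=FALSE keeps the matrix shape
# even when the input has only one row or one column.
im_sorted <- im[order(rownames(im)), , drop = FALSE]
rownames(im_sorted)
```

The sort order depends on the collation rules of the current locale. The same row-indexing pattern should also apply to the sparse output, since Matrix objects support subsetting with `[`.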

See also

Examples

setlist <- list(A=c("one", "two", "three"),
   b=c("two", "one", "four", "five"));
list2im_opt(setlist);
#>       A b
#> one   1 1
#> two   1 1
#> three 1 0
#> four  0 1
#> five  0 1

list2im_opt(setlist, do_sparse=TRUE);
#> 5 x 2 sparse Matrix of class "ngCMatrix"
#>       A b
#> one   | |
#> two   | |
#> three | .
#> four  . |
#> five  . |