Paste each vector in a list into a single string, returning a character vector. Values are delimited with a comma by default.

cPaste(
  x,
  sep = ",",
  doSort = FALSE,
  makeUnique = FALSE,
  na.rm = FALSE,
  keepFactors = FALSE,
  checkClass = TRUE,
  useBioc = TRUE,
  useLegacy = FALSE,
  honorFactor = TRUE,
  verbose = FALSE,
  ...
)

Arguments

x

input list of vectors

sep

character delimiter used to paste multiple values together

doSort

logical indicating whether to sort each vector using mixedOrder().

makeUnique

logical indicating whether to make each vector in the input list unique before pasting its values together.

na.rm

logical indicating whether to remove NA values from each vector in the input list. When na.rm is TRUE and a list element contains only NA values, the resulting string will be "".

keepFactors

logical, only used when useLegacy=TRUE and doSort=TRUE, indicating whether to preserve factors and keep factor level order. When keepFactors=TRUE, if any list element is a factor, all elements are converted to factors. Note that this step combines factor levels across all elements, and non-factors are currently ordered using base::order() instead of jamba::mixedOrder().

useBioc

logical indicating whether to use S4Vectors::unstrsplit() when the Bioconductor package S4Vectors is installed. Otherwise, a less efficient mapply() operation is used.

useLegacy

logical indicating whether to enable the previous legacy process used by cPaste().

honorFactor

logical passed to mixedSorts(), indicating whether any factor vector should be sorted in factor level order. When honorFactor=FALSE, even factor vectors are sorted as if they were character vectors, ignoring the factor levels.

...

additional arguments are passed to mixedOrder() when doSort=TRUE.
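As a rough illustration of the per-element fallback and the na.rm behavior described in the arguments above, here is a minimal base R sketch. The helper name pasteListSketch() is hypothetical and not part of jamba, and it uses vapply() rather than the mapply() call mentioned for useBioc; it only mimics the documented behavior for simple inputs.

```r
# Hypothetical helper, not part of jamba: paste each list element
# separately, optionally dropping NA values first.
pasteListSketch <- function(x, sep=",", na.rm=FALSE) {
   vapply(x, function(i) {
      i <- as.character(i)
      if (na.rm) {
         # remove NA values before pasting
         i <- i[!is.na(i)]
      }
      # an empty vector collapses to the empty string ""
      paste(i, collapse=sep)
   }, character(1))
}

L0 <- list(A=c("a", NA, "b"), B=c(NA, NA), C=NULL)
res0 <- pasteListSketch(L0, na.rm=TRUE)
# A becomes "a,b"; B (only NA values) and C (NULL) both become ""
res0
```

Looping over list elements like this is simple to read, but it becomes slow for long lists, which is presumably why the function prefers S4Vectors::unstrsplit() when it is available.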

Value

character vector with the same names and in the same order as the input list x.

Details

This function is essentially a wrapper for S4Vectors::unstrsplit(), except that it can also make each vector in the list unique and sort the values in each vector using mixedOrder().

Sorting and uniqueness are applied to the unlisted vector of values, which is substantially faster than an equivalent apply-family call on each list element. Uniqueness is performed by uniques(), which itself will use S4Vectors::unique() if available.
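The idea of working on the unlisted vector can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: pasteUnlistedSketch() is a hypothetical name, base::order() stands in for mixedOrder(), and duplicated() stands in for uniques().

```r
# Hypothetical helper, not part of jamba: sort and de-duplicate once
# on the flattened values, then split back by list element.
pasteUnlistedSketch <- function(x, sep=",", doSort=FALSE, makeUnique=FALSE) {
   # one flat vector of values, plus the list element each value came from
   values <- unlist(lapply(x, as.character), use.names=FALSE)
   group <- rep(seq_along(x), lengths(x))
   if (makeUnique) {
      # keep the first occurrence of each value within its list element
      keep <- !duplicated(data.frame(group, values))
      values <- values[keep]
      group <- group[keep]
   }
   if (doSort) {
      # assumption: base::order() used here in place of mixedOrder()
      idx <- order(group, values)
      values <- values[idx]
      group <- group[idx]
   }
   # explicit factor levels keep empty list elements in the result
   pieces <- split(values, factor(group, levels=seq_along(x)))
   out <- vapply(pieces, paste, character(1), collapse=sep)
   names(out) <- names(x)
   out
}

L1 <- list(CA=LETTERS[c(1:4,2,7,4,6)], B=letters[c(7:11,9,3)])
resSU <- pasteUnlistedSketch(L1, doSort=TRUE, makeUnique=TRUE)
# CA becomes "A,B,C,D,F,G" and B becomes "c,g,h,i,j,k"
resSU
```

The sort and de-duplication each run once over all values, with the list element index as the leading key, instead of once per list element.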

Examples

L1 <- list(CA=LETTERS[c(1:4,2,7,4,6)], B=letters[c(7:11,9,3)]);

cPaste(L1);
#>                CA                 B 
#> "A,B,C,D,B,G,D,F"   "g,h,i,j,k,i,c" 

cPaste(L1, doSort=TRUE);
#>                CA                 B 
#> "A,B,B,C,D,D,F,G"   "c,g,h,i,i,j,k" 

## The sort can be done with convenience function cPasteS()
cPasteS(L1);
#>                CA                 B 
#> "A,B,B,C,D,D,F,G"   "c,g,h,i,i,j,k" 

## Similarly, makeUnique=TRUE and cPasteU() are the same
cPaste(L1, makeUnique=TRUE);
#>            CA             B 
#> "A,B,C,D,G,F" "g,h,i,j,k,c" 
cPasteU(L1);
#>            CA             B 
#> "A,B,C,D,G,F" "g,h,i,j,k,c" 

## Change the delimiter
cPasteSU(L1, sep="; ")
#>                 CA                  B 
#> "A; B; C; D; F; G" "c; g; h; i; j; k" 

# test mix of factor and non-factor
L2 <- c(
   list(D=factor(letters[1:12],
      levels=letters[12:1])),
   L1);
L2;
#> $D
#>  [1] a b c d e f g h i j k l
#> Levels: l k j i h g f e d c b a
#> 
#> $CA
#> [1] "A" "B" "C" "D" "B" "G" "D" "F"
#> 
#> $B
#> [1] "g" "h" "i" "j" "k" "i" "c"
#> 
cPasteSU(L2, keepFactors=TRUE);
#>                         D                        CA                         B 
#> "l,k,j,i,h,g,f,e,d,c,b,a"             "A,B,C,D,F,G"             "c,g,h,i,j,k" 

# tricky example with mix of character and factor
# and factor levels are inconsistent
# end result: factor levels are defined in order they appear
L <- list(entryA=c("miR-112", "miR-12", "miR-112"),
   entryB=factor(c("A","B","A","B"),
      levels=c("B","A")),
   entryC=factor(c("C","A","B","B","C"),
      levels=c("A","B","C")),
   entryNULL=NULL)
L;
#> $entryA
#> [1] "miR-112" "miR-12"  "miR-112"
#> 
#> $entryB
#> [1] A B A B
#> Levels: B A
#> 
#> $entryC
#> [1] C A B B C
#> Levels: A B C
#> 
#> $entryNULL
#> NULL
#> 
cPaste(L);
#>                   entryA                   entryB                   entryC 
#> "miR-112,miR-12,miR-112"                "A,B,A,B"              "C,A,B,B,C" 
#>                entryNULL 
#>                       "" 
cPasteU(L);
#>           entryA           entryB           entryC        entryNULL 
#> "miR-112,miR-12"            "A,B"          "C,A,B"               "" 

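# keepFactors is only used when useLegacy=TRUE; with the default
# honorFactor=TRUE, factors are sorted in factor level order.
# Levels are combined across list elements in the order they appear (B, A, C)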
cPasteS(L);
#>                   entryA                   entryB                   entryC 
#> "miR-12,miR-112,miR-112"                "B,B,A,A"              "B,B,A,C,C" 
#>                entryNULL 
#>                       "" 
cPasteSU(L);
#>           entryA           entryB           entryC        entryNULL 
#> "miR-12,miR-112"            "B,A"          "B,A,C"               "" 
# keepFactors=TRUE will keep unique factor levels in the order they appear
# this is the same behavior as unlist(L[c(2,3)]) on a list of factors
cPasteSU(L, keepFactors=TRUE);
#>           entryA           entryB           entryC        entryNULL 
#> "miR-12,miR-112"            "B,A"          "B,A,C"               "" 
levels(unlist(L[c(2,3)]))
#> [1] "B" "A" "C"