2.1 Common Data Input Types

2.1.1 List of sets

The simplest input is a list of sets, where each set is represented as an R vector of items.

setlist <- make_venn_test()
str(setlist)
## List of 3
##  $ set_A: chr [1:35] "item_014" "item_195" "item_170" "item_050" ...
##  $ set_B: chr [1:18] "item_155" "item_188" "item_053" "item_135" ...
##  $ set_C: chr [1:79] "item_041" "item_175" "item_090" "item_060" ...

None of these vectors is considered 'signed', because none of them has names().

This setlist can be used directly with venndir(). In Figure 2.1 there are no signed counts displayed.

venndir(setlist)

Figure 2.1: Default Venn diagram with three sets.

2.1.2 List of signed sets

The next common input is a list of signed sets, represented as numeric vectors with items stored as names().

An example is shown below, showing only the first six elements of each vector. Notice the vector names are "item_014", "item_195", while the values are -1, -1.

setlist <- make_venn_test(do_signed=TRUE)
lapply(setlist, head, 6)
## $set_A
## item_014 item_195 item_170 item_050 item_118 item_043 
##       -1       -1        1        1       -1       -1 
## 
## $set_B
## item_155 item_188 item_053 item_135 item_198 item_200 
##       -1        1        1       -1        1        1 
## 
## $set_C
## item_041 item_175 item_090 item_060 item_016 item_116 
##        1        1       -1       -1       -1       -1
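The distinction between the two input types comes down to whether a vector carries names(). The base R sketch below uses two small hypothetical vectors (not produced by make_venn_test()) to show the difference. It illustrates the data layout only, and is not meant to describe how venndir() detects signed input internally.

```r
# hypothetical example vectors, for illustration only
unsigned_set <- c("item_014", "item_195", "item_170")
signed_set <- c(item_014=-1, item_195=-1, item_170=1)

# an unsigned set stores items as values, and has no names
is.null(names(unsigned_set))

# a signed set stores items as names, and signs as values
names(signed_set)
unname(signed_set)
```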

Figure 2.2 illustrates the output from venndir(), where signed counts are displayed using overlap_type='concordance' by default for a signed setlist.

venndir(setlist)

Figure 2.2: Default Venn diagram with three signed sets. The overlap counts are each tabulated by directional sign.

The output is described below:

  • The region "set_A" contains 19 items that are not present in any other set for this Venn diagram.
    • 10 of these 19 items are "up", indicated by the red label which includes the up arrow \(↑ 10\).
    • 9 of these 19 items are "down", indicated by the blue label which includes the down arrow \(↓ 9\).
  • The region of overlap between 'set_A' and 'set_B' contains 5 items which are only present in these two sets, and not present in 'set_C'.
    • 3 of these 5 items are "up" in both 'set_A' and 'set_B', indicated by the red label with two up arrows: \(↑↑ 3\).
    • 2 of these 5 items are discordant in direction, which means the sign in 'set_A' disagrees with the sign in 'set_B'. These items are indicated by the grey label with a large 'X': \(X 2\). A single 'X' is used to avoid listing every possible combination of "up" and "down".

The default summary for a signed setlist is overlap_type="concordance", which summarizes directional discordance in one category 'X', and tabulates counts in each subset that involves only one direction.
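The concordance summary can be approximated in base R for a single two-set overlap. The sketch below uses two hypothetical signed vectors, and collapses any disagreement in sign into one category 'X'. It is a simplified illustration of the idea, not the code venndir() uses.

```r
# hypothetical signed sets, for illustration only
set_A <- c(item_1=1, item_2=1, item_3=-1, item_4=1, item_5=-1)
set_B <- c(item_2=1, item_3=-1, item_4=-1, item_6=1)

# items shared by both sets
shared <- intersect(names(set_A), names(set_B))

# sign of each shared item in each set, for example "1 1" or "1 -1"
signs <- paste(set_A[shared], set_B[shared])

# keep the sign when both sets agree, otherwise use one category "X"
concordance <- ifelse(set_A[shared] == set_B[shared], signs, "X")

# tabulate counts per category
table(concordance)
```

Here 'item_2' is "up" in both sets, 'item_3' is "down" in both sets, and 'item_4' is discordant.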

For other approaches to summarize directional counts, see the summary in Table 3.3, in the section Overlap Type.

2.1.3 Incidence matrix

An incidence matrix (im) is another common input format, a matrix whose rownames are items, and colnames are sets. Any non-zero, non-empty value in the matrix indicates the item (row) exists in the set (column).

setlist <- make_venn_test()
im <- list2im_opt(setlist)
head(im, 10)
##          set_A set_B set_C
## item_014     1     0     0
## item_195     1     0     1
## item_170     1     0     0
## item_050     1     0     1
## item_118     1     0     1
## item_043     1     0     0
## item_200     1     1     0
## item_196     1     0     1
## item_153     1     1     1
## item_090     1     0     1

Figure 2.3 shows the Venn diagram created when venndir() is given an incidence matrix as input data. The input data is converted to a setlist within the function.

venndir(im)

Figure 2.3: Default Venn diagram with three sets, using an incidence matrix as input data.

When the incidence matrix contains no negative values, it is assumed to be non-directional. This assumption can be changed by using the argument overlap_type.
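The relationship between a setlist and an incidence matrix can be sketched in base R. The example below uses a small hypothetical setlist and does not call the venndir conversion functions; it only shows that the two formats carry the same information.

```r
# hypothetical setlist, for illustration only
setlist <- list(
   set_A=c("item_1", "item_2", "item_3"),
   set_B=c("item_2", "item_4"))

# all items across all sets become the rows
items <- unique(unlist(setlist))

# one column per set, with 1 when the item is present and 0 otherwise
im <- sapply(setlist, function(s) as.integer(items %in% s))
rownames(im) <- items
im

# convert back: any non-zero value means the item is in the set
setlist2 <- lapply(as.data.frame(im), function(col) rownames(im)[col != 0])
identical(setlist, setlist2)
```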

2.1.4 Signed incidence matrix

A signed incidence matrix is similar to an incidence matrix, except that it may include positive and negative values.

setlist <- make_venn_test(do_signed=TRUE)
im <- list2im_value(setlist)
head(im, 10)
##          set_A set_B set_C
## item_014    -1     0     0
## item_195    -1     0     1
## item_170     1     0     0
## item_050     1     0     1
## item_118    -1     0    -1
## item_043    -1     0     0
## item_200     1     1     0
## item_196    -1     0    -1
## item_153    -1    -1    -1
## item_090    -1     0    -1

venndir() also accepts a signed incidence matrix as input data, and subsequently displays signed counts, as shown in Figure 2.4.

venndir(im)

Figure 2.4: Default Venn diagram with three sets, using a signed incidence matrix as input data.
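A signed incidence matrix maps onto a signed setlist column by column: the non-zero values in each column become the signs, and the corresponding rownames become the item names. The base R sketch below shows this with a small hypothetical matrix. It is an illustration of the data layout, and makes no claim about how venndir() performs the conversion.

```r
# hypothetical signed incidence matrix, for illustration only
im <- matrix(
   c(-1, 1, 0,
      0, 1, -1),
   ncol=2,
   dimnames=list(
      c("item_1", "item_2", "item_3"),
      c("set_A", "set_B")))

# each column becomes a numeric vector named by item, dropping zeros
signed_setlist <- lapply(as.data.frame(im), function(col){
   x <- setNames(col, rownames(im))
   x[x != 0]
})
signed_setlist
```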

2.1.5 Overlap counts

If the Venn overlap counts are already known, they can also be used to re-create the corresponding Venn diagram.

The function counts2setlist() accepts Venn counts, and returns the corresponding setlist. The counts should be named by the Venn overlap, by using the name of each set involved, separated by ampersand '&'. The example below uses sets "A" and "B", and the corresponding overlap between 'A' and 'B' is named 'A&B'.

Note that the overlap name should be defined in quotes in R code, so the ampersand '&' is stored properly.

counts <- c(
   "A"=12,
   "B"=9,
   "A&B"=15)
setlist <- counts2setlist(counts)
str(setlist)
## List of 2
##  $ A: chr [1:27] "A_1" "A_2" "A_3" "A_4" ...
##  $ B: chr [1:24] "B_1" "B_2" "B_3" "B_4" ...

Figure 2.5 shows the output of venndir() using this setlist as input.

venndir(setlist)

Figure 2.5: Venn diagram created by using overlap counts as input data.
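As a sanity check on count-based input, the size of each set should equal the sum of the counts for every overlap that includes that set. The base R sketch below applies this to the same counts, independent of venndir. The helper contains_set() is a hypothetical name used only for this illustration.

```r
counts <- c(
   "A"=12,
   "B"=9,
   "A&B"=15)

# split each overlap name into the sets it involves
overlap_sets <- strsplit(names(counts), "&", fixed=TRUE)
set_names <- unique(unlist(overlap_sets))

# hypothetical helper: which overlaps involve set 's'
contains_set <- function(s) {
   vapply(overlap_sets, function(o) s %in% o, logical(1))
}

# total size of each set across all overlaps that include it
set_sizes <- vapply(set_names, function(s){
   sum(counts[contains_set(s)])
}, numeric(1))
set_sizes
```

The totals are 27 for 'A' and 24 for 'B', which agree with the vector lengths reported by str(setlist) above.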

Tip:

When starting with overlap counts, it is recommended to define set names with single characters, such as 'A', 'B', and 'C'.

  • The set names can be adjusted afterwards by editing names(setlist).
  • However, the preferred approach is to use venndir() arguments setlist_labels and legend_labels, also described in Custom legend labels.
  • Figure 2.6 illustrates this process.

The complete combination of sets and overlaps can be defined by calling make_venn_combn_df() using a vector of set names.
For example, rownames(make_venn_combn_df(LETTERS[1:3])) produces the following:
'A', 'B', 'C', 'A&B', 'A&C', 'B&C', 'A&B&C'

venndir(setlist,
   setlist_labels=c("Set A:\ncontrol state",
      "Set B:\ntest state"),
   legend_labels=c("A: control state",
      "B: test state"))

Figure 2.6: Venn diagram using custom set names using setlist_labels and legend_labels.
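The overlap names listed in the tip above follow a regular pattern: single sets first, then each pair, then each triple, joined with '&'. A base R sketch that reproduces the same names with combn() is shown below. It is offered as a way to understand the naming scheme, not as a replacement for make_venn_combn_df().

```r
set_names <- LETTERS[1:3]

# for each overlap size k, paste every combination of k set names with "&"
overlap_names <- unlist(lapply(seq_along(set_names), function(k){
   combn(set_names, k, FUN=paste, collapse="&")
}))
overlap_names
```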

2.1.6 Signed overlap counts

Similar to providing overlap counts as above, this approach defines counts for each directional overlap, using signed_counts2setlist(). This input is quite complex, and is also the least common.

The input is a list of integer count vectors. Each list element is named by the overlap, for example "A" or "A&B". Each vector element is named by the direction in each set, delimited by underscore "_", for example "1" for 'up', or "1_-1" for 'up_down'.

Notice the format of the input data:

signed_counts <- list(
   "A"=c(
      "1"=80,
      "-1"=95),
   "B"=c(
      "1"=15,
      "-1"=30),
   "A&B"=c(
      "1_1"=100,
      "1_-1"=3,
      "-1_1"=4,
      "-1_-1"=125))
signed_counts
## $A
##  1 -1 
## 80 95 
## 
## $B
##  1 -1 
## 15 30 
## 
## $`A&B`
##   1_1  1_-1  -1_1 -1_-1 
##   100     3     4   125

This input format is complicated for 2-way data, and certainly even more complicated for 3-way data. However, sometimes it is the most practical way to produce a given figure.

Counts are converted to a signed setlist with signed_counts2setlist(). Item names are generated only to create the setlist and are not otherwise useful.

setlist <- signed_counts2setlist(signed_counts)
lapply(setlist, jamba::middle, 5)
## $A
##         A_1_1       A_-1_22    A&B_1_1_29  A&B_-1_-1_24 A&B_-1_-1_125 
##           "1"          "-1"           "1"          "-1"          "-1" 
## 
## $B
##         B_1_1    A&B_1_1_25    A&B_1_1_94  A&B_-1_-1_56 A&B_-1_-1_125 
##           "1"           "1"           "1"          "-1"          "-1"

Figure 2.7 shows the setlist visualized with venndir().

venndir(setlist, overlap_type="each")

Figure 2.7: Venndir showing signed counts summarized using overlap_type="each", which includes counts for each combination of signs.
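To make the signed count format more concrete, the base R sketch below expands a small hypothetical set of signed counts into named sign vectors, one per set. The item naming scheme and the numeric sign values are assumptions chosen for this illustration; they are not guaranteed to match the output of signed_counts2setlist().

```r
# hypothetical signed counts, for illustration only
signed_counts <- list(
   "A"=c("1"=2, "-1"=1),
   "B"=c("1"=1),
   "A&B"=c("1_1"=2, "1_-1"=1))

# one row per generated item: overlap name, sign string, item label
item_df <- do.call(rbind, lapply(names(signed_counts), function(ov){
   counts <- signed_counts[[ov]]
   signs <- rep(names(counts), times=counts)
   data.frame(
      overlap=ov,
      signs=signs,
      item=paste(ov, signs, sequence(counts), sep="_"),
      stringsAsFactors=FALSE)
}))

# for each set, collect its items and the sign at the position of that set
set_names <- unique(unlist(strsplit(names(signed_counts), "&", fixed=TRUE)))
setlist <- lapply(setNames(set_names, set_names), function(s){
   unlist(lapply(seq_len(nrow(item_df)), function(i){
      ov_sets <- strsplit(item_df$overlap[i], "&", fixed=TRUE)[[1]]
      ov_signs <- strsplit(item_df$signs[i], "_", fixed=TRUE)[[1]]
      k <- match(s, ov_sets)
      if (is.na(k)) return(NULL)
      setNames(as.numeric(ov_signs[k]), item_df$item[i])
   }))
})
setlist
```

The key step is matching the position of each set within the overlap name to the same position within the sign string, so "1_-1" under "A&B" means 'up' in 'A' and 'down' in 'B'.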

2.1.7 Overlap list

This approach is conceptually similar to Overlap counts, which starts with the Venn counts as input. In this case, instead of supplying the integer count for each Venn overlap, the data contains the actual items in each overlap.

Consider a simple example:

  • two sets: 'A' and 'B'
  • five total items: one present only in 'A', one present only in 'B', and three shared by both sets, which are assigned to 'A&B'.

overlaplist <- list(
   A=c("Christina"),
   B=c("James"),
   "A&B"=c("Jillian", "Zander", "Java Pup")
)
str(overlaplist)
## List of 3
##  $ A  : chr "Christina"
##  $ B  : chr "James"
##  $ A&B: chr [1:3] "Jillian" "Zander" "Java Pup"

This overlaplist is converted into a setlist using overlaplist2setlist(), and the setlist can then be used as input to venndir().

Figure 2.8 shows the default Venn diagram (left), and a variation which displays the item labels (right) using argument show_labels="Ni". The show_labels argument is described in Label Content.

setlist <- overlaplist2setlist(overlaplist)
venndir(setlist)

venndir(setlist,
   keep_item_order=TRUE,
   item_cex_factor=c(0.5, 0.5, 0.8),
   show_labels="Ni")

Figure 2.8: Venn diagram after converting an overlap list to setlist, showing overlap counts (left), and overlap item labels (right).

The argument item_cex_factor is used to adjust the item label font size, and is described in Item Labels.
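The conversion from an overlap list to a setlist amounts to collecting, for each set, the items from every overlap whose name includes that set. A base R sketch of that idea is shown below, using the same example data. It is an approximation for illustration, and may differ in detail from overlaplist2setlist().

```r
overlaplist <- list(
   A=c("Christina"),
   B=c("James"),
   "A&B"=c("Jillian", "Zander", "Java Pup"))

# split each overlap name into the sets it involves
overlap_sets <- strsplit(names(overlaplist), "&", fixed=TRUE)
set_names <- unique(unlist(overlap_sets))

# for each set, combine items from all overlaps that include the set
setlist_sketch <- lapply(setNames(set_names, set_names), function(s){
   keep <- vapply(overlap_sets, function(o) s %in% o, logical(1))
   unlist(overlaplist[keep], use.names=FALSE)
})
str(setlist_sketch)
```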

2.1.8 Signed overlap list

A powerful but complex input option is a signed overlap list, similar to the previous section Overlap list with the addition of directional sign.

The format is similar to overlap list:

  • list named by the Venn overlap: 'A', 'B', 'A&B'

  • Each list element is also a list, named by the directional sign.

    • Each sign is defined using '1' or '-1' separated by a space.
    • Signs involving one set: "1", "-1"
    • Signs involving two sets: "1 1", "1 -1", etc.
    • Signs involving three sets: "1 1 1", "1 -1 1", etc.

signed_overlaplist <- list(
   A=list(
      "-1"=c("Item_A1"),
      "1"=c("Item_A2")),
   B=list(
      "1"=c("Item_B1", "ItemB2"),
      "-1"=c("ItemB3")),
   "A&B"=list(
      "1 1"=c("Item_AB2"),
      "1 -1"=c("Item_AB4"),
      "-1 1"=c("Item_AB1"),
      "-1 -1"=c("Item_AB3"))
)
str(signed_overlaplist)
## List of 3
##  $ A  :List of 2
##   ..$ -1: chr "Item_A1"
##   ..$ 1 : chr "Item_A2"
##  $ B  :List of 2
##   ..$ 1 : chr [1:2] "Item_B1" "ItemB2"
##   ..$ -1: chr "ItemB3"
##  $ A&B:List of 4
##   ..$ 1 1  : chr "Item_AB2"
##   ..$ 1 -1 : chr "Item_AB4"
##   ..$ -1 1 : chr "Item_AB1"
##   ..$ -1 -1: chr "Item_AB3"

The signed_overlaplist is converted to setlist using overlaplist2setlist().

Figure 2.9 shows the resulting Venn diagram with item labels enabled. Notice the item labels include a directional arrow, and are colored by the sign.

setlist <- overlaplist2setlist(signed_overlaplist)
v <- venndir(setlist,
   show_labels="Ni",
   item_cex_factor=0.6,
   xyratio=1.5)

Figure 2.9: Venn diagram derived from a signed overlap list. The figure shows item labels which are colored by sign, and placed beside a directional arrow.

This input format can also be generated from the Venndir object itself using overlaplist(v). However, it requires that venndir() was called with argument overlap_type="each" to preserve each sign. The default overlap_type="concordance" does not preserve signs for discordant overlaps.

2.1.8.1 Alternative signed overlap list

An alternative input format may be more convenient to produce in some circumstances. In this format, each overlap is a character vector of signs, named by item:

alt_signed_overlaplist <- list(
   A=c("Item_A1"="-1",
      "Item_A2"="1"),
   B=c("Item_B1"="1",
      "ItemB2"="1",
      "ItemB3"="-1"),
   "A&B"=c("Item_AB2"="1 1",
      "Item_AB4"="1 -1",
      "Item_AB1"="-1 1",
      "Item_AB3"="-1 -1"))
alt_signed_overlaplist
## $A
## Item_A1 Item_A2 
##    "-1"     "1" 
## 
## $B
## Item_B1  ItemB2  ItemB3 
##     "1"     "1"    "-1" 
## 
## $`A&B`
## Item_AB2 Item_AB4 Item_AB1 Item_AB3 
##    "1 1"   "1 -1"   "-1 1"  "-1 -1"

First, the alternative format is converted to the expected format for overlaplist2setlist(). Then it is converted to setlist.

signed_overlaplist <- lapply(alt_signed_overlaplist, function(i){
   split(names(i), i)
})
setlist <- overlaplist2setlist(signed_overlaplist)

The resulting setlist can be used with venndir() as in Figure 2.9.
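The conversion step relies on base R split(), which groups the item names by their sign string. The short sketch below applies it to one overlap on its own, to show the structure that results for each list element.

```r
# one overlap in the alternative format: signs as values, items as names
x <- c(
   "Item_AB2"="1 1",
   "Item_AB4"="1 -1",
   "Item_AB1"="-1 1",
   "Item_AB3"="-1 -1")

# group item names by sign, giving a list named by sign
by_sign <- split(names(x), x)
str(by_sign)
```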

2.1.9 Items

A streamlined approach to convert items from a vector or list into a setlist is provided by venn_meme(), as described in Venn Memes.

The process assumes the input data is provided in a specific order. The process is quite similar to that described in Overlap list, except that this process assumes the order matches the output of make_venn_combn_df().

The assumptions:

  • Input must be provided in the order defined by make_venn_combn_df(n) where n is the number of sets, or make_venn_combn_df(x) where x is a vector of set names.
  • Input is provided without overlap names. The overlap names are assigned based upon the length of the input data.
  • Duplicate items are not permitted.

To illustrate the process, each example below uses a vector of character LETTERS, showing how the letters are populated into the Venn diagram.

Figure 2.10 illustrates the process by showing the output of venn_meme(), with each panel using a slightly longer vector of items.

  • For 1 item, the output includes one set (not shown).
  • For 2 or 3 items, it produces two sets (a).
  • For 4 to 7 items, a 3-way Venn diagram is created (b).
  • For 8 to 15 items, a 4-way Venn diagram is created (c).
  • For 16 to 31 items, a 5-way Venn diagram is created (d).

vm3 <- venn_meme(head(LETTERS, 3))
vm7 <- venn_meme(head(LETTERS, 7))
vm15 <- venn_meme(head(LETTERS, 15))
vm31 <- venn_meme(1:31)

Figure 2.10: Four Venn diagrams showing the output from venn_meme() when providing 3, 7, 15, and 31 items, respectively.
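The item ranges listed above are consistent with a simple rule: n sets produce 2^n - 1 overlaps, so the number of sets is the smallest n whose overlaps can hold all items. The base R sketch below encodes that rule in a hypothetical helper, n_sets_for_items(), which is not part of venndir. It assumes venn_meme() follows this rule, which matches the ranges described here but is not otherwise confirmed.

```r
# hypothetical helper: smallest number of sets with enough overlaps
n_sets_for_items <- function(n_items) {
   n <- 1
   while ((2^n - 1) < n_items) {
      n <- n + 1
   }
   n
}

# number of sets implied by 1, 3, 7, 15, and 31 items
sapply(c(1, 3, 7, 15, 31), n_sets_for_items)

# duplicate items are not permitted, which can be checked beforehand
items <- head(LETTERS, 7)
anyDuplicated(items) == 0
```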

Tip: A useful by-product of calling venn_meme() is the creation of a setlist or overlaplist, without needing to provide all the Venn overlap labels upfront.

vm3 <- venn_meme(head(LETTERS, 3), do_plot=FALSE)
setlist(vm3)
## $A
## [1] "A" "C"
## 
## $B
## [1] "B" "C"
overlaplist(vm3)
## $A
## [1] "A"
## 
## $B
## [1] "B"
## 
## $`A&B`
## [1] "C"

2.1.10 List of data frames

A common starting point is a data.frame for each set, with a column of numeric values, and a column of item names. Very often this input is a "table of stats" which may also need to be filtered for statistical significance.

The general guidance is to filter each data.frame, then convert the result into a vector. The vector can either be un-signed, or signed.

  1. Setlist: a character vector of items.
  2. Signed setlist: vector using sign(), named by item.

To illustrate, consider the test data generated below, as two data.frame objects with P-values and log2 fold changes.

# define some test data
set.seed(123)
df1 <- data.frame(
   item=paste0("item", jamba::padInteger(1:20)),
   pvalue=round(digits=3, stats::runif(20) / 5),
   log2FoldChange=round(digits=3, rnorm(20) * 3))

df2 <- data.frame(
   item=paste0("item", jamba::padInteger(1:20)),
   pvalue=round(digits=3, stats::runif(20) / 5),
   log2FoldChange=round(digits=3, rnorm(20) * 3))

head(df1)
  item     pvalue   log2FoldChange
  item01    0.058            3.672
  item02    0.158            1.079
  item03    0.082            1.202
  item04    0.177            0.332
  item05    0.188           -1.668
  item06    0.009            5.361

Option 1. Setlist

One convenient approach is to create a list of data.frame objects, apply a P-value filter to each entry, and return column 'item'.

This output does not have directional sign.

# create a list of data.frame objects
dflist <- list(Dataset1=df1,
   Dataset2=df2)

# iterate the list
setlist <- lapply(dflist, function(df){
   # apply stat threshold
   dfsub <- subset(df, pvalue < 0.1)
   # return the item name
   dfsub$item
})
str(setlist)
## List of 2
##  $ Dataset1: chr [1:9] "item01" "item03" "item06" "item10" ...
##  $ Dataset2: chr [1:11] "item02" "item03" "item04" "item06" ...

Option 2. Signed setlist

The other common alternative is to extend Option 1 to create a signed list, using the sign of 'log2FoldChange'. The vector values should be the sign, and the vector names should be the 'item'.

# create a list of data.frame objects
dflist <- list(Dataset1=df1,
   Dataset2=df2)

# iterate the list
setlist_signed <- lapply(dflist, function(df){
   # apply stat threshold
   dfsub <- subset(df, pvalue < 0.1)
   # create vector using sign()
   x <- sign(dfsub$log2FoldChange)
   # add item names
   names(x) <- dfsub$item
   x
})
setlist_signed
## $Dataset1
## item01 item03 item06 item10 item12 item15 item17 item18 item19 
##      1      1      1     -1     -1     -1      1      1     -1 
## 
## $Dataset2
## item02 item03 item04 item06 item10 item14 item15 item16 item17 item19 item20 
##     -1     -1      1     -1     -1      1     -1      1     -1      1      1

Figure 2.11 shows the resulting Venn diagram without sign (left) and with sign (right).

venndir(setlist)
venndir(setlist_signed)

Figure 2.11: Venn diagram created from a list of data frames, converted to a setlist (left) and a signed setlist (right).