2.1 Common Data Input Types
2.1.1 List of sets
The simplest input is a list of sets, where each set is represented as an R vector of items.
## List of 3
## $ set_A: chr [1:35] "item_014" "item_195" "item_170" "item_050" ...
## $ set_B: chr [1:18] "item_155" "item_188" "item_053" "item_135" ...
## $ set_C: chr [1:79] "item_041" "item_175" "item_090" "item_060" ...
Each vector is not considered 'signed', because each vector has no names().
This setlist can be used directly with venndir().
In Figure 2.1 there are no signed counts displayed.

Figure 2.1: Default Venn diagram with three sets.
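As a minimal sketch, a small setlist can be built by hand and checked for the absence of names(). The item names below are hypothetical, and the venndir() call assumes the venndir package is installed.

```r
# Hypothetical items, for illustration only
setlist <- list(
   set_A = c("item_01", "item_02", "item_03"),
   set_B = c("item_02", "item_04"),
   set_C = c("item_03", "item_04", "item_05"))

# TRUE for each set: no names(), so the setlist is treated as unsigned
has_no_names <- vapply(setlist, function(x) is.null(names(x)), logical(1))
print(has_no_names)

# Draw the Venn diagram only when venndir is available
if (requireNamespace("venndir", quietly = TRUE)) {
   venndir::venndir(setlist)
}
```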
2.1.2 List of signed sets
The next common input is a list of signed sets, represented as numeric vectors with items stored as names().
An example is shown below, showing only the first six elements of each vector. Notice the vector names are "item_014", "item_195", while the values are -1, -1.
## $set_A
## item_014 item_195 item_170 item_050 item_118 item_043
## -1 -1 1 1 -1 -1
##
## $set_B
## item_155 item_188 item_053 item_135 item_198 item_200
## -1 1 1 -1 1 1
##
## $set_C
## item_041 item_175 item_090 item_060 item_016 item_116
## 1 1 -1 -1 -1 -1
Figure 2.2 illustrates the output from venndir(), where signed counts are displayed using overlap_type='concordance' by default for a signed setlist.

Figure 2.2: Default Venn diagram with three signed sets. The overlap counts are each tabulated by directional sign.
The output is described below:
- The region "set_A" contains 19 items that are not present in any other set for this Venn diagram.
  - 10 of these 19 items are "up", indicated by the red label which includes the up arrow \(↑ 10\).
  - 9 of these 19 items are "down", indicated by the blue label which includes the down arrow \(↓ 9\).
- The region of overlap between 'set_A' and 'set_B' contains 5 items which are only present in these two sets, and not present in 'set_C'.
  - 3 of these 5 items are "up" in both 'set_A' and 'set_B', indicated by the red label with two up arrows: \(↑↑ 3\).
  - 2 of these 5 items are discordant in direction, which means the sign in 'set_A' disagrees with 'set_B'. The label is shown in grey with a large 'X': \(X 2\). The purpose of using one 'X' is to avoid describing all possible combinations of "up" and "down".
The default summary for a signed setlist is overlap_type="concordance", which summarizes directional discordance in one category 'X', and tabulates counts in each subset that involves only one direction.
For other approaches to summarize directional counts, see the summary in Table 3.3, in the section Overlap Type.
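A signed set can be written directly as a named numeric vector. The sketch below reuses the first six values of set_A printed earlier in this section and tabulates "up" and "down" items with base R. It is only an illustration of the data structure, not a description of how venndir() tabulates counts internally.

```r
# First six elements of set_A as printed above: values are signs, names are items
set_A <- c(item_014 = -1, item_195 = -1, item_170 = 1,
   item_050 = 1, item_118 = -1, item_043 = -1)

# A signed vector has names(); an unsigned vector does not
is_signed <- !is.null(names(set_A))

# Count items by direction
n_up <- sum(set_A > 0)
n_down <- sum(set_A < 0)
print(c(up = n_up, down = n_down))
```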
2.1.3 Incidence matrix
An incidence matrix (im) is another common input format: a matrix whose rownames are items, and colnames are sets.
Any non-zero, non-empty value in the matrix indicates the item (row) exists in the set (column).
## set_A set_B set_C
## item_014 1 0 0
## item_195 1 0 1
## item_170 1 0 0
## item_050 1 0 1
## item_118 1 0 1
## item_043 1 0 0
## item_200 1 1 0
## item_196 1 0 1
## item_153 1 1 1
## item_090 1 0 1
Figure 2.3 shows the Venn diagram created by venndir(), which also accepts an incidence matrix as input data.
The input data is converted to a setlist within the function.

Figure 2.3: Default Venn diagram with three sets, using an incidence matrix as input data.
When the incidence matrix only contains positive values, it is assumed to be non-directional. This assumption can be changed by using the argument overlap_type.
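To make the rule "any non-zero, non-empty value indicates membership" concrete, the following base R sketch converts a small, hypothetical incidence matrix into a list of sets. venndir() performs its own conversion internally; this sketch only illustrates the idea and is not the package's implementation.

```r
# Hypothetical incidence matrix: rows are items, columns are sets
im <- matrix(
   c(1, 0, 0,
     1, 0, 1,
     1, 1, 0,
     0, 1, 1),
   ncol = 3, byrow = TRUE,
   dimnames = list(
      c("item_01", "item_02", "item_03", "item_04"),
      c("set_A", "set_B", "set_C")))

# For each column, keep the row names whose value is non-zero and not NA
setlist_sketch <- lapply(
   stats::setNames(colnames(im), colnames(im)),
   function(j) rownames(im)[!is.na(im[, j]) & im[, j] != 0])
str(setlist_sketch)
```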
2.1.4 Signed incidence matrix
A signed incidence matrix is similar to an incidence matrix, except that it may include positive and negative values.
## set_A set_B set_C
## item_014 -1 0 0
## item_195 -1 0 1
## item_170 1 0 0
## item_050 1 0 1
## item_118 -1 0 -1
## item_043 -1 0 0
## item_200 1 1 0
## item_196 -1 0 -1
## item_153 -1 -1 -1
## item_090 -1 0 -1
Figure 2.4 shows the signed Venn diagram created by venndir(), which also accepts a signed incidence matrix as input data, and subsequently displays signed counts.

Figure 2.4: Default Venn diagram with three sets, using a signed incidence matrix as input data.
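The same idea extends to signed data. The base R sketch below takes four rows copied from the signed incidence matrix printed above and keeps the non-zero values of each column as a named vector, which is the signed setlist form described in List of signed sets. It is an illustration under those assumptions, not the conversion used inside venndir().

```r
# Four rows copied from the signed incidence matrix shown above
im_signed <- matrix(
   c(-1,  0,  0,
     -1,  0,  1,
      1,  1,  0,
     -1, -1, -1),
   ncol = 3, byrow = TRUE,
   dimnames = list(
      c("item_014", "item_195", "item_200", "item_153"),
      c("set_A", "set_B", "set_C")))

# For each set, keep non-zero signs; row names carry over as names()
signed_sketch <- lapply(
   stats::setNames(colnames(im_signed), colnames(im_signed)),
   function(j) {
      x <- im_signed[, j]
      x[!is.na(x) & x != 0]
   })
print(signed_sketch)
```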
2.1.5 Overlap counts
If the Venn overlap counts are already known, they can also be used to re-create the corresponding Venn diagram.
The function counts2setlist() accepts Venn counts, and returns the corresponding setlist.
The counts should be named by the Venn overlap, by using the name of each set involved, separated by ampersand '&'.
The example below uses sets "A" and "B", and the corresponding overlap between 'A' and 'B' is named 'A&B'.
Note that the overlap name should be defined in quotes in R code, so the ampersand '&' is stored properly.
## List of 2
## $ A: chr [1:27] "A_1" "A_2" "A_3" "A_4" ...
## $ B: chr [1:24] "B_1" "B_2" "B_3" "B_4" ...
Figure 2.5 shows the output of venndir() using this setlist as input.

Figure 2.5: Venn diagram created by using overlap counts as input data.
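A minimal sketch of this input follows. The counts are hypothetical and are not the values behind Figure 2.5. The base R portion shows how overlap counts add up to total set sizes, and the counts2setlist() call assumes the venndir package is installed.

```r
# Hypothetical overlap counts; note the quotes around "A&B"
overlap_counts <- c(A = 10, B = 5, "A&B" = 3)

# Split each overlap name into the sets it involves
sets_in_overlap <- strsplit(names(overlap_counts), "&", fixed = TRUE)
set_names <- unique(unlist(sets_in_overlap))

# Total size of each set is the sum of every overlap that includes it
set_sizes <- vapply(set_names, function(s) {
   in_set <- vapply(sets_in_overlap, function(v) s %in% v, logical(1))
   sum(overlap_counts[in_set])
}, numeric(1))
print(set_sizes)

# Convert counts to a setlist only when venndir is available
if (requireNamespace("venndir", quietly = TRUE)) {
   setlist <- venndir::counts2setlist(overlap_counts)
   str(setlist)
}
```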
Tip:
When starting with overlap counts, it is recommended to define set names with single characters, such as 'A', 'B', and 'C'.
- The set names can be adjusted afterwards by editing names(setlist).
- However, the preferred approach is to use venndir() arguments setlist_labels and legend_labels, also described in Custom legend labels.
- Figure 2.6 illustrates this process.
The complete combination of sets and overlaps can be defined by calling make_venn_combn_df() using a vector of set names.
For example rownames(make_venn_combn_df(LETTERS[1:3])) produces the following:
'A', 'B', 'C', 'A&B', 'A&C', 'B&C', 'A&B&C'
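For readers who want to see where this ordering comes from, the same names can be reproduced with base R by taking combinations of increasing size. This is an independent sketch that happens to match the output above for three sets; it is not the implementation of make_venn_combn_df().

```r
set_names <- LETTERS[1:3]

# Combinations of size 1, then 2, then 3, each joined with "&"
overlap_names <- unlist(lapply(seq_along(set_names), function(k) {
   utils::combn(set_names, k, FUN = paste, collapse = "&")
}))
print(overlap_names)
```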
venndir(setlist,
   setlist_labels=c("Set A:\ncontrol state",
      "Set B:\ntest state"),
   legend_labels=c("A: control state",
      "B: test state"))

Figure 2.6: Venn diagram with custom set names defined using setlist_labels and legend_labels.
2.1.6 Signed overlap counts
Similar to providing overlap counts as above, this approach defines counts for each directional overlap, using signed_counts2setlist().
This input format is the most complex, and also the least common.
The input is a list of integer count vectors. Each list element is named by the overlap, for example "A" or "A&B". Each value within a vector is named by the direction, with one sign per set delimited by underscore "_", for example "1" for 'up', or "1_-1" for 'up_down'.
Notice the format of the input data:
signed_counts <- list(
   "A"=c(
      "1"=80,
      "-1"=95),
   "B"=c(
      "1"=15,
      "-1"=30),
   "A&B"=c(
      "1_1"=100,
      "1_-1"=3,
      "-1_1"=4,
      "-1_-1"=125))
signed_counts
## $A
## 1 -1
## 80 95
##
## $B
## 1 -1
## 15 30
##
## $`A&B`
## 1_1 1_-1 -1_1 -1_-1
## 100 3 4 125
This input format is complicated for 2-way data, and certainly even more complicated for 3-way data. However, sometimes it is the most practical way to produce a given figure.
Counts are converted to a signed setlist with signed_counts2setlist().
Item names are generated only to create the setlist and are not otherwise useful.
## $A
## A_1_1 A_-1_22 A&B_1_1_29 A&B_-1_-1_24 A&B_-1_-1_125
## "1" "-1" "1" "-1" "-1"
##
## $B
## B_1_1 A&B_1_1_25 A&B_1_1_94 A&B_-1_-1_56 A&B_-1_-1_125
## "1" "1" "1" "-1" "-1"
Figure 2.7 shows the setlist visualized with venndir().

Figure 2.7: Venndir showing signed counts summarized using overlap_type="each", which includes counts for each combination of signs.
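To make the direction naming convention concrete, the base R sketch below splits each direction name in the 'A&B' overlap on the underscore and separates the counts whose signs agree from those that disagree. It reuses the signed_counts values defined above and is only an illustration of the naming scheme, not of how venndir() summarizes counts.

```r
# Same values as the signed_counts example above
signed_counts <- list(
   "A" = c("1" = 80, "-1" = 95),
   "B" = c("1" = 15, "-1" = 30),
   "A&B" = c("1_1" = 100, "1_-1" = 3, "-1_1" = 4, "-1_-1" = 125))

# One sign per set, delimited by underscore
signs_AB <- strsplit(names(signed_counts[["A&B"]]), "_", fixed = TRUE)

# Concordant when every set shares the same sign
is_concordant <- vapply(signs_AB,
   function(s) length(unique(s)) == 1, logical(1))

concordant_total <- sum(signed_counts[["A&B"]][is_concordant])
discordant_total <- sum(signed_counts[["A&B"]][!is_concordant])
print(c(concordant = concordant_total, discordant = discordant_total))
```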
2.1.7 Overlap list
This approach is conceptually similar to Overlap counts, which starts with the Venn counts as input. In this case, instead of supplying the integer count in each Venn overlap, the data contains the actual items.
Consider a simple example:
- two sets: 'A' and 'B'
- five total items: one present only in 'A', one unique to 'B', and three shared by both sets, which are assigned to 'A&B'.
overlaplist <- list(
A=c("Christina"),
B=c("James"),
"A&B"=c("Jillian", "Zander", "Java Pup")
)
str(overlaplist)
## List of 3
## $ A : chr "Christina"
## $ B : chr "James"
## $ A&B: chr [1:3] "Jillian" "Zander" "Java Pup"
This overlaplist is converted into setlist using overlaplist2setlist(), which can be input to venndir().
Figure 2.8 shows the default Venn diagram (left), and a variation which displays the item labels (right) using argument show_labels="Ni".
The show_labels argument is described in Label Content.
setlist <- overlaplist2setlist(overlaplist)
venndir(setlist)
venndir(setlist,
   keep_item_order=TRUE,
   item_cex_factor=c(0.5, 0.5, 0.8),
   show_labels="Ni")


Figure 2.8: Venn diagram after converting an overlap list to setlist, showing overlap counts (left), and overlap item labels (right).
The argument item_cex_factor is used to adjust the item label font size, and is described in Item Labels.
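The relationship between an overlap list and a setlist can be sketched in base R: every item assigned to an overlap belongs to each set named in that overlap. The code below reuses the overlaplist from this section. It is a simplified illustration and is not the implementation of overlaplist2setlist().

```r
overlaplist <- list(
   A = c("Christina"),
   B = c("James"),
   "A&B" = c("Jillian", "Zander", "Java Pup"))

# Sets involved in each overlap name
sets_in_overlap <- strsplit(names(overlaplist), "&", fixed = TRUE)
set_names <- unique(unlist(sets_in_overlap))

# Each set collects the items from every overlap that includes it
setlist_sketch <- lapply(
   stats::setNames(set_names, set_names),
   function(s) {
      keep <- vapply(sets_in_overlap, function(v) s %in% v, logical(1))
      unlist(overlaplist[keep], use.names = FALSE)
   })
str(setlist_sketch)
```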
2.1.8 Signed overlap list
A powerful but complex import option is a signed overlap list, similar to the previous section Overlap list with the addition of directional sign.
The format is similar to overlap list:
- list named by the Venn overlap: 'A', 'B', 'A&B'
- Each list element is also a list, named by the directional sign.
  - Each sign is defined using '1' or '-1' separated by a space.
  - Signs involving one set: "1", "-1"
  - Signs involving two sets: "1 1", "1 -1", etc.
  - Signs involving three sets: "1 1 1", "1 -1 1", etc.
signed_overlaplist <- list(
   A=list(
      "-1"=c("Item_A1"),
      "1"=c("Item_A2")),
   B=list(
      "1"=c("Item_B1", "ItemB2"),
      "-1"=c("ItemB3")),
   "A&B"=list(
      "1 1"=c("Item_AB2"),
      "1 -1"=c("Item_AB4"),
      "-1 1"=c("Item_AB1"),
      "-1 -1"=c("Item_AB3"))
)
str(signed_overlaplist)
## List of 3
## $ A :List of 2
## ..$ -1: chr "Item_A1"
## ..$ 1 : chr "Item_A2"
## $ B :List of 2
## ..$ 1 : chr [1:2] "Item_B1" "ItemB2"
## ..$ -1: chr "ItemB3"
## $ A&B:List of 4
## ..$ 1 1 : chr "Item_AB2"
## ..$ 1 -1 : chr "Item_AB4"
## ..$ -1 1 : chr "Item_AB1"
## ..$ -1 -1: chr "Item_AB3"
The signed_overlaplist is converted to setlist using overlaplist2setlist().
Figure 2.9 shows the resulting Venn diagram with item labels enabled. Notice the item labels include a directional arrow, and are colored by the sign.
setlist <- overlaplist2setlist(signed_overlaplist)
v <- venndir(setlist,
show_labels="Ni",
item_cex_factor=0.6,
xyratio=1.5)

Figure 2.9: Venn diagram derived from a signed overlap list. The figure shows item labels which are colored by sign, and placed beside a directional arrow.
This input format can also be generated from the Venndir object itself using overlaplist(v). However, it requires that venndir() was called with argument overlap_type="each" to preserve each sign.
The default overlap_type="concordance" does not preserve signs for discordant overlaps.
2.1.8.1 Alternative signed overlap list
An alternative input format is shown below, which may be more convenient to produce in some circumstances. Each overlap is a character vector of signs, named by item.
alt_signed_overlaplist <- list(
A=c("Item_A1"="-1",
"item_A2"="1"),
B=c("Item_B1"="1",
"ItemB2"="1",
"ItemB3"="-1"),
"A&B"=c("Item_AB2"="1 1",
"Item_AB4"="1 -1",
"Item_AB1"="-1 1",
"Item_AB3"="-1 -1"))
alt_signed_overlaplist
## $A
## Item_A1 item_A2
## "-1" "1"
##
## $B
## Item_B1 ItemB2 ItemB3
## "1" "1" "-1"
##
## $`A&B`
## Item_AB2 Item_AB4 Item_AB1 Item_AB3
## "1 1" "1 -1" "-1 1" "-1 -1"
First, the alternative format is converted to the expected format for overlaplist2setlist().
Then it is converted to setlist.
signed_overlaplist <- lapply(alt_signed_overlaplist, function(i){
split(names(i), i)
})
setlist <- overlaplist2setlist(signed_overlaplist)
The resulting setlist can be used with venndir() as in Figure 2.9.
2.1.9 Items
A streamlined approach to convert items from a vector or list into setlist is provided by venn_meme(), as described in Venn Memes.
The process assumes the input data is provided in a specific order.
The process is quite similar to that described in Overlap list, except that this process assumes the order matches the output of make_venn_combn_df().
The assumptions:
- Input must be provided in the order defined by make_venn_combn_df(n) where n is the number of sets, or make_venn_combn_df(x) where x is a vector of set names.
- Input is provided without using overlap names; they are assigned based upon the length of the input data.
- Duplicate items are not permitted.
To illustrate the process, each example below uses a vector of character LETTERS, showing how the letters are populated into the Venn diagram.
Figure 2.10 illustrates the process by showing the output of venn_meme(), with each panel using a slightly longer vector of items.
- For 1 item, the output includes one set (not shown).
- For 2 or 3 items, it produces two sets (a).
- For 4 to 7 items, a 3-way Venn diagram is created (b).
- For 8 to 15 items, a 4-way Venn diagram is created (c).
- For 16 to 31 items, a 5-way Venn diagram is created (d).
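The thresholds in this list follow from the number of overlap regions: n sets define 2^n - 1 overlaps, so the number of sets is the smallest n with enough overlaps to hold every item. The helper below, n_sets_for_items(), is a hypothetical function written for this illustration. It is inferred from the list above and is not part of venndir.

```r
# Hypothetical helper: smallest number of sets whose 2^n - 1 overlaps
# can hold n_items entries (supports up to 10 sets in this sketch)
n_sets_for_items <- function(n_items) {
   vapply(n_items, function(n) {
      min(which(2^(1:10) - 1 >= n))
   }, integer(1))
}

# Item counts used in Figure 2.10, plus a single item
n_sets <- n_sets_for_items(c(1, 3, 7, 15, 31))
print(n_sets)
```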
vm3 <- venn_meme(head(LETTERS, 3))
vm7 <- venn_meme(head(LETTERS, 7))
vm15 <- venn_meme(head(LETTERS, 15))
vm31 <- venn_meme(1:31)

Figure 2.10: Four Venn diagrams showing the output from venn_meme() when providing 3, 7, 15, and 31 items, respectively.
Tip: A useful by-product of calling venn_meme() is the creation of a setlist or overlaplist, without needing to provide all the Venn overlap labels upfront.
## $A
## [1] "A" "C"
##
## $B
## [1] "B" "C"
## $A
## [1] "A"
##
## $B
## [1] "B"
##
## $`A&B`
## [1] "C"
2.1.10 List of data frames
A common starting point is a data.frame for each set, with a column of numeric values, and a column of item names.
Very often this input is a "table of stats" which may also need to be filtered for statistical significance.
The general guidance is to filter each data.frame, then convert the result into a vector. The vector can either be un-signed, or signed.
- Setlist: a character vector of items.
- Signed setlist: vector using sign(), named by item.
To illustrate, consider the test data generated below, as two data.frame objects with P-values and log2 fold changes.
# define some test data
set.seed(123)
df1 <- data.frame(
item=paste0("item", jamba::padInteger(1:20)),
pvalue=round(digits=3, stats::runif(20) / 5),
log2FoldChange=round(digits=3, rnorm(20) * 3))
df2 <- data.frame(
item=paste0("item", jamba::padInteger(1:20)),
pvalue=round(digits=3, stats::runif(20) / 5),
log2FoldChange=round(digits=3, rnorm(20) * 3))
head(df1)
item | pvalue | log2FoldChange |
---|---|---|
item01 | 0.058 | 3.672 |
item02 | 0.158 | 1.079 |
item03 | 0.082 | 1.202 |
item04 | 0.177 | 0.332 |
item05 | 0.188 | -1.668 |
item06 | 0.009 | 5.361 |
Option 1. Setlist
One convenient approach is to create a list of data.frame objects, apply a P-value filter to each entry, and return column 'item'.
This output does not have directional sign.
# create a list of data.frame objects
dflist <- list(Dataset1=df1,
Dataset2=df2)
# iterate the list
setlist <- lapply(dflist, function(df){
   # apply stat threshold
   dfsub <- subset(df, pvalue < 0.1)
   # return the item name
   dfsub$item
})
str(setlist)
## List of 2
## $ Dataset1: chr [1:9] "item01" "item03" "item06" "item10" ...
## $ Dataset2: chr [1:11] "item02" "item03" "item04" "item06" ...
Option 2. Signed setlist
The other common alternative is to extend Option 1 to create a signed list, using the sign of 'log2FoldChange'.
Each vector should contain the sign as its values, named by 'item'.
# create a list of data.frame objects
dflist <- list(Dataset1=df1,
Dataset2=df2)
# iterate the list
setlist_signed <- lapply(dflist, function(df){
   # apply stat threshold
   dfsub <- subset(df, pvalue < 0.1)
   # create vector using sign()
   x <- sign(dfsub$log2FoldChange)
   # add item names
   names(x) <- dfsub$item
   x
})
setlist_signed
## $Dataset1
## item01 item03 item06 item10 item12 item15 item17 item18 item19
## 1 1 1 -1 -1 -1 1 1 -1
##
## $Dataset2
## item02 item03 item04 item06 item10 item14 item15 item16 item17 item19 item20
## -1 -1 1 -1 -1 1 -1 1 -1 1 1
Figure 2.11 shows the resulting Venn diagram without sign (left) and with sign (right).


Figure 2.11: Venn diagram using a list of data frames converted to a setlist (left) and signed setlist (right).
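As a cross-check on the signed result, the shared items and their agreement in direction can be tabulated with base R. The vectors below are copied from the setlist_signed output printed above so that the sketch runs on its own. It illustrates the idea of concordance and is not how venndir() computes its counts.

```r
# Values copied from the setlist_signed output above
setlist_signed <- list(
   Dataset1 = c(item01 = 1, item03 = 1, item06 = 1, item10 = -1,
      item12 = -1, item15 = -1, item17 = 1, item18 = 1, item19 = -1),
   Dataset2 = c(item02 = -1, item03 = -1, item04 = 1, item06 = -1,
      item10 = -1, item14 = 1, item15 = -1, item16 = 1, item17 = -1,
      item19 = 1, item20 = 1))

# Items present in both sets
shared <- intersect(names(setlist_signed$Dataset1),
   names(setlist_signed$Dataset2))

# Compare the sign of each shared item across the two sets
sign1 <- setlist_signed$Dataset1[shared]
sign2 <- setlist_signed$Dataset2[shared]
concordant <- shared[sign1 == sign2]
discordant <- shared[sign1 != sign2]
print(list(concordant = concordant, discordant = discordant))
```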