Calculate signed, directional overlaps across sets
Usage
signed_overlaps(
setlist,
overlap_type = c("detect", "each", "overlap", "concordance", "agreement"),
return_items = FALSE,
return_item_labels = return_items,
sep = "&",
trim_label = TRUE,
include_blanks = TRUE,
keep_item_order = FALSE,
verbose = FALSE,
warn = FALSE,
...
)
Arguments
- setlist
list of named vectors, whose names represent set items, and whose values represent direction using values c(-1, 0, 1).
- overlap_type
character value indicating the type of overlap logic:
  - "detect" (the default) is described in Details;
  - "each" records each combination of signs;
  - "overlap" disregards the sign and returns any matching item overlap;
  - "concordance" represents counts for full agreement, or "mixed" for any inconsistent overlapping direction;
  - "agreement" represents full agreement in direction as "agreement", and "mixed" for any inconsistent direction.
- return_items
logical indicating whether to return the items within each overlap set.
- return_item_labels
logical indicating whether to return the directional label associated with each item. A directional label combines the direction from setlist by item.
- sep
character used as a delimiter between set names, the default is "&".
- trim_label
logical indicating whether to trim the directional label, for example instead of returning "0 1 -1" it will return "1 -1" because the overlap name already indicates the sets involved.
- include_blanks
logical indicating whether each set overlap should be represented at least once even when no items are present in the overlap. Using include_blanks=TRUE is useful in that it guarantees all possible combinations of overlaps are represented consistently in the output.
- keep_item_order
logical, default FALSE, to determine whether items will be stored and displayed in the order they are provided. Note: keep_item_order=TRUE enables the following behaviors:
  - Any character vector input will retain the order in which its items appear.
  - Any factor vector input will sort items using factor levels, which maintains the factor level order.
  - Any named vector will use the character vector of names, keeping the order they appear in the vector.
- verbose
logical indicating whether to print verbose output.
- warn
logical, default FALSE, whether to print warnings during import in the event that input data is coerced to another type.
- ...
additional arguments are passed to list2imsigned().
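The interaction between sep and trim_label can be sketched in base R. This is only an illustration of the labeling described above, not the package implementation; the helper label_item() and the input vector are hypothetical.

```r
# Hypothetical helper, not part of venndir: builds the overlap name and the
# directional label for one item from its per-set directions.
label_item <- function(x, sep = "&", trim_label = TRUE) {
  present <- x != 0
  # The overlap name joins the names of the sets that contain the item.
  overlap_name <- paste(names(x)[present], collapse = sep)
  # A trimmed label drops the 0 entries, since the name already lists the sets.
  label_values <- if (trim_label) x[present] else x
  c(sets = overlap_name, label = paste(label_values, collapse = " "))
}

# One item that is absent from set_A, up in set_B, and down in set_C.
item_direction <- c(set_A = 0, set_B = 1, set_C = -1)
trimmed <- label_item(item_direction)
full <- label_item(item_direction, trim_label = FALSE)
print(trimmed)
print(full)
```

With this input the trimmed label is "1 -1" and the untrimmed label is "0 1 -1", matching the trim_label description.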
Value
data.frame
with columns intended to support venndir()
,
but which may be more widely useful:
"sets"
- character vector with sets and overlap names.one column indicating the
overlap_type
and corresponding values:"overlap"
- This column is always included."concordance"
- includes1
(concordant) and-1
(discordant)"agreement"
- includes"agreement"
and"disgreement"
"each"
- includes sign values-1
and1
.
"overlap"
- integer vector with overlap values, where0
and1
indicate which sets contained these items. This column is always included, even whenoverlap_type
is something else."num_sets"
- integer number of sets represented in the overlap."count"
- integer number of items in the overlap.one colname for each set name represented in the
"sets"
column, intended to help filter by each set. Values will be0
or1
.overlap_label
- will represent only the non-0 elements from"overlap"
for convenience."items"
- whenreturn_items=TRUE
this column will contain alist
(inAsIs
format) ofcharacter
vectors, with the items.
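The per-set columns are intended for filtering, which can be sketched with a hand-built data.frame. The values below are copied from the overlap_type="overlap" output in Examples; the object is constructed manually here so the snippet runs without venndir, and the column types of the real output may differ.

```r
# Hand-built stand-in for signed_overlaps(setlist, "overlap") output,
# using the counts shown in Examples. Not produced by venndir itself.
so <- data.frame(
  sets = c("set_A", "set_B", "set_A&set_B"),
  overlap = c("1 0", "0 1", "1 1"),
  num_sets = c(1L, 1L, 2L),
  count = c(25L, 9L, 7L),
  set_A = c(1L, 0L, 1L),
  set_B = c(0L, 1L, 1L),
  stringsAsFactors = FALSE)

# Keep every overlap that involves set_A, then total its items.
in_set_A <- so[so$set_A == 1, ]
total_set_A <- sum(in_set_A$count)

# Keep only overlaps shared by more than one set.
shared <- so[so$num_sets > 1, "sets"]

print(in_set_A)
print(total_set_A)
print(shared)
```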
Details
This function is the core function to summarize overlaps that include signed directionality. It is intended for situations where two sets may share items, but where the signed direction associated with those items may or may not also be shared.
One motivating example is with biological data, where a subset of genes, proteins, or regions of genome, may be regulated up or down, and this direction is relevant to understanding the biological process. Two experiments may identify similar genes, proteins, or regions of genome, but they may not regulate them in the same direction. This function is intended to help summarize item overlaps alongside the directionality of each item.
The directional counts can be summarized in slightly different ways, defined by the argument overlap_type:
- overlap_type="detect" - default behavior: each vector in setlist is handled independently:
  - a vector with no names will use the vector values as items after converting them to character;
  - a named vector with character or factor values will use the vector names as items, and character values as item values;
  - a named vector with numeric or integer values will use vector names as items, and will convert numeric values with sign().
- overlap_type="each" - this option returns all possible directions individually counted.
- overlap_type="concordance" - this option returns the counts for each consistent direction, for example "up-up-up" would be counted, and "down-down-down" would be counted, but any mixture of "up" and "down" would be summarized and counted as "mixed". This summary helps because a 3-way overlap has 8 possible directions, whose labels are difficult to place in the Venn diagram and are not altogether meaningful. Note that this option is the default for venndir().
- overlap_type="overlap" - this option only summarizes overlaps without regard to direction. This option returns standard Venn overlap counts.
- overlap_type="agreement" - this option groups all directions that agree and returns them as "agreement", all others are returned as "mixed".
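The four explicit overlap types can be compared with a small base-R sketch that labels a single item from its per-set directions. The function classify_direction() is hypothetical and only mirrors the labels visible in Examples; it is not how signed_overlaps() is implemented.

```r
# Hypothetical helper, not part of venndir: returns the label one item would
# receive under each overlap_type, given its direction in each set.
classify_direction <- function(x,
    overlap_type = c("each", "overlap", "concordance", "agreement")) {
  overlap_type <- match.arg(overlap_type)
  nonzero <- x[x != 0]
  # Directions agree when every set containing the item has the same sign.
  agrees <- length(unique(nonzero)) <= 1
  switch(overlap_type,
    each = paste(x, collapse = " "),
    overlap = paste(abs(x), collapse = " "),
    concordance = if (agrees) paste(x, collapse = " ") else "mixed",
    agreement = if (agrees) "agreement" else "mixed")
}

# An item that is down in the first set and up in the second set.
types <- c("each", "overlap", "concordance", "agreement")
labels <- vapply(types,
  function(type) classify_direction(c(-1, 1), type),
  character(1))
print(labels)
```

For the discordant item above, "each" keeps the signs, "overlap" ignores them, and both "concordance" and "agreement" report "mixed".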
Note that overlap_type="agreement" and overlap_type="concordance" will not convert numeric values to sign(), so if the input contains numeric values such as 1.2435 they should probably be converted to sign() before calling signed_overlaps(), for example:
signed_overlaps(lapply(setlist, sign))
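The effect of that conversion can be checked in base R before passing the result to signed_overlaps(). The set and item names below are made up for illustration; the point is that sign() keeps the vector names, so items are preserved while values collapse to -1, 0, or 1.

```r
# Illustrative input with made-up item names and non-unit numeric values.
setlist_numeric <- list(
  set_A = c(item_1 = 1.2435, item_2 = -0.7, item_3 = 2.1),
  set_B = c(item_2 = -3.3, item_3 = -0.4))

# sign() is applied to each vector; names (the items) are retained.
setlist_signed <- lapply(setlist_numeric, sign)
print(setlist_signed)
```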
See also
Other venndir core: render_venndir(), textvenn(), venn_meme(), venndir()
Examples
setlist <- make_venn_test(100, 2, do_signed=FALSE);
setlist <- make_venn_test(1e6, 3, do_signed=FALSE);
# so is a data.frame
so <- signed_overlaps(setlist, verbose=TRUE);
#> ## (21:04:43) 09Dec2024: signed_overlaps(): Processing input setlist.
#> ## (21:04:43) 09Dec2024: signed_overlaps(): Processing overlap_type='detect'
#> ## (21:04:44) 09Dec2024: signed_overlaps(): Creating other data types.
#> ## (21:04:44) 09Dec2024: signed_overlaps(): Creating overlap vector.
#> ## (21:04:44) 09Dec2024: signed_overlaps(): Creating concordance vector.
#> ## (21:04:44) 09Dec2024: signed_overlaps(): Creating split names.
#> ## (21:04:44) 09Dec2024: signed_overlaps(): Creating final vector.
#> ## (21:04:45) 09Dec2024: signed_overlaps(): Splitting by observed directions per overlap.
#> ## (21:04:45) 09Dec2024: signed_overlaps(): Creating labels for each split.
#> ## (21:04:45) 09Dec2024: signed_overlaps(): Processing include_blanks=TRUE
#> ## (21:04:45) 09Dec2024: signed_overlaps(): Sorting rows by overlap count then set.
so
#>                                      sets overlap num_sets  count set_A set_B
#> set_A|1 0 0                         set_A   1 0 0        1  29914     1     0
#> set_B|0 1 0                         set_B   0 1 0        1 378895     0     1
#> set_C|0 0 1                         set_C   0 0 1        1  74287     0     0
#> set_A&set_B|1 1 0             set_A&set_B   1 1 0        2  27516     1     1
#> set_A&set_C|1 0 1             set_A&set_C   1 0 1        2   5479     1     0
#> set_B&set_C|0 1 1             set_B&set_C   0 1 1        2  68631     0     1
#> set_A&set_B&set_C|1 1 1 set_A&set_B&set_C   1 1 1        3   4995     1     1
#>                         set_C overlap_label
#> set_A|1 0 0                 0             1
#> set_B|0 1 0                 0             1
#> set_C|0 0 1                 1             1
#> set_A&set_B|1 1 0           0           1 1
#> set_A&set_C|1 0 1           1           1 1
#> set_B&set_C|0 1 1           1           1 1
#> set_A&set_B&set_C|1 1 1     1         1 1 1
# detect overlap_type
attr(signed_overlaps(setlist, "detect"), "overlap_type")
#> [1] "overlap"
setlist <- make_venn_test(100, 2, do_signed=TRUE);
# detect overlap_type
attr(signed_overlaps(setlist, "detect"), "overlap_type")
#> [1] "concordance"
# straight overlap counts
signed_overlaps(setlist, "overlap");
#>                        sets overlap num_sets count set_A set_B overlap_label
#> set_A|1 0             set_A     1 0        1    25     1     0             1
#> set_B|0 1             set_B     0 1        1     9     0     1             1
#> set_A&set_B|1 1 set_A&set_B     1 1        2     7     1     1           1 1
# each directional overlap count
signed_overlaps(setlist, "each");
#>                          sets  each overlap num_sets count set_A set_B
#> set_A|-1 0              set_A  -1 0     1 0        1    13     1     0
#> set_A|1 0               set_A   1 0     1 0        1    12     1     0
#> set_B|0 -1              set_B  0 -1     0 1        1     4     0     1
#> set_B|0 1               set_B   0 1     0 1        1     5     0     1
#> set_A&set_B|-1 -1 set_A&set_B -1 -1     1 1        2     2     1     1
#> set_A&set_B|-1 1  set_A&set_B  -1 1     1 1        2     1     1     1
#> set_A&set_B|1 1   set_A&set_B   1 1     1 1        2     4     1     1
#>                   overlap_label
#> set_A|-1 0                   -1
#> set_A|1 0                     1
#> set_B|0 -1                   -1
#> set_B|0 1                     1
#> set_A&set_B|-1 -1         -1 -1
#> set_A&set_B|-1 1           -1 1
#> set_A&set_B|1 1             1 1
# concordance overlap counts
signed_overlaps(setlist, "concordance");
#>                          sets concordance overlap num_sets count set_A set_B
#> set_A|-1 0              set_A        -1 0     1 0        1    13     1     0
#> set_A|1 0               set_A         1 0     1 0        1    12     1     0
#> set_B|0 -1              set_B        0 -1     0 1        1     4     0     1
#> set_B|0 1               set_B         0 1     0 1        1     5     0     1
#> set_A&set_B|-1 -1 set_A&set_B       -1 -1     1 1        2     2     1     1
#> set_A&set_B|1 1   set_A&set_B         1 1     1 1        2     4     1     1
#> set_A&set_B|mixed set_A&set_B       mixed     1 1        2     1     1     1
#>                   overlap_label
#> set_A|-1 0                   -1
#> set_A|1 0                     1
#> set_B|0 -1                   -1
#> set_B|0 1                     1
#> set_A&set_B|-1 -1         -1 -1
#> set_A&set_B|1 1             1 1
#> set_A&set_B|mixed         mixed
# agreement overlap counts
signed_overlaps(setlist, "agreement");
#>                              sets agreement overlap num_sets count set_A set_B
#> set_A|agreement             set_A agreement     1 0        1    25     1     0
#> set_B|agreement             set_B agreement     0 1        1     9     0     1
#> set_A&set_B|agreement set_A&set_B agreement     1 1        2     6     1     1
#> set_A&set_B|mixed     set_A&set_B     mixed     1 1        2     1     1     1
#>                       overlap_label
#> set_A|agreement           agreement
#> set_B|agreement           agreement
#> set_A&set_B|agreement     agreement
#> set_A&set_B|mixed             mixed
# test to ensure factor input is handled properly
inputlist <- list(setA=factor(c("A", "B", "D")),
setB=factor(c("A", "C", "E", "F")))
signed_overlaps(inputlist, return_items=TRUE)
#>                     sets overlap num_sets count setA setB overlap_label   items
#> setA|1 0            setA     1 0        1     2    1    0             1    B, D
#> setB|0 1            setB     0 1        1     3    0    1             1 C, E, F
#> setA&setB|1 1  setA&setB     1 1        2     1    1    1           1 1       A
# check to verify
signed_overlaps(inputlist, return_items=TRUE)$items
#> $`setA|1 0`
#> [1] "B" "D"
#>
#> $`setB|0 1`
#> [1] "C" "E" "F"
#>
#> $`setA&setB|1 1`
#> [1] "A"
#>
# test specific factor level order
inputlist <- list(
setA=factor(c("A", "B", "D"), levels=c("D", "B", "A")),
setB=factor(c("A", "C", "E", "F")))
signed_overlaps(inputlist, return_items=TRUE)
#>                     sets overlap num_sets count setA setB overlap_label   items
#> setA|1 0            setA     1 0        1     2    1    0             1    B, D
#> setB|0 1            setB     0 1        1     3    0    1             1 C, E, F
#> setA&setB|1 1  setA&setB     1 1        2     1    1    1           1 1       A
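The output above uses the default keep_item_order=FALSE, so the custom factor levels do not change the item order. The difference between factor level order and alphabetical order, which is what keep_item_order=TRUE is described as respecting, can be seen with base R alone. This is a sketch of the ordering concept only, and makes no claim about how signed_overlaps() sorts internally.

```r
# Base-R illustration only; not the venndir implementation.
items <- factor(c("A", "B", "D"), levels = c("D", "B", "A"))

# Sorting the factor itself follows the level order: D, B, A.
by_levels <- as.character(sort(items))

# Sorting the character values follows alphabetical order: A, B, D.
by_alpha <- sort(as.character(items))

print(by_levels)
print(by_alpha)
```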