Calculate signed, directional overlaps across sets
Usage
signed_overlaps(
setlist,
overlap_type = c("detect", "each", "overlap", "concordance", "agreement"),
return_items = FALSE,
return_item_labels = return_items,
sep = "&",
trim_label = TRUE,
include_blanks = TRUE,
keep_item_order = FALSE,
verbose = FALSE,
warn = FALSE,
...
)
Arguments
- setlist
list of named vectors, whose names represent set items, and whose values represent direction using values c(-1, 0, 1).
- overlap_type
character value indicating the type of overlap logic: "detect" (the default) determines the type from the input data (see Details); "each" records each combination of signs; "overlap" disregards the sign and returns any matching item overlap; "concordance" represents counts for full agreement, or "mixed" for any inconsistent overlapping direction; "agreement" represents full agreement in direction as "agreement", and "mixed" for any inconsistent direction.
- return_items
logical indicating whether to return the items within each overlap set.
- return_item_labels
logical indicating whether to return the directional label associated with each item. A directional label combines the direction from setlist by item.
- sep
character used as a delimiter between set names; the default is "&".
- trim_label
logical indicating whether to trim the directional label. For example, instead of returning "0 1 -1" it returns "1 -1", because the overlap name already indicates the sets involved.
- include_blanks
logical indicating whether each set overlap should be represented at least once, even when no items are present in the overlap. Setting include_blanks=TRUE is useful because it guarantees that all possible combinations of overlaps are represented consistently in the output.
- keep_item_order
logical, default FALSE, indicating whether items are stored and displayed in the order they are provided. Note: keep_item_order=TRUE enables the following behaviors:
  - Any character vector input retains the order in which items appear.
  - Any factor vector input sorts items using factor levels, which maintains the factor level order.
  - Any named vector uses the character vector of names, keeping the order in which they appear in the vector.
- verbose
logical indicating whether to print verbose output.
- warn
logical, default FALSE, indicating whether to print warnings during import in the event that input data is coerced to another type.
- ...
additional arguments are passed to list2imsigned().
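The sep and trim_label arguments can be illustrated with a small base R sketch. The helpers make_direction_label() and make_overlap_name() are hypothetical names written for illustration; they only mirror the behavior described above and are not venndir's implementation.

```r
# Hypothetical helpers for illustration only; they are not venndir functions.
# 'signs' holds one item's direction in each set, with 0 meaning "absent".

# Directional label, e.g. "0 1 -1", optionally trimmed to the non-0 entries.
make_direction_label <- function(signs, trim_label = TRUE) {
  if (trim_label) {
    # the overlap name already says which sets are involved, so drop the 0s
    signs <- signs[signs != 0]
  }
  paste(signs, collapse = " ")
}

# Overlap name built from the sets that contain the item, joined by 'sep'.
make_overlap_name <- function(signs, sep = "&") {
  paste(names(signs)[signs != 0], collapse = sep)
}

# item absent from set_A, up in set_B, down in set_C
signs <- c(set_A = 0, set_B = 1, set_C = -1)
make_direction_label(signs, trim_label = FALSE)  # "0 1 -1"
make_direction_label(signs, trim_label = TRUE)   # "1 -1"
make_overlap_name(signs)                         # "set_B&set_C"
```

Trimming is safe only because the overlap name carries the set membership, so the two strings together still identify each set's direction.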
Value
data.frame with columns intended to support venndir(),
but which may be more widely useful:
- "sets": character vector with sets and overlap names.
- one column named for the overlap_type, with corresponding values:
  - "overlap": no additional column, because the "overlap" column described below is always included.
  - "concordance": the signed direction per set when all sets containing the item agree, for example "1 0" or "-1 -1", or "mixed" for any inconsistent direction.
  - "agreement": "agreement" or "mixed".
  - "each": sign values -1 and 1 for each set, for example "-1 1".
- "overlap": overlap values such as "1 1 0", where 0 and 1 indicate which sets contained these items. This column is always included, even when overlap_type is something else.
- "num_sets": integer number of sets represented in the overlap.
- "count": integer number of items in the overlap.
- one column for each set name represented in the "sets" column, intended to help filter by each set. Values will be 0 or 1.
- "overlap_label": only the non-0 elements of the overlap_type column, for convenience; for example "-1" instead of "-1 0".
- "items": when return_items=TRUE, this column contains a list (in AsIs format) of character vectors with the items.
Details
This function is the core function to summarize overlaps that include signed directionality. It is intended for situations where two sets may share items, but where the signed direction associated with those items may or may not also be shared.
One motivating example is biological data, where a subset of genes, proteins, or regions of the genome may be regulated up or down, and this direction is relevant to understanding the biological process. Two experiments may identify similar genes, proteins, or regions of the genome, but they may not regulate them in the same direction. This function is intended to help summarize item overlaps alongside the directionality of each item.
The directional counts can be summarized in slightly different
ways, defined by the argument overlap_type:
- overlap_type="detect": default behavior; each vector in setlist is handled independently:
  - a vector with no names uses the vector values as items, after converting them to character;
  - a named vector with character or factor values uses the vector names as items, and the character values as item values;
  - a named vector with numeric or integer values uses the vector names as items, and converts numeric values to sign().
- overlap_type="each": this option returns all possible directions, each counted individually.
- overlap_type="concordance": this option returns the counts for each consistent direction. For example, "up-up-up" would be counted, and "down-down-down" would be counted, but any mixture of "up" and "down" would be summarized and counted as "mixed". Without this summary, a 3-way overlap has 8 possible directions, whose labels are difficult to place in the Venn diagram and are not altogether meaningful. Note that this option is the default for venndir().
- overlap_type="overlap": this option only summarizes overlaps without regard to direction, and returns standard Venn overlap counts.
- overlap_type="agreement": this option groups all directions that agree and returns them as "agreement"; all others are returned as "mixed".
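The labeling rules for "each", "overlap", "concordance", and "agreement" can be summarized by a short sketch that labels a single item from its per-set signs. direction_label() is a hypothetical helper written for illustration; its labels were chosen to match the example output shown under Examples, but it is not venndir's code.

```r
# Hypothetical helper, not part of venndir: label one item according to the
# overlap_type rules above. 'signs' holds the item's direction in each set,
# using 0 when the item is absent from that set.
direction_label <- function(signs,
                            overlap_type = c("each", "overlap", "concordance", "agreement")) {
  overlap_type <- match.arg(overlap_type)
  present <- signs[signs != 0]
  # TRUE when every set containing the item reports the same direction
  consistent <- length(unique(present)) == 1
  switch(overlap_type,
         each = paste(signs, collapse = " "),
         overlap = paste(as.integer(signs != 0), collapse = " "),
         concordance = if (consistent) paste(signs, collapse = " ") else "mixed",
         agreement = if (consistent) "agreement" else "mixed")
}

direction_label(c(1, -1, 0), "each")         # "1 -1 0"
direction_label(c(1, -1, 0), "overlap")      # "1 1 0"
direction_label(c(1, -1, 0), "concordance")  # "mixed"
direction_label(c(-1, -1), "concordance")    # "-1 -1"
direction_label(c(-1, -1), "agreement")      # "agreement"

# tally four hypothetical items that are present in both of two sets
item_signs <- list(g1 = c(1, 1), g2 = c(-1, -1), g3 = c(-1, 1), g4 = c(1, 1))
table(vapply(item_signs, direction_label, character(1),
             overlap_type = "concordance"))
```

The only difference between "concordance" and "agreement" in this sketch is whether consistent items keep their signed label or collapse into a single "agreement" group.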
Note that overlap_type="agreement" and overlap_type="concordance"
will not convert numeric values to sign(), so if the input
contains numeric values such as 1.2435 they should probably be
converted to sign() before calling signed_overlaps(), for example:
signed_overlaps(lapply(setlist, sign))
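The per-vector input rules listed for overlap_type="detect" can be sketched as follows. normalize_set() is a hypothetical helper; in particular, assigning direction 1 to every item of an unnamed vector is an assumption of this sketch, since the rules above only state that the values become items.

```r
# Hypothetical helper, not venndir's code: apply the "detect" input rules to
# one vector from a setlist.
normalize_set <- function(x) {
  if (is.null(names(x))) {
    # unnamed vector: values become items (as character); direction 1 is an
    # assumption of this sketch
    items <- unique(as.character(x))
    out <- rep(1, length(items))
    names(out) <- items
    return(out)
  }
  if (is.character(x) || is.factor(x)) {
    # named character or factor: names are items, values kept as character
    # (as.character() drops names, so they are restored afterward)
    out <- as.character(x)
    names(out) <- names(x)
    return(out)
  }
  # named numeric or integer: names are items, values reduced with sign()
  sign(x)
}

normalize_set(factor(c("A", "B", "A")))   # A = 1, B = 1
normalize_set(c(g1 = 1.2435, g2 = -0.5))  # g1 = 1, g2 = -1
```

Applying sign() explicitly, as in lapply(setlist, sign), gives the same numeric result regardless of which overlap_type is requested.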
See also
Other venndir support:
make_venn_test(),
modify_venndir_overlap(),
venndir_legender()
Examples
setlist <- make_venn_test(100, 2, do_signed=FALSE);
setlist <- make_venn_test(1e6, 3, do_signed=FALSE);
# so is a data.frame
so <- signed_overlaps(setlist, verbose=TRUE);
#> ## (22:43:00) 09Jul2025: signed_overlaps(): Processing input setlist.
#> ## (22:43:01) 09Jul2025: signed_overlaps(): Processing overlap_type='detect'
#> ## (22:43:01) 09Jul2025: signed_overlaps(): Creating other data types.
#> ## (22:43:02) 09Jul2025: signed_overlaps(): Creating overlap vector.
#> ## (22:43:02) 09Jul2025: signed_overlaps(): Creating concordance vector.
#> ## (22:43:02) 09Jul2025: signed_overlaps(): Creating split names.
#> ## (22:43:02) 09Jul2025: signed_overlaps(): Creating final vector.
#> ## (22:43:02) 09Jul2025: signed_overlaps(): Splitting by observed directions per overlap.
#> ## (22:43:02) 09Jul2025: signed_overlaps(): Creating labels for each split.
#> ## (22:43:02) 09Jul2025: signed_overlaps(): Processing include_blanks=TRUE
#> ## (22:43:02) 09Jul2025: signed_overlaps(): Sorting rows by overlap count then set.
so
#> sets overlap num_sets count set_A set_B
#> set_A|1 0 0 set_A 1 0 0 1 29914 1 0
#> set_B|0 1 0 set_B 0 1 0 1 378895 0 1
#> set_C|0 0 1 set_C 0 0 1 1 74287 0 0
#> set_A&set_B|1 1 0 set_A&set_B 1 1 0 2 27516 1 1
#> set_A&set_C|1 0 1 set_A&set_C 1 0 1 2 5479 1 0
#> set_B&set_C|0 1 1 set_B&set_C 0 1 1 2 68631 0 1
#> set_A&set_B&set_C|1 1 1 set_A&set_B&set_C 1 1 1 3 4995 1 1
#> set_C overlap_label
#> set_A|1 0 0 0 1
#> set_B|0 1 0 0 1
#> set_C|0 0 1 1 1
#> set_A&set_B|1 1 0 0 1 1
#> set_A&set_C|1 0 1 1 1 1
#> set_B&set_C|0 1 1 1 1 1
#> set_A&set_B&set_C|1 1 1 1 1 1 1
# detect overlap_type
attr(signed_overlaps(setlist, "detect"), "overlap_type")
#> [1] "overlap"
setlist <- make_venn_test(100, 2, do_signed=TRUE);
# detect overlap_type
attr(signed_overlaps(setlist, "detect"), "overlap_type")
#> [1] "concordance"
# straight overlap counts
signed_overlaps(setlist, "overlap");
#> sets overlap num_sets count set_A set_B overlap_label
#> set_A|1 0 set_A 1 0 1 25 1 0 1
#> set_B|0 1 set_B 0 1 1 9 0 1 1
#> set_A&set_B|1 1 set_A&set_B 1 1 2 7 1 1 1 1
# each directional overlap count
signed_overlaps(setlist, "each");
#> sets each overlap num_sets count set_A set_B
#> set_A|-1 0 set_A -1 0 1 0 1 13 1 0
#> set_A|1 0 set_A 1 0 1 0 1 12 1 0
#> set_B|0 -1 set_B 0 -1 0 1 1 4 0 1
#> set_B|0 1 set_B 0 1 0 1 1 5 0 1
#> set_A&set_B|-1 -1 set_A&set_B -1 -1 1 1 2 2 1 1
#> set_A&set_B|-1 1 set_A&set_B -1 1 1 1 2 1 1 1
#> set_A&set_B|1 1 set_A&set_B 1 1 1 1 2 4 1 1
#> overlap_label
#> set_A|-1 0 -1
#> set_A|1 0 1
#> set_B|0 -1 -1
#> set_B|0 1 1
#> set_A&set_B|-1 -1 -1 -1
#> set_A&set_B|-1 1 -1 1
#> set_A&set_B|1 1 1 1
# concordance overlap counts
signed_overlaps(setlist, "concordance");
#> sets concordance overlap num_sets count set_A set_B
#> set_A|-1 0 set_A -1 0 1 0 1 13 1 0
#> set_A|1 0 set_A 1 0 1 0 1 12 1 0
#> set_B|0 -1 set_B 0 -1 0 1 1 4 0 1
#> set_B|0 1 set_B 0 1 0 1 1 5 0 1
#> set_A&set_B|-1 -1 set_A&set_B -1 -1 1 1 2 2 1 1
#> set_A&set_B|1 1 set_A&set_B 1 1 1 1 2 4 1 1
#> set_A&set_B|mixed set_A&set_B mixed 1 1 2 1 1 1
#> overlap_label
#> set_A|-1 0 -1
#> set_A|1 0 1
#> set_B|0 -1 -1
#> set_B|0 1 1
#> set_A&set_B|-1 -1 -1 -1
#> set_A&set_B|1 1 1 1
#> set_A&set_B|mixed mixed
# agreement overlap counts
signed_overlaps(setlist, "agreement");
#> sets agreement overlap num_sets count set_A set_B
#> set_A|agreement set_A agreement 1 0 1 25 1 0
#> set_B|agreement set_B agreement 0 1 1 9 0 1
#> set_A&set_B|agreement set_A&set_B agreement 1 1 2 6 1 1
#> set_A&set_B|mixed set_A&set_B mixed 1 1 2 1 1 1
#> overlap_label
#> set_A|agreement agreement
#> set_B|agreement agreement
#> set_A&set_B|agreement agreement
#> set_A&set_B|mixed mixed
# test to ensure factor input is handled properly
inputlist <- list(setA=factor(c("A", "B", "D")),
setB=factor(c("A", "C", "E", "F")))
signed_overlaps(inputlist, return_items=TRUE)
#> sets overlap num_sets count setA setB overlap_label items
#> setA|1 0 setA 1 0 1 2 1 0 1 B, D
#> setB|0 1 setB 0 1 1 3 0 1 1 C, E, F
#> setA&setB|1 1 setA&setB 1 1 2 1 1 1 1 1 A
# check to verify
signed_overlaps(inputlist, return_items=TRUE)$items
#> $`setA|1 0`
#> [1] "B" "D"
#>
#> $`setB|0 1`
#> [1] "C" "E" "F"
#>
#> $`setA&setB|1 1`
#> [1] "A"
#>
# test specific factor level order
inputlist <- list(
setA=factor(c("A", "B", "D"), levels=c("D", "B", "A")),
setB=factor(c("A", "C", "E", "F")))
signed_overlaps(inputlist, return_items=TRUE)
#> sets overlap num_sets count setA setB overlap_label items
#> setA|1 0 setA 1 0 1 2 1 0 1 B, D
#> setB|0 1 setB 0 1 1 3 0 1 1 C, E, F
#> setA&setB|1 1 setA&setB 1 1 2 1 1 1 1 1 A