Calculate signed, directional overlaps across sets

signed_overlaps(
  setlist,
  overlap_type = c("detect", "each", "overlap", "concordance", "agreement"),
  return_items = FALSE,
  return_item_labels = return_items,
  sep = "&",
  trim_label = TRUE,
  include_blanks = TRUE,
  keep_item_order = FALSE,
  verbose = FALSE,
  warn = FALSE,
  ...
)

Arguments

setlist

list of named vectors, whose names represent set items, and whose values represent direction using values c(-1, 0, 1).
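
As a minimal sketch of the expected input shape, the following builds a small signed setlist in base R. The gene names and directions are made up for illustration and do not come from the package.

```r
# Hypothetical input: names are items, values are directions (-1, 0, or 1).
setlist <- list(
   set_A = c(gene1 = 1, gene2 = -1, gene3 = 1),
   set_B = c(gene2 = -1, gene3 = -1, gene4 = 1))

# Items shared by both sets, regardless of direction
shared <- intersect(names(setlist$set_A), names(setlist$set_B))
shared

# Directions of the shared items, one row per item and one column per set
directions <- sapply(setlist, function(x) x[shared])
directions
```

Here gene2 has the same direction in both sets, while gene3 does not, which is the distinction that overlap_type controls.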

overlap_type

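character value indicating the type of overlap logic:

  • "detect" chooses the type based on the input; as the Examples show, unsigned input resolves to "overlap" and signed input resolves to "concordance";

  • "each" records each combination of signs;

  • "overlap" disregards the sign and returns any matching item overlap;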

  • "concordance" represents counts for full agreement, or "mixed" for any inconsistent overlapping direction;

  • "agreement" represents full agreement in direction as "agreement", and "mixed" for any inconsistent direction.
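
The labelling rules above can be sketched in base R as follows. The helper label_direction() is hypothetical and is not part of venndir; it only illustrates how one item's directions might be labelled under each overlap type, assuming 0 means the item is absent from a set.

```r
# Hypothetical helper, not part of venndir: label one item's directions
# (one value per set, 0 meaning absent) under a given overlap_type.
label_direction <- function(signs,
   overlap_type = c("each", "overlap", "concordance", "agreement")) {
   overlap_type <- match.arg(overlap_type)
   present <- signs[signs != 0]
   # TRUE when every set containing the item reports the same direction
   agrees <- length(unique(present)) == 1
   switch(overlap_type,
      each = paste(signs, collapse = " "),
      overlap = paste(as.integer(signs != 0), collapse = " "),
      concordance = if (agrees) paste(signs, collapse = " ") else "mixed",
      agreement = if (agrees) "agreement" else "mixed")
}

label_direction(c(1, 1), "each")
label_direction(c(-1, 1), "concordance")
label_direction(c(-1, 1), "agreement")
label_direction(c(-1, 1), "overlap")
```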

return_items

logical indicating whether to return the items within each overlap set.

return_item_labels

logical indicating whether to return the directional label associated with each item. A directional label combines the direction from setlist by item.

sep

character used as a delimiter between set names; the default is "&".

trim_label

logical indicating whether to trim the directional label, for example instead of returning "0 1 -1" it will return "1 -1" because the overlap name already indicates the sets involved.
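
A minimal base R sketch of the trimming described here, assuming the directional label is a space-delimited string; this is an illustration and not the package's internal code.

```r
# Drop the "0" entries, which only mark sets that do not contain the item
label <- "0 1 -1"
parts <- strsplit(label, " ", fixed = TRUE)[[1]]
trimmed <- paste(parts[parts != "0"], collapse = " ")
trimmed
```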

include_blanks

logical indicating whether each set overlap should be represented at least once even when no items are present in the overlap. Setting include_blanks=TRUE guarantees that all possible combinations of overlaps are represented consistently in the output.
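
To illustrate what "all possible combinations" means, the sketch below enumerates every non-empty combination of three sets in base R. The set names are assumed for illustration, and the approach is not necessarily how the package builds its rows.

```r
set_names <- c("set_A", "set_B", "set_C")

# Every 0/1 membership pattern across the sets, minus the all-zero row
grid <- expand.grid(rep(list(0:1), length(set_names)))
colnames(grid) <- set_names
grid <- grid[rowSums(grid) > 0, , drop = FALSE]

# Name each combination using the default sep = "&"
combo_names <- apply(grid, 1, function(i) {
   paste(set_names[i == 1], collapse = "&")
})
combo_names
nrow(grid)   # 2^3 - 1 = 7 combinations
```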

keep_item_order

logical default FALSE, to determine whether items will be stored and displayed in the order they are provided. Note: keep_item_order=TRUE enables the following behaviors:

  • Any character vector input will retain the order in which its items appear.

  • Any factor vector input will sort items using factor levels, which maintains the factor level order.

  • Any named vector will use the character vector of names, keeping the order they appear in the vector.
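
The three ordering rules can be sketched in base R as follows. The helper ordered_items() is hypothetical, not part of venndir, and the precedence between the rules (names checked first) is an assumption.

```r
# Hypothetical helper, not part of venndir: return items in the order
# described by the three rules above.
ordered_items <- function(x) {
   if (!is.null(names(x))) {
      names(x)                 # named vector: order of the names
   } else if (is.factor(x)) {
      as.character(sort(x))    # factor: sorted by factor level order
   } else {
      as.character(x)          # character: order of appearance
   }
}

ordered_items(c("D", "A", "B"))
ordered_items(factor(c("A", "B", "D"), levels = c("D", "B", "A")))
ordered_items(c(geneZ = 1, geneA = -1))
```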

verbose

logical indicating whether to print verbose output.

warn

logical default FALSE, whether to print warnings during import in the event that input data is coerced to another type.

...

additional arguments are passed to list2imsigned().

Value

data.frame with columns intended to support venndir(), but which may be more widely useful:

  • "sets" - character vector with sets and overlap names.

  • one column indicating the overlap_type and corresponding values:

    • "overlap" - This column is always included.

    • "concordance" - includes the direction for each concordant overlap, for example "1 1" or "-1 -1", and "mixed" for any inconsistent direction

    • "agreement" - includes "agreement" and "mixed"

    • "each" - includes sign values -1 and 1.

  • "overlap" - integer vector with overlap values, where 0 and 1 indicate which sets contained these items. This column is always included, even when overlap_type is something else.

  • "num_sets" - integer number of sets represented in the overlap.

  • "count" - integer number of items in the overlap.

  • one column for each set name represented in the "sets" column, intended to help filter by each set. Values will be 0 or 1.

  • "overlap_label" - will represent only the non-0 elements from "overlap" for convenience.

  • "items" - when return_items=TRUE this column will contain a list (in AsIs format) of character vectors, with the items.
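
As a sketch of how the per-set columns might be used for filtering, the data.frame below is a hand-built copy of a few columns from the two-set "overlap" output printed in the Examples; it is not produced by calling the function.

```r
# Hand-built copy of part of a printed two-set result, for illustration only
so <- data.frame(
   sets = c("set_A", "set_B", "set_A&set_B"),
   overlap = c("1 0", "0 1", "1 1"),
   num_sets = c(1, 1, 2),
   count = c(25, 9, 7),
   set_A = c(1, 0, 1),
   set_B = c(0, 1, 1),
   stringsAsFactors = FALSE)

# Rows involving set_A, using the per-set filter column
so_A <- subset(so, set_A == 1)
sum(so_A$count)   # 25 + 7 = 32 items in set_A

# Overlaps shared by two or more sets
shared_sets <- subset(so, num_sets >= 2)$sets
shared_sets
```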

Details

This is the core function for summarizing overlaps that include signed directionality. It is intended for situations where two sets may share items, but where the signed direction associated with those items may or may not also be shared.

One motivating example is biological data, where a subset of genes, proteins, or regions of the genome may be regulated up or down, and this direction is relevant to understanding the biological process. Two experiments may identify similar genes, proteins, or regions of the genome, but may not regulate them in the same direction. This function is intended to help summarize item overlaps alongside the directionality of each item.

The directional counts can be summarized in slightly different ways, defined by the argument overlap_type:

  • overlap_type="detect" - default behavior: each vector in setlist is handled independently:

    • a vector with no names will use the vector values as items after converting them to character;

    • a named vector with character or factor values will use the vector names as items, and character values as item values;

    • a named vector with numeric or integer values will use vector names as items, and will convert numeric values to sign().

  • overlap_type="each" - this option returns all possible directions individually counted.

  • overlap_type="concordance" - this option returns the counts for each consistent direction: for example, "up-up-up" and "down-down-down" are each counted, but any mixture of "up" and "down" is summarized and counted as "mixed". For 3-way overlaps there are 8 possible directions, and the individual labels are difficult to place in the Venn diagram and are not altogether meaningful. Note that this option is the default for venndir().

  • overlap_type="overlap" - this option only summarizes overlaps without regard to direction. This option returns standard Venn overlap counts.

  • overlap_type="agreement" - this option groups all directions that agree and returns them as "agreement"; all others are returned as "mixed".
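
The difference between "each" and "concordance" for a 3-way overlap can be sketched in base R by enumerating the sign combinations directly. This is an illustration under assumed set names, not the package's implementation.

```r
# All sign combinations for items present in all three sets
dirs <- expand.grid(set_A = c(-1, 1), set_B = c(-1, 1), set_C = c(-1, 1))
nrow(dirs)   # 8 combinations when each direction is counted individually

# Label each combination, then collapse inconsistent ones to "mixed"
each_label <- apply(dirs, 1, paste, collapse = " ")
is_consistent <- apply(dirs, 1, function(i) length(unique(i)) == 1)
concordance_label <- ifelse(is_consistent, each_label, "mixed")
table(concordance_label)
```

Only the all-down and all-up combinations keep their own label; the remaining six collapse into a single "mixed" group.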

Note that overlap_type="agreement" and overlap_type="concordance" will not convert numeric values to sign(), so if the input contains numeric values such as 1.2435 they should probably be converted to sign() before calling signed_overlaps(), for example: signed_overlaps(lapply(setlist, sign))
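
A minimal sketch of that conversion, using a hypothetical setlist with continuous values such as fold changes:

```r
# Hypothetical input with continuous values rather than -1, 0, 1
setlist <- list(
   set_A = c(gene1 = 1.2435, gene2 = -0.8),
   set_B = c(gene2 = -2.1, gene3 = 0.4))

# sign() keeps the names and reduces each value to -1, 0, or 1
setlist_signed <- lapply(setlist, sign)
setlist_signed$set_A
setlist_signed$set_B
```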

See also

Other venndir core: render_venndir(), textvenn(), venn_meme(), venndir()

Examples

setlist <- make_venn_test(100, 2, do_signed=FALSE);
setlist <- make_venn_test(1e6, 3, do_signed=FALSE);

# so is a data.frame
so <- signed_overlaps(setlist, verbose=TRUE);
#> ##  (17:23:35) 19Nov2024:   signed_overlaps(): Processing input setlist. 
#> ##  (17:23:36) 19Nov2024:   signed_overlaps(): Processing overlap_type='detect' 
#> ##  (17:23:36) 19Nov2024:   signed_overlaps(): Creating other data types. 
#> ##  (17:23:36) 19Nov2024:   signed_overlaps(): Creating overlap vector. 
#> ##  (17:23:37) 19Nov2024:   signed_overlaps(): Creating concordance vector. 
#> ##  (17:23:37) 19Nov2024:   signed_overlaps(): Creating split names. 
#> ##  (17:23:37) 19Nov2024:   signed_overlaps(): Creating final vector. 
#> ##  (17:23:37) 19Nov2024:   signed_overlaps(): Splitting by observed directions per overlap. 
#> ##  (17:23:37) 19Nov2024:   signed_overlaps(): Creating labels for each split. 
#> ##  (17:23:37) 19Nov2024:   signed_overlaps(): Processing include_blanks=TRUE 
#> ##  (17:23:37) 19Nov2024:   signed_overlaps(): Sorting rows by overlap count then set. 
so
#>                                      sets overlap num_sets  count set_A set_B
#> set_A|1 0 0                         set_A   1 0 0        1  29914     1     0
#> set_B|0 1 0                         set_B   0 1 0        1 378895     0     1
#> set_C|0 0 1                         set_C   0 0 1        1  74287     0     0
#> set_A&set_B|1 1 0             set_A&set_B   1 1 0        2  27516     1     1
#> set_A&set_C|1 0 1             set_A&set_C   1 0 1        2   5479     1     0
#> set_B&set_C|0 1 1             set_B&set_C   0 1 1        2  68631     0     1
#> set_A&set_B&set_C|1 1 1 set_A&set_B&set_C   1 1 1        3   4995     1     1
#>                         set_C overlap_label
#> set_A|1 0 0                 0             1
#> set_B|0 1 0                 0             1
#> set_C|0 0 1                 1             1
#> set_A&set_B|1 1 0           0           1 1
#> set_A&set_C|1 0 1           1           1 1
#> set_B&set_C|0 1 1           1           1 1
#> set_A&set_B&set_C|1 1 1     1         1 1 1

# detect overlap_type
attr(signed_overlaps(setlist, "detect"), "overlap_type")
#> [1] "overlap"

setlist <- make_venn_test(100, 2, do_signed=TRUE);

# detect overlap_type
attr(signed_overlaps(setlist, "detect"), "overlap_type")
#> [1] "concordance"

# straight overlap counts
signed_overlaps(setlist, "overlap");
#>                        sets overlap num_sets count set_A set_B overlap_label
#> set_A|1 0             set_A     1 0        1    25     1     0             1
#> set_B|0 1             set_B     0 1        1     9     0     1             1
#> set_A&set_B|1 1 set_A&set_B     1 1        2     7     1     1           1 1

# each directional overlap count
signed_overlaps(setlist, "each");
#>                          sets  each overlap num_sets count set_A set_B
#> set_A|-1 0              set_A  -1 0     1 0        1    13     1     0
#> set_A|1 0               set_A   1 0     1 0        1    12     1     0
#> set_B|0 -1              set_B  0 -1     0 1        1     4     0     1
#> set_B|0 1               set_B   0 1     0 1        1     5     0     1
#> set_A&set_B|-1 -1 set_A&set_B -1 -1     1 1        2     2     1     1
#> set_A&set_B|-1 1  set_A&set_B  -1 1     1 1        2     1     1     1
#> set_A&set_B|1 1   set_A&set_B   1 1     1 1        2     4     1     1
#>                   overlap_label
#> set_A|-1 0                   -1
#> set_A|1 0                     1
#> set_B|0 -1                   -1
#> set_B|0 1                     1
#> set_A&set_B|-1 -1         -1 -1
#> set_A&set_B|-1 1           -1 1
#> set_A&set_B|1 1             1 1

# concordance overlap counts
signed_overlaps(setlist, "concordance");
#>                          sets concordance overlap num_sets count set_A set_B
#> set_A|-1 0              set_A        -1 0     1 0        1    13     1     0
#> set_A|1 0               set_A         1 0     1 0        1    12     1     0
#> set_B|0 -1              set_B        0 -1     0 1        1     4     0     1
#> set_B|0 1               set_B         0 1     0 1        1     5     0     1
#> set_A&set_B|-1 -1 set_A&set_B       -1 -1     1 1        2     2     1     1
#> set_A&set_B|1 1   set_A&set_B         1 1     1 1        2     4     1     1
#> set_A&set_B|mixed set_A&set_B       mixed     1 1        2     1     1     1
#>                   overlap_label
#> set_A|-1 0                   -1
#> set_A|1 0                     1
#> set_B|0 -1                   -1
#> set_B|0 1                     1
#> set_A&set_B|-1 -1         -1 -1
#> set_A&set_B|1 1             1 1
#> set_A&set_B|mixed         mixed

# agreement overlap counts
signed_overlaps(setlist, "agreement");
#>                              sets agreement overlap num_sets count set_A set_B
#> set_A|agreement             set_A agreement     1 0        1    25     1     0
#> set_B|agreement             set_B agreement     0 1        1     9     0     1
#> set_A&set_B|agreement set_A&set_B agreement     1 1        2     6     1     1
#> set_A&set_B|mixed     set_A&set_B     mixed     1 1        2     1     1     1
#>                       overlap_label
#> set_A|agreement           agreement
#> set_B|agreement           agreement
#> set_A&set_B|agreement     agreement
#> set_A&set_B|mixed             mixed

# test to ensure factor input is handled properly
inputlist <- list(setA=factor(c("A", "B", "D")),
   setB=factor(c("A", "C", "E", "F")))
signed_overlaps(inputlist, return_items=TRUE)
#>                    sets overlap num_sets count setA setB overlap_label   items
#> setA|1 0           setA     1 0        1     2    1    0             1    B, D
#> setB|0 1           setB     0 1        1     3    0    1             1 C, E, F
#> setA&setB|1 1 setA&setB     1 1        2     1    1    1           1 1       A

# check to verify
signed_overlaps(inputlist, return_items=TRUE)$items
#> $`setA|1 0`
#> [1] "B" "D"
#> 
#> $`setB|0 1`
#> [1] "C" "E" "F"
#> 
#> $`setA&setB|1 1`
#> [1] "A"
#> 

# test specific factor level order
inputlist <- list(
   setA=factor(c("A", "B", "D"), levels=c("D", "B", "A")),
   setB=factor(c("A", "C", "E", "F")))
signed_overlaps(inputlist, return_items=TRUE)
#>                    sets overlap num_sets count setA setB overlap_label   items
#> setA|1 0           setA     1 0        1     2    1    0             1    B, D
#> setB|0 1           setB     0 1        1     3    0    1             1 C, E, F
#> setA&setB|1 1 setA&setB     1 1        2     1    1    1           1 1       A