
Directional Venn diagram

Usage

venndir(
  setlist,
  overlap_type = c("detect", "concordance", "each", "overlap", "agreement"),
  sets = NULL,
  set_colors = NULL,
  setlist_labels = NULL,
  legend_labels = NULL,
  draw_legend = TRUE,
  legend_signed = NULL,
  legend_font_cex = 1,
  proportional = FALSE,
  draw_footnotes = TRUE,
  show_labels = "Ncs",
  main = NULL,
  return_items = TRUE,
  show_items = c(NA, "none", "sign item", "sign", "item"),
  max_items = 3000,
  show_zero = FALSE,
  font_cex = c(1, 1, 0.75),
  fontfamily = "Arial",
  show_label = NA,
  display_counts = TRUE,
  poly_alpha = 0.6,
  alpha_by_counts = FALSE,
  label_style = c("basic", "fill", "shaded", "shaded_box", "lite", "lite_box"),
  label_preset = "none",
  template = c("wide", "tall"),
  marquee_styles = NULL,
  unicode = TRUE,
  big.mark = ",",
  curate_df = NULL,
  venn_jp = NULL,
  inside_percent_threshold = 0,
  item_cex = 1,
  item_style = c("default", "marquee", "text", "gridtext"),
  item_buffer = -0.15,
  item_degrees = 0,
  sign_count_delim = " ",
  padding = c(3, 2),
  r = 2,
  center = c(0, 0),
  segment_distance = 0.05,
  segment_buffer = -0.1,
  show_segments = TRUE,
  sep = "&",
  do_plot = TRUE,
  verbose = FALSE,
  debug = 0,
  circle_nudge = NULL,
  lwd = 0.3,
  rotate_degrees = 0,
  ...
)

Arguments

setlist

list of named vectors, whose names represent set items, and whose values represent direction using values c(-1, 0, 1).

overlap_type

character value indicating the type of overlap logic:

  • "each" records each combination of signs;

  • "overlap" disregards the sign and returns any matching item overlap;

  • "concordance" represents counts for full agreement, or "mixed" for any inconsistent overlapping direction;

  • "agreement" represents full agreement in direction as "agreement", and "mixed" for any inconsistent direction.
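The overlap logic above can be sketched in base R for a single item. This is a minimal illustration only: classify_signs() is a hypothetical helper, not a venndir function, and the exact label text venndir produces may differ.

```r
# Hypothetical helper (not part of venndir): summarize the signs of one item
# across the sets that contain it, following the overlap_type descriptions.
classify_signs <- function(signs,
   overlap_type = c("concordance", "each", "overlap", "agreement")) {
   overlap_type <- match.arg(overlap_type)
   # TRUE when every sign is the same, for example c(1, 1) or c(-1, -1)
   consistent <- length(unique(signs)) == 1
   switch(overlap_type,
      each = paste(signs, collapse = " "),
      overlap = "overlap",
      concordance = if (consistent) paste(signs, collapse = " ") else "mixed",
      agreement = if (consistent) "agreement" else "mixed")
}

classify_signs(c(1, 1), "concordance")   # both sets agree: signs are kept
classify_signs(c(1, -1), "concordance")  # inconsistent direction: "mixed"
classify_signs(c(1, -1), "each")         # every sign combination is kept
classify_signs(c(-1, -1), "agreement")   # direction is collapsed to "agreement"
```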

sets

integer index with optional subset of sets in setlist for the Venn diagram. This option is useful when defining consistent set_colors for all entries in setlist.

set_colors

character vector of R colors, or default NULL to use categorical colors defined by colorjam::rainbowJam(). It will generate colors for every element in setlist even when a subset is defined with sets.

setlist_labels

character vector with optional custom labels to display in the Venn diagram. This option is intended when the names(setlist) are not suitable for display, but should still be maintained as the original names.

legend_labels

character vector with optional custom labels to display in the Venn legend. This option is intended when the names(setlist) are not suitable for a legend, but should still be maintained as the original names. The legend labels are typically single-line entries and should have relatively short text length.

draw_legend

logical passed to render_venndir(), and stored in the Venndir metadata.

legend_font_cex

numeric scalar, default 1, used to adjust the relative size of fonts with venndir_legender() when draw_legend=TRUE. This value is stored in metadata for persistence.

proportional

logical (default FALSE) indicating whether to draw proportional Venn circles, also known as an Euler diagram. Proportional circles are not guaranteed to represent all possible overlaps. Proportional circles are determined by calling eulerr::euler(). Use shape="ellipse" for euler() to provide elliptical shapes.

draw_footnotes

logical passed to render_venndir_footnotes(), default TRUE, and stored in the Venndir metadata. When TRUE, footnotes will be drawn if they exist in the 'metadata' slot of the Venndir object, which occurs only when there are overlaps which cannot be displayed due to the polygon geometry. Note that arguments in '...' are passed to render_venndir_footnotes() for options such as footnote_style and other customizations.

show_labels

character string to define the labels to display, and where they should be displayed. The definition uses a single letter to indicate each type of label to display, using UPPERCASE to display the label outside the Venn shape, and lowercase to display the label inside the Venn shape. The default "Ncs" displays _N_ame (outside), _c_ount (inside), and _s_igned count (inside).

The label types are defined below:

  • _N_ame: "n" or "N" - the set name, by default it is displayed.

  • _O_verlap: "o" or "O" - the overlap name, by default it is hidden, because these labels can be very long, and the overlap should already be evident in the Venn diagram.

  • _c_ount: "c" or "C" - overlap count, independent of the sign.

  • _p_ercentage: "p" or "P" - overlap percentage, by default hidden, but available as an option.

  • _s_igned count: "s" or "S" - the signed overlap count, tabulated based upon overlap_type ("each", "concordance", "agreement", etc.)

  • _i_tems: "i" only, by default hidden. When enabled, item labels defined by show_items are spread across the specific Venn overlap region.
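The letter convention can be illustrated with a short base R sketch. parse_show_labels() is a hypothetical helper written for this illustration and is not part of venndir. It only decodes the string, and does not enforce the rule that items can only be displayed inside.

```r
# Hypothetical helper (not part of venndir): decode a show_labels string
# into the label type and the location implied by the letter case.
parse_show_labels <- function(show_labels = "Ncs") {
   letters_used <- strsplit(show_labels, "")[[1]]
   label_types <- c(n = "name", o = "overlap", c = "count",
      p = "percentage", s = "signed count", i = "items")
   data.frame(
      letter = letters_used,
      label = unname(label_types[tolower(letters_used)]),
      # UPPERCASE is outside the Venn shape, lowercase is inside
      location = ifelse(letters_used == toupper(letters_used),
         "outside", "inside"),
      stringsAsFactors = FALSE)
}

# The default: Name outside, count and signed count inside
parse_show_labels("Ncs")
```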

main

character string used as a plot title, default NULL will render no title. When provided, it is rendered using gridtext::richtext_grob() which enables some Markdown-style formatting. The title is stored in venndir@metadata$main for persistence.

return_items

logical (default TRUE) indicating whether to return items in the overlap data. When FALSE, item labels cannot be displayed in the figure. The main reason not to return items is to conserve memory, for example if setlist is extremely large.

show_items

character used to define the item label, only used when show_labels includes "i" which enables item display inside the Venn diagram.

  • "item": shows only the item labels

  • "sign": shows only the sign of each item

  • "sign item": shows the sign and item together (or "item sign" will show the item, then the sign).

max_items

numeric (default 3000) indicating the maximum number of item labels to display when enabled.

show_zero

logical (default FALSE) indicating whether empty overlaps are labeled with a count of 0. When show_zero=TRUE the zero count label is displayed, otherwise no count label is shown.

font_cex

numeric vector recycled and applied in order:

  1. Set label

  2. Count label

  3. Signed count label(s)

The base font size is 16 points, so the defaults become 16, 16, 12 for set, count, and signed count labels, respectively. The default c(1, 1, 0.75) defines the signed count label slightly smaller than other labels.

  • When one value is provided, it is multiplied by c(1, 1, 0.75) so the proportional values are all adjusted together.

  • When two values are provided, the second value is used twice, to generate a vector with three values. This vector is multiplied by c(1, 1, 0.75). The purpose is to allow adjusting the set font independently, or the counts independently, while keeping the relative size between Count and Signed count.

  • When three values are provided, they are used as-is without change, which is the ideal way to define specific font sizes. For example, c(1, 1, 1) will use the same 16-point font for all labels.
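The three rules can be written out as a small base R sketch. expand_font_cex() is a hypothetical name used only for this illustration, and the handling of more than three values (keeping the first three) is an assumption.

```r
# Hypothetical helper (not part of venndir): expand font_cex to three values
# for the set label, count label, and signed count label.
expand_font_cex <- function(font_cex = c(1, 1, 0.75)) {
   relative <- c(1, 1, 0.75)
   if (length(font_cex) == 1) {
      # one value scales all three labels together
      font_cex <- font_cex * relative
   } else if (length(font_cex) == 2) {
      # the second value is used twice, then scaled
      font_cex <- font_cex[c(1, 2, 2)] * relative
   } else {
      # three values are used as-is (assumption: extra values are dropped)
      font_cex <- font_cex[1:3]
   }
   font_cex
}

# Font sizes in points, using the base font size of 16 points
expand_font_cex(1.5) * 16        # all labels enlarged together
expand_font_cex(c(1.2, 1)) * 16  # set label enlarged, counts unchanged
expand_font_cex(c(1, 1, 1)) * 16 # same size for all labels
```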

fontfamily

character string to define the fontfamily, default "Arial" as shown in Usage. The generic family "sans" is a robust alternative because it should get mapped to a supported font for each graphics device, and any missing glyphs such as the Unicode upArrow and downArrow should be substituted with a suitable font with those glyphs. The fontfamily must match a font 'family' recognized by systemfonts. Use subset_systemfonts() to review values in column 'family', or systemfonts::font_info() to inspect possible font substitutions based upon weight, style, or other typography. These substitutions can be controlled in advance, see systemfonts::font_fallback() and related functions to define substitutions upfront as needed.

In some circumstances, either the font or substitution is not compatible with PDF output, which tends to produce blank labels, presumably when the font encoded in the PDF is not available to the PDF viewer. You may check grDevices::pdfFonts() for more information. A potential workaround is to embed the glyphs or fonts using grDevices::embedGlyphs() or grDevices::embedFonts().

The ragg devices, and svglite device, have the best systemfonts support. RStudio works best with ragg output, which can be set with RStudio Options->General->Graphics then set 'Backend' to use 'AGG'. For ragg, try ragg::agg_png(), ragg::agg_tiff(), ragg::agg_jpeg(). For PDF, try cairo_pdf() or Cairo::CairoPDF().

The extreme fallback is to set unicode=FALSE, thereby avoiding Unicode arrows. Further, use fontfaces and set all values to 'plain' to avoid using bold fonts.

poly_alpha

numeric (default 0.6) value between 0 and 1, for alpha transparency of the polygon fill color. This value is ignored when alpha_by_counts=TRUE.

  • poly_alpha=1 is completely opaque (no transparency)

  • poly_alpha=0.8 is 80% opaque
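As a rough illustration of what an alpha value means for a fill color, the base R sketch below applies alpha to a color with grDevices::adjustcolor(). It is an assumption that this resembles the effect of poly_alpha; it does not show how venndir applies the value internally.

```r
# Apply 60% opacity to a fill color, analogous to poly_alpha=0.6
fill <- grDevices::adjustcolor("firebrick", alpha.f = 0.6)

# The alpha channel is stored on a 0-255 scale
alpha_channel <- grDevices::col2rgb(fill, alpha = TRUE)["alpha", 1]
alpha_channel / 255
```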

alpha_by_counts

logical indicating whether to apply alpha transparency to each Venn polygon fill based upon the counts contained in that polygon. When TRUE the poly_alpha is ignored.

label_style

character string indicating the style for labels. Label text color is adjusted for contrast with its background, which is determined from the label fill color together with either the device background color or the Venn overlap fill color. The pre-defined label styles are:

  • "basic" no background shading

  • "fill" an opaque colored background

  • "shaded" a partially transparent colored background

  • "lite" a partially transparent light background

  • "box" adds a dark border around the label region, as in "shaded_box" and "lite_box"

label_preset

character deprecated in favor of show_labels. This argument is passed to venndir_label_style().

template

character (default "wide") describing the default layout for counts and signed counts. The value is stored in venndir@metadata$template for persistence.

  • "wide" - main counts on the left, right-justified; signed counts on the right, left-justified. This option is preferred for small numbers, and less-crowded diagrams.

  • "tall" - main counts, center-justified; signed counts below main counts, center-justified. This option is recommended for large numbers (where there are 1000 or more items in a single overlap region), or for crowded diagrams.

marquee_styles

list with optional marquee::style() objects, with each entry named by the inline tag to use. For example, list(cursive=marquee::style(family="Brush Script MT")) would create a new inline style 'cursive' which could be used like this: '{.cursive Some Cursive Text}' to apply that style.

When provided, marquee::classic_style() is used to create all basic HTML-like styles, which are then combined with the additional styles present in marquee_styles.

unicode

logical (default TRUE) indicating whether to display Unicode arrows for signed overlaps. Passed to curate_venn_labels(). Use unicode=FALSE if the signed label is not displayed properly. The most common causes: (1) the R console (terminal) is not configured to allow Unicode (UTF-8 or UTF-16) characters; (2) the display font does not contain Unicode characters in the font set.

big.mark

character (default ",") passed to format() to augment numeric labels.
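For reference, this is the effect of big.mark in base R format(), shown on example counts that are not taken from any dataset:

```r
# Example counts, formatted with and without a thousands separator
counts <- c(12, 3456, 1234567)
with_mark <- format(counts, big.mark = ",", trim = TRUE)
no_mark <- format(counts, big.mark = "", trim = TRUE)
with_mark
no_mark
```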

curate_df

data.frame or NULL passed to curate_venn_labels(), used to customize the formatting of signed overlaps.

venn_jp

NULL or optional JamPolygon which contains one polygon for each setlist, intended to allow custom shapes to be used. Otherwise get_venn_polygon_shapes() is called.

inside_percent_threshold

numeric (default 0) indicating the percent area that a Venn overlap region must contain in order for the count label to be displayed inside the region, otherwise the label is displayed outside the region. Values are expected to range from 0 to 100.
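A minimal sketch of the threshold logic is shown below, using made-up region areas. It assumes the percent is calculated relative to the total area of all overlap regions, which is not stated above, and it is not the venndir implementation.

```r
# Made-up polygon areas for three overlap regions
region_area <- c("A" = 40, "B" = 35, "A&B" = 2)

# Percent of total area (assumption: relative to the sum of all regions)
region_pct <- 100 * region_area / sum(region_area)

# Regions below the threshold have the count label placed outside
inside_percent_threshold <- 5
label_location <- ifelse(region_pct >= inside_percent_threshold,
   "inside", "outside")
label_location
```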

item_cex

numeric default 1, used to define baseline font size (single value), or exact font cex values (multiple values).

  • When a single value is provided, each set of items is used to define a font scaling, based upon the relative area of the overlap polygon to the max item polygon area, and the number of items in each polygon. The result is multiplied by item_cex, and then by item_cex_factor, to produce the final adjustment.

  • When multiple values are provided, they are recycled to the number of polygons that contain items, and applied in order. There is no further adjustment by polygon area, nor number of labels. These values are multiplied by item_cex_factor.

item_style

character string (default "default") indicating the style to display item labels when they are enabled.

  • "default" detects whether item labels contain "<br>" for newlines, and uses "gridtext" if that is the case, otherwise it uses "text" which is markedly faster.

  • "text" option is substantially faster, but does not allow markdown.

  • "gridtext": substantially slower for a large number of labels, but enables use of limited markdown by calling gridtext::richtext_grob(). Mostly useful for venn_meme().

item_buffer

numeric value (default -0.15) indicating the buffer adjustment applied to Venn overlap regions before arranging item labels. Passed to label_fill_JamPolygon() via render_venndir(). Negative values are recommended, so they shrink the region.

sign_count_delim

character string used as a delimiter between the sign and counts, when overlap_type is not "overlap".

padding

numeric padding in units "mm" (default c(3, 2)) for overlap count, and signed overlap count labels, in order.

r

numeric radius in units "mm" used for rounded rectangle corners for labels. Only visible when label_style includes a background fill ("lite", "shaded", "fill"), or "box".

center

numeric coordinates relative to the plot bounding box, default c(0, 0) uses a center point in the middle of the plot (x=0, y=0). It is used to place labels outside the diagram. In short, labels are placed by drawing a line from this center point, outward through the Venn overlap region to be labeled. The label is positioned outside the polygon region by segment_distance. Moving the center point slightly down, for example c(0, -0.15), ensures that labels tend to be at the top of the plot, and not on the left/right side of the plot. This argument is passed along to label_outside_JamPolygon().

segment_distance

numeric value, default 0.05, the distance between outside labels and the outer edge of the Venn diagram region, relative to the size of the Venn polygons. The default 0.05 is approximately a 5% buffer outside. Note that when labels are placed outside (using show_labels) the outside label coordinates are used to define the plot range, which causes the Venn diagram itself to shrink accordingly.

sep

character used as a delimiter between set names, the default is "&".

do_plot

logical (default TRUE) indicating whether to generate the figure.

  • When do_plot=TRUE it calls render_venndir() to create grid objects to be displayed. Arguments in ... are passed to render_venndir(). To hide display, use do_draw=FALSE. To prevent calling grid::grid.newpage() so the plot can be drawn inside another active display device, use do_newpage=FALSE.

  • When do_plot=FALSE the returned Venndir object can be passed to render_venndir() to render the figure. The same points apply regarding do_draw and do_newpage, which are arguments to render_venndir().

verbose

logical indicating whether to print verbose output.

debug

numeric optional internal debug.

circle_nudge

list of numeric x,y vectors. Not yet re-implemented after the version 0.0.30.900 update.

rotate_degrees

numeric value in degrees, allowing rotation of the Venn diagram. Not yet re-implemented after version 0.0.30.900.

...

additional arguments are passed to internal functions, notably:

Value

Venndir object with slots:

  • "jps": JamPolygon which contains each set polygon, and each overlap polygon defined for the Venn diagram.

  • "label_df": data.frame which contains the coordinates for each Venn set, and Venn overlap label.

  • "setlist": list as input to venndir(). This entry may be empty.

When do_plot=TRUE this function also calls render_venndir(), and returns the grid graphical objects (grobs) in the attributes:

  • "gtree": a grid::gTree object suitable for drawing with grid::grid.draw(attr(vo, "gtree"))

  • "grob_list": a list of grid object components used to build the complete diagram, they can be plotted individually, or assembled with do.call(grid::gList, grob_list). The grid::gList can be assembled into a gTree with: grid::grobTree(gList=do.call(grid::gList, grob_list))

  • "viewport": the grid::viewport that holds important context for the graphical objects, specifically the use of coordinate grid::unit measure "snpc", which maintains a fixed aspect ratio.
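The assembly of grobs into a gTree can be sketched with the grid package alone. The grob_list below is a stand-in built from two simple grobs, not the actual content returned by venndir(), and the viewport is a simplified example of using "snpc" units.

```r
# Stand-in for attr(vo, "grob_list"): a plain list of grid grobs
grob_list <- list(
   circle = grid::circleGrob(x = 0.5, y = 0.5, r = 0.3),
   label = grid::textGrob("A", x = 0.5, y = 0.5))

# Combine the list into a gList, then into one gTree
venn_glist <- do.call(grid::gList, unname(grob_list))

# "snpc" units keep a fixed (square) aspect ratio in the viewport
venn_vp <- grid::viewport(
   width = grid::unit(1, "snpc"),
   height = grid::unit(1, "snpc"))
venn_gtree <- grid::gTree(children = venn_glist, vp = venn_vp)

# Draw only in an interactive session
if (interactive()) {
   grid::grid.newpage()
   grid::grid.draw(venn_gtree)
}
```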

Details

This function takes 'setlist' list as input, produces a Venndir object and plots the data by default.

When the input 'setlist' is a list of character vectors, it will produce basic Venn overlap counts.

When the input 'setlist' is a 'list' of numeric vectors, the vector element names are used as items, and the values are considered the directionality, or "sign". The overlaps are tabulated and delineated by the 'overlap_type' requested:

  • overlap_type="detect" - by default it will use "concordance" when the input data contains directionality.

  • overlap_type="concordance" - counts are organized as up/up or down/down, or "discordant".

  • overlap_type="each" - counts are organized by each combination of up/down for each overlap.

  • overlap_type="overlap" - counts are organized without using sign.

  • overlap_type="agreement" - counts are organized by "agreement" (up/up, or down/down), or "disagreement" (up/down, down/up).

Label options

  • The argument 'show_labels' is used to define which labels are displayed.

  • Labels are enabled using a single letter, defined below.

    • UPPERCASE places the label outside.

    • lowercase places the label inside the Venn diagram.

    • Note that some labels cannot be placed outside (item labels). Similarly, when item labels are enabled, counts cannot be displayed inside, and must be outside or hidden.

  • N - set _N_ame

  • c - _c_ount for each overlap

  • s - _s_igned count for each overlap

  • p - _p_ercent total items represented in each overlap

  • i - _i_tem labels for those items within each overlap

Item labels

  • When item labels are enabled, the placement is defined by label_fill_JamPolygon(), which uses an offset method, essentially filling rows of labels left-to-right, alternating higher/lower across each row.

  • Items are sorted by sign if present, then by label.

    • To control the order that the signs are sorted, see curate_venn_labels() and argument curate_df to define custom order for each sign.

    • For venn_meme() item labels, they are displayed in the same order as provided.

  • Item label font size is adjusted by default for each overlap polygon, proportional to the available area relative to the total Venn area.

  • Item label font sizes can be customized using item_cex.

    • A single value will be applied to the auto-scaling font sizes, adjusting all fonts consistently.

    • Multiple values will be recycled across the total number of overlap regions, applying font size to each region as drawn in order.

  • Items are rendered using marquee::marquee_grob() (default) or grid::grid.text().

    • The default marquee interprets items as markdown, which enables text styling, line wrap, inline styles '{.style text}', and embedded images or R graphics objects. Marquee offers robust support for Unicode arrows using whichever font is requested. By default, labels convert newline to a forced markdown newline, which uses two spaces at the end of the line. This change is only performed when newline does not have whitespace immediately before it. Thus, to avoid this behavior, use one space before a newline.

    • The alternative grid::grid.text() might be faster for a large number of labels. It does not support markdown and will render text exactly as provided. It also does not perform font substitution, which means any missing character glyphs, or uninterpreted system locale, will render problem characters using something like an empty box '[]'.
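The newline conversion described for marquee can be approximated with one regular expression. force_markdown_newline() is a hypothetical helper for illustration, and the pattern is an assumption about the behavior described above, not the code used by venndir.

```r
# Hypothetical helper (not part of venndir): insert two spaces before a
# newline, unless whitespace already appears immediately before it.
force_markdown_newline <- function(x) {
   gsub("([^[:space:]])\n", "\\1  \n", x)
}

# Newline directly after text becomes a forced markdown line break
force_markdown_newline("first line\nsecond line")

# One space before the newline leaves the label unchanged
force_markdown_newline("first line \nsecond line")
```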

Metadata

A number of arguments are also stored in metadata() of the Venndir-class object. When also provided as a specific argument to render_venndir() or plot(), the argument value takes priority over the internal metadata.

Further, new values defined or updated by render_venndir() are updated in the returned Venndir-class object metadata, for persistence. Notably, venndir() passes extra arguments in ellipses '...' to render_venndir() when do_plot=TRUE, and the corresponding metadata values will be updated in the Venndir-class object returned by venndir().

A notable example is expand_fraction, whose default value is NULL, but which is defined in render_venndir() based upon the draw_legend, legend_x and main arguments. See the expand_fraction argument help text for detailed rules. Changing legend_x later in a separate call to render_venndir() or plot() will not automatically update expand_fraction since it will have been stored once already.

See also

Other venndir core: render_venndir(), textvenn(), venn_meme()

Examples

setlist <- make_venn_test(100, 3, do_signed=FALSE);

setlist <- make_venn_test(100, 3, do_signed=TRUE);
vo <- venndir(setlist)

jamba::sdim(vo);
#>          rows cols      class
#> jps        10      JamPolygon
#> label_df   21   32 data.frame
#> setlist     3            list
#> metadata   22            list

# custom set labels
vo <- venndir(setlist,
   setlist_labels=paste("set", LETTERS[1:3]))


# custom set labels with Markdown custom colors
vo <- venndir(setlist,
   setlist_labels=paste0("Set <span style='color:blue'>", LETTERS[1:3], "</span>"))


# custom set and legend labels
vo <- venndir(setlist,
   setlist_labels=paste0("set<br>", LETTERS[1:3]),
   legend_labels=paste("Set", LETTERS[1:3]))


# custom set and legend labels
# proportional
# Set Name is inside with show_labels having lowercase "n"
vo <- venndir(setlist,
   proportional=TRUE,
   show_labels="ncs",
   label_style="lite box",
   setlist_labels=paste0("Set: ", LETTERS[1:3]),
   legend_labels=paste("Set", LETTERS[1:3]))