Draw a volcano plot using reasonable default arguments.
volcano_plot(
  x,
  n = NULL,
  lfc_colname = c("logfc", "log2fold", "log2fc", "lfc", "l2fc", "logratio", "log2ratio"),
  fold_colname = c("fold", "fc", "ratio"),
  fold_cutoff = 1.5,
  fold_max_range = 16,
  fold_min_range = 4,
  sig_colname = c("adj.P.Val", "padj", "adj.pval", "adjp", "P.Value"),
  sig_cutoff = 0.05,
  sig_max_range = 1e-10,
  sig_min_range = 1e-04,
  expr_colname = c("mgm", "groupmean", "mean", "AveExpr", "fkpm", "rpkm", "tpm", "cpm"),
  expr_cutoff = NULL,
  label_colname = c("gene", "symbol", "protein", "probe", "assay"),
  main = "Volcano Plot",
  submain = NULL,
  blockarrow = TRUE,
  blockarrow_colors = c(hit = "#E67739FF", up = "#990000FF", down = "#000099FF"),
  blockarrow_font = 1,
  blockarrow_cex = c(1.2, 1.2),
  blockarrow_label_cex = 1,
  blockarrow_shadowtext = TRUE,
  symmetric_axes = TRUE,
  do_cutoff_caption = TRUE,
  caption_cex = 0.8,
  include_axis_prefix = FALSE,
  n_x_labels = 12,
  n_y_labels = 7,
  xlim = NULL,
  ylim = NULL,
  pt_cex = 0.9,
  pt_pch = 21,
  hit_type = "hits",
  color_set = c(base = "#77777777", up = "#99000088", down = "#00009988",
    hi = "#FFDD55FF", hi_up = "#FFDD55FF", hi_down = "#FFDD55FF"),
  border_set = NULL,
  point_colors = NULL,
  border_colors = NULL,
  abline_color = "#000000AA",
  smooth = TRUE,
  smooth_func = jamba::plotSmoothScatter,
  smooth_ramp = colorRampPalette(c("white", "lightblue", "lightskyblue3", "royalblue",
    "darkblue", "orange", "darkorange1", "orangered2")),
  tophist = FALSE,
  tophist_cutoffs = c("pvalue", "foldchange"),
  tophist_breaks = 100,
  tophist_color = "#000099FF",
  tophist_fraction = 1/3,
  tophist_by = 0.2,
  hi_points = NULL,
  hi_colors = NULL,
  hi_hits = FALSE,
  hi_cex = 1,
  do_both = FALSE,
  label_hits = FALSE,
  add_plot = FALSE,
  xlab = NULL,
  ylab = NULL,
  cex.axis = 1.2,
  mar_min = c(6, 5, 6, 5),
  transFactor = 0.24,
  transformation = function(x) {
    x^transFactor
  },
  nbin = 256,
  verbose = TRUE,
  ...
)
x
data.frame that contains statistical results with at
least a P-value, and fold change or log2 fold change. It is
useful to also include a column with mean expression, and a column
with a relevant label.
n
integer indicating the number of points to plot as a subset,
for testing purposes.
lfc_colname
character string or vector used to match colnames(x)
whose values should be log2 fold changes.
A direct match to colnames(x) is performed
first, then if no column is found, the values are used as
regular expression patterns in order until the first
matching colname is found. Note that lfc_colname is used
in preference to fold_colname.
The colname used will appear as the x-axis label.
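The matching order described above can be sketched in base R. This is only an illustration: find_colname() is a hypothetical helper, not a function from the package, and the case-insensitive matching is an assumption.

```r
# Hypothetical helper (not part of the package): illustrates the matching order.
find_colname <- function(patterns, columns) {
  # Step 1: look for a direct match to the column names
  direct <- intersect(patterns, columns)
  if (length(direct) > 0) {
    return(direct[1])
  }
  # Step 2: use each value as a regular expression, in order,
  # and return the first column that matches.
  # Assumption: matching is case-insensitive.
  for (pattern in patterns) {
    hits <- grep(pattern, columns, ignore.case = TRUE, value = TRUE)
    if (length(hits) > 0) {
      return(hits[1])
    }
  }
  NULL
}

columns <- c("Gene", "log2fold Group-Control", "P.Value Group-Control")
lfc_match <- find_colname(c("logfc", "log2fold", "log2fc"), columns)
sig_match <- find_colname(c("adj.P.Val", "padj", "P.Value"), columns)
```

Here lfc_match is "log2fold Group-Control" and sig_match is "P.Value Group-Control", since no column matches the earlier patterns.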
fold_colname
character string or vector used to match colnames(x)
whose values should be fold changes. Note that
if lfc_colname matches a column, fold_colname is not used.
If used, the colname will appear as the x-axis label.
fold_cutoff
numeric threshold for values in lfc_colname or fold_colname,
where normal fold change values at or above fold_cutoff
meet the fold change criterion for a statistical hit.
Note that when lfc_colname is being used, its values are
converted to normal fold change before applying this filter.
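A minimal sketch of that conversion, assuming the sign of the change is kept and the cutoff is compared with the magnitude. The helper to_signed_fold() is hypothetical and is not the package's own conversion function.

```r
# Hypothetical helper: convert log2 fold change to normal-scale fold change.
# Assumption: the direction is kept as the sign, so -1 becomes -2 (2-fold down).
to_signed_fold <- function(lfc) {
  ifelse(lfc < 0, -1, 1) * 2^abs(lfc)
}

lfc <- c(-2, -0.3, 0, 0.5, 1)
fold <- to_signed_fold(lfc)

# Apply the cutoff in normal space, using the magnitude of the fold change
fold_cutoff <- 1.5
meets_fold_cutoff <- abs(fold) >= fold_cutoff
```

With these values only the first and last entries (4-fold down and 2-fold up) meet the default cutoff of 1.5.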
fold_max_range
numeric indicating the maximum range to display
on the x-axis fold change. This argument prevents extremely
large fold changes from compressing the useful visible range
of the figure.
fold_min_range
numeric indicating the minimum range to display
on the x-axis fold change. This argument is useful
when fold changes are low and the x-axis range would otherwise
be too small to be useful.
sig_colname
character string or vector used to match colnames(x)
whose values should contain P-values of significance.
The P-values can be unadjusted (raw) P-values, or adjusted
P-values. The P-values are expected not to be -log10() transformed.
The colname used will appear as the y-axis label.
sig_cutoff
numeric threshold for values in sig_colname,
where values at or below sig_cutoff can be considered
statistically significant.
sig_max_range
numeric indicating the maximum range to display
on the y-axis significance. This argument prevents extremely
small P-values from compressing the useful visible range
of the figure.
sig_min_range
numeric indicating the minimum range to display
on the y-axis significance. This argument is useful
when P-values are not very significant, and you want to make
sure the y-axis range shows enough of the significant
region to be visually interpretable in that context.
expr_colname
character string or vector used to match colnames(x)
whose values should contain expression mean values.
This column is only used when expr_cutoff is defined, in which
case it is applied to the filter criteria for statistical hits.
expr_cutoff
numeric threshold for values in expr_colname
when expr_colname is defined, where values in expr_colname
at or above expr_cutoff can be considered statistically
significant. This threshold is useful to filter out potential
statistical hits whose signal is below a noise signal threshold.
label_colname
character string or vector used to match colnames(x)
whose values should contain a useful label,
for example gene symbol or assay identifier.
main
character string used as the main title of the figure.
submain
character string used as a sub-title of the figure.
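blockarrow
logical indicating whether block arrows should
be displayed and used to indicate the number of statistical hits.
blockarrow_colors, blockarrow_font, blockarrow_cex, blockarrow_label_cex, blockarrow_shadowtext
arguments used when blockarrow=TRUE.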
symmetric_axes
logical indicating whether the x-axis
log fold change range should be symmetric above and below zero.
do_cutoff_caption
logical indicating whether to display a
text caption with the statistical cutoff values used, and the
total number of points displayed.
caption_cex
numeric caption font size adjustment.
include_axis_prefix
logical indicating whether to include
a prefix for the x-axis and y-axis labels: x-axis "Change";
y-axis "Significance".
n_x_labels, n_y_labels
integer used by pretty() to determine
the approximate number of x-axis and y-axis labels to display,
respectively.
xlim, ylim
numeric used to define specific xlim and ylim
axis ranges. When NULL the ranges are defined automatically,
using fold_min_range, fold_max_range for the x-axis, and
sig_min_range, sig_max_range for the y-axis.
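One way to picture the automatic ranges is as a limit that is kept between a minimum and a maximum. The sketch below is an assumption about the general idea, not the package's implementation: bounded_limit() is a hypothetical helper, the y-axis is assumed to be -log10(P-value) as is conventional for volcano plots, and the x-axis is assumed to be in log2 units with the ranges given as normal fold changes.

```r
# Hypothetical helper: keep an automatic axis limit between a minimum and maximum.
bounded_limit <- function(observed, min_range, max_range) {
  min(max(observed, min_range), max_range)
}

# y-axis, assumed to be displayed as -log10(P-value)
sig_min_range <- 1e-04
sig_max_range <- 1e-10
weak_pvalues <- c(0.5, 0.04, 0.01)
strong_pvalues <- c(0.5, 1e-03, 1e-30)
y_weak <- bounded_limit(max(-log10(weak_pvalues)),
  -log10(sig_min_range), -log10(sig_max_range))
y_strong <- bounded_limit(max(-log10(strong_pvalues)),
  -log10(sig_min_range), -log10(sig_max_range))

# x-axis, assumed to be in log2 units, symmetric around zero
fold_min_range <- 4
fold_max_range <- 16
observed_lfc <- c(-0.4, 0.7)
x_limit <- bounded_limit(max(abs(observed_lfc)),
  log2(fold_min_range), log2(fold_max_range))
xlim <- c(-x_limit, x_limit)
```

Under these assumptions the weak P-values still produce a y-axis that reaches about 4 (P = 1e-04), the extreme P-value of 1e-30 is capped at about 10 (P = 1e-10), and the small fold changes give an x-axis spanning -2 to 2 in log2 units (4-fold in each direction).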
pt_cex, pt_pch
numeric used to define point size and shape,
used only when individual points are displayed.
hit_type
character string used to label points that meet
the statistical cutoffs as "hits", but where it may be useful
to indicate the type of entry being tested. For example:
hit_type="genes" indicates that each row represents a gene;
hit_type="probes" indicates each row represents a probe;
hit_type="transcripts" indicates each row represents a transcript.
color_set
character vector of R colors, used only when individual
points are displayed. The names override default values, and may include:
"base" - the base color of all points on the plot
"up" - the color for up-regulated points that meet all
statistical cutoffs to be a "hit".
"down" - the color for down-regulated points that meet all cutoffs
"hi" - base color for highlighted points, used when hi_points
is defined.
"hi_up" - color for highlighted up-regulated points.
"hi_down" - color for highlighted down-regulated points.
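As an illustration of name-based overrides, a named vector can replace only the entries it names. This is a base R sketch of the idea, and whether the function merges partial vectors exactly this way is an assumption; custom_colors is a hypothetical input.

```r
# Default values, copied from the usage section
default_colors <- c(base = "#77777777", up = "#99000088", down = "#00009988",
  hi = "#FFDD55FF", hi_up = "#FFDD55FF", hi_down = "#FFDD55FF")

# Hypothetical user input that names only two entries
custom_colors <- c(up = "firebrick", down = "dodgerblue")

# Named entries replace the defaults; all other entries are kept
color_set <- default_colors
color_set[names(custom_colors)] <- custom_colors
```

The resulting color_set still has six named entries, with only "up" and "down" changed.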
border_set
NULL or character vector of R colors, used to
define point border colors, for example with pch=21 which is a
filled circle with border. When border_set=NULL
it is defined by jamba::makeColorDarker(color_set).
point_colors
optional character vector of R colors
recycled to length nrow(x), used to specify the exact color of each
point in x. This argument is useful to colorize certain specific
points that may otherwise not meet statistical criteria.
abline_color
character string with R color used to color
the abline that indicates the x-axis fold_cutoff value, and
y-axis sig_cutoff value.
smooth
logical indicating whether points should be drawn
as a smooth scatter plot, using jamba::plotSmoothScatter().
When smooth=FALSE individual points are drawn, using
point_colors, or when point_colors is not defined the
default is to use color_set to colorize points based upon
statistical cutoffs.
smooth_func
function used to plot points when smooth=TRUE,
by default jamba::plotSmoothScatter() which has some benefits
over default graphics::smoothScatter().
smooth_ramp
character vector of R colors which defines
the color gradient to use when smooth=TRUE.
tophist
logical indicating whether to display a histogram
at the top of the volcano plot figure.
tophist_cutoffs, tophist_breaks, tophist_color, tophist_fraction, tophist_by
arguments used when tophist=TRUE.
hi_points
character vector indicating points to highlight
in the volcano plot, where values should match rownames(x).
This argument is useful to highlight a specific subset of points of
interest on the figure. Note that hi_points are always
rendered as individual points even when smooth=TRUE.
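Because the values are matched to rownames(x), the input data needs meaningful row names for highlighting to work. A small base R sketch of that matching, using made-up gene names and values purely for illustration:

```r
# Toy data with row names; the names and values are hypothetical
x <- data.frame(
  lfc = c(2.1, -0.2, -1.8, 0.4),
  pvalue = c(1e-05, 0.6, 1e-03, 0.2),
  row.names = c("geneA", "geneB", "geneC", "geneD"))

hi_points <- c("geneC", "geneA", "geneZ")

# Rows that would be highlighted
is_highlighted <- rownames(x) %in% hi_points

# Requested points that have no matching row name
unmatched <- setdiff(hi_points, rownames(x))
```

In this toy case the first and third rows are highlighted, and "geneZ" has no matching row. Checking for unmatched values before plotting can catch cases where labels are stored in a column rather than in the row names.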
hi_hits
logical indicating whether rows that meet all
statistical cutoffs and are considered "hits" should also be
treated as hi_points for the purpose of rendering individual
points.
hi_cex
numeric size adjustment for highlight points,
relative to the size of other points in the figure.
do_both
logical indicating whether to draw both a smooth
scatter and individual points on the same figure.
label_hits
logical indicating whether to add a text label
for points that are statistical hits.
add_plot
logical indicating whether the plot should be
added to an existing plot, or when add_plot=FALSE a new
plot is created. This argument is useful to re-run the
same volcano plot with alternate parameters, for example
to display different subsets of highlighted points.
xlab, ylab
character strings used to specify the exact
x-axis label and y-axis label. When either value is NULL
the default is to use the relevant colname: x-axis uses
either lfc_colname or fold_colname; y-axis uses sig_colname.
cex.axis
numeric adjustment for axis label font sizes.
transformation
function passed to smooth_func, used to
adjust the visual contrast of the resulting density plot.
nbin
numeric value passed to smooth_func and used
by jamba::plotSmoothScatter() to adjust the number of
bins used to display the density of points, where a higher
value shows more detail, and a lower value shows less detail.
verbose
logical indicating whether to print verbose output.
Note that verbose=2 will enable much more verbose output.
...
additional arguments are ignored.
mar_min
vector used to ensure that each margin size is
at least a minimum value, applied to par("mar") via
the function pmax().
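The effect of pmax() on margins can be seen without opening a graphics device. In this sketch current_mar is a stand-in for the value of par("mar"), chosen for illustration.

```r
# Stand-in for par("mar"): bottom, left, top, right margins in lines
current_mar <- c(5.1, 4.1, 7, 2.1)

# Default mar_min from the usage section
mar_min <- c(6, 5, 6, 5)

# Element-wise maximum: each margin is raised to its minimum if needed
new_mar <- pmax(current_mar, mar_min)
```

Here new_mar is c(6, 5, 7, 5): three margins are raised to the minimum, while the top margin keeps its larger existing value of 7.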
Draw a volcano plot using a reasonably robust set of default arguments, and with a large number of customization options. The default plot uses smooth scatter plot for much improved display of point density.
This function produces a volcano plot, which consists of change on the x-axis, and significance on the y-axis.
In addition to displaying the volcano plot, this function also displays statistical thresholds, and marks entries as "hits" by up to three conceptual filters:
"change" - fold change fold_cutoff
"significant" - statistical P-value sig_cutoff
"detected" - signal expr_cutoff
If any cutoff is not defined, that filter is ignored.
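The three filters can be sketched as a single logical test in base R. The helper is_hit() is hypothetical and only illustrates the logic described here, including how a NULL cutoff is skipped. It assumes log2 fold changes as input and compares their normal-scale magnitude with fold_cutoff.

```r
# Hypothetical helper: combine the "change", "significant", and "detected" filters.
is_hit <- function(lfc, pvalue, expr = NULL,
    fold_cutoff = 1.5, sig_cutoff = 0.05, expr_cutoff = NULL) {
  keep <- rep(TRUE, length(lfc))
  # "change": fold change magnitude at or above fold_cutoff
  if (!is.null(fold_cutoff)) {
    keep <- keep & (2^abs(lfc) >= fold_cutoff)
  }
  # "significant": P-value at or below sig_cutoff
  if (!is.null(sig_cutoff)) {
    keep <- keep & (pvalue <= sig_cutoff)
  }
  # "detected": signal at or above expr_cutoff
  if (!is.null(expr_cutoff) && !is.null(expr)) {
    keep <- keep & (expr >= expr_cutoff)
  }
  keep
}

lfc <- c(1.5, -2, 0.2, 3)
pvalue <- c(0.001, 0.2, 0.01, 0.04)
expr <- c(8, 9, 7, 1)

hits_default <- is_hit(lfc, pvalue, expr)
hits_detected <- is_hit(lfc, pvalue, expr, expr_cutoff = 3)
hits_sig_only <- is_hit(lfc, pvalue, expr, fold_cutoff = NULL)
```

With the default cutoffs the first and fourth rows are hits. Adding expr_cutoff = 3 removes the fourth row because its signal is too low, and setting fold_cutoff = NULL keeps every row that passes the P-value filter alone.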
Change is usually represented using log2 fold changes,
and in this case the axis is labeled using normal scale fold change
values. The threshold fold_cutoff is defined using
normal space values. Entries whose fold change magnitude is
at or above fold_cutoff are marked "changing".
Significance usually represents adjusted P-value, or raw
P-value if necessary. The threshold is defined with sig_cutoff,
where entries with a P-value at or below this value are marked "significant".
Finally, since some statistical criteria also include a minimum
level of signal, a threshold expr_cutoff requires an entry to
have signal at or above this value to be considered "detected".
The default behavior of volcano_plot() is to render a
smooth scatter plot. A smooth scatter plot is much more
effective than individual points at representing the true
point density across the figure, which is one of the primary
reasons to produce the plot.
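The default transformation shown in the usage section, x^transFactor with transFactor = 0.24, affects how density values map to the color gradient. A small sketch of its effect on some hypothetical relative densities:

```r
# Default values from the usage section
transFactor <- 0.24
transformation <- function(x) {
  x^transFactor
}

# Hypothetical relative point densities, from sparse to dense
density <- c(0.001, 0.01, 0.1, 1)
transformed <- transformation(density)
round(transformed, 2)
```

The sparse regions are lifted substantially (0.001 becomes roughly 0.19) while the densest value stays at 1, so low-density areas remain visible instead of fading into the background color.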
Other jam plot functions:
ggjammaplot()
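# simulate 15,000 rows of log2 fold changes and P-values
n <- 15000;
set.seed(12);
x_lfc <- (rnorm(n) * 1);
# square the values while keeping the sign, to widen the tails
x_lfc <- x_lfc^2 * sign(x_lfc);
# order roughly by decreasing magnitude, with added noise
x_lfc <- x_lfc[order(-abs(x_lfc) + rnorm(n) / 2)];
# sorted P-values, smallest first
x_pv <- sort(10^-(rnorm(n)*1.5)^2);
# column names carry a comparison suffix and are found by pattern matching;
# the 1,500 "mgm" values are recycled by data.frame() to fill all n rows
x <- data.frame(
   Gene=paste("gene", seq_len(n)),
   `log2fold Group-Control`=x_lfc,
   `P.Value Group-Control`=x_pv[order(-abs(x_lfc))],
   `mgm Group-Contol`=((rnorm(1500)+5)^2)/5,
   check.names=FALSE);
# default volcano plot, using only fold change and P-value cutoffs
volcano_plot(x);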
#> ## (12:00:13) 26Jun2023: volcano_plot(): sig_colname: P.Value Group-Control
#> ## (12:00:13) 26Jun2023: volcano_plot(): lfc_colname: log2fold Group-Control
#> Warning: coercing argument of type 'double' to logical
volcano_plot(x, expr_cutoff=3);
#> ## (12:00:13) 26Jun2023: volcano_plot(): sig_colname: P.Value Group-Control
#> ## (12:00:13) 26Jun2023: volcano_plot(): lfc_colname: log2fold Group-Control
#> ## (12:00:13) 26Jun2023: volcano_plot(): expr_colname: mgm Group-Contol
#> ## (12:00:13) 26Jun2023: volcano_plot(): 13160 values of 15000 met the threshold.
#> Warning: coercing argument of type 'double' to logical
# volcano_plot(x, mar_min=c(7, 6, 6, 5), blockarrow_cex=1);
# par("mfrow"=c(2, 1));
# volcano_plot(x);
# volcano_plot(x);
# par("mfrow"=c(1, 1));
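# add a normal-scale fold change column and an adjusted P-value column;
# the log2fold column is still used in preference to the fold column
x[["fold Group-Control"]] <- log2fold_to_fold(x[["log2fold Group-Control"]]);
x[["adj.P.Val Group-Control"]] <- x[["P.Value Group-Control"]];
# render statistical hits as individual highlighted points
volcano_plot(x, hi_hits=TRUE);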
#> ## (12:00:13) 26Jun2023: volcano_plot(): sig_colname: adj.P.Val Group-Control
#> ## (12:00:13) 26Jun2023: volcano_plot(): lfc_colname: log2fold Group-Control
#> Warning: coercing argument of type 'double' to logical