Stack the y-axis position of junctions

stackJunctions(
  gr,
  scoreColname = "score",
  sampleColname = "sample_id",
  scoreFactor = 1,
  matchFrom = NULL,
  matchTo = NULL,
  strandedScore = TRUE,
  baseline = NULL,
  verbose = FALSE,
  ...
)

Arguments

gr

GRanges object representing splice junctions.

scoreColname

character string matching one of colnames(values(gr)) that contains a numeric value representing the abundance of each splice junction observed.

sampleColname

character string with the column or columns that contain biological sample identifier, used to ensure junctions are only stacked within a sample, and not across samples. When sampleColname is NULL, all junctions are stacked.

scoreFactor

numeric value multiplied by the value in scoreColname to allow scaling the junctions across samples. Note that scoreFactor can be a vector, which would be applied to the vector of scores.

matchFrom, matchTo

optional colnames to use when grouping junctions at the start and end positions. By default "nameFrom" and "nameTo" are used, as output from closestExonToJunctions(), which has the benefit of grouping junctions within the spliceBuffer distance from exon boundaries. If those values are not present colnames(values(gr)) then the new default c("seqnames", "start", "strand") is used for matchFrom, and c("seqnames", "end", "strand") is used for matchTo. That said, if matchFrom or matchTo are supplied, those colnames are used from as.data.frame(gr). Multiple colnames are allowed. Note also that sampleColname is appended to matchFrom and matchTo to ensure that matching is only performed within each sampleColname value.

strandedScore

logical indicating whether to enforce negative scores for junctions on the "-" strand. Note that when strandedScore is true, all "-" strand scores will be negative, and all other scores with be positive.

baseline

numeric vector of length 0, 1 or length(gr), with values added to the y-axis value for junctions. If baseline has names matching names(gr) they will be used for each gr entry; if baseline is not named, values are recycled to length(gr). The purpose is to allow exons to be shifted up or down on the y-axis, along with associated junctions and coverage data (see exoncov2polygon() for another example.)

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Value

GRanges with colnames c("yStart", "yEnd") added to values(gr), indicating the baseline y-axis position for the start and end of the junction arc. The score values(gr)[[scoreColname]] will reflect the adjustments by scoreFactor, and if strandedScore=TRUE then all strand "-" scores will be negative, all other scores will be positive.

Details

This function is intended to help visualize splice junctions specifically when plotted using geom_diagonal_wide_arc(), where the height of the junction arc is defined by the score. When two junctions have the same start position, their y-positions are stacked, such that the shorter junction width is placed before longer junction widths. The intention is to reduce visible overlaps.

The input data is expected to have annotations similar to those provided by closestExonToJunctions(), specifically the columns "nameFrom" and "nameTo", see examples below. When the input data does not contain columns "nameFrom" and and "nameTo", the junctions are by default stacked by coordinates.

See also

Examples

library(GenomicRanges); library(ggplot2); library(ggforce); library(colorjam); library(jamba); grExons <- GRanges(seqnames=rep("chr1", 4), ranges=IRanges::IRanges( start=c(100, 300, 500, 900), end=c(200, 400, 750, 1000)), strand=rep("+", 4)); names(grExons) <- jamba::makeNames(rep("exon", length(grExons)), suffix=""); grJunc <- GRanges(seqnames=rep("chr1", 6), ranges=IRanges::IRanges(start=c(200, 200, 400, 400, 750, 750), end=c(300, 500, 500, 900, 900, 1200)), strand=rep("+", 6), score=c(200, 50, 160, 40, 210, 10)); names(grJunc) <- jamba::makeNames(rep("junc", length(grJunc))); # quick plot showing exons and junctions using rectangles grl <- c( GRangesList(exons=grExons), split(grJunc, names(grJunc)) ); ggplot(grl2df(grl), aes(x=x, y=y, group=id, fill=feature_type)) + ggforce::geom_shape() + scale_y_continuous(breaks=seq_along(grl)-1, labels=names(grl)) + colorjam::theme_jam() + colorjam::scale_fill_jam() + ggtitle("Schematic of exons and junctions GRanges");
#> Warning: failed to set names on the unlisted CompressedRleList object
# add annotation for closest known exon grJunc <- closestExonToJunctions(grJunc, grExons, spliceBuffer=5)$spliceGRgene; # The un-stacked junctions grlJunc2df1 <- grl2df(grJunc, shape="junction", doStackJunctions=FALSE); ggplot(grlJunc2df1, aes(x=x, y=y, group=gr_name, fill=gr_name)) + geom_diagonal_wide_arc(alpha=0.7) + colorjam::scale_fill_jam() + colorjam::theme_jam() + ggtitle("Junctions not stacked at boundaries")
# The stacked junctions grJunc2 <- stackJunctions(grJunc); grlJunc2df2 <- grl2df(grJunc2, scoreArcMinimum=20, shape="junction"); ggplot(grlJunc2df2, aes(x=x, y=y, group=gr_name, fill=gr_name)) + geom_diagonal_wide_arc(alpha=0.7) + colorjam::scale_fill_jam() + colorjam::theme_jam() + ggtitle("Junctions stacked at boundaries");
## Another view showing the junction_rank ## based upon max reads entering and exiting each exon edge ggplot(grlJunc2df2, aes(x=x, y=y, group=gr_name)) + geom_diagonal_wide_arc(aes(alpha=junction_rank), fill="orange") + scale_alpha_manual(values=c(`1`=0.4, `2`=0.6, `3`=0.7)) + colorjam::scale_fill_jam() + colorjam::theme_jam() + ggtitle("Junctions stacked at boundaries")
## Last example showing how two samples are kept separate grJunc_samples <- c(grJunc, grJunc); values(grJunc_samples)[,"sample_id"] <- rep(c("SampleA","SampleB"), each=length(grJunc)); names(grJunc_samples) <- jamba::makeNames(values(grJunc_samples)[,"sample_id"]); grlJunc2df_samples <- grl2df(grJunc_samples, scoreArcMinimum=20, shape="junction"); ggplot(grlJunc2df_samples, aes(x=x, y=y, group=gr_name, fill=gr_name)) + geom_diagonal_wide_arc(alpha=0.7, show.legend=FALSE) + colorjam::scale_fill_jam() + colorjam::theme_jam() + ggtitle("Junctions stacked at boundaries") + facet_wrap(~sample_id)