Splice junction data.frame summary

spliceGR2junctionDF(
  spliceGRgene,
  exonsGR,
  spliceBuffer = 3,
  geneExonSep = "(:|_exon)",
  useOnlyValidEntries = FALSE,
  renameTooFar = TRUE,
  scoreColname = "score",
  sampleColname = "sample_id",
  flipNegativeStrand = TRUE,
  returnGRanges = FALSE,
  verbose = FALSE,
  ...
)

Arguments

spliceGRgene

GRanges object containing splice junctions, where the column named by scoreColname contains numeric scores.

exonsGR

GRanges object containing flattened exons by gene, as provided by flattenExonsBy().

spliceBuffer

integer distance allowed from a compatible exon boundary, for a junction read to be snapped to that boundary.

useOnlyValidEntries

logical indicating whether to remove junctions that do not align with a compatible exon boundary.

renameTooFar

logical indicating whether junctions that are not within spliceBuffer of a compatible exon boundary are named by the nearest exon boundary and the distance to that boundary.

scoreColname, sampleColname

column names in values(spliceGRgene) that hold the score and the sample identifier, respectively.

flipNegativeStrand

logical indicating whether to flip the orientation of negative strand features when matching exon boundaries. This argument is passed to closestExonToJunctions().

returnGRanges

logical indicating whether to return a GRanges object instead of the default data.frame.

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Details

This function takes a GRanges object representing multiple splice junction ranges, with associated scores, and returns a data.frame summary of junctions with annotated boundaries using a set of gene exon models. Junctions whose ends fall within spliceBuffer of the same exon boundaries are combined, and their scores are summed.
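
The snapping and summing described above can be illustrated with a minimal base R sketch. It is not the package implementation: the exon coordinates, the junction table, the helper snapToBoundary(), and the "from to" label format are all hypothetical, and strand handling is omitted.

```r
# Illustrative sketch only; all names and coordinates here are hypothetical.
spliceBuffer <- 3

# Hypothetical exon boundaries for one gene on the plus strand
donorBoundaries    <- c(exon1 = 200, exon2 = 400)  # exon ends
acceptorBoundaries <- c(exon2 = 300, exon3 = 500)  # exon starts

# Hypothetical junction observations with numeric scores
junctions <- data.frame(
  start = c(201, 203, 201, 250),
  end   = c(299, 298, 499, 499),
  score = c(10, 5, 7, 2)
)

# Return the name of the nearest boundary when it lies within the buffer,
# otherwise NA
snapToBoundary <- function(pos, boundaries, buffer) {
  d <- abs(outer(pos, boundaries, "-"))       # rows: positions, cols: boundaries
  nearest <- apply(d, 1, which.min)           # index of closest boundary
  within <- d[cbind(seq_along(pos), nearest)] <= buffer
  ifelse(within, names(boundaries)[nearest], NA_character_)
}

junctions$nameFrom <- snapToBoundary(junctions$start, donorBoundaries, spliceBuffer)
junctions$nameTo   <- snapToBoundary(junctions$end, acceptorBoundaries, spliceBuffer)

# Keep only junctions where both ends snapped to a boundary, then sum scores
# for junctions that share the same pair of boundaries
valid <- junctions[!is.na(junctions$nameFrom) & !is.na(junctions$nameTo), ]
valid$nameFromTo <- paste(valid$nameFrom, valid$nameTo)
summed <- aggregate(score ~ nameFromTo, data = valid, FUN = sum)
print(summed)
```

In this sketch the first two junctions differ by a few bases but snap to the same pair of boundaries, so their scores are added together. The fourth junction starts 50 bases from the nearest boundary and is dropped, which is analogous to what useOnlyValidEntries = TRUE is described as doing.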

By default, junctions not within spliceBuffer of a compatible exon boundary are named by the nearest exon boundary, and the distance upstream or downstream from the boundary.
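
As a rough illustration of naming by nearest boundary plus distance, the base R sketch below labels each junction end with the closest boundary and appends a signed offset when the end lies outside the buffer. The helper labelByNearest(), the label format (for example "exon1+50"), and the sign convention (positive meaning a larger coordinate, which is downstream only on the plus strand) are assumptions made for this example, not the naming scheme used by the function.

```r
# Illustrative sketch only; the label format is hypothetical.
labelByNearest <- function(pos, boundaries, buffer) {
  offsets <- outer(pos, boundaries, "-")             # signed distances
  nearest <- apply(abs(offsets), 1, which.min)       # closest boundary per position
  offset  <- offsets[cbind(seq_along(pos), nearest)]
  label   <- names(boundaries)[nearest]
  tooFar  <- abs(offset) > buffer
  # Append the signed distance only for ends outside the buffer
  label[tooFar] <- sprintf("%s%+d", label[tooFar], as.integer(offset[tooFar]))
  label
}

boundaries <- c(exon1 = 200, exon2 = 400)
labels <- labelByNearest(c(201, 250, 390), boundaries, buffer = 3)
print(labels)
```

The position at 201 is within the buffer and takes the plain boundary name, while 250 and 390 receive an offset relative to their nearest boundaries.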

Multiple samples can be processed together, and the results will be aggregated within each sample, using sampleColname. The results in that case may be cast to wide format using nameFromTo as the row identifier, score as the value column, and sampleColname as the new column headers.
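
The cast to wide format can be sketched in base R as follows. The data.frame below is a hypothetical stand-in for the function output, containing only the nameFromTo, score, and sample_id columns mentioned above; the real output may contain additional columns.

```r
# Illustrative sketch only; junctionDF is a hypothetical stand-in for the output.
junctionDF <- data.frame(
  nameFromTo = c("exon1 exon2", "exon1 exon3", "exon1 exon2"),
  score      = c(15, 7, 4),
  sample_id  = c("sampleA", "sampleA", "sampleB")
)
sampleColname <- "sample_id"

# Rows are junctions, columns are samples, cells are scores.
# Junctions not observed in a sample are left as NA.
wide <- tapply(
  junctionDF$score,
  list(junctionDF$nameFromTo, junctionDF[[sampleColname]]),
  sum
)
print(wide)
```

Whether a missing junction should be reported as NA or as zero depends on the downstream analysis; xtabs(score ~ nameFromTo + sample_id, data = junctionDF) is a base R alternative that fills empty cells with zero.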

See also