Make tx2gene data.frame from a GTF file
makeTx2geneFromGtf(
  GTF,
  geneAttrNames = c("gene_id", "gene_name", "gene_type"),
  txAttrNames = c("transcript_id", "transcript_type"),
  geneFeatureType = "gene",
  txFeatureType = c("transcript", "mRNA"),
  nrows = -1L,
  verbose = FALSE,
  ...
)
Argument | Description |
---|---|
GTF | character file name sent to data.table::fread(). |
geneAttrNames | character vector of recognized attribute names as they appear in column 9 of the GTF file, for gene rows. |
txAttrNames | character vector of recognized attribute names as they appear in column 9 of the GTF file, for transcript rows. |
geneFeatureType | character value to match column 3 of the GTF file, used to define gene rows, by default "gene". |
txFeatureType | character vector of values to match column 3 of the GTF file, used to define transcript rows, by default "transcript". In some GTF files, "mRNA" is used, so either is accepted by default. |
nrows | integer number of rows to read from the GTF file; the default -1 imports all rows. This parameter is useful for checking the results from a large GTF file using only a subset of the file. |
verbose | logical whether to print verbose output during processing. |
data.frame with colnames indicated by the values in geneAttrNames and txAttrNames.
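A minimal usage sketch, assuming the package that provides makeTx2geneFromGtf() is already attached. The file name "annotation.gtf" is a hypothetical placeholder, and the calls are guarded so the code is skipped when the function or file is not available.

```r
# Hypothetical path to a GTF file; replace with a real file
gtf_file <- "annotation.gtf"

# Run only when the function is available and the file exists
if (exists("makeTx2geneFromGtf") && file.exists(gtf_file)) {
   # Read only the first 10,000 rows as a quick check of a large file
   tx2gene_test <- makeTx2geneFromGtf(gtf_file, nrows = 10000L, verbose = TRUE)
   print(head(tx2gene_test))

   # Then import all rows (the default, nrows = -1L)
   tx2gene <- makeTx2geneFromGtf(gtf_file)
}
```

Checking a subset first can reveal whether the attribute names in geneAttrNames and txAttrNames match those used in column 9 of the file before the full import is run.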
Create a transcript-to-gene data.frame from a GTF file, which is required by a number of transcriptome analysis methods such as those in the DEXSeq package, and limma package functions such as diffSplice().
This function uses only data.table::fread() and does not import the full GTF file using something like Bioconductor GenomicFeatures, simply because the data.table method is markedly faster when importing only the transcript-to-gene relationship. Also, this method allows the import of more annotations than are supported by the typical Bioconductor rtracklayer::import() for GTF data.
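The general approach can be illustrated with a small sketch. This is not the function's actual implementation; it only shows how data.table::fread() can read GTF rows and how named attributes might be pulled from column 9. The in-memory GTF lines and the helper get_attr() are hypothetical and exist only for this illustration.

```r
library(data.table)

# Tiny in-memory GTF (hypothetical content, for illustration only)
gtf_lines <- c(
   'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_name "ABC"; gene_type "protein_coding";',
   'chr1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "ABC"; transcript_type "protein_coding";',
   'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
   'chr1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_name "ABC"; transcript_type "retained_intron";')

# quote = "" keeps the double quotes inside column 9 intact
gtf <- fread(text = gtf_lines, sep = "\t", header = FALSE, quote = "")

# Hypothetical helper: extract the value of one attribute name from column 9
get_attr <- function(x, name) {
   pattern <- paste0('(^|;)[[:space:]]*', name, ' "([^"]*)"')
   m <- regmatches(x, regexec(pattern, x))
   # Element 3 is the captured value; NA when the attribute is absent
   vapply(m, function(i) if (length(i) >= 3) i[3] else NA_character_,
      character(1))
}

# Keep transcript rows by matching column 3, then collect the attributes
tx_rows <- gtf[V3 %in% c("transcript", "mRNA")]
attr_names <- c("gene_id", "gene_name", "transcript_id", "transcript_type")
tx2gene <- as.data.frame(
   lapply(setNames(attr_names, attr_names),
      function(a) get_attr(tx_rows$V9, a)),
   stringsAsFactors = FALSE)
print(tx2gene)
```

Because only column 3 and column 9 are interpreted, a sketch like this avoids building full genomic range objects, which is the kind of saving the paragraph above describes.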
This function is intended to help keep all transcript data consistent by using the same GTF file that is also used by other analysis tools, whether those tools are based in R or, more likely, outside R in a terminal environment.
For example, the GTF file could be used:
to run STAR sequence alignment then Rsubread::featureCounts() to generate a matrix of read counts per gene, transcript, or exon; or
to generate a transcript FASTA sequence file then run a kmer quantitation tool such as Salmon or Kallisto, then using tximport::tximport() to import results into R for downstream processing.
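The second workflow can be sketched as follows. The tx2gene table here is hypothetical example data using the default attribute column names, and the Salmon file paths are placeholders. tximport::tximport() expects its tx2gene argument to have the transcript ID in the first column and the gene ID in the second, so the columns are reordered before the call, which is guarded so it runs only when tximport and the files are available.

```r
# Hypothetical tx2gene table with the default attribute columns
tx2gene <- data.frame(
   gene_id = c("G1", "G1", "G2"),
   gene_name = c("ABC", "ABC", "XYZ"),
   gene_type = "protein_coding",
   transcript_id = c("T1", "T2", "T3"),
   transcript_type = c("protein_coding", "retained_intron", "protein_coding"),
   stringsAsFactors = FALSE)

# tximport expects transcript ID in column 1 and gene ID in column 2
tx2gene_min <- unique(tx2gene[, c("transcript_id", "gene_id")])

# Hypothetical Salmon output paths; replace with real quant.sf files
salmon_files <- c(
   sampleA = "salmon/sampleA/quant.sf",
   sampleB = "salmon/sampleB/quant.sf")

# Summarize transcript-level estimates to gene level when inputs exist
if (requireNamespace("tximport", quietly = TRUE) &&
   all(file.exists(salmon_files))) {
   txi <- tximport::tximport(salmon_files,
      type = "salmon",
      tx2gene = tx2gene_min)
}
```

Using the same GTF file for the transcript FASTA and for the tx2gene table helps ensure the transcript identifiers in the quantitation output match those in the table.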
Other jam RNA-seq functions: assignGRLexonNames(), closestExonToJunctions(), combineGRcoverage(), defineDetectedTx(), detectedTxInfo(), exoncov2polygon(), flattenExonsBy(), getGRcoverageFromBw(), groups2contrasts(), internal_junc_score(), make_ref2compressed(), prepareSashimi(), runDiffSplice(), sortSamples(), spliceGR2junctionDF()