R/platjam-import-salmonquant.R
import_salmon_quant.Rd
Import Salmon quant.sf files to SummarizedExperiment
import_salmon_quant(
  salmonOut_paths,
  import_types = c("tx", "gene", "gene_body", "gene_tx"),
  gtf = NULL,
  tx2gene = NULL,
  curation_txt = NULL,
  tx_colname = "transcript_id",
  gene_colname = "gene_name",
  gene_body_colname = "transcript_type",
  geneFeatureType = "exon",
  txFeatureType = "exon",
  countsFromAbundance = "lengthScaledTPM",
  gene_body_ids = NULL,
  trim_tx_from = c("[(][-+][)]"),
  trim_tx_to = c(""),
  verbose = FALSE,
  ...
)
salmonOut_paths: character vector of paths, one for each individual folder that contains the "quant.sf" output file from Salmon.
import_types: character vector indicating which type or types of data to return. Note that the distinction between gene and gene_body is only relevant when there are transcript entries defined with transcript_type="gene_body". These entries specifically represent unspliced transcribed regions for a gene locus, and only for multi-exon genes.
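tx: transcript quantitation, direct import of quant.sf files.
gene: gene quantitation after calling tximport::summarizeToGene(), excluding transcript_type="gene_body".
gene_body: gene quantitation after calling tximport::summarizeToGene(), including transcript_type="gene_body".
gene_tx: gene quantitation where proper transcripts are combined to gene level, and "gene_body" entries are represented separately, with suffix "_gene_body" added to the gene name.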
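The difference between gene and gene_body can be sketched with base R on toy data. This is only an illustration: it uses rowsum() as a simplified stand-in for tximport::summarizeToGene(), and every identifier and count below is hypothetical.

```r
# Toy annotation: GeneA has two spliced transcripts plus one "gene_body" entry.
tx2gene <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3_body", "tx4"),
  gene_name = c("GeneA", "GeneA", "GeneA", "GeneB"),
  transcript_type = c("protein_coding", "protein_coding",
    "gene_body", "protein_coding"),
  stringsAsFactors = FALSE
)

# Toy transcript-level counts, filled column by column (one column per sample).
tx_counts <- matrix(
  c(10, 5, 3, 7,
    20, 0, 6, 1),
  ncol = 2,
  dimnames = list(tx2gene$transcript_id, c("sampleA", "sampleB"))
)

is_body <- tx2gene$transcript_type %in% "gene_body"

# "gene": sum only the proper transcripts for each gene.
gene_counts <- rowsum(tx_counts[!is_body, , drop = FALSE],
  group = tx2gene$gene_name[!is_body])

# "gene_body": sum every entry for each gene, gene body included.
gene_body_counts <- rowsum(tx_counts, group = tx2gene$gene_name)
```

In this toy case only GeneA differs between the two summaries, because GeneB has no "gene_body" entry.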
gtf: character path to a GTF file, used only when tx2gene is not supplied. When used, splicejam::makeTx2geneFromGtf() is called to create a data.frame object tx2gene.
tx2gene: character path to a file, or data.frame, with at least two columns matching tx_colname and gene_colname below. When supplied, the gtf argument is ignored, unless the file path is not accessible or the data is not a data.frame.
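A minimal sketch of a tx2gene data.frame with the default column names, followed by the kind of column check a caller might run before import. The transcript and gene identifiers are hypothetical, and the check is an illustration rather than the function's own validation.

```r
# Hypothetical transcript-to-gene table using the default column names.
tx2gene <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  gene_name = c("GeneA", "GeneA", "GeneB"),
  transcript_type = c("protein_coding", "gene_body", "protein_coding"),
  stringsAsFactors = FALSE
)

tx_colname <- "transcript_id"
gene_colname <- "gene_name"

# Confirm that both required columns are present.
missing_cols <- setdiff(c(tx_colname, gene_colname), colnames(tx2gene))
if (length(missing_cols) > 0) {
  stop("tx2gene is missing column(s): ",
    paste(missing_cols, collapse = ", "))
}

# Transcript identifiers are expected to be unique.
has_duplicates <- anyDuplicated(tx2gene[[tx_colname]]) > 0
```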
curation_txt: data.frame whose first column should match the sample column headers of the imported data, and whose subsequent columns contain associated sample annotations. If curation_txt is not supplied, then the sample names will be split into columns by "_" underscore or " " whitespace characters.
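The fallback splitting can be sketched in base R as follows. The sample names are hypothetical, and the resulting column names (X1, X2, ...) come from data.frame() in this sketch; the names produced by the function itself may differ.

```r
# Hypothetical sample names, delimited by underscore or whitespace.
sample_names <- c("WT_ctrl_rep1", "WT_ctrl_rep2", "KO treated_rep1")

# Split each name at runs of "_" or " ".
parts <- strsplit(sample_names, "[_ ]+")

# Pad shorter entries with NA so every sample has the same number of fields.
n_fields <- max(lengths(parts))
padded <- lapply(parts, function(p) {
  c(p, rep(NA_character_, n_fields - length(p)))
})

# One row per sample, one column per field.
sample_df <- data.frame(
  sample = sample_names,
  do.call(rbind, padded),
  stringsAsFactors = FALSE
)
```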
tx_colname, gene_colname: character strings indicating the colnames in tx2gene that should be used. tx_colname represents the unique identifier for each transcript, usually "transcript_id". gene_colname represents the gene label associated with gene-summarized expression values, typically "gene_name".
geneFeatureType, txFeatureType: character arguments passed to splicejam::makeTx2geneFromGtf() only when supplying argument gtf with a path to a GTF file.
countsFromAbundance: character string passed to tximport::summarizeToGene() to define the method for calculating abundance.
gene_body_ids: optional character vector with specific row identifiers that should be considered transcript_type="gene_body" entries, relevant to argument import_types above. When gene_body_ids is defined, these entries are used directly without using tx2gene. When gene_body_ids is not defined, tx2gene$transcript_type is used if present. If that column is not present, or does not contain any entries with "gene_body", then all transcripts are used for import_types="gene", and import_types="gene_body" is not valid and therefore is not returned.
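The precedence described for gene_body_ids can be sketched as a small helper. The function name resolve_gene_body_ids() is hypothetical and written only for illustration; it is not part of the package.

```r
# Hypothetical helper: decide which transcript rows count as "gene_body".
resolve_gene_body_ids <- function(tx2gene,
  gene_body_ids = NULL,
  tx_colname = "transcript_id",
  gene_body_colname = "transcript_type") {
  # 1. Explicit identifiers take priority over the annotation table.
  if (length(gene_body_ids) > 0) {
    return(as.character(gene_body_ids))
  }
  # 2. Otherwise use the annotation column, when it is present.
  if (gene_body_colname %in% colnames(tx2gene)) {
    is_body <- tx2gene[[gene_body_colname]] %in% "gene_body"
    return(as.character(tx2gene[[tx_colname]][is_body]))
  }
  # 3. No information available: no gene_body entries.
  character(0)
}

tx2gene <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  gene_name = c("GeneA", "GeneA", "GeneB"),
  transcript_type = c("protein_coding", "gene_body", "protein_coding"),
  stringsAsFactors = FALSE
)

from_column <- resolve_gene_body_ids(tx2gene)
from_argument <- resolve_gene_body_ids(tx2gene, gene_body_ids = "tx3")
no_column <- resolve_gene_body_ids(tx2gene[, c("transcript_id", "gene_name")])
```

When the result is empty, the situation corresponds to the case above where import_types="gene_body" is not valid.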
verbose: logical indicating whether to print verbose output.
...: additional arguments are passed to supporting functions.
trim_tx_from, trim_tx_to: character vectors of regular expression patterns and their replacements, used optionally to curate the values in tx_colname prior to joining those values to tx2gene[[tx_colname]]. The default is to remove "(-)" and "(+)" from the transcript_id (tx_colname) column.
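The default pattern can be tried directly with gsub(). The transcript identifiers below are hypothetical, and the loop is a sketch of applying each pattern in turn, not necessarily how the function applies them internally.

```r
# Default values from the function signature.
trim_tx_from <- c("[(][-+][)]")
trim_tx_to <- c("")

# Hypothetical transcript identifiers, some carrying a strand suffix.
tx_ids <- c("ENST0001(+)", "ENST0002(-)", "ENST0003")

# Recycle replacements so each pattern has one, then apply them in order.
trim_tx_to <- rep_len(trim_tx_to, length(trim_tx_from))
trimmed <- tx_ids
for (i in seq_along(trim_tx_from)) {
  trimmed <- gsub(trim_tx_from[i], trim_tx_to[i], trimmed)
}
```

Identifiers without a strand suffix pass through unchanged, so the trimmed values can be matched against tx2gene[[tx_colname]].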
list with SummarizedExperiment objects, each of which contains assay names c("counts", "abundance", "length"), where c("counts", "abundance") are transformed with log2(1 + x). The transform can be reversed with 2^x - 1.
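Because the transform is log2(1 + x), its inverse is 2^x - 1. A short numeric check with hypothetical count values:

```r
# Hypothetical raw counts.
counts <- c(0, 1, 7, 1023)

# Forward transform, as applied to the "counts" and "abundance" assays.
log_counts <- log2(1 + counts)

# Inverse transform recovers the original scale.
restored <- 2^log_counts - 1
```

A zero count stays at zero after the forward transform, which is the reason for adding 1 before taking the logarithm.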
The SummarizedExperiment objects by name:
"TxSE": transcript-level values imported from quant.sf.
"GeneSE": gene-level summary values, excluding "gene_body" entries.
"GeneBodySE": gene-level summary values, including "gene_body" entries.
"GeneTxSE": gene-level summary values, where transcripts are combined to gene level, and "gene_body" entries are represented separately, with suffix "_gene_body" added to the gene name.
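The row labels used for "GeneTxSE" can be sketched by appending the suffix to the gene name for "gene_body" entries before summarizing. The data are hypothetical, and rowsum() is again a simplified stand-in for the actual gene-level summary.

```r
# Hypothetical annotation and counts for one sample.
tx2gene <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3_body"),
  gene_name = c("GeneA", "GeneA", "GeneA"),
  transcript_type = c("protein_coding", "protein_coding", "gene_body"),
  stringsAsFactors = FALSE
)
tx_counts <- matrix(c(10, 5, 3), ncol = 1,
  dimnames = list(tx2gene$transcript_id, "sampleA"))

# "gene_body" entries receive their own label with the "_gene_body" suffix.
is_body <- tx2gene$transcript_type %in% "gene_body"
gene_tx_label <- ifelse(is_body,
  paste0(tx2gene$gene_name, "_gene_body"),
  tx2gene$gene_name)

# Spliced transcripts collapse to the gene; the gene body stays separate.
gene_tx_counts <- rowsum(tx_counts, group = gene_tx_label)
```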
This function is intended to automate the process of importing a series of quant.sf files, then generating SummarizedExperiment objects at the transcript and gene level. It optionally includes sample annotation provided as a data.frame in argument curation_txt. It also includes transcript and gene annotations through either a data.frame from argument tx2gene, or a tx2gene derived from a GTF file given in argument gtf. The GTF file option calls splicejam::makeTx2geneFromGtf().
This function can optionally process data that includes full length gene body regions, annotated with "gene_body". This option is specific to Salmon quantitation where the transcripts include the full length gene body for multi-exon genes, for example to measure unspliced transcript abundance.
import_types="gene" summarizes only the proper transcripts, excluding "gene_body" entries.
import_types="gene_body" summarizes all transcript and full gene body entries into one summary abundance per gene.
import_types="gene_tx" summarizes proper transcripts to gene level, and separately represents "gene_body" entries for comparison.
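A hedged usage sketch follows. The folder and GTF paths are hypothetical placeholders, and the call assumes the function is available from an installed platjam package; the block only calls the function when the package and files are actually present.

```r
# Hypothetical Salmon output folders and GTF file; replace with real paths.
salmon_paths <- c("salmon/sampleA_rep1", "salmon/sampleB_rep1")
gtf_file <- "annotation/genes.gtf"

# Only attempt the import when the package and all input files exist.
can_run <- requireNamespace("platjam", quietly = TRUE) &&
  all(file.exists(file.path(salmon_paths, "quant.sf"))) &&
  file.exists(gtf_file)

if (can_run) {
  se_list <- platjam::import_salmon_quant(
    salmonOut_paths = salmon_paths,
    import_types = c("tx", "gene"),
    gtf = gtf_file,
    verbose = TRUE
  )
  # Names of the returned SummarizedExperiment objects.
  print(names(se_list))
  # Counts are stored as log2(1 + x); convert back to the count scale.
  tx_counts <- 2^SummarizedExperiment::assay(se_list$TxSE, "counts") - 1
} else {
  message("Skipping import: platjam or the input files are not available.")
}
```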
Other jam import functions: coverage_matrix2nmat(), deepTools_matrix2nmat(), frequency_matrix2nmat(), import_lipotype_csv(), import_metabolomics_niehs(), import_nanostring_csv(), import_nanostring_rcc(), import_nanostring_rlf(), import_proteomics_PD(), import_proteomics_mascot(), process_metab_compounds_file()