R/platjam-import-salmonquant.R
import_salmon_quant.Rd
Import Salmon quant.sf files to SummarizedExperiment
import_salmon_quant(
  salmonOut_paths,
  import_types = c("tx", "gene", "gene_body", "gene_tx"),
  gtf = NULL,
  tx2gene = NULL,
  curation_txt = NULL,
  tx_colname = "transcript_id",
  gene_colname = "gene_name",
  gene_body_colname = "transcript_type",
  geneFeatureType = "exon",
  txFeatureType = "exon",
  countsFromAbundance = "lengthScaledTPM",
  gene_body_ids = NULL,
  trim_tx_from = c("[(][-+][)]"),
  trim_tx_to = c(""),
  verbose = FALSE,
  ...
)
salmonOut_paths
character vector of paths, one per folder, where each folder
contains the "quant.sf" output file from Salmon.
import_types
character vector indicating which type or types of
data to return. Note that the distinction between gene and
gene_body is only relevant when there are transcript entries
defined with transcript_type="gene_body". These entries specifically
represent unspliced transcribed regions for a gene locus, and
only for multi-exon genes.
tx: transcript quantitation, direct import of quant.sf files.
gene: gene quantitation after calling tximport::summarizeToGene(),
excluding transcript_type="gene_body".
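gene_body: gene quantitation after calling tximport::summarizeToGene(),
including transcript_type="gene_body".
gene_tx: gene quantitation in which "gene_body" entries are kept
separate from the proper transcripts of each gene.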
gtf
character path to a GTF file, used only when tx2gene
is not supplied. When used, splicejam::makeTx2geneFromGtf() is
called to create a data.frame object tx2gene.
tx2gene
character path to a file, or data.frame with at
least two columns matching tx_colname and gene_colname below.
When supplied, the gtf argument is ignored, unless the file
path is not accessible, or the data is not a data.frame.
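A minimal sketch of a tx2gene data.frame with the two required columns, plus the optional transcript_type column used to flag "gene_body" entries. The transcript and gene identifiers are hypothetical and only illustrate the expected shape.

```r
# Hypothetical identifiers, for illustration only.
tx2gene <- data.frame(
  transcript_id = c("txA1", "txA2", "geneA_unspliced", "txB1"),
  gene_name = c("geneA", "geneA", "geneA", "geneB"),
  transcript_type = c("protein_coding", "protein_coding",
    "gene_body", "protein_coding"),
  stringsAsFactors = FALSE
)

# The columns named by tx_colname and gene_colname must both be present.
tx_colname <- "transcript_id"
gene_colname <- "gene_name"
has_required <- all(c(tx_colname, gene_colname) %in% colnames(tx2gene))
```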
curation_txt
data.frame whose first column should match the
sample column headers found in the PD abundance columns, and
subsequent columns contain associated sample annotations.
If curation_txt is not supplied, then values will be split into
columns by _ underscore or " " whitespace characters.
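The fallback splitting described above might look roughly like the following in base R. The sample names and the "[_ ]+" pattern are assumptions for illustration; the function's internal implementation is not shown in this document.

```r
# Hypothetical sample names, as might be derived from Salmon output folders.
sample_names <- c("WT_ctrl_rep1", "WT_ctrl_rep2", "KO treated rep1")

# Split on runs of underscore or space characters.
parts <- strsplit(sample_names, "[_ ]+")

# Bind the pieces into columns; this assumes every name has the same
# number of fields.
sample_df <- data.frame(
  sample = sample_names,
  do.call(rbind, parts),
  stringsAsFactors = FALSE
)
```

Supplying curation_txt avoids relying on a consistent naming scheme across samples.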
tx_colname, gene_colname
character strings indicating colnames
in tx2gene that should be used.
tx_colname represents the unique identifier for each transcript,
usually "transcript_id".
gene_colname represents the gene label associated with
gene-summarized expression values, typically "gene_name".
geneFeatureType, txFeatureType
character arguments passed to
splicejam::makeTx2geneFromGtf() only when supplying argument
gtf with a path to a GTF file.
countsFromAbundance
character string passed to
tximport::summarizeToGene() to define the method for calculating
abundance.
gene_body_ids
optional character vector with specific row
identifiers that should be considered transcript_type="gene_body"
entries, relevant to argument import_types above. When gene_body_ids
is defined, these entries are used directly without using tx2gene.
When gene_body_ids is not defined, tx2gene$transcript_type is used
if present. If that column is not present, or does not contain any
entries with "gene_body", then all transcripts are used for
import_types="gene", and import_types="gene_body" is not valid
and therefore is not returned.
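The precedence rules for identifying "gene_body" entries can be sketched as a small helper. The function flag_gene_body() is hypothetical and written only to illustrate the order of checks described above; it is not part of the package.

```r
# Hypothetical helper: returns TRUE for rows treated as "gene_body".
flag_gene_body <- function(tx_ids, tx2gene, gene_body_ids = NULL) {
  if (length(gene_body_ids) > 0) {
    # Explicit identifiers take precedence over tx2gene.
    return(tx_ids %in% gene_body_ids)
  }
  if ("transcript_type" %in% colnames(tx2gene)) {
    tx_match <- match(tx_ids, tx2gene$transcript_id)
    return(tx2gene$transcript_type[tx_match] %in% "gene_body")
  }
  # No information available: nothing is flagged.
  rep(FALSE, length(tx_ids))
}

# Hypothetical annotation table.
tx2gene <- data.frame(
  transcript_id = c("txA1", "geneA_unspliced", "txB1"),
  gene_name = c("geneA", "geneA", "geneB"),
  transcript_type = c("protein_coding", "gene_body", "protein_coding"),
  stringsAsFactors = FALSE
)
tx_ids <- tx2gene$transcript_id

from_tx2gene <- flag_gene_body(tx_ids, tx2gene)
from_ids <- flag_gene_body(tx_ids, tx2gene, gene_body_ids = "txB1")
without_type <- flag_gene_body(tx_ids, tx2gene[, 1:2])

# When nothing is flagged, only import_types="gene" applies.
gene_body_available <- any(without_type)
```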
verbose
logical indicating whether to print verbose output.
...
additional arguments are passed to supporting functions.
trim_tx_from, trim_tx_to
character vectors of regular expression
patterns to be used optionally to curate the values in tx_colname prior
to joining those values to tx2gene[[tx_colname]].
The default is to remove "(-)" and "(+)" from the transcript_id
(tx_colname) column.
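As an illustration of the default pattern, the following base R sketch removes the strand suffix from a few hypothetical transcript identifiers. Applying the patterns in a loop with gsub() is an assumption about how such curation could be done, not a statement of the function's internals.

```r
trim_tx_from <- c("[(][-+][)]")
trim_tx_to <- c("")

# Hypothetical transcript identifiers with strand suffixes.
tx_ids <- c("txA1(+)", "txA2(-)", "txB1")

# Apply each pattern in order, recycling trim_tx_to as needed.
trim_tx_to <- rep_len(trim_tx_to, length(trim_tx_from))
trimmed <- tx_ids
for (i in seq_along(trim_tx_from)) {
  trimmed <- gsub(trim_tx_from[i], trim_tx_to[i], trimmed)
}
```

After trimming, the identifiers can be matched against tx2gene[[tx_colname]] without the strand suffix causing mismatches.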
list with SummarizedExperiment objects, each of which
contains assay names c("counts", "abundance", "length"), where
c("counts", "abundance") are transformed with log2(1 + x).
The transform can be reversed with 2^x - 1.
The SummarizedExperiment objects by name:
"TxSE": transcript-level values imported from quant.sf.
"GeneSE": gene-level summary values, excluding
"gene_body" entries.
"GeneBodySE": gene-level summary values, including
"gene_body" entries.
"GeneTxSE": gene-level summary values, where transcripts are
combined to gene level, and "gene_body" entries are represented
separately, with suffix "_gene_body" added to the gene name.
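Because the "counts" and "abundance" assays are stored as log2(1 + x), the inverse is 2^x - 1. A short base R check with made-up values:

```r
# Made-up count values for illustration.
counts <- c(0, 1, 7, 1023)

# Forward transform, as applied to the "counts" and "abundance" assays.
log_counts <- log2(1 + counts)   # 0, 1, 3, 10

# Inverse transform recovers the original scale.
restored <- 2^log_counts - 1
```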
This function is intended to automate the process of importing
a series of quant.sf files, then generating SummarizedExperiment
objects at the transcript and gene level. It optionally includes
sample annotation provided as a data.frame in argument curation_txt.
It also includes transcript and gene annotations through either
data.frame from argument tx2gene, or it derives tx2gene
from a GTF file from argument gtf. The GTF file option then calls
splicejam::makeTx2geneFromGtf().
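A hedged usage sketch follows. The folder and GTF paths are hypothetical, and the package name platjam is assumed from the source file name above. The call is guarded so it runs only when the inputs and the package are actually available.

```r
# Hypothetical paths; replace with real Salmon output folders and GTF file.
salmonOut_paths <- file.path("salmon_output", c("sampleA", "sampleB"))
gtf_file <- "annotation.gtf"

inputs_ready <- all(file.exists(file.path(salmonOut_paths, "quant.sf"))) &&
  file.exists(gtf_file)

# Only run when the package (assumed here to be platjam) and inputs exist.
if (inputs_ready && requireNamespace("platjam", quietly = TRUE)) {
  se_list <- platjam::import_salmon_quant(
    salmonOut_paths = salmonOut_paths,
    import_types = c("tx", "gene"),
    gtf = gtf_file
  )
  # Inspect which SummarizedExperiment objects were returned.
  print(names(se_list))
}
```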
This function can optionally process data that includes full length
gene body regions, annotated with "gene_body". This option is specific
for Salmon quantitation where the transcripts include full length
gene body for multi-exon genes, for example to measure unspliced
transcript abundance.
import_types="gene" summarizes only the proper transcripts,
excluding "gene_body" entries.
import_types="gene_body" summarizes all transcript
and full-length gene body entries into one gene-level summary abundance.
import_types="gene_tx" summarizes proper transcripts to gene level,
and separately represents "gene_body" entries for comparison.
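The difference between the three gene-level options can be sketched with a toy matrix and base R rowsum(). This is a simplified stand-in: the function itself uses tximport::summarizeToGene(), which also accounts for transcript length and countsFromAbundance. All identifiers and values below are made up.

```r
# Toy transcript-level values: 4 transcripts by 2 samples.
tx_counts <- matrix(
  c(10, 5, 3, 8,
    20, 10, 6, 4),
  ncol = 2,
  dimnames = list(
    c("txA1", "txA2", "geneA_unspliced", "txB1"),
    c("s1", "s2")
  )
)
tx_gene <- c("geneA", "geneA", "geneA", "geneB")
tx_type <- c("protein_coding", "protein_coding",
  "gene_body", "protein_coding")
is_body <- tx_type %in% "gene_body"

# "gene": proper transcripts only.
gene_sum <- rowsum(tx_counts[!is_body, , drop = FALSE],
  group = tx_gene[!is_body])

# "gene_body": proper transcripts and gene body entries together.
gene_body_sum <- rowsum(tx_counts, group = tx_gene)

# "gene_tx": gene body entries kept as separate rows with a suffix.
gene_tx_group <- ifelse(is_body, paste0(tx_gene, "_gene_body"), tx_gene)
gene_tx_sum <- rowsum(tx_counts, group = gene_tx_group)
```

Comparing the geneA and geneA_gene_body rows of gene_tx_sum gives a rough view of spliced versus unspliced signal for the same locus.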
Other jam import functions: 
coverage_matrix2nmat(),
deepTools_matrix2nmat(),
frequency_matrix2nmat(),
import_lipotype_csv(),
import_metabolomics_niehs(),
import_nanostring_csv(),
import_nanostring_rcc(),
import_nanostring_rlf(),
import_proteomics_PD(),
import_proteomics_mascot(),
process_metab_compounds_file()