Get Salmon metadata and aux info into a data.frame
get_salmon_meta(metafile, exclude_hashes = TRUE, ...)
metafile: character vector of one or more files, usually the full file path to the meta_info.json file produced by Salmon quant. Each path in metafile may point to any output file from one Salmon quant analysis.
exclude_hashes: logical indicating whether to drop columns that contain file hashes.
...: additional arguments are ignored.
data.frame whose number of rows is equal to the number of unique Salmon root directories in the input metafile.
For any input metafile not found, the output is NULL.
This function takes a file path to one or more Salmon output files, uses that path to locate the full set of available files, loads data from each discovered file, and returns the results as a data.frame.
This function uses rprojroot::find_root() to find the root directory, defined as the directory that contains the file "cmd_info.json". The path to "meta_info.json" is constructed relative to that location.
Recognized files:
meta_info.json - typically in a subdirectory, aux_info/meta_info.json
cmd_info.json - typically in the same directory as the aux_info directory
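The root lookup described above can be sketched as follows. This is a minimal illustration and not the function's actual source: it assumes the rprojroot package is installed, builds a mock Salmon-like directory under tempdir(), and uses a hypothetical helper named find_salmon_root_sketch().

```r
# Hypothetical helper (not part of platjam): walk upward from a file path
# until a directory containing "cmd_info.json" is found.
find_salmon_root_sketch <- function(path) {
  rprojroot::find_root(
    rprojroot::has_file("cmd_info.json"),
    path = dirname(path))
}

# Build a mock Salmon output directory, for demonstration only.
mock_root <- file.path(tempdir(), "sample_salmonOut")
dir.create(file.path(mock_root, "aux_info"),
  recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(mock_root, "cmd_info.json")))
invisible(file.create(file.path(mock_root, "aux_info", "meta_info.json")))

# Start from any output file; here, the meta_info.json file itself.
salmon_root <- find_salmon_root_sketch(
  file.path(mock_root, "aux_info", "meta_info.json"))

# The path to meta_info.json is then built relative to the root.
meta_path <- file.path(salmon_root, "aux_info", "meta_info.json")
file.exists(meta_path)
```

Note that rprojroot::find_root() raises an error when no matching directory exists, so a caller that wants NULL instead would need to catch that error.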
If a relative path to "cmd_info.json" cannot be determined, this function returns NULL.
When the input metafile includes multiple files, only the unique Salmon root directories are returned.
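The handling of multiple inputs and of paths with no discoverable root might look roughly like the sketch below. It is an assumption-laden illustration, not the package's implementation: salmon_root_or_null() is a hypothetical helper, the directory layout is mocked under tempdir(), and the rprojroot package is assumed to be installed.

```r
# Hypothetical helper: return the Salmon root for one path, or NULL when
# no enclosing directory contains "cmd_info.json".
salmon_root_or_null <- function(path) {
  tryCatch(
    rprojroot::find_root(
      rprojroot::has_file("cmd_info.json"),
      path = dirname(path)),
    error = function(e) NULL)
}

# Mock Salmon output directory with three files, for demonstration only.
mock_root <- file.path(tempdir(), "dedup_salmonOut")
dir.create(file.path(mock_root, "aux_info"),
  recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(mock_root, c("cmd_info.json", "quant.sf"))))
invisible(file.create(file.path(mock_root, "aux_info", "meta_info.json")))

# Two files from the same analysis, plus one path that does not exist.
metafiles <- c(
  file.path(mock_root, "aux_info", "meta_info.json"),
  file.path(mock_root, "quant.sf"),
  file.path(tempdir(), "no_such_dir", "meta_info.json"))

roots <- lapply(metafiles, salmon_root_or_null)

# unlist() drops the NULL entry; unique() collapses the duplicated root.
unique_roots <- unique(unlist(roots))
length(unique_roots)
```

Here the first two inputs resolve to the same root, so only one directory remains after deduplication, and the missing path contributes nothing.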
This function uses jsonlite to read each JSON file, which is converted to a data.frame. Any JSON fields that contain multiple values are comma-delimited using jamba::cPaste() in order to fit on one row in the data.frame.
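As a rough sketch of the flattening step, the example below parses a small JSON string and collapses multi-value fields into comma-delimited strings. It assumes the jsonlite package is installed, uses base paste() as a stand-in for jamba::cPaste(), and defines a hypothetical helper json_to_row_sketch(). The JSON text is an abbreviated stand-in for meta_info.json that reuses a few field names and values from the example output on this page.

```r
# Hypothetical helper: flatten one parsed JSON object into a one-row
# data.frame. Base paste() stands in for jamba::cPaste() here.
json_to_row_sketch <- function(json_text) {
  fields <- jsonlite::fromJSON(json_text)
  flat <- lapply(fields, function(v) {
    v <- unlist(v)
    # Single values are kept as-is; anything else becomes one string.
    if (length(v) == 1) v else paste(v, collapse = ",")
  })
  as.data.frame(flat, stringsAsFactors = FALSE)
}

# Abbreviated stand-in for the contents of a meta_info.json file.
json_text <- '{
  "salmon_version": "0.11.2",
  "num_libraries": 1,
  "length_classes": [509, 656, 1031, 2287, 103053],
  "num_bootstraps": 0
}'

meta_row <- json_to_row_sketch(json_text)
meta_row$length_classes
```

The length_classes field holds five values in the JSON, and after collapsing it occupies a single cell, in the same comma-delimited form shown in the example output.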
Other jam nextgen sequence functions: get_salmon_root(), parse_salmon_flenfile(), save_salmon_qc_xlsx()
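# Locate the example cmd_info.json file bundled with the platjam package.
cmdinfopath <- system.file("data", "salmonOut", "cmd_info.json", package="platjam")
# system.file() returns "" when the file is not found, so check before calling.
if (nchar(cmdinfopath) > 0) {
  get_salmon_meta(cmdinfopath)
}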
#> salmon_version samp_type opt_type num_libraries library_types
#> 1 0.11.2 none vb 1 ISR
#> frag_dist_length seq_bias_correct gc_bias_correct num_bias_bins mapping_type
#> 1 1001 TRUE TRUE 4096 mapping
#> num_targets serialized_eq_classes length_classes num_bootstraps
#> 1 205259 FALSE 509,656,1031,2287,103053 0
#> num_processed num_mapped percent_mapped call start_time
#> 1 49484736 28143781 56.87366 quant Mon Aug 6 16:21:05 2018
#> end_time
#> 1 Mon Aug 6 16:23:48 2018
#> index
#> 1 /ddn/gs1/shared/fargod/reference_genomes/hg19/hg19gencode/gencode.v28lift37.annotation/gencode.v28lift37.transcripts_quasi-k31.idx
#> threads libType useVBOpt seqBias gcBias
#> 1 60 A useVBOpt seqBias gcBias
#> mates1
#> 1 SW13_none_A-NS50728_1.sickle.sanger.cutadapt.fastq.gz,SW13_none_A-NS50729_1.sickle.sanger.cutadapt.fastq.gz
#> mates2
#> 1 SW13_none_A-NS50728_2.sickle.sanger.cutadapt.fastq.gz,SW13_none_A-NS50729_2.sickle.sanger.cutadapt.fastq.gz
#> output auxDir
#> 1 SW13_none_A-lA-vbo-seq-gc_salmonOut aux_info
#> read_files
#> 1 ( SW13_none_A-NS50728_1.sickle.sanger.cutadapt.fastq.gz, SW13_none_A-NS50728_2.sickle.sanger.cutadapt.fastq.gz ), ( SW13_none_A-NS50729_1.sickle.sanger.cutadapt.fastq.gz, SW13_none_A-NS50729_2.sickle.sanger.cutadapt.fastq.gz )
#> expected_format compatible_fragment_ratio num_compatible_fragments
#> 1 ISR 1 28143781
#> num_assigned_fragments num_frags_with_consistent_mappings
#> 1 28143781 24661298
#> num_frags_with_inconsistent_or_orphan_mappings MSF OSF ISF MSR OSR ISR
#> 1 3482716 0 0 889 0 0 24661298
#> SF SR MU OU IU U
#> 1 1707293 1774534 0 0 0 0