Get Salmon metadata and aux info into a data.frame

get_salmon_meta(metafile, exclude_hashes = TRUE, ...)

Arguments

metafile

character vector of one or more file paths, usually the full path to the meta_info.json file produced by Salmon quant. Each path may point to any output file from a Salmon quant analysis, since the path is used only to locate the full set of available files.

exclude_hashes

logical indicating whether to drop columns that contain file hashes.

...

additional arguments are ignored.

Value

data.frame with one row for each unique Salmon root directory in the input metafile. When an input metafile cannot be found, the output is NULL.

Details

This function takes a file path to one or more Salmon output files, uses that path to locate the full set of available files, loads data from each of the discovered files, and returns the results in a data.frame format.

This function uses rprojroot::find_root() to find the root directory, defined as the directory that contains the file "cmd_info.json". The path to "meta_info.json" is constructed relative to that location.
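
The root lookup described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name find_salmon_root_sketch() and the temporary directory layout are hypothetical, and returning NULL on failure is an assumption based on the behavior documented on this page.

```r
# Hypothetical helper, not part of platjam: walk upward from `path` until a
# directory that contains cmd_info.json is found, and return that directory.
find_salmon_root_sketch <- function(path) {
   # start from a directory, whether a file or a directory was supplied
   start_dir <- if (dir.exists(path)) path else dirname(path)
   tryCatch(
      rprojroot::find_root(
         rprojroot::has_file("cmd_info.json"),
         path = start_dir),
      # find_root() throws an error when no root is found; return NULL instead
      error = function(e) NULL)
}

# Build a throwaway Salmon-like layout to try it on (illustrative only)
salmon_dir <- file.path(tempdir(), "sample_salmonOut")
dir.create(file.path(salmon_dir, "aux_info"),
   recursive = TRUE, showWarnings = FALSE)
writeLines("{}", file.path(salmon_dir, "cmd_info.json"))
writeLines("{}", file.path(salmon_dir, "aux_info", "meta_info.json"))

# Any file inside the Salmon output directory resolves to the same root
root <- find_salmon_root_sketch(
   file.path(salmon_dir, "aux_info", "meta_info.json"))
# The path to meta_info.json is then built relative to the root
meta_path <- file.path(root, "aux_info", "meta_info.json")
```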

Recognized files:

  • meta_info.json - typically located at aux_info/meta_info.json, relative to the Salmon root directory.

  • cmd_info.json - typically in the same directory as the aux_info directory.

If a relative path to "cmd_info.json" cannot be determined, this function returns NULL.

When the input metafile includes multiple files, one row is returned for each unique Salmon root directory, so several files from the same Salmon run produce only one row.
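
As a rough illustration of that de-duplication, the base R sketch below reduces several input paths to their unique root directories. The file paths are hypothetical, and the root lookup is deliberately simplified to path manipulation rather than the rprojroot search the function actually uses.

```r
# Hypothetical input: two files from one Salmon run, one file from another
metafile <- c(
   "runA_salmonOut/aux_info/meta_info.json",
   "runA_salmonOut/cmd_info.json",
   "runB_salmonOut/cmd_info.json")

# Simplified stand-in for root discovery: files inside aux_info sit one
# level below the root, all other files sit directly in the root
in_aux <- basename(dirname(metafile)) == "aux_info"
roots <- ifelse(in_aux, dirname(dirname(metafile)), dirname(metafile))

# Three input files collapse to two unique roots, hence two output rows
unique_roots <- unique(roots)
```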

This function uses jsonlite to read each JSON file and converts the result to a data.frame. Any JSON field that contains multiple values is collapsed to a single comma-delimited string using jamba::cPaste() so that it fits on one row of the data.frame.
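
The JSON-to-row conversion can be approximated as shown below. This is a sketch under stated assumptions: the helper name json_to_row_sketch() and the sample JSON are hypothetical, and base paste() stands in for jamba::cPaste(), so the package's actual handling of edge cases may differ.

```r
# Hypothetical helper, not part of platjam: read one JSON file and return
# its fields as a one-row data.frame.
json_to_row_sketch <- function(json_file) {
   json_list <- jsonlite::fromJSON(json_file)
   flat <- lapply(json_list, function(x) {
      x <- unlist(x)
      if (length(x) == 0) {
         # empty or null fields become NA so the column is still present
         NA_character_
      } else if (length(x) > 1) {
         # multi-value fields become one comma-delimited string;
         # base paste() is used here in place of jamba::cPaste()
         paste(x, collapse = ",")
      } else {
         x
      }
   })
   as.data.frame(flat, stringsAsFactors = FALSE)
}

# Small illustrative JSON file with one multi-value field
json_file <- tempfile(fileext = ".json")
writeLines(
   '{"salmon_version": "0.11.2", "num_libraries": 1, "length_classes": [509, 656, 1031, 2287, 103053]}',
   json_file)
meta_row <- json_to_row_sketch(json_file)
```

Collapsing to a delimited string keeps every field in a single cell, which is what allows results from several Salmon runs to be stacked row by row.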

See also

Other jam nextgen sequence functions: get_salmon_root(), parse_salmon_flenfile(), save_salmon_qc_xlsx()

Examples

cmdinfopath <- system.file("data", "salmonOut", "cmd_info.json", package = "platjam")
if (nchar(cmdinfopath) > 0) {
   get_salmon_meta(cmdinfopath)
}
#>   salmon_version samp_type opt_type num_libraries library_types
#> 1         0.11.2      none       vb             1           ISR
#>   frag_dist_length seq_bias_correct gc_bias_correct num_bias_bins mapping_type
#> 1             1001             TRUE            TRUE          4096      mapping
#>   num_targets serialized_eq_classes           length_classes num_bootstraps
#> 1      205259                 FALSE 509,656,1031,2287,103053              0
#>   num_processed num_mapped percent_mapped  call               start_time
#> 1      49484736   28143781       56.87366 quant Mon Aug  6 16:21:05 2018
#>                   end_time
#> 1 Mon Aug  6 16:23:48 2018
#>                                                                                                                                index
#> 1 /ddn/gs1/shared/fargod/reference_genomes/hg19/hg19gencode/gencode.v28lift37.annotation/gencode.v28lift37.transcripts_quasi-k31.idx
#>   threads libType useVBOpt seqBias gcBias
#> 1      60       A useVBOpt seqBias gcBias
#>                                                                                                        mates1
#> 1 SW13_none_A-NS50728_1.sickle.sanger.cutadapt.fastq.gz,SW13_none_A-NS50729_1.sickle.sanger.cutadapt.fastq.gz
#>                                                                                                        mates2
#> 1 SW13_none_A-NS50728_2.sickle.sanger.cutadapt.fastq.gz,SW13_none_A-NS50729_2.sickle.sanger.cutadapt.fastq.gz
#>                                output   auxDir
#> 1 SW13_none_A-lA-vbo-seq-gc_salmonOut aux_info
#>                                                                                                                                                                                                                           read_files
#> 1 ( SW13_none_A-NS50728_1.sickle.sanger.cutadapt.fastq.gz, SW13_none_A-NS50728_2.sickle.sanger.cutadapt.fastq.gz ), ( SW13_none_A-NS50729_1.sickle.sanger.cutadapt.fastq.gz, SW13_none_A-NS50729_2.sickle.sanger.cutadapt.fastq.gz )
#>   expected_format compatible_fragment_ratio num_compatible_fragments
#> 1             ISR                         1                 28143781
#>   num_assigned_fragments num_frags_with_consistent_mappings
#> 1               28143781                           24661298
#>   num_frags_with_inconsistent_or_orphan_mappings MSF OSF ISF MSR OSR      ISR
#> 1                                        3482716   0   0 889   0   0 24661298
#>        SF      SR MU OU IU U
#> 1 1707293 1774534  0  0  0 0