Import proteomics data from Mascot

import_proteomics_mascot(
  file,
  sheet = 1,
  ann_lib = c("org.Hs.eg.db", "org.Mm.eg.db", "org.Rn.eg.db"),
  curation_txt = NULL,
  accession_from = NULL,
  accession_to = NULL,
  xref_df = NULL,
  measurements = c("totalIntensity", "numSpectra"),
  accession_colname = "accession",
  delim = "[/]",
  try_list = c("SYMBOL2EG", "ACCNUM2EG", "UNIPROT2EG", "ENSEMBLPROT2EG", "ALIAS2EG"),
  verbose = FALSE,
  ...
)

Arguments

file

character path to a file containing proteomics data

sheet

integer index or character name of the worksheet to read when file is an Excel xlsx file.

ann_lib

character passed to genejam::freshenGenes3(); see its documentation for alternate methods of passing one or more annotation libraries.

curation_txt

data.frame whose first column should match the sample column headers found in the PD abundance columns, and whose subsequent columns contain associated sample annotations. If curation_txt is not supplied, the sample column headers are split into annotation columns at each "_" underscore or " " whitespace character.
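As a rough illustration of the curation_txt layout described above, the sketch below builds a small table by hand and then mimics the fallback split using base R. The sample names and annotation column names are hypothetical, and strsplit() is only an approximation of the behavior described here, not the function's internal code.

```r
# Hypothetical sample column headers; real headers come from the input file.
sample_names <- c("WT_ctrl_rep1", "WT_ctrl_rep2", "KO_treated rep1")

# Explicit curation table: the first column matches the sample headers,
# the remaining columns hold sample annotations (column names are assumed).
curation_txt <- data.frame(
  sample = sample_names,
  genotype = c("WT", "WT", "KO"),
  treatment = c("ctrl", "ctrl", "treated"),
  replicate = c("rep1", "rep2", "rep1"),
  stringsAsFactors = FALSE
)

# Approximation of the fallback when curation_txt is NULL:
# split each header at "_" underscore or whitespace characters.
split_fields <- do.call(rbind, strsplit(sample_names, "[_ ]+"))
print(split_fields)
```

Supplying curation_txt explicitly is likely the safer choice when sample headers do not all contain the same number of fields.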

accession_from, accession_to

character vectors used for manual curation from one accession number to another, intended for cases where an accession number is not recognized by the Bioconductor annotation library but a newer accession would be. No gene left behind.
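The idea behind accession_from and accession_to can be sketched with base R as a position-matched lookup. The accession values below are placeholders, and pairing the two vectors by position is an assumption about how they are intended to be used, not a description of the function's internals.

```r
# Placeholder accessions as they might appear in the input data.
accessions <- c("OLD0001", "P12345", "OLD0002")

# Hypothetical curation: the i-th entry of accession_from is
# replaced by the i-th entry of accession_to (assumed pairing).
accession_from <- c("OLD0001", "OLD0002")
accession_to <- c("NEW0001", "NEW0002")

# Replace matching accessions and leave all others unchanged.
idx <- match(accessions, accession_from)
curated <- ifelse(is.na(idx), accessions, accession_to[idx])
print(curated)
```

Under that assumption, the two vectors need to have the same length and the same order.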

xref_df

data.frame that contains accession numbers in the first column, and annotation columns in additional columns, specifically using "SYMBOL", "ENTREZID", "GENENAME" as replacements for output from genejam::freshenGenes3().
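A minimal sketch of an xref_df with the column layout described above is shown below. All accessions and gene annotations are made up for illustration, and the name of the first column is an assumption, since only its position is specified.

```r
# Hypothetical cross-reference table: accession numbers in the first column,
# followed by "SYMBOL", "ENTREZID", and "GENENAME" annotation columns.
xref_df <- data.frame(
  accession = c("ACC0001", "ACC0002"),
  SYMBOL = c("GENE1", "GENE2"),
  ENTREZID = c("1001", "1002"),
  GENENAME = c("hypothetical gene 1", "hypothetical gene 2"),
  stringsAsFactors = FALSE
)

# Confirm the annotation columns are present before passing xref_df along.
expected_cols <- c("SYMBOL", "ENTREZID", "GENENAME")
print(all(expected_cols %in% colnames(xref_df)))
```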

verbose

logical indicating whether to print verbose output.

...

additional arguments are passed to jamba::readOpenxlsx().

Value

SummarizedExperiment object

Examples

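# TODO: replace with smaller test data in the appropriate format
mascot_file <- file.path(path.expand("~/Projects/Hu/hu_msprot_turboid"),
   "Lackford_all_090822.xlsx")
# run only when the local example file is available
if (file.exists(mascot_file)) {
   # optional: inspect the raw worksheet before import
   protein_df <- jamba::readOpenxlsx(mascot_file, sheet=1)[[1]];
   # call the importer; the result is a SummarizedExperiment
   se <- import_proteomics_mascot(mascot_file, sheet=1);
}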