Import data from Proteome Discoverer

import_proteomics_PD(
  xlsx,
  sheet = 1,
  import_types = c("protein", "peptide"),
  ann_lib = c("org.Hs.eg.db"),
  curation_txt = NULL,
  remove_duplicate_peptides = TRUE,
  accession_from = NULL,
  accession_to = NULL,
  xref_df = NULL,
  verbose = FALSE,
  ...
)

Arguments

xlsx

character path to an Excel .xlsx file as exported from Proteome Discoverer (PD) software.

sheet

integer index or character name of the sheet to import. A character value must exactly match a sheet name returned by openxlsx::getSheetNames(xlsx).

import_types

character vector indicating which type or types of PD data to import, by default both "protein" and "peptide".

ann_lib

character vector of annotation libraries passed to genejam::freshenGenes3(). See its documentation for alternate methods of passing one or more annotation libraries.

curation_txt

data.frame whose first column should match the sample column headers found in the PD abundance columns, and whose subsequent columns contain the associated sample annotations. If curation_txt is not supplied, the sample column headers are split into annotation columns at each "_" underscore or " " whitespace character.
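The fallback used when curation_txt is not supplied can be illustrated with a minimal base R sketch. The sample headers and the "field" column names are hypothetical, and the function's internal implementation may differ from this.

```r
# Hypothetical sample column headers taken from PD abundance columns
sample_names <- c("WT_Ctrl_rep1", "WT_Ctrl_rep2", "KO_Treated rep1")

# Split each header on one or more underscore or whitespace characters
parts <- strsplit(sample_names, "[_[:space:]]+")

# Bind the pieces into a data.frame with one row per sample
# (assumes every header splits into the same number of fields)
sample_df <- data.frame(
  sample = sample_names,
  do.call(rbind, parts),
  stringsAsFactors = FALSE
)

# Hypothetical generic names for the derived annotation columns
colnames(sample_df)[-1] <- paste0("field", seq_len(ncol(sample_df) - 1))
print(sample_df)
```

Supplying curation_txt is preferable when sample headers do not follow a consistent delimiter pattern, since the split above assumes the same number of fields in every header.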

remove_duplicate_peptides

logical indicating whether to remove rows with duplicate sequence-PTM combinations, which can occur when upstream PD splits the same measurement results across multiple annotation rows. When duplicates are removed, the first row for each unique "SeqPTM" value is retained. "SeqPTM" is composed of the peptide sequence and the shortened post-translational modification in "PTM".
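As a rough illustration of keeping the first row per "SeqPTM" value, consider the following base R sketch. The peptide rows, the "Sequence" and "Annotation" column names, and the ":" separator used to build "SeqPTM" are assumptions for this example only.

```r
# Hypothetical peptide rows; the same measurement appears under two annotations
peptide_df <- data.frame(
  Sequence = c("AAGK", "AAGK", "LLPR", "AAGK"),
  PTM = c("", "", "Phospho", "Oxidation"),
  Annotation = c("P12345", "Q67890", "P12345", "P12345"),
  stringsAsFactors = FALSE
)

# Combine sequence and shortened PTM (the ":" separator is an assumption)
peptide_df$SeqPTM <- paste(peptide_df$Sequence, peptide_df$PTM, sep = ":")

# Keep the first row for each SeqPTM value and drop later repeats
dedup_df <- peptide_df[!duplicated(peptide_df$SeqPTM), , drop = FALSE]
print(dedup_df)
```

Note that the second row is dropped even though its annotation differs, because only the sequence-PTM combination is compared.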

accession_from, accession_to

character vectors used to manually convert one accession number to another. They are intended for cases where an accession number is not recognized by the Bioconductor annotation library, but a newer accession would be recognized. No gene left behind.
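One plausible reading of these two arguments is a position-wise lookup, where each entry in accession_from is replaced by the entry at the same position in accession_to. The sketch below shows that idea in base R; all accession values are made up, and the actual replacement logic inside the function is not confirmed here.

```r
# Hypothetical accessions as found in the PD export
accessions <- c("P00001", "OLD123", "Q99999", "OLD456")

# Hypothetical manual curation: old accession -> newer accession
accession_from <- c("OLD123", "OLD456")
accession_to <- c("NEW123", "NEW456")

# Replace any accession found in accession_from with its counterpart,
# leaving all other accessions unchanged
idx <- match(accessions, accession_from)
curated <- ifelse(is.na(idx), accessions, accession_to[idx])
print(curated)
```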

xref_df

data.frame with accession numbers in the first column and annotations in the remaining columns, specifically "SYMBOL", "ENTREZID", and "GENENAME", which are used as replacements for the output from genejam::freshenGenes3().
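A small example of the expected layout may help. In this sketch the first column name "Accession" and the storage of "ENTREZID" as character are assumptions; only the three annotation column names come from the description above.

```r
# Accession numbers in the first column, annotation columns afterwards
xref_df <- data.frame(
  Accession = c("P04637", "P38398"),
  SYMBOL = c("TP53", "BRCA1"),
  ENTREZID = c("7157", "672"),
  GENENAME = c("tumor protein p53", "BRCA1 DNA repair associated"),
  stringsAsFactors = FALSE
)

# Look up the annotation for one accession using the first column
hit <- xref_df[match("P04637", xref_df[[1]]), c("SYMBOL", "ENTREZID", "GENENAME")]
print(hit)
```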

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Details

This function runs a series of steps that import proteomics abundance data produced by Proteome Discoverer (PD) and return a SummarizedExperiment object ready for downstream analysis.
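A typical call might look like the sketch below. The file path is hypothetical, the call is guarded so the snippet does nothing when the function or file is unavailable, and the exact structure of the returned object should be checked against the package itself.

```r
# Hypothetical path to a PD export; replace with a real file
xlsx_path <- "PD_export.xlsx"

# Run only when the function is loaded and the file exists
if (exists("import_proteomics_PD") && file.exists(xlsx_path)) {
  pd_data <- import_proteomics_PD(
    xlsx = xlsx_path,
    sheet = 1,
    import_types = c("protein", "peptide"),
    ann_lib = c("org.Hs.eg.db"),
    verbose = TRUE
  )
  # Inspect the returned object before downstream analysis
  print(pd_data)
}
```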