Import Ingenuity IPA enrichment results

importIPAenrichment(
  ipaFile,
  headerGrep =
    "(^|\t)((expr.|-log.|)p-value|Pvalue|Score\t|Symbol\t|Ratio\t|Consistency.Score|Master.Regulator\t)",
  ipaNameGrep = c("Pathway", "Regulator$", "Regulators", "Regulator", "Disease",
    "Toxicity", "Category", "Categories", "Function", "Symbol$", "^ID$",
    "My.(Lists|Pathways)"),
  geneGrep = c("Molecules in Network", "Target molecules", "Molecules", "Symbol"),
  geneCurateFrom = c("[ ]*[(](complex|includes others)[)][ ]*", "^[, ]+|[, ]+$"),
  geneCurateTo = c("", ""),
  method = 1,
  sheet = 1,
  sep = "\t",
  xlsxMultiSheet = TRUE,
  useXlsxSheetNames = FALSE,
  remove_blank_colnames = TRUE,
  convert_ipa_slash = TRUE,
  ipa_slash_sep = ":",
  revert_ipa_xref = TRUE,
  verbose = FALSE,
  ...
)

Arguments

ipaFile

one of the four input types described in Details: a character vector of text file names; a character vector of Excel .xlsx file names (with either one tall worksheet or one worksheet per enrichment type); or a list of data.frame objects.

headerGrep

regular expression pattern used to recognize header columns found in Ingenuity IPA enrichment data.

ipaNameGrep

vector of regular expression patterns used to recognize the name of the enriched entity, for example the biological pathway, or network, or disease category, etc.

geneGrep

vector of regular expression patterns used to recognize the column containing genes, that is, the molecules tested for enrichment which were found in the enriched entity.
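As an illustration of how an ordered vector of patterns can pick out the gene column, here is a minimal base R sketch. It assumes patterns are tried in order, are case-sensitive, and that the first match wins; the actual matching logic inside importIPAenrichment() may differ. The column names in `cols` are hypothetical.

```r
# Default geneGrep patterns, copied from the usage section.
geneGrep <- c("Molecules in Network", "Target molecules", "Molecules", "Symbol")

# Hypothetical column names from one IPA enrichment table.
cols <- c("Upstream Regulator", "p-value of overlap", "Target molecules in dataset")

# Try each pattern in order and collect matching column names
# (assumption: earlier patterns take priority over later ones).
hits <- unlist(lapply(geneGrep, function(p) grep(p, cols, value = TRUE)))

# Keep the first match as the gene column.
gene_col <- head(hits, 1)
gene_col
```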

geneCurateFrom, geneCurateTo

vector of patterns and replacements, respectively, used to curate values in the gene column. These replacement rules are used to ensure that genes are delimited consistently, with no leading or trailing delimiters.
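The effect of the default curation rules can be sketched with base R gsub(). This assumes each pattern and replacement pair is applied in order to the gene column, which is an assumption about the implementation; the input values in `genes` are hypothetical.

```r
# Default patterns and replacements, copied from the usage section.
geneCurateFrom <- c("[ ]*[(](complex|includes others)[)][ ]*", "^[, ]+|[, ]+$")
geneCurateTo <- c("", "")

# Hypothetical gene column values, for illustration only.
genes <- c("AKT1,NFkB (complex),TP53", ", MYC,let-7 (includes others), ")

# Apply each pattern/replacement pair in order (assumed behavior).
curated <- genes
for (i in seq_along(geneCurateFrom)) {
  curated <- gsub(geneCurateFrom[i], geneCurateTo[i], curated)
}
curated
```

The first rule strips the "(complex)" and "(includes others)" annotations; the second removes leading and trailing commas and spaces.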

method

integer value indicating the method used to import data from a text file, where: method=1 uses data.table::read.table() and the textConnection argument; method=2 uses readr::read_tsv(). The default method=1 was chosen because it performed better in the presence of UTF-8 characters such as the alpha symbol.

sheet

integer value used only when ipaFile is a vector of Excel .xlsx files, and when the Excel format includes multiple worksheets. This value will extract enrichment data only from one worksheet from each Excel file.

sep

character string used when ipaFile is a vector of text files, to split fields into columns. The default will split fields by the tab character.

xlsxMultiSheet

logical indicating whether input Excel .xlsx files contain multiple worksheets.

useXlsxSheetNames

logical indicating whether to use the Excel worksheet name for each imported enrichment table, when importing from .xlsx files, and when xlsxMultiSheet=FALSE. When xlsxMultiSheet=TRUE the name is derived from the value matched using ipaNameGrep, because in this case there are expected to be multiple enrichment tables in one worksheet.

remove_blank_colnames

logical indicating whether to drop columns in which every value is contained in c(NA, ""). Setting remove_blank_colnames=FALSE may be preferable when all values in a column such as zScore are NA, but you would still like to retain the column for consistency with other data. We found that IPA does not report zScore values when there are only 4 or fewer genes involved in each enrichment result.
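A minimal base R sketch of the blank-column rule described above, assuming a column counts as blank when every value is in c(NA, ""). The data.frame `df` and its column names are hypothetical, and this is not the function's actual implementation.

```r
# Hypothetical enrichment table with two blank columns.
df <- data.frame(
  Name = c("Pathway A", "Pathway B"),
  zScore = c(NA_real_, NA_real_),
  Notes = c("", NA),
  pvalue = c(0.01, 0.04),
  stringsAsFactors = FALSE
)

# A column is blank when all of its values are NA or the empty string.
is_blank <- vapply(df, function(x) all(x %in% c(NA, "")), logical(1))

# Equivalent of remove_blank_colnames=TRUE: drop the blank columns.
kept <- df[, !is_blank, drop = FALSE]
colnames(kept)
```

With remove_blank_colnames=FALSE the zScore column would instead be retained, which keeps the column layout consistent across enrichment tables.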

convert_ipa_slash

logical indicating whether to convert IPA gene naming conventions. Some genes are considered one entity in the IPA system; for example "HSPA1A/HSPA1B" is considered one gene, even though it can represent two Entrez gene entries, "HSPA1A" and "HSPA1B". Regardless of whether one or both genes are provided to IPA, it treats them as one entity for the purpose of pathway enrichment hypergeometric testing. Unfortunately, the forward slash "/" is also used as the gene delimiter by the clusterProfiler object enrichResult, where it is hard-coded and cannot be changed. An enrichResult would therefore treat "HSPA1A/HSPA1B" as two genes, causing a mismatch with the IPA results. When convert_ipa_slash=TRUE, which is the default, the forward slash "/" is converted to the value of argument ipa_slash_sep.

ipa_slash_sep

character string used as a delimiter when convert_ipa_slash=TRUE, which replaces the forward slash "/" in gene names that contain it.
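The gene-count mismatch, and how replacing the slash avoids it, can be sketched in base R. The gene string `ipa_genes` is hypothetical, and the conversion shown here with gsub() is only an illustration of the idea, not the function's internal code.

```r
ipa_slash_sep <- ":"

# Hypothetical comma-delimited gene column from IPA: three entities,
# one of which uses the IPA slash convention.
ipa_genes <- "HSPA1A/HSPA1B,TP53,MYC"

# Without conversion: re-delimiting with "/" (as enrichResult expects)
# makes the IPA entity indistinguishable from two separate genes.
naive_id <- gsub(",", "/", ipa_genes, fixed = TRUE)
n_naive <- length(strsplit(naive_id, "/", fixed = TRUE)[[1]])

# With conversion: replace the IPA slash first, then re-delimit.
converted <- gsub("/", ipa_slash_sep, ipa_genes, fixed = TRUE)
gene_id <- gsub(",", "/", converted, fixed = TRUE)
n_converted <- length(strsplit(gene_id, "/", fixed = TRUE)[[1]])

c(naive = n_naive, converted = n_converted)
```

Here the naive count is 4 genes, while the converted string "HSPA1A:HSPA1B/TP53/MYC" yields 3, matching the number of entities IPA tested.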

revert_ipa_xref

logical indicating whether to revert the IPA gene symbols reported, which requires that the IPA data contains a section "Analysis Ready Molecules".

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Value

list of data.frame objects, where each data.frame contains enrichment data for one of the Ingenuity IPA enrichment tests.

Details

This function parses Ingenuity IPA enrichment data into a form usable as a list of enrichment data.frame objects for downstream analysis. Each data.frame will represent the results of one Ingenuity IPA enrichment test.

The input data can be one of four forms:

  1. ipaFile can be a text .txt file, where the text file contains all IPA enrichment data in tall format. This format is most common.

  2. ipaFile can be an Excel .xlsx file, which contains all IPA enrichment data in one tall worksheet tab.

  3. ipaFile can be an Excel .xlsx file, where each type of IPA enrichment appears on a separate Excel worksheet tab.

  4. ipaFile can be a list of data.frame objects. This option is intended when the IPA data has already been imported into R as separate data.frame objects.

The basic motivation for this function is two-fold:

  1. Separate multiple IPA enrichment tables.

  2. Rename colnames to be consistent.

When using "Export All" from IPA, the default text format includes multiple enrichment tables concatenated together in one file. Each enrichment table contains its own unique column headers, with descriptive text in the line preceding the column headers. This function is intended to separate the enrichment tables into a list of data.frame objects, and retain the descriptive text as names of the list.
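The separation step described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the function's implementation: the content of `ipa_lines` is hypothetical and much shorter than a real export, and the header pattern is a reduced stand-in for the default headerGrep.

```r
# Hypothetical, heavily simplified "Export All" content; real files differ.
ipa_lines <- c(
  "Canonical Pathways for My Analysis",
  "Ingenuity Canonical Pathways\t-log(p-value)\tMolecules",
  "Pathway A\t3.2\tTP53,MYC",
  "Pathway B\t1.9\tAKT1",
  "",
  "Upstream Regulators for My Analysis",
  "Upstream Regulator\tp-value of overlap\tTarget molecules in dataset",
  "TGFB1\t0.001\tSMAD3,MYC"
)

# Simplified stand-in for headerGrep: a tab followed by a p-value column.
header_rows <- grep("\t(-log.)?p-value", ipa_lines)

# Each table runs from its header to two lines before the next header
# (skipping the next table's descriptive line), or to the end of the file.
block_ends <- c(header_rows[-1] - 2, length(ipa_lines))

tables <- lapply(seq_along(header_rows), function(i) {
  block <- ipa_lines[header_rows[i]:block_ends[i]]
  block <- block[nzchar(block)]  # drop blank separator lines
  read.table(text = block, sep = "\t", header = TRUE, quote = "",
    comment.char = "", check.names = FALSE, stringsAsFactors = FALSE)
})

# Retain the descriptive line preceding each header as the list name.
names(tables) <- ipa_lines[header_rows - 1]
names(tables)
```

The result is a named list with one data.frame per enrichment table, each keeping its own column headers, which is the same overall shape that importIPAenrichment() returns.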

See also

Other jam import functions: curateIPAcolnames()