Curate Ingenuity IPA colnames

curateIPAcolnames(
  jDF,
  ipaNameGrep = c("^Name$", "^ID$", "Canonical Pathways", "Upstream Regulator",
    "Diseases or Functions Annotation", "Diseases . Functions", "My Lists",
    "Ingenuity Toxicity Lists", "My Pathways"),
  geneGrep = c("Molecules in Network", "Target molecules", "Molecules", "Symbol"),
  geneCurateFrom = c(" [(](complex|includes others)[)]", "^[,]+|[,]+$"),
  geneCurateTo = c("", ""),
  convert_ipa_slash = TRUE,
  ipa_slash_sep = ":",
  verbose = TRUE,
  ...
)

Arguments

jDF

data.frame from one Ingenuity IPA enrichment test.

ipaNameGrep

vector of regular expression patterns used to recognize the name of the enriched entity, for example the biological pathway, or network, or disease category, etc.

geneGrep

vector of regular expression patterns used to recognize the column containing genes, that is, the molecules tested for enrichment which were found in the enriched entity.

geneCurateFrom, geneCurateTo

vector of patterns and replacements, respectively, used to curate values in the gene column. These replacement rules are used to ensure that genes are delimited consistently, with no leading or trailing delimiters.

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Details

This function curates the colnames observed in Ingenuity IPA enrichment data. IPA enrichment data includes multiple types of enrichment tests, each with slightly different column headers, and this function makes the colnames more consistent.

This function will rename the first recognized gene colname to "geneNames" for consistency with downstream analyses.
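The renaming step above can be sketched in base R as follows. This is a minimal illustration, not the function's actual implementation: the column names in `cn` are hypothetical, and the assumptions that patterns are tried in the order given and that matching ignores case are mine.

```r
# Default gene patterns, copied from the usage section
geneGrep <- c("Molecules in Network", "Target molecules", "Molecules", "Symbol")

# Hypothetical colnames from one IPA enrichment table (illustration only)
cn <- c("Ingenuity Canonical Pathways", "-log(p-value)", "Ratio", "Molecules")

# Try each pattern in order and collect the matching column indices
# (assumption: earlier patterns take priority over later ones)
hits <- unlist(lapply(geneGrep, function(p) grep(p, cn, ignore.case = TRUE)))

# Rename only the first recognized gene column
geneCol <- unique(hits)[1]
if (!is.na(geneCol)) {
  cn[geneCol] <- "geneNames"
}
cn
```

In this sketch only the "Molecules" column is renamed; any other column is left untouched.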

The values in the recognized gene column are curated using geneCurateFrom and geneCurateTo, which are applied as a series of pattern-replacement substitutions. This mechanism ensures that each enrichment table uses consistent delimiters and values.
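To illustrate what the default substitution rules do, the sketch below applies them with `gsub()` to two made-up gene strings. The input values are hypothetical, and applying the pairs one after another in a loop is an assumption about how the rules are used, not a statement of the function's internals.

```r
# Default curation rules, copied from the usage section
geneCurateFrom <- c(" [(](complex|includes others)[)]", "^[,]+|[,]+$")
geneCurateTo <- c("", "")

# Hypothetical gene column values (illustration only)
genes <- c("ACTB,AKT (complex),TP53", ",EGFR,MYC (includes others),")

# Apply each pattern-replacement pair in order (assumed behavior)
for (i in seq_along(geneCurateFrom)) {
  genes <- gsub(geneCurateFrom[i], geneCurateTo[i], genes)
}
genes
```

The first rule strips the " (complex)" and " (includes others)" suffixes, and the second removes leading and trailing commas, so every entry ends up as a plain comma-delimited list.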

Any colname matching "-log.*p.value" is treated as a -log10 P-value, and its values are converted back to untransformed P-values for consistency with downstream analyses.

Any recognized P-value column is renamed to "P-value" for consistency with downstream analyses.
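The conversion and renaming described above amount to computing 10^(-x) for each value and then changing the colname. The sketch below shows this on a small hypothetical table. The data, the case-insensitive matching, and the choice to rename only the first matching column are assumptions made for illustration.

```r
# Hypothetical IPA-style table (illustration only)
df <- data.frame(check.names = FALSE,
  Name = c("Pathway A", "Pathway B"),
  `-log(p-value)` = c(2, 3.5))

# Find colnames matching the documented pattern
# (ignore.case = TRUE is an assumption)
logCols <- grep("-log.*p.value", colnames(df), ignore.case = TRUE)

# Convert -log10(P) back to P
for (j in logCols) {
  df[[j]] <- 10^(-df[[j]])
}

# Rename the converted column for downstream consistency
if (length(logCols) > 0) {
  colnames(df)[logCols[1]] <- "P-value"
}
df
```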

When the recognized P-value column contains a range, for example "0.00017-0.0023", the lower P-value is chosen. In that case, the higher P-value is stored in a new column "max P-value". P-value ranges are reported in the disease category analysis by Ingenuity IPA, after collating individual pathways by disease category and storing the range of enrichment P-values.
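A range such as "0.00017-0.0023" can be split into its lower and higher values as sketched below. This is one possible approach under stated assumptions, not the function's actual code: the input values are made up, the split pattern skips hyphens that follow "E" so that scientific notation survives, and for entries that are not ranges the single value is simply repeated in "max P-value".

```r
# Hypothetical P-value column, some entries are ranges (illustration only)
pv <- c("0.00017-0.0023", "0.01", "1.2E-5-3.4E-3")

# Split on hyphens that are not part of an exponent such as "E-5"
parts <- strsplit(pv, "(?<![eE])-", perl = TRUE)

# Lower value becomes the P-value, higher value is kept separately
lowP <- vapply(parts, function(x) min(as.numeric(x)), numeric(1))
highP <- vapply(parts, function(x) max(as.numeric(x)), numeric(1))

df <- data.frame(check.names = FALSE,
  `P-value` = lowP,
  `max P-value` = highP)
df
```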

See also

Other jam import functions: importIPAenrichment()