Curate Ingenuity IPA colnames

Usage

curateIPAcolnames(
  jDF,
  ipaNameGrep = c("^Name$", "^ID$", "Canonical Pathways", "Upstream Regulator",
    "Diseases or Functions Annotation", "Diseases . Functions", "My Lists",
    "Ingenuity Toxicity Lists", "My Pathways"),
  geneGrep = c("Molecules in Network", "Target molecules", "Molecules", "Symbol"),
  geneCurateFrom = c(" [(](complex|includes others)[)]", "^[,]+|[,]+$"),
  geneCurateTo = c("", ""),
  convert_ipa_slash = TRUE,
  ipa_slash_sep = ":",
  verbose = TRUE,
  ...
)

Arguments

jDF: data.frame from one Ingenuity IPA enrichment test.
ipaNameGrep: vector of regular expression patterns used to recognize the column containing the name of the enriched entity, for example the biological pathway, network, or disease category.
geneGrep: vector of regular expression patterns used to recognize the column containing genes, that is, the molecules tested for enrichment which were found in the enriched entity.
geneCurateFrom, geneCurateTo: vector of patterns and replacements, respectively, used to curate values in the gene column. These replacement rules are used to ensure that genes are delimited consistently, with no leading or trailing delimiters.
verbose: logical indicating whether to print verbose output.
...: additional arguments are ignored.

Details

This function helps curate colnames observed in Ingenuity IPA enrichment data. IPA enrichment data includes multiple types of enrichment tests, each with slightly different column headers, and this function makes those colnames more consistent.

This function will rename the first recognized gene colname to "geneNames" for consistency with downstream analyses.
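The renaming step can be sketched in base R as follows. This is a minimal illustration, not the function's actual code: the example data.frame and its column names are hypothetical, and the sketch assumes that "first recognized" means the first match when patterns are tried in the order given in geneGrep.

```r
# Hypothetical IPA-style table; the column names are illustrative only.
df <- data.frame(
  `Ingenuity Canonical Pathways` = c("Pathway A", "Pathway B"),
  Molecules = c("ACTB,TP53", "EGFR"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
geneGrep <- c("Molecules in Network", "Target molecules", "Molecules", "Symbol")

# Try each pattern in order and collect the matching colnames
# (assumption: pattern order decides which match counts as "first").
hits <- unlist(lapply(geneGrep, function(p) {
  grep(p, colnames(df), value = TRUE)
}))

# Rename only the first recognized gene column.
if (length(hits) > 0) {
  colnames(df)[match(hits[1], colnames(df))] <- "geneNames"
}
colnames(df)
```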

The values in the recognized gene column are curated using geneCurateFrom and geneCurateTo, which define one or more pattern-replacement substitutions. This mechanism ensures that delimiters and values are consistent across enrichment tables.
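As a rough sketch of how paired pattern-replacement vectors behave, the default geneCurateFrom and geneCurateTo values from Usage can be applied with gsub(). The input strings are made up for illustration, and applying the substitutions one after another in vector order is an assumption about the implementation.

```r
# Default patterns and replacements, copied from the Usage section.
geneCurateFrom <- c(" [(](complex|includes others)[)]", "^[,]+|[,]+$")
geneCurateTo <- c("", "")

# Hypothetical gene strings with IPA-style suffixes and stray delimiters.
genes <- c("ACTB,TP53 (includes others),", ",EGFR,Ap1 (complex)")

# Apply each substitution in turn (assumed order: as listed in the vectors).
for (i in seq_along(geneCurateFrom)) {
  genes <- gsub(geneCurateFrom[i], geneCurateTo[i], genes)
}
genes
```

In this sketch the first rule drops the " (complex)" and " (includes others)" suffixes, and the second strips leading or trailing commas.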

Any colname matching "-log.*p.value" is treated as a -log10 P-value, and its values are converted back to untransformed P-values (10 raised to the negated value) for consistency with downstream analyses.

Any recognized P-value column is renamed to "P-value" for consistency with downstream analyses.
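The two P-value steps above might look like the following in base R. This is only a sketch under stated assumptions: the column name "-log(p-value)" and its values are hypothetical, and case-sensitive matching with grepl() is assumed.

```r
# Hypothetical table with a -log10 transformed P-value column.
df <- data.frame(
  `-log(p-value)` = c(2, 3.5),
  check.names = FALSE
)

# Recognize the -log10 column using the pattern described above.
is_log <- grepl("-log.*p.value", colnames(df))

if (any(is_log)) {
  k <- which(is_log)[1]
  # Invert the -log10 transform: P = 10^(-x).
  df[[k]] <- 10^(-df[[k]])
  # Rename for consistency with downstream analyses.
  colnames(df)[k] <- "P-value"
}
df
```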

When the recognized P-value column contains a range, for example "0.00017-0.0023", the lower P-value is chosen. In that case, the higher P-value is stored in a new column "max P-value". P-value ranges are reported in the disease category analysis by Ingenuity IPA, after collating individual pathways by disease category and storing the range of enrichment P-values.
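Splitting a P-value range could be approached as in the sketch below. The input values are invented, and two choices here are assumptions rather than documented behavior: the hyphen is treated as a range separator only when it is not part of a scientific-notation exponent, and a single P-value is reported unchanged in both columns.

```r
# Hypothetical P-value strings: a range, a single value, and a range
# written in scientific notation.
pv <- c("0.00017-0.0023", "0.004", "1.2E-5-3.4E-3")

# Split on "-" unless it directly follows "e" or "E" (an exponent sign).
parts <- strsplit(pv, "(?<![eE])-", perl = TRUE)
nums <- lapply(parts, as.numeric)

# Keep the lower value as the P-value, and store the higher value separately.
df <- data.frame(
  `P-value` = vapply(nums, min, numeric(1)),
  `max P-value` = vapply(nums, max, numeric(1)),
  check.names = FALSE
)
df
```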
