Import Ingenuity IPA enrichment results
importIPAenrichment(
  ipaFile,
  headerGrep =
    "(^|\t)((expr.|-log.|)p-value|Pvalue|Score\t|Symbol\t|Ratio\t|Consistency.Score|Master.Regulator\t)",
  ipaNameGrep = c("Pathway", "Regulator$", "Regulators", "Regulator", "Disease",
    "Toxicity", "Category", "Categories", "Function", "Symbol$", "^ID$",
    "My.(Lists|Pathways)"),
  geneGrep = c("Molecules in Network", "Target molecules", "Molecules", "Symbol"),
  geneCurateFrom = c("[ ]*[(](complex|includes others)[)][ ]*", "^[, ]+|[, ]+$"),
  geneCurateTo = c("", ""),
  method = 1,
  sheet = 1,
  sep = "\t",
  xlsxMultiSheet = TRUE,
  useXlsxSheetNames = FALSE,
  remove_blank_colnames = TRUE,
  convert_ipa_slash = TRUE,
  ipa_slash_sep = ":",
  revert_ipa_xref = TRUE,
  verbose = FALSE,
  ...
)
ipaFile: one of the four input types described below, supplied as a character vector of text file names, a character vector of Excel .xlsx file names, or a list of data.frame objects.
headerGrep: regular expression pattern used to recognize header columns found in Ingenuity IPA enrichment data.
ipaNameGrep: vector of regular expression patterns used to recognize the name of the enriched entity, for example the biological pathway, network, or disease category.
geneGrep: regular expression pattern used to recognize the column containing genes, or the molecules tested for enrichment which were found in the enriched entity.
geneCurateFrom, geneCurateTo: vector of patterns and replacements, respectively, used to curate values in the gene column. These replacement rules are used to ensure that genes are delimited consistently, with no leading or trailing delimiters.
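To illustrate how paired pattern and replacement rules of this kind could act on a gene column, here is a minimal base R sketch. The helper name curate_genes and the sample gene strings are hypothetical, and the function's internal implementation may differ.

```r
# Default curation rules copied from the usage section
geneCurateFrom <- c("[ ]*[(](complex|includes others)[)][ ]*", "^[, ]+|[, ]+$")
geneCurateTo <- c("", "")

# Hypothetical helper: apply each pattern/replacement pair in order
curate_genes <- function(x, from, to) {
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x)
  }
  x
}

# Made-up gene column values, for illustration only
genes <- c("ACTB,Hsp70 (complex),TP53", ", EGFR,mir-21 (includes others),")
curated <- curate_genes(genes, geneCurateFrom, geneCurateTo)
print(curated)
```

The first rule strips the "(complex)" and "(includes others)" annotations, and the second trims leading or trailing commas and spaces so the remaining genes stay comma-delimited.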
method: integer value indicating the method used to import data from a text file, where: method=1 uses utils::read.table() reading from a textConnection(); method=2 uses readr::read_tsv(). The motivation to use read.table() is that it performed better in the presence of UTF-8 characters such as the alpha symbol.
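As a rough sketch of the text-connection approach, the following reads tab-delimited lines held in memory. It assumes the approach amounts to base R's utils::read.table() on a textConnection(); the sample lines are made up, and the arguments actually passed inside the function are not documented here.

```r
# Made-up tab-delimited lines containing a UTF-8 character (Greek alpha)
ipa_lines <- c(
  "Upstream Regulator\tp-value of overlap\tTarget molecules",
  "TNF-\u03b1\t0.001\tIL6,CXCL8"
)

# Read the in-memory lines through a text connection
con <- textConnection(ipa_lines)
df <- utils::read.table(con, sep = "\t", header = TRUE,
  check.names = FALSE, quote = "", comment.char = "",
  stringsAsFactors = FALSE)
# textConnection() returns an open connection, so close it explicitly
close(con)

print(colnames(df))
print(nrow(df))
```

Setting check.names = FALSE keeps column headers such as "p-value of overlap" unchanged, which matters when later steps match columns by regular expression.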
sheet: integer value used only when ipaFile is a vector of Excel .xlsx files, and when the Excel format includes multiple worksheets. This value will extract enrichment data from only one worksheet of each Excel file.
sep: character string used when ipaFile is a vector of text files, to split fields into columns. The default will split fields by the tab character.
xlsxMultiSheet: logical indicating whether input Excel .xlsx files contain multiple worksheets.
useXlsxSheetNames: logical indicating whether to use the Excel worksheet name for each imported enrichment table, when importing from .xlsx files, and when xlsxMultiSheet=FALSE. When xlsxMultiSheet=TRUE the name is derived from the value matched using ipaNameGrep, because in this case, there are expected to be multiple enrichment tables in one worksheet.
remove_blank_colnames: logical indicating whether to drop colnames() where all values are contained in c(NA, ""). Setting remove_blank_colnames=FALSE may be preferable when all values in some column like zScore are NA, but you would still like to retain the column for consistency with other data. We found that IPA does not report zScore values when there are only 4 or fewer genes involved in each enrichment result.
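The blank-column rule can be sketched in base R as follows. The example table and the variable names is_blank and df_trimmed are made up for illustration; this is not necessarily how the function implements the step.

```r
# Made-up enrichment table in which zScore is entirely NA
df <- data.frame(
  Pathway = c("Pathway A", "Pathway B"),
  pvalue = c(0.01, 0.04),
  zScore = c(NA, NA),
  Notes = c("", NA),
  stringsAsFactors = FALSE
)

# A column counts as blank when every value is NA or an empty string
is_blank <- vapply(df, function(x) all(x %in% c(NA, "")), logical(1))
df_trimmed <- df[, !is_blank, drop = FALSE]
print(colnames(df_trimmed))
```

Here both zScore and Notes are dropped. Skipping this step keeps zScore in place, which can help when combining this table with others that do report a zScore.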
convert_ipa_slash: logical indicating whether to convert IPA gene naming conventions. Currently some genes are considered one entity in the IPA system; for example, "HSPA1A/HSPA1B" is considered one gene, even though it can be represented by two Entrez gene entries, "HSPA1A" and "HSPA1B". Regardless of whether one or both genes are provided to IPA, it treats them as one entity for the purpose of pathway enrichment hypergeometric testing.
Unfortunately, the forward slash "/" is also used as the gene delimiter by the clusterProfiler object enrichResult, where it is hard-coded and cannot be changed. As a result, "HSPA1A/HSPA1B" would automatically be treated as two genes, causing a mismatch with the IPA results.
When convert_ipa_slash=TRUE (the default), the forward slash "/" is converted to the value of argument ipa_slash_sep.
ipa_slash_sep: character string used as a delimiter when convert_ipa_slash=TRUE. It replaces the forward slash "/" in gene names that contain one.
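The effect of replacing the slash can be seen in a small base R sketch. The gene vector is made up, and joining genes with "/" here only imitates the delimiter described for enrichResult; no clusterProfiler functions are called.

```r
# Made-up gene list in which IPA reports "HSPA1A/HSPA1B" as one entity
ipa_genes <- c("HSPA1A/HSPA1B", "TP53", "EGFR")
ipa_slash_sep <- ":"

# Replace the IPA slash before the genes are joined with "/"
converted <- gsub("/", ipa_slash_sep, ipa_genes, fixed = TRUE)

# Join with "/" to imitate a slash-delimited gene field
gene_id_raw <- paste(ipa_genes, collapse = "/")
gene_id_converted <- paste(converted, collapse = "/")

# Splitting on "/" shows how many genes each version appears to contain
n_raw <- length(strsplit(gene_id_raw, "/", fixed = TRUE)[[1]])
n_converted <- length(strsplit(gene_id_converted, "/", fixed = TRUE)[[1]])
print(c(raw = n_raw, converted = n_converted))
```

Without the conversion, the three IPA entities are counted as four genes; with it, the count stays at three, matching what IPA used for its enrichment test.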
revert_ipa_xref: logical indicating whether to revert the IPA gene symbols reported, which requires that the IPA data contains a section "Analysis Ready Molecules".
verbose: logical indicating whether to print verbose output.
...: additional arguments are ignored.
list of data.frame objects, where each data.frame contains enrichment data for one of the Ingenuity IPA enrichment tests.
This function parses Ingenuity IPA enrichment data into a list of enrichment data.frame objects for downstream analysis. Each data.frame represents the results of one Ingenuity IPA enrichment test.
The input data can be one of four forms:
ipaFile can be a text .txt file, where the text file contains all IPA enrichment data in tall format. This format is most common.
ipaFile can be an Excel .xlsx file, which contains all IPA enrichment data in one tall worksheet tab.
ipaFile can be an Excel .xlsx file, where each type of IPA enrichment appears on a separate Excel worksheet tab.
ipaFile can be a list of data.frame objects. This option is intended for when the IPA data has already been imported into R as separate data.frame objects.
The basic motivation for this function is two-fold:
Separate multiple IPA enrichment tables.
Rename colnames to be consistent.
When using "Export All" from IPA, the default text format includes multiple enrichment tables concatenated together in one file. Each enrichment table contains its own unique column headers, with descriptive text in the line preceding the column headers. This function is intended to separate the enrichment tables into a list of data.frame objects, and retain the descriptive text as names of the list.
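The separation step described above can be sketched in base R. This is a simplified illustration under stated assumptions: the sample lines are made up, header_grep is a reduced stand-in for the default headerGrep, and each table is assumed to be preceded by exactly one descriptive line. The function itself handles more cases, including column renaming.

```r
# Made-up lines mimicking a concatenated export: a descriptive line,
# then column headers, then data rows, repeated for each table
ipa_lines <- c(
  "Canonical Pathways for My Analysis",
  "Ingenuity Canonical Pathways\t-log(p-value)\tMolecules",
  "Pathway A\t3.2\tTP53,EGFR",
  "Pathway B\t1.9\tIL6",
  "",
  "Upstream Regulators for My Analysis",
  "Upstream Regulator\tp-value of overlap\tTarget molecules",
  "TNF\t0.001\tIL6,CXCL8"
)

# Simplified stand-in for headerGrep (an assumption, not the default)
header_grep <- "(^|\t)(-log.p-value|p-value)"
header_idx <- grep(header_grep, ipa_lines)

# Each table runs from its header line to two lines before the next
# header, which skips the next table's descriptive line
end_idx <- c(header_idx[-1] - 2, length(ipa_lines))

tables <- lapply(seq_along(header_idx), function(i) {
  block <- ipa_lines[header_idx[i]:end_idx[i]]
  block <- block[nzchar(block)]  # drop blank separator lines
  utils::read.table(text = block, sep = "\t", header = TRUE,
    check.names = FALSE, quote = "", comment.char = "",
    stringsAsFactors = FALSE)
})

# Keep the descriptive line preceding each header as the list name
names(tables) <- ipa_lines[header_idx - 1]
print(names(tables))
print(vapply(tables, nrow, integer(1)))
```

Naming the list by the descriptive line keeps each table identifiable after the split, which is the same idea as retaining the descriptive text as names of the returned list.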
Other jam import functions:
curateIPAcolnames()