Changelog
Source: NEWS.md
Version 0.0.20.900
Bug fixes
- Fixed `empty_rule="original"` when ‘ENTREZID’ is populated but not valid, causing the `final` entries to be empty. Now it properly fills in the original value.
- Fixed when the intermediate column (‘ENTREZID’) had NA instead of `""` for empty values. This situation only occurred when input `data.frame` also included the intermediate column. Previously these rows were not filled properly.
Version 0.0.19.900
Moved ‘AnnotationDbi’ to Imports.
Bug fixes
- Fixed regression from 0.0.18.900 in `"first_try"` that caused it to behave like `"first_hit"`.
- Fixed regression with `ignore.case=TRUE` caused by the correction to handle multiple identical values when converted to lowercase.
- Added testthat with confirmation of `ignore.case=TRUE`, `revert_split=TRUE`, sources, etc.
Changes to existing functions
- New argument `revert_split=TRUE` will re-combine columns previously split by argument `split`. The process took delimited values, split them into individual columns, then performed matching on the singular values. Previously data were left as-is for review; the new default is to combine columns using `sep` as delimiter, for consistency with the delimiter style of this function.
- New default value for argument `intermediate="ENTREZID"`. The previous default `'intermediate'` was not useful or informative, although it was flexible. It is still flexible, but the default case is now informative.
Version 0.0.18.900
bug fixes
- Previously `ignore.case=TRUE` could in rare cases return a subset of associated genes, only when there were multiple alias values which differed only in case, for example `"cal", "Cal", "CAL"`. This issue only affects values with mixed case, thus human entries are unlikely to be affected (most human gene symbols are already UPPERCASE). Somebody out there recognizes this case, since ‘cal’, ‘Cal’, ‘CAL’ each map 1-to-1 with one Entrez gene, ‘Gopc’, ‘Fblim1’, ‘S100a11’, respectively. Bless.
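A minimal base-R sketch of why case-folding can lose genes, using the alias example above. This is not the genejam code; `alias_map` and the variable names are made up for illustration.

```r
# Hypothetical alias table built from the example above: three aliases
# that differ only by case, each mapped to a different gene.
alias_map <- c(cal = "Gopc", Cal = "Fblim1", CAL = "S100a11")

# Naive case-insensitive lookup: fold the keys to lowercase and keep the
# first match only, which silently drops two of the three genes.
lower_keys <- tolower(names(alias_map))
naive_hit <- unname(alias_map[match("cal", lower_keys)])

# Collision-aware lookup: group values by lowercase key so that every
# alias folding to the same key contributes its gene.
grouped <- split(unname(alias_map), lower_keys)
all_hits <- grouped[["cal"]]
```

Here `naive_hit` holds only the first gene, while `all_hits` holds all three.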
Version 0.0.16.900
dependency update
- bumped the version dependency on jamba to `0.0.87.900`, due to a bug introduced in `jamba::cPaste()` that adversely affected output for accessions that require multiple rounds of querying.
Version 0.0.15.900
updates to existing functions
- `freshenGenes()` new argument `ignore.case` which calls `genejam::imget()` as a drop-in replacement for `mget()`. The process was improved by calling `AnnotationDbi::keys()` instead of `AnnotationDbi::ls()`, and this change is at least an order of magnitude faster. In brief benchmarks, using `ignore.case=TRUE` adds roughly 0.1 seconds per annotation in `try_list`, but otherwise is the same speed regardless of the number of input entries.
Version 0.0.14.900
updates to existing functions
- `freshenGenes()` was updated so the input can contain intermediate values, for example ENTREZID values. In fact, now input can contain a mixture of gene symbols and ENTREZID intermediate values, and it will fill in the holes accordingly.
  - new argument `intermediate` to define the colname that contains the intermediate values, most commonly EG, which are Entrez gene IDs.
  - Values are propagated in `intermediate`, except when `handle_multiple="first_hit"`, in which case any existing value in `intermediate` is used with no further processing. All other `handle_multiple` values will combine entries into `intermediate`.
new functions
- `is_empty()` is a small helper function to determine which entries in a vector are either `NA` or `""`.
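For illustration, a helper like this could be written in base R roughly as follows. This is a sketch under my own assumptions, not the packaged code, and `is_empty_sketch` is a made-up name.

```r
# Hypothetical re-implementation for illustration only;
# is_empty_sketch is not a genejam function.
is_empty_sketch <- function(x) {
  # TRUE where an entry is NA or a zero-length string
  is.na(x) | x %in% ""
}

is_empty_sketch(c("APOE", "", NA))
# FALSE TRUE TRUE
```

Using `%in%` instead of `==` keeps `NA` from propagating into the result.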
Version 0.0.13.900
new functions
These two new functions are convenience functions. I often find myself wanting the gene symbol and long gene name, so now freshenGenes2() does that by default. To add gene aliases, use freshenGenes3().
- `freshenGenes2()` is a simple extension to `freshenGenes()` that has `"SYMBOL", "GENENAME"` in the output by default.
- `freshenGenes3()` is a simple extension to `freshenGenes()` that has `"SYMBOL", "GENENAME", "ALIAS"` in the output by default.
Version 0.0.12.900
Changes to existing functions
- `get_anno_db()` logic to check for reciprocal annotation names was updated to cover more scenarios. Specifically, `"org.Hs.egUNIPROT2EG"` is now properly recognized; previously it was not recognized by the reciprocal `"org.Hs.egUNIPROT"` and was therefore skipped.
Version 0.0.11.900
Changes to existing functions
- To prepare for a wider release, I decided to rename (!) some arguments, to have snake_case instead of camelCase for consistency. I heard myself complaining about my own package, “Why are some arguments camelCase and others are snake_case? Pick one!” I complain with a smile on my face, but still it’s a fair point.
  - `finalList` is now `final`
  - `tryList` is now `try_list`
  - `annLib` is now `ann_lib`
- I suppose I should probably rename `freshenGenes()` to `freshen_genes()`.
Changes to existing functions
- `get_anno_db()` new argument `ignore.case` which will build an environment where all keys are converted to lowercase. Ultimately, this option incurs the lowest performance hit, since the keys only need to be converted once, then the environment can be used repeatedly with native `mget()` functions.
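The convert-once idea can be sketched in base R as follows. This is only an illustration: the toy `anno` list and its `"id_1"`/`"id_2"` values are placeholders, not real annotation data, and the real function works on Bioconductor annotation objects that this sketch does not load.

```r
# Toy annotation as a named list; the values are placeholders.
anno <- list(Gopc = "id_1", Fblim1 = "id_2")

# Convert the keys to lowercase once and store the values in an
# environment. Keys that collide after folding would overwrite each
# other here; this sketch does not handle that case.
lower_env <- list2env(
  stats::setNames(anno, tolower(names(anno))),
  envir = new.env(parent = emptyenv())
)

# Repeated queries reuse lower_env through plain mget(); only the query
# itself is lowercased each time.
query <- c("GOPC", "fblim1", "unknown")
hits <- mget(tolower(query), envir = lower_env, ifnotfound = NA)
```

The result `hits` is a list named by the lowercase query, with `NA` for keys that were not found.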
New function
- `imget()` case-insensitive `mget()` – however once I tested it, I realized this mechanism is fairly slow when using a fairly large annotation object. Also, if querying the same data multiple times using `imget()`, there is no re-use and the cost is incurred each operation – very much not ideal. This function will likely be retired soon.
Version 0.0.10.900
New functions
- `better_exists()` and `better_get()` which are (not so humbly) improved versions of `base::exists()` and `base::get()` respectively. Their sole benefit is to recognize a package prefix in an object name, so things like `better_exists("base::get")` will return TRUE since that object does exist; and subsequently `better_get("base::get")` will return that object. These functions are mostly useful when using annotation package prefixes such as `"KEGG.db::KEGGPATHID2NAME"`. I needed a simple way to test if it exists. `better_exists()` also allows multiple input values.
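One way to recognize a package prefix in base R is sketched below. The `*_sketch` helpers and `split_prefix()` are hypothetical names for illustration, and the packaged functions may well work differently.

```r
# Hypothetical helpers for illustration; none of these are genejam functions.

# Split "pkg::obj" into its parts; names without a prefix have pkg = NULL.
split_prefix <- function(name) {
  parts <- strsplit(name, "::", fixed = TRUE)[[1]]
  if (length(parts) == 2) {
    list(pkg = parts[1], obj = parts[2])
  } else {
    list(pkg = NULL, obj = name)
  }
}

# Vectorized existence test that understands a package prefix.
# Note asNamespace() also sees non-exported objects, unlike `::`.
exists_sketch <- function(names) {
  vapply(names, function(name) {
    p <- split_prefix(name)
    if (is.null(p$pkg)) {
      return(exists(p$obj))
    }
    requireNamespace(p$pkg, quietly = TRUE) &&
      exists(p$obj, envir = asNamespace(p$pkg), inherits = FALSE)
  }, logical(1))
}

# Retrieve a single object, with or without a package prefix.
get_sketch <- function(name) {
  p <- split_prefix(name)
  if (is.null(p$pkg)) {
    get(p$obj)
  } else {
    get(p$obj, envir = asNamespace(p$pkg), inherits = FALSE)
  }
}

exists_sketch(c("base::get", "base::no_such_object_xyz"))
identical(get_sketch("base::get"), base::get)
```

The first call returns a named logical vector, one entry per input name.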
Changes to existing functions
- `get_anno_db()` now calls `better_exists()` and `better_get()` which allows using a package prefix with annotation names.
- `get_anno_db()` argument `"revmap_suffix"` now allows multiple possible values; it cycles through each until it finds a match, otherwise returns `NULL`. Some annotations use suffix `"2ENTREZID"` instead of `"2EG"`, and still others use `"2NAME"`. I’m sure there will be others.
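The suffix cycling could look roughly like this. It is a sketch, not the packaged logic: `find_reverse_name()` is a hypothetical helper, and `available` is a stand-in character vector instead of a real check for annotation objects.

```r
# Hypothetical helper: try each suffix in order and return the first
# candidate name that is available, otherwise NULL.
find_reverse_name <- function(base_name, suffixes, available) {
  for (suffix in suffixes) {
    candidate <- paste0(base_name, suffix)
    if (candidate %in% available) {
      return(candidate)
    }
  }
  NULL
}

# Stand-in for the set of annotation names that exist.
available <- c("org.Hs.egALIAS2EG", "org.Hs.egUNIPROT2EG")
suffixes <- c("2EG", "2ENTREZID", "2NAME")

find_reverse_name("org.Hs.egALIAS", suffixes, available)
find_reverse_name("org.Hs.egNOSUCH", suffixes, available)
```

The first call finds the `"2EG"` form; the second returns `NULL` because no suffix produces an available name.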
Version 0.0.9.900
Changes to existing functions
- `freshenGenes()` new argument `handle_multiple="best_each"` which returns the best first try for each delimited entry in each input row. For example `c("APOE","APOA")` will match `"APOE"` as an authoritative gene symbol, but `"APOA"` is matched as an alias to the new gene symbol `"LPA"`. The output will be `"APOE,LPA"`. Note that output will contain unique entries delimited, but they will not be sorted.
Version 0.0.8.900
The next version will have handle_multiple="best_each" which will find the best match for each entry in a set of delimited gene symbols. Most useful for something like pathway enrichment results, where the goal is to retain all possible genes, yet each gene may require a different type of annotation to find a match. See TODO.md for details.
Updates to existing functions
- `freshenGenes()` includes a new example showing how to recognize Affymetrix probesets by using a custom search library.
- `freshenGenes()` handles multiple annotation libraries, mostly in the form of fully described annotation names, such as `"org.Hs.egSYMBOL"` and `"hgu133plus2ENTREZID"`.
- `freshenGenes()` new option `empty_rule="na"` which will replace empty entries with `NA`. Other options: `empty_rule="blank"` replaces with `""`, and `empty_rule="original"` replaces empty entries in the first output column with the original entry in the first input column.
- `get_anno_db()` now returns `NULL` when an annotation is not found, instead of throwing an error. This change allows the calling function to skip a missing annotation gracefully without using `tryCatch()` to catch the error.
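The three `empty_rule` options described above can be sketched on plain character vectors. `apply_empty_rule()` is a hypothetical helper, the order of its choices is arbitrary and does not reflect the package default, and `"not_a_gene"` is a made-up input.

```r
# Hypothetical helper; apply_empty_rule is not a genejam function.
apply_empty_rule <- function(output, original,
                             empty_rule = c("original", "blank", "na")) {
  empty_rule <- match.arg(empty_rule)
  # Treat NA and "" as empty
  empty <- is.na(output) | output %in% ""
  output[empty] <- switch(empty_rule,
    original = original[empty],  # fall back to the input entry
    blank = "",
    na = NA_character_
  )
  output
}

input <- c("APOE", "not_a_gene", "LPA")
matched <- c("APOE", "", "LPA")

apply_empty_rule(matched, input, "original")
apply_empty_rule(matched, input, "blank")
apply_empty_rule(matched, input, "na")
```

Only the second entry changes in each call, since it is the only empty one.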
Version 0.0.7.900
Updates to existing functions
- `freshenGenes()` now properly ignores NA values without throwing an exception. NA values are left as-is and returned as NA in the final output.
Version 0.0.6.900
Updates to existing functions
- `freshenGenes()` new argument `protect_inline_sep` helps to prevent splitting single values that may include the same `sep` character, for example not splitting `"H4 clustered histone 10, pseudogene"` into `"H4 clustered histone 10"` and `"pseudogene"`. Also, the handling of `finalList` uses `sep` as the `split` since that `sep` is known to have been used in creating the intermediate values, therefore it should be consistent in the final step. This subtle change helps allow a more general split pattern in the first step, such as `"[, ]+"` which splits at comma and/or space, without splitting at spaces in subsequent steps.
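A small base-R illustration of the second point, why later steps split on `sep` instead of the general first-step pattern. It does not show how `protect_inline_sep` itself is implemented, and the variable names are my own.

```r
# Two multi-word gene names joined with sep = ",", as an intermediate
# step might produce them.
sep <- ","
joined <- paste(c("apolipoprotein E", "lipoprotein(a)"), collapse = sep)

# Splitting on sep recovers both names intact.
by_sep <- strsplit(joined, sep, fixed = TRUE)[[1]]

# Reusing the general first-step pattern also splits at the space,
# breaking the first name into two pieces.
by_pattern <- strsplit(joined, "[, ]+")[[1]]
```

`by_sep` has two entries and `by_pattern` has three, which is the inconsistency the change avoids.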
Version 0.0.5.900
Updates to existing functions
- `get_anno_db()` was updated to handle reverse-map annotations, for example requesting `org.Hs.egALIAS` and deriving it from `org.Hs.egALIAS2EG` using `AnnotationDbi::revmap()`.
Version 0.0.3.900
Updates to existing functions
- Force input `x` to character in `freshenGenes()`.
Version 0.0.2.900
Note that genejam requires one Bioconductor annotation package, usually org.Hs.eg.db but can be any valid organism, such as org.Mm.eg.db for mouse, or org.Rn.eg.db for rat.
Bug fixes
- Attempt to fix rare cases of NA values in `mget()` by using `jamba::rmNA()`.