sort alphanumeric values keeping numeric values in proper order
Source: R/jamba-mixedSort.R
Usage
mixedSort(
   x,
   blanksFirst = TRUE,
   na.last = NAlast,
   keepNegative = FALSE,
   keepInfinite = FALSE,
   keepDecimal = FALSE,
   ignore.case = TRUE,
   useCaseTiebreak = TRUE,
   honorFactor = FALSE,
   sortByName = FALSE,
   verbose = FALSE,
   NAlast = TRUE,
   ...
)
Arguments
- x
vector
- blanksFirst
logical whether to order blank entries before entries containing a value.
- na.last
logical indicating whether to move NA entries to the end of the sort.
- keepNegative
logical whether to keep '-' associated with adjacent numeric values, in order to sort them as negative values.
- keepInfinite
logical whether to allow "Inf" to be considered a numeric infinite value.
- keepDecimal
logical whether to keep the decimal in numbers, sorting as a true number and not as a version number. By default keepDecimal=FALSE, which means "v1.200" is ordered after "v1.30", because the version-style comparison is 200 versus 30. When keepDecimal=TRUE, the numeric sort considers only "1.2" and "1.3" and sorts in that order.
- ignore.case
logical whether to ignore uppercase and lowercase characters when defining the sort order. Note that when x is factor the factor levels are converted using unique(toupper(levels(x))), therefore the values in x will be sorted by factor level.
- useCaseTiebreak
logical indicating whether to break ties when ignore.case=TRUE, using mixed case as a tiebreaker.
- honorFactor
logical, default FALSE, indicating whether to honor factor level order in the output; when FALSE the input is sorted as character.
- sortByName
logical whether to sort the vector x by names(x) instead of sorting by x itself.
- verbose
logical whether to print verbose output.
- NAlast
logical deprecated in favor of argument na.last for consistency with base::sort().
- ...
additional parameters are sent to mixedOrder().
Value
vector of values from argument x, ordered by
mixedOrder(). The output class should match class(x).
Details
This function is a refactor of gtools mixedsort(), a clever bit of
R coding from the gtools package. It was extended to make it slightly
faster, and to handle special cases slightly differently.
It was driven by the need to sort gene symbols, miRNA symbols, chromosome
names, all with proper numeric order, for example:
- test set:
miR-12,miR-1,miR-122,miR-1b,mir-1a
- gtools::mixedsort:
miR-122,miR-12,miR-1,miR-1a,mir-1b
- mixedSort:
miR-1,miR-1a,miR-1b,miR-12,miR-122
The function does not by default recognize negative numbers as negative;
instead it treats '-' as a delimiter, unless keepNegative=TRUE.
When keepDecimal=TRUE, this function also attempts to maintain '.' as part of a decimal number, which can be problematic when sorting IP addresses, for example.
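As a rough illustration of why the treatment of '-' and '.' matters, the base R sketch below reads the same strings both ways. The regular expressions are assumptions made for this example and are not the patterns used inside mixedOrder(); they only show how each reading changes the numeric order.

```r
# Illustration only: these patterns are assumptions, not jamba internals.
x <- c("miR-10", "miR-2", "miR-1")

# Reading 1: '-' is a delimiter, so the numeric chunks are 10, 2, 1
asDelimiter <- as.numeric(unlist(regmatches(x, gregexpr("[0-9]+", x))))

# Reading 2: '-' belongs to the number, so the chunks are -10, -2, -1
asNegative <- as.numeric(unlist(regmatches(x, gregexpr("-[0-9]+", x))))

x[order(asDelimiter)]   # "miR-1" "miR-2" "miR-10"
x[order(asNegative)]    # "miR-10" "miR-2" "miR-1"

# The same ambiguity applies to '.': version-style chunks versus a decimal
v <- c("v1.200", "v1.30")
minorVersion <- as.numeric(sub("^v[0-9]+[.]", "", v))   # 200, 30
asDecimal <- as.numeric(sub("^v", "", v))               # 1.2, 1.3

v[order(minorVersion)]  # "v1.30" "v1.200"
v[order(asDecimal)]     # "v1.200" "v1.30"
```

For identifiers such as miRNA names the delimiter reading is usually the intended one, which is consistent with keepNegative=FALSE being the default.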
This function is really just a wrapper function for mixedOrder(),
which does the work of defining the appropriate order.
The sort logic is roughly as follows:
- Split each term into alternating chunks containing character or numeric substrings, split across columns in a matrix.
- Apply appropriate ignore.case logic to the character substrings, effectively applying toupper() on substrings.
- Define rank order of character substrings in each matrix column, maintaining ties to be resolved in subsequent columns.
- Convert character to numeric ranks via factor intermediate, defined higher than the highest numeric substring value.
- When ignore.case=TRUE and useCaseTiebreak=TRUE, an additional tiebreaker column is defined using the character substring values without applying toupper().
- A final tiebreaker column is the input string itself, with toupper() applied when ignore.case=TRUE.
- Apply order across all substring columns.
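The steps above can be sketched in base R as follows. This is a simplified approximation, not the jamba implementation: sketchMixedOrder() is a hypothetical helper name, it assumes a character vector without NA values, it ignores keepNegative, keepInfinite, keepDecimal and factor handling, and it collapses the two tiebreaker columns into whole-string comparisons.

```r
# Hypothetical helper for illustration; this is NOT the jamba implementation.
# Assumes a character vector without NA values.
sketchMixedOrder <- function(x, ignore.case=TRUE, useCaseTiebreak=TRUE) {
   x <- as.character(x)
   if (length(x) == 0) {
      return(integer(0))
   }
   key <- if (ignore.case) toupper(x) else x

   # Split each string into alternating digit and non-digit chunks,
   # then pad with NA into a matrix with one row per input string.
   chunks <- regmatches(key, gregexpr("[0-9]+|[^0-9]+", key))
   nChunk <- max(lengths(chunks), 1)
   padded <- lapply(chunks, function(i) {
      c(i, rep(NA_character_, nChunk - length(i)))
   })
   chunkMatrix <- matrix(unlist(padded), ncol=nChunk, byrow=TRUE)

   # Convert each matrix column to a numeric sort key.
   keyColumns <- lapply(seq_len(nChunk), function(j) {
      chunk <- chunkMatrix[, j]
      isNum <- !is.na(chunk) & grepl("^[0-9]+$", chunk)
      isChr <- !is.na(chunk) & !isNum
      colKey <- rep(-1, length(chunk))           # missing chunks sort first
      colKey[isNum] <- as.numeric(chunk[isNum])  # digits keep their value
      # Character chunks are ranked via a factor, offset above the
      # highest numeric value found in the same column.
      chrLevels <- sort(unique(chunk[isChr]), method="radix")
      chrRank <- as.integer(factor(chunk[isChr], levels=chrLevels))
      colKey[isChr] <- max(c(0, colKey[isNum])) + chrRank
      colKey
   })

   # Tiebreakers: the case-folded string, then optionally the original case.
   tiebreaks <- list(key)
   if (ignore.case && useCaseTiebreak) {
      tiebreaks <- c(tiebreaks, list(x))
   }
   do.call(order, c(keyColumns, tiebreaks, list(method="radix")))
}

x <- c("miR-12", "miR-1", "miR-122", "miR-1b", "miR-1a", "miR-2")
x[sketchMixedOrder(x)]
# "miR-1" "miR-1a" "miR-1b" "miR-2" "miR-12" "miR-122"
```

The sketch uses method="radix" so that character comparisons follow the C locale and do not depend on the session collation; the real function may resolve ties differently in edge cases.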
Therefore, some expected behaviors:
- When ignore.case=TRUE and useCaseTiebreak=TRUE (default for both), the input data is ordered without regard to case, then the tiebreaker applies case-specific sort criteria to the final product. This logic is very close to default sort() except for the handling of internal numeric values inside each string.
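The case tiebreak on its own can be approximated with base order(), using the upper-case string as the primary key and the original string as the secondary key. This is a minimal sketch of the idea only: it deliberately omits the numeric handling, so "Cnot10" sorts before "Cnot8" here, unlike mixedSort().

```r
x <- c("Cnot9", "Cnot8", "CNOT9", "Cnot10")

# Primary key ignores case; the original strings break ties.
# method="radix" compares character keys in the C locale, so the
# result does not depend on the session locale.
withTiebreak <- x[order(toupper(x), x, method="radix")]
withTiebreak
# "Cnot10" "Cnot8" "CNOT9" "Cnot9"

# Without the secondary key, tied entries keep their input order.
noTiebreak <- x[order(toupper(x), method="radix")]
noTiebreak
# "Cnot10" "Cnot8" "Cnot9" "CNOT9"
```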
See also
Other jam sort functions:
mixedOrder(),
mixedSortDF(),
mixedSorts(),
mmixedOrder()
Examples
x <- c("miR-12","miR-1","miR-122","miR-1b", "miR-1a", "miR-2");
sort(x);
#> [1] "miR-1" "miR-12" "miR-122" "miR-1a" "miR-1b" "miR-2"
mixedSort(x);
#> [1] "miR-1" "miR-1a" "miR-1b" "miR-2" "miR-12" "miR-122"
# test honorFactor
mixedSort(factor(c("Cnot9", "Cnot8", "Cnot10")))
#> [1] Cnot8 Cnot9 Cnot10
#> Levels: Cnot10 Cnot8 Cnot9
mixedSort(factor(c("Cnot9", "Cnot8", "Cnot10")), honorFactor=TRUE)
#> [1] Cnot10 Cnot8 Cnot9
#> Levels: Cnot10 Cnot8 Cnot9
# test ignore.case
mixedSort(factor(c("Cnot9", "Cnot8", "CNOT9", "Cnot10")))
#> [1] Cnot8 CNOT9 Cnot9 Cnot10
#> Levels: CNOT9 Cnot10 Cnot8 Cnot9
mixedSort(factor(c("CNOT9", "Cnot8", "Cnot9", "Cnot10")))
#> [1] Cnot8 CNOT9 Cnot9 Cnot10
#> Levels: CNOT9 Cnot10 Cnot8 Cnot9
mixedSort(factor(c("Cnot9", "Cnot8", "CNOT9", "Cnot10")), ignore.case=FALSE)
#> [1] CNOT9 Cnot8 Cnot9 Cnot10
#> Levels: CNOT9 Cnot10 Cnot8 Cnot9
mixedSort(factor(c("Cnot9", "Cnot8", "CNOT9", "Cnot10")), ignore.case=TRUE)
#> [1] Cnot8 CNOT9 Cnot9 Cnot10
#> Levels: CNOT9 Cnot10 Cnot8 Cnot9
mixedSort(factor(c("Cnot9", "Cnot8", "CNOT9", "Cnot10")), useCaseTiebreak=TRUE)
#> [1] Cnot8 CNOT9 Cnot9 Cnot10
#> Levels: CNOT9 Cnot10 Cnot8 Cnot9
mixedSort(factor(c("CNOT9", "Cnot8", "Cnot9", "Cnot10")), useCaseTiebreak=FALSE)
#> [1] Cnot8 CNOT9 Cnot9 Cnot10
#> Levels: CNOT9 Cnot10 Cnot8 Cnot9