Changelog
Source:NEWS.md
jamses 0.0.69.900
- Bumped version for some jam packages in Imports: jamba, colorjam, multienrichjam
- Bumped version for some jam packages in Suggests: genejam, venndir
bug fixes
-
heatmap_se()
was not properly definingrowData_se
causingrowData_colnames
to be ignored. The bug was that although"rowRanges"
is defined as a slot name,"rowData"
is not. In either case the correct approach is to use accessor functionsrowRanges
androwData()
which handle whatever slot names are in use for the given object class.
new functions
-
se_to_rowcoldata()
- captures logic to definerowData_se
andcolData_se
from a number of possible input classes, removing this logic fromheatmap_se()
.
changes to existing functions
-
- now calls
se_to_rowcoldata()
to definerowData_se
andcolData_se
. - Examples now include
rowData_colnames
to confirm proper output. - Now accepts
matrix
anddata.frame
input, by converting to a simpleSummarizedExperiment
object. - When data contains only one column it defines
centerby_colnames=FALSE
to skip the data centering, otherwise all data becomes zero.
- now calls
-
- Improved help docs.
- New argument
hits_use_lfc=FALSE
which can optionally use log2 fold change values instead of directional valuesc(-1, 0, 1)
. When enabled, the actual log2 fold change is multiplied by the absolute value of the hit column, resulting in the log2 fold change only for entries that met other statistical criteria.
jamses 0.0.68.900
changes to existing functions
-
- Added preliminary support for
SingleCellExperiment
andSeurat
objects. Initial support recognizes the data exactly as provided. Future options may involve filtering, transformation, etc. - Input now uses
values(rowRanges())
when there is no slot name"rowData"
. - If either
rowData()
orcolData()
are empty (0 columns), it defines the rownames/colnames in “row”/“column”, respectively.
- Added preliminary support for
jamses 0.0.67.900
jamses 0.0.66.900
changes to existing functions
-
- Preferred over
shrink_df()
which will be retired or converted to a wrapper toshrinkDataFrame()
. - Calls internal
shrink_matrix()
to avoidsplicejam
as a dependency. - Tests have been added to cover the variations.
- Preferred over
-
- By default it calls
shrinkDataFrame()
unlessuse_new_method=TRUE
. (It is remarkably slower.) It may be updated over time to use more efficient and modern R methods.
- By default it calls
list2im_opt()
default changed todo_sparse=FALSE
-
- Removed
require(SummarizedExperiment)
, added package prefixes. - Added examples with
heatmap_column_group_labels()
,apply_hm_column_title=TRUE
andhm_title_buffer=3
.
- Removed
new functions
-
shrink_matrix()
- copied fromsplicejam::shrinkMatrix()
with improved default numeric function.
jamses 0.0.65.900
bug fixes
-
- Fixed bug causing mismatch in
sheetName
when only a subset ofsheetName
values were too long for the permitted Excel worksheet name.
- Fixed bug causing mismatch in
new functions
-
shortest_unique_abbreviation()
- New function intended to provide shortest unique abbreviation for a
character
vector. - A neat feature enabled by default is that it will retain trailing numeric values without splitting the number, if the number is required to make the abbreviated vector unique.
- This function can be enabled within other functions:
contrast2comps(..., abbreviate=TRUE)
andsave_sestats(..., abbreviate=TRUE)
. It is not enabled by default in either of these functions.
- New function intended to provide shortest unique abbreviation for a
changes to existing functions
-
contrast2comps()
,contrasts_to_factors()
, andsave_sestats()
new argumentabbreviate=FALSE
:- Optionally calls
shortest_unique_abbreviation()
to abbreviate factor levels within each contrast. - It will attempt to create unique abbreviations for each factor independently.
- If any abbreviation is not unique, or if any factor level across multiple columns is not unique, it will create global abbreviations so that each abbreviated value represents a defined factor level.
- It should also properly handle non-standard contrasts, with each factor in order, using the appropriate abbreviation in each column.
- Optionally calls
jamses 0.0.64.900
bug fixes
-
- Apparently some vicious RMarkdown corner case causes
options("warn"=2)
which causes all warnings to become fatal errors. As a result, harmlessgsub()
used whenhm_body_base
is auto-detected - resuling in multiple potential values. The first value is used, but presence of multiple values causesgsub()
to issue a warning, causing an error. It needed to be fixed anyway, and now it is. Butwarn=2
is evil, haha.
- Apparently some vicious RMarkdown corner case causes
jamses 0.0.63.900
bug fixes
-
se_contrast_stats()
was not maintaining correct order of data matrix for a specific scenario, whenhandle_na
was not"none"
, and when the order of groups in thedesign
matrix was non-standard, that is with the first group not in the first column and first row. It obviously should not matter the order of groups, but this scenario limits when the bug occurred, which is somewhat comforting.- The cause was
handle_na_values()
which converted the design matrix tolist
in order to associated colnames to groups. Thelist
was converted to vector, sometimes changing the order of columns returned. - The bug is fixed in
handle_na_values()
by returning columns in the same order they were provided at input.
- The cause was
jamses 0.0.62.900
changes to existing functions
-
- New argument
data_content=c("data", "hits")
changes default behavior to include a"hit sheet"
in the output worksheets. This hit sheet includes all the “hit” columns from each contrast in one large table. For previous behavior usedata_content="data"
. - New argument value
type="list"
will not save to file, instead returns alist
ofdata.frames
named by worksheet name. Default is the same as beforetype="xlsx"
will export to file in Excel xlsx format. - New arguments
hitRule
,hitFormat
,freezePaneColumn
merely provide some previously internal defaults, for optional customization.
- New argument
jamses 0.0.61.900
jamses 0.0.60.900
Bug fixes
-
- When
normgroup
andblock
were both defined, but onenormgroup
did not have multiple values forblock
, it threw an error. In this case, when there is only one unique value forblock
within anormgroup
it is set toNULL
for thatnormgroup
so it is not use inlimma::lmFit()
. -
handle_na="full1"
was not performing as intended, caused when migrating away frommatrixStats::rowMins(x)
toapply(x, 1, min)
, the logic was no longer being applied as described.
- When
jamses 0.0.59.900
Bug fixes
-
se_detected_rows()
was not properly applyingdetect_minpct
for the minimum percentage of replicates per group, impacting cases where groups each differed in size. Bug has been resolved.
Changes to existing functions
-
-
Arguments
block
andnormgroup
both allow three types of input, which were modified slightly to reduce the risk of values being out of sync.- Unnamed
character
vector, must be equal tolength(isamples)
, or whenisamples
is not provided,ncol(se)
. In this case,isamples
orcolnames(se)
are assigned as names. This option prints an alter message suggesting that values be provided with names. However, values are named byisamples
before any other subsetting is performed relative toisamples
. - Named
character
vector, whose names matchcolnames(se)
andisamples
. -
character
vector matchingcolnames(colData(se))
to use one or more annotation columns, which also automatically usesisamples
as appropriate, to maintain synchrony.
- Unnamed
Note that when using
normgroup
and/orblock
the intermediate processing can be reviewed by includingverbose=TRUE
. The output will also print eachcorrelation
value for eachnormgroup
, whenblock
is also provided.
-
jamses 0.0.58.900
Bug fixes
-
- Caused an error when using
normgroup
andblock
together, caused by theblock
not being subset to matchisamples
. Fortunately the bug would always cause failure, since it did not return an incorrect statistical result. -
block
is expected to equallength(isamples)
, and will be named as such so it can also be subset within eachnormgroup
. - when
names(block)
are defined, they are validated to matchisamples
, and will be subset and re-ordered to matchisamples
as needed. - when
names(block)
is not defined, its length must equallength(isamples)
, and thennames(block)
will be defined usingisamples
. - in each
normgroup
theblock
is only applied when there is more than one unique value for the relevantblock[isamples]
.
- Caused an error when using
change to existing functions
-
- Arguments
normgroup
andblock
both acceptcolnames(colData(se))
, or a vector of values; however when supplying a vector of values, they must now be named usingisamples
- it no longer assumes they are the same order asisamples
. Thenames(block)
are used to confirm absolute synchrony.
- Arguments
jamses 0.0.56.900
- Added
edgeR
to Suggests, in order to enable new normalization methods"TMM"
,"TMMwsp"
, and"RLE"
.
Bug fixes
-
se_detected_rows()
- fixed missing package prefix withjamba::tcount()
. - Added more package prefixes, most of which are used only with
verbose=TRUE
.
changes to existing functions
-
se_normalize()
,matrix_normalize()
gained new normalization methods:-
edgeR::calcNormFactors()
for methods:"TMM"
,"TMMwsp"
,"RLE"
- the methods perform very similarly to
jamma::jammanorm()
, log-ratio normalization, with some subtle differences in how they filter the features to be used during normalization. TMM,TMMwsp perform nearly identical to each other. RLE also performs more closely to both TMM than to jammanorm. - Ultimately this effort was not time well spent, the new methods appear to work well when data is high quality, but appear to produce questionable output when data is not consistent across samples. Meanwhile,
"jammanorm"
appears to work consistently, at least producing output that can be reasonably explained. - Data should be reviewed with
jamma::jammaplot()
MA-plots to visualize whether assumptions are met and are valid. For example, when the distribution of signal is inconsistent across samples, it appears as non-horizontal signal on MA-plots, and ultimately either the signal should be filtered prior to downstream statistics (so that the horizontal range of signal is analyzed) or another normalization method should be used in order to adjust the signal to horizontal, for example"quantile"
normalization is intended for such purposes. In future, “LOESS” normalization may also fit this purpose.
- the methods perform very similarly to
-
-
- New argument
output_method_prefix
to customize the method name used to formulate the outputassay_name
. - New argument
output_assay_names
to customize the fullassay_name
used for each output. When supplied, it overrides New argument. -
mcols(assays(se))
is now populated with aS4Vectors::DataFrame
containing thenormalization_method
,source_assay_name
, and associatedparams
converted to column values. Currently experimental, but enabled by default to test the utility. - Converted
stop()
tocli::cli_abort()
for improved message handling.
- New argument
-
- New argument
group_line_requires_label=TRUE
will hide the group line when the associated label contains only whitespace or isNA
. This defines new behavior, although for a rare scenario where there exists an empty (whitespace) group label.
- New argument
jamses 0.0.55.900
-
-
floor_min
is now appropriately applied beforehandle_na
specifically for the case where values may be filtered toNA
then handled byhandle_na
using further logic.
-
-
- new argument
use_gridtext=TRUE
which by default enables markdown and limited HTML output, such as superscripts and subscripts. Implemented to render 37 degrees C using superscripto
.
- new argument
-
- subtle change to limit
rowData_colnames
to those values present incolnames(rowData(se))
. - new argument
hm_name
which defaults to the same legend title, derived fromdata_type
. However, the heatmapname
appears in grid graphical componentsComplexHeatmap::list_components()
and sometimes needs to be specified in order for multiple heatmaps to have unique grid names. - new argument
hm_title
to provide a completely custom heatmap title. This argument is useful especially when usingapply_hm_column_title=TRUE
.
- subtle change to limit
jamses 0.0.54.900
updates to existing functions
-
- when
group_buffer
is adjusted, it also adjusts placement of replicate labels"n=3"
proportionally inside the group box. - new argument
arrow_ex
whereNULL
is adjusted based upon the number of contrast bumps per axis, so more parallel contrasts would cause proportionally smaller arrow sizes. - New argument
contrast_depths
to filter contrasts based upon depth: 1=oneway, 2=twoway. More convenient than filtering contrasts upfront.
- when
-
- Names are now user-friendly to indicate the contents of each Venn set list.
jamses 0.0.53.900
updates to existing functions
-
- new argument:
column_title
mainly used for convenience alongside the next new argument: - new argument:
apply_hm_column_title=FALSE
will optionally hide any existingcolumn_title
and instead applies the heatmap titlehm_title
to thecolumn_title
. This option is convenient when adding two heatmaps together, since it ensures each heatmap has its own individual title. The downside is that the usualcolumn_title
defined bycolumn_split
will not be shown. - new argument:
hm_title_buffer
adds an optional number of whitespace lines to the end of the heatmap titlehm_title
. This whitespace is useful when callingheatmap_column_group_labels()
which draws additional text above the heatmap.
- new argument:
-
- arguments
hm_title_base
,hm_body_base
are now auto-detected usingdetect_heatmap_components()
by default. - when there is no
global_column_title
in the drawn heatmap, it means the heatmap itself did not havecolumn_title
, and therefore the output of this function is shifted down so it appears on the bottom of thecolumn_title
elements. This behavior is intended when using the newheatmap_se()
argumentapply_hm_column_title=TRUE
to draw the heatmap title in thecolumn_title
region. It is recommended to usehm_title_buffer=4
to add sufficient whitespace below the column title to allow room for the column group labels. In principle, the column group labels should behave the same as before when there is ahm_title
included above the heatmap. And if bothglobal_column_title
andcolumn_title
elements are present, by default the labels from this function are placed above thecolumn_title
and below theglobal_column_title
. whew
- arguments
jamses 0.0.52.900
updates to existing functions
-
- fixed missing contrasts in certain designs
- added method to split “equivalent” subset of contrasts using
max_venn_size
which simply subdivides remaining contrasts into subsets when needed. - Removed splitting contrasts solely by depth, because it produced too much nonsense.
- Added example showing how to plot the resulting contrasts with
plot_sedesign()
, which might become a useful “you are here” type of schematic to accompany a Venn diagram. Ideally, thecontrast_colors
should match theset_colors
invenndir()
for consistency.
new functions
-
sub_split_vector()
is an internal function, used to split venn sets bymax_venn_size
, but in its own function because it may get some tweaks.
jamses 0.0.51.900
- added
cli
to package dependencies, to use messaging.
new functions
-
- simple function to convert contrasts to a
data.frame
where each column is an experimental factor, and in the case of contrasts, at least one column per row should represent a comparison of two factor levels.
- simple function to convert contrasts to a
-
- Convenience function to subset an overall set of contrasts to something meaningful to use in Venn diagrams. Mainly intended for RMarkdown use where the set of contrasts is not always known upfront.
- Fixed early issue where certain sets of Venn results were not created.
jamses 0.0.50.900
bug fixes
-
- Fixed error when plotting a twoway contrast and one of the oneway contrast components in the opposite orientation of the twoway contrast components. Only observed in unique cases with a subset of contrasts.
- Fixed errors ultimately caused by having duplicated
contrast_names
in theSEDesign
object. Updatedcontrast_names<-
method to throw an error with clear message.
new functions
-
- Experimental function intended to help reduce “all-vs-all” style contrasts to those with one specific control factor level, per design factor.
- Driven by
slicejam
design with 5 treatments, 3 genotypes, which produced 75 total oneway and twoway contrasts. Yike. Instead this function produces 30 total contrasts, only 22 oneway contrasts versus the baseline level per factor.
jamses 0.0.49.900
new function
-
Wrapper function to automate steps required for
SummarizedExperiment::rbind()
Confirms that no
rownames()
are shared acrossse_list
.Optionally edits
colnames()
for each object, so they can be compared.Determines shared
colnames()
, discarding other columns.Optionally subsets
colnames()
bycolnames_keep
.-
Adjusts
colData()
in one of two ways:-
colData_action="identical"
: keeps only colData columns which have identical values acrossse_list
, discarding other columns. This option is default, since it retains common annotations. -
colData_action="all"
: keeps all colData columns, using a delimiter between multiple values whenever the values differ acrossse_list
. This option is useful for detailed record-keeping, but the annotations are not likely to be useful otherwise.
-
finally it calls
SummarizedExperiment::rbind()
returning one validSummarizedExperiment
object.
jamses 0.0.48.900
silenced messages when loading jamses
-
design
,design<-
contrasts
,contrasts<-
- Previously when loading
jamses
, a message indicate that new generic methods were defined, which was caused either by creating methods for an existing function or S4 method that did not have the same signature. -
design
now imports and matches the arguments ofBiocGenerics::design
-
contrasts
now sets generic functions matchingstats::contrasts()
then adds S4 generics for objectSEDesign
. - Now, no more warning message when loading
jamses
!
- Previously when loading
jamses 0.0.47.900
updates to existing functions
-
-
New argument
max_correlation_rows
with a maximum number of rows to use when calculatinglimma::duplicateCorrelation()
.- This process is only relevant when
block
is defined, andcorrelation=NULL
. In this case,limma::lmFit() by default already calls
limma::duplicateCorrelation()to calculate
correlation. However, when there are more than 10k rows of data, the process is non-linearly slow (they said quadratic) although increasing number of rows does not substantially change the calculated
correlation`. - Therefore
max_correlation_rows=10000
limits the calculation to 10,000 rows by default, substantially improving speed from 10 minutes in 500k rows, down to 10 seconds.
- This process is only relevant when
-
New argument
voom_block_twostep=TRUE
which enables a change to previous behavior.- Only relevant when
use_voom=TRUE
andblock
is defined, butcorrelation
is not defined. - Previous behavior was to perform “one-step voom” which calculated weights without the blocking factor, then calculated correlation by
limma::duplicateCorrelation()
with Voom weights, and blocking factor. - Recommendation by
limma
authors is to run these steps twice, so the second time runningvoom()
will use theblock
andcorrelation
arguments to empower a better model fit, and more accurateweights
used subsequently bylmFit()
. This change should improve output, but will change previous results, only for count data with a blocking factor, which has been fairly rare. - Previous behavior can be re-enabled with
voom_block_twostep=FALSE
.
- Only relevant when
Hid some verbose output.
-
jamses 0.0.46.900
updates to existing functions
-
- new argument
sestats_style
to customize how stat hits are labaled. It allows including directionality, only the number of hits, or only the number by itself. - fixed error when any argument of
axis1,axis2,axis3,axis4
referred to a non-existent column. The error was not a bug, but was avoidable. Now the plot is drawn as if the corresponding argument wereNULL
.
- new argument
-
- new argument
contingency
to customize how two-way connectors are drawn, based upon how close the two contrasts are to each other. The “loop” option draws the connector to loop around the far side of contrast 2, which is effective at avoiding overlaps.
- new argument
-
-
sample_color_list
now allows more flexibility in defining colors, specifically allowing acharacter
vector, or single color to be used for all contrasts. - new arguments
C_min
andL_max
to complete the chroma (C) and luminance (L) numerical limits with min/max.
-
jamses 0.0.45.900
updates to existing functions
-
- So far, early testing is working quite well, minor aesthetic issues, such as unexpected ordering of axis labels when changing the axis order, since it followed the input order and did not re-sort based upon axis label values. Overlapping labels are already an issue. They can be manually adjusted, but it would be nice to have that part automated. Sadly, I don’t think ggrepel accepts rotated text.
- change: contrasts are now sorted by axis to organize factor levels
- change: axis labels are now sorted by combined factor order along x- and y-axes
- change: contrasts are sorted by axis length, so shorter contrasts have less “bump”
- fixed bug with missing contrast labels when subsetting the overall contrasts with argument
which_contrasts
- fixed handling of
colorset
to match one-waycontrast_names
then fill missing color assignments as needed. - arguments
group_color
,group_border
accept named vectors to define colors for each square drawn per experimental group.
-
draw_oneway_contrast()
,draw_twoway_contrast()
- default
label_font=2
for bold contrast labels
- default
-
plot_sedesign()
,draw_oneway_contrast()
,draw_twoway_contrast()
- changed argument name from
label_position
tooneway_position
and fromtwoway_label_position
totwoway_position
, before these functions are used
- changed argument name from
new functions
-
- Takes
sedesign
andsample_color_list
and blends group colors into contrasts based upon the groups in each contrast.
- Takes
-
- Convenience function to convert
SEDesign
object into the associateddata.frame
with one column per experimental design factor. - The
data.frame
rownames match thesamples(sedesign)
. - The colnames are matched with optional
se
object in order to re-usecolnames(colData(se))
and, if present, factor levels. Maintaining factor level order is useful for downstream applications. - This function will be called by
plot_sedesign()
and should be useful forplatjam::design2colors()
as well.
- Convenience function to convert
jamses 0.0.44.900
new functions
This release adds plot_sedesign()
and several helper functions. This function is still in development, released for more convenient testing for lots of edge cases. This function may be ported to use grid
R graphics, but currently uses base R graphics with fixed aspect ratio asp=1
to maintain proper visual proportions for block arrows, and group boxes.
-
- It takes an
sedesign
object and plots the one-way and two-way contrasts in block arrow format. - Block arrows are “bumped” to reduce directly overlapping arrows.
- The design factors can be repositioned along x- and y-axes, at the bottom or top, with combinations of factors, or absence.
- It indicates the
n=3
number of replicates observed per group, and draws shaded squares for groups present, and lack of square for groups which are absent in the design. - When
sestats
is also supplied, it adds labels with the number of statistical hits per contrast. - Labels can be positioned by one-way or two-way contrast type, or by specific contrast name.
- Two-way contrasts can optionally be “flipped” to show the equivalent two-way contrast with different group ordering.
- It takes an
draw_oneway_contrast()
,draw_twoway_contrast()
are both called byplot_sedesign()
, and internally callpoint_handedness()
andmake_block_arrow_polygon()
. These functions help make the process more component-oriented.make_block_arrow_polygon()
is a wrapper function to define block arrow polygons with consistent dimensions, where the arrow stem width also defines the relative size of the arrow head. When the stem has shorter length than the arrow head, the full arrow head is shown without the stem, avoiding the visual artifact of showing an inverted (negative length) arrow stem.point_handedness()
is a helper function which takes two points and a vector angle, then determines “which side” the first point is located along that vector angle relative to the second point. It is used for two-way contrasts to decide how to define intermediate control points between two parallel contrast arrows. The goal is to create an “S” shape, where control points are angled “toward each other”.-
point_slope_intercept()
is a simple geometry helper function.- Given a point and slope, it returns the intercept.
- However, when the slope is infinite (as with a vertical line) instead of returning an infinite intercept, which is not helpful for describing the line, it returns the x-axis intercept.
- The intercept is used by
point_handedness()
to determine the “handedness” of two parallel lines.
jamses 0.0.43.900
- Added
multienrichjam
to dependencies, since it is used byhandle_na_values()
. - Added
statmod
to"Suggests"
since this package is required when using a blocking factor withlimma
.
changes to existing functions
-
- The default heatmap title now also includes the number of columns, whose label is controlled by
column_type="samples"
by default. - When
row_type
orcolumn_type
areNULL
or""
(empty) their respective label is hidden from the heatmap title. When both are hidden, the first line will be hidden completely.
- The default heatmap title now also includes the number of columns, whose label is controlled by
bug fixes
-
- Weird edge case when calling
heatmap_se()
,sestats
contains only one contrast, and the hit matrix annotation contrast is in the form"Group1-Group2"
. The output is correct inside RStudio, the hit matrix name displays"Group1-Group2"
as intended. When the output is knitted inside an RMarkdown, the hit matrix name is displayed"1"
. - Issue was avoided with argument
rename_contrasts=FALSE
when callingheatmap_se()
. - User is using R-3.6.2, so the best guess is a weird edge case where
options(stringsAsFactors=TRUE)
. - The
contrast2comp()
was updated to imposestringsAsFactors=FALSE
andcheck.names=FALSE
in two steps that convertedmatrix
todata.frame
. - The
process_sestats_to_hitim()
was updated to check output ofcontrast2comp()
before assigning tocolnames()
.
- Weird edge case when calling
-
- Error was thrown due to data returned in
list
form instead of matrix form. Instead, optional weights are returned as an attribute, so the data is always consistently returned innumeric
matrix form. Frankly, it is unclear how this bug was tolerated except callingse_contrast_stats()
withhandle_na="none"
. - The argument
na_value
was not being passed tohandle_na_values()
which is important for certain types of data that have high background noise floor values, for example proteomics SpectrumMill has background around 20, also this is in log2 units.
- Error was thrown due to data returned in
changes to existing functions
-
- now properly passes
na_value
tohandle_na_values()
- now properly passes
-
- Nnow returns only
numeric
matrix, and optionally includes a weight matrix as an attribute named"weights"
.
- Nnow returns only
jamses 0.0.42.900
changes to existing functions
-
- changed argument default
noise_floor_value=NA
tonoise_floor_value=0
so that values are not lost during this process by default. Rare change to default argument, intended to be more consistent with “default” expectations of this function. - new argument
useMedian=FALSE
ensures the default behavior is to calculate row group mean and not median. - the
groupFunc()
is now called usingjamba::call_fn_ellipse()
which ensures that arguments are only passed when permitted by that function, either by named argument, or by...
. This change makes it easier to use a custom function which may not have to accept additional arguments.
- changed argument default
-
- new argument option
group_func_name="mean"
which enables basicmean()
across group values. - help docs were reworded to clarify the
group_func_name
anddata_transform
arguments.
- new argument option
jamses 0.0.41.900
new functions
-
- This function adds alternative labels at the top of a rendered
Heatmap
, sometimes a cleaner alternative than using colored annotation boxes. - The specific scenarios to use this function are described in the help docs, and this function may be updated to be more efficient in future.
- Although this function can only be applied after drawing a
Heatmap
, and requires some blank whitespace inside that heatmap. Nonetheless, this function does make remaining steps substantially more convenient than otherwise possible, and the output looks quite nice for a publication-quality final figure.
- This function adds alternative labels at the top of a rendered
-
- a helper function intended to be used with
heatmap_column_group_labels()
to identify thegrid
layout elements defined by the drawnHeatmap
in the current display device.
- a helper function intended to be used with
changes to existing functions
-
- The heatmap title was altered so the centering label uses the pattern
"centered within *centerby_colnames*, versus *control_label*"
- help docs were expanded and re-organized to improve clarity.
- The heatmap title was altered so the centering label uses the pattern
jamses 0.0.40.900
changes to existing functions
-
- MAJOR: default argument
handle_na="partial"
changed tohandle_na="full1"
. This change is done to “recover” all comparisons where one group has entirelyNA
values, by substituting one numeric value for one replicate in the group in order to avoid mis-representing the variability of the group. Meanwhile, groups withnumeric
andNA
values together will use onlynumeric
values in the group, without substituting anumeric
replacement. - New argument
rowData_colnames
to retain gene annotation columns forse
input where the rownames are not sufficient description of each row. This argument removes the need to merge additional annotation afterward. - Arguments are now fully documented.
- handles empty
assay_name
by returningNULL
for each empty assay. Recovers when multipleassay_name
values are supplied but some are missing. It does not recover when no assays are found. - Added minor
verbose
output indicating eachassay_name
.
- MAJOR: default argument
-
run_limma_replicate()
,ebayes2dfs()
- New argument
rowData_df
to support new argumentrowData_colnames
inse_contrast_stats()
. This argument merges annotations into the resulting top tables so that gene annotations can be retained per row.
- New argument
jamses 0.0.39.900
bug fixes
-
- visual glitch apparent in version R-4.2.3, the border color for color legend boxes was not black, so new argument
legend_border_color
was added, and is applied to all legends as relevant.
- visual glitch apparent in version R-4.2.3, the border color for color legend boxes was not black, so new argument
se_normalize()
- Fixed minor issue withif (class(x))
logic to avoid error when comparing multiple values in anif()
statement.
jamses 0.0.38.900
new functions
-
- convenience function to convert contrast names to corresponding
sedesign
object. - Its core utility is that it provides a way to test whether two-way or higher contrasts are mathematically equivalent.
- It may also be useful for test cases to produce a quick
sedesign
with only a specific set of contrasts. - In future
plot_sedesign()
will display the design with layout similar tovcd::mosaic()
, then will use block arrows to indicate contrasts across groups.
- convenience function to convert contrast names to corresponding
new methods
-
contrast_names()
- enables quickly retrieving the vector of contrasts in a
sedesign
object - also enables re-defining contrasts given a specific vector of contrast names.
- enables quickly retrieving the vector of contrasts in a
unit tests
-
contrast2comp()
andcomp2contrast()
are first. - will provide unit tests for all core functions.
jamses 0.0.37.900
changes to existing functions
-
- new argument
color_floor
passed tocolorjam::col_div_xf()
which optionally prevents coloring numeric values below the floor, therefore starts the color gradient at this value up tocolor_max
.
- new argument
jamses 0.0.36.900
bug fixes
-
-
row_split
passed as integer value whencluster_rows=FALSE
caused an error: “Error: Length or nrow ofrow_split
should be same as nrow ofmatrix
.” The error was caused because rows cannot be split into subgroups in the absence of a row clustering. - The same error for
column_split
was also present, and was fixed. - Resolution is that
row_split
orcolumn_split
is converted toNULL
when there is no respective row or column dendrogram or clustering function.
-
changes to existing functions
-
- Default format for mgm and mean is “num” instead of “int”, which assumes log2-scaled values that are generally at or less than 20.
-
- the default gap between row annotations is defined by the ComplexHeatmap option
"ROW_ANNO_PADDING"
, so the gap between heatmap and annotation is the same as the gap between two annotations. In future the gap width might become a separate option but not just yet.
- the default gap between row annotations is defined by the ComplexHeatmap option
jamses 0.0.35.900
changes to existing functions
Dependency on jamma
bumped to version 0.0.28.900
for new arguments to naControlAction
, see below.
-
New arguments for data centering:
controlFloor
,naControlAction
,naControlFloor
controlFloor
is anumeric
value for optional noise floor used for control groups during centering. If the control group mean/median value is belowcontrolFloor
, it is set tocontrolFloor
to define a minimum baseline for centering. It is mostly useful when the effective noise floor for a platform is above zero.naControlAction
andnaControlFloor
handle the specific case where allcontrolSamples
haveNA
values for a particular row. By default, all centered values would becomeNA
thereby making platform measurements disappear from the heatmap. Sometimes it is preferable to define another reference so the non-NA values can be indicated in the heatmap.See
jamma::centerGeneData()
for more information.-
The most common alternatives:
-
naControlAction="row"
: uses the remaining non-NA values from that row -
naControlAction="floor"
: assigns thenumeric
valuenaControlFloor
, which is useful when the reference value should be a known baseline limit of detection.
-
-
- Argument
numShrinkFunc
now includesna.rm=TRUE
by default.
- Argument
jamses 0.0.34.900
changes to existing functions
-
- new argument
use_raster=TRUE
which allows turning it off. When R packagemagick
is not installed, on some systems it causes an error when trying to render ComplexHeatmap withuse_raster=TRUE
. Usinguse_raster=FALSE
avoids the error.
- new argument
-
- small update to enforce
mixedSortDF(..., honorFactor=TRUE)
to ensure thatdata.frame
columns that containfactor
values will use thelevels
to define the proper sort order.
- small update to enforce
jamses 0.0.33.900
package dependencies
-
ComplexHeatmap
now requires version2.13.4
or higher, and points to the Github repository instead of Bioconductor. This change is required for R-3.6.1 whose matching Bioconductor version 3.10 only offers ComplexHeatmap version 2.2.0, and throws an error withheatmap_se()
when rendering row annotations. -
jamba
dependency now requires version0.0.88.900
, to pick up the subtle but important change tomixedSort()
-
colorjam
was bumped to version0.0.23.900
for consistency.
jamses 0.0.32.900
new functions
-
se_collapse_by_column()
which is the equivalent ofse_collapse_by_row()
except that the default math is much simpler, to perform mean calculation with optional outlier removal prior to calculating the mean. - new experimental function
shrink_df()
as potential drop-in replacement forshrinkDataFrame()
. This new function is intended to have simpler processing, with less complicated history thanshrinkDataFrame()
.
changes to existing functions
- minor updates to help text for
se_collapse_by_row()
, added@family
to this and other functions. -
se_contrast_stats()
was changed from usingmatrixStats::rowMaxs()
instead to useapply(x, 1, max, na.rm=TRUE)
- due to persistent Segfaults caused bymatrixStats::rowMaxs()
on multiple architectures (OSX, linux) and with recent package versions, at least for R-3.6.1.
jamses 0.0.31.900
new functions
-
names_contrast2comp()
andnames_comp2contrast()
are analogous tocontrast2comp()
andcomp2contrast()
except they operate on the object names, making it convenient to wrapper aroundlist
objects where the list names are the contrasts or comps.
changes to existing functions
-
- argument
normgroup
now acceptscolnames(colData(se))
to define normgroups by column annotations.
- argument
bug fixes
-
fixed apparent regression in
groups_to_sedesign()
for interaction contrasts- version 0.0.30.900 apparently introduced regression bug in interaction contrasts that returns only the interactions in
list
form instead of all contrasts insedesign
form. Short-lived, fortunately. - Some debug potential was added with
verbose=TRUE
, that prints iterative contrast name generation, with indent to help show when the function is iteratively calling itself.
- version 0.0.30.900 apparently introduced regression bug in interaction contrasts that returns only the interactions in
-
- fixed bug with user-supplied
row_names_gp
orcolumn_names_gp
that does not also definefontsize
. - new arguments
cluster_column_slices=FALSE
,cluster_row_slices=FALSE
to allow this behavior to be overridden when relevant
- fixed bug with user-supplied
changes to existing functions
-
- moved substantial logic out, and into
process_sestats_to_hitim()
, which handled the hit incidence matrix, and deciding which rows are included in the heatmap with defaultrows=NULL
. - quality of life change: when there is only one
assayNames(se)
, thenassay_name
defaults to the only possible value, instead of acting like it couldn’t possibly know what I mean. Haha. -
legend_at
is smarter whencenterby_colnames=FALSE
and there is no data centering, the legend only includes negative values when data contains negative values. -
legend_labels
are smarter whencolor_max <= 1
, it uses the numerical value without transformation. - vastly expanded the examples and the param help text.
- moved substantial logic out, and into
new functions
-
- previously internal logic to
heatmap_se()
, it surpassed the amount of logic hidden inside a function, especially when it’s used twice.
- previously internal logic to
jamses 0.0.30.900
changes to existing functions
-
- removed duplicated
merge()
called across statsdata.frame
results. - implemented
normgroups
for independent model fit among subsets of samples.
- removed duplicated
jamses 0.0.29.900
bug fixes
-
heatmap_se()
did not properly checkcluster_columns
forfunction
input, so it did not allow using a custom clustering function.
jamses 0.0.28.900
bug fixes
-
se_contrast_stats()
withposthoc_test="DEqMS"
and only one contrast threw an error"Error in fit$sca.t[results.table$gene, coef_col] : subscript out of bounds"
. The error actually occurs inebayes2dfs()
the function that converts statistical model fit into a list ofdata.frame
objects.- The
DEqMS
package producesnumeric
matrix results that it also summarizes into adata.frame
viaDEqMS::outputResult()
. When there is only one contrast, many of thenumeric
matrix objects also have one column, and R apparently dropped thecolnames()
for thesca.t
matrix, causingDEqMS::outputResult()
to fail when argumentcoef_col
is given thecharacter
name of the contrast instead of theinteger
column number. - The contrast name is preserved in the
coefficients
matrix, so the contrast is converted to column number, then passed ascoef_col
, and resolves the error. - Calls to
DEqMS::outputResult()
useinteger
values for argumentcoef_col
. - Some internal step in
DEqMS
does not properly handle data with only one contrast, probably forgot to includedrop=FALSE
in matrix subsetting.
- The
changes to existing functions
-
groups_to_sedesign()
updates- input
SummarizedExperiment
is now recognized, usinggroup_colnames
to define the experimental designdata.frame
. Data can also be subset usingisamples
for example. - new argument
contrast_names
to provide specific contrasts upfront, therefore skipping all internal logic of defining contrasts. - internal values
max_depth
andfactor_order
are validated using the designdata.frame
such that it will not pursuemax_depth
norfactor_order
higher than the number of design columns available. - When there are no valid contrasts for the given design groups, the function returns empty contrasts instead of throwing an error. One could argue that an error is actually preferred, since it forces the input data to be valid, and forces the user to review the data.
- input
contrast2comp()
andcomp2contrast()
were reworked to handle alternate delimiters, which mainly involved wrapping the delimiter in proper regular expression patterns. During the process, obscure testing showed certain delimiters cause problems with regular expressions, so those problems were pre-emptively fixed. The help docs were expanded somewhat.sestats_to_df()
was refactored to be cleaner, correcting the issue of colname mislabling whendimname_order
was different than expected. The function now works smoothly with anydimname_order
values, and acceptscharacter
orinteger
input to specify the dimname ordering.
new functions
-
format_hits()
is an internal function, but it might be a useful convenience function so it is exported. It is called bysestats_to_df()
to provide a summarydata.frame
of statistical results. It converts a hit vector into either:- a summary string with hits/up/down,
- one integer count of hits, or
- a vector of integers for c(“hits”, “up”, “down”)`.
jamses 0.0.27.900
bug fixes
-
groups_to_sedesign()
threw an error in some circumstances when the inputifactors
only contained one column, but the column contained levels with delimiters recognized byfactor_sep
, and some levels without the delimiter. The error was:"Error in combn(iMatch, 2) : n < m"
- The workaround was to supply levels with consistent delimiters, or a design that used more than one column initially.
- However, the R code was adjusted to handle these situations properly without workarounds.
jamses 0.0.26.900
updates to existing functions
-
heatmap_se()
updates:- still in progress: several new arguments intended to customize exact font size and grid size for color legends, annotations, around the heatmap.
- new arguments:
show_heatmap_legend
,show_top_legend
,show_left_legend
intended to allow hiding various color legends, mostly to save plot space. -
all_sample_colors
sometimes does not match left or top annotation names, and now will create categorical colors in more scenarios. - new argument
mark_rows
to enable optionalComplexHeatmap::anno_mark()
row labels for a subset of heatmap rows.mark_labels_gp
to customize the displayed font. - new argument
right_annotation
for custom annotation - new arguments
show_top_annotation_name
,show_left_annotation_name
,left_annotation_name_rot
to customize these label positions. - new arguments
mark_rows
andmark_labels_gp
used to defineanno_mark()
call-out labels for a subset of rows in the heatmap. This step is difficult to define upfront when the data inse
is subset by some other aspect ofheatmap_se()
, sinceanno_mark()
requires numeric index positions for each label, and the order of rows may not be known upfront.
minor bug fixes
-
hit_array_to_list()
was returningNA
values when thesestats$hit_array
also includedNA
values. The function now removes anyNA
values before returning each vector. -
heatmap_se()
argumentalt_sestats
was throwing an error when supplied with anumeric
matrix instead ofsestats
object, this input type was been corrected, and the code slightly refactored for clarity. -
heatmap_se()
was passing multiple values to the left_annotation argumentshow_legend
, which only accepts onelogical
value. This bug was corrected.
jamses 0.0.25.900
updates to existing functions
-
heatmap_se()
updates:- new argument
data_type
used in color legend title, and heatmap title. - new arguments
legend_at
,legend_labels
to customize heatmap color gradient legend positions and labels. - data centering can be turned off with
centerby_colnames=FALSE
. -
sestats
andalt_sestats
can be supplied as incidence matrices directly, without requiring them to be insestats
orhit_array
formats. -
top_colnames
can be hidden withtop_colnames=FALSE
; also whencolData(se)
is empty,top_annotation
is hidden. -
assay_name
may contain multiple values to define gene hits insestats
; however only the first matchingassay_name
innames(assays(se))
is used for the heatmap data.
- new argument
jamses 0.0.24.900
updates to existing functions
-
heatmap_se()
new argumentrow_subcluster
:intended to help drill-down into a subset of heatmap rows by using elements of
row_split
as returned byjamba::heatmap_row_order()
.-
The rows are split by
row_split
using the full heatmap data, and can be split by…- cluster number using
row_split=5
integer values, - annotation columns in
rowData(se)
, -
character
vector, or data.frame
- cluster number using
In any case,
heatmap_row_order()
returns alist
ofrownames(se)
, which is used to create the heatmap. In fact, rows are displayed in the same order, but without any associated dendrogram even if there is a dendrogram associated with the subset of rows.row_subcluster
can be aninteger
vector referring to the row list element in order, orcharacter
vector of one or morenames()
associated with the row list elements.
heatmap_se()
argumentscutoff_name
andalt_cutoff_name
default values changed toNULL
, so hits insestats
andalt_sestats
will use all cutoffs by default, unless specified otherwise.
jamses 0.0.23.900
updates to existing functions
-
heatmap_se()
was updatedisamples
is applied to subset the columns displayed in the heatmap, however it no longer subsets data during the data centering step. This change allows centering by patient with time 0 as the control point, which produces a stripe of0
values for each patient at time zero, all other times become difference from time zero. The heatmap can then useisamples
with only samples not at time zero.-
new argument
subset_legend_colors=TRUE
will now filter colors in thetop_annotation
andleft_annotation
color legends to remove categorical colors which are not represented in the data being displayed.- This update prevents showing a huge list of categorical colors that are not relevant to the heatmap. Note that even when two heatmaps are added together (with ComplexHeatmap
+
or%v%
) the amazing coding in ComplexHeatmap will merge together color legends by annotation color name, showing only the unique set of colors. - It seems that ComplexHeatmap retains all colors in the heatmap object, but hides colors that are unused. The
heatmap_se()
code was actually encoding theat
color labels for all colors regardless of which colors were present. So it is unclear the benefit of this option, but the function itself was improved.
- This update prevents showing a huge list of categorical colors that are not relevant to the heatmap. Note that even when two heatmaps are added together (with ComplexHeatmap
color legends are displayed either in order of factor levels, or sorted using
jamba::mixedSort()
new argument
row_anno_fontsize
to control the specific fontsize for row annotation labels. These labels appear for row annotations, and appears beside the column labels on the heatmap.
bug fixes
-
heatmap_se()
whensestats
is supplied butse
does not contain all rows present insestats
hit array, it produced an error. -
heatmap_se()
when annotations with factor columns were subset to remove a color from the legend, the color legend was still encodingat
labels for the full set of colors, causingComplexHeatmap::HeatmapAnnotation()
to print a warning message.
new functions
-
choose_annotation_colnames()
is a wrapper function to hold the simple logic for choosing annotation columns to include forheatmap_se()
. By default it chooses columns with at least one repeated value, at least two unique values, and only includes the first column with matching cardinality. The last criterion ensures it won’t display six columns which show the same pattern of information for all samples.
jamses 0.0.22.900
bug fixes
-
hit_array_to_list()
was returningmatrix
for single contrasts, due toapply()
tendency to simplify output to vector. The method was changed to use a different approach guaranteed to returnlist
. -
heatmap_se()
had a typo in an edge condition when usingalt_sestats
, which has been corrected.
jamses 0.0.21.900
bug fixes
-
heatmap_se()
was not properly auto-detectingshow_row_names
, which only worked whenrows
was supplied specifically. Now it works whenrows
is not supplied and/or onlysestats
is also supplied.
jamses 0.0.20.900
changes to existing functions
-
sestats_to_df()
was updated:- now reports
""
whenever thehit_array
value is NA, because the NA in this context means the cutoff was not tested for statistical hits. This situation happens when usingint_adjp_cutoff
,int_fold_cutoff
,int_mgm_cutoff
that differ from the main contrast cutoffs. Therefore, reporting0
implies the threshold was tested and no hits were found, which is incorrect. - also whenever there are zero hits it reports
"0 hits"
and not"0 hits (0 up, 0 down)"
. This change saves unnecessary text, also makes it easier to see which entries have hits.
- now reports
jamses 0.0.19.900
changes to existing functions
-
heatmap_se()
was updated:- The logic of obtaining a hit list for a contrast must account for multiple cutoffs, particularly when interaction contrast cutoffs differ from pairwise cutoffs. This logic was pushed into new function
hit_array_to_list()
. - The default color legend for
left_annotation
andtop_annotation
now includescolor_bar="discrete"
for numeric annotation columns. The legend will display discrete color breaks instead of a continuous color bar. The breaks are defined in the color function, in most cases the recommended method is callingplatjam::design2colors()
, which typically defines reasonable color breaks.
- The logic of obtaining a hit list for a contrast must account for multiple cutoffs, particularly when interaction contrast cutoffs differ from pairwise cutoffs. This logic was pushed into new function
new functions
-
Input is
hit_array
fromsestats
, as output fromse_contrast_stats()
.-
It optionally takes a subset of the following, which match the
dimnames()
inhit_array
:cutoff_names
contrast_names
assay_names
-
Then it iterates each
contrast_names
:- It returns one named vector of hits for that contrast, named by the measurement, with values
-1
or1
indicating the direction. - When multiple
assay_names
orcutoff_names
are involved, it combines results into one set representing the union of hits.
- It returns one named vector of hits for that contrast, named by the measurement, with values
The intent is to retrieve a
list
of hits acrosscontrast_names
, regardless if there is one or morecutoff_names
orassay_names
.
jamses 0.0.18.900
changes to existing functions
-
sestats_to_df()
was updated:-
when
se_contrast_stats()
uses different interaction statistical thresholds from the pairwise thresholds, the outputhit_array
contains NA values.- previously any non-empty vector simply counted the length
- new behavior is to count non-NA entries
- future “fix” will be for the cell to have length=0, with no
NA
. This situation only happens when*_cutoff
andint_*_cutoff
define different thresholds.
-
jamses 0.0.17.900
functions removed
-
call_fn_ellipse()
was removed and placed in thejamba
package, package dependency was bumped up tojamba(>= 0.0.81.900)
bug fixes
-
heatmap_se()
error when not suppliedrows
norsestats
, now defaults to show allrownames(se)
. -
heatmap_se()
error whencolorjam
package not previously loaded, added missing package prefix.
changes to existing functions
-
heatmap_se()
argumentnormgroup_colname=NULL
andcenterby_colnames=NULL
, instead of previous defaults which should not be used unless relevant to the design.
changes to existing functions
-
save_sestats()
was updated to prioritize sortingrow_type
column headers first in the output Excel table, previously the method looked for known patterns, which obviously failed whenrow_type
differed from those known patterns. Now therow_type
value itself is used in addition to the expected patterns.
jamses 0.0.16.900
changes to existing functions
-
heatmap_se()
new argumentcorrelation
- when
correlation=TRUE
it calculates a sample correlation matrix, and will display a correlation heatmap. - The option was added to this function in order to share much of the remaining logic involving data centering, and heatmap column annotations.
- Note that it is possible to create a correlation heatmap using a subset of statistical hits, which although typically creates a nice-looking correlation structure, may not be that informative since it only includes statistical hits.
- when
bug fixes
-
heatmap_se()
error whenrow_split
was an integer number of clusters, andcluster_rows
is afunction
, which is not permitted byComplexHeatmap::Heatmap()
. Instead, whencluster_rows
orcluster_columns
is a function, the function is evaluated upfront, so the resulting dendrogram or hclust is passed toHeatmap()
, which does permit integer row and column split.- Note that when
row_split
orcolumn_split
are not single numeric values, thefunction
must not be evaluated, otherwise that process fails. Haha. Makes sense. Now both scenarios are handled, hopefully this covers all scenarios.
- Note that when
jamses 0.0.15.900
changes to existing functions
-
- Argument
row_split
now also accepts named vector where names correspond to rownames in the heatmap. - Added help text for all function arguments.
- Argument
-
save_sestats()`
- new argument
use_assay_suffix=TRUE
will try to add abbreviatedassay_names
to the end of each Excel sheet name, when there is more than one uniqueassay_names
value. - The Excel sheet names are required to be unique, and have no more than
max_nchar_sheetname
characters in each string. The sheet names are progressively shortened as necessary, then are made unique when required by callingjamba::makeNames()
.
- new argument
jamses 0.0.14.900
changes to existing functions
This update attempts to extend se_contrast_stats()
with methods from the Bioconductor package DEqMS. This package models error as a function of PSM_count, which is the number of peptide score matrix entries that were combined to formulate an abundance estimate for each protein. The approach is analogous to using voom
as an extension of limma
. In fact, DEqMS is another extension of the limma
error model.
-
- argument
posthoc_test=c("none", "DEqMS")
to allow post-hoc tests such asDEqMS
to be applied after the corelimma
steps. This argument is pushed intorun_limma_replicates()
. - argument
posthoc_args
is a list named by theposthoc_test
value, containing a list of named elements used as arguments for the relevant post-hoc test functions.
- argument
-
- argument
posthoc_test=c("none", "DEqMS")
to allow post-hoc tests such asDEqMS
to be applied after the corelimma
steps. This argument is also passed toebayes2dfs()
. - argument
posthoc_args
is a list named by theposthoc_test
value, containing a list of named elements used as arguments for the relevant post-hoc test functions. - argument
trim_colnames
added"sca.t"
as additional column to omit by default.
- argument
-
- argument
lmFit4
is optional and intended to contain additional output from post-hoc methods such asDEqMS::spectraCounteBayes()
. - argument
trim_colnames
added"sca.t"
as additional column to omit by default. - The order of colnames in the
data.frame
returned was updated to be more consistent for different formats. Specifically, whenposthoc_test="DEqMS"
the P-value columns sort with the"sca.P.Value"
and"sca.adj.pval"
columns before the usual limma columns. - when calling
mark_stat_hits()
it updates the expected colnames whenposthoc_test="DEqMS"
, usingadjp_colname="sca.adj.pval"
andp_colname="sca.P.Value"
. In this way, hits are filtered by the DEqMS adjusted P-value and P-value, respectively. All other statistics are carried over fromlmFit3
and work as before.
- argument
-
- The incidence matrix data for
alt_sestats
was updated to handle proper hit matrix, in haste to make theleft_annotation
extensible the sestats hit matrix was used twice. This is why the package is private, for now.
- The incidence matrix data for
jamses 0.0.13.900
changes to existing functions
-
heatmap_se()
was modified to handle updatedsample_color_list
that containsfunction
output fromcirclize::colorRamp2()
.
jamses 0.0.12.900
changes to existing functions
-
heatmap_se()
was modified to accomodate other object types, such as"MethyLumiSet"
which does not haverowData()
,colData()
,assays()
, accessor functions, instead uses a common alternative for Bioconductor objects:featureData()
,phenoData()
andassayData()
. Theheatmap_se()
changes will try to use those accessor functions for any object whose class does not contain"SummarizedExperiment"
.
jamses 0.0.11.900
new functions
-
se_detected_rows()
is a new function to assist the process of defining “detected” rows which are suitable for downstream analysis. The rules are essentially simple heuristics based upon the total number of replicates, group replicates, fraction of group replicates, and number of sample groups where a row has “valid” measurements. In this case “valid” is defined as meeting a minimum abundance threshold, and is notNA
.
jamses 0.0.10.900
new functions
-
contrast2comp()
,comp2contrast()
are two reciprocal functions intended to convert long contrast names to a short comparison form called a"comp"
. The"comp"
can be converted back to the full contrast name. In cases where a two-way contrast can be written in two equivalent forms, argumentfactor_order
can be used to define the desired order. The statistical results are the same, but sometimes it helps to have specific ordering. See help docs and examples for details, and worked examples. Main driving motivation:ComplexHeatmap::Heatmap()
labels are too long to fit onscreen without making other non-ideal adjustments to plot dimensions.venndir::venndir()
labels are also too long.-
Excel worksheets are limited to 31 characters, a typical two-way contrast with 3-characters for each factor level uses 35 characters, the
"comp"
for uses only 15.- contrast:
"(Aaa_Bbb-Ccc_Bbb)-(Aaa_Ddd-Ccc_Ddd)"
(35 characters) - comp:
"Aaa-Cca:Bba-Dda"
(15 characters)
- contrast:
jamses 0.0.9.900
changes to existing functions
-
save_sestats()
was updated to usejamba-version-0.0.79-900
newwriteOpenxlsx()
, which uses a substantially faster method of saving worksheets, and allows building a multi-sheet Workbook prior to saving to a file. This change avoids the time-consuming, repeated steps: loading, saving, updating, saving, loading, updating, saving, etc. Previously, each successive worksheet added to an Excel file took longer to load and save. For a 15,000 row, 8 columndata.frame
, saving once took 3 seconds, but saving 12 worksheets took 15 minutes! The new approach took about 25 seconds.
new functions
-
heatmap_se()
is a work in progress, but attempts to automate display of expression data as a heatmap usingComplexHeatmap::Heatmap()
. By default it uses statistical hits defined insestats
output fromse_contrast_stats()
. -
call_fn_ellipsis()
is copied frommultienrichjam::call_fn_ellipsis()
. This function allows passing ellipsis argument...
to another function, where that function does not allow...
and therefore all arguments must be strictly defined. Any arguments in...
which are not defined in the other function are removed before calling that function.
jamses 0.0.8.900
changes to existing functions
-
groups_to_sedesign()
new behavior- two-way interaction contrasts now retain the order of first-order contrasts, preferring to alter the reverse
rev(factor_order)
in order to build interaction contrasts on top of existing contrasts. - some verbose output now requires
(verbose >= 2)
- added verbose output for two-way contrasts removed because they are equivalent, but written with different order of comparisons.
- two-way interaction contrasts now retain the order of first-order contrasts, preferring to alter the reverse
jamses 0.0.7.900
changes to existing functions
-
groups_to_sedesign()
new argumentdefault_sort=c("asis", "sort_samples", "mixedSort")
:Note in all cases, a
factor
column is sorted by its factor levels."asis"
: will convert any non-factor
column tofactor
, with levels defined in the order they appear in the input data. Any existingfactor
column will remain as-is."sort_samples"
: will callsort_samples()
in order to recognize control terms, and preferentially place them first, for example “Control”, or “Wildtype”. All other terms are sorted byjamba::mixedSort()
for alphanumeric sorting.-
"mixedSort"
: will usejamba::mixedSortDF()
to sort group columns:-
factor
columns are sorted according to factor levels -
character
columns are alphanumeric sorted usingjamba::mixedSort()
-
To influence the sort order,
data.frame
columns should havefactor
values with defined levels in order. Otherwise by default, the first entries in thedata.frame
provided will become the first factor levels.
validate_sedesign()
default argumentverbose=FALSE
, also silencing other internal messages which were previously always turned on.-
matrix_normalize()
new argumentnormgroup
normalizes subsets of matrix columns independently, then re-assembles the original matrix.Purpose is convenience of maintaining one data matrix, while also applying normalization independently on each sub-matrix. The attributes returned from the normalization method are also combined, for example the normalization factors
"nf"
reported byjamma::jammanorm()
are combined in proper order.-
This feature is distinct from batch adjustment, in that the data in each sub-matrix are never normalized relative to each other.
- For example, one could normalize total RNA-seq and nascent 4sU-seq data independently, without expectation that the two would ever have a common frame of reference to normalize one relative to another.
- Similarly, one could normalize each tissue type independently, which may be appropriate when analyzing data that contains very different mammalian tissue organ samples, such as muscle and brain. It would generally not be appropriate to use quantile normalization across muscle and brain samples, since the overall pattern and distribution of expression values is not expected to be similar. Quantile normalize assumes (and imposes) a common distribution, by adjusting mean expression signal at each quantile to a common mean expression across all samples.
- For a rough approximation of cross-tissue normalization, one could apply
"quantile"
normalization within eachnormgroup
defined by tissue type, then apply"jammanorm"
median normalization to apply a linear adjustment of signal across tissue types. The median normalization does not affect distribution, thus will not affect intra-tissue contrasts, except by adjusting its overall signal which may change downstream assumptions regarding signal thresholds. It is still not advise to compare directly across tissue types, except in some cases a two-way contrast may be appropriate (comparing intra-tissue fold change to intra-tissue fold change.)
se_normalize()
also gained new argumentnormgroup
, although this argument is merely passed through tomatrix_normalize()
.
new functions
-
sestats_to_df()
- simple method to convertsestats
output fromse_contrast_stats()
to adata.frame
summary with number of hits for each comparison. It should be suitable forkable()
output, with row groups by cutoff.
jamses 0.0.6.900
changes to existing functions
-
- new argument
sedesign
allows providingSEDesign
output fromgroups_to_design()
. Whenisamples
is also provided, it will be used to subset thesedesign
as appropriate, usingvalidate_design()
.
- new argument
validate_sedesign()
was updated to prevent the annoying warning when comparing vectors of two different sizes.
new functions
-
save_sestats()
is a wrapper function intended to help export stat summary tables into Excel usingjamba::writeOpenxlsx()
. This function defines useful column style functions for statistical data types, such as P-values, fold changes, log fold changes, etc. It also saves all statistical contrasts, with one contrast in each worksheet. The worksheet names are limited to 31 characters, the limitation of MS Excel.
jamses 0.0.4.900
new functions
-
se_collapse_by_row()
is a new function intended to collapse rows, for example peptides into protein abundance.
jamses 0.0.1.900
New functions
matrix_normalize()
- wrapper function to normalize numeric matrix datase_normalize()
- function to normalize SummarizedExperiment assay datagroups_to_sedesign()
- function to convert experiment group intoSEDesign
with proper design matrix, and contrast matrix. This function was ported fromsplicejam::groups2contrasts
.se_contrast_stats()
- wrapper function to perform numerous statistical contrasts, in this case callinglimma
methods with or without limma-voom.voom_jam()
- lightly extended variation oflimma::voom()
solely to enable presence ofNA
values without distortion of the model fit.handle_na_values()
- function to manage presence of NA values using one of several appropriate strategies.run_limma_replicate()
- wrapper function to performlimma
moderated t-test steps, and to produce consistent, fully-annotated summarydata.frame
.ebayes2dfs()
- internal wrapper function to convertlimma::eBayes()
output to a list ofdata.frame
tables.mark_stat_hits()
- wrapper function used to apply one or more statistical thresholds to mark statistical hits among rows of statistical contrast results.update_function_params()
- internal function used to apply user-specific parameters to a pre-defined set of parameters stored in function formal arguments.update_list_elements()
- internal function called byupdate_function_params()
as a generic way to update individual elements in alist
with user-defined values.
New S4 object class
SEDesign
: SummarizedExperiment design object
-
design
: numeric matrix indicating the experiment design -
contrasts
: numeric matrix indicating statistical contrasts -
samples
:character
vector of sample replicates, also equivalent torownames(design)
Some constraints:
-
rownames(design)
contains sample replicate identifiers, intended to correspond tocolnames(SummarizedExperiment)
-
colnames(design)
contains the name of each sample group -
rownames(contrasts)
contains the name of each sample group - Enforced constraint:
colnames(design) == rownames(contrasts)
Related accessor and assignment methods:
[
subset operations use[samples, groups]
to enable re-ordering, or subsetting data by sample and/or groups. For example, one may want to remove a technical outlier sample from an analysis, or include only a subset of groups in downstream statistical contrasts.-
validate_sedesign()
contains the full logic for constraints and subset operations:- the order of
samples
must matchrownames(design)
- the order of
colnames(design)
must matchrownames(contrasts)
- when a subset of
samples
is defined,design
is updated to remove groups which are no longer represented bymin_reps
replicates. By default, a group with zero replicates is no longer a valid group - When a subset of groups is defined, perhaps by using a subset of samples as described above, the
contrasts
entries are evaluated to see which entries require the groups to be removed. For each contrast that requires a group being removed, the contrast is also removed. This step prevents retaining an incomplete contrast.
- the order of
samples()
will return theobject@samples
samples()<-
to assign new values toobject@samples
, which will also updaterownames(design)
accordingly. Note these values are expected to matchcolnames(SummarizedExperiment)
.groups()
will returncolnames(object@design)
, representing all sample groups defined.groups()<-
will assign values tocolnames(object@design)
, which will also updaterownames(object@contrasts)
if there are statistical contrasts already defined in theobject
.contrastnames()
will returncolnames(object@contrasts)
representing the name of each contrast.contrastnames()<-
will assign values tocolnames(object@contrasts)
, which may be useful to define a sensible label.design()
will return the design matrixobject@design
.design()<-
will assign a new design matrix toobject@design
, then will callvalidate_sedesign()
to enforce the relevant constraints. For examplecolnames(object@design)
must matchrownames(object@contrasts)
if contrasts are defined in the object.contrasts()
will return the contrast matrixobject@contrasts
.contrasts()<-
will assign a new contrast matrix toobject@contrasts
, then will callvalidate_sedesign()
to enforce the relevant constraints. For examplecolnames(object@design)
must matchrownames(object@contrasts)
.