Export a data.frame to Excel xlsx format

writeOpenxlsx(
  x,
  file = NULL,
  wb = NULL,
  sheetName = "Sheet1",
  startRow = 1,
  startCol = 1,
  append = FALSE,
  headerColors = c("lightskyblue1", "lightskyblue2"),
  columnColors = c("aliceblue", "azure2"),
  highlightHeaderColors = c("tan1", "tan2"),
  highlightColors = c("moccasin", "navajowhite"),
  borderColor = "gray75",
  borderPosition = "BottomRight",
  highlightColumns = NULL,
  numColumns = NULL,
  fcColumns = NULL,
  lfcColumns = NULL,
  hitColumns = NULL,
  intColumns = NULL,
  pvalueColumns = NULL,
  numFormat = "#,##0.00",
  fcFormat = "#,##0.0",
  lfcFormat = "#,##0.0",
  hitFormat = "#,##0.0",
  intFormat = "#,##0",
  pvalueFormat = "[>0.01]0.00#;0.00E+00",
  numRule = c(1, 10, 20),
  fcRule = c(-6, 0, 6),
  lfcRule = c(-3, 0, 3),
  hitRule = c(-1.5, 0, 1.5),
  intRule = c(0, 100, 10000),
  pvalueRule = c(0, 0.01, 0.05),
  numStyle = c("#F2F0F7", "#B4B1D4", "#938EC2"),
  fcStyle = c("#4F81BD", "#EEECE1", "#C0504D"),
  lfcStyle = c("#4F81BD", "#EEECE1", "#C0504D"),
  hitStyle = c("#4F81BD", "#EEECE1", "#C0504D"),
  intStyle = c("#EEECE1", "#FDA560", "#F77F30"),
  pvalueStyle = c("#F77F30", "#FDC99B", "#EEECE1"),
  doConditional = TRUE,
  doCategorical = TRUE,
  colorSub = NULL,
  freezePaneColumn = 0,
  freezePaneRow = 2,
  doFilter = TRUE,
  fontName = "Arial",
  fontSize = 12,
  minWidth = getOption("openxlsx.minWidth", 8),
  maxWidth = getOption("openxlsx.maxWidth", 40),
  autoWidth = TRUE,
  colWidths = NULL,
  wrapCells = FALSE,
  wrapHeaders = TRUE,
  headerRowMultiplier = 5,
  keepRownames = FALSE,
  verbose = FALSE,
  ...
)

Arguments

x

data.frame to be saved to an Excel xlsx file.

file

character valid path to save an Excel xlsx file. If the file exists, and append=TRUE the new data will be added to the existing file withthe defined sheetName.

  • Note when file=NULL the output is not saved to a file, instead the Workbook object is returned by this function. The Workbook object can be passed as argument wb in order to add multiple sheets to the same Workbook prior to saving them together. This operation is intended to provide a substantial improvement in speed.

wb

Workbook object as defined in R package openxlsx. When this argument is defined, data is not imported from file, and instead the workbook data is used from wb. This option is intended to improve speed of writing several sheets to the same output file, by preventing the slow read/write steps each time a new sheet is added.

sheetName

character value less with a valid Excel xlsx worksheet name. At this time (version 0.0.29.900) the sheetName is restricted to 31 characters, with no puntuation except "-" and "_".

headerColors, columnColors, highlightHeaderColors, highlightColors, borderColor, borderPosition

default values for the Excel worksheet background and border colors. As of version 0.0.29.900, colors must use Excel-valid color names.

highlightColumns, numColumns, fcColumns, lfcColumns, hitColumns, intColumns, pvalueColumns

integer vector referring the column number in the input data.frame x to define as each column type, as relevant.

numFormat, fcFormat, lfcFormat, hitFormat, intFormat, pvalueFormat

character string with valid Excel cell formatting, for example "#,##0.00" defines a column to use comma-delimited numbers above one thousand, and display two decimal places in all numeric cells. See [https://support.microsoft.com] topic "Excel Create and apply a custom number format." or "Excel Number format codes" for more details. Some examples below:

  • "#,##0" : display only integer values, using comma as delimiter for every thousands place. The number 2142.12 would be represented: "2,142"

  • "###0.0" : display numeric values rounded to the 0.1 place, using no comma delimiter for values above one thousand. The number 2142.12 would be represented: "2142.1"

  • "[>0.01]0.00#;0.00E+00" : this rule is a conditional format, values above 0.01 are represented as numbers rounded to the thousandths position 0.001; values below 0.01 are represented with scientific notation with three digits. The number 0.1256 would be represented: "0.126" The number 0.001256 would be represented: "1.26E-03"

  • "[Red]#,###.00_);[Blue](#,###.00);[Black]0.00_)" : this format applies to positive values, negative values, and zero, in order delimited by semicolons. Positive values are colored red. The string "_)" adds whitespace (defined by "_") equale to the width of the character ")" to the end of positive values. Negative values are surrounded by parentheses "()" and are colored blue. Values equal to zero are represented with two trailing digits, and whitespace ("_") equal to width ")". The whitespace at the end of positive values and zero are used to align all values at the same decimal position.

numRule, fcRule, lfcRule, hitRule, intRule, pvalueRule

numeric vector length=3 indicating the breakpoints for Excel to apply conditional color formatting, using the corresponding style. Note that all conditional formatting applied by this function uses the "3-Color Scale", therefore there should be three values, and three corresponding colors in the corresponding Style arguments.

numStyle, fcStyle, lfcStyle, intStyle, hitStyle, pvalueStyle

character vector length=3 containing three valid R colors. Note that alpha transparency will be removed prior to use in Excel, as required. Note that all conditional formatting applied by this function uses the "3-Color Scale", therefore there should be three colors, which match three values in the corresponding Rule arguments.

doConditional

logical indicating whether to apply conditional formatting of cells, with this function only the background cell color (and contrasting text color) is affected.

doCategorical

logical indicating whether to apply categorical color formatting, of only the background cell colors and contrasting text color. This argument requires colorSub be defined.

colorSub

character vector of R colors, whose names refer to cell values in the input x data.frame.

freezePaneColumn, freezePaneRow

integer value of the row or column before which the Excel "freeze panes" is applied. Note that these values are adjusted relative by startRow and startCol in the Excel worksheet, so that the values are applied relative to the data.frame argument x.

doFilter

logical indicating whether to enable column filtering by default.

fontName

character default font configuration, containing a valid Excel font name.

fontSize

numeric default font size in Excel point units.

minWidth, maxWidth, autoWidth

numeric minimum, maximum size for each Excel cell, in character units as defined by Excel, used when autoWidth=TRUE to restrict cell widths to this range. Note that the argument colWidths is generally preferred, if the numeric widths can be reasonable calculated or anticipated upfront. When autoWidth=FALSE Excel typically auto-sizes cells to the width of the largest value in each column, which may not be ideal when values are extremely large.

colWidths

numeric width of each column in x, recycled to the total number of columns required. Note that when keepRownames=TRUE, the first column will contain rownames(x), therefore the length of colWidths in that case will be ncol(x) + 1.

wrapHeaders

logical indicating whether to enable word wrap for column headers, which is helpful when autoWidth=TRUE since it fixed the cell width while allowing the column header to be seen.

headerRowMultiplier

numeric value to define the row height of the first header row in Excel. This value is defined as a multiple of subsequent rows, and should usually represent the maximum number of lines after word-wrapping, as relevant. This argument is helpful when wrapHeaders=TRUE and autoWidth=TRUE.

keepRownames

logical indicating whether to include rownames(x) in its own column in Excel.

verbose

logical indicating whether to print verbose output.

...

additional arguments are passed to applyXlsxConditionalFormat() and applyXlsxCategoricalFormat() as relevant.

Value

Workbook object as defined by the openxlsx package is returned invisibly with invisible(). This Workbook

can be used in argument wb to provide a speed boost when saving multiple sheets to the same file.

Details

This function is a minor but useful customization of the openxlsx::saveWorkbook() and associated functions, intended to provide some pre-configured formatting of known column types, typically relevant to statistical values, and in some cases, gene or transcript expression values.

There are numerous configurable options when saving an Excel worksheet, most of the defaults in this function are intended not to require changes, but are listed as formal function arguments to make each option visibly obvious.

If colorSub is supplied as a named vector of colors, then by default text values will be colorized accordingly, which can be especially helpful when including data with categorical text values.

This function pre-configures formatting options for the following column data types, each of which has conditional color-formatting, defined numeric ranges, and color scales.

int

integer values, where numeric values are formatted without visible decimal places, and the big.mark="," standard is used to help visually distinguish large integers. The color scale is by default c(0, 100, 10000).

num

numeric values, with fixed number of visible decimal places, which helps visibly align values along each row.

hit

numeric type, a subset of "int" intended when data is flagged with something like a "+1" or "-1" to indicate a statistical increase or decrease.

pvalue

P-value, where numeric values range from 1 down near zero, and values are formatted consistently with scientific notation.

fc

numeric fold change, whose values are expected to range from 1 and higher, and -1 and lower. Decimal places are by default configured to show one decimal place, to simplify the Excel visual summary.

lfc

numeric log fold change, whose values are expected to be centered at zero. Decimal places are by default configured to show one decimal place, to simplify the Excel visual summary.

highlight

character and undefined columns to be highlighted with a brighter background color, and bold text.

For each column data type, a color scale and default numeric range is defined, which allows conditional formatting of cells based upon expected ranges of values.

A screenshot of the file produced by the example is shown below.

xlsx screenshot

Examples

# set up a test data.frame
set.seed(123);
if (FALSE) {
   set.seed(123);
   lfc <- -3:3 + rnorm(7)/3;
   colorSub <- nameVector(
      rainbow(7, s=c(0.8, 1), v=c(0.8, 1)),
      LETTERS[1:7])
   df <- data.frame(name=LETTERS[1:7],
      int=round(4^(1:7)),
      num=(1:7)*4-2 + rnorm(7),
      fold=2^abs(lfc)*sign(lfc),
      lfc=lfc,
      pvalue=10^(-1:-7 + rnorm(7)),
      hit=sample(c(-1,0,0,1,1), replace=TRUE, size=7));
   df;
   writeOpenxlsx(x=df,
      file="jamba_test.xlsx",
      sheetName="jamba_test",
      colorSub=colorSub,
      intColumns=2,
      numColumns=3,
      fcColumns=4,
      lfcColumns=5,
      pvalueColumns=6,
      hitColumn=7,
      freezePaneRow=2,
      freezePaneColumn=2,
      append=FALSE);
}