Import one or more data.frame from Excel xlsx format

readOpenxlsx(
  xlsx,
  sheet = NULL,
  startRow = 1,
  startCol = 1,
  rows = NULL,
  cols = NULL,
  check.names = FALSE,
  check_header = FALSE,
  check_header_n = 10,
  verbose = FALSE,
  ...
)

Arguments

xlsx

character path to an Excel file in xlsx format, compatible with openxlsx::read.xlsx().

sheet

one of NULL, character, or integer vector, where: sheet=NULL will import every sheet; character is a vector of sheet names; and integer is a vector of sheet index values. The sheet names are determined with openxlsx::getSheetNames().

startRow

integer indicating the row number to start importing each sheet. Note that startRow can be a vector with length length(sheet), to specify the startRow for each sheet.

rows

integer vector indicating specific rows to import for each sheet. To specify different rows for each sheet, supply rows as a list of integer vectors.

check.names

logical indicating whether to call make.names() on the colnames of each data.frame.

check_header

logical indicating whether to test for presence of header rows, which are not column headers. When check_header=TRUE, this method simply tests for the presence of rows that have ncol different than the remaining rows of data in the given sheet. When header rows are detected, the values are assigned to column dimnames of the data.frame.

check_header_n

integer number of rows to test for header rows, only used when check_header=TRUE. This step is intended when the top row(s) contain fewer columns with headers, above actual column headers, for example the first row c("Sample", "", "", "Lane", ""), and the second row c("Name", "Type", "Label", "Name", "Type"). In this case the desired output is "Sample_Name","Sample_Type","Sample_Label","Lane_Name","Lane_Type"). This option default is FALSE due to the number of exceptions seen in real data.

verbose

logical indicating whether to print verbose output.

...

additional arguments are passed to openxlsx::read.xlsx().

Details

This function is equivalent to openxlsx::read.xlsx() with a few minor additions:

  1. It returns a list of data.frame objects, one per sheet.

  2. It properly reads the colnames with check.names=FALSE.

By default this function returns every sheet for a given xlsx file.

Some useful details:

  • Empty columns are not skipped during loading, which means a worksheet whose data starts at column 3 will be returned with two empty columns, followed by data from that worksheet. Similarly, any empty columns in the middle of the data in that worksheet will be included in the output.

  • When both startRow and rows are applied, rows takes priority and will be used instead of startRows. In fact startRows will be defined startRows <- min(rows) for each relevant worksheet. However, for each worksheet either argument can be NULL.