# 1 Background

The beachmat package provides a C++ API for handling submatrices generated by DelayedArray’s block processing mechanism. This enables Bioconductor packages to use C++ for high-performance processing of data stored in almost any matrix representation, ranging from dense ordinary matrices, sparse matrices from the Matrix package, file-backed matrices or even cloud-based matrices (e.g., HDF5Array, TileDBArray). Packages use the DelayedArray block processing to realize blocks of data into memory as sparse or dense matrices, which can then be easily consumed by C++ code via beachmat’s API. The aim is to abstract away the specific class (and even type!) of matrix object in R to facilitate development of C++ extensions.

# 2 Historical context

Back in 2017, we had two broad strategies for how to make our various C++ functions work with alternative matrix representations. The first approach was implemented in versions 1 and 2 of beachmat, which takes the Rcpp::RObject in C++, figures out what the underlying representation, and then chooses appropriate methods to read a vector of values from a given row or column. This worked reasonably well and allows people to write C++ code in a manner that is agnostic to the matrix representation. Most importantly, it was a plug-and-play replacement for the usual Rcpp .column() and .row() methods for matrix access, which allowed client packages to quickly get up and running with their existing C++ code.

The second strategy was to perform block-wise looping in R using DelayedArray::blockApply. This would read in a block of values as an ordinary matrix that would then be passed into the C++ code; this means that developers would only have to worry about handling ordinary (column-major, etc.) matrices in their C++ code. The problem was that realization to a dense matrix would sacrifice some of the efficiencies of a sparse matrix format. Also, implementing this would involve amore refactoring as the looping would be shared across both C++ and R - one loop over blocks in R, and another loop within each block in C++ to do the desired computation.

The equation has now changed with proper sparse support in DelayedArray. Block processing can now yield both dense and sparse submatrices (depending on the underlying representation), thus avoiding the previous loss of efficiency from coercion to a dense format. With R-based blockwise looping, we get easy access to BiocParallel-based parallelization; we avoid the smelly calls back into R from C++ that are necessary to handle matrices that are not of a known type; and we get a more user-tunable method of handling large matrix outputs via RealizationSinks. Block processing is now the preferred mechanism for handling large matrices for independent computations across rows or columns.

So, how does beachmat fit into this new paradigm? Inside the block processing loop, we may wish to call a C++ function that takes the block and operates on it. This block may be dense or sparse, but the C++ function may not care if it just wants to get, e.g., the individual row vectors out for further processing. In this context, beachmat can continue to provide agnostic data access from the realized blocks so that developers don’t have to worry about handling different formats in their C++ code. Alternatively, if the sparse format lends itself to a more efficient algorithm, beachmat also provides dedicated sparse classes to only operate on the non-zero values.

The beachmat package currently has several dependencies:

• The compiler should support the C++11 standard. This usually requires GCC version 4.8.1 or higher. You need to tell the build system to use C++11, by modifying the SystemRequirements field of the DESCRIPTION file:

  SystemRequirements: C++11
• Rcpp should be installed. You also need to ensure that Rcpp is loaded when your package is loaded. This requires addition of Rcpp to the Imports field of the DESCRIPTION file:

 Imports: Rcpp

… and a corresponding importFrom specification in the NAMESPACE file:

 importFrom(Rcpp, sourceCpp)

(The exact function to be imported doesn’t matter, as long as the namespace is loaded. Check out the Rcpp documentation for more details.)

beachmat is a header-only library, so linking is as simple as writing:

LinkingTo: Rcpp, beachmat

… in your DESCRIPTION file. (This ensures the inclusion of the Rcpp headers.)

In C or C++ code files, use standard techniques to include header definitions, e.g., #include "beachmat3/numeric_matrix.h" (see here for more details). Header files are available for perusal at the following location (enter in an R session):

system.file(package="beachmat", "include")
## [1] "/tmp/RtmpvcXcVI/Rinst3cc67a28eaa669/beachmat/include"

If you see an expected unqualified-id before 'using' error during compilation, this usually means that the version of GCC used to compile beachmat is out of date and does not fully support C++11.

# 4 Examples of C++ code

The API itself has comprehensive documentation so we will not go into detail there. Instead, we will focus on the most common usage patterns. Let’s say we wanted to compute column sums from a logical, integer or numeric matrix-like object:

#include "beachmat3/beachmat.h"
#include <vector>
#include <algorithm>

// Not necessary in a package context:
// [[Rcpp::depends(beachmat)]]

// [[Rcpp::export(rng=false)]]
Rcpp::NumericVector column_sums (Rcpp::RObject mat) {
std::vector<double> workspace(ptr->get_nrow());
Rcpp::NumericVector output(ptr->get_ncol());

for (size_t i = 0; i < ptr->get_ncol(); ++i) {
auto colptr = ptr->get_col(i, workspace.data());
output[i] = std::accumulate(colptr, colptr + ptr->get_nrow(), 0.0);
}

return output;
}

The read_lin_block() function sets up a pointer to the underlying matrix data that can be extracted with the get_col() method (or get_row(), if we were instead interested in the rows). This may involve memory allocations, so we provide the method with a pointer to “workspace” for it to put values in. The get_col() method then returns a pointer to a vector of data values for each requested column i, which we accumulate and store in the output to be returned to the R session. To demonstrate:

column_sums(matrix(rnorm(1000), ncol=10))
##  [1]  -5.851184   4.899147 -10.591398  -4.184498 -12.950439 -10.926647
##  [7]   1.377843  20.855535   4.857998   1.260113

This code works with either dense or sparse blocks generated by DelayedArray block processing. Additionally, it automatically supports all numeric-type data that might be in mat, i.e., logical, integer or double-precision values. All values are cast to the type of the workspace - in this case, a double-precision vector, but it is equally possible to cast to an integer array by creating an int workspace. This means that we can avoid writing multiple versions of code to handle different types of inputs.

# Integer
column_sums(matrix(rpois(1000, 10), ncol=10))
##  [1] 1054 1058  927 1035 1036 1019 1011 1037 1009 1024
# Logical
column_sums(matrix(rbinom(1000, 1, 0.5)==1, ncol=10))
##  [1] 51 52 58 47 50 37 55 49 43 53
# Sparse
column_sums(Matrix::rsparsematrix(100, 10, 0.1))
##  [1] -0.680  2.811 -1.987  2.730 -1.780  1.350  0.405  0.160  5.710  1.980

The pointer returned by get_col() may point to the workspace and is only valid up to the next use of the workspace. Users should only rely on the returned pointer prior to any subsequent call to get_col(). In addition, there is no guarantee that the workspace is actually populated with data values, as some matrix representations may not require a copy procedure to access the underlying data. If a copy into the workspace is required, the following pattern can be used:

#include "beachmat3/beachmat.h"
#include <vector>
#include <algorithm>

// Not necessary in a package context:
// [[Rcpp::depends(beachmat)]]

// [[Rcpp::export(rng=false)]]
Rcpp::NumericVector column_sums_copy (Rcpp::RObject mat) {
std::vector<double> workspace(ptr->get_nrow());
Rcpp::NumericVector output(ptr->get_ncol());

for (size_t i = 0; i < ptr->get_ncol(); ++i) {
auto colptr = ptr->get_col(i, workspace.data());

// Checking if colptr is already pointing to the workspace.
if (colptr != workspace.data()) {
// If not, forcibly copying contents into the workspace.
std::copy(colptr, colptr + ptr->get_nrow(), workspace.data());
}

output[i] = std::accumulate(workspace.begin(), workspace.end(), 0.0);
}

return output;
}

For some applications, it is possible to implement a more efficient algorithm for sparse data that only uses non-zero entries. We can use read_lin_sparse_block() to create a pointer to a sparse C++ matrix, which overloads the get_col() and get_row() methods to return a listing of the non-zero entries. This is demonstrated below to compute the column sums by only iterating over the non-zero entries. Note that we need two workspace vectors this time, one for the non-zero values and another for their indices. (Again, there is no guarantee that the workspaces are populated, so a manual copy should be performed if this is necessary.)

#include "beachmat3/beachmat.h"
#include <vector>
#include <algorithm>

// Not necessary in a package context:
// [[Rcpp::depends(beachmat)]]

// [[Rcpp::export(rng=false)]]
Rcpp::NumericVector column_sums_sparse (Rcpp::RObject mat) {
std::vector<double> workspace_x(ptr->get_nrow());
std::vector<int> workspace_i(ptr->get_nrow());
Rcpp::NumericVector output(ptr->get_ncol());

for (size_t i = 0; i < ptr->get_ncol(); ++i) {
auto indices = ptr->get_col(i, workspace_x.data(), workspace_i.data());

auto xptr = indices.x;
auto iptr = indices.i; // row indices, not used here.
auto nnzero = indices.n;
output[i] = std::accumulate(xptr, xptr + nnzero, 0.0);
}

return output;
}

To demonstrate:

column_sums_sparse(Matrix::rsparsematrix(100, 10, 0.1))
##  [1]  2.2700  1.8170  1.3370 -0.9800 -4.3174  0.7410 -0.9920 -0.7300  2.1030
## [10]  2.7143
# Errors out as an ordinary matrix isn't sparse:
try(column_sums_sparse(matrix(rpois(1000, 10), ncol=10)))
## Error in column_sums_sparse(matrix(rpois(1000, 10), ncol = 10)) :
##   object has no 'class' attribute

In practice, we would like our C++ function to be able to handle both sparse and dense inputs from block processing. We can do so by check if our matrix is_sparse(), and if it is, promoting it to a sparse instance as shown below. This allows developers to smoothly switch between algorithms according to the observed representation of the data.

#include "beachmat3/beachmat.h"
#include <vector>
#include <algorithm>

// Not necessary in a package context:
// [[Rcpp::depends(beachmat)]]

// [[Rcpp::export(rng=false)]]
Rcpp::NumericVector column_sums_flexible (Rcpp::RObject mat) {
Rcpp::NumericVector output(ptr->get_ncol());
std::vector<double> workspace(ptr->get_nrow());

if (ptr->is_sparse()) {
auto sptr = beachmat::promote_to_sparse(ptr); // note, ptr becomes NULL.
std::vector<int> workspace_i(sptr->get_nrow());

for (size_t i = 0; i < sptr->get_ncol(); ++i) {
auto indices = sptr->get_col(i, workspace.data(), workspace_i.data());
auto xptr = indices.x;
output[i] = std::accumulate(xptr, xptr + indices.n, 0.0);
}
} else {
for (size_t i = 0; i < ptr->get_ncol(); ++i) {
auto vec = ptr->get_col(i, workspace.data());
output[i] = std::accumulate(vec, vec + ptr->get_nrow(), 0.0);
}
}

return output;
}

To demonstrate:

column_sums_flexible(Matrix::rsparsematrix(100, 10, 0.1))
##  [1]  3.8360  1.2840  0.2050 -4.5160 -2.1653  5.9830 -1.7506  6.7850 -2.3300
## [10]  3.1190
column_sums_flexible(matrix(rpois(1000, 10), ncol=10))
##  [1] 1013 1032  988  990  978  965 1001  978 1024 1051

## 4.3 Creating matrix outputs

For most part, any output matrices should be generated using Rcpp::Matrix objects. If these outputs are anticipated to be large, we suggest using the DelayedArray RealizationSink infrastructure to store the results in, e.g., a file-backed matrix across block processing iterations.

On occassion, sparse outputs may need to be generated so beachmat provides some helpful utilities for doing so. The first and more general approach is relevant when the number of non-zero entries is not known in advance. This assumes that you can collect all non-zero entries into a std::map for storage:

#include "beachmat3/beachmat.h"
#include <map>

// Not necessary in a package context:
// [[Rcpp::depends(beachmat)]]

// [[Rcpp::export(rng=false)]]
Rcpp::RObject generate_sparse_general() {
std::map<std::pair<int, int>, double> storage;

storage[std::make_pair(1, 0)] = 0.1; // column 2, row 1
storage[std::make_pair(2, 5)] = 0.2; // column 3, row 6
storage[std::make_pair(0, 3)] = 0.3; // column 1, row 4
storage[std::make_pair(5, 4)] = 0.4; // column 6, row 5

// Create a dgCMatrix with 7 rows and 10 columns
return beachmat::as_gCMatrix<Rcpp::NumericVector>(7, 10, storage);
}

To demonstrate:

generate_sparse_general()
## 7 x 10 sparse Matrix of class "dgCMatrix"
##
## [1,] .   0.1 .   . . .   . . . .
## [2,] .   .   .   . . .   . . . .
## [3,] .   .   .   . . .   . . . .
## [4,] 0.3 .   .   . . .   . . . .
## [5,] .   .   .   . . 0.4 . . . .
## [6,] .   .   0.2 . . .   . . . .
## [7,] .   .   .   . . .   . . . .

Alternatively, a more specific approach is to replace the non-zero values of an existing *gCMatrix object with another vector corresponding to new values in the same order and position as the original values. This is more efficient for operations that do not change the positions of the non-zero entries.

#include "beachmat3/beachmat.h"
#include <map>

// Not necessary in a package context:
// [[Rcpp::depends(beachmat)]]

// [[Rcpp::export(rng=false)]]
Rcpp::RObject generate_sparse_specific(Rcpp::RObject mat) {
auto nnzero = ptr->get_nnzero();
Rcpp::LogicalVector thing(nnzero, 1);
return beachmat::as_gCMatrix<Rcpp::LogicalVector>(mat, thing);
}

To demonstrate:

library(Matrix)
mat <- rsparsematrix(10,10,0.1)
generate_sparse_specific(mat)
## 10 x 10 sparse Matrix of class "lgCMatrix"
##
##  [1,] . . . | . . . | . .
##  [2,] | . . . . . . . . .
##  [3,] . | . . . . . . . .
##  [4,] . . . . . . . | . .
##  [5,] . . | . . . . . . .
##  [6,] . . . . | . . . . .
##  [7,] . | . . . . . . . .
##  [8,] . . . . . | . . . .
##  [9,] . . . . . . . . . |
## [10,] . . . . . . . . . .

# 5 With block processing

Once we have written our desired function, we can perform DelayedArray block processing on any matrix-like input. This can be done directly using the blockApply() function, or we can use beachmat’s wrappers to conveniently split up the input matrix into blocks of consecutive rows or columns:

library(beachmat)
blockedColSums <- function(x, ...) {
out <- colBlockApply(x, FUN=column_sums, ...)
unlist(out) # combining results across blocks.
}

This function is now capable of operating on any matrix-like object that can be subjected to block processing. Roughly speaking, if an object has two dimensions and an as.array() method, it can be processed correctly.

# Ordinary matrices:
x1 <- matrix(rnorm(1000), ncol=20)
blockedColSums(x1)
##  [1]  -6.5780536   9.1084399 -19.1692959  -2.3556211  -0.8432766  -2.1082907
##  [7]   5.0061756 -13.8757183  -9.5111691   3.4053389  -3.5718720   4.7834215
## [13]   1.2795678   4.9452660   2.3864184   6.9285160  -6.1923277  -1.0888765
## [19] -12.8437134  -0.2787327
# Works for Matrix classes:
x2 <- Matrix(x1)
blockedColSums(x2)
##  [1]  -6.5780536   9.1084399 -19.1692959  -2.3556211  -0.8432766  -2.1082907
##  [7]   5.0061756 -13.8757183  -9.5111691   3.4053389  -3.5718720   4.7834215
## [13]   1.2795678   4.9452660   2.3864184   6.9285160  -6.1923277  -1.0888765
## [19] -12.8437134  -0.2787327
# Works for DelayedMatrix objects:
library(DelayedArray)
x3 <- DelayedArray(x1)
blockedColSums(x3)
##  [1]  -6.5780536   9.1084399 -19.1692959  -2.3556211  -0.8432766  -2.1082907
##  [7]   5.0061756 -13.8757183  -9.5111691   3.4053389  -3.5718720   4.7834215
## [13]   1.2795678   4.9452660   2.3864184   6.9285160  -6.1923277  -1.0888765
## [19] -12.8437134  -0.2787327

As hinted above, this process can be easily parallelized by simply passing BPPARAM= to the colBlockApply() function. Note that this only has a benefit for large matrices where the benefits of parallelization offset the associated overhead.

library(BiocParallel)
blockedColSums(x3, BPPARAM=MulticoreParam(2))
##  [1]  -6.5780536   9.1084399 -19.1692959  -2.3556211  -0.8432766  -2.1082907
##  [7]   5.0061756 -13.8757183  -9.5111691   3.4053389  -3.5718720   4.7834215
## [13]   1.2795678   4.9452660   2.3864184   6.9285160  -6.1923277  -1.0888765
## [19] -12.8437134  -0.2787327

# Session information

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_GB              LC_COLLATE=C
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
##  [1] BiocParallel_1.32.0   DelayedArray_0.24.0   IRanges_2.32.0
##  [4] S4Vectors_0.36.0      MatrixGenerics_1.10.0 matrixStats_0.62.0
##  [7] BiocGenerics_0.44.0   beachmat_2.14.0       Matrix_1.5-1
## [10] knitr_1.40            BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9          magrittr_2.0.3      lattice_0.20-45
##  [4] R6_2.5.1            rlang_1.0.6         fastmap_1.1.0
##  [7] stringr_1.4.1       tools_4.2.1         parallel_4.2.1
## [10] grid_4.2.1          xfun_0.34           cli_3.4.1
## [13] jquerylib_0.1.4     htmltools_0.5.3     yaml_2.3.6
## [16] digest_0.6.30       bookdown_0.29       BiocManager_1.30.19
## [19] codetools_0.2-18    sass_0.4.2          cachem_1.0.6
## [22] evaluate_0.17       rmarkdown_2.17      stringi_1.7.8
## [25] compiler_4.2.1      bslib_0.4.0         jsonlite_1.8.3