TileDBArray 1.11.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray
backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray
is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.69454490 -0.10002626 -1.37240701 . 0.7270220 0.9228882
## [2,] -1.88621640 1.96016132 -0.03642798 . -0.2716645 0.7715520
## [3,] -0.96762054 -1.11128276 1.56393873 . 0.2872475 -1.8606821
## [4,] 0.99515444 0.66492471 0.28452587 . 1.9616358 0.1091722
## [5,] 0.35176841 -0.75592471 0.48178829 . -0.7490048 1.4103686
## ... . . . . . .
## [96,] 1.4294006 0.8920907 -1.3702401 . 0.99957884 -1.02461322
## [97,] -0.6669383 0.9723704 0.3266392 . 1.65098586 1.35573153
## [98,] -0.8921858 -1.4538322 -1.1199837 . 0.01947456 1.18583296
## [99,] -1.9614874 0.3471487 -0.5227987 . -1.31344856 -0.79464096
## [100,] -1.2105438 2.6367061 1.0056176 . -0.33279424 0.70934991
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.69454490 -0.10002626 -1.37240701 . 0.7270220 0.9228882
## [2,] -1.88621640 1.96016132 -0.03642798 . -0.2716645 0.7715520
## [3,] -0.96762054 -1.11128276 1.56393873 . 0.2872475 -1.8606821
## [4,] 0.99515444 0.66492471 0.28452587 . 1.9616358 0.1091722
## [5,] 0.35176841 -0.75592471 0.48178829 . -0.7490048 1.4103686
## ... . . . . . .
## [96,] 1.4294006 0.8920907 -1.3702401 . 0.99957884 -1.02461322
## [97,] -0.6669383 0.9723704 0.3266392 . 1.65098586 1.35573153
## [98,] -0.8921858 -1.4538322 -1.1199837 . 0.01947456 1.18583296
## [99,] -1.9614874 0.3471487 -0.5227987 . -1.31344856 -0.79464096
## [100,] -1.2105438 2.6367061 1.0056176 . -0.33279424 0.70934991
This process works also for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . -0.54 0.00
## [997,] 0 0 0 . 0.00 0.00
## [998,] 0 0 0 . 0.00 0.00
## [999,] 0 0 0 . 0.00 0.00
## [1000,] 0 0 0 . 0.00 0.00
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.69454490 -0.10002626 -1.37240701 . 0.7270220 0.9228882
## GENE_2 -1.88621640 1.96016132 -0.03642798 . -0.2716645 0.7715520
## GENE_3 -0.96762054 -1.11128276 1.56393873 . 0.2872475 -1.8606821
## GENE_4 0.99515444 0.66492471 0.28452587 . 1.9616358 0.1091722
## GENE_5 0.35176841 -0.75592471 0.48178829 . -0.7490048 1.4103686
## ... . . . . . .
## GENE_96 1.4294006 0.8920907 -1.3702401 . 0.99957884 -1.02461322
## GENE_97 -0.6669383 0.9723704 0.3266392 . 1.65098586 1.35573153
## GENE_98 -0.8921858 -1.4538322 -1.1199837 . 0.01947456 1.18583296
## GENE_99 -1.9614874 0.3471487 -0.5227987 . -1.31344856 -0.79464096
## GENE_100 -1.2105438 2.6367061 1.0056176 . -0.33279424 0.70934991
TileDBArray
sTileDBArray
s are simply DelayedArray
objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 1.6945449 -1.8862164 -0.9676205 0.9951544 0.3517684 -0.4282858
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix
object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 1.69454490 -0.10002626 -1.37240701 0.31064798 -1.13092695
## GENE_2 -1.88621640 1.96016132 -0.03642798 0.37990978 -1.17306725
## GENE_3 -0.96762054 -1.11128276 1.56393873 -0.08598871 -0.96044356
## GENE_4 0.99515444 0.66492471 0.28452587 -0.48656632 1.00728528
## GENE_5 0.35176841 -0.75592471 0.48178829 2.24038705 -1.91256078
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 3.38908980 -0.20005253 -2.74481402 . 1.4540441 1.8457765
## GENE_2 -3.77243281 3.92032265 -0.07285596 . -0.5433291 1.5431041
## GENE_3 -1.93524108 -2.22256553 3.12787746 . 0.5744950 -3.7213642
## GENE_4 1.99030888 1.32984942 0.56905175 . 3.9232717 0.2183444
## GENE_5 0.70353683 -1.51184943 0.96357658 . -1.4980097 2.8207373
## ... . . . . . .
## GENE_96 2.8588011 1.7841814 -2.7404801 . 1.99915769 -2.04922644
## GENE_97 -1.3338767 1.9447407 0.6532784 . 3.30197171 2.71146306
## GENE_98 -1.7843717 -2.9076645 -2.2399674 . 0.03894912 2.37166593
## GENE_99 -3.9229749 0.6942974 -1.0455975 . -2.62689713 -1.58928192
## GENE_100 -2.4210875 5.2734122 2.0112351 . -0.66558848 1.41869981
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7 SAMP_8
## 8.780684 13.210949 9.459717 2.951031 4.640307 -6.182279 -3.791079 -2.192534
## SAMP_9 SAMP_10
## -3.864432 -6.198000
out %*% runif(ncol(out))
## <100 x 1> DelayedMatrix object of type "double":
## y
## GENE_1 -0.0809691
## GENE_2 1.0421736
## GENE_3 -3.3744401
## GENE_4 3.5444434
## GENE_5 -1.1479677
## ... .
## GENE_96 -0.3014102
## GENE_97 1.2132590
## GENE_98 -2.1219790
## GENE_99 -2.4853032
## GENE_100 1.3004007
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray()
.
For example, the example below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.31185712 -0.20337135 0.27187507 . -0.3594832 1.5506458
## [2,] -1.29492248 1.20067911 0.86315198 . -0.3087876 -0.8272102
## [3,] -0.10368195 -1.96092948 0.84419429 . -2.2341343 0.9011641
## [4,] -1.66964318 0.09587458 -1.50802406 . -0.3148474 -0.9014763
## [5,] 1.10180233 0.63255238 0.10491275 . 1.1704518 -1.6711524
## ... . . . . . .
## [96,] 1.07487683 -1.25251703 -1.22764811 . -0.4981307 0.5170297
## [97,] 0.04763645 -0.01985419 -0.41234988 . -0.3500659 -1.8083055
## [98,] -0.71303995 -1.34131340 -0.65865032 . -0.8773790 -0.1531136
## [99,] -0.13880932 -1.28663018 0.85522003 . -0.0513340 -0.1042523
## [100,] 0.17692575 1.05671408 -0.42664150 . 0.1265957 -0.6220747
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.31185712 -0.20337135 0.27187507 . -0.3594832 1.5506458
## [2,] -1.29492248 1.20067911 0.86315198 . -0.3087876 -0.8272102
## [3,] -0.10368195 -1.96092948 0.84419429 . -2.2341343 0.9011641
## [4,] -1.66964318 0.09587458 -1.50802406 . -0.3148474 -0.9014763
## [5,] 1.10180233 0.63255238 0.10491275 . 1.1704518 -1.6711524
## ... . . . . . .
## [96,] 1.07487683 -1.25251703 -1.22764811 . -0.4981307 0.5170297
## [97,] 0.04763645 -0.01985419 -0.41234988 . -0.3500659 -1.8083055
## [98,] -0.71303995 -1.34131340 -0.65865032 . -0.8773790 -0.1531136
## [99,] -0.13880932 -1.28663018 0.85522003 . -0.0513340 -0.1042523
## [100,] 0.17692575 1.05671408 -0.42664150 . 0.1265957 -0.6220747
sessionInfo()
## R version 4.3.0 RC (2023-04-18 r84287)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.12 TileDBArray_1.11.1 DelayedArray_0.27.2
## [4] SparseArray_1.1.2 S4Arrays_1.1.2 IRanges_2.35.1
## [7] S4Vectors_0.39.1 MatrixGenerics_1.13.0 matrixStats_0.63.0
## [10] BiocGenerics_0.47.0 Matrix_1.5-4 BiocStyle_2.29.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.4 compiler_4.3.0
## [4] BiocManager_1.30.20 crayon_1.5.2 Rcpp_1.0.10
## [7] jquerylib_0.1.4 yaml_2.3.7 fastmap_1.1.1
## [10] lattice_0.21-8 R6_2.5.1 RcppCCTZ_0.2.12
## [13] XVector_0.41.1 tiledb_0.19.1 knitr_1.42
## [16] bookdown_0.33 bslib_0.4.2 rlang_1.1.1
## [19] cachem_1.0.8 xfun_0.39 sass_0.4.6
## [22] bit64_4.0.5 cli_3.6.1 zlibbioc_1.47.0
## [25] spdl_0.0.4 digest_0.6.31 grid_4.3.0
## [28] data.table_1.14.8 evaluate_0.20 nanotime_0.3.7
## [31] zoo_1.8-12 rmarkdown_2.21 tools_4.3.0
## [34] htmltools_0.5.5