TileDBArray 1.19.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.25339976 1.41421567 1.13591928 . 1.7966069 0.8354164
## [2,] -0.60975523 0.06414092 -0.12917361 . 0.7683345 -0.1355181
## [3,] -0.22190516 -0.21550242 0.55482043 . 1.9838756 -0.5660243
## [4,] -2.14204403 1.06478668 2.47137134 . -1.0994251 -0.7295623
## [5,] -1.26636443 -0.18071192 1.38419742 . 1.2144823 0.4416920
## ... . . . . . .
## [96,] 0.4915760 0.7366418 0.2185287 . -0.676675329 1.205564932
## [97,] -1.5004964 -0.2990177 0.4880321 . -0.372293710 -1.260806741
## [98,] 0.6495552 -0.8429073 -0.6093668 . -1.936576313 1.929189515
## [99,] 1.7659781 1.0952209 -0.6774325 . 0.182731796 0.040204306
## [100,] 0.1456381 3.0434088 1.0125820 . 0.053756347 0.001871278
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.25339976 1.41421567 1.13591928 . 1.7966069 0.8354164
## [2,] -0.60975523 0.06414092 -0.12917361 . 0.7683345 -0.1355181
## [3,] -0.22190516 -0.21550242 0.55482043 . 1.9838756 -0.5660243
## [4,] -2.14204403 1.06478668 2.47137134 . -1.0994251 -0.7295623
## [5,] -1.26636443 -0.18071192 1.38419742 . 1.2144823 0.4416920
## ... . . . . . .
## [96,] 0.4915760 0.7366418 0.2185287 . -0.676675329 1.205564932
## [97,] -1.5004964 -0.2990177 0.4880321 . -0.372293710 -1.260806741
## [98,] 0.6495552 -0.8429073 -0.6093668 . -1.936576313 1.929189515
## [99,] 1.7659781 1.0952209 -0.6774325 . 0.182731796 0.040204306
## [100,] 0.1456381 3.0434088 1.0125820 . 0.053756347 0.001871278
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
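As a quick check that sparsity survives the round trip, the sketch below writes a smaller sparse matrix and queries the result with is_sparse() from the DelayedArray framework. The names Y_small and sp are illustrative only, and the sketch assumes that writeTileDBArray() keeps its default behaviour of inferring sparsity from the input.

```r
library(TileDBArray)

# A small sparse matrix; Y_small and sp are hypothetical names for this sketch.
Y_small <- Matrix::rsparsematrix(50, 20, density=0.1)
sp <- writeTileDBArray(Y_small)

# is_sparse() reports whether the array is treated as sparse.
is_sparse(sp)

# Realizing the array in memory should recover the original values.
all.equal(as.matrix(sp), as.matrix(Y_small), check.attributes=FALSE)
```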
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
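The logical case is shown above; a minimal sketch of the integer case follows. The names counts and int_arr are made up for illustration, and type() is the DelayedArray accessor for the element type, which should report an integer type here.

```r
library(TileDBArray)

# rpois() returns an integer vector, so this is an integer matrix.
# counts and int_arr are hypothetical names for this sketch.
counts <- matrix(rpois(1000, lambda=5), ncol=10)
int_arr <- writeTileDBArray(counts)

# Query the element type of the TileDB-backed matrix.
type(int_arr)
```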
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.25339976 1.41421567 1.13591928 . 1.7966069 0.8354164
## GENE_2 -0.60975523 0.06414092 -0.12917361 . 0.7683345 -0.1355181
## GENE_3 -0.22190516 -0.21550242 0.55482043 . 1.9838756 -0.5660243
## GENE_4 -2.14204403 1.06478668 2.47137134 . -1.0994251 -0.7295623
## GENE_5 -1.26636443 -0.18071192 1.38419742 . 1.2144823 0.4416920
## ... . . . . . .
## GENE_96 0.4915760 0.7366418 0.2185287 . -0.676675329 1.205564932
## GENE_97 -1.5004964 -0.2990177 0.4880321 . -0.372293710 -1.260806741
## GENE_98 0.6495552 -0.8429073 -0.6093668 . -1.936576313 1.929189515
## GENE_99 1.7659781 1.0952209 -0.6774325 . 0.182731796 0.040204306
## GENE_100 0.1456381 3.0434088 1.0125820 . 0.053756347 0.001871278
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 1.2533998 -0.6097552 -0.2219052 -2.1420440 -1.2663644 -1.2398793
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 1.25339976 1.41421567 1.13591928 0.32118932 -0.54636165
## GENE_2 -0.60975523 0.06414092 -0.12917361 0.51758528 1.64882685
## GENE_3 -0.22190516 -0.21550242 0.55482043 -0.46983574 -0.20073435
## GENE_4 -2.14204403 1.06478668 2.47137134 0.38826701 -1.96783271
## GENE_5 -1.26636443 -0.18071192 1.38419742 0.32593917 0.79832297
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 2.5067995 2.8284313 2.2718386 . 3.5932138 1.6708328
## GENE_2 -1.2195105 0.1282818 -0.2583472 . 1.5366690 -0.2710363
## GENE_3 -0.4438103 -0.4310048 1.1096409 . 3.9677512 -1.1320486
## GENE_4 -4.2840881 2.1295734 4.9427427 . -2.1988503 -1.4591247
## GENE_5 -2.5327289 -0.3614238 2.7683948 . 2.4289645 0.8833841
## ... . . . . . .
## GENE_96 0.9831519 1.4732836 0.4370574 . -1.353350658 2.411129864
## GENE_97 -3.0009929 -0.5980353 0.9760642 . -0.744587419 -2.521613481
## GENE_98 1.2991105 -1.6858147 -1.2187336 . -3.873152627 3.858379030
## GENE_99 3.5319563 2.1904419 -1.3548649 . 0.365463591 0.080408612
## GENE_100 0.2912762 6.0868176 2.0251641 . 0.107512694 0.003742557
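To make the delayed behaviour concrete, the sketch below builds a fresh array, applies arithmetic, inspects the pending operations with showtree() from DelayedArray, and only then realizes the values with as.matrix(). The names X_demo, arr, doubled and realized are illustrative and not part of the original example.

```r
library(TileDBArray)

# Hypothetical objects for this sketch.
X_demo <- matrix(rnorm(200), ncol=10)
arr <- as(X_demo, "TileDBArray")

# Arithmetic only records the operation; the TileDB backend is unchanged.
doubled <- arr * 2
class(doubled)

# showtree() prints the stack of delayed operations sitting on top of the seed.
showtree(doubled)

# as.matrix() forces evaluation and returns an ordinary in-memory matrix.
realized <- as.matrix(doubled)
all.equal(realized, X_demo * 2)
```

Realization loads the whole result into memory, so for large arrays it is usually preferable to stay with delayed or block-wise operations such as colSums().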
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -17.943049 9.547023 8.525440 15.474355 -5.732668 3.039406 11.411236
## SAMP_8 SAMP_9 SAMP_10
## -3.900692 -22.132076 14.785434
out %*% runif(ncol(out))
## [,1]
## GENE_1 4.18816376
## GENE_2 1.64364448
## GENE_3 0.46414047
## GENE_4 -1.65621199
## GENE_5 1.70104985
## GENE_6 1.06077105
## GENE_7 2.30640259
## GENE_8 0.15770899
## GENE_9 -5.21994289
## GENE_10 1.61926521
## GENE_11 -0.14340424
## GENE_12 -3.19962592
## GENE_13 1.72887419
## GENE_14 -1.26417607
## GENE_15 -1.88707772
## GENE_16 1.11578046
## GENE_17 1.59052548
## GENE_18 -4.75012610
## GENE_19 -2.53731911
## GENE_20 -0.85965345
## GENE_21 2.31529861
## GENE_22 -0.38348235
## GENE_23 -1.35130975
## GENE_24 -3.06638898
## GENE_25 1.85520286
## GENE_26 -3.47581090
## GENE_27 0.53540637
## GENE_28 -1.05000751
## GENE_29 -4.62676393
## GENE_30 2.57000440
## GENE_31 1.80958360
## GENE_32 -0.97030569
## GENE_33 0.56293672
## GENE_34 -0.78933969
## GENE_35 0.76647154
## GENE_36 0.99747000
## GENE_37 2.98673815
## GENE_38 -1.86631958
## GENE_39 1.83439863
## GENE_40 -0.29665316
## GENE_41 -0.37067542
## GENE_42 -1.45925401
## GENE_43 1.64288065
## GENE_44 -1.04890135
## GENE_45 -0.31571667
## GENE_46 -3.46554834
## GENE_47 1.51333533
## GENE_48 -0.85759657
## GENE_49 2.07499127
## GENE_50 1.62441527
## GENE_51 0.95620459
## GENE_52 0.79379841
## GENE_53 0.65428871
## GENE_54 0.54119884
## GENE_55 -1.48354969
## GENE_56 0.53611730
## GENE_57 0.45854645
## GENE_58 -0.22135864
## GENE_59 0.77469764
## GENE_60 1.47992800
## GENE_61 2.40983004
## GENE_62 0.45154622
## GENE_63 -2.88834992
## GENE_64 -0.08064315
## GENE_65 -1.66835929
## GENE_66 1.11014945
## GENE_67 -1.51391788
## GENE_68 -3.21152412
## GENE_69 3.19273407
## GENE_70 0.98459116
## GENE_71 -1.53898622
## GENE_72 1.31456147
## GENE_73 -3.07230940
## GENE_74 -1.79532060
## GENE_75 -0.07787543
## GENE_76 -1.74281814
## GENE_77 0.06000242
## GENE_78 -2.69294324
## GENE_79 -1.83962209
## GENE_80 -0.28993699
## GENE_81 3.83591765
## GENE_82 2.61769197
## GENE_83 4.51927236
## GENE_84 2.27728818
## GENE_85 -1.99377093
## GENE_86 1.17280940
## GENE_87 -0.44644963
## GENE_88 -0.24655999
## GENE_89 -2.95835951
## GENE_90 -0.08562518
## GENE_91 -2.38435165
## GENE_92 -0.46722383
## GENE_93 -1.32673619
## GENE_94 0.03731062
## GENE_95 -2.41160636
## GENE_96 1.17918432
## GENE_97 -1.44110836
## GENE_98 -2.51003088
## GENE_99 -0.20479434
## GENE_100 7.35486864
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.16367405 0.92967338 0.41583186 . 0.1077510 0.7519651
## [2,] -0.29490138 -1.42626793 -0.09244803 . 0.3441327 -1.0863391
## [3,] 0.08956204 0.05091825 0.88388580 . -0.1825059 -1.9084494
## [4,] 1.60222925 0.36702945 1.32653254 . 0.8883559 0.4822085
## [5,] -1.73190199 -0.74292527 -0.04855275 . -0.9504211 -0.3326300
## ... . . . . . .
## [96,] -0.86242024 0.43644585 0.16663201 . 0.8740486 -0.4571840
## [97,] 0.04811422 -0.02144567 -1.12492935 . 0.8207688 0.8820084
## [98,] -1.00831443 0.25383687 -0.60003458 . -0.5069711 -0.4464017
## [99,] 0.59835813 0.95393656 0.93272415 . 1.3094011 -0.6807035
## [100,] 0.01468769 -0.23096678 0.55848276 . 0.9805230 -0.4036048
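Knowing the path and attribute name also makes it possible to reopen the same backend later. The sketch below assumes that the TileDBArray() constructor accepts a path together with an attr argument, as described in the package documentation; X_demo, demo_path, written and reopened are illustrative names.

```r
library(TileDBArray)

# Hypothetical objects for this sketch.
X_demo <- matrix(rnorm(1000), ncol=10)
demo_path <- tempfile()
written <- writeTileDBArray(X_demo, path=demo_path, attr="WHEE")

# Point the constructor at the existing backend instead of writing a new one.
reopened <- TileDBArray(demo_path, attr="WHEE")
dim(reopened)

# The reopened array should hold the same values as the original matrix.
all.equal(as.matrix(reopened), X_demo)
```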
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.16367405 0.92967338 0.41583186 . 0.1077510 0.7519651
## [2,] -0.29490138 -1.42626793 -0.09244803 . 0.3441327 -1.0863391
## [3,] 0.08956204 0.05091825 0.88388580 . -0.1825059 -1.9084494
## [4,] 1.60222925 0.36702945 1.32653254 . 0.8883559 0.4822085
## [5,] -1.73190199 -0.74292527 -0.04855275 . -0.9504211 -0.3326300
## ... . . . . . .
## [96,] -0.86242024 0.43644585 0.16663201 . 0.8740486 -0.4571840
## [97,] 0.04811422 -0.02144567 -1.12492935 . 0.8207688 0.8820084
## [98,] -1.00831443 0.25383687 -0.60003458 . -0.5069711 -0.4464017
## [99,] 0.59835813 0.95393656 0.93272415 . 1.3094011 -0.6807035
## [100,] 0.01468769 -0.23096678 0.55848276 . 0.9805230 -0.4036048
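Because these globals persist for the rest of the session, it can be useful to inspect and unset them once the coercion is done. The sketch below assumes the companion getters getTileDBPath() and getTileDBAttr() and that passing NULL to a setter restores the default; demo_path is an illustrative name.

```r
library(TileDBArray)

# Hypothetical path for this sketch.
demo_path <- tempfile()

# Set the globals and confirm what coercion would use next.
setTileDBPath(demo_path)
setTileDBAttr("WHEE")
getTileDBPath()
getTileDBAttr()

# Unset both so that later coercions do not try to reuse the same location.
setTileDBPath(NULL)
setTileDBAttr(NULL)
getTileDBPath()
```

Unsetting matters in practice because a later coercion would otherwise target a path where a backend may already exist.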
sessionInfo()
## R version 4.5.0 (2025-04-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.22 TileDBArray_1.19.0 DelayedArray_0.35.1
## [4] SparseArray_1.9.0 S4Arrays_1.9.1 IRanges_2.43.0
## [7] abind_1.4-8 S4Vectors_0.47.0 MatrixGenerics_1.21.0
## [10] matrixStats_1.5.0 BiocGenerics_0.55.0 generics_0.1.4
## [13] Matrix_1.7-3 BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0-1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.49.0 tiledb_0.32.0
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.5
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.0
## [28] lifecycle_1.0.4 data.table_1.17.4 evaluate_1.0.3
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.0 htmltools_0.5.8.1