TileDBArray 1.13.0
TileDB implements a framework for local and remote storage of dense and sparse arrays. We can use this as a DelayedArray backend to provide an array-level abstraction, allowing the data to be used in many places where an ordinary array or matrix might be used. The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.16138579 0.73535063 -1.56671485 . 0.07297654 0.16459167
## [2,] -1.43299328 1.54505765 -0.95774852 . -1.03386116 -0.30973765
## [3,] -0.22796578 2.05053711 -0.69267138 . 1.21571958 0.15884688
## [4,] 0.05231101 -0.73872690 0.57378852 . -0.15877227 -0.48881201
## [5,] 0.13528472 1.64870663 -0.75940579 . 0.71497229 -1.56049085
## ... . . . . . .
## [96,] -0.003737717 0.267212181 0.366525223 . 0.0735055 1.4154775
## [97,] 1.390087411 -1.497816523 0.179906504 . 0.6000708 -0.0897562
## [98,] -1.717632257 -0.407691622 -0.770516020 . -0.5814936 0.2016598
## [99,] 0.513288054 0.595824938 -1.720695794 . -0.2848470 0.8917934
## [100,] -0.003395945 -0.314557090 -1.199863027 . 0.5148298 0.5431169
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.16138579 0.73535063 -1.56671485 . 0.07297654 0.16459167
## [2,] -1.43299328 1.54505765 -0.95774852 . -1.03386116 -0.30973765
## [3,] -0.22796578 2.05053711 -0.69267138 . 1.21571958 0.15884688
## [4,] 0.05231101 -0.73872690 0.57378852 . -0.15877227 -0.48881201
## [5,] 0.13528472 1.64870663 -0.75940579 . 0.71497229 -1.56049085
## ... . . . . . .
## [96,] -0.003737717 0.267212181 0.366525223 . 0.0735055 1.4154775
## [97,] 1.390087411 -1.497816523 0.179906504 . 0.6000708 -0.0897562
## [98,] -1.717632257 -0.407691622 -0.770516020 . -0.5814936 0.2016598
## [99,] 0.513288054 0.595824938 -1.720695794 . -0.2848470 0.8917934
## [100,] -0.003395945 -0.314557090 -1.199863027 . 0.5148298 0.5431169
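Both routes store the values on disk, so realizing either result back into memory should recover the original matrix. The sketch below checks this; the seed and the names `via_write`, `via_coerce`, `back1` and `back2` are illustrative choices, and each call is assumed to write to its own temporary path because no path is specified.

```r
library(TileDBArray)

set.seed(1)
X <- matrix(rnorm(1000), ncol=10)

# Two routes to a TileDB-backed matrix; each writes to a fresh temporary path.
via_write <- writeTileDBArray(X)
via_coerce <- as(X, "TileDBArray")

# as.matrix() realizes the on-disk values into an ordinary in-memory matrix.
back1 <- as.matrix(via_write)
back2 <- as.matrix(via_coerce)

# Element-wise comparison against the original (errors if dimensions differ).
all(back1 == X)
all(back2 == X)
```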
This process also works for sparse matrices, in which case the result is a sparse TileDBMatrix:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
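The display above is almost entirely zeros, so it is worth confirming that sparsity is reported and that the non-zero values survive the round trip. This sketch uses `is_sparse()`, the generic from the DelayedArray/S4Arrays stack that the object's display relies on; the seed and variable names are illustrative.

```r
library(TileDBArray)
library(Matrix)

set.seed(2)
Y <- rsparsematrix(1000, 1000, density=0.01)
out <- writeTileDBArray(Y)

# Reports whether the backend stores only the non-zero entries.
is_sparse(out)

# Realize both objects as dense matrices and compare every entry.
dense_back <- as.matrix(out)
all(dense_back == as.matrix(Y))
```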
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
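Only a logical example is shown above. An integer matrix can be written in the same way; in this sketch, `rpois()` (which returns integers) supplies illustrative counts, and `type()` reports the R type that the TileDB-backed matrix yields.

```r
library(TileDBArray)

set.seed(3)
Z <- matrix(rpois(1000, lambda=5), ncol=10)  # rpois() produces integer values
storage.mode(Z)

out_int <- writeTileDBArray(Z)

# type() gives the R type of the values returned by the DelayedArray.
type(out_int)
all(as.matrix(out_int) == Z)
```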
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.16138579 0.73535063 -1.56671485 . 0.07297654 0.16459167
## GENE_2 -1.43299328 1.54505765 -0.95774852 . -1.03386116 -0.30973765
## GENE_3 -0.22796578 2.05053711 -0.69267138 . 1.21571958 0.15884688
## GENE_4 0.05231101 -0.73872690 0.57378852 . -0.15877227 -0.48881201
## GENE_5 0.13528472 1.64870663 -0.75940579 . 0.71497229 -1.56049085
## ... . . . . . .
## GENE_96 -0.003737717 0.267212181 0.366525223 . 0.0735055 1.4154775
## GENE_97 1.390087411 -1.497816523 0.179906504 . 0.6000708 -0.0897562
## GENE_98 -1.717632257 -0.407691622 -0.770516020 . -0.5814936 0.2016598
## GENE_99 0.513288054 0.595824938 -1.720695794 . -0.2848470 0.8917934
## GENE_100 -0.003395945 -0.314557090 -1.199863027 . 0.5148298 0.5431169
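The row and column names are stored alongside the data, so they should be recovered exactly when the array is read back. A minimal check, rebuilding the same named matrix so the snippet stands alone:

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))

out <- writeTileDBArray(X)

# The dimnames of the TileDB-backed matrix should match the in-memory ones.
identical(dimnames(out), dimnames(X))
```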
Manipulating TileDBArrays
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.16138579 -1.43299328 -0.22796578 0.05231101 0.13528472 0.17912850
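Subsetting by name also follows ordinary matrix conventions. In this sketch, the particular genes and samples are arbitrary choices, `drop=FALSE` keeps a two-dimensional result, and `as.matrix()` realizes the small delayed subset so it can be compared with the same subset of the in-memory matrix.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Character indices select rows and columns by name.
block <- out[c("GENE_1", "GENE_50"), c("SAMP_2", "SAMP_9"), drop=FALSE]

# Realize the subset and compare it with the equivalent base R subset.
realized <- as.matrix(block)
realized
identical(realized, X[c("GENE_1", "GENE_50"), c("SAMP_2", "SAMP_9"), drop=FALSE])
```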
We can also perform manipulations like subsetting and arithmetic. These operations do not affect the data in the TileDB backend; rather, they are delayed until the values are explicitly required, which is why the results are DelayedMatrix objects rather than TileDBMatrix objects.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.16138579 0.73535063 -1.56671485 0.18483908 1.66873231
## GENE_2 -1.43299328 1.54505765 -0.95774852 -0.36913627 1.44384418
## GENE_3 -0.22796578 2.05053711 -0.69267138 0.25830937 0.58384179
## GENE_4 0.05231101 -0.73872690 0.57378852 -1.40822917 0.00734946
## GENE_5 0.13528472 1.64870663 -0.75940579 -0.38118168 -0.11306151
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.3227716 1.4707013 -3.1334297 . 0.1459531 0.3291833
## GENE_2 -2.8659866 3.0901153 -1.9154970 . -2.0677223 -0.6194753
## GENE_3 -0.4559316 4.1010742 -1.3853428 . 2.4314392 0.3176938
## GENE_4 0.1046220 -1.4774538 1.1475770 . -0.3175445 -0.9776240
## GENE_5 0.2705694 3.2974133 -1.5188116 . 1.4299446 -3.1209817
## ... . . . . . .
## GENE_96 -0.007475434 0.534424362 0.733050446 . 0.1470110 2.8309550
## GENE_97 2.780174821 -2.995633045 0.359813009 . 1.2001417 -0.1795124
## GENE_98 -3.435264515 -0.815383244 -1.541032041 . -1.1629871 0.4033195
## GENE_99 1.026576108 1.191649875 -3.441391588 . -0.5696941 1.7835867
## GENE_100 -0.006791889 -0.629114180 -2.399726053 . 1.0296596 1.0862338
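Several delayed operations can be chained before anything is computed. This sketch, with an arbitrary pipeline chosen for illustration, uses DelayedArray's `showtree()` to print the stack of pending operations, then passes the delayed result to `writeTileDBArray()` to evaluate it into a new TileDB array. We assume here, as with the in-memory inputs above, that `writeTileDBArray()` accepts any array-like object, including a DelayedMatrix; the original backend of `out` is not modified.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Build a delayed pipeline; no values are computed or written yet.
y <- log1p(abs(out[1:20, ] * 2))
showtree(y)  # prints the delayed operations layered on top of the TileDB seed

# Evaluate the pipeline and store the result in a new TileDB array.
y_saved <- writeTileDBArray(y)

# Compare with the same computation done in memory, then confirm that the
# original backend still holds the untransformed values.
all.equal(unname(as.matrix(y_saved)), unname(log1p(abs(X[1:20, ] * 2))))
all(as.matrix(out) == X)
```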
We can also perform more complex matrix operations supported by DelayedArray, such as column sums and matrix multiplication:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## -27.7598774 10.1766160 18.4868408 -0.4737127 2.4269994 -9.6974348
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## 17.6370126 0.9282304 12.4824521 4.2584942
out %*% runif(ncol(out))
## [,1]
## GENE_1 1.32902293
## GENE_2 -0.43274965
## GENE_3 1.18504537
## GENE_4 -1.69086479
## GENE_5 -0.50186354
## GENE_6 -0.72782409
## GENE_7 -0.28639705
## GENE_8 -0.05021904
## GENE_9 -3.26371928
## GENE_10 -0.85448161
## GENE_11 0.26964190
## GENE_12 -0.17974907
## GENE_13 -1.06675827
## GENE_14 -2.29766279
## GENE_15 -0.10932051
## GENE_16 -2.56480656
## GENE_17 2.26096252
## GENE_18 2.57338177
## GENE_19 -0.52391103
## GENE_20 -2.27231656
## GENE_21 1.36862836
## GENE_22 1.15329596
## GENE_23 -0.33150281
## GENE_24 -1.47891203
## GENE_25 -1.98194657
## GENE_26 -0.71236689
## GENE_27 -1.91283227
## GENE_28 -2.39979005
## GENE_29 3.02976327
## GENE_30 0.85313002
## GENE_31 0.47401029
## GENE_32 1.41613114
## GENE_33 0.39005705
## GENE_34 -0.35867231
## GENE_35 3.14385688
## GENE_36 -0.08147615
## GENE_37 0.39410788
## GENE_38 -3.86756299
## GENE_39 0.78827255
## GENE_40 2.14037522
## GENE_41 1.42317632
## GENE_42 4.82753104
## GENE_43 -1.26334481
## GENE_44 1.25916568
## GENE_45 0.96152229
## GENE_46 -0.43495949
## GENE_47 0.07344258
## GENE_48 -1.58159767
## GENE_49 0.19477691
## GENE_50 -0.57995919
## GENE_51 0.08935004
## GENE_52 -1.20510264
## GENE_53 -0.55921193
## GENE_54 0.92638630
## GENE_55 -0.91867767
## GENE_56 4.79103934
## GENE_57 1.88741396
## GENE_58 0.87529816
## GENE_59 -1.38996856
## GENE_60 -1.38891997
## GENE_61 -2.22154965
## GENE_62 -1.25282630
## GENE_63 0.83327202
## GENE_64 -2.16266047
## GENE_65 0.38971634
## GENE_66 -1.37817445
## GENE_67 0.25597664
## GENE_68 3.35490099
## GENE_69 2.47703971
## GENE_70 -2.96905605
## GENE_71 -0.12538418
## GENE_72 0.86757903
## GENE_73 -1.21959611
## GENE_74 0.49365377
## GENE_75 3.75164196
## GENE_76 -0.27553702
## GENE_77 -1.41567861
## GENE_78 1.77567169
## GENE_79 -1.84119166
## GENE_80 -4.67670086
## GENE_81 0.62369153
## GENE_82 -1.85390353
## GENE_83 0.46022562
## GENE_84 -0.94218392
## GENE_85 3.35192798
## GENE_86 1.34744559
## GENE_87 0.07669902
## GENE_88 -0.39902437
## GENE_89 -2.87972051
## GENE_90 -1.10345872
## GENE_91 1.13353318
## GENE_92 -0.12902672
## GENE_93 0.15728673
## GENE_94 -3.95277219
## GENE_95 3.20294774
## GENE_96 0.50661230
## GENE_97 0.17058335
## GENE_98 0.50882398
## GENE_99 -0.26071711
## GENE_100 2.97515791
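Because these results are computed from the on-disk values, they should agree with the same operations on the in-memory matrix up to floating-point tolerance. A short check, with a fixed seed so the random vector is reproducible; `unname()` is used only to keep the comparison independent of how dimnames are propagated.

```r
library(TileDBArray)

set.seed(4)
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")
v <- runif(ncol(out))

# Column sums computed from the TileDB backend versus base R.
all.equal(unname(colSums(out)), unname(colSums(X)))

# Matrix-vector product from the TileDB backend versus base R.
all.equal(unname(as.matrix(out %*% v)), unname(X %*% v))
```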
Some parameters of the backend can be adjusted through the corresponding arguments to writeTileDBArray(). For example, the call below controls the path to the backend as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.24866511 0.41841757 0.82842334 . -0.8461549 0.4567598
## [2,] 0.33494916 -1.65041652 -0.29758065 . 0.6514514 0.4523713
## [3,] -0.78807346 -0.38361961 -1.10291978 . -0.5163232 -1.6054249
## [4,] -0.22978706 0.79087254 -0.07397855 . -0.5993269 0.8444350
## [5,] 2.08638043 0.33242631 0.26686627 . 0.1781162 -0.5149606
## ... . . . . . .
## [96,] 0.7165689 -0.4778324 -0.6244231 . -0.96266945 -0.93480423
## [97,] -0.1909790 1.7964320 -1.0432095 . 1.42113264 0.84062261
## [98,] -1.1854852 0.8445156 -2.6702843 . 1.30282586 1.40231842
## [99,] -2.4199595 0.3432062 0.3477054 . 0.36821801 0.05460774
## [100,] -0.3204397 -0.4818008 -0.4393495 . -0.51812695 -0.77337983
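Writing to an explicit path makes it possible to reopen the same array later, for example in a new R session. This sketch assumes that the `TileDBArray()` constructor accepts the path together with an `attr` argument naming the attribute to read, which is our reading of the package's documentation rather than something shown in this section.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Assumption: TileDBArray() takes a path plus an 'attr' argument.
reopened <- TileDBArray(path, attr="WHEE")
all(as.matrix(reopened) == X)
```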
As these arguments cannot be passed during coercion with as(), the package instead provides global settings that can be set or unset to control the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.24866511 0.41841757 0.82842334 . -0.8461549 0.4567598
## [2,] 0.33494916 -1.65041652 -0.29758065 . 0.6514514 0.4523713
## [3,] -0.78807346 -0.38361961 -1.10291978 . -0.5163232 -1.6054249
## [4,] -0.22978706 0.79087254 -0.07397855 . -0.5993269 0.8444350
## [5,] 2.08638043 0.33242631 0.26686627 . 0.1781162 -0.5149606
## ... . . . . . .
## [96,] 0.7165689 -0.4778324 -0.6244231 . -0.96266945 -0.93480423
## [97,] -0.1909790 1.7964320 -1.0432095 . 1.42113264 0.84062261
## [98,] -1.1854852 0.8445156 -2.6702843 . 1.30282586 1.40231842
## [99,] -2.4199595 0.3432062 0.3477054 . 0.36821801 0.05460774
## [100,] -0.3204397 -0.4818008 -0.4393495 . -0.51812695 -0.77337983
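A global path presumably stays in effect until it is changed, so it is sensible to unset it once the coercion is done. In this sketch we assume that `getTileDBPath()` returns the current setting and that calling `setTileDBPath()` with no argument clears it; both are our reading of the package documentation. A local TileDB array is stored as a directory, which is what the final check relies on.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

path2 <- tempfile()
setTileDBPath(path2)
getTileDBPath()  # should report path2 while the setting is active

stored <- as(X, "TileDBArray")  # written under path2

# Assumption: calling setTileDBPath() without an argument unsets the global path,
# so later coercions go back to using fresh temporary locations.
setTileDBPath()

dir.exists(path2)
all(as.matrix(stored) == X)
```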
sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.16 TileDBArray_1.13.0 DelayedArray_0.29.9
## [4] SparseArray_1.3.5 S4Arrays_1.3.7 abind_1.4-5
## [7] IRanges_2.37.1 S4Vectors_0.41.6 MatrixGenerics_1.15.1
## [10] matrixStats_1.3.0 BiocGenerics_0.49.1 Matrix_1.7-0
## [13] BiocStyle_2.31.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.8 compiler_4.4.0
## [4] BiocManager_1.30.22 crayon_1.5.2 Rcpp_1.0.12
## [7] nanoarrow_0.4.0.1 jquerylib_0.1.4 yaml_2.3.8
## [10] fastmap_1.1.1 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.43.1 tiledb_0.26.0
## [16] knitr_1.46 bookdown_0.39 bslib_0.7.0
## [19] rlang_1.1.3 cachem_1.0.8 xfun_0.43
## [22] sass_0.4.9 bit64_4.0.5 cli_3.6.2
## [25] zlibbioc_1.49.3 spdl_0.0.5 digest_0.6.35
## [28] grid_4.4.0 lifecycle_1.0.4 data.table_1.15.4
## [31] evaluate_0.23 nanotime_0.3.7 zoo_1.8-12
## [34] rmarkdown_2.26 tools_4.4.0 htmltools_0.5.8.1