TileDBArray 1.15.4
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction, thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.96666673 -0.54877668 -0.08284134 . -0.02834821 -0.64422645
## [2,] 0.19409305 -1.22377625 -1.23207261 . -0.28508281 2.33244992
## [3,] 2.14311684 -1.27063252 -0.94163773 . -0.71832135 2.56464077
## [4,] 1.68159419 -0.95322867 -1.15270457 . 0.04166754 -0.12404440
## [5,] 0.62137053 0.86701054 0.13345630 . 0.50351244 -0.13854582
## ... . . . . . .
## [96,] -0.5429649 0.9217662 0.4253939 . 0.5074430 -0.3137212
## [97,] -0.0427965 1.8016802 1.4509917 . 2.4310328 -0.6948934
## [98,] -0.4781768 1.4512218 -2.7579406 . -1.5248245 -1.2731146
## [99,] 2.1298145 0.4291107 0.2102766 . 0.4374052 1.2529710
## [100,] 0.6210344 0.2777834 -0.2411009 . -0.4604146 -1.6771151
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.96666673 -0.54877668 -0.08284134 . -0.02834821 -0.64422645
## [2,] 0.19409305 -1.22377625 -1.23207261 . -0.28508281 2.33244992
## [3,] 2.14311684 -1.27063252 -0.94163773 . -0.71832135 2.56464077
## [4,] 1.68159419 -0.95322867 -1.15270457 . 0.04166754 -0.12404440
## [5,] 0.62137053 0.86701054 0.13345630 . 0.50351244 -0.13854582
## ... . . . . . .
## [96,] -0.5429649 0.9217662 0.4253939 . 0.5074430 -0.3137212
## [97,] -0.0427965 1.8016802 1.4509917 . 2.4310328 -0.6948934
## [98,] -0.4781768 1.4512218 -2.7579406 . -1.5248245 -1.2731146
## [99,] 2.1298145 0.4291107 0.2102766 . 0.4374052 1.2529710
## [100,] 0.6210344 0.2777834 -0.2411009 . -0.4604146 -1.6771151
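One quick way to confirm that the write is lossless is to realize the backend back into memory with `as.matrix()` and compare it with the original. The sketch below uses only `as()` from above plus base R. It recreates `X` so that it runs on its own, and the names `tdb` and `back` are illustrative.

```r
library(TileDBArray)

# Recreate an ordinary in-memory matrix and store it in a TileDB backend.
X <- matrix(rnorm(1000), ncol = 10)
tdb <- as(X, "TileDBArray")

# as.matrix() reads every value out of the backend into an ordinary matrix.
back <- as.matrix(tdb)
identical(dim(back), dim(X))
all.equal(back, X)
```

Doubles are stored as 64-bit floats, so the comparison should not need any tolerance.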
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.00 0.00 0.00 . 0 0
## [2,] 0.00 0.00 0.00 . 0 0
## [3,] 0.00 0.00 0.00 . 0 0
## [4,] 0.00 0.00 0.52 . 0 0
## [5,] 0.00 0.00 0.00 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
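The sparsity of the backend can be queried, and the data can be pulled back into a standard sparse format. The sketch below rests on two assumptions. First, it assumes `is_sparse()` (provided by the DelayedArray/S4Arrays stack attached alongside TileDBArray) reports the backend's sparsity. Second, it assumes the `dgCMatrix` coercion that DelayedArray offers for `DelayedMatrix` objects applies here. `Y` is recreated so the block is self-contained.

```r
library(TileDBArray)

Y <- Matrix::rsparsematrix(1000, 1000, density = 0.01)
sp <- writeTileDBArray(Y)

# The printed header says "sparse TileDBMatrix"; is_sparse() exposes the same fact.
is_sparse(sp)

# Coerce back to a compressed sparse column matrix from the Matrix package.
back <- as(sp, "dgCMatrix")
Matrix::nnzero(back) == Matrix::nnzero(Y)
```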
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE TRUE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
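The example above only shows a logical matrix. The sketch below checks that an integer matrix keeps its type as well. It uses `type()`, which DelayedArray objects support, and `Z` is a small illustrative matrix.

```r
library(TileDBArray)

# 1:20 is already of integer type in R.
Z <- matrix(1:20, nrow = 4)
zi <- writeTileDBArray(Z)
type(zi)

# A dense logical matrix derived from it.
zl <- writeTileDBArray(Z > 10)
type(zl)

# Integer values should come back exactly.
identical(as.matrix(zi), Z)
```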
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.96666673 -0.54877668 -0.08284134 . -0.02834821 -0.64422645
## GENE_2 0.19409305 -1.22377625 -1.23207261 . -0.28508281 2.33244992
## GENE_3 2.14311684 -1.27063252 -0.94163773 . -0.71832135 2.56464077
## GENE_4 1.68159419 -0.95322867 -1.15270457 . 0.04166754 -0.12404440
## GENE_5 0.62137053 0.86701054 0.13345630 . 0.50351244 -0.13854582
## ... . . . . . .
## GENE_96 -0.5429649 0.9217662 0.4253939 . 0.5074430 -0.3137212
## GENE_97 -0.0427965 1.8016802 1.4509917 . 2.4310328 -0.6948934
## GENE_98 -0.4781768 1.4512218 -2.7579406 . -1.5248245 -1.2731146
## GENE_99 2.1298145 0.4291107 0.2102766 . 0.4374052 1.2529710
## GENE_100 0.6210344 0.2777834 -0.2411009 . -0.4604146 -1.6771151
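Because the dimension names are written into the backend along with the values, they should survive reopening the array from disk in a later step or session. This is a sketch that assumes the `TileDBArray()` constructor accepts the path of an existing backend. The path `p` is an illustrative temporary location.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))

# Write to a known location so the backend can be reopened later.
p <- tempfile()
written <- writeTileDBArray(X, path = p)

# Assumption: TileDBArray() takes the path of an existing TileDB array.
reopened <- TileDBArray(p)
head(rownames(reopened))
identical(colnames(reopened), colnames(X))
```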
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.9666667 0.1940930 2.1431168 1.6815942 0.6213705 -1.3687450
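The same conventions extend to subsetting by dimension name. The sketch below compares a few extractions against the equivalent operations on the in-memory matrix. `X` and `out` are recreated so the block runs on its own, and `g3`, `s2` and `block` are illustrative names.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# A single row or column is dropped to an ordinary named vector.
g3 <- out["GENE_3", ]
s2 <- out[, "SAMP_2"]

# A 2D subset stays delayed; as.matrix() pulls it into memory.
block <- as.matrix(out[c("GENE_1", "GENE_2"), 1:3])
block
```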
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
which is why the results below are DelayedMatrix objects rather than TileDBMatrix objects.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.96666673 -0.54877668 -0.08284134 -1.27511315 1.86456265
## GENE_2 0.19409305 -1.22377625 -1.23207261 0.15692254 0.61299717
## GENE_3 2.14311684 -1.27063252 -0.94163773 0.65351482 -1.99175097
## GENE_4 1.68159419 -0.95322867 -1.15270457 2.73258265 0.48089175
## GENE_5 0.62137053 0.86701054 0.13345630 1.73823156 1.06548930
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.9333335 -1.0975534 -0.1656827 . -0.05669641 -1.28845290
## GENE_2 0.3881861 -2.4475525 -2.4641452 . -0.57016563 4.66489983
## GENE_3 4.2862337 -2.5412650 -1.8832755 . -1.43664270 5.12928155
## GENE_4 3.3631884 -1.9064573 -2.3054091 . 0.08333508 -0.24808880
## GENE_5 1.2427411 1.7340211 0.2669126 . 1.00702489 -0.27709164
## ... . . . . . .
## GENE_96 -1.08592976 1.84353233 0.85078778 . 1.0148861 -0.6274423
## GENE_97 -0.08559299 3.60336035 2.90198344 . 4.8620656 -1.3897868
## GENE_98 -0.95635361 2.90244352 -5.51588128 . -3.0496490 -2.5462292
## GENE_99 4.25962891 0.85822140 0.42055328 . 0.8748105 2.5059420
## GENE_100 1.24206874 0.55556679 -0.48220183 . -0.9208293 -3.3542301
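To see what has been recorded without reading any data, DelayedArray's `showtree()` prints the tree of pending operations. Writing the delayed object to a new backend then realizes it. This is a sketch under two assumptions: that `writeTileDBArray()` accepts a `DelayedMatrix` as input, and that the names `d` and `r` are only illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
out <- as(X, "TileDBArray")

# Subsetting and arithmetic only record operations; nothing is computed yet.
d <- out[1:5, 1:5] * 2
class(d)
showtree(d)

# Writing the delayed result evaluates it into a new TileDB backend.
r <- writeTileDBArray(d)
class(r)
```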
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 9.1106358 -0.8734885 -17.0641021 2.1422144 -4.2057107 -9.7520612
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -12.4443052 12.1603927 18.9378178 -9.2209882
out %*% runif(ncol(out))
## [,1]
## GENE_1 1.22094154
## GENE_2 -0.77518520
## GENE_3 -0.71304838
## GENE_4 1.77947185
## GENE_5 3.07075509
## GENE_6 -0.94024555
## GENE_7 -0.36811290
## GENE_8 -3.31268945
## GENE_9 -1.01911455
## GENE_10 -0.96808868
## GENE_11 4.59963643
## GENE_12 -1.66280736
## GENE_13 0.01601344
## GENE_14 0.69614959
## GENE_15 -0.12942342
## GENE_16 -0.88692283
## GENE_17 -0.41338884
## GENE_18 0.10587981
## GENE_19 1.57098196
## GENE_20 1.48507876
## GENE_21 -1.77589247
## GENE_22 -0.20388810
## GENE_23 -1.87085299
## GENE_24 -0.68598806
## GENE_25 1.07667070
## GENE_26 0.23976521
## GENE_27 -1.40738700
## GENE_28 -0.47516135
## GENE_29 3.06347236
## GENE_30 -3.55529196
## GENE_31 -1.22171744
## GENE_32 0.81721427
## GENE_33 -0.20653482
## GENE_34 0.92779684
## GENE_35 -0.37269867
## GENE_36 0.03301316
## GENE_37 -1.81659064
## GENE_38 1.35968907
## GENE_39 -1.91691826
## GENE_40 -0.02121303
## GENE_41 -1.51761077
## GENE_42 -0.14345594
## GENE_43 -2.11805725
## GENE_44 -0.15859237
## GENE_45 1.56752015
## GENE_46 -2.09235296
## GENE_47 -1.34268901
## GENE_48 -1.79397006
## GENE_49 -0.74600123
## GENE_50 0.96149321
## GENE_51 1.50990946
## GENE_52 1.00382370
## GENE_53 -2.55683634
## GENE_54 -2.29645551
## GENE_55 0.83576819
## GENE_56 -1.86570524
## GENE_57 0.51194103
## GENE_58 -1.48454027
## GENE_59 -1.05618921
## GENE_60 0.94605824
## GENE_61 2.40432471
## GENE_62 -0.67161115
## GENE_63 -1.81257247
## GENE_64 1.96713920
## GENE_65 0.75624910
## GENE_66 2.39205577
## GENE_67 -0.68424604
## GENE_68 -3.87899760
## GENE_69 1.08228754
## GENE_70 1.73045021
## GENE_71 -0.34075415
## GENE_72 -0.46709035
## GENE_73 -0.97725256
## GENE_74 -0.09168164
## GENE_75 -2.06415707
## GENE_76 1.50606003
## GENE_77 -0.59300479
## GENE_78 0.23917245
## GENE_79 -0.42579019
## GENE_80 0.41291306
## GENE_81 0.99718762
## GENE_82 -2.81473463
## GENE_83 2.01448193
## GENE_84 1.74275671
## GENE_85 -0.67141882
## GENE_86 1.76607134
## GENE_87 -0.38843048
## GENE_88 2.29344554
## GENE_89 2.37235440
## GENE_90 1.50023388
## GENE_91 -1.92785873
## GENE_92 0.41678826
## GENE_93 2.67231649
## GENE_94 -1.39680178
## GENE_95 0.86456950
## GENE_96 -0.17073243
## GENE_97 2.67425200
## GENE_98 -3.20211276
## GENE_99 1.86255871
## GENE_100 -2.34174958
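These operations are evaluated block by block from the backend, so their results can be checked against the same computations on the in-memory matrix. The sketch below uses `all.equal()` rather than `identical()` because a different summation order may introduce small floating-point differences. `v` is an illustrative random vector.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
out <- as(X, "TileDBArray")
v <- runif(ncol(out))

# Block-processed summaries should agree with base R up to rounding.
all.equal(colSums(out), colSums(X))
all.equal(rowMeans(out), rowMeans(X))

# Matrix-vector product; as.matrix() guards against a non-base return class.
all.equal(as.matrix(out %*% v), X %*% v)
```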
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.7272340 -0.8281989 1.3410287 . 1.54832255 0.50080572
## [2,] -1.3473640 0.6849275 1.1576598 . 1.35798879 -0.85486040
## [3,] -0.1123482 -0.8117961 -0.5558034 . 0.40940006 0.06847921
## [4,] 0.6212716 0.6528040 -1.1395344 . -0.78114320 -2.67551086
## [5,] 0.2926601 1.5132433 -0.4125247 . -0.59463534 1.16027600
## ... . . . . . .
## [96,] -0.62218933 0.83521703 0.64779629 . -0.34831912 -0.19487510
## [97,] 1.40228300 0.66592313 0.03600730 . 0.63167597 0.08461536
## [98,] 0.89382012 0.09829028 0.11846224 . -1.89260721 -1.27978961
## [99,] -1.35237038 0.33282702 0.52798952 . 0.30235771 0.57805122
## [100,] -0.77954437 -0.05356938 -0.06268355 . -0.14088531 -0.81098864
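Once written to an explicit path, the backend persists on disk and can be reopened by path, naming the custom attribute. This is a sketch assuming the `TileDBArray()` constructor accepts a path and forwards an `attr=` argument to its seed. It also relies on the fact that a local TileDB array is stored as a directory.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
path <- tempfile()
written <- writeTileDBArray(X, path = path, attr = "WHEE")

# Locally, a TileDB array is a directory at the chosen path.
dir.exists(path)

# Assumption: attr= is forwarded to the seed so the named attribute is read.
reopened <- TileDBArray(path, attr = "WHEE")
all.equal(as.matrix(reopened), X)
```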
As these arguments cannot be passed during coercion, we instead provide global parameters that can be set or unset to control the backend that coercion creates.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.7272340 -0.8281989 1.3410287 . 1.54832255 0.50080572
## [2,] -1.3473640 0.6849275 1.1576598 . 1.35798879 -0.85486040
## [3,] -0.1123482 -0.8117961 -0.5558034 . 0.40940006 0.06847921
## [4,] 0.6212716 0.6528040 -1.1395344 . -0.78114320 -2.67551086
## [5,] 0.2926601 1.5132433 -0.4125247 . -0.59463534 1.16027600
## ... . . . . . .
## [96,] -0.62218933 0.83521703 0.64779629 . -0.34831912 -0.19487510
## [97,] 1.40228300 0.66592313 0.03600730 . 0.63167597 0.08461536
## [98,] 0.89382012 0.09829028 0.11846224 . -1.89260721 -1.27978961
## [99,] -1.35237038 0.33282702 0.52798952 . 0.30235771 0.57805122
## [100,] -0.77954437 -0.05356938 -0.06268355 . -0.14088531 -0.81098864
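Because the global path stays in effect, a second coercion would presumably try to create an array where one already exists. The sketch below wraps the set/unset pattern in a hypothetical helper, `coerce_to_tiledb_at()`, which is not part of TileDBArray. It assumes that `setTileDBPath(NULL)` restores the default behaviour of writing each new backend to a fresh temporary location.

```r
library(TileDBArray)

# Hypothetical helper: coerce x to a TileDBArray stored at 'path', then
# unset the global path on exit, even if the coercion fails.
# Assumption: setTileDBPath(NULL) reverts to temporary paths.
coerce_to_tiledb_at <- function(x, path) {
    setTileDBPath(path)
    on.exit(setTileDBPath(NULL))
    as(x, "TileDBArray")
}

X <- matrix(rnorm(1000), ncol = 10)
p <- tempfile()
stored <- coerce_to_tiledb_at(X, p)
dir.exists(p)

# With the global path unset again, repeated coercions should not collide.
a1 <- as(X, "TileDBArray")
a2 <- as(X, "TileDBArray")
```

Using `on.exit()` keeps the global state from leaking into later code if the coercion throws an error.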
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /media/volume/teran2_disk/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.18 TileDBArray_1.15.4 DelayedArray_0.31.13
## [4] SparseArray_1.5.41 S4Arrays_1.5.10 IRanges_2.39.2
## [7] abind_1.4-8 S4Vectors_0.43.2 MatrixGenerics_1.17.0
## [10] matrixStats_1.4.1 BiocGenerics_0.51.3 Matrix_1.7-0
## [13] BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0 jsonlite_1.8.9 compiler_4.4.1
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13
## [7] nanoarrow_0.5.0.1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.45.0 tiledb_0.30.0
## [16] knitr_1.48 bookdown_0.40 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.47
## [22] sass_0.4.9 bit64_4.5.2 cli_3.6.3
## [25] zlibbioc_1.51.1 spdl_0.0.5 digest_0.6.37
## [28] grid_4.4.1 lifecycle_1.0.4 data.table_1.16.0
## [31] evaluate_1.0.0 nanotime_0.3.10 zoo_1.8-12
## [34] rmarkdown_2.28 tools_4.4.1 htmltools_0.5.8.1