1 tidySpatialExperiment - part of tidyomics

Resources to help you get started with tidySpatialExperiment and tidyomics:

The tidyomics ecosystem includes packages for:

2 Introduction

tidySpatialExperiment provides a bridge between the SpatialExperiment [@righelli2022spatialexperiment] package and the tidyverse [@wickham2019welcome] ecosystem. It creates an invisible layer that allows you to interact with a SpatialExperiment object as if it were a tibble, enabling the use of functions from dplyr, tidyr, ggplot2 and plotly. Underneath, however, your data remains a SpatialExperiment object.

tidySpatialExperiment also provides five additional utility functions.

2.1 Functions and utilities

Package             Functions available
SpatialExperiment   All
dplyr               arrange, bind_rows, bind_cols, distinct, filter, group_by, summarise, select, mutate, rename, left_join, right_join, inner_join, slice, sample_n, sample_frac, count, add_count
tidyr               nest, unnest, unite, separate, extract, pivot_longer
ggplot2             ggplot
plotly              plot_ly

Utility           Description
as_tibble         Convert cell data to a tbl_df
join_features     Append feature data to cell data
aggregate_cells   Aggregate cell-feature abundance into a pseudobulk SummarizedExperiment object
rectangle         Select rectangular region of space
ellipse           Select elliptical region of space

2.2 Installation

You can install the stable version of tidySpatialExperiment from Bioconductor with:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("tidySpatialExperiment")

You can install the development version of tidySpatialExperiment from GitHub with:

if (!requireNamespace("devtools", quietly=TRUE))
    install.packages("devtools")

devtools::install_github("william-hutchison/tidySpatialExperiment")

2.3 Load data

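Here, we attach tidySpatialExperiment and load an example SpatialExperiment object. Running the read10xVisium example creates the spe object used throughout this document.

# Attach tidySpatialExperiment
library(tidySpatialExperiment)

# Run the read10xVisium example, which creates the example object `spe`
example(read10xVisium)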

2.4 SpatialExperiment-tibble abstraction

A SpatialExperiment object represents observations (cells) as columns and variables (features) as rows, as is the Bioconductor convention. Additional information about the cells is accessed through the reducedDims, colData and spatialCoords functions.

tidySpatialExperiment provides a SpatialExperiment-tibble abstraction, representing cells as rows and features as columns, as is the tidyverse convention. colData and spatialCoords are appended as columns to the same abstraction, allowing easy interaction with this additional data.

The default view is now of the SpatialExperiment-tibble abstraction.

spe
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features=50 | Cells=99 | Assays=counts
##    .cell              in_tissue array_row array_col sample_id pxl_col_in_fullres
##    <chr>              <lgl>         <int>     <int> <chr>                  <int>
##  1 AAACAACGAATAGTTC-1 FALSE             0        16 section1                2312
##  2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1                8230
##  3 AAACAATCTACTAGCA-1 TRUE              3        43 section1                4170
##  4 AAACACCAATAACTGC-1 TRUE             59        19 section1                2519
##  5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1                7679
##  6 AAACAGCTTTCAGAAG-1 FALSE            43         9 section1                1831
##  7 AAACAGGGTCTATATT-1 FALSE            47        13 section1                2106
##  8 AAACAGTGTTCCTGGG-1 FALSE            73        43 section1                4170
##  9 AAACATGGTGAGAGGA-1 FALSE            62         0 section1                1212
## 10 AAACATTTCCCGGATT-1 FALSE            61        97 section1                7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>

However, our data maintains its status as a SpatialExperiment object. Therefore, we have access to all SpatialExperiment functions.

spe |>
  colData() |>
  head()
## DataFrame with 6 rows and 4 columns
##                    in_tissue array_row array_col   sample_id
##                    <logical> <integer> <integer> <character>
## AAACAACGAATAGTTC-1     FALSE         0        16    section1
## AAACAAGTATCTCCCA-1      TRUE        50       102    section1
## AAACAATCTACTAGCA-1      TRUE         3        43    section1
## AAACACCAATAACTGC-1      TRUE        59        19    section1
## AAACAGAGCGACTCCT-1      TRUE        14        94    section1
## AAACAGCTTTCAGAAG-1     FALSE        43         9    section1

spe |>
  spatialCoords() |>
  head()
##                    pxl_col_in_fullres pxl_row_in_fullres
## AAACAACGAATAGTTC-1               2312               1252
## AAACAAGTATCTCCCA-1               8230               7237
## AAACAATCTACTAGCA-1               4170               1611
## AAACACCAATAACTGC-1               2519               8315
## AAACAGAGCGACTCCT-1               7679               2927
## AAACAGCTTTCAGAAG-1               1831               6400

spe |>
  imgData()
## DataFrame with 2 rows and 4 columns
##     sample_id    image_id   data scaleFactor
##   <character> <character> <list>   <numeric>
## 1    section1      lowres   ####   0.0510334
## 2    section2      lowres   ####   0.0510334
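
To make the abstraction concrete, the following base R sketch builds a cell-centric table from toy stand-ins for the assay, colData and spatialCoords. It uses no tidySpatialExperiment code; the object names (counts, cell_data, coords, cell_view) and all values are invented for illustration, and the package's own internals may differ.

```r
# Toy stand-ins for the pieces of a SpatialExperiment (all values are made up).
# Assay: features as rows, cells as columns (Bioconductor convention).
counts <- matrix(
  c(0, 2, 1,
    3, 0, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b"), c("cell_1", "cell_2", "cell_3"))
)

# Per-cell annotations, analogous to colData()
cell_data <- data.frame(
  in_tissue = c(TRUE, FALSE, TRUE),
  sample_id = c("section1", "section1", "section2"),
  row.names = colnames(counts)
)

# Per-cell coordinates, analogous to spatialCoords()
coords <- matrix(
  c(10, 20, 30,
    5, 15, 25),
  ncol = 2,
  dimnames = list(colnames(counts), c("pxl_col_in_fullres", "pxl_row_in_fullres"))
)

# Cell-centric view (tidyverse convention): one row per cell, with
# annotations and coordinates appended as ordinary columns
cell_view <- data.frame(
  .cell = colnames(counts),
  cell_data,
  coords,
  row.names = NULL
)

cell_view
```

The number of rows in cell_view equals the number of columns in counts, which mirrors how the abstraction reports one row per cell while the assay keeps cells as columns.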

3 Integration with the tidyverse ecosystem

3.1 Manipulate with dplyr

Most functions from dplyr are available for use with the SpatialExperiment-tibble abstraction. For example, filter can be used to select cells by a variable of interest.

spe |>
  filter(array_col < 5)
## # A SpatialExperiment-tibble abstraction: 6 × 7
## # Features=50 | Cells=6 | Assays=counts
##   .cell              in_tissue array_row array_col sample_id pxl_col_in_fullres
##   <chr>              <lgl>         <int>     <int> <chr>                  <int>
## 1 AAACATGGTGAGAGGA-1 FALSE            62         0 section1                1212
## 2 AAACGAAGATGGAGTA-1 FALSE            58         4 section1                1487
## 3 AAAGAATGACCTTAGA-1 FALSE            64         2 section1                1349
## 4 AAACATGGTGAGAGGA-1 FALSE            62         0 section2                1212
## 5 AAACGAAGATGGAGTA-1 FALSE            58         4 section2                1487
## 6 AAAGAATGACCTTAGA-1 FALSE            64         2 section2                1349
## # ℹ 1 more variable: pxl_row_in_fullres <int>

Similarly, mutate can be used to add new variables or to modify the values of existing ones.

spe |>
  mutate(in_region = c(in_tissue & array_row < 10))
## # A SpatialExperiment-tibble abstraction: 99 × 8
## # Features=50 | Cells=99 | Assays=counts
##    .cell    in_tissue array_row array_col sample_id in_region pxl_col_in_fullres
##    <chr>    <lgl>         <int>     <int> <chr>     <lgl>                  <int>
##  1 AAACAAC… FALSE             0        16 section1  FALSE                   2312
##  2 AAACAAG… TRUE             50       102 section1  FALSE                   8230
##  3 AAACAAT… TRUE              3        43 section1  TRUE                    4170
##  4 AAACACC… TRUE             59        19 section1  FALSE                   2519
##  5 AAACAGA… TRUE             14        94 section1  FALSE                   7679
##  6 AAACAGC… FALSE            43         9 section1  FALSE                   1831
##  7 AAACAGG… FALSE            47        13 section1  FALSE                   2106
##  8 AAACAGT… FALSE            73        43 section1  FALSE                   4170
##  9 AAACATG… FALSE            62         0 section1  FALSE                   1212
## 10 AAACATT… FALSE            61        97 section1  FALSE                   7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>

3.2 Tidy with tidyr

Most functions from tidyr are also available. Here, nest is used to group the data by sample_id, and unnest is used to ungroup the data.

# Nest the SpatialExperiment object by sample_id
spe_nested <-
  spe |> 
  nest(data = -sample_id)

# View the nested SpatialExperiment object
spe_nested
## # A tibble: 2 × 2
##   sample_id data           
##   <chr>     <list>         
## 1 section1  <SptlExpr[,50]>
## 2 section2  <SptlExpr[,49]>

# Unnest the nested SpatialExperiment objects
spe_nested |>
  unnest(data)
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features=50 | Cells=99 | Assays=counts
##    .cell              in_tissue array_row array_col sample_id pxl_col_in_fullres
##    <chr>              <lgl>         <int>     <int> <chr>                  <int>
##  1 AAACAACGAATAGTTC-1 FALSE             0        16 section1                2312
##  2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1                8230
##  3 AAACAATCTACTAGCA-1 TRUE              3        43 section1                4170
##  4 AAACACCAATAACTGC-1 TRUE             59        19 section1                2519
##  5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1                7679
##  6 AAACAGCTTTCAGAAG-1 FALSE            43         9 section1                1831
##  7 AAACAGGGTCTATATT-1 FALSE            47        13 section1                2106
##  8 AAACAGTGTTCCTGGG-1 FALSE            73        43 section1                4170
##  9 AAACATGGTGAGAGGA-1 FALSE            62         0 section1                1212
## 10 AAACATTTCCCGGATT-1 FALSE            61        97 section1                7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>

3.3 Plot with ggplot2

The ggplot function can be used to create a plot from a SpatialExperiment object. This example also demonstrates how tidy operations can be combined to build up a more complex analysis. Note that helper functions such as aes are not included and should be imported from ggplot2.

spe |>
  filter(sample_id == "section1" & in_tissue) |>
  
  # Add a column with the sum of feature counts per cell
  mutate(count_sum = purrr::map_int(.cell, ~
    spe[, .x] |> 
      counts() |> 
      sum()
    )) |>
  
  # Plot with tidySpatialExperiment and ggplot2
  ggplot(ggplot2::aes(x = reorder(.cell, count_sum), y = count_sum)) +
  ggplot2::geom_point() +
  ggplot2::coord_flip()

3.4 Plot with plotly

The plot_ly function can also be used to create a plot from a SpatialExperiment object.

spe |>
  filter(sample_id == "section1") |>
  plot_ly(
    x = ~ array_col, 
    y = ~ array_row, 
    color = ~ in_tissue, 
    type = "scatter"
  )

Figure: plotly demonstration

4 Integration with the tidyomics ecosystem

4.1 Interactively select cells with tidygate

Different packages from the tidyomics ecosystem are easy to use together. Here, tidygate is used to interactively gate cells based on their array location.

spe_regions <-
  spe |> 
  filter(sample_id == "section1") |>
  mutate(region = tidygate::gate_chr(array_col, array_row))

Figure: tidygate demonstration

The gated cells can then be divided into pseudobulks within a SummarizedExperiment object using tidySpatialExperiment’s aggregate_cells utility function.

spe_regions_aggregated <-
  spe_regions |>
  aggregate_cells(region)

5 Utilities

5.1 Append feature data to cell data

The tidyomics ecosystem emphasises interaction with cell data. To interact with feature data, the join_features function can be used to append feature values to cell data.

# Join feature data in wide format, preserving the SpatialExperiment object
spe |>
  join_features(features = c("ENSMUSG00000025915", "ENSMUSG00000042501"), shape = "wide") |> 
  head()
## # A SpatialExperiment-tibble abstraction: 99 × 9
## # Features=6 | Cells=99 | Assays=counts
##    .cell              in_tissue array_row array_col sample_id ENSMUSG00000025915
##    <chr>              <lgl>         <int>     <int> <chr>                  <dbl>
##  1 AAACAACGAATAGTTC-1 FALSE             0        16 section1                   0
##  2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1                   0
##  3 AAACAATCTACTAGCA-1 TRUE              3        43 section1                   0
##  4 AAACACCAATAACTGC-1 TRUE             59        19 section1                   0
##  5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1                   0
##  6 AAACAGCTTTCAGAAG-1 FALSE            43         9 section1                   0
##  7 AAACAGGGTCTATATT-1 FALSE            47        13 section1                   0
##  8 AAACAGTGTTCCTGGG-1 FALSE            73        43 section1                   0
##  9 AAACATGGTGAGAGGA-1 FALSE            62         0 section1                   0
## 10 AAACATTTCCCGGATT-1 FALSE            61        97 section1                   0
## # ℹ 89 more rows
## # ℹ 3 more variables: ENSMUSG00000042501 <dbl>, pxl_col_in_fullres <int>,
## #   pxl_row_in_fullres <int>

# Join feature data in long format, discarding the SpatialExperiment object
spe |>
  join_features(features = c("ENSMUSG00000025915", "ENSMUSG00000042501"), shape = "long") |> 
  head()
## tidySpatialExperiment says: A data frame is returned for independent data analysis.
## # A tibble: 6 × 7
##   .cell       in_tissue array_row array_col sample_id .feature .abundance_counts
##   <chr>       <lgl>         <int>     <int> <chr>     <chr>                <dbl>
## 1 AAACAACGAA… FALSE             0        16 section1  ENSMUSG…                 0
## 2 AAACAACGAA… FALSE             0        16 section1  ENSMUSG…                 0
## 3 AAACAAGTAT… TRUE             50       102 section1  ENSMUSG…                 0
## 4 AAACAAGTAT… TRUE             50       102 section1  ENSMUSG…                 1
## 5 AAACAATCTA… TRUE              3        43 section1  ENSMUSG…                 0
## 6 AAACAATCTA… TRUE              3        43 section1  ENSMUSG…                 0
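
The difference between the wide and long shapes can be pictured with a small base R sketch. This is only an illustration of the two layouts on toy data, not the package's implementation; the objects counts, cell_data, wide and long are invented here, and only the column names .cell, .feature and .abundance_counts are borrowed from the output above.

```r
# Toy assay: features as rows, cells as columns (values are made up)
counts <- matrix(
  c(0, 2, 1,
    3, 0, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b"), c("cell_1", "cell_2", "cell_3"))
)

# Toy cell data, one row per cell
cell_data <- data.frame(
  .cell = colnames(counts),
  in_tissue = c(TRUE, FALSE, TRUE)
)

# Wide shape: one row per cell, one extra column per requested feature
wide <- cbind(cell_data, t(counts[c("gene_a", "gene_b"), , drop = FALSE]))
rownames(wide) <- NULL

# Long shape: one row per cell-feature pair.
# as.vector() unrolls the matrix column by column, so each cell's
# features appear consecutively, matching the rep() calls below.
abundance <- data.frame(
  .cell = rep(colnames(counts), each = nrow(counts)),
  .feature = rep(rownames(counts), times = ncol(counts)),
  .abundance_counts = as.vector(counts)
)
long <- merge(cell_data, abundance, by = ".cell")

wide
long
```

In the wide layout the row count stays equal to the number of cells, which is why the cell-centric object can be preserved. In the long layout each cell is repeated once per feature, so the result is no longer one row per cell and a plain data frame is the natural container.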

5.2 Aggregate cells

Sometimes, it is necessary to aggregate the gene-transcript abundance from a group of cells into a single value, for example, when comparing groups of cells across different samples with fixed-effect models.

Cell aggregation can be achieved using the aggregate_cells function.

spe |>
  aggregate_cells(in_tissue, assays = "counts")
## class: SummarizedExperiment 
## dim: 50 2 
## metadata(0):
## assays(1): counts
## rownames(50): ENSMUSG00000002459 ENSMUSG00000005886 ...
##   ENSMUSG00000104217 ENSMUSG00000104328
## rowData names(1): feature
## colnames(2): FALSE TRUE
## colData names(2): in_tissue .aggregated_cells
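
As a rough picture of what pseudobulk aggregation means, the base R sketch below sums toy counts over cells that share a grouping value. It assumes summation as the aggregation rule, which is an assumption made for illustration; the objects counts, in_tissue, pseudobulk and n_cells are invented, and aggregate_cells itself may work differently.

```r
# Toy assay: features as rows, cells as columns (values are made up)
counts <- matrix(
  c(0, 2, 1, 5,
    3, 0, 4, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b"), paste0("cell_", 1:4))
)

# Grouping variable, one value per cell
in_tissue <- c(TRUE, FALSE, TRUE, FALSE)

# rowsum() sums rows by group, so transpose to put cells in rows,
# aggregate, then transpose back to a features x groups matrix
pseudobulk <- t(rowsum(t(counts), group = as.character(in_tissue)))

# Number of cells contributing to each pseudobulk column
n_cells <- table(in_tissue)

pseudobulk
n_cells
```

The result has one column per group rather than one per cell, which matches the 50 x 2 dimensions reported above for the two values of in_tissue.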

5.3 Elliptical and rectangular region selection

To select cells by their geometric region in space, the ellipse and rectangle functions can be used.

spe |>
  filter(sample_id == "section1") |>
  mutate(in_ellipse = ellipse(array_col, array_row, c(20, 40), c(20, 20))) |>
  # aes and geom_point are taken from ggplot2 explicitly
  ggplot(ggplot2::aes(x = array_col, y = array_row, colour = in_ellipse)) +
  ggplot2::geom_point()
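
The geometry behind this kind of selection can be sketched in base R. The helpers below, in_ellipse_sketch and in_rectangle_sketch, are hypothetical and are not the package's ellipse and rectangle functions. In particular, treating the axes argument as semi-axis lengths and restricting both shapes to be axis-aligned are assumptions made for this illustration.

```r
# Hypothetical helper, not the package's implementation.
# `axes` is treated here as semi-axis lengths along x and y (an assumption).
in_ellipse_sketch <- function(x, y, center, axes) {
  ((x - center[1]) / axes[1])^2 + ((y - center[2]) / axes[2])^2 <= 1
}

# Hypothetical helper: `width` and `height` are the full side lengths
# of an axis-aligned rectangle centred on `center`.
in_rectangle_sketch <- function(x, y, center, width, height) {
  abs(x - center[1]) <= width / 2 & abs(y - center[2]) <= height / 2
}

# Toy spot positions (values are made up)
spots <- data.frame(
  array_col = c(20, 35, 45, 20),
  array_row = c(40, 40, 40, 65)
)

spots$in_ellipse <- in_ellipse_sketch(
  spots$array_col, spots$array_row, center = c(20, 40), axes = c(20, 20)
)
spots$in_rectangle <- in_rectangle_sketch(
  spots$array_col, spots$array_row, center = c(20, 40), width = 30, height = 60
)

spots
```

Both helpers return a logical vector with one value per spot, so the result can be stored in a column and used for filtering or as a plotting aesthetic, in the same way as in_ellipse in the example above.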

6 Important considerations

6.1 Read-only columns

Removing the .cell column will return a tibble. This is consistent with the behaviour in other tidyomics packages.

spe |>
  select(-.cell) |>
  head()
## tidySpatialExperiment says: Key columns are missing. A data frame is returned for independent data analysis.
## # A tibble: 6 × 4
##   in_tissue array_row array_col sample_id
##   <lgl>         <int>     <int> <chr>    
## 1 FALSE             0        16 section1 
## 2 TRUE             50       102 section1 
## 3 TRUE              3        43 section1 
## 4 TRUE             59        19 section1 
## 5 TRUE             14        94 section1 
## 6 FALSE            43         9 section1

The sample_id column cannot be removed with tidyverse functions, and can only be modified if the changes are accepted by SpatialExperiment’s colData function.

# sample_id is not removed, despite the user's request
spe |>
  select(-sample_id)
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features=50 | Cells=99 | Assays=counts
##    .cell              in_tissue array_row array_col sample_id pxl_col_in_fullres
##    <chr>              <lgl>         <int>     <int> <chr>                  <int>
##  1 AAACAACGAATAGTTC-1 FALSE             0        16 section1                2312
##  2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1                8230
##  3 AAACAATCTACTAGCA-1 TRUE              3        43 section1                4170
##  4 AAACACCAATAACTGC-1 TRUE             59        19 section1                2519
##  5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1                7679
##  6 AAACAGCTTTCAGAAG-1 FALSE            43         9 section1                1831
##  7 AAACAGGGTCTATATT-1 FALSE            47        13 section1                2106
##  8 AAACAGTGTTCCTGGG-1 FALSE            73        43 section1                4170
##  9 AAACATGGTGAGAGGA-1 FALSE            62         0 section1                1212
## 10 AAACATTTCCCGGATT-1 FALSE            61        97 section1                7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>

# This change maintains separation of sample_ids and is permitted
spe |> 
  mutate(sample_id = stringr::str_c(sample_id, "_modified")) |>
  head()
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features=6 | Cells=99 | Assays=counts
##    .cell              in_tissue array_row array_col sample_id pxl_col_in_fullres
##    <chr>              <lgl>         <int>     <int> <chr>                  <int>
##  1 AAACAACGAATAGTTC-1 FALSE             0        16 section1…               2312
##  2 AAACAAGTATCTCCCA-1 TRUE             50       102 section1…               8230
##  3 AAACAATCTACTAGCA-1 TRUE              3        43 section1…               4170
##  4 AAACACCAATAACTGC-1 TRUE             59        19 section1…               2519
##  5 AAACAGAGCGACTCCT-1 TRUE             14        94 section1…               7679
##  6 AAACAGCTTTCAGAAG-1 FALSE            43         9 section1…               1831
##  7 AAACAGGGTCTATATT-1 FALSE            47        13 section1…               2106
##  8 AAACAGTGTTCCTGGG-1 FALSE            73        43 section1…               4170
##  9 AAACATGGTGAGAGGA-1 FALSE            62         0 section1…               1212
## 10 AAACATTTCCCGGATT-1 FALSE            61        97 section1…               7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>

# This change does not maintain separation of sample_ids and produces an error
spe |>
  mutate(sample_id = "new_sample")
## Error in .local(x, ..., value): Number of unique 'sample_id's is 2, but 1 was provided.

The pxl_col_in_fullres and pxl_row_in_fullres columns cannot be removed or modified with tidyverse functions. This is consistent with the behaviour of dimension reduction data in other tidyomics packages.

# Attempting to remove pxl_col_in_fullres produces an error
spe |>
  select(-pxl_col_in_fullres)
## Error in `select_helper()`:
## ! Can't select columns that don't exist.
## ✖ Column `pxl_col_in_fullres` doesn't exist.

# Attempting to modify pxl_col_in_fullres produces an error
spe |> 
  mutate(pxl_col_in_fullres)
## Error in `dplyr::mutate()`:
## ℹ In argument: `pxl_col_in_fullres`.
## Caused by error:
## ! object 'pxl_col_in_fullres' not found

sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] tidySpatialExperiment_1.1.0     ggplot2_3.5.1                  
##  [3] ttservice_0.4.0                 tidyr_1.3.1                    
##  [5] dplyr_1.1.4                     tidySingleCellExperiment_1.15.0
##  [7] SpatialExperiment_1.15.0        SingleCellExperiment_1.27.0    
##  [9] SummarizedExperiment_1.35.0     Biobase_2.65.0                 
## [11] GenomicRanges_1.57.0            GenomeInfoDb_1.41.0            
## [13] IRanges_2.39.0                  S4Vectors_0.43.0               
## [15] BiocGenerics_0.51.0             MatrixGenerics_1.17.0          
## [17] matrixStats_1.3.0               BiocStyle_2.33.0               
## 
## loaded via a namespace (and not attached):
##  [1] rlang_1.1.3               magrittr_2.0.3           
##  [3] compiler_4.4.0            DelayedMatrixStats_1.27.0
##  [5] vctrs_0.6.5               stringr_1.5.1            
##  [7] pkgconfig_2.0.3           crayon_1.5.2             
##  [9] fastmap_1.1.1             magick_2.8.3             
## [11] XVector_0.45.0            ellipsis_0.3.2           
## [13] labeling_0.4.3            scuttle_1.15.0           
## [15] utf8_1.2.4                rmarkdown_2.26           
## [17] UCSC.utils_1.1.0          tinytex_0.50             
## [19] purrr_1.0.2               xfun_0.43                
## [21] zlibbioc_1.51.0           cachem_1.0.8             
## [23] beachmat_2.21.0           jsonlite_1.8.8           
## [25] highr_0.10                rhdf5filters_1.17.0      
## [27] DelayedArray_0.31.0       Rhdf5lib_1.27.0          
## [29] BiocParallel_1.39.0       parallel_4.4.0           
## [31] R6_2.5.1                  bslib_0.7.0              
## [33] stringi_1.8.3             limma_3.61.0             
## [35] jquerylib_0.1.4           Rcpp_1.0.12              
## [37] bookdown_0.39             knitr_1.46               
## [39] R.utils_2.12.3            Matrix_1.7-0             
## [41] tidyselect_1.2.1          abind_1.4-5              
## [43] yaml_2.3.8                codetools_0.2-20         
## [45] lattice_0.22-6            tibble_3.2.1             
## [47] withr_3.0.0               evaluate_0.23            
## [49] pillar_1.9.0              BiocManager_1.30.22      
## [51] plotly_4.10.4             generics_0.1.3           
## [53] sparseMatrixStats_1.17.0  munsell_0.5.1            
## [55] scales_1.3.0              glue_1.7.0               
## [57] lazyeval_0.2.2            tools_4.4.0              
## [59] data.table_1.15.4         locfit_1.5-9.9           
## [61] rhdf5_2.49.0              grid_4.4.0               
## [63] DropletUtils_1.25.0       edgeR_4.3.0              
## [65] colorspace_2.1-0          GenomeInfoDbData_1.2.12  
## [67] HDF5Array_1.33.0          cli_3.6.2                
## [69] fansi_1.0.6               S4Arrays_1.5.0           
## [71] viridisLite_0.4.2         gtable_0.3.5             
## [73] R.methodsS3_1.8.2         sass_0.4.9               
## [75] digest_0.6.35             SparseArray_1.5.0        
## [77] dqrng_0.3.2               farver_2.1.1             
## [79] rjson_0.2.21              htmlwidgets_1.6.4        
## [81] htmltools_0.5.8.1         R.oo_1.26.0              
## [83] lifecycle_1.0.4           httr_1.4.7               
## [85] statmod_1.5.0