1 Introduction

This vignette showcases key functionalities of the spatialHeatmap package, with a detailed description of the project available here.

1.1 Motivation

The spatialHeatmap package provides functionalities for visualizing cell-, tissue- and organ-specific data of biological assays by coloring the corresponding spatial features defined in anatomical images according to quantitative abundance levels of measured biomolecules, such as transcripts, proteins or metabolites (Zhang et al. 2024). A color key is used to represent the quantitative assay values and can be customized by the user. This core functionality of the package is called a spatial heatmap (SHM) plot. Additional important functionalities include Spatial Enrichment (SE), Spatial Co-Expression (SCE), and Single Cell to SHM Co-Visualization (SC2SHM-CoViz). These extra utilities are useful for identifying biomolecules with spatially selective abundance patterns (SE), groups of biomolecules with related abundance profiles using hierarchical clustering, K-means clustering, or network analysis (SCE), and to co-visualize single cell embedding results with SHMs (SCSHM-CoViz). The latter co-visualization functionality (Figure 1E) is described in a separate vignette called SCSHM-CoViz.

The functionalities of spatialHeatmap can be used either in a command-driven mode from within R or a graphical user interface (GUI) provided by a Shiny App that is part of this project. While the R-based mode provides flexibility to customize and automate analysis routines, the Shiny App includes a variety of convenience features that will appeal to experimentalists and users less familiar with R. The Shiny App can be used on both local computers as well as centralized server-based deployments (e.g. cloud-based or custom servers). This way it can be used for both hosting community data on a public server or running on a private system. The core functionalities of the spatialHeatmap package are illustrated in Figure 1.

Overview of spatialHeatmap functionality. (A) The _spatialHeatmap_ package plots numeric assay data onto spatially annotated images. The assay data can be provided as numeric vectors, tabular data, _SummarizedExperiment_, or _SingleCellExperiment_ objects. The latter two are widely used data containers for organizing both assays as well as associated annotation data and experimental designs. (B) Anatomical and other spatial images need to be provided as annotated SVG (aSVG) files where the spatial features and the corresponding components of the assay data have matching labels (_e.g._ tissue labels). To work with SVG data efficiently, the _SVG_ S4 class container has been developed. The assay data are used to color the matching spatial features in aSVG images according to a color key. (C)-(D) The result is called a spatial heatmap (SHM). (E) Large-scale data mining such as hierarchical clustering and network analysis can be integrated to facilitate the identification of biomolecules with similar abundance profiles. Moreover, (E) Single cell embedding results can be co-visualized with SHMs.

Figure 1: Overview of spatialHeatmap functionality
(A) The spatialHeatmap package plots numeric assay data onto spatially annotated images. The assay data can be provided as numeric vectors, tabular data, SummarizedExperiment, or SingleCellExperiment objects. The latter two are widely used data containers for organizing both assays as well as associated annotation data and experimental designs. (B) Anatomical and other spatial images need to be provided as annotated SVG (aSVG) files where the spatial features and the corresponding components of the assay data have matching labels (e.g. tissue labels). To work with SVG data efficiently, the SVG S4 class container has been developed. The assay data are used to color the matching spatial features in aSVG images according to a color key. (C)-(D) The result is called a spatial heatmap (SHM). (E) Large-scale data mining such as hierarchical clustering and network analysis can be integrated to facilitate the identification of biomolecules with similar abundance profiles. Moreover, (E) Single cell embedding results can be co-visualized with SHMs.

The package supports anatomical images from public repositories or those provided by users. In general any type of image can be used as long as it can be provided in SVG (Scalable Vector Graphics) format and the corresponding spatial features, such as organs, tissues, cellular compartments, are annotated (see aSVG below). The numeric values plotted onto an SHM are usually quantitative measurements from a wide range of profiling technologies, such as microarrays, next generation sequencing (e.g. RNA-Seq and scRNA-Seq), proteomics, metabolomics, or many other small- or large-scale experiments. For convenience, several preprocessing and normalization methods for the most common use cases are included that support raw and/or preprocessed data. Currently, the main application domains of the spatialHeatmap package are numeric data sets and spatially mapped images from biological, agricultural and biomedical areas. Moreover, the package has been designed to also work with many other spatial data types, such as population data plotted onto geographic maps. This high level of flexibility is one of the unique features of spatialHeatmap. Related software tools for biological applications in this field are largely based on pure web applications (Maag 2018; Lekschas et al. 2015; Papatheodorou et al. 2018; Winter et al. 2007; Waese et al. 2017) or local tools (Muschelli, Sweeney, and Crainiceanu 2014) that typically lack customization functionalities. These restrictions limit users to utilizing pre-existing expression data and/or fixed sets of anatomical image collections. To close this gap for biological use cases, we have developed spatialHeatmap as a generic R/Bioconductor package for plotting quantitative values onto any type of spatially mapped images in a programmable environment and/or in an intuitive to use GUI application.

1.2 Design

The core feature of spatialHeatmap is to map assay values (e.g. gene expression data) of one or many biomolecules (e.g. genes) measured under different conditions in form of numerically graded colors onto the corresponding cell types or tissues represented in a chosen SVG image. In the gene profiling field, this feature supports comparisons of the expression values among multiple genes by plotting their SHMs next to each other. Similarly, one can display the expression values of a single or multiple genes across multiple conditions in the same plot (Figure 3). This level of flexibility is very efficient for visualizing complicated expression patterns across genes, cell types and conditions. In case of more complex anatomical images with overlapping multiple layer tissues, it is important to visually expose the tissue layer of interest in the plots. To address this, several default and customizable layer viewing options are provided. They allow to hide features in the top layers by making them transparent in order to expose features below them. This transparency viewing feature is highlighted below in the mouse example (Figure 4). Moreover, one can plot multiple distinct aSVGs in a single SHM plot as shown in Figure 10. This is particularly useful for displaying abundance trends across multiple development stages, where each is represented by its own aSVG image. In addition to static SHM representations, one can visualize them in form of interactive HTML files or videos.

To maximize reusability and extensibility, the package organizes large-scale omics assay data along with the associated experimental design information in a SummarizedExperiment object (Figure 1A; Morgan et al. 2018). The latter is one of the core S4 classes within the Bioconductor ecosystem that has been widely adapted by many other software packages dealing with gene-, protein- and metabolite-level profiling data. In case of gene expression data, the assays slot of the SummarizedExperiment container is populated with a gene expression matrix, where the rows and columns represent the genes and tissue/conditions, respectively. The colData slot contains experimental design definitions including replicate and treatment information. The tissues and/or cell type information in the object maps via colData to the corresponding features in the SVG images using unique identifiers for the spatial features (e.g. tissues or cell types). This allows to color the features of interest in an SVG image according to the numeric data stored in a SummarizedExperiment object. For simplicity the numeric data can also be provided as numeric vectors or data.frames. This can be useful for testing purposes and/or the usage of simple data sets that may not require the more advanced features of the SummarizedExperiment class, such as measurements with only one or a few data points. The details about how to access the SVG images and properly format the associated expression data are provided in the Supplementary Section of this vignette.

1.3 Image Format: SVG

SHMs are images where colors encode numeric values in features of any shape. For plotting SHMs, Scalable Vector Graphics (SVG) has been chosen as image format since it is a flexible and widely adapted vector graphics format that provides many advantages for computationally embedding numerical and other information in images. SVG is based on XML formatted text describing all components present in images, including lines, shapes and colors. In case of biological images suitable for SHMs, the shapes often represent anatomical or cell structures. To assign colors to specific features in SHMs, annotated SVG (aSVG) files are used where the shapes of interest are labeled according to certain conventions so that they can be addressed and colored programmatically. One or multiple aSVG files can be parsed and stored in the SVG S4 container with utilities provided by the spatialHeatmap package (Figure 1B). The main slots of SVG include coordinate, attribute, dimension, svg, and raster. They correspond to feature coordinates, styling attributes (color, line width, etc.), width and heigth, original aSVG instances, and raster image paths, respectively. Raster images are required only when including photographic image components in SHMs (Figure 7), which is optional. Detailed instruction for creating custom aSVGs is provied in a separate tutorial.

SVGs and aSVGs of anatomical structures can be downloaded from many sources including the repositories described below. Alternatively, users can generate them themselves with vector graphics software such as Inkscape. Typically, in aSVGs one or more shapes of a feature of interest, such as the cell shapes of an organ, are grouped together by a common feature identifier. Via these group identifiers one or many feature types can be colored simultaneously in an aSVG according to biological experiments assaying the corresponding feature types with the required spatial resolution. To color spatial features according to numeric assay values, common identifiers are required for spatial features between the assay data and aSVGs. The color gradient used to visually represent the numeric assay values is controlled by a color gradient parameter. To visually interpret the meaning of the colors, the corresponding color key is included in the SHM plots. Additional details for properly formatting and annotating both aSVG images and assay data are provided in the Supplementary Section section of this vignette.

1.4 Data Repositories

If not generated by the user, SHMs can be generated with data downloaded from various public repositories. This includes gene, protein and metabolic profiling data from databases, such as GEO, BAR and Expression Atlas from EMBL-EBI (Papatheodorou et al. 2018). A particularly useful resource, when working with spatialHeatmap, is the EBI Expression Atlas. This online service contains both assay and anatomical images. Its assay data include mRNA and protein profiling experiments for different species, tissues and conditions. The corresponding anatomical image collections are also provided for a wide range of species including animals and plants. In spatialHeatmap several import functions are provided to work with the expression and aSVG repository from the Expression Atlas directly. The aSVG images developed by the spatialHeatmap project are available in its own repository called spatialHeatmap aSVG Repository, where users can contribute their aSVG images that are formatted according to our guidlines.

1.5 Tutorial Overview

The following sections of this vignette showcase the most important functionalities of the spatialHeatmap package using as initial example a simple to understand testing data set, and then more complex mRNA profiling data from the Expression Atlas and GEO databases. The co-visualization functionality is explained in a separate vignette (see SCSHM-CoViz). First, SHM plots are generated for both the testing and mRNA expression data. The latter include gene expression data sets from RNA-Seq and microarray experiments of Human Brain, Mouse Organs, Chicken Organs, and Arabidopsis Shoots. The first three are RNA-Seq data from the Expression Atlas, while the last one is a microarray data set from GEO. Second, gene context analysis tools are introduced, which facilitate the visualization of gene modules sharing similar expression patterns. This includes the visualization of hierarchical clustering results with traditional matrix heatmaps (Matrix Heatmap) as well as co-expression network plots (Network). Third, the Spatial Enrichment functionality is illustrated with mouse RNA-seq data. Lastly, an overview of the corresponding Shiny App is presented that provides access to the same functionalities as the R functions, but executes them in an interactive GUI environment (Chang et al. 2021; Chang and Borges Ribeiro 2018).

2 Getting Started

2.1 Installation

The spatialHeatmap package should be installed from an R (version \(\ge\) 3.6) session with the BiocManager::install command.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("spatialHeatmap")

2.2 Packages and Documentation

Next, the packages required for running the sample code in this vignette need to be loaded.

library(spatialHeatmap); library(SummarizedExperiment); library(ExpressionAtlas); library(GEOquery); library(igraph); library(BiocParallel); library(kableExtra); library(org.Hs.eg.db); library(org.Mm.eg.db); library(ggplot2)

The following lists the vignette(s) of this package in an HTML browser. Clicking the corresponding name will open this vignette.

browseVignettes('spatialHeatmap')

To reduce runtime, intermediate results can be cached under ~/.cache/shm.

cache.pa <- '~/.cache/shm' # Path of the cache directory.

A temporary directory is created to save output files.

tmp.dir <- normalizePath(tempdir(check=TRUE), winslash="/", mustWork=FALSE)

3 Spatial Heatmaps

3.1 Quick Start

Spatial Heatmaps (SHMs) are plotted with the shm function. To provide a quick and intuitive overview how these plots are generated, the following uses a generalized tesing example where a small vector of random numeric values is generated that are used to color features in an aSVG image. The image chosen for this example is an aSVG depicting the human brain. The corresponding image file homo_sapiens.brain.svg is included in this package for testing purposes.

3.1.1 aSVG Image

After the full path to the chosen target aSVG image on a user’s system is obtained with the system.file function, the function read_svg is used to import the aSVG information relevant for creating SHMs, which is stored in an SVG object svg.hum.

svg.hum.pa <- system.file("extdata/shinyApp/data", 'homo_sapiens.brain.svg', package="spatialHeatmap")
svg.hum <- read_svg(svg.hum.pa)

All features and their attributes can be accessed with the attribute function, where fill and stroke are the two most important ones providing color and line width information, respectively. The feature column includes group labels for sub-features in the sub.feature column. SHM plots are created by mapping assay data to labels in feature.

feature.hum <- attribute(svg.hum)[[1]]
tail(feature.hum[, 1:6], 3) # Partial features and respective attributes
## # A tibble: 3 × 6
##   feature                 id             fill  stroke sub.feature          index
##   <chr>                   <chr>          <chr>  <dbl> <chr>                <int>
## 1 cerebellar.hemisphere   UBERON_0002245 none   0.016 cerebellar.hemisphe…   247
## 2 nucleus.accumbens       UBERON_0001882 none   0.016 nucleus.accumbens      248
## 3 telencephalic.ventricle UBERON_0002285 none   0.016 telencephalic.ventr…   249

Feature coordinates can be accessed with the coordinate function.

coord.df <- coordinate(svg.hum)[[1]]
tail(coord.df, 3) # Partial features and respective coordinates
## # A tibble: 3 × 4
##       x     y feature                 index
##   <dbl> <dbl> <chr>                   <int>
## 1  194.  326. telencephalic.ventricle   249
## 2  194.  326. telencephalic.ventricle   249
## 3  194.  326. telencephalic.ventricle   249

3.1.2 Numeric Data

The following generates a small named vector for testing, where the data slot contains four numbers, and the name slot is populated with three feature names and one missing one (here ’notMapped"). The numbers are mapped to features (feature.hum) via matching names among the numeric vector and the aSVG, respectively. Accordingly, only numbers and features with matching name counterparts can be colored in the aSVG image. In addition, a summary of the numeric assay to feature mappings is stored in a data.frame returned by the shm function (see below).

set.seed(20) # To obtain reproducible results, a fixed seed is set.
unique(feature.hum$feature)[1:10]
##  [1] "g4320"                 "locus.ceruleus"        "diencephalon"         
##  [4] "medulla.oblongata"     "middle.temporal.gyrus" "caudate.nucleus"      
##  [7] "middle.frontal.gyrus"  "occipital.lobe"        "parietal.lobe"        
## [10] "pineal.gland"
my_vec <- setNames(sample(1:100, 4), c('substantia.nigra', 'putamen', 'prefrontal.cortex', 'notMapped'))
my_vec
##  substantia.nigra           putamen prefrontal.cortex         notMapped 
##                38                63                 2                29

3.1.3 Plot SHM

Before plotting SHMs, the aSVG instance and numeric data are stored in an SPHM object for the sake of efficient data management and reusability.

dat.quick <- SPHM(svg=svg.hum, bulk=my_vec)

Next, the SHM is plotted with the shm function (Figure 2). Internally, the numbers in my_vec are translated into colors based on the color key and then painted onto the corresponding features in the aSVG. In the given example (Figure 2) only three features (‘substantia.nigra’, ‘putamen’, and ‘prefrontal.cortex’) in the aSVG have matching entries in the data my_vec.

shm.res <- shm(data=dat.quick, ID='testing', ncol=1, sub.title.size=20, legend.nrow=3)
SHM of human brain with testing data. The plots from the left to the right represent the color key for the numeric data, followed by four SHM plots and the legend of the spatial features. The numeric values provided in `my_vec` are used to color the corresponding features in the SHM plots according to the color key while the legend plot identifies the spatial regions.

Figure 2: SHM of human brain with testing data
The plots from the left to the right represent the color key for the numeric data, followed by four SHM plots and the legend of the spatial features. The numeric values provided in my_vec are used to color the corresponding features in the SHM plots according to the color key while the legend plot identifies the spatial regions.

The named numeric values in my_vec, that have name matches with the features in the chosen aSVG, are stored in the mapped_feature slot.

# Mapped features
spatialHeatmap::output(shm.res)$mapped_feature
##        ID           feature value    fill                    SVG
## 1 testing  substantia.nigra    38 #FF8700 homo_sapiens.brain.svg
## 2 testing           putamen    63 #FF0000 homo_sapiens.brain.svg
## 3 testing prefrontal.cortex     2 #FFFF00 homo_sapiens.brain.svg

3.2 Human Brain

This subsection introduces how to query and download cell- and tissue-specific assay data in the Expression Atlas database using the ExpressionAtlas package (Keays 2019). After the choosen data is downloaded directly into a user's R session, the expression values for selected genes can be plotted onto a chosen aSVG image with or without prior preprocessing steps (e.g. normalization).

3.2.1 Gene Expression Data

The following example searches the Expression Atlas for expression data derived from specific tissues and species of interest, here ‘cerebellum’ and ‘Homo sapiens’, respectively.

all.hum <- read_cache(cache.pa, 'all.hum') # Retrieve data from cache.
if (is.null(all.hum)) { # Save downloaded data to cache if it is not cached.
  all.hum <- searchAtlasExperiments(properties="cerebellum", species="Homo sapiens")
  save_cache(dir=cache.pa, overwrite=TRUE, all.hum)
}

The search result contains 15 accessions. In the following code, the accession ‘E-GEOD-67196’ from Prudencio et al. (2015) has been chosen, which corresponds to an RNA-Seq profiling experiment of ‘cerebellum’ and ‘frontal cortex’ brain tissue from patients with amyotrophic lateral sclerosis (ALS).

all.hum[2, ]
## DataFrame with 1 row and 4 columns
##      Accession   Species      Type     Title
##    <character> <logical> <logical> <logical>
## 1 E-GEOD-35493        NA        NA        NA

The getAtlasData function allows to download the chosen RNA-Seq experiment from the Expression Atlas and import it into a RangedSummarizedExperiment object.

rse.hum <- read_cache(cache.pa, 'rse.hum') # Read data from cache.
if (is.null(rse.hum)) { # Save downloaded data to cache if it is not cached.
  rse.hum <- getAtlasData('E-GEOD-67196')[[1]][[1]]
  save_cache(dir=cache.pa, overwrite=TRUE, rse.hum)
}

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.hum. The following returns only its first four rows and columns.

colData(rse.hum)[1:2, c(2, 4)]
## DataFrame with 2 rows and 2 columns
##                organism  organism_part
##             <character>    <character>
## SRR1927019 Homo sapiens     cerebellum
## SRR1927020 Homo sapiens frontal cortex

3.2.2 aSVG Image

The following example shows how to download from the above described SVG repositories an aSVG image that matches the tissues and species assayed in the gene expression data set downloaded above. The return_feature function queries the repository for feature- and species-related keywords, here c('frontal cortex', 'cerebellum') and c('homo sapiens', 'brain'), respectively.

The remote data are downloaded before calling return_feature.

# Remote aSVG repos.
data(aSVG.remote.repo)
tmp.dir.ebi <- file.path(tmp.dir, 'ebi.zip')
tmp.dir.shm <- file.path(tmp.dir, 'shm.zip')
# Download the remote aSVG repos as zip files.
download.file(aSVG.remote.repo$ebi, tmp.dir.ebi)
download.file(aSVG.remote.repo$shm, tmp.dir.shm)
remote <- list(tmp.dir.ebi, tmp.dir.shm)

The downloaded aSVG repos are queried and returned aSVG files are saved in an empty directory (tmp.dir) to avoid overwriting of existing SVG files.

tmp.dir <- file.path(tempdir(check=TRUE), 'shm') # Empty directory. 
feature.df.hum <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), dir=tmp.dir, remote=remote) # Query aSVGs
feature.df.hum[1:8, ] # Return first 8 rows for checking
unique(feature.df.hum$SVG) # Return all matching aSVGs

To build this vignettes according to Bioconductor’s package requirements, the following code section uses aSVG file instances included in the spatialHeatmap package rather than the downloaded instances above.

svg.dir <- system.file("extdata/shinyApp/data", package="spatialHeatmap") # Directory of the aSVG collection in spatialHeatmap
feature.df.hum <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=NULL)

Note, the target tissues frontal cortex and cerebellum are included in both the experimental design slot of the downloaded expression data as well as the annotations of the aSVG. This way these features can be colored in the downstream SHM plots. If necessary users can also change from within R the feature identifiers in an aSVG (see Supplementary Section).

tail(feature.df.hum[, c('feature', 'stroke', 'SVG')], 3)
## # A tibble: 3 × 3
##   feature             stroke SVG                      
##   <chr>                <dbl> <chr>                    
## 1 cerebral.cortex      0.1   mus_musculus.brain_sp.svg
## 2 cerebellum           0.1   mus_musculus.brain_sp.svg
## 3 somatosensor.cortex  0.100 mus_musculus.brain_sp.svg

Among the returned aSVG files, homo_sapiens.brain.svg is chosen for creating SHMs. Since it is the same as the Quick Start, the aSVG stored in svg.hum is used in the downstream steps.

3.2.3 Experimental Design

To display ‘pretty’ sample names in columns and legends of downstream tables and plots respectively, the following example imports a ‘targets’ file that can be customized by users in a text program. The targets file content is used to replace the text in the colData slot of the RangedSummarizedExperiment object with a version containing shorter sample names for plotting purposes.

The custom targets file is imported and then loaded into colData slot of rse.hum. A slice of the simplified colData object is shown below.

hum.tar <- system.file('extdata/shinyApp/data/target_human.txt', package='spatialHeatmap')
target.hum <- read.table(hum.tar, header=TRUE, row.names=1, sep='\t') # Importing
colData(rse.hum) <- DataFrame(target.hum) # Loading to "colData"
colData(rse.hum)[c(1:2, 41:42), 4:5]
## DataFrame with 4 rows and 2 columns
##             organism_part     disease
##               <character> <character>
## SRR1927019     cerebellum         ALS
## SRR1927020 frontal cortex         ALS
## SRR1927059     cerebellum      normal
## SRR1927060 frontal cortex      normal

3.2.4 Preprocess Assay Data

The raw count gene expression data is stored in the assay slot of rse.hum. The following shows how to apply basic preprocessing routines on the count data, such as normalizing, aggregating replicates, and removing genes with unreliable expression responses, which are optional for plotting SHMs.

The norm_data function is developed to normalize RNA-seq raw count data. The following example uses the ESF normalization option due to its good time performance, which is estimateSizeFactors from DESeq2 (Love, Huber, and Anders 2014).

se.nor.hum <- norm_data(data=rse.hum, norm.fun='ESF', log2.trans=TRUE)

Replicates are aggregated with the summary statistics chosen under the aggr argument (e.g. aggr='mean'). The columns specifying replicates can be assigned to the sam.factor and con.factor arguments corresponding to samples and conditions, respectively. For tracking, the corresponding sample/condition labels are used as column titles in the aggregated assay instance, where they are concatenated with a double underscore as separator (Table 1).

se.aggr.hum <- aggr_rep(data=se.nor.hum, sam.factor='organism_part', con.factor='disease', aggr='mean')
assay(se.aggr.hum)[c(120, 49939, 49977), ]
Table 1: Table 2: Slice of aggregated expression matrix.
cerebellum__ALS frontal.cortex__ALS cerebellum__normal frontal.cortex__normal
ENSG00000006047 1.134172 5.2629629 0.5377534 5.3588310
ENSG00000268433 5.324064 0.3419665 3.4780744 0.1340332
ENSG00000268555 5.954572 2.6148548 4.9349736 2.0351776

The filtering example below retains genes with expression values larger than 5 (log2 space) in at least 1% of all samples (pOA=c(0.01, 5)), and a coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)). After that, the Ensembl gene ids are converted to UniProt ids with the function cvt_id.

se.fil.hum <- filter_data(data=se.aggr.hum, sam.factor='organism_part', con.factor='disease', pOA=c(0.01, 5), CV=c(0.3, 100))
se.fil.hum <- cvt_id(db='org.Hs.eg.db', data=se.fil.hum, from.id='ENSEMBL', to.id='SYMBOL')

3.2.5 SHM: Multiple Genes

Spatial features of interest can be subsetted with the function sub_sf by assigning their indexes (see below) to the argument show. In the following, ‘brain outline’, ‘prefrontal.cortex’, ‘frontal.cortex’, and ‘cerebellum’ are subsetted.

Next, for efficient data management and reusability the subset aSVG and assay data are stored in an SPHM object.

# Subsetting aSVG features.
svg.hum.sub <- sub_sf(svg.hum, show=c(64:132, 162:163, 164, 190:218))
tail(attribute(svg.hum.sub)[[1]][, 1:6], 3)
## # A tibble: 3 × 6
##   feature    id             fill  stroke sub.feature    index
##   <chr>      <chr>          <chr>  <dbl> <chr>          <int>
## 1 cerebellum UBERON_0002037 none   0.016 cerebellum_2_4   216
## 2 cerebellum UBERON_0002037 none   0.016 cerebellum_2_3   217
## 3 cerebellum UBERON_0002037 none   0.016 cerebellum_2_2   218
# Storing assay data and subsetted aSVG in an 'SPHM' object.
dat.hum <- SPHM(svg=svg.hum.sub, bulk=se.fil.hum)

SHMs for multiple genes can be plotted by providing the corresponding gene IDs under the ID argument as a character vector. The shm function will then sequentially arrange the SHMs for each gene in a single composite plot. To facilitate comparisons among expression values across genes and/or conditions, the lay.shm parameter can be assigned gene or con, respectively. For instance, in Figure 3 the SHMs of the genes OLFM4 and PRSS22 are organized by condition in a horizontal view. This functionality is particularly useful when comparing associated genes such as gene families.

res.hum <- shm(data=dat.hum, ID=c('OLFM4', 'PRSS22'), lay.shm='con', legend.r=1.5, legend.nrow=3, h=0.6)
<img src=“/tmp/RtmpWOku1i/Rbuild24534827cedb03/spatialHeatmap/vignettes/spatialHeatmap_files/figure-html/mul-1.png” alt=“SHMs of two genes. The subplots are organized by”condition“. Only cerebellum and frontal cortex are colored, because they are present in both the aSVG and the expression data.” width=“100%” class=“widefigure” />

Figure 3: SHMs of two genes
The subplots are organized by “condition”. Only cerebellum and frontal cortex are colored, because they are present in both the aSVG and the expression data.

In the above example, the normalized expression values of chosen genes are used to color the frontal cortex and cerebellum, where the different conditions, here normal and ALS, are given in separate SHMs. The color and feature mappings are defined by the corresponding color key and legend plot on the left and right, respectively.

By default, spatial features in assay data are mapped to their counterparts in aSVG according to the same identifiers on a one-to-one basis. However, the mapping can be customized, such as mapping a spatial feature in the data to a different or multiple counterparts in the aSVG. This advanced functionality is demonstrated in the Supplementary Section.

3.2.6 SHM: Other Selected Features

SHMs can be saved to interactive HTML files as well as video files. Each HTML file contains an interactive SHM with zooming in and out functionality. Hovering over graphics features will display data, gene, condition and other information. The video will play the SHM subplots in the order specified under the lay.shm argument.

The following example saves the interactive HTML and video files under the directory tmp.dir.

shm(data=dat.hum, ID=c('OLFM4', 'PRSS22'), lay.shm='con', legend.r=1.5, legend.nrow=3, h=0.6, aspr=2.3, animation.scale=0.7, bar.width=0.1, bar.value.size=4, out.dir=tmp.dir)

The following code saves individual SHMs into the same SVG file shm_hum.svg with the color scale and legend plot included.

res <- shm(data=dat.hum, ID=c('OLFM4', 'PRSS22'), lay.shm='con', legend.r=1.5, legend.nrow=3, h=0.5, aspr=2.3, animation.scale=0.7, bar.width=0.08, bar.value.size=12)
ggsave(file="./shm_hum.svg", plot=output(res)$spatial_heatmap, width=10, height=8) 

The following code exports each SHM (associated with a specific gene and condition) as separate SVG files in tmp.dir. In contrast to the original aSVG file, spatial features in the output SVG files are assigned heat colors.

write_svg(input=res, out.dir=tmp.dir)

A meta function plot_meta is developed as a wraper of individual steps necessary for plotting SHMs. The benefit of this function is creating SHMs with the Linux command line as shown below.

Rscript -e "spatialHeatmap::plot_meta(svg.path=system.file('extdata/shinyApp/data', 'mus_musculus.brain.svg', package='spatialHeatmap'), bulk=system.file('extdata/shinyApp/data', 'bulk_mouse_cocluster.rds', package='spatialHeatmap'), sam.factor='tissue', aggr='mean', ID=c('AI593442', 'Adora1'), ncol=1, bar.width=0.1, legend.nrow=5, h=0.6)"

3.2.7 SHM: Customization

To provide a high level of flexibility, many arguments are developed for shm. An overview of important arguments and their utility is provided in Table 3.

Table 3: Table 4: List of important argumnets of ‘shm’.
argument description
1 data An SPHM object containing assay data and aSVG (s).
2 sam.factor Applies to SummarizedExperiment (SE). Column name of sample replicates in colData slot. Default is NULL
3 con.factor Applies to SE. Column name of condition replicates in colData slot. Default is NULL
4 ID A character vector of row items for plotting spatial heatmaps
5 col.com A character vector of color components for building colour scale. Default is c(‘yellow’, ‘orange’,‘red’)
6 col.bar ‘selected’ or ‘all’, the former means use values of ID to build the colour scale while the latter use all values in data. Default is ‘selected’.
7 bar.width A numeric of colour bar width. Default is 0.7
8 trans.scale One of ‘log2’, ‘exp2’, ‘row’, ‘column’, or NULL, which means transform the data by ‘log2’ or ‘2-base expoent’, scale by ‘row’ or ‘column’, or no manipuation respectively.
9 ft.trans A vector of aSVG features to be transparent. Default is NULL.
10 legend.r A numeric to adjust the dimension of the legend plot. Default is 1. The larger, the higher ratio of width to height.
11 sub.title.size The title size of each spatial heatmap subplot. Default is 11.
12 lay.shm ‘gen’ or ‘con’, applies to multiple genes or conditions respectively. ‘gen’ means spatial heatmaps are organised by genes while ‘con’ organised by conditions. Default is ‘gen’
13 ncol The total column number of spatial heatmaps, not including legend plot. Default is 2.
14 ft.legend ‘identical’, ‘all’, or a vector of samples/features in aSVG to show in legend plot. ‘identical’ only shows matching features while ‘all’ shows all features.
15 legend.ncol, legend.nrow Two numbers of columns and rows of legend keys respectively. Default is NULL, NULL, since they are automatically set.
16 legend.position the position of legend keys (‘none’, ‘left’, ‘right’,‘bottom’, ‘top’), or two-element numeric vector. Default is ‘bottom’.
17 legend.key.size, legend.text.size The size of legend keys and labels respectively. Default is 0.5 and 8 respectively.
18 line.size, line.color The size and colour of all plogyon outlines respectively. Default is 0.2 and ‘grey70’ respectively.
19 verbose TRUE or FALSE. Default is TRUE and the aSVG features not mapped are printed to R console.
20 out.dir The directory to save HTML and video files of spatial heatmaps. Default is NULL.

3.3 Mouse Organs

This section generates an SHM plot for mouse data from the Expression Atlas. The code components are very similar to the previous Human Brain example. For brevity, the corresponding text explaining the code has been reduced to a minimum.

3.3.1 Gene Expression Data

The chosen mouse RNA-Seq data compares tissue level gene expression across mammalian species (Merkin et al. 2012). The following searches the Expression Atlas for expression data from ‘kidney’ and ‘Mus musculus’.

all.mus <- read_cache(cache.pa, 'all.mus') # Retrieve data from cache.
if (is.null(all.mus)) { # Save downloaded data to cache if it is not cached.
  all.mus <- searchAtlasExperiments(properties="kidney", species="Mus musculus")
  save_cache(dir=cache.pa, overwrite=TRUE, all.mus)
}

Among the many matching entries, accession ‘E-MTAB-2801’ will be downloaded.

all.mus[7, ]
rse.mus <- read_cache(cache.pa, 'rse.mus') # Read data from cache.
if (is.null(rse.mus)) { # Save downloaded data to cache if it is not cached.
  rse.mus <- getAtlasData('E-MTAB-2801')[[1]][[1]]
  save_cache(dir=cache.pa, overwrite=TRUE, rse.mus)
}

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.mus. The following returns only its first three rows.

colData(rse.mus)[1:3, ]
## DataFrame with 3 rows and 4 columns
##           AtlasAssayGroup     organism organism_part      strain
##               <character>  <character>   <character> <character>
## SRR594393              g7 Mus musculus         brain      DBA/2J
## SRR594394             g21 Mus musculus         colon      DBA/2J
## SRR594395             g13 Mus musculus         heart      DBA/2J

3.3.2 aSVG Image

The following example shows how to retrieve from the remote SVG repositories an aSVG image that matches the tissues and species assayed in the downloaded data above. The sample data from Human Brain are used such as remote.

feature.df.mus <- return_feature(feature=c('heart', 'kidney'), species=c('Mus musculus'), dir=tmp.dir, remote=remote)

To meet the R/Bioconductor package requirements, the following uses aSVG file instances included in the spatialHeatmap package rather than the downloaded instances.

feature.df.mus <- return_feature(feature=c('heart', 'kidney'), species=NULL, dir=svg.dir, remote=NULL) 

Return the names of the matching aSVG files.

unique(feature.df.mus$SVG)
## [1] "gallus_gallus.svg"     "mus_musculus.male.svg"

The mus_musculus.male.svg instance is selected and imported.

svg.mus.pa <- system.file("extdata/shinyApp/data", "mus_musculus.male.svg", package="spatialHeatmap")
svg.mus <- read_svg(svg.mus.pa)

3.3.3 Experimental Design

A sample target file that is included in this package is imported and then loaded to the colData slot of rse.mus. To inspect its content, the first three rows are shown.

mus.tar <- system.file('extdata/shinyApp/data/target_mouse.txt', package='spatialHeatmap')
target.mus <- read.table(mus.tar, header=TRUE, row.names=1, sep='\t') # Importing
colData(rse.mus) <- DataFrame(target.mus) # Loading
target.mus[1:3, ]
##           AtlasAssayGroup     organism organism_part strain
## SRR594393              g7 Mus musculus         brain DBA.2J
## SRR594394             g21 Mus musculus         colon DBA.2J
## SRR594395             g13 Mus musculus         heart DBA.2J

3.3.4 Preprocess Assay Data

The raw RNA-Seq counts are preprocessed with the following steps: (1) normalization, (2) aggregation of replicates, and (3) filtering of un-reliable expression data. The details of these steps are explained in the sub-section of the Human Brain example.

rse.mus <- cvt_id(db='org.Mm.eg.db', data=rse.mus, from.id='ENSEMBL', to.id='SYMBOL', desc=TRUE) # Convert Ensembl ids to UniProt ids.  
se.nor.mus <- norm_data(data=rse.mus, norm.fun='CNF', log2.trans=TRUE) # Normalization
se.aggr.mus <- aggr_rep(data=se.nor.mus, sam.factor='organism_part', con.factor='strain', aggr='mean') # Aggregation of replicates
se.fil.mus <- filter_data(data=se.aggr.mus, sam.factor='organism_part', con.factor='strain', pOA=c(0.01, 5), CV=c(0.6, 100)) # Filtering of genes with low counts and variance 

3.3.5 SHM: Transparency

The pre-processed expression data for gene Scml2 is plotted in form of an SHM. In this case the plot includes expression data for 8 tissues across 3 mouse strains.

dat.mus <- SPHM(svg=svg.mus, bulk=se.fil.mus)
shm(data=dat.mus, ID=c('H19'), legend.width=0.7, legend.text.size=10, sub.title.size=9, ncol=3, ft.trans=c('skeletal muscle'), legend.ncol=2, line.size=0.2, line.color='grey70')
SHM of mouse organs. This is a multiple-layer image where the shapes of the 'skeletal muscle' is set transparent to expose 'lung' and 'heart'.

Figure 4: SHM of mouse organs
This is a multiple-layer image where the shapes of the ‘skeletal muscle’ is set transparent to expose ‘lung’ and ‘heart’.

The SHM plots in Figures 4 and below demonstrate the usage of the transparency feature via the ft.trans parameter. The corresponding mouse organ aSVG image includes overlapping tissue layers. In this case the skelectal muscle layer partially overlaps with lung and heart tissues. To view lung and heart in Figure 4, the skelectal muscle tissue is set transparent with ft.trans=c('skeletal muscle').

To fine control the visual effects in feature rich aSVGs, the line.size and line.color parameters are useful. This way one can adjust the thickness and color of complex structures.

gg <- shm(data=dat.mus, ID=c('H19'), legend.text.size=10, sub.title.size=9, ncol=3, legend.ncol=2, line.size=0.1, line.color='grey70')
SHM of mouse organs. This is a multiple-layer image where the view onto 'lung' and 'heart' is obstructed by displaying the 'skeletal muscle' tissue.

Figure 5: SHM of mouse organs
This is a multiple-layer image where the view onto ‘lung’ and ‘heart’ is obstructed by displaying the ‘skeletal muscle’ tissue.

A third example on real data from Expression Atlas is SHMs of time series across chicken organs. Since the procedures are the same with the examples above, this example is illustrated in the Supplementary Section.

3.4 Arabidopsis Shoot

This section generates an SHM for Arabidopsis thaliana tissues with gene expression data from the Affymetrix microarray technology. The chosen experiment used ribosome-associated mRNAs from several cell populations of shoots and roots that were exposed to hypoxia stress (Mustroph et al. 2009). In this case the expression data will be downloaded from GEO with utilites from the GEOquery package (Davis and Meltzer 2007). The data preprocessing routines are specific to the Affymetrix technology. The remaining code components for generating SHMs are very similar to the previous examples. For brevity, the text in this section explains mainly the steps that are specific to this data set.

3.4.1 Gene Expression Data

The GSE14502 data set is downloaded with the getGEO function from the GEOquery package. Intermediately, the expression data is stored in an ExpressionSet container (Huber et al. 2015), and then converted to a SummarizedExperiment object.

gset <- read_cache(cache.pa, 'gset') # Retrieve data from cache.
if (is.null(gset)) { # Save downloaded data to cache if it is not cached.
  gset <- getGEO("GSE14502", GSEMatrix=TRUE, getGPL=TRUE)[[1]]
  save_cache(dir=cache.pa, overwrite=TRUE, gset)
}
se.sh <- as(gset, "SummarizedExperiment")

The gene symbol identifiers are extracted from the rowData component to be used as row names. Similarly, one can work with AGI identifiers by providing below AGI under Gene.Symbol.

rownames(se.sh) <- make.names(rowData(se.sh)[, 'Gene.Symbol'])

A slice of the experimental design stored in the colData slot is returned. Both the samples and treatments are contained in the title column. The samples are indicated by corresponding promoters (pGL2, pCO2, pSCR, pWOL, p35S) and treatments include control and hypoxia.

colData(se.sh)[60:63, 1:2]
## DataFrame with 4 rows and 2 columns
##                            title geo_accession
##                      <character>   <character>
## GSM362227 shoot_hypoxia_pGL2_r..     GSM362227
## GSM362228 shoot_hypoxia_pGL2_r..     GSM362228
## GSM362229 shoot_control_pRBCS_..     GSM362229
## GSM362230 shoot_control_pRBCS_..     GSM362230

3.4.2 aSVG Image

In this example, the aSVG image has been generated in Inkscape from the corresponding figure in Mustroph et al. (2009). Detailed instructions for generating custom aSVG images are provided in the SVG tutorial. The resulting custom aSVG file ‘arabidopsis.thaliana_shoot_shm.svg’ is included in the spatialHeatmap package and imported as below.

svg.sh.pa <- system.file("extdata/shinyApp/data", "arabidopsis.thaliana_shoot_shm.svg", package="spatialHeatmap")
svg.sh <- read_svg(svg.sh.pa)

3.4.3 Experimental Design

A sample target file that is included in this package is imported and then loaded to the colData slot of se.sh. To inspect its content, four selected rows are returned.

sh.tar <- system.file('extdata/shinyApp/data/target_arab.txt', package='spatialHeatmap')
target.sh <- read.table(sh.tar, header=TRUE, row.names=1, sep='\t') # Importing
colData(se.sh) <- DataFrame(target.sh) # Loading
target.sh[60:63, ]
##                           col.name     samples conditions
## shoot_hypoxia_pGL2_rep1  GSM362227  shoot_pGL2    hypoxia
## shoot_hypoxia_pGL2_rep2  GSM362228  shoot_pGL2    hypoxia
## shoot_control_pRBCS_rep1 GSM362229 shoot_pRBCS    control
## shoot_control_pRBCS_rep2 GSM362230 shoot_pRBCS    control

3.4.4 Preprocess Assay Data

The downloaded GSE14502 data set has already been normalized with the RMA algorithm (Gautier et al. 2004). Thus, the pre-processing steps can be restricted to replicate aggregation and filtering.

se.aggr.sh <- aggr_rep(data=se.sh, sam.factor='samples', con.factor='conditions', aggr='mean') # Replicate agggregation using mean
se.fil.arab <- filter_data(data=se.aggr.sh, sam.factor='samples', con.factor='conditions', pOA=c(0.03, 6), CV=c(0.30, 100)) # Filtering of genes with low intensities and variance

3.4.5 SHM: Microarray

The expression profile for the HRE2 gene is plotted for the control and the hypoxia treatment across six cell types (Figure 6).

dat.sh <- SPHM(svg=svg.sh, bulk=se.fil.arab)
shm(data=dat.sh, ID=c("HRE2"), legend.ncol=2, legend.text.size=10, legend.key.size=0.02)
SHM of Arabidopsis shoots. The expression profile of the HRE2 gene is plotted for control and hypoxia treatment across six cell types.

Figure 6: SHM of Arabidopsis shoots
The expression profile of the HRE2 gene is plotted for control and hypoxia treatment across six cell types.

3.5 Superimposing raster and vector graphics

spatialHeatmap allows to superimpose raster images with vector-based SHMs. This way one can generate SHMs that resemble photographic representations of tissues, organs or entire organisms. For this to work the shapes represented in the vector-graphics need to be an aligned carbon copy of the raster image. Supported file formats for the raster image are JPG/JPEG and PNG, and for the vector image it is SVG. Matching raster and vector graphics are indicated by identical base names in their file names (e.g. imageA.png and imageA.svg). The layout order in SHMs composed of multiple independent images can be controlled by numbering the corresponding file pairs accordingly such as imageA_1.png and imageA_1.svg, imageA_2.png and imageA_2.svg, etc.

In the following example, the required image pairs have been pre-generated from a study on abaxial bundle sheath (ABS) cells in maize leaves (Bezrutczyk et al. 2021). Their file names are labeled 1 and 2 to indicate two developmental stages.

Import paths of first png/svg image pair:

raster.pa1 <- system.file('extdata/shinyApp/data/maize_leaf_shm1.png', package='spatialHeatmap')
svg.pa1 <- system.file('extdata/shinyApp/data/maize_leaf_shm1.svg', package='spatialHeatmap')

Import paths of second png/svg image pair:

raster.pa2 <- system.file('extdata/shinyApp/data/maize_leaf_shm2.png', package='spatialHeatmap')
svg.pa2 <- system.file('extdata/shinyApp/data/maize_leaf_shm2.svg', package='spatialHeatmap')

The two pairs of png/svg images are imported in the SVG container svg.overlay.

svg.overlay <- read_svg(svg.path=c(svg.pa1, svg.pa2), raster.path=c(raster.pa1, raster.pa2))

A slice of attributes in the first aSVG instance is shown.

attribute(svg.overlay)[[1]][1:3, ]
## # A tibble: 3 × 10
##   feature id      fill    stroke sub.feature index element parent    index.all
##   <chr>   <chr>   <chr>    <dbl> <chr>       <int> <chr>   <chr>         <int>
## 1 rect817 rect817 none    0.0843 rect817         1 rect    container         1
## 2 cell1   cell1   #98f0aa 0      path819         2 g       container         2
## 3 cell1   cell1   #98f0aa 0      path821         3 g       container         2
## # ℹ 1 more variable: index.sub <int>

Create random quantitative assay data.

df.ovl <- data.frame(matrix(runif(6, min=0, max=5), nrow=3))
colnames(df.ovl) <- c('cell1', 'cell2') # Assign column names.
rownames(df.ovl) <- paste0('gene', seq_len(3)) # Assign row names 
df.ovl[1:2, ]
##          cell1       cell2
## gene1 1.637970 3.788981771
## gene2 1.850373 0.009639693

To minimize masking of the features in the SHMs by dense regions in the raster images, the alpha.overlay argument allows to adjust the transparency level. In Figure 7, the spatial features of interest are superimposed onto the raster image.

dat.over <- SPHM(svg=svg.overlay, bulk=df.ovl)
shm(data=dat.over, charcoal=FALSE, ID=c('gene1'), alpha.overlay=0.5, bar.width=0.09, sub.title.vjust=4, legend.r=0.2)
Superimposing raster images with SHMs (colorful backaground). The expression profiles of gene1 are plotted on ABS cells.

Figure 7: Superimposing raster images with SHMs (colorful backaground)
The expression profiles of gene1 are plotted on ABS cells.

Another option for reducing masking effects is to display the raster image in black and white by setting charcoal=TRUE (Figure 8).

shm(data=dat.over, charcoal=TRUE, ID=c('gene1'), alpha.overlay=0.5, bar.width=0.09, sub.title.vjust=4, legend.r=0.2)