1 Introduction

1.1 Overview

The primary utility of the spatialHeatmap package is the generation of spatial heatmaps (SHMs) for visualizing cell-, tissue- and organ-specific abundance patterns of biological molecules (e.g. RNAs) in anatomical images (Zhang et al. 2022). This is useful for identifying molecules with spatially enriched (SE) abundance patterns as well as clusters and/or network modules composed of molecules that share similar abundance patterns, such as similar gene expression patterns. These functionalities are introduced in the main vignette of the spatialHeatmap package. This vignette describes extended functionalities for integrating tissue with single cell data by co-visualizing them in composite plots that combine spatial heatmaps with embedding plots of high-dimensional data. The resulting spatial context information is important for gaining insights into the tissue-level organization of single cell data.

The required quantitative assay data, such as gene expression values, can be provided in a variety of widely used tabular data structures (e.g. data.frame, SummarizedExperiment or SingleCellExperiment). The corresponding anatomical images need to be supplied as annotated SVG (aSVG) images and can be stored in the dedicated S4 class SVG. The creation of aSVGs is described in the main vignette of this package. For the embedding plots of single cell data, several dimensionality reduction algorithms (e.g. PCA, UMAP or tSNE) are supported. To associate single cells with their source tissues, the user can choose among three methods: annotation-based, manual and automated (Figure 1). Similar to other functionalities in spatialHeatmap, these functionalities are available within R as well as in the corresponding Shiny app (Chang et al. 2021).
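
To make the input structure more concrete, the following is a minimal sketch of a SingleCellExperiment that holds toy expression data, cell group labels and a PCA embedding. It uses only the SingleCellExperiment package and base R and does not call any spatialHeatmap function. The simulated counts, the object names and the group labels (cortex, stele, root.cap) are assumptions made for illustration.

```r
# Toy data only: all object names, labels and values are illustrative assumptions.
library(SingleCellExperiment)

set.seed(1)
n_genes <- 20
n_cells <- 12

# Simulated count matrix with genes in rows and cells in columns.
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Hypothetical cell group labels: three groups of four cells each.
cell_meta <- DataFrame(
  label = rep(c("cortex", "stele", "root.cap"), each = 4),
  row.names = colnames(counts)
)

# Expression data go into the assays slot, cell metadata into colData.
sce <- SingleCellExperiment(assays = list(counts = counts), colData = cell_meta)

# PCA on log-transformed counts with base R; prcomp() expects cells in rows.
pca <- prcomp(t(log2(counts + 1)), rank. = 2)

# Store the first two principal components in the reducedDims slot.
reducedDim(sce, "PCA") <- pca$x

dim(sce)
reducedDimNames(sce)
table(sce$label)
```

In practice the embedding would usually be computed with a dedicated single cell workflow rather than with prcomp() directly; base R is used here only to keep the sketch short.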

1.2 Methods for Associating Cells and Bulk Tissues

To co-visualize single cell data with tissue features (Figure 1), the individual cells of the single cell data are mapped via their group labels to the corresponding tissue features in an aSVG image. If the feature labels in an aSVG differ from the corresponding group labels used for the single cell data, e.g. due to variable terminologies, a translation map can be used to avoid manual relabelling. Throughout this vignette, the term feature is used as a generalization that refers in most cases to tissues or organs.

For the implementation of the co-visualization tool, spatialHeatmap takes advantage of efficient and reusable S4 classes for both assay data and aSVGs. The former include Bioconductor core data structures such as the widely used SingleCellExperiment (SCE) container illustrated in Figure 1.1 (Amezquita et al. 2020). The slots assays, colData, rowData and reducedDims in an SCE contain expression data, cell metadata, molecule metadata and reduced dimensionality embedding results, respectively. The cell group labels are stored in the colData slot as shown in Figure 1.1. The S4 class SVG (Figure 1.3) is developed specifically in spatialHeatmap for storing aSVG instances. Its two most important slots, coordinate and attribute, store the aSVG feature coordinates and the respective attributes such as fill color, line widths, etc., while the slots dimension, svg and raster store image dimensions, aSVG file paths and raster image paths, respectively.

For handling cell-to-tissue grouping information, three general methods are available: (a) annotation-based, (b) manual and (c) automated. The annotation-based and manual methods are similar in that both use known cell group labels. The main difference is how the cell labels are provided.
In the annotation-based method, existing group labels are available and can be uploaded and/or stored in the SCE object, as is the case in some of the SCE instances provided by the scRNAseq package (Risso and Cole 2022). The manual method allows users to create the cell-to-tissue associations one-by-one or import them from a tabular file. In contrast, the automated method aims to assign single cells to the corresponding source tissues computationally by a co-clustering algorithm (Figure 8). This co-clustering is experimental and requires bulk expression data obtained from the tissues represented in the single cell data.

The grouping information is visualized by using the same color for each group in both the single cell embedding plot and the tissue spatial heatmap plot (Figure 1.5). The colors can represent any type of custom or numeric information. In a typical use case, either fixed tissue-specific colors are used or a heat color gradient that is proportional to the numeric expression information obtained from the single cell or bulk expression data of a chosen gene. When the expression values among groups are very similar, the resulting heat colors are hard to tell apart, so toggling between the two coloring options is important for tracking the tissue origin in the single cell data. To color by single cell data, one often wants to first summarize the expression of a given gene across the cells within each group via a meaningful summary statistic, such as the mean or median. Cells and tissues with the same group label will be colored the same. When coloring by tissues, the color used for each tissue feature will be applied to the corresponding cell groups represented in the embedding plot.
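
The label translation and the group-wise coloring described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the implementation used by spatialHeatmap: the cell labels, the feature labels, the expression values and all object names are hypothetical, and the mean is just one possible summary statistic.

```r
# Toy example in base R: every name and value here is an illustrative assumption.

# Expression of one chosen gene in nine cells with hypothetical group labels.
cells <- data.frame(
  cell  = paste0("cell", 1:9),
  group = rep(c("cortex.cell", "stele.cell", "cap.cell"), each = 3),
  expr  = c(1, 2, 3, 10, 11, 12, 5, 6, 7)
)

# Translation map: single cell group label -> aSVG feature label.
group_to_feature <- c(cortex.cell = "cortex",
                      stele.cell  = "stele",
                      cap.cell    = "root.cap")
cells$feature <- unname(group_to_feature[cells$group])

# Summarize expression per feature (mean here; median is a drop-in alternative).
feature_mean <- tapply(cells$expr, cells$feature, mean)

# Map the summaries onto a heat color gradient with 100 steps.
# Note: this rescaling assumes that the group summaries are not all identical.
heat_cols <- colorRampPalette(c("yellow", "orange", "red"))(100)
rng <- range(feature_mean)
idx <- 1 + round(99 * (feature_mean - rng[1]) / (rng[2] - rng[1]))
feature_col <- setNames(heat_cols[idx], names(feature_mean))

# Cells inherit the color of their feature, so both plots share one color code.
cells$color <- unname(feature_col[cells$feature])

feature_mean
unique(cells[, c("feature", "color")])
```

Replacing feature_col with a named vector of fixed colors, one per feature, gives the alternative tissue-specific coloring, in which the cell colors no longer depend on expression values.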