User’s Guide

Simon Cabello-Aguilar¹, Jacques Colinge¹

¹ Institut de Recherche en Cancérologie de Montpellier, Inserm, Montpellier, France ; Institut régional du Cancer Montpellier, Montpellier, France ; Université de Montpellier, Montpellier, France

Introduction

This guide provides an overview of the SingleCellSignalR package, a comprehensive framework to obtain cellular network maps from scRNA-seq data. SingleCellSignalR comes with a complete pipeline integrating existing methods to cluster individual cell transcriptomes and identify cell subpopulations as well as novel cellular network-specific algorithms. More advanced users can substitute their own logic or alternative tools at various stages of data processing. SingleCellSignalR also maps cell subpopulation internal network linked to genes of interest through the integration of regulated KEGG and Reactome pathways together with ligands and receptors involved in inferred cell-cell interactions. The cellular networks can be exported in text files and graphML objects to be further explored with Cytoscape (www.cytoscape.org), yEd (www.yworks.com), or similar software tools.

Quick Start

Independent of the chosen scRNA-seq platform, deep or shallower, data comes as a table of read or unique molecule identifier (UMI) counts, one column per individual cell and one row per gene. Initial processing is required to prepare such data for subsequent analysis and we decided to propose a generic solution for the sake of convenience, though users can easily substitute their own computations. Gene names (HUGO symbols) are provided in the first column of the table.

Each analysis is organized around a working directory (or project folder):

The file containing the read counts should be placed in the working directory.

> file <- "example_data.txt"

Data processing can then start:

> data <- data_prepare(file = file)
log-Normalization
14223 genes
400 cells
Zero rate = 89.6%

The data_prepare() function eliminates non expressed genes before performing read count normalization.

Normalized data are submitted to a clustering algorithm to identify cell subpopulations:

clust <- clustering(data = data, n.cluster = 4, n = 10, method = "simlr",write = FALSE,pdf=FALSE)

#> 4 clusters detected
#> cluster 1 -> 225 cells
#> cluster 2 -> 66 cells
#> cluster 3 -> 10 cells
#> cluster 4 -> 99 cells

We set the method argument to simlr, which caused the SIMLR() function of the SIMLR package [1] to be used. The SIMLR_Estimate_Number_of_Clusters() function determined the number of clusters, between 2 and n (n=10 above).

Next, differentially expressed genes in one cluster compared to the others are identified using the cluster_analysis() function, which relies on edgeR. A result table is automatically created in the cluster-analysis folder:

clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = clust$cluster, write = FALSE)

Once the preliminary steps illustrated above are completed, SingleCellSignalR can be used to generate cellular interaction lists using the cell_signaling() function:

signal <- cell_signaling(data = data, genes = rownames(data), cluster = clust$cluster, write = FALSE)
#> No such file as table_dge_cluster 1.txt in the cluster-analysis folder
#> No such file as table_dge_cluster 2.txt in the cluster-analysis folder
#> No such file as table_dge_cluster 3.txt in the cluster-analysis folder
#> No such file as table_dge_cluster 4.txt in the cluster-analysis folder
#> Paracrine signaling: 
#> Checking for signaling between cell types
#> 6 interactions from cluster 1 to cluster 2
#> 6 interactions from cluster 1 to cluster 3
#> 2 interactions from cluster 1 to cluster 4
#> 2 interactions from cluster 2 to cluster 1
#> 24 interactions from cluster 2 to cluster 3
#> 18 interactions from cluster 2 to cluster 4
#> 25 interactions from cluster 3 to cluster 1
#> 49 interactions from cluster 3 to cluster 2
#> 36 interactions from cluster 3 to cluster 4
#> 2 interactions from cluster 4 to cluster 1
#> 21 interactions from cluster 4 to cluster 2
#> 17 interactions from cluster 4 to cluster 3

An intercellular network can also be generated to map the overall ligand/receptor interactions invoking the inter_network() function:

inter.net <- inter_network(data = data, signal = signal, genes = genes, cluster = clust$cluster, write = FALSE)
#> Doing cluster 1 and cluster 2 ... OK
#> Doing cluster 1 and cluster 3 ... OK
#> Doing cluster 1 and cluster 4 ... OK
#> Doing cluster 2 and cluster 1 ... OK
#> Doing cluster 2 and cluster 3 ... OK
#> Doing cluster 2 and cluster 4 ... OK
#> Doing cluster 3 and cluster 1 ... OK
#> Doing cluster 3 and cluster 2 ... OK
#> Doing cluster 3 and cluster 4 ... OK
#> Doing cluster 4 and cluster 1 ... OK
#> Doing cluster 4 and cluster 2 ... OK
#> Doing cluster 4 and cluster 3 ... OK

At this point the intercellular network have been generated and exported in text and graphML formats in the networks folder.
A summary of the interactions between cell clusters can be output in the form of a chord diagram by the visualize_interactions() function:

visualize_interactions(signal = signal)

This function will create a plot in the R plot window.

The details of the interactions between two clusters, for example cluster 1 and 2, can also be shown in the plot window with the visualize_interactions() function. Note that in the example below we ask for the display of two pairs of cell clusters, pair 1 that contains interactions from cluster 1 to 2, and pair 4 from cluster 2 to 1. (names(signal) returns the cell cluster names in each pair, see function visualize_interactions() details.)

visualize_interactions(signal = signal,show.in=c(1,4))

And these plots can be saved into pdf files in the images folder using the write.in argument of the visualize_interactions() function.

> visualize_interactions(signal = signal,write.in=c(1,4))

red
red
red

Examples of use

SingleCellSignalR package functions have many arguments parameters that can be changed by the user to fit her needs (see Reference Manual for more details). Furthermore, several handy functions that were not illustrated above are provided to generate additional plots or reports.

Exploiting the cell_classifier clustering

After running the example in the Quick Start section, the user can define cell clusters after the output of the cell_classifier(). The demo data set is comprised of a subset of the 10x PBMC dataset [3], i.e. immune cells. The t-SNE map calculated with the clustering() function will also be used. For this example we will set the plot.details argument to TRUE to monitor the choice of the threshold of gene signature scores.

class = cell_classifier(data=data, genes=rownames(data), markers = markers(c("immune")), tsne=clust$`t-SNE`,plot.details=TRUE,write = FALSE)

Let us use the cell clustering obtained with the cell_classifier() function. Although “undefined” cells may be interesting in some cases, here they form a heterogeneous cluster because they represent cells that seem to be in a transition between two states (“T-cells” and “Cytotoxic cells”, or “Neutrophils” and “Macrophages”, see heatmap above). We discard these cells.

# Define the cluster vector and the cluster names 
cluster <- class$cluster
c.names <- class$c.names

# Remove undefined cells 
data <- data[,cluster!=(max(cluster))]
tsne <- clust$`t-SNE`[cluster!=(max(cluster)),]
c.names <- c.names[-max(cluster)]
cluster <- cluster[cluster!=(max(cluster))]

Then the analysis can be carried on.

clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = cluster, c.names = c.names, write = FALSE)
#> edgeR differential gene expression (dge) processing:
#> Looking for differentially expressed genes in T-cells
#> Looking for differentially expressed genes in B-cells
#> Looking for differentially expressed genes in Macrophages
#> Looking for differentially expressed genes in Cytotoxic cells
#> Looking for differentially expressed genes in Neutrophils

Once the cluster analysis is done, the cell_signaling(), inter_network() functions can be used.

signal <- cell_signaling(data = data, genes = genes, cluster = cluster, c.names = c.names, write = FALSE)
#> No such file as table_dge_T-cells.txt in the cluster-analysis folder
#> No such file as table_dge_B-cells.txt in the cluster-analysis folder
#> No such file as table_dge_Macrophages.txt in the cluster-analysis folder
#> No such file as table_dge_Cytotoxic cells.txt in the cluster-analysis folder
#> No such file as table_dge_Neutrophils.txt in the cluster-analysis folder
#> Paracrine signaling: 
#> Checking for signaling between cell types
#> 10 interactions from T-cells to B-cells
#> 20 interactions from T-cells to Macrophages
#> 20 interactions from T-cells to Cytotoxic cells
#> 24 interactions from T-cells to Neutrophils
#> 5 interactions from B-cells to T-cells
#> 26 interactions from B-cells to Macrophages
#> 19 interactions from B-cells to Cytotoxic cells
#> 29 interactions from B-cells to Neutrophils
#> 2 interactions from Macrophages to T-cells
#> 11 interactions from Macrophages to B-cells
#> 20 interactions from Macrophages to Cytotoxic cells
#> 1 interactions from Macrophages to Neutrophils
#> 2 interactions from Cytotoxic cells to T-cells
#> 4 interactions from Cytotoxic cells to B-cells
#> 21 interactions from Cytotoxic cells to Macrophages
#> 21 interactions from Cytotoxic cells to Neutrophils
#> 7 interactions from Neutrophils to T-cells
#> 12 interactions from Neutrophils to B-cells
#> 7 interactions from Neutrophils to Macrophages
#> 29 interactions from Neutrophils to Cytotoxic cells

inter.net <- inter_network(data = data, signal = signal, genes = genes, cluster = cluster, write = FALSE)
#> Doing T-cells and B-cells ... OK
#> Doing T-cells and Macrophages ... OK
#> Doing T-cells and Cytotoxic cells ... OK
#> Doing T-cells and Neutrophils ... OK
#> Doing B-cells and T-cells ... OK
#> Doing B-cells and Macrophages ... OK
#> Doing B-cells and Cytotoxic cells ... OK
#> Doing B-cells and Neutrophils ... OK
#> Doing Macrophages and T-cells ... OK
#> Doing Macrophages and B-cells ... OK
#> Doing Macrophages and Cytotoxic cells ... OK
#> Doing Macrophages and Neutrophils ... OK
#> Doing Cytotoxic cells and T-cells ... OK
#> Doing Cytotoxic cells and B-cells ... OK
#> Doing Cytotoxic cells and Macrophages ... OK
#> Doing Cytotoxic cells and Neutrophils ... OK
#> Doing Neutrophils and T-cells ... OK
#> Doing Neutrophils and B-cells ... OK
#> Doing Neutrophils and Macrophages ... OK
#> Doing Neutrophils and Cytotoxic cells ... OK

If we take a look at signal[[6]] (or signal[["B-cells-Macrophages"]])

signal[[6]]
#>      B-cells Macrophages interaction type   LRscore
#> 2589   RPS19       C5AR1        paracrine 0.9154102
#> 1587 HSP90B1       ASGR1        paracrine 0.8673637
#> 1589 HSP90B1        LRP1        paracrine 0.8647858
#> 2599  RPS27A        TLR4        paracrine 0.8373524
#> 1592 HSP90B1        TLR4        paracrine 0.8035628
#> 452     CALR        LRP1        paracrine 0.7773343
#> 1428   GNAI2       C5AR1        paracrine 0.7567524
#> 2969   UBA52        TLR4        paracrine 0.7487672
#> 2989     UBC        TLR4        paracrine 0.7366246
#> 2510    PTMA       VIPR1        paracrine 0.7313738
#> 3230    CD99       PILRA        paracrine 0.7095773
#> 2049     LTB        LTBR        paracrine 0.7049480
#> 228      B2M         HFE        paracrine 0.7007288
#> 1441   GNAI2        FPR1        paracrine 0.6997901
#> 2034  LRPAP1        LRP1        paracrine 0.6561746
#> 1568   HMGB1        TLR4        paracrine 0.6542829
#> 2979     UBB        TLR4        paracrine 0.6505954
#> 224      B2M        CD1B        paracrine 0.6234452
#> 235      B2M       KLRD1        paracrine 0.6234452
#> 1473    GNAS       VIPR1        paracrine 0.6184186
#> 2457    PSAP        LRP1        paracrine 0.6135906
#> 1471    GNAS       PTGIR        paracrine 0.6071115
#> 2954   UBA52       ACVR1        paracrine 0.6027855
#> 2964   UBA52      NOTCH1        paracrine 0.6027855
#> 1567   HMGB1        THBD        paracrine 0.5851681
#> 626     CD55      ADGRE2        paracrine 0.5301838

We can be interested in genes participating in pathways with a receptor of interest inside a cluster of interest. Let us say ASGR1 in “Macrophages”.

intra = intra_network(goi = "ASGR1",data = data,genes = rownames(data),cluster = cluster, coi = "Macrophages", c.names = c.names, signal = signal,write=FALSE)

Now, let us take an overview of the signaling between the cell types.

visualize_interactions(signal)

Let us get deeper and look at the signaling between “T-cells” and “B-cells” for example.

visualize_interactions(signal, show.in=c(1,6))

The following command will save these plots in the images folder.

> visualize_interactions(signal, write.in=c(1,6))

Marker analysis on a cancer dataset

For this example we use the scRNAseq dataset from Tirosh et al. [4]. We use only the data from patient 80.

> file <- "patient_80.txt"
> data <- data_prepare(file = file)
log-Normalization
19452 genes
480 cells
Zero rate = 75.5%

Remark: One can notice that the zero rate is lower than in the previous example which reflects the fact that the sequencing is deeper.
We know that this dataset is composed of melanoma cells and their microenvironment, we hence define our markers table using the markers() function.

> my.markers <- markers(category = c("immune", "tme", "melanoma"))
> head(my.markers)
  T-cells B-cells Macrophages Cytotoxic cells      DC Mast cells Neutrophils NK cells  Treg Endothelial cells   CAFs melanoma
1     CD2    CD19       CD163            PRF1   CCL13      TPSB2        FPR1     XCL1 FOXP3            PECAM1    FAP      MIA
2    CD3D   CD79A        CD14            GZMA   CD209     TPSAB1     SIGLEC5     XCL2                     VWF   THY1      TYR
3    CD3E   CD79B       CSF1R            GZMB HSD11B1       CPA3       CSF3R     NCR1                    CDH5    DCN  SLC45A2
4    CD3G     BLK        C1QC            NKG7              MS4A2        FCAR  KIR2DL3                   CLDN5 COL1A1    CDH19
5    CD8A   MS4A1       VSIG4            GZMH                HDC      FCGR3B  KIR3DL1                   PLVAP COL1A2     PMEL
6   SIRPG   BANK1        C1QA           KLRK1                        CEACAM3  KIR3DL2                   ECSCR COL6A1  SLC24A5

Let us perform the clustering. For this example, we set the method argument to “kmeans” and the n argument to 12.

> clust <- clustering(data = data,n = 12, method = "kmeans")
Estimating the number of clusters
Estimated number of clusters = 6
6 clusters detected
cluster 1 -> 157 cells
cluster 2 -> 15 cells
cluster 3 -> 137 cells
cluster 4 -> 6 cells
cluster 5 -> 114 cells
cluster 6 -> 51 cells

Now we take advantage of the markers argument of the cluster_analysis() function using my.markers obtained above with the markers() function.

> clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = clust$cluster, markers = my.markers)
edgeR differential gene expression (dge) processing:
Looking for differentially expressed genes in cluster 1
Looking for differentially expressed genes in cluster 2
Looking for differentially expressed genes in cluster 3
Looking for differentially expressed genes in cluster 4
Looking for differentially expressed genes in cluster 5
Looking for differentially expressed genes in cluster 6

We can see that the clusters 2 and 5 are well defined, they are respectively cancer associated fibroblasts (CAFs) and melanoma cells. The cluster 6 is also clearly composed of endothelial cells. Clusters 1 and 2 are immune cells but the clustering did not succeed in sorting them correctly and cluster 4 counts only 6 cells. Those do not seem to be homogeneous and we decide to remove them.

> data <- data[,clust$cluster!=4]
> cluster <- clust$cluster[clust$cluster!=4]
> cluster[cluster>4] <- cluster[cluster>4] - 1

Then we can name our clusters manually before pursuing the analysis.

> c.names <- c("Immune 1", "CAFs", "Immune 2", "melanoma", "endothelial")
> signal <- cell_signaling(data = data, genes = rownames(data), cluster = cluster, c.names = c.names)
Paracrine signaling: 
Checking for signaling between cell types
78 interactions from Immune 1 to CAFs
30 interactions from Immune 1 to melanoma
11 interactions from Immune 1 to endothelial
67 interactions from CAFs to Immune 1
85 interactions from CAFs to Immune 2
86 interactions from CAFs to melanoma
78 interactions from CAFs to endothelial
84 interactions from Immune 2 to CAFs
55 interactions from Immune 2 to melanoma
44 interactions from Immune 2 to endothelial
12 interactions from melanoma to Immune 1
33 interactions from melanoma to CAFs
19 interactions from melanoma to Immune 2
12 interactions from melanoma to endothelial
53 interactions from endothelial to Immune 1
33 interactions from endothelial to CAFs
69 interactions from endothelial to Immune 2
28 interactions from endothelial to melanoma

Remark: the names of the dge tables in the cluster_analysis folder must be changed according to the cluster names (c.names).
And now visualize!

> visualize_interactions(signal, show.in = c(12,18))

Remark: We observe that in the chord diagrams above, the “specific” interactions were highlighted with a thick black line.
Let us look at one of these specific interactions using the expression_plot_2() function.

> expression_plot_2(data,"NID1","COL13A1",clust$`t-SNE`)

red
red
red

Thank you for reading this guide and for using SingleCellSignalR.

References

Wang B, Zhu J, Pierson E, Ramazzotti D, Batzoglou S. Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning. Nat Methods. 2017;14:414-6.
McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012;40:4288-97.
8k PBMCs from a Healthy Donor [Internet]. 2017. Available from: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc8k
Tirosh I, Izar B, Prakadan SM, Wadsworth MH, Treacy D, Trombetta JJ, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016;352:189-96.

SingleCellSignalR