# 1 Introduction

Cardinal 2 provides statistical methods for both supervised and unsupervised analysis of mass spectrometry (MS) imaging experiments. Class comparison is also possible when the experimental design and sample size are appropriate.

Before statistical analysis, it is important to identify the statistical goal of the experiment:

• Unsupervised analysis. The data has no class labels or conditions, and we are interested in exploratory analysis to discover regions of interest in the data.

• Supervised analysis. The data has class labels and we want to train a statistical or machine learning model to predict the class labels of new data.

• Class comparison. The data has class labels or conditions, and we want to test whether the abundance of the mass features is different between conditions.

CardinalWorkflows provides real experimental data and a more detailed discussion of the statistical methods than this brief overview covers.

# 2 Exploratory analysis

Suppose we are exploring an unlabeled dataset, and wish to understand the structure of the data.

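```r
# Simulate a 20 x 20 centroided image containing a circle and a square
set.seed(2020)
mse <- simulateImage(preset=2, npeaks=10, dim=c(20,20), sdnoise=0.5,
    peakheight=c(2,4), representation="centroid")

# Label each pixel as circle, square, or background (bg)
design <- makeFactor(circle=mse$circle, square=mse$square,
    bg=!(mse$circle | mse$square))

# Plot the ground-truth design
image(mse, design ~ x * y, key=TRUE)
```

```r
# Ion images of mass features 1, 4, and 7
image(mse, feature=c(1,4,7), layout=c(1,3))
```

## 2.1 Principal components analysis (PCA)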

Principal components analysis is an unsupervised dimension reduction technique. It reduces the data to a small number of “principal components”, each a linear combination of the original mass features. Each component is orthogonal to all previous components and explains as much of the remaining variance in the data as possible.
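The orthogonality and ordering properties can be checked directly with a minimal base R sketch. It uses `stats::prcomp()` on a small hypothetical matrix rather than an `MSImagingExperiment`, so it illustrates PCA in general and not the internals of Cardinal's `PCA()`.

```r
# Hypothetical data: 100 "pixels" x 5 "features"; feature 2 closely tracks feature 1
set.seed(2020)
x <- matrix(rnorm(500), ncol = 5)
x[, 2] <- x[, 1] + rnorm(100, sd = 0.1)

# Base R PCA (centering only, no scaling)
pca_sketch <- prcomp(x, center = TRUE, scale. = FALSE)

# Loadings (the rotation matrix) are orthonormal: t(V) %*% V is the identity
loading_gram <- crossprod(pca_sketch$rotation)

# Scores of different components are uncorrelated
score_cor <- cor(pca_sketch$x)

# Component standard deviations come out in decreasing order
pca_sketch$sdev
```

The strongly correlated pair of features collapses mostly onto the first component. This is the same kind of redundancy PCA exploits among correlated mass features.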

Use `PCA()` to perform PCA on an `MSImagingExperiment`.

```r
pca <- PCA(mse, ncomp=3)

summary(pca)
```

```
## Principal components analysis:
##
##   Component Standard deviation
## 1         1          4.7061494
## 2         2          2.6134145
## 3         3          0.6136734
```

The first two principal components have much larger standard deviations than the third, so they explain most of the variation in the data.
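Variance is the square of the standard deviation, so the summary can be turned into proportions. This sketch uses the three standard deviations printed by `summary(pca)`. It gives each component's share of the variance captured by these three components only, not of the total variance across all mass features.

```r
# Standard deviations copied from the summary(pca) output above
sdev <- c(PC1 = 4.7061494, PC2 = 2.6134145, PC3 = 0.6136734)

# Variance of each component and its share of the three components' variance
variance <- sdev^2
prop <- variance / sum(variance)
prop
sum(prop[1:2])
```

The first two components account for about 98.7% of the variance captured by the three components.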

```r
image(pca, values="scores", superpose=FALSE, layout=c(1,3))
```

The loadings of the components show how each mass feature contributes to each component.

```r
plot(pca, values="loadings", superpose=FALSE, layout=c(1,3), lwd=2)
```

Plotting the principal component scores against each other is a useful way to visualize the separation between data classes.

```r
pca_scores <- DataFrame(resultData(pca, 1, "scores"))

plot(pca_scores, PC1 ~ PC2, groups=design, pch=20)
```

## 2.2 Feature colocalization

Finding other mass features colocalized with a particular image is a common task in analysis of MS imaging experiments.

Use `colocalized()` to find mass features that are colocalized with another image.

```r
coloc <- colocalized(mse, mz=1023)
coloc
```

```
## Colocalized features:
##           mz   circle   square correlation    M1    M2
## 1  1023.7081 2.011661 4.063644   1.0000000 1.000 1.000
## 2  1135.9335 2.434873 3.985370   0.9430259 0.875 0.875
## 3  1200.4653 2.219637 4.166854   0.9292093 0.865 0.865
## 4  1361.2682 0.000000 4.259568   0.6712111 0.710 0.710
## 5  1227.9380 0.000000 4.039750   0.6671688 0.675 0.675
## 6  1453.5096 0.000000 4.187344   0.6657311 0.695 0.695
## 7  1858.8985 0.000000 3.970513   0.6620943 0.705 0.705
## 8   781.2367 1.392247 0.000000   0.3891237 0.650 0.650
## 9   473.9206 2.340799 0.000000   0.3632409 0.600 0.600
## 10  788.8633 1.542205 0.000000   0.3378016 0.605 0.605
```

By default, Pearson correlation is used to rank the colocalized features. Manders’ colocalization coefficients (M1 and M2) are also provided.
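To show what these scores measure, here is a minimal base R sketch on two hypothetical ion images flattened to vectors. It computes the Pearson correlation with `cor()` and Manders' coefficients in their common intensity-based form with a threshold. That form, and the helper `manders()`, are assumptions for illustration and may differ from how `colocalized()` computes M1 and M2.

```r
# Two hypothetical ion images, flattened to vectors of 100 pixel intensities
set.seed(2020)
img1 <- c(rep(0, 50), runif(50, min = 2, max = 4))  # signal in pixels 51-100
img2 <- c(rep(0, 40), runif(60, min = 1, max = 3))  # signal in pixels 41-100

# Pearson correlation between the two images
pearson <- cor(img1, img2)

# Manders' coefficients (assumed intensity-based form with a threshold):
# M1 = share of a's total intensity in pixels where b is above the threshold
# M2 = share of b's total intensity in pixels where a is above the threshold
manders <- function(a, b, threshold = 0) {
  c(M1 = sum(a[b > threshold]) / sum(a),
    M2 = sum(b[a > threshold]) / sum(b))
}

m <- manders(img1, img2)
pearson
m
```

Here all of `img1`'s signal lies inside `img2`'s footprint, so M1 is 1. M2 is below 1 because `img2` also has signal in pixels 41 to 50. Manders' coefficients capture this asymmetry, while a single correlation value does not.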

```r
image(mse, mz=coloc$mz[1:3], layout=c(1,3))
```

# 3 Image segmentation

Segmenting (clustering) a dataset is a useful way to summarize an MS imaging experiment and discover regions of interest within the sample.

## 3.1 Spatial shrunken centroids clustering

Spatially-aware nearest shrunken centroids clustering allows simultaneous image segmentation and feature selection.

A smoothing radius `r`, an initial number of clusters `k`, and one or more sparsity parameters `s` must be provided.

The larger the sparsity parameter `s`, the fewer mass features will contribute to the segmentation.
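In nearest shrunken centroids methods, sparsity typically comes from soft-thresholding each feature's t-statistic toward zero by the shrinkage amount `s`. The base R sketch below applies that standard rule to a few made-up t-statistics. It assumes the usual soft-thresholding form and leaves out the spatial weighting used by `spatialShrunkenCentroids()`. The helper `soft_threshold()` is hypothetical.

```r
# Soft-thresholding: shrink each t-statistic toward zero by s,
# setting it to exactly zero once |t| <= s
soft_threshold <- function(t, s) sign(t) * pmax(abs(t) - s, 0)

# Hypothetical t-statistics for five mass features in one cluster
t_stats <- c(12, 4.5, -2.5, 0.8, -7)

shrunk <- soft_threshold(t_stats, s = 3)
shrunk

# Number of features that still contribute as s grows
n_features <- sapply(c(0, 3, 6, 9), function(s) sum(soft_threshold(t_stats, s) != 0))
n_features
```

With `s = 3` the statistics become 9, 1.5, 0, 0, and -4. As `s` goes from 0 to 9, the number of contributing features drops from 5 to 1. This mirrors the shrinking "Features/Class" column in the summary below.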

Spatial shrunken centroids may return fewer clusters than the initial number `k`. We therefore recommend choosing a `k` larger than the expected number of clusters and letting the method select the final number of clusters automatically.

```r
ssc <- spatialShrunkenCentroids(mse, r=1, k=5, s=c(0,3,6,9))

summary(ssc)
```

```
## Spatially-aware nearest shrunken centroids:
##
##  Segmentation / clustering
##  Method = gaussian
##  Distance = chebyshev
##
##   Radius (r) Init (k) Shrinkage (s) Classes Features/Class
## 1          1        5             0       4          10.00
## 2          1        5             3       3          10.00
## 3          1        5             6       3           8.67
## 4          1        5             9       3           7.33
```

Plotting the predicted cluster probabilities shows a segmentation that closely matches the ground truth image.

```r
image(ssc, model=list(s=9), values="probability")
```

Spatial shrunken centroids calculates t-statistics for each segment and each mass feature. These t-statistics are a measure of the difference between the cluster center and the global mean.

```r
plot(ssc, model=list(s=9), values="statistic", lwd=2)
```

Mass features with t-statistics of zero do not contribute to the segmentation. The sign of the t-statistic indicates whether the mass feature is over- or under-expressed in the given cluster relative to the global mean.
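The sketch below is a minimal, non-spatial version of the classic nearest shrunken centroids t-statistic for one cluster. It takes the difference between the cluster centroid and the global mean and divides it by `m_k * (s_i + s0)`. Here `s_i` is the pooled within-cluster standard deviation of feature `i`, and `m_k = sqrt(1/n_k - 1/n)`. The helper `nsc_tstat()` and its `s0` offset are hypothetical names. Cardinal's spatially-aware method also uses neighboring pixels, so its values will differ.

```r
# Toy data: 6 pixels x 2 features, with known cluster labels (hypothetical)
x <- matrix(c(1, 2, 1, 5, 6, 5,    # feature 1 differs between clusters
              3, 3, 4, 3, 4, 3),   # feature 2 has the same mean in both
            ncol = 2)
cluster <- c(1, 1, 1, 2, 2, 2)

nsc_tstat <- function(x, cluster, k, s0 = 0) {
  n <- nrow(x)
  K <- length(unique(cluster))
  nk <- sum(cluster == k)
  # Pooled within-cluster sum of squares for each feature
  within_ss <- 0
  for (g in unique(cluster)) {
    xg <- x[cluster == g, , drop = FALSE]
    within_ss <- within_ss + colSums(sweep(xg, 2, colMeans(xg))^2)
  }
  s <- sqrt(within_ss / (n - K))
  mk <- sqrt(1 / nk - 1 / n)
  centroid <- colMeans(x[cluster == k, , drop = FALSE])
  (centroid - colMeans(x)) / (mk * (s + s0))
}

d1 <- nsc_tstat(x, cluster, k = 1)
d2 <- nsc_tstat(x, cluster, k = 2)
d1
d2
```

Feature 1 gets a negative statistic in cluster 1 (under-expressed) and a positive one in cluster 2 (over-expressed). Feature 2 gets zero in both, so it would not contribute to the segmentation.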

Use `topFeatures()` to rank mass features by t-statistic.

```r
ssc_top <- topFeatures(ssc, model=list(s=9), class == 1)
ssc_top
```

```
## Top-ranked features:
##           mz   circle   square r k s class   centers  statistic
## 1   473.9206 2.340799 0.000000 1 5 9     1 2.3475172 11.9165555
## 2  1135.9335 2.434873 3.985370 1 5 9     1 4.3294593  4.0936001
## 3   788.8633 1.542205 0.000000 1 5 9     1 0.7122964  0.9371928
## 4   781.2367 1.392247 0.000000 1 5 9     1 0.6067728  0.3683360
## 5  1023.7081 2.011661 4.063644 1 5 9     1 2.8075020  0.0000000
## 6  1200.4653 2.219637 4.166854 1 5 9     1 2.4490113  0.0000000
## 7  1858.8985 0.000000 3.970513 1 5 9     1 1.2245057  0.0000000
## 8  1361.2682 0.000000 4.259568 1 5 9     1 1.3563276 -0.3583295
## 9  1453.5096 0.000000 4.187344 1 5 9     1 1.3431417 -1.3316029
## 10 1227.9380 0.000000 4.039750 1 5 9     1 1.2913549 -2.2072382
```

## 3.2 Spatial Dirichlet Gaussian mixture modeling

Spatially-aware Dirichlet Gaussian mixture modeling (spatial-DGMM) is a method of image segmentation that is applied to each mass feature individually, rather than to the dataset as a whole.

This is useful for summarizing molecular ion images, and for discovering structures that clustering using all mass features together may miss.
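The core idea of modeling a single ion image's intensities as a mixture of Gaussians can be sketched with plain expectation-maximization (EM) in base R. This sketch is deliberately simpler than spatial-DGMM: it ignores pixel neighborhoods and has no Dirichlet component. The helper `em_gmm_1d()` is hypothetical. Each pixel is assigned to the component with the highest posterior probability.

```r
# Hypothetical intensities of one ion image: 150 background and 50 signal pixels
set.seed(2020)
y <- c(rnorm(150, mean = 0, sd = 0.5), rnorm(50, mean = 3, sd = 0.5))

# Plain (non-spatial) EM for a one-dimensional Gaussian mixture with k components
em_gmm_1d <- function(y, k = 2, iter = 100) {
  n <- length(y)
  mu <- quantile(y, probs = seq(0.1, 0.9, length.out = k), names = FALSE)
  sigma <- rep(sd(y), k)
  weight <- rep(1 / k, k)
  for (i in seq_len(iter)) {
    # E-step: posterior probability that each pixel belongs to each component
    dens <- sapply(seq_len(k), function(j) weight[j] * dnorm(y, mu[j], sigma[j]))
    resp <- dens / rowSums(dens)
    # M-step: update mixing weights, means, and standard deviations
    nk <- colSums(resp)
    weight <- nk / n
    mu <- colSums(resp * y) / nk
    sigma <- sqrt(colSums(resp * (y - rep(mu, each = n))^2) / nk)
  }
  list(mean = mu, sd = sigma, weight = weight, class = max.col(resp))
}

fit <- em_gmm_1d(y)
fit$mean
table(fit$class)
```

The fitted means land near 0 and 3. The class labels form a per-feature segmentation of the kind spatial-DGMM produces for each mass feature, though spatial-DGMM also uses spatial information.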

```r
dgmm <- spatialDGMM(mse, r=1, k=5, method="adaptive")

summary(dgmm)
```

```
## Spatially-aware Dirichlet Gaussian mixture models:
##
##  Segmentation on 1 group: run0
##  Distance = chebyshev
##
##    Radius (r) Init (k) Feature Classes/Group
## 1           1        5       1             2
## 2           1        5       2             4
## 3           1        5       3             1
## 4           1        5       4             3
## 5           1        5       5             3
## 6           1        5       6             4
## 7           1        5       7             2
## 8           1        5       8             2
## 9           1        5       9             2
## 10          1        5      10             2
```

A different segmentation is fit for each mass feature.

```r
image(dgmm, model=list(feature=c(1,4,7)), layout=c(1,3))
```

Each image is modeled as a mixture of Gaussian distributions.

```r
plot(dgmm, model=list(feature=c(1,4,7)), layout=c(1,3))
```

Spatial-DGMM segmentations can be especially useful for finding mass features colocalized with a region of interest.

When applied to a `SpatialDGMM` object, `colocalized()` can use match scores, which may have higher specificity than Pearson correlation on the raw ion images.

```r
coloc2 <- colocalized(dgmm, mse$square)
subset(coloc2, select=c(-r, -k, -group))
```

```
## Colocalized features:
##           mz   circle   square feature class    Mscore        M1        M2
## 1  1227.9380 0.000000 4.039750       7     2 0.9811321 1.0000000 0.9811321
## 2  1453.5096 0.000000 4.187344       9     2 0.9503106 0.9807692 0.9683544
## 3  1361.2682 0.000000 4.259568       8     2 0.9430380 0.9551282 0.9867550
## 4  1858.8985 0.000000 3.970513      10     2 0.9050633 0.9166667 0.9862069
## 5  1200.4653 2.219637 4.166854       6     3 0.5429864 0.7692308 0.6486486
## 6  1023.7081 2.011661 4.063644       4     3 0.5233161 0.6474359 0.7318841
## 7   473.9206 2.340799 0.000000       1     1 0.4834437 0.9358974 0.5000000
## 8  1135.9335 2.434873 3.985370       5     3 0.3940887 0.5128205 0.6299213
## 9   788.8633 1.542205 0.000000       3     1 0.3900000 1.0000000 0.3900000
## 10  781.2367 1.392247 0.000000       2     1 0.3807829 0.6858974 0.4612069
```
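Overlap scores between binary masks give one way to read match scores like these. The sketch below takes two hypothetical pixel masks and computes the fraction of each mask covered by the other, plus their Jaccard index (intersection over union). Treating these as analogues of the M1, M2, and Mscore columns is an assumption made for illustration. It is consistent with the printed rows (for example, where M1 is 1 the Mscore equals M2), but check the Cardinal documentation for the exact definitions. The helper `overlap_scores()` is hypothetical.

```r
# Hypothetical binary masks over 10 pixels (TRUE = pixel belongs to the region)
segment <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
roi     <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

overlap_scores <- function(a, b) {
  both <- sum(a & b)
  c(M1 = both / sum(a),           # fraction of mask a that overlaps mask b
    M2 = both / sum(b),           # fraction of mask b that overlaps mask a
    Jaccard = both / sum(a | b))  # intersection over union
}

scores <- overlap_scores(segment, roi)
scores
```

The masks share 3 of their 4 pixels each, giving 0.75 for both coverage fractions. The Jaccard index is 3/5 = 0.6.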

# 4 Classification and cross-validation

Classification of pixels into different known classes (e.g., cancer vs normal) based on the mass spectra is a common application for MS imaging.

```r
# Simulate 3 runs of a 10 x 10 centroided image with two circles
set.seed(2020)
mse2 <- simulateImage(preset=7, npeaks=10, dim=c(10,10), sdnoise=0.5,
    nruns=3, peakdiff=2, representation="centroid")

# Label pixels in circle A as class A and pixels in circle B as class B
class <- makeFactor(A=mse2$circleA, B=mse2$circleB)

image(mse2, class ~ x * y, key=TRUE, layout=c(1,3))
```

```r
image(mse2, feature=1, layout=c(1,3))
```

When performing classification, it is important to use cross-validation so that reported accuracies are not overly optimistic.

We strongly recommend making sure that all spectra from the same experiment run belong to the same fold, to reduce bias in the estimated accuracy due to run effects.
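Folds that keep each run together amount to leave-one-run-out cross-validation. The `.fold=run(mse2)` argument to `crossValidate()` in the next section produces three folds named after the runs. A minimal base R sketch of the same fold assignment, using hypothetical run labels, is:

```r
# Hypothetical run labels for 12 spectra collected in 3 experiment runs
runs <- rep(c("run0", "run1", "run2"), each = 4)

# Leave-one-run-out folds: each fold holds out all spectra from one run
folds <- lapply(unique(runs), function(r) {
  list(test = which(runs == r), train = which(runs != r))
})
names(folds) <- unique(runs)

# Check that no run appears in both the training and test sets of any fold
shared_runs <- sapply(folds, function(f) length(intersect(runs[f$test], runs[f$train])))
shared_runs
```

Because no run contributes spectra to both sides of a split, run-specific effects cannot leak from training into testing.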

## 4.1 Projection to latent structures (PLS)

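Projection to latent structures (PLS), also called partial least squares, is a supervised dimension reduction technique. It can be thought of as similar to PCA, except that its components are chosen to explain the response (class labels or a continuous outcome) rather than only the variance of the mass features, which suits it to classification or regression.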
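The contrast with PCA shows up in the first PLS direction. PCA's first loading maximizes the variance of the projected data. The first PLS weight vector maximizes the covariance between the projected data and the response, which for centered data is proportional to `t(X) %*% y`. The base R sketch below computes this first direction on hypothetical two-class data. It is not Cardinal's `PLS()` implementation and stops after one component.

```r
# Hypothetical data: 40 spectra x 4 features; only feature 1 differs between classes
set.seed(2020)
y <- rep(c(0, 1), each = 20)   # class labels coded as 0/1
x <- matrix(rnorm(160), ncol = 4)
x[, 1] <- x[, 1] + 3 * y

# Center the features and the response
xc <- scale(x, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# First PLS weight vector: the unit direction maximizing covariance with y
w1 <- crossprod(xc, yc)
w1 <- w1 / sqrt(sum(w1^2))

# First PLS scores: projection of the centered data onto w1
t1 <- drop(xc %*% w1)
round(drop(w1), 2)
```

The weight vector is dominated by feature 1, the only feature related to the class. The first-component scores therefore already separate the two classes.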

```r
cv_pls <- crossValidate(mse2, .y=class, .fun=PLS, ncomp=1:5, .fold=run(mse2))

summary(cv_pls)
```

```
## Cross validation:
##
##  Classification on 2 classes: A B
##  Summarized 3 folds: run0 run1 run2
##
##   ncomp  Accuracy Sensitivity Specificity
## 1     1 0.6608485   0.0000000   1.0000000
## 2     2 0.8100534   0.4811594   0.9690476
## 3     3 0.9200094   0.8405797   0.9776557
## 4     4 0.9132067   0.8550725   0.9553114
## 5     5 0.9088528   0.7884058   0.9648352
```

We can see that using 3 PLS components produces the best cross-validated accuracy.
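Accuracy alone can mislead when classes are unbalanced. With one component, sensitivity is 0 and specificity is 1, so the model assigns every pixel to a single class, yet it still reaches about 66% accuracy. As a quick check, this base R sketch recomputes the choice from the printed values using balanced accuracy, the mean of sensitivity and specificity. Using balanced accuracy here is our own illustrative choice.

```r
# Cross-validated results copied from summary(cv_pls) above
cv <- data.frame(
  ncomp       = 1:5,
  accuracy    = c(0.6608485, 0.8100534, 0.9200094, 0.9132067, 0.9088528),
  sensitivity = c(0.0000000, 0.4811594, 0.8405797, 0.8550725, 0.7884058),
  specificity = c(1.0000000, 0.9690476, 0.9776557, 0.9553114, 0.9648352))

# Balanced accuracy weights both classes equally, regardless of class sizes
cv$balanced <- (cv$sensitivity + cv$specificity) / 2

best_by_accuracy <- cv$ncomp[which.max(cv$accuracy)]
best_by_balanced <- cv$ncomp[which.max(cv$balanced)]
c(best_by_accuracy, best_by_balanced)
```

Both criteria select 3 components: balanced accuracy is about 0.909 with 3 components and about 0.905 with 4.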

```r
pls <- PLS(mse2, y=class, ncomp=3)

summary(pls)
```

```
## Projection to latent components:
##
##  Classification on 2 classes: A B
##  Method = pls
##
##   Number of Components Accuracy Sensitivity Specificity
## 1                    3      0.9   0.7647059   0.9775281
```

We can plot the fitted values to visualize the prediction.

```r
image(pls, values="fitted", layout=c(1,3))
```

The PLS regression coefficients can be used to select influential features.

```r
plot(pls, values="coefficients", lwd=2)
```

As with PCA, plotting the PLS scores against each other is a useful way to visualize the separation between classes.

```r
pls_scores <- DataFrame(resultData(pls, 1, "scores"))

plot(pls_scores, C1 ~ C2, groups=class, pch=20)
```