EpiDISH 2.4.0

The **EpiDISH** package provides tools to infer the fractions of a priori known cell subtypes present in a sample that represents a mixture of these cell types. Inference proceeds via one of three user-selected methods: Robust Partial Correlations (RPC; Teschendorff et al. 2017), CIBERSORT (CBS; Newman et al. 2015), or Constrained Projection (CP; Houseman et al. 2012). In addition, the package provides *CellDMC*, a function that identifies differentially methylated cell types in Epigenome-Wide Association Studies (EWAS) (Zheng, Breeze, et al. 2018).
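To make the RPC idea concrete, here is a minimal sketch in the spirit of robust partial correlations. Each sample's beta values are regressed on the reference centroids with a robust M-estimator (`MASS::rlm`), negative coefficients are set to zero, and the remaining coefficients are rescaled to sum to one. The helper `rpc_sketch()` and the simulated data are hypothetical illustrations, not EpiDISH's internal code; in practice, use `epidish(..., method = "RPC")`.

```r
library(MASS)  # provides rlm() for robust (M-estimator) regression

# Hypothetical helper: a minimal RPC-style estimator, NOT EpiDISH's own code.
# beta.m: CpGs x samples; ref.m: CpGs x cell types, with rows already matched.
rpc_sketch <- function(beta.m, ref.m, maxit = 50) {
  est <- t(apply(beta.m, 2, function(y) {
    fit <- rlm(y ~ ref.m, maxit = maxit)  # robust fit of one sample on the centroids
    coef.v <- coef(fit)[-1]               # drop the intercept
    coef.v[coef.v < 0] <- 0               # clip negative fractions to zero
    coef.v / sum(coef.v)                  # rescale so the fractions sum to one
  }))
  colnames(est) <- colnames(ref.m)
  est
}

# Simulated mixture: each sample is a weighted sum of centroids plus small noise
set.seed(1)
n.cpg <- 200
toy.ref.m <- matrix(runif(n.cpg * 3), ncol = 3,
                    dimnames = list(paste0("cg", 1:n.cpg), c("A", "B", "C")))
true.m <- matrix(c(0.6, 0.3, 0.1,
                   0.2, 0.5, 0.3), nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), colnames(toy.ref.m)))
toy.beta.m <- toy.ref.m %*% t(true.m) +
  matrix(rnorm(n.cpg * 2, sd = 0.01), ncol = 2)

est.m <- rpc_sketch(toy.beta.m, toy.ref.m)
round(est.m, 2)
```

If every coefficient for a sample is negative, this sketch divides by zero and returns `NaN`; a production implementation would need to handle that case.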

The package currently contains four references: two whole-blood subtype references, one generic epithelial reference covering epithelial cells, fibroblasts, and total immune cells, and one reference for breast tissue, as described in (Teschendorff et al. 2017) and (Zheng, Webster, et al. 2018).

To illustrate how to use the package, we include a dummy beta-value matrix, *DummyBeta.m*, which contains 2000 CpGs and 10 samples.

We first load the **EpiDISH** package, *DummyBeta.m*, and the EpiFibIC reference.

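```
library(EpiDISH)
data(centEpiFibIC.m)  # reference centroids: epithelial cells, fibroblasts, immune cells
data(DummyBeta.m)     # dummy beta-value matrix: 2000 CpGs x 10 samples
```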

Notice that *centEpiFibIC.m* has 3 columns, named Epi, Fib and IC. We then call the *epidish* function in *RPC* mode to infer the cell-type fractions.

`out.l <- epidish(beta.m = DummyBeta.m, ref.m = centEpiFibIC.m, method = "RPC")`

Next, we inspect the output list. *estF* is the matrix of estimated cell-type fractions, *ref* is the reference centroid matrix actually used (restricted to the probes shared with the data), and *dataREF* is the subset of the input data matrix over those same probes.

```
out.l$estF
##            Epi        Fib           IC
## S1  0.08836819 0.06109607 0.8505357378
## S2  0.07652115 0.57326994 0.3502089007
## S3  0.15417391 0.75663136 0.0891947251
## S4  0.77082647 0.04171941 0.1874541181
## S5  0.03960599 0.31921224 0.6411817742
## S6  0.12751711 0.79642919 0.0760537000
## S7  0.18144315 0.72889883 0.0896580171
## S8  0.20220823 0.40929344 0.3884983293
## S9  0.19398079 0.80540932 0.0006098973
## S10 0.27976647 0.23671333 0.4835201992
```

`dim(out.l$ref)`

`## [1] 599 3`

`dim(out.l$dataREF)`

`## [1] 599 10`
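A few quick consistency checks on the returned list can catch problems early. The sketch below uses a hypothetical helper, `check_deconv_output()`, which is not part of EpiDISH and assumes only the three components described above. The row-sum check reflects RPC, which rescales fractions to sum to one; under other methods or constraint settings, rows may legitimately sum to slightly less than one, so treat that flag as informative rather than fatal.

```r
# Hypothetical helper (not part of EpiDISH): sanity-check a list shaped like
# the epidish() output, with components estF, ref and dataREF.
check_deconv_output <- function(out.l, tol = 1e-6) {
  estF <- out.l$estF
  list(
    n.probes.used   = nrow(out.l$ref),
    # ref and dataREF should cover the same probes in the same order
    same.probes     = identical(rownames(out.l$ref), rownames(out.l$dataREF)),
    # every estimated fraction should lie in [0, 1]
    fractions.in.01 = all(estF >= -tol & estF <= 1 + tol),
    # RPC rescales each sample's fractions to sum to one
    rows.sum.to.1   = all(abs(rowSums(estF) - 1) < tol)
  )
}

# Toy input with the same structure (made-up numbers, for illustration only)
toy.l <- list(
  estF    = matrix(c(0.2, 0.8,
                     0.5, 0.5), nrow = 2, byrow = TRUE,
                   dimnames = list(c("S1", "S2"), c("Epi", "Fib"))),
  ref     = matrix(c(0.1, 0.9, 0.4, 0.7, 0.2, 0.6), nrow = 3,
                   dimnames = list(paste0("cg", 1:3), c("Epi", "Fib"))),
  dataREF = matrix(c(0.3, 0.5, 0.5, 0.4, 0.6, 0.3), nrow = 3,
                   dimnames = list(paste0("cg", 1:3), c("S1", "S2")))
)
str(check_deconv_output(toy.l))
```

On the real output, the same call would be `check_deconv_output(out.l)`.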

During the quality-control step of DNAm data preprocessing, bad probes may be removed from the full set of 450k or 850k (EPIC) array probes; consequently, not all probes in the reference may be present in the given dataset. By checking *ref* and *dataREF*, we can extract the probes actually used to build the model and infer the cell-type fractions. If the majority of the reference probes cannot be found, the estimated fractions may be compromised.
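The coverage check described above can be scripted. `probe_coverage()` below is a hypothetical helper, not an EpiDISH function; it compares the row names (probe IDs) of a data matrix and a reference matrix, and its default threshold of 0.5 simply encodes "the majority of the probes" from the text.

```r
# Hypothetical helper (not part of EpiDISH): how much of a reference
# survives probe filtering in a given beta-value matrix?
probe_coverage <- function(beta.m, ref.m, min.fraction = 0.5) {
  shared <- intersect(rownames(ref.m), rownames(beta.m))
  fraction <- length(shared) / nrow(ref.m)
  if (fraction < min.fraction) {
    warning(sprintf("only %.1f%% of reference probes found in the data",
                    100 * fraction))
  }
  list(n.ref = nrow(ref.m), n.shared = length(shared),
       fraction = fraction,
       missing = setdiff(rownames(ref.m), rownames(beta.m)))
}

# Toy matrices: 10 reference probes, 7 of which survive QC in the data
toy.ref.m  <- matrix(0.5, nrow = 10, ncol = 2,
                     dimnames = list(paste0("cg", 1:10), c("A", "B")))
toy.beta.m <- matrix(0.5, nrow = 8, ncol = 3,
                     dimnames = list(paste0("cg", c(1:7, 20)), paste0("S", 1:3)))
cov.l <- probe_coverage(toy.beta.m, toy.ref.m)
cov.l$fraction  # 0.7
cov.l$missing   # "cg8" "cg9" "cg10"
```

With the package data, the equivalent call would be `probe_coverage(DummyBeta.m, centEpiFibIC.m)`.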

Next, we show an example of estimating cell-type fractions in whole-blood samples. We use a subset of the beta-value matrix from GSE42861 (see the manual page of *LiuDataSub.m* for a detailed description).

```
data(LiuDataSub.m)
data(centDHSbloodDMC.m)  # whole-blood reference centroids
BloodFrac.m <- epidish(beta.m = LiuDataSub.m, ref.m = centDHSbloodDMC.m, method = "RPC")$estF
```

The inferred fractions can be inspected with boxplots. From the boxplots, we observe that, as expected, neutrophils are the major cell type in whole blood.

`boxplot(BloodFrac.m)`
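For a numeric counterpart to the boxplot, one can rank cell types by their median estimated fraction across samples. `median_fractions()` is a hypothetical helper, and the toy matrix below uses made-up values. Applied to `BloodFrac.m`, the neutrophil column would be expected to come first if the boxplot impression holds.

```r
# Hypothetical helper: rank the columns of a samples x cell-types fraction
# matrix (such as estF) by their median fraction across samples.
median_fractions <- function(frac.m) {
  sort(apply(frac.m, 2, median), decreasing = TRUE)
}

# Made-up fractions for three samples and three cell types
toy.frac.m <- matrix(c(0.6, 0.3, 0.1,
                       0.7, 0.2, 0.1,
                       0.5, 0.2, 0.3), nrow = 3, byrow = TRUE,
                     dimnames = list(paste0("S", 1:3), c("A", "B", "C")))
median_fractions(toy.frac.m)
```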

HEpiDISH is an iterative, hierarchical extension of EpiDISH. It uses two distinct DNAm references: a primary reference for estimating the fractions of several cell types, and a separate, non-overlapping secondary reference for estimating the subtype fractions within one of the cell types in the primary reference.
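The hierarchical step can be illustrated with a small sketch. Assuming the subtype fractions from the secondary reference sum to one within each sample, one natural way to combine the two levels is to scale them by the primary fraction of the refined cell type and substitute them for that column. `combine_hierarchical()` is a hypothetical helper that shows only this bookkeeping, and `SubA`/`SubB` are placeholder subtype names; the package's own `hepidish()` may differ in details, such as how the secondary reference is fitted.

```r
# Hypothetical sketch of the two-level bookkeeping behind HEpiDISH
# (not EpiDISH's own code).
# frac1.m: samples x primary cell types (e.g. Epi, Fib, IC)
# frac2.m: samples x subtypes of one primary cell type, rows summing to 1
# h.idx:   column of frac1.m that frac2.m refines
combine_hierarchical <- function(frac1.m, frac2.m, h.idx) {
  stopifnot(identical(rownames(frac1.m), rownames(frac2.m)))
  # frac1.m[, h.idx] has one value per sample, so R recycles it down each
  # column of frac2.m, multiplying row i by sample i's parent fraction
  scaled2.m <- frac2.m * frac1.m[, h.idx]
  cbind(frac1.m[, -h.idx, drop = FALSE], scaled2.m)
}

# Made-up primary and secondary fractions for two samples
toy.frac1.m <- matrix(c(0.5, 0.2, 0.3,
                        0.1, 0.1, 0.8), nrow = 2, byrow = TRUE,
                      dimnames = list(c("S1", "S2"), c("Epi", "Fib", "IC")))
toy.frac2.m <- matrix(c(0.60, 0.40,
                        0.25, 0.75), nrow = 2, byrow = TRUE,
                      dimnames = list(c("S1", "S2"), c("SubA", "SubB")))
combine_hierarchical(toy.frac1.m, toy.frac2.m, h.idx = 3)
```

Because the subtype fractions are scaled by their parent fraction, each combined row still sums to one.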