** DEsingle** is an R package for

** DEsingle** employs the Zero-Inflated Negative Binomial model for differential expression analysis. By estimating the proportion of real and dropout zeros, it not only detects DE genes

For more information, please refer to the manuscript by *Zhun Miao, Ke Deng, Xiaowo Wang and Xuegong Zhang*.

If you use ** DEsingle** in published research, please cite:

Zhun Miao, Ke Deng, Xiaowo Wang, Xuegong Zhang (2018). DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics, bty332. 10.1093/bioinformatics/bty332.

To install ** DEsingle** from

```
if(!require(BiocManager)) install.packages("BiocManager")
BiocManager::install("DEsingle")
```

To install the *developmental version* from **GitHub**:

```
if(!require(devtools)) install.packages("devtools")
devtools::install_github("miaozhun/DEsingle", build_vignettes = TRUE)
```

To load the installed ** DEsingle** in R:

`library(DEsingle)`

** DEsingle** takes two inputs:

`counts`

and `group`

.The input `counts`

is a scRNA-seq **raw read counts matrix** or a ** SingleCellExperiment** object which contains the read counts matrix. The rows of the matrix are genes and columns are cells.

The other input `group`

is a vector of factor which specifies the two groups in the matrix to be compared, corresponding to the columns in `counts`

.

Users can load the test data in ** DEsingle** by

```
library(DEsingle)
data(TestData)
```

The toy data `counts`

in `TestData`

is a scRNA-seq read counts matrix which has 200 genes (rows) and 150 cells (columns).

```
dim(counts)
#> [1] 200 150
counts[1:6, 1:6]
#> E3.46.3383 E3.51.3425 E3.46.3388 E3.51.3423 E3.46.3382 E3.49.3407
#> BTG4 22 0 12 26 0 0
#> GABRB1 0 0 0 0 0 0
#> IL9 0 0 0 0 0 0
#> TAPBPL 2 0 5 1 0 2
#> KANK4 0 0 0 0 0 0
#> CPSF2 12 0 95 0 5 115
```

The object `group`

in `TestData`

is a vector of factor which has two levels and equal length to the column number of `counts`

.

```
length(group)
#> [1] 150
summary(group)
#> 1 2
#> 50 100
```

Here is an example to run ** DEsingle** with read counts matrix input:

```
# Load library and the test data for DEsingle
library(DEsingle)
data(TestData)
# Specifying the two groups to be compared
# The sample number in group 1 and group 2 is 50 and 100 respectively
group <- factor(c(rep(1,50), rep(2,100)))
# Detecting the DE genes
results <- DEsingle(counts = counts, group = group)
# Dividing the DE genes into 3 categories at threshold of FDR < 0.05
results.classified <- DEtype(results = results, threshold = 0.05)
```

The `SingleCellExperiment`

class is a widely used S4 class for storing single-cell genomics data. ** DEsingle** also could take the

`SingleCellExperiment`

data representation as input.Here is an example to run ** DEsingle** with

`SingleCellExperiment`

input:```
# Load library and the test data for DEsingle
library(DEsingle)
library(SingleCellExperiment)
data(TestData)
# Convert the test data in DEsingle to SingleCellExperiment data representation
sce <- SingleCellExperiment(assays = list(counts = as.matrix(counts)))
# Specifying the two groups to be compared
# The sample number in group 1 and group 2 is 50 and 100 respectively
group <- factor(c(rep(1,50), rep(2,100)))
# Detecting the DE genes with SingleCellExperiment input sce
results <- DEsingle(counts = sce, group = group)
# Dividing the DE genes into 3 categories at threshold of FDR < 0.05
results.classified <- DEtype(results = results, threshold = 0.05)
```

`DEtype`

subdivides the DE genes found by `DEsingle`

into 3 types: ** DEs**,

`DEa`

`DEg`

refers to`DEs`

. It is the type of genes that show significant difference in the proportion of real zeros in the two groups, but do not have significant difference in the other cells.*“different expression status”*is for`DEa`

, which refers to genes that are significantly differentially expressed between the groups without significant difference in the proportion of real zeros.*“differential expression abundance”*or`DEg`

refers to genes that have significant difference in both the proportions of real zeros and the expression abundances between the two groups.*“general differential expression”*

The output of `DEtype`

is a matrix containing the DE analysis results, whose rows are genes and columns contain the following items:

`theta_1`

,`theta_2`

,`mu_1`

,`mu_2`

,`size_1`

,`size_2`

,`prob_1`

,`prob_2`

: MLE of the zero-inflated negative binomial distribution’s parameters of group 1 and group 2.`total_mean_1`

,`total_mean_2`

: Mean of read counts of group 1 and group 2.`foldChange`

: total_mean_1/total_mean_2.`norm_total_mean_1`

,`norm_total_mean_2`

: Mean of normalized read counts of group 1 and group 2.`norm_foldChange`

: norm_total_mean_1/norm_total_mean_2.`chi2LR1`

: Chi-square statistic for hypothesis testing of H0.`pvalue_LR2`

: P value of hypothesis testing of H20 (Used to determine the type of a DE gene).`pvalue_LR3`

: P value of hypothesis testing of H30 (Used to determine the type of a DE gene).`FDR_LR2`

: Adjusted P value of pvalue_LR2 using Benjamini & Hochberg’s method (Used to determine the type of a DE gene).`FDR_LR3`

: Adjusted P value of pvalue_LR3 using Benjamini & Hochberg’s method (Used to determine the type of a DE gene).`pvalue`

: P value of hypothesis testing of H0 (Used to determine whether a gene is a DE gene).`pvalue.adj.FDR`

: Adjusted P value of H0’s pvalue using Benjamini & Hochberg’s method (Used to determine whether a gene is a DE gene).`Remark`

: Record of abnormal program information.`Type`

: Types of DE genes.*DEs*represents differential expression status;*DEa*represents differential expression abundance;*DEg*represents general differential expression.`State`

: State of DE genes,*up*represents up-regulated;*down*represents down-regulated.

To extract the significantly differentially expressed genes from the output of `DEtype`

(**note that the same threshold of FDR should be used in this step as in DEtype**):

```
# Extract DE genes at threshold of FDR < 0.05
results.sig <- results.classified[results.classified$pvalue.adj.FDR < 0.05, ]
```

To further extract the three types of DE genes separately:

```
# Extract three types of DE genes separately
results.DEs <- results.sig[results.sig$Type == "DEs", ]
results.DEa <- results.sig[results.sig$Type == "DEa", ]
results.DEg <- results.sig[results.sig$Type == "DEg", ]
```

** DEsingle** integrates parallel computing function with

`BiocParallel`

package. Users could just set `parallel = TRUE`

in function `DEsingle`

to enable parallelization and leave the `BPPARAM`

parameter alone.```
# Load library
library(DEsingle)
# Detecting the DE genes in parallelization
results <- DEsingle(counts = counts, group = group, parallel = TRUE)
```

Advanced users could use a `BiocParallelParam`

object from package `BiocParallel`

to fill in the `BPPARAM`

parameter to specify the parallel back-end to be used and its configuration parameters.

The best choice for Unix and Mac users is to use `MulticoreParam`

to configure a multicore parallel back-end:

```
# Load library
library(DEsingle)
library(BiocParallel)
# Set the parameters and register the back-end to be used
param <- MulticoreParam(workers = 18, progressbar = TRUE)
register(param)
# Detecting the DE genes in parallelization with 18 cores
results <- DEsingle(counts = counts, group = group, parallel = TRUE, BPPARAM = param)
```

For Windows users, use `SnowParam`

to configure a Snow back-end is a good choice:

```
# Load library
library(DEsingle)
library(BiocParallel)
# Set the parameters and register the back-end to be used
param <- SnowParam(workers = 8, type = "SOCK", progressbar = TRUE)
register(param)
# Detecting the DE genes in parallelization with 8 cores
results <- DEsingle(counts = counts, group = group, parallel = TRUE, BPPARAM = param)
```

See the *Reference Manual* of `BiocParallel`

package for more details of the `BiocParallelParam`

class.

Users could use the `heatmap()`

function in `stats`

or `heatmap.2`

function in `gplots`

to plot the heatmap of the DE genes DEsingle found, as we did in Figure S3 of the *manuscript*.

For the interpretation of results when ** DEsingle** applied to real data, please refer to the

Use `browseVignettes("DEsingle")`

to see the vignettes of ** DEsingle** in R after installation.

Use the following code in R to get access to the help documentation for ** DEsingle**:

```
# Documentation for DEsingle
?DEsingle
```

```
# Documentation for DEtype
?DEtype
```

```
# Documentation for TestData
?TestData
?counts
?group
```

You are also welcome to view and post *DEsingle* tagged questions on Bioconductor Support Site of DEsingle or contact the author by email for help.

```
sessionInfo()
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] DEsingle_1.5.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.1 knitr_1.22 magrittr_1.5
#> [4] splines_3.6.0 MASS_7.3-51.4 pscl_1.5.2
#> [7] BiocParallel_1.19.0 lattice_0.20-38 stringr_1.4.0
#> [10] tools_3.6.0 parallel_3.6.0 grid_3.6.0
#> [13] nlme_3.1-139 xfun_0.6 gamlss.data_5.1-3
#> [16] miscTools_0.6-22 maxLik_1.3-4 htmltools_0.3.6
#> [19] yaml_2.2.0 survival_2.44-1.1 digest_0.6.18
#> [22] numDeriv_2016.8-1 Matrix_1.2-17 gamlss_5.1-3
#> [25] gamlss.dist_5.1-3 VGAM_1.1-1 bbmle_1.0.20
#> [28] evaluate_0.13 rmarkdown_1.12 sandwich_2.5-1
#> [31] stringi_1.4.3 compiler_3.6.0 stats4_3.6.0
#> [34] zoo_1.8-5
```