1 Introduction

A fundamental problem in biomedical research is the small number of observations, often due to a lack of available biosamples, prohibitive costs, or ethical constraints. By augmenting a few real observations with artificially generated samples, downstream analyses can become more robust and more reproducible. One possible solution is the use of generative models, statistical models that attempt to capture the entire probability distribution of the observed data. Using the variational autoencoder (VAE), a well-known deep generative model, this package aims to generate samples of gene expression data, especially single-cell RNA-seq data. Furthermore, the VAE can use conditioning to produce specific cell types or subpopulations. The conditional VAE (CVAE) thus allows us to create targeted samples rather than completely random ones.

Autoencoders are unsupervised neural networks that compress data from a high-dimensional space to a preferred lower dimensionality. They reconstruct the input data from the hidden-layer representation computed by the encoder. The basic idea of an autoencoder is to have an output layer with the same dimensionality as the input and to try to reconstruct each dimension exactly by passing the data through the network. It is common, but not necessary, for an autoencoder to have a symmetric architecture between the encoder and decoder. The number of units in the middle layer is typically smaller than that in the input or output layer. After training an autoencoder, it is not necessary to use both the encoder and decoder portions. For example, when using the approach for dimensionality reduction, one can use the encoder portion alone to create the reduced representations of the data; the reconstructions of the decoder might not be required at all. As a result, an autoencoder is capable of performing dimension reduction. The objective function of this neural network is a reconstruction loss. A common choice is the sum of squared differences between the input and the output, which forces the output to be as similar as possible to the input. Alternatively, the cross-entropy can be used as a loss function for quantifying the difference between two probability distributions.
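As a minimal illustration of the two reconstruction losses above (plain base R, independent of the package; the input and reconstruction vectors are made-up values):

```r
# toy input and reconstruction, both scaled to [0, 1]
x     <- c(0.9, 0.1, 0.5, 0.0)
x_hat <- c(0.8, 0.2, 0.4, 0.1)

# sum of squared differences between input and output
sse <- sum((x - x_hat)^2)

# binary cross-entropy, with a small constant to avoid log(0)
eps <- 1e-7
bce <- -sum(x * log(x_hat + eps) + (1 - x) * log(1 - x_hat + eps))

sse  # 0.04
bce
```

Either quantity can serve as the reconstruction loss; the binary cross-entropy assumes the data have been scaled to the unit interval, which is why the package normalizes the expression values before training.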

Another interesting application of the autoencoder is one in which we use only the decoder portion of the network. Variational autoencoders are based on Bayesian inference, in which the compressed representation follows a probability distribution. This constraint differentiates the VAE from a standard autoencoder and allows the VAE to generate new data, which conventional autoencoders cannot do. For example, one might add a term to the loss function to enforce that the hidden variables are drawn from a Gaussian distribution. Then, one can repeatedly draw samples from this Gaussian distribution and use only the decoder portion of the network to generate samples of the original data. In this autoencoder, the bottleneck vector (latent vector) is replaced by two vectors, namely, a mean vector and a standard deviation vector. The overall loss function \(J = L + \lambda R\) of the VAE is expressed as a weighted sum of the reconstruction loss \(L\) and the regularization loss \(R\), where \(\lambda > 0\) is the regularization parameter. The term “variational” comes from the close relationship between the regularization and the variational inference method in statistics. One can use a variety of choices for the reconstruction error; here we use the binary cross-entropy loss between the input and output. The regularization loss is simply the Kullback-Leibler divergence of the conditional distribution of the hidden representation of each point with respect to the standard multivariate Gaussian distribution. Small values of \(\lambda\) favor exact reconstruction, and the approach then behaves like a traditional autoencoder.
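The sampling step and the regularization loss can be sketched in a few lines of base R (the encoder outputs `mu` and `log_var` below are assumed values for illustration, not produced by the package):

```r
# assumed encoder outputs for one data point in a 2-dimensional latent space
mu      <- c(0.5, -0.3)
log_var <- c(-1.0, 0.2)

# reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# so the sampling step stays differentiable in mu and log_var
set.seed(1)
eps <- rnorm(length(mu))
z   <- mu + exp(log_var / 2) * eps

# closed-form KL divergence between N(mu, diag(sigma^2)) and N(0, I),
# i.e. the regularization loss R; it is zero exactly when mu = 0, sigma = 1
kl <- -0.5 * sum(1 + log_var - mu^2 - exp(log_var))
kl
```

The decoder only ever sees `z`, so after training one can discard the encoder, draw `z` from the standard Gaussian, and decode it to generate new samples.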

One can apply conditioning to variational autoencoders to obtain some interesting results. The basic idea of the conditional variational autoencoder is to add an additional conditional input. From an implementation perspective, we can encode the category information as a one-hot representation, indicating to the model which class the input belongs to. One can also use an autoencoder to embed multimodal data, that is, data whose input features are heterogeneous, in a joint latent space. In addition, by separating the samples into different classes, the data points within the same category become more similar, enhancing the modeling capacity and sample quality of the CVAE.
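As a small sketch of the one-hot encoding step in base R (the labels below are made up; in the package this information comes from the cell annotation):

```r
# cell type labels for six cells
cell_type <- factor(c("group1", "group1", "group2",
                      "group2", "group3", "group3"))

# one-hot representation: one column per class, a single 1 per row
onehot <- model.matrix(~ 0 + cell_type)
colnames(onehot) <- levels(cell_type)
onehot

# the conditional input is typically concatenated with the expression
# values before being fed to the encoder and decoder
```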



2 Example

2.1 VAE

Consider artificial data consisting of 1000 genes and three groups of 100 samples, where each group has 100 differentially expressed genes.

if (keras::is_keras_available() & reticulate::py_available()) {
    library(VAExprs)
    
    ### simulate differentially expressed genes
    set.seed(1)
    g <- 3
    n <- 100
    m <- 1000
    mu <- 5
    sigma <- 5
    mat <- matrix(rnorm(n*m*g, mu, sigma), m, n*g)
    rownames(mat) <- paste0("gene", seq_len(m))
    colnames(mat) <- paste0("cell", seq_len(n*g))
    group <- factor(sapply(seq_len(g), function(x) { 
        rep(paste0("group", x), n)
    }))
    names(group) <- colnames(mat)
    mu_upreg <- 6
    sigma_upreg <- 10
    deg <- 100
    for (i in seq_len(g)) {
        deg_rows <- (deg*(i-1) + 1):(deg*i)
        mat[deg_rows, group == paste0("group", i)] <- 
            mat[deg_rows, group == paste0("group", i)] + rnorm(deg, mu_upreg, sigma_upreg)
    }
    # positive expression only
    mat[mat < 0] <- 0
    x_train <- as.matrix(t(mat))
    
    # heatmap
    heatmap(mat, Rowv = NA, Colv = NA, 
            col = colorRampPalette(c('green', 'red'))(100), 
            scale = "none")
}
## Loading required package: keras
## Loading required package: mclust
## Package 'mclust' version 6.0.0
## Type 'citation("mclust")' for citing this R package in publications.

The VAE model can be built with the function “fit_vae”, which accepts either a gene expression matrix or a “SingleCellExperiment” object with cell annotation. The overall loss function of the VAE is expressed as a weighted sum of the reconstruction loss and the regularization loss. The reconstruction loss is the binary cross-entropy loss between the input and output, and the regularization loss is simply the Kullback-Leibler divergence measure. Note that, for simplicity, the same dataset is used here for training and validation.

if (keras::is_keras_available() & reticulate::py_available()) {
    # model parameters
    batch_size <- 32
    original_dim <- 1000
    intermediate_dim <- 512
    epochs <- 100
    
    # VAE
    vae_result <- fit_vae(x_train = x_train, x_val = x_train,
                        encoder_layers = list(layer_input(shape = c(original_dim)),
                                            layer_dense(units = intermediate_dim,
                                                        activation = "relu")),
                        decoder_layers = list(layer_dense(units = intermediate_dim,
                                                        activation = "relu"),
                                            layer_dense(units = original_dim,
                                                        activation = "sigmoid")),
                        epochs = epochs, batch_size = batch_size,
                        use_generator = FALSE,
                        callbacks = keras::callback_early_stopping(
                            monitor = "val_loss",
                            patience = 10,
                            restore_best_weights = TRUE))
}
## normalizing...
## training...
## Train on 300 samples, validate on 300 samples
## Epoch 1/100
## 300/300 - 1s - loss: 669.3172 - val_loss: 624.5613 - 835ms/epoch - 3ms/sample
## Epoch 2/100
## 300/300 - 0s - loss: 616.5250 - val_loss: 608.8014 - 129ms/epoch - 430us/sample
## Epoch 3/100
## 300/300 - 0s - loss: 605.6479 - val_loss: 602.2207 - 90ms/epoch - 299us/sample
## Epoch 4/100
## 300/300 - 0s - loss: 600.9284 - val_loss: 601.2851 - 101ms/epoch - 336us/sample
## Epoch 5/100
## 300/300 - 0s - loss: 600.1435 - val_loss: 599.3640 - 109ms/epoch - 365us/sample
## Epoch 6/100
## 300/300 - 0s - loss: 599.2469 - val_loss: 599.6836 - 97ms/epoch - 325us/sample
## Epoch 7/100
## 300/300 - 0s - loss: 599.0667 - val_loss: 598.2281 - 111ms/epoch - 370us/sample
## Epoch 8/100
## 300/300 - 0s - loss: 597.9134 - val_loss: 597.0694 - 114ms/epoch - 379us/sample
## Epoch 9/100
## 300/300 - 0s - loss: 596.4884 - val_loss: 595.3942 - 110ms/epoch - 368us/sample
## Epoch 10/100
## 300/300 - 0s - loss: 594.2937 - val_loss: 591.9880 - 93ms/epoch - 310us/sample
## Epoch 11/100
## 300/300 - 0s - loss: 591.6170 - val_loss: 590.2017 - 87ms/epoch - 290us/sample
## Epoch 12/100
## 300/300 - 0s - loss: 590.1043 - val_loss: 588.9890 - 86ms/epoch - 285us/sample
## Epoch 13/100
## 300/300 - 0s - loss: 589.0510 - val_loss: 588.3644 - 87ms/epoch - 291us/sample
## Epoch 14/100
## 300/300 - 0s - loss: 589.8769 - val_loss: 588.5687 - 94ms/epoch - 315us/sample
## Epoch 15/100
## 300/300 - 0s - loss: 588.1046 - val_loss: 587.5036 - 103ms/epoch - 344us/sample
## Epoch 16/100
## 300/300 - 0s - loss: 585.7534 - val_loss: 583.7886 - 94ms/epoch - 312us/sample
## Epoch 17/100
## 300/300 - 0s - loss: 583.6723 - val_loss: 582.9545 - 107ms/epoch - 356us/sample
## Epoch 18/100
## 300/300 - 0s - loss: 582.5656 - val_loss: 581.1826 - 110ms/epoch - 365us/sample
## Epoch 19/100
## 300/300 - 0s - loss: 581.1715 - val_loss: 580.9004 - 93ms/epoch - 311us/sample
## Epoch 20/100
## 300/300 - 0s - loss: 580.5290 - val_loss: 580.1440 - 71ms/epoch - 237us/sample
## Epoch 21/100
## 300/300 - 0s - loss: 579.9886 - val_loss: 579.6583 - 90ms/epoch - 301us/sample
## Epoch 22/100
## 300/300 - 0s - loss: 579.8889 - val_loss: 579.7671 - 86ms/epoch - 285us/sample
## Epoch 23/100
## 300/300 - 0s - loss: 580.5718 - val_loss: 579.3868 - 104ms/epoch - 346us/sample
## Epoch 24/100
## 300/300 - 0s - loss: 579.9838 - val_loss: 579.9716 - 108ms/epoch - 360us/sample
## Epoch 25/100
## 300/300 - 0s - loss: 579.9851 - val_loss: 579.2531 - 106ms/epoch - 354us/sample
## Epoch 26/100
## 300/300 - 0s - loss: 579.5523 - val_loss: 578.9599 - 96ms/epoch - 320us/sample
## Epoch 27/100
## 300/300 - 0s - loss: 580.0028 - val_loss: 580.0194 - 90ms/epoch - 301us/sample
## Epoch 28/100
## 300/300 - 0s - loss: 579.8328 - val_loss: 578.8882 - 83ms/epoch - 275us/sample
## Epoch 29/100
## 300/300 - 0s - loss: 579.0862 - val_loss: 578.9362 - 86ms/epoch - 287us/sample
## Epoch 30/100
## 300/300 - 0s - loss: 579.1108 - val_loss: 578.6967 - 98ms/epoch - 327us/sample
## Epoch 31/100
## 300/300 - 0s - loss: 579.1871 - val_loss: 579.0604 - 95ms/epoch - 316us/sample
## Epoch 32/100
## 300/300 - 0s - loss: 579.5091 - val_loss: 578.9243 - 106ms/epoch - 353us/sample
## Epoch 33/100
## 300/300 - 0s - loss: 579.6824 - val_loss: 579.7555 - 108ms/epoch - 360us/sample
## Epoch 34/100
## 300/300 - 0s - loss: 579.8963 - val_loss: 578.9150 - 85ms/epoch - 282us/sample
## Epoch 35/100
## 300/300 - 0s - loss: 579.6059 - val_loss: 578.7627 - 83ms/epoch - 277us/sample
## Epoch 36/100
## 300/300 - 0s - loss: 579.0396 - val_loss: 578.6142 - 85ms/epoch - 284us/sample
## Epoch 37/100
## 300/300 - 0s - loss: 579.1207 - val_loss: 578.6760 - 93ms/epoch - 309us/sample
## Epoch 38/100
## 300/300 - 0s - loss: 579.3928 - val_loss: 578.4127 - 93ms/epoch - 310us/sample
## Epoch 39/100
## 300/300 - 0s - loss: 578.7318 - val_loss: 578.5413 - 89ms/epoch - 296us/sample
## Epoch 40/100
## 300/300 - 0s - loss: 579.1040 - val_loss: 579.1740 - 101ms/epoch - 335us/sample
## Epoch 41/100
## 300/300 - 0s - loss: 579.1521 - val_loss: 578.6594 - 111ms/epoch - 370us/sample
## Epoch 42/100
## 300/300 - 0s - loss: 578.9091 - val_loss: 578.6984 - 104ms/epoch - 346us/sample
## Epoch 43/100
## 300/300 - 0s - loss: 578.7868 - val_loss: 578.3270 - 110ms/epoch - 367us/sample
## Epoch 44/100
## 300/300 - 0s - loss: 579.5669 - val_loss: 578.8687 - 112ms/epoch - 373us/sample
## Epoch 45/100
## 300/300 - 0s - loss: 579.0219 - val_loss: 578.3671 - 92ms/epoch - 306us/sample
## Epoch 46/100
## 300/300 - 0s - loss: 578.8383 - val_loss: 579.0563 - 105ms/epoch - 351us/sample
## Epoch 47/100
## 300/300 - 0s - loss: 578.7296 - val_loss: 578.3843 - 105ms/epoch - 350us/sample
## Epoch 48/100
## 300/300 - 0s - loss: 578.5321 - val_loss: 578.8131 - 105ms/epoch - 350us/sample
## Epoch 49/100
## 300/300 - 0s - loss: 578.6271 - val_loss: 578.3945 - 98ms/epoch - 328us/sample
## Epoch 50/100
## 300/300 - 0s - loss: 578.9157 - val_loss: 578.5486 - 91ms/epoch - 303us/sample
## Epoch 51/100
## 300/300 - 0s - loss: 578.9217 - val_loss: 578.5247 - 98ms/epoch - 328us/sample
## Epoch 52/100
## 300/300 - 0s - loss: 578.4970 - val_loss: 578.4717 - 104ms/epoch - 347us/sample
## Epoch 53/100
## 300/300 - 0s - loss: 578.4125 - val_loss: 578.2284 - 90ms/epoch - 299us/sample
## Epoch 54/100
## 300/300 - 0s - loss: 578.3339 - val_loss: 578.4543 - 98ms/epoch - 326us/sample
## Epoch 55/100
## 300/300 - 0s - loss: 578.6471 - val_loss: 578.1153 - 104ms/epoch - 348us/sample
## Epoch 56/100
## 300/300 - 0s - loss: 579.0787 - val_loss: 578.1224 - 104ms/epoch - 347us/sample
## Epoch 57/100
## 300/300 - 0s - loss: 578.6089 - val_loss: 578.3456 - 100ms/epoch - 334us/sample
## Epoch 58/100
## 300/300 - 0s - loss: 578.4368 - val_loss: 578.2844 - 99ms/epoch - 329us/sample
## Epoch 59/100
## 300/300 - 0s - loss: 578.4072 - val_loss: 577.9244 - 91ms/epoch - 304us/sample
## Epoch 60/100
## 300/300 - 0s - loss: 578.1354 - val_loss: 579.3975 - 102ms/epoch - 342us/sample
## Epoch 61/100
## 300/300 - 0s - loss: 578.7125 - val_loss: 577.9993 - 66ms/epoch - 221us/sample
## Epoch 62/100
## 300/300 - 0s - loss: 578.2282 - val_loss: 578.0052 - 84ms/epoch - 278us/sample
## Epoch 63/100
## 300/300 - 0s - loss: 578.5887 - val_loss: 579.7179 - 84ms/epoch - 279us/sample
## Epoch 64/100
## 300/300 - 0s - loss: 578.4111 - val_loss: 578.0318 - 76ms/epoch - 255us/sample
## Epoch 65/100
## 300/300 - 0s - loss: 578.4815 - val_loss: 577.8353 - 90ms/epoch - 301us/sample
## Epoch 66/100
## 300/300 - 0s - loss: 578.3837 - val_loss: 577.9445 - 104ms/epoch - 346us/sample
## Epoch 67/100
## 300/300 - 0s - loss: 578.3380 - val_loss: 577.8079 - 108ms/epoch - 358us/sample
## Epoch 68/100
## 300/300 - 0s - loss: 578.0230 - val_loss: 577.9786 - 105ms/epoch - 350us/sample
## Epoch 69/100
## 300/300 - 0s - loss: 578.1109 - val_loss: 577.7553 - 104ms/epoch - 346us/sample
## Epoch 70/100
## 300/300 - 0s - loss: 578.0807 - val_loss: 577.6285 - 87ms/epoch - 290us/sample
## Epoch 71/100
## 300/300 - 0s - loss: 577.8974 - val_loss: 577.5832 - 101ms/epoch - 337us/sample
## Epoch 72/100
## 300/300 - 0s - loss: 578.0230 - val_loss: 577.7366 - 97ms/epoch - 322us/sample
## Epoch 73/100
## 300/300 - 0s - loss: 577.8632 - val_loss: 577.5307 - 70ms/epoch - 234us/sample
## Epoch 74/100
## 300/300 - 0s - loss: 577.9143 - val_loss: 577.6702 - 65ms/epoch - 217us/sample
## Epoch 75/100
## 300/300 - 0s - loss: 577.8772 - val_loss: 577.7150 - 82ms/epoch - 273us/sample
## Epoch 76/100
## 300/300 - 0s - loss: 578.0638 - val_loss: 578.3267 - 82ms/epoch - 275us/sample
## Epoch 77/100
## 300/300 - 0s - loss: 578.6269 - val_loss: 577.5640 - 87ms/epoch - 290us/sample
## Epoch 78/100
## 300/300 - 0s - loss: 578.6357 - val_loss: 577.8041 - 94ms/epoch - 315us/sample
## Epoch 79/100
## 300/300 - 0s - loss: 578.1932 - val_loss: 578.0267 - 86ms/epoch - 287us/sample
## Epoch 80/100
## 300/300 - 0s - loss: 577.8921 - val_loss: 577.7778 - 90ms/epoch - 301us/sample
## Epoch 81/100
## 300/300 - 0s - loss: 578.0345 - val_loss: 577.4179 - 94ms/epoch - 313us/sample
## Epoch 82/100
## 300/300 - 0s - loss: 577.7907 - val_loss: 577.5430 - 100ms/epoch - 334us/sample
## Epoch 83/100
## 300/300 - 0s - loss: 577.9973 - val_loss: 577.4262 - 79ms/epoch - 264us/sample
## Epoch 84/100
## 300/300 - 0s - loss: 577.5938 - val_loss: 577.3230 - 77ms/epoch - 255us/sample
## Epoch 85/100
## 300/300 - 0s - loss: 577.5427 - val_loss: 577.2547 - 69ms/epoch - 230us/sample
## Epoch 86/100
## 300/300 - 0s - loss: 577.9067 - val_loss: 577.7788 - 67ms/epoch - 223us/sample
## Epoch 87/100
## 300/300 - 0s - loss: 577.8355 - val_loss: 577.3760 - 85ms/epoch - 284us/sample
## Epoch 88/100
## 300/300 - 0s - loss: 577.8506 - val_loss: 577.6431 - 90ms/epoch - 299us/sample
## Epoch 89/100
## 300/300 - 0s - loss: 577.7658 - val_loss: 577.3819 - 101ms/epoch - 337us/sample
## Epoch 90/100
## 300/300 - 0s - loss: 577.5927 - val_loss: 577.2268 - 96ms/epoch - 320us/sample
## Epoch 91/100
## 300/300 - 0s - loss: 577.7691 - val_loss: 577.3988 - 93ms/epoch - 309us/sample
## Epoch 92/100
## 300/300 - 0s - loss: 577.5346 - val_loss: 577.2153 - 103ms/epoch - 342us/sample
## Epoch 93/100
## 300/300 - 0s - loss: 577.6138 - val_loss: 577.5910 - 93ms/epoch - 311us/sample
## Epoch 94/100
## 300/300 - 0s - loss: 577.6909 - val_loss: 577.2053 - 95ms/epoch - 317us/sample
## Epoch 95/100
## 300/300 - 0s - loss: 577.5589 - val_loss: 577.2641 - 103ms/epoch - 345us/sample
## Epoch 96/100
## 300/300 - 0s - loss: 577.7709 - val_loss: 577.1332 - 104ms/epoch - 346us/sample
## Epoch 97/100
## 300/300 - 0s - loss: 577.4971 - val_loss: 577.1698 - 106ms/epoch - 352us/sample
## Epoch 98/100
## 300/300 - 0s - loss: 577.5361 - val_loss: 577.0743 - 100ms/epoch - 334us/sample
## Epoch 99/100
## 300/300 - 0s - loss: 577.7578 - val_loss: 577.5515 - 101ms/epoch - 336us/sample
## Epoch 100/100
## 300/300 - 0s - loss: 577.8275 - val_loss: 577.4711 - 86ms/epoch - 288us/sample

The function “plot_vae” draws a plot of the model architecture.

if (keras::is_keras_available() & reticulate::py_available()) {
    # model architecture
    plot_vae(vae_result$model)
}

The function “gen_exprs” generates new expression data samples from the trained model.

if (keras::is_keras_available() & reticulate::py_available()) {
    # sample generation
    set.seed(1)
    gen_sample_result <- gen_exprs(vae_result, num_samples = 100)
    
    # heatmap
    heatmap(cbind(t(x_train), t(gen_sample_result$x_gen)),
            col = colorRampPalette(c('green', 'red'))(100),
            Rowv=NA)
}
## generating...
## post-processing...

The function “plot_aug” visualizes the augmented data with reduced-dimension plots.

if (keras::is_keras_available() & reticulate::py_available()) {
    # plot for augmented data
    plot_aug(gen_sample_result, "PCA")
}



2.2 CVAE

The “yan” data set is single-cell RNA sequencing data with 20214 genes and 90 cells from human preimplantation embryos and embryonic stem cells at different passages. The rows of the dataset correspond to genes and the columns to cells. The “SingleCellExperiment” class can be used to store and manipulate single-cell genomics data. It extends the “RangedSummarizedExperiment” class and follows similar conventions. The object “sce” can be created from the data “yan” with the cell type annotation “ann”.

if (keras::is_keras_available() & reticulate::py_available()) {
    library(VAExprs)
    library(SC3)
    library(SingleCellExperiment)
    
    # create a SingleCellExperiment object
    sce <- SingleCellExperiment::SingleCellExperiment(
        assays = list(counts = as.matrix(yan)),
        colData = ann
    )
    
    # define feature names in feature_symbol column
    rowData(sce)$feature_symbol <- rownames(sce)
    # remove features with duplicated names
    sce <- sce[!duplicated(rowData(sce)$feature_symbol), ]
    # remove genes that are not expressed in any samples
    sce <- sce[which(rowMeans(assay(sce)) > 0),]
    dim(assay(sce))
    
    # model parameters
    batch_size <- 32
    original_dim <- 19595
    intermediate_dim <- 256
    epochs <- 100
    
    # model
    cvae_result <- fit_vae(object = sce,
                        encoder_layers = list(layer_input(shape = c(original_dim)),
                                            layer_dense(units = intermediate_dim,
                                                        activation = "relu")),
                        decoder_layers = list(layer_dense(units = intermediate_dim,
                                                        activation = "relu"),
                                            layer_dense(units = original_dim,
                                                        activation = "sigmoid")),
                        epochs = epochs, batch_size = batch_size,
                        use_generator = TRUE,
                        callbacks = keras::callback_early_stopping(
                            monitor = "loss",
                            patience = 20,
                            restore_best_weights = TRUE))
    
    # model architecture
    plot_vae(cvae_result$model)
}
## pre-processing...
## normalizing...
## training...
## Epoch 1/100
## 3/3 - 6s - loss: 13007.0186 - 6s/epoch - 2s/step
## Epoch 2/100
## 3/3 - 1s - loss: 10660.4922 - 1s/epoch - 336ms/step
## Epoch 3/100
## 3/3 - 0s - loss: 35647.4753 - 262ms/epoch - 87ms/step
## Epoch 4/100
## 3/3 - 0s - loss: 9923.6852 - 217ms/epoch - 72ms/step
## Epoch 5/100
## 3/3 - 0s - loss: 10011.8714 - 198ms/epoch - 66ms/step
## Epoch 6/100
## 3/3 - 0s - loss: 10004.9326 - 261ms/epoch - 87ms/step
## Epoch 7/100
## 3/3 - 0s - loss: 10034.5072 - 195ms/epoch - 65ms/step
## Epoch 8/100
## 3/3 - 1s - loss: 9867.3421 - 1s/epoch - 488ms/step
## Epoch 9/100
## 3/3 - 0s - loss: 9617.1432 - 284ms/epoch - 95ms/step
## Epoch 10/100
## 3/3 - 1s - loss: 9386.9251 - 965ms/epoch - 322ms/step
## Epoch 11/100
## 3/3 - 0s - loss: 9039.3135 - 242ms/epoch - 81ms/step
## Epoch 12/100
## 3/3 - 1s - loss: 8836.1605 - 1s/epoch - 483ms/step
## Epoch 13/100
## 3/3 - 1s - loss: 8748.0713 - 1s/epoch - 402ms/step
## Epoch 14/100
## 3/3 - 2s - loss: 8703.3649 - 2s/epoch - 561ms/step
## Epoch 15/100
## 3/3 - 1s - loss: 8609.7067 - 1s/epoch - 474ms/step
## Epoch 16/100
## 3/3 - 1s - loss: 8698.4977 - 1s/epoch - 368ms/step
## Epoch 17/100
## 3/3 - 1s - loss: 8527.1471 - 1s/epoch - 431ms/step
## Epoch 18/100
## 3/3 - 0s - loss: 8363.8242 - 191ms/epoch - 64ms/step
## Epoch 19/100
## 3/3 - 0s - loss: 8452.3236 - 171ms/epoch - 57ms/step
## Epoch 20/100
## 3/3 - 0s - loss: 8353.2894 - 222ms/epoch - 74ms/step
## Epoch 21/100
## 3/3 - 2s - loss: 8152.1781 - 2s/epoch - 574ms/step
## Epoch 22/100
## 3/3 - 1s - loss: 8440.6846 - 1s/epoch - 348ms/step
## Epoch 23/100
## 3/3 - 0s - loss: 7953.9025 - 214ms/epoch - 71ms/step
## Epoch 24/100
## 3/3 - 0s - loss: 8299.4954 - 181ms/epoch - 60ms/step
## Epoch 25/100
## 3/3 - 0s - loss: 8067.3595 - 191ms/epoch - 64ms/step
## Epoch 26/100
## 3/3 - 0s - loss: 8162.6340 - 199ms/epoch - 66ms/step
## Epoch 27/100
## 3/3 - 0s - loss: 8040.2767 - 203ms/epoch - 68ms/step
## Epoch 28/100
## 3/3 - 0s - loss: 7982.8617 - 217ms/epoch - 72ms/step
## Epoch 29/100
## 3/3 - 0s - loss: 8099.2238 - 209ms/epoch - 70ms/step
## Epoch 30/100
## 3/3 - 0s - loss: 8159.3965 - 242ms/epoch - 81ms/step
## Epoch 31/100
## 3/3 - 0s - loss: 8121.4985 - 172ms/epoch - 57ms/step
## Epoch 32/100
## 3/3 - 0s - loss: 7724.7772 - 180ms/epoch - 60ms/step
## Epoch 33/100
## 3/3 - 0s - loss: 7809.7161 - 170ms/epoch - 57ms/step
## Epoch 34/100
## 3/3 - 0s - loss: 8023.5571 - 175ms/epoch - 58ms/step
## Epoch 35/100
## 3/3 - 0s - loss: 7792.6302 - 173ms/epoch - 58ms/step
## Epoch 36/100
## 3/3 - 0s - loss: 7944.9435 - 175ms/epoch - 58ms/step
## Epoch 37/100
## 3/3 - 0s - loss: 7939.4217 - 175ms/epoch - 58ms/step
## Epoch 38/100
## 3/3 - 0s - loss: 7912.8634 - 198ms/epoch - 66ms/step
## Epoch 39/100
## 3/3 - 0s - loss: 7809.0353 - 199ms/epoch - 66ms/step
## Epoch 40/100
## 3/3 - 0s - loss: 7797.7288 - 172ms/epoch - 57ms/step
## Epoch 41/100
## 3/3 - 0s - loss: 7870.4933 - 172ms/epoch - 57ms/step
## Epoch 42/100
## 3/3 - 0s - loss: 7731.4793 - 205ms/epoch - 68ms/step
## Epoch 43/100
## 3/3 - 0s - loss: 7868.3799 - 205ms/epoch - 68ms/step
## Epoch 44/100
## 3/3 - 0s - loss: 7947.5632 - 175ms/epoch - 58ms/step
## Epoch 45/100
## 3/3 - 1s - loss: 7710.1488 - 650ms/epoch - 217ms/step
## Epoch 46/100
## 3/3 - 1s - loss: 7836.7293 - 1s/epoch - 342ms/step
## Epoch 47/100
## 3/3 - 0s - loss: 7808.5776 - 172ms/epoch - 57ms/step
## Epoch 48/100
## 3/3 - 0s - loss: 7867.4924 - 170ms/epoch - 57ms/step
## Epoch 49/100
## 3/3 - 0s - loss: 7904.7886 - 178ms/epoch - 59ms/step
## Epoch 50/100
## 3/3 - 0s - loss: 7791.6152 - 174ms/epoch - 58ms/step
## Epoch 51/100
## 3/3 - 0s - loss: 7676.3708 - 197ms/epoch - 66ms/step
## Epoch 52/100
## 3/3 - 0s - loss: 7741.5789 - 181ms/epoch - 60ms/step
## Epoch 53/100
## 3/3 - 0s - loss: 7793.9129 - 197ms/epoch - 66ms/step
## Epoch 54/100
## 3/3 - 0s - loss: 7780.0326 - 228ms/epoch - 76ms/step
## Epoch 55/100
## 3/3 - 0s - loss: 7870.2383 - 192ms/epoch - 64ms/step
## Epoch 56/100
## 3/3 - 0s - loss: 7795.3146 - 193ms/epoch - 64ms/step
## Epoch 57/100
## 3/3 - 0s - loss: 7826.2811 - 233ms/epoch - 78ms/step
## Epoch 58/100
## 3/3 - 0s - loss: 7869.9715 - 176ms/epoch - 59ms/step
## Epoch 59/100
## 3/3 - 0s - loss: 7911.2243 - 181ms/epoch - 60ms/step
## Epoch 60/100
## 3/3 - 0s - loss: 7782.0685 - 179ms/epoch - 60ms/step
## Epoch 61/100
## 3/3 - 0s - loss: 7701.5955 - 175ms/epoch - 58ms/step
## Epoch 62/100
## 3/3 - 0s - loss: 7878.5324 - 172ms/epoch - 57ms/step
## Epoch 63/100
## 3/3 - 0s - loss: 7875.3861 - 168ms/epoch - 56ms/step
## Epoch 64/100
## 3/3 - 0s - loss: 7769.0283 - 164ms/epoch - 55ms/step
## Epoch 65/100
## 3/3 - 0s - loss: 7674.1496 - 177ms/epoch - 59ms/step
## Epoch 66/100
## 3/3 - 0s - loss: 7744.0920 - 166ms/epoch - 55ms/step
## Epoch 67/100
## 3/3 - 1s - loss: 7728.4561 - 921ms/epoch - 307ms/step
## Epoch 68/100
## 3/3 - 0s - loss: 7684.2539 - 179ms/epoch - 60ms/step
## Epoch 69/100
## 3/3 - 0s - loss: 7934.4969 - 172ms/epoch - 57ms/step
## Epoch 70/100
## 3/3 - 0s - loss: 7744.0055 - 180ms/epoch - 60ms/step
## Epoch 71/100
## 3/3 - 0s - loss: 7679.1649 - 169ms/epoch - 56ms/step
## Epoch 72/100
## 3/3 - 0s - loss: 7820.6733 - 181ms/epoch - 60ms/step
## Epoch 73/100
## 3/3 - 0s - loss: 7729.9513 - 169ms/epoch - 56ms/step
## Epoch 74/100
## 3/3 - 0s - loss: 7630.5934 - 179ms/epoch - 60ms/step
## Epoch 75/100
## 3/3 - 0s - loss: 7813.2274 - 170ms/epoch - 57ms/step
## Epoch 76/100
## 3/3 - 0s - loss: 7678.9375 - 170ms/epoch - 57ms/step
## Epoch 77/100
## 3/3 - 0s - loss: 7802.6995 - 167ms/epoch - 56ms/step
## Epoch 78/100
## 3/3 - 0s - loss: 7484.8968 - 180ms/epoch - 60ms/step
## Epoch 79/100
## 3/3 - 0s - loss: 7882.6691 - 171ms/epoch - 57ms/step
## Epoch 80/100
## 3/3 - 0s - loss: 7691.4989 - 172ms/epoch - 57ms/step
## Epoch 81/100
## 3/3 - 0s - loss: 7498.4032 - 173ms/epoch - 58ms/step
## Epoch 82/100
## 3/3 - 0s - loss: 7540.2259 - 172ms/epoch - 57ms/step
## Epoch 83/100
## 3/3 - 0s - loss: 7788.2656 - 188ms/epoch - 63ms/step
## Epoch 84/100
## 3/3 - 0s - loss: 7634.7166 - 193ms/epoch - 64ms/step
## Epoch 85/100
## 3/3 - 0s - loss: 7796.6541 - 296ms/epoch - 99ms/step
## Epoch 86/100
## 3/3 - 0s - loss: 7784.1838 - 167ms/epoch - 56ms/step
## Epoch 87/100
## 3/3 - 0s - loss: 7847.8693 - 167ms/epoch - 56ms/step
## Epoch 88/100
## 3/3 - 0s - loss: 7791.8617 - 167ms/epoch - 56ms/step
## Epoch 89/100
## 3/3 - 0s - loss: 7494.2054 - 171ms/epoch - 57ms/step
## Epoch 90/100
## 3/3 - 0s - loss: 7657.9860 - 174ms/epoch - 58ms/step
## Epoch 91/100
## 3/3 - 0s - loss: 7733.0472 - 174ms/epoch - 58ms/step
## Epoch 92/100
## 3/3 - 0s - loss: 7676.6842 - 171ms/epoch - 57ms/step
## Epoch 93/100
## 3/3 - 0s - loss: 7639.8420 - 175ms/epoch - 58ms/step
## Epoch 94/100
## 3/3 - 0s - loss: 7553.4854 - 175ms/epoch - 58ms/step
## Epoch 95/100
## 3/3 - 0s - loss: 7750.9767 - 174ms/epoch - 58ms/step
## Epoch 96/100
## 3/3 - 0s - loss: 7666.4191 - 183ms/epoch - 61ms/step
## Epoch 97/100
## 3/3 - 0s - loss: 7851.7357 - 178ms/epoch - 59ms/step
## Epoch 98/100
## 3/3 - 0s - loss: 7631.4425 - 276ms/epoch - 92ms/step
if (keras::is_keras_available() & reticulate::py_available()) {
    # sample generation
    set.seed(1)
    gen_sample_result <- gen_exprs(cvae_result, 100,
                                batch_size, use_generator = TRUE)
    
    # plot for augmented data
    plot_aug(gen_sample_result, "PCA")
}
## generating...
## post-processing...



3 Session information

sessionInfo()
## R Under development (unstable) (2023-10-22 r85388)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] SC3_1.31.0                  GenomicRanges_1.55.0       
## [3] SummarizedExperiment_1.33.0 SingleCellExperiment_1.25.0
## [5] IRanges_2.37.0              S4Vectors_0.41.0           
## [7] VAExprs_1.9.0               mclust_6.0.0               
## [9] keras_2.13.0               
## 
## loaded via a namespace (and not attached):
##   [1] sylly.en_0.1-3            sylly_0.1-6              
##   [3] later_1.3.1               bitops_1.0-7             
##   [5] textstem_0.1.4            tibble_3.2.1             
##   [7] matlab_1.0.4              lifecycle_1.0.3          
##   [9] doParallel_1.0.17         NLP_0.2-1                
##  [11] lattice_0.22-5            SnowballC_0.7.1          
##  [13] magrittr_2.0.3            sass_0.4.7               
##  [15] rmarkdown_2.25            jquerylib_0.1.4          
##  [17] yaml_2.3.7                httpuv_1.6.12            
##  [19] text2vec_0.6.3            doRNG_1.8.6              
##  [21] reticulate_1.34.0         cowplot_1.1.1            
##  [23] RColorBrewer_1.1-3        abind_1.4-5              
##  [25] zlibbioc_1.49.0           rvest_1.0.3              
##  [27] ttgsea_1.11.0             purrr_1.0.2              
##  [29] BiocGenerics_0.49.0       RCurl_1.98-1.12          
##  [31] WriteXLS_6.4.0            float_0.3-1              
##  [33] CatEncoders_0.1.1         GenomeInfoDbData_1.2.11  
##  [35] data.tree_1.0.0           tm_0.7-11                
##  [37] ggrepel_0.9.4             irlba_2.3.5.1            
##  [39] tokenizers_0.3.0          pheatmap_1.0.12          
##  [41] DelayedMatrixStats_1.25.0 codetools_0.2-19         
##  [43] DelayedArray_0.29.0       scuttle_1.13.0           
##  [45] xml2_1.3.5                tidyselect_1.2.0         
##  [47] PRROC_1.3.1               farver_2.1.1             
##  [49] ScaledMatrix_1.11.0       viridis_0.6.4            
##  [51] matrixStats_1.0.0         stats4_4.4.0             
##  [53] base64enc_0.1-3           jsonlite_1.8.7           
##  [55] rsparse_0.5.1             BiocNeighbors_1.21.0     
##  [57] e1071_1.7-13              ellipsis_0.3.2           
##  [59] scater_1.31.0             iterators_1.0.14         
##  [61] foreach_1.5.2             tools_4.4.0              
##  [63] stringdist_0.9.10         Rcpp_1.0.11              
##  [65] glue_1.6.2                gridExtra_2.3            
##  [67] tfruns_1.5.1              SparseArray_1.3.0        
##  [69] xfun_0.40                 MatrixGenerics_1.15.0    
##  [71] GenomeInfoDb_1.39.0       dplyr_1.1.3              
##  [73] withr_2.5.1               fastmap_1.1.1            
##  [75] fansi_1.0.5               digest_0.6.33            
##  [77] rsvd_1.0.5                mime_0.12                
##  [79] R6_2.5.1                  colorspace_2.1-0         
##  [81] koRpus_0.13-8             koRpus.lang.en_0.1-4     
##  [83] RhpcBLASctl_0.23-42       DiagrammeR_1.0.10        
##  [85] utf8_1.2.4                generics_0.1.3           
##  [87] data.table_1.14.8         robustbase_0.99-0        
##  [89] class_7.3-22              stopwords_2.3            
##  [91] httr_1.4.7                htmlwidgets_1.6.2        
##  [93] S4Arrays_1.3.0            whisker_0.4.1            
##  [95] pkgconfig_2.0.3           gtable_0.3.4             
##  [97] tensorflow_2.14.0         XVector_0.43.0           
##  [99] pcaPP_2.0-3               htmltools_0.5.6.1        
## [101] scales_1.2.1              Biobase_2.63.0           
## [103] png_0.1-8                 lgr_0.4.4                
## [105] knitr_1.44                rstudioapi_0.15.0        
## [107] visNetwork_2.1.2          proxy_0.4-27             
## [109] cachem_1.0.8              stringr_1.5.0            
## [111] parallel_4.4.0            vipor_0.4.5              
## [113] mlapi_0.1.1               pillar_1.9.0             
## [115] grid_4.4.0                vctrs_0.6.4              
## [117] promises_1.2.1            slam_0.1-50              
## [119] BiocSingular_1.19.0       beachmat_2.19.0          
## [121] xtable_1.8-4              cluster_2.1.4            
## [123] beeswarm_0.4.0            evaluate_0.22            
## [125] zeallot_0.1.0             mvtnorm_1.2-3            
## [127] cli_3.6.1                 compiler_4.4.0           
## [129] rlang_1.1.1               crayon_1.5.2             
## [131] rngtools_1.5.2            webchem_1.3.0            
## [133] rrcov_1.7-4               labeling_0.4.3           
## [135] ggbeeswarm_0.7.2          stringi_1.7.12           
## [137] viridisLite_0.4.2         BiocParallel_1.37.0      
## [139] DeepPINCS_1.11.0          munsell_0.5.0            
## [141] Matrix_1.6-1.1            sparseMatrixStats_1.15.0 
## [143] ggplot2_3.4.4             shiny_1.7.5.1            
## [145] ROCR_1.0-11               bslib_0.5.1              
## [147] DEoptimR_1.1-3



4 References

Aggarwal, C. C. (2018). Neural Networks and Deep Learning. Springer.

Al-Jabery, K., Obafemi-Ajayi, T., Olbricht, G., & Wunsch, D. (2019). Computational Learning Approaches to Data Analytics in Biomedical Applications. Academic Press.

Cinelli, L. P., Marins, M. A., da Silva, E. A. B., & Netto, S. L. (2021). Variational Methods for Machine Learning with Applications to Deep Networks. Springer.

Das, H., Pradhan, C., & Dey, N. (2020). Deep Learning for Data Analytics: Foundations, Biomedical Applications, and Challenges. Academic Press.

Marouf, M., Machart, P., Bansal, V., Kilian, C., Magruder, D. S., Krebs, C. F., & Bonn, S. (2020). Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks. Nature Communications, 11(1), 1-12.

Pedrycz, W., & Chen, S. M. (Eds.). (2020). Deep Learning: Concepts and Architectures. Springer.

Yan, W. Q. (2020). Computational Methods for Deep Learning: Theoretic, Practice and Applications. Springer.