library(MOFA2)
library(tidyverse)
library(pheatmap)
To illustrate the MEFISTO method in MOFA2, we simulate a small example data set with 4 different views and one covariate defining a timeline, using make_example_data. The simulation is based on 4 factors: two vary smoothly along the covariate (with different lengthscales) and two are independent of the covariate.
set.seed(2020)

# set number of samples and time points
N <- 200
time <- seq(0, 1, length.out = N)

# generate example data
dd <- make_example_data(sample_cov = time, n_samples = N,
                        n_factors = 4, n_features = 200, n_views = 4,
                        lscales = c(0.5, 0.2, 0, 0))

# input data
data <- dd$data

# covariate matrix with samples in columns
time <- dd$sample_cov
rownames(time) <- "time"
Let’s have a look at the simulated latent temporal processes, which we want to recover:
df <- data.frame(dd$Z, t(time))
df <- gather(df, key = "factor", value = "value", starts_with("simulated_factor"))
ggplot(df, aes(x = time, y = value)) +
  geom_point() +
  facet_grid(~factor)
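To build some intuition for what a lengthscale means here, the following base R sketch draws toy factors from a Gaussian process along a time grid. It assumes a squared-exponential kernel, which is our choice for illustration and not necessarily what make_example_data does internally. A lengthscale of 0 is treated as a factor that is independent of time. The names se_kernel, draw_factor and roughness are hypothetical helpers, not part of MOFA2.

```r
set.seed(1)
t_grid <- seq(0, 1, length.out = 50)

# squared-exponential covariance between all pairs of time points (assumed kernel)
se_kernel <- function(x, lscale) exp(-outer(x, x, "-")^2 / (2 * lscale^2))

# draw one factor: smooth in time for lscale > 0, pure noise for lscale == 0
draw_factor <- function(x, lscale) {
  n <- length(x)
  if (lscale == 0) return(rnorm(n))
  K <- se_kernel(x, lscale) + diag(1e-6, n)   # small jitter for numerical stability
  as.vector(crossprod(chol(K), rnorm(n)))     # t(R) %*% z has covariance K
}

# one column per lengthscale
factors <- sapply(c(0.5, 0.2, 0), function(l) draw_factor(t_grid, l))
colnames(factors) <- c("lscale_0.5", "lscale_0.2", "lscale_0")

# mean squared difference between neighbouring time points
roughness <- function(z) mean(diff(z)^2)
rough <- apply(factors, 2, roughness)
print(rough)
```

Factors drawn with a positive lengthscale change only slightly between neighbouring time points, whereas the factor with lengthscale 0 jumps around like noise. This is the distinction MEFISTO tries to recover from the data.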
Using the MEFISTO framework is very similar to using MOFA2. The only difference is that, in addition to the omics data, we now specify the time point for each sample. If you are not familiar with the MOFA2 framework, it might be helpful to have a look at the MOFA2 tutorials first.
To create the MOFA object we need to specify the training data and the covariates for pattern detection and inference of smooth factors. Here, sample_cov is a matrix with samples in columns and one row containing the time points. The sample order must match the order of the columns in data. Alternatively, a data frame can be provided containing a sample column with sample names matching the sample names in the data.
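As a minimal sketch of the two covariate formats, the base R snippet below builds a toy time covariate once as a matrix and once as a data frame. The sample names are hypothetical, and the long data-frame layout with the columns sample, covariate and value is our assumption about what set_covariates expects; please check ?set_covariates before relying on it.

```r
set.seed(1)

# hypothetical sample names; in practice take them from the columns of your data
sample_names <- paste0("sample_", 1:5)

# toy view with features in rows and samples in columns
view <- matrix(rnorm(3 * 5), nrow = 3,
               dimnames = list(paste0("feature_", 1:3), sample_names))

# format 1: matrix with one row per covariate and samples in columns
time_mat <- matrix(seq(0, 1, length.out = 5), nrow = 1,
                   dimnames = list("time", sample_names))

# format 2: long data frame (assumed columns: sample, covariate, value)
time_df <- data.frame(sample    = colnames(time_mat),
                      covariate = "time",
                      value     = as.vector(time_mat["time", ]))

# for the matrix format the column order must match the data
stopifnot(identical(colnames(view), colnames(time_mat)))
# for the data frame format the sample names must occur in the data
stopifnot(all(time_df$sample %in% colnames(view)))
```

Checking the sample order explicitly before calling set_covariates is cheap and guards against silently assigning time points to the wrong samples.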
First, we start by creating a standard MOFA model.
sm <- create_mofa(data = dd$data)
## Creating MOFA object from a list of matrices (features as rows, sample as columns)...
Now we can add the temporal covariate that we want to use for training.
sm <- set_covariates(sm, covariates = time)
sm
## Untrained MEFISTO model with the following characteristics:
##  Number of views: 4
##  Views names: view_1 view_2 view_3 view_4
##  Number of features (per view): 200 200 200 200
##  Number of groups: 1
##  Groups names: group1
##  Number of samples (per group): 200
##  Number of covariates per sample: 1
##
We have now successfully created a MOFA object that contains 4 views, 1 group and 1 covariate giving the time point for each sample.
Before training, we can specify various options for the model, the training and the data preprocessing. If no options are specified, the model will use the default options. See also get_default_data_options, get_default_model_options and get_default_training_options to have a look at the defaults and change them where required. For illustration, we only use a small number of iterations.
Importantly, to activate the use of the covariate for a functional decomposition (MEFISTO), we need to specify mefisto_options in addition to the standard MOFA options. For this you can just use the default options (get_default_mefisto_options), unless you want to make use of advanced options such as alignment across groups.
data_opts <- get_default_data_options(sm)

model_opts <- get_default_model_options(sm)
model_opts$num_factors <- 4

train_opts <- get_default_training_options(sm)
train_opts$maxiter <- 100

mefisto_opts <- get_default_mefisto_options(sm)

sm <- prepare_mofa(sm,
                   model_options = model_opts,
                   mefisto_options = mefisto_opts,
                   training_options = train_opts,
                   data_options = data_opts)
Now the MOFA object is ready for training. Using run_mofa we can fit the model, which is saved in the file specified as outfile. If none is specified, the output is saved in a temporary location.
outfile <- file.path(tempdir(), "model.hdf5")
sm <- run_mofa(sm, outfile, use_basilisk = TRUE)
Using plot_variance_explained we can explore which factor is active in which view. plot_factor_cor shows us whether the factors are correlated.
r <- plot_factor_cor(sm)
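To make these two diagnostics more concrete, here is a minimal base R sketch on toy data. It computes the fraction of variance a single factor explains in a view and the absolute correlation between factors. This is an illustration under our own assumptions (known factors and weights, data centred around zero), not the implementation behind plot_variance_explained or plot_factor_cor; the helper r2_factor and all toy objects are hypothetical.

```r
set.seed(1)
n_samples  <- 100
n_features <- 20

# two independent toy factors (samples x factors)
Z <- matrix(rnorm(n_samples * 2), ncol = 2,
            dimnames = list(NULL, c("Factor1", "Factor2")))

# toy weights (features x factors): view_1 uses only Factor1, view_2 only Factor2
W <- list(view_1 = cbind(rnorm(n_features), 0),
          view_2 = cbind(0, rnorm(n_features)))

# toy data per view (samples x features): factors times weights plus noise
Y <- lapply(W, function(w) {
  Z %*% t(w) + matrix(rnorm(n_samples * n_features, sd = 0.5), nrow = n_samples)
})

# fraction of variance in y explained by one factor (y assumed centred around zero)
r2_factor <- function(y, z_k, w_k) 1 - sum((y - outer(z_k, w_k))^2) / sum(y^2)

# rows: factors, columns: views
r2 <- sapply(names(Y), function(v) {
  sapply(1:2, function(k) r2_factor(Y[[v]], Z[, k], W[[v]][, k]))
})
rownames(r2) <- colnames(Z)
print(round(r2, 2))

# absolute correlation between factors; off-diagonal values near 0 indicate
# that the factors capture distinct sources of variation
factor_cor <- abs(cor(Z))
print(round(factor_cor, 2))
```

In this toy setting each factor explains variance only in the view where it has non-zero weights, and the off-diagonal correlation is small. A well-behaved trained model typically shows a similar pattern of weakly correlated factors.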