Contents

1 Introduction

The method in this package is an extension of the classic Nested Effects Models provided in the package . Nested Effects Models is a pathway reconstruction method, which takes into account effects of downstream genes. Those effects are observed for every knock-out of an upstream pathway gene, and the nested structure of observed effects can then be used to reconstruct the pathway structure. However, classic Nested Effects Models do not account for double knock-outs. In this package , one additional layer of complexity is added. For every two genes, acting on one gene together, the relationship is evaluated and added to the model as a logical gate. Genetic relationships are represented by the logics OR (no relationship), AND (functional overlap), NOT (masking or inhibiting) and XOR (mutual prevention from acting on gene C). Please see the references for a more detailed description of NEMs and epiNEMs.

Simulation and application results are imported from pre-calculated data sets to shorten the runtime of this vignette.

2 Loading epiNEM

library(epiNEM)

3 Quick start

The data should be in the form of binary effects stemming from knock-out data including single and double perturbations with effect reporters, e.g. genes, as rows and perturbations as columns. A one in row i and column j denotes an effect of reporter i for perturbation j. These effects are usually derived from comparing control/wild-type experiments with respective perturbation experiments (e.g. differential expression with edgeR, Robinson et al., 2010). The binary data is the main input for the epiNEM function, which contains the inference algorithm to estimate the underlying network structure of the data. For up to four signalling genes (=different perturbation targets) exhaustive search is available. For five or more a greedy search is implemented as an alternative.

data <- matrix(sample(c(0,1), 100*4, replace = TRUE), 100, 4)
colnames(data) <- c("A", "A.B", "B", "C")
rownames(data) <- paste("E", 1:100, sep = "_")
print(head(data))
##     A A.B B C
## E_1 1   0 0 1
## E_2 0   1 0 1
## E_3 0   0 1 0
## E_4 1   0 1 0
## E_5 1   0 0 1
## E_6 0   0 1 0
res <- epiNEM(data, method = "exhaustive")
plot(res)
Toy example result for a double mutant (A,B) and one modulator
(C).

Figure 1: Toy example result for a double mutant (A,B) and one modulator
(C).

The plot shows the inferred network with the signaling genes (bright red), the inferred logic (green) and attachment of the number of effect reporters (grey).

Alternatively the input can be a larger matrix with single and double knock-outs. EpiNEM can perform a systematic analysis to identify most likely modulators for a signaling gene pair of a double knock-out.

data <- matrix(sample(c(0,1), 100*9, replace = TRUE), 100, 9)
colnames(data) <- c("A.B", "A.C", "B.C", "A", "B", "C", "D", "E", "G")
rownames(data) <- paste("E", 1:100, sep = "_")
res <- epiScreen(data)
## [1] "A.B"
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
## [1] "E"
## [1] "G"
## [1] "A.C"
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
## [1] "E"
## [1] "G"
## [1] "B.C"
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
## [1] "E"
## [1] "G"
plot(res)
Distribution of inferred logics for each double knock-out.

Figure 2: Distribution of inferred logics for each double knock-out

plot(res, global = FALSE, ind = 1)
Ranked modulators for one double knock-out.

Figure 3: Ranked modulators for one double knock-out

If the “global” parameter is set to FALSE, detailed results are plotted for each or a specific (“ind” parameter) pair of a double knock-out.

The results (logics) of the knock-out screens have been annotated according to the following legend of effects (black), where the x-axis denotes the gene and the y-axis denotes the type of knock-out. E.g. if C is regulated via the OR gate, it is effected by by both single and the double knock-out.

epiAnno()
Perfect binary effects matrix for each logic.

Figure 4: Perfect binary effects matrix for each logic

4 Simulations

We compare epiNEM to original NEMs (Markowetz et al., 2005), Boolean Nested Effects Models (B-NEM, link, Pirkl et al., 2016), Aracne (Margolin et al., 2006) and the PC algorithm (Kalisch et al., 2007) for 100 four node networks and respective data experiencing different levels of noise. The evaluation is done by accounting for the accuracy of discovered edges and in the case of B-NEM and epiNEM also the truth tables and logical gates.

We simulate code based on a ground truth and infer an optimal network with all aforementioned methods solely based on the data. The B-NEM package is necessary to run the full simulation: link. If the comparison to B-NEM is omitted, the B-NEM package is not necessary for epiNEM to function.

B-NEM is a method for general Boolean network inference based on multivariate read-out from combinatorial perturbation experiments. EpiNEM is specifically designed for knock-out screens including double and single knock-outs. Thus epiNEM is faster especially for large scale screens and more accurate, because it is designed for triples.

We boxplot running time, accuracy for the inferred edges, logical gates and expected data (truth table) for all five respectively two methods.

data(sim)
plot(sim)
Simulation results.

Figure 5: Simulation results

The plot shows that as expected B-NEM is the slowest followed by epiNEM, while NEM, ARACNE and the PC algorithm are much faster. However, the last three can not derive any logics in the network structure. epiNEM and B-NEM show almost equally high accuracy of inferred edges and truth tables (expected data). However, epiNEM reaches a much higher accuracy for inferred logic gates, which is due to the much higher degree of equivalent network structures in the case of B-NEM.

5 Yeast knock-out screens

In this section we analyse previously published yeast knock-out screens. The screens consist of gene expression data derived from double and single knock-out mutants. We use epiNEM on each double mutant combined with each single mutant.

5.1 Identification of significant modulators for pairs of double knock-outs

The first knock-out screen is from van Wageningen et al., 2010 and was created to investigate relationships between phosphorylation based pathways.

We applied epiNEM to all triples and infer logics and calculated the log-likelihood.

The following plot shows the global distribution of the inferred logical gates for all pairs. The single modulators are on the y-axis and the double knock-out pairs are on the x-axis.

data(wagscreen)

doubles <- wagscreen$doubles

dataWag <- wagscreen$dataWag

## clean up the results:

if (length(grep("fus3|ptp2.ptc2", wagscreen$doubles)) > 0) {
    wagscreen$doubles <- wagscreen$doubles[-grep("fus3|ptp2.ptc2",
                                                 wagscreen$doubles)]
    
    wagscreen$dataWag <- wagscreen$dataWag[, -grep("fus3|ptp2.ptc2",
                                                   colnames(wagscreen$dataWag))]
    
    wagscreen$ll <- wagscreen$ll[, -grep("fus3|ptp2.ptc2",
                                         colnames(wagscreen$ll))]
    
    wagscreen$logic <- wagscreen$logic[, -grep("fus3|ptp2.ptc2",
                                               colnames(wagscreen$logic))]
}

plot(wagscreen, xrot = 45, borderwidth = 0)