1 Introduction

RandomWalkRestartMH (Random Walk with Restart on Multiplex and Heterogeneous Networks) is an R package built to provide an easy interface to perform Random Walk with Restart in different types of complex networks:

  1. Monoplex networks (Single networks).
  2. Multiplex networks.
  3. Heterogeneous networks.
  4. Multiplex-Heterogeneous networks.

It is based on the work we presented in the article:

https://academic.oup.com/bioinformatics/article/35/3/497/5055408

We have recently extended the method in order to take into account weighted networks. In addition, the package is now able to perform Random Walk with Restart on:

  1. Full multiplex-heterogeneous networks.

RWR simulates an imaginary particle that starts on one or more seed nodes and randomly follows the edges of a network. At each step, the particle returns to the seed(s) with a restart probability, r (Pan et al. 2004). This particle can explore any of the network types listed above.
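As a rough illustration of the iteration described above, the following base R sketch runs RWR on a made-up four-node network. It is not the package's implementation: the toy edges, the value of r and the convergence threshold are assumptions chosen only for this example. The update rule is p_next = (1 - r) * M * p + r * p0, where M is the column-normalized adjacency matrix and p0 places all the probability on the seed(s).

```r
# Toy undirected network with 4 nodes: A-B, A-C, B-C, C-D (illustrative only)
nodes <- c("A", "B", "C", "D")
adj <- matrix(0, nrow = 4, ncol = 4, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("A", "C"), c("B", "C"), c("C", "D"))
for (i in seq_len(nrow(edges))) {
    adj[edges[i, 1], edges[i, 2]] <- 1
    adj[edges[i, 2], edges[i, 1]] <- 1
}

# Column-normalize so that each column holds transition probabilities
trans <- sweep(adj, 2, colSums(adj), "/")

# Restart vector: all the probability mass is placed on the seed node(s)
seeds <- c("A")
p0 <- setNames(as.numeric(nodes %in% seeds) / length(seeds), nodes)

r <- 0.7       # restart probability (assumed value for this sketch)
tol <- 1e-10   # convergence threshold (assumed value for this sketch)

# Iterate p_next = (1 - r) * M %*% p + r * p0 until the change is negligible
p <- p0
repeat {
    p_next <- (1 - r) * as.vector(trans %*% p) + r * p0
    delta <- sum(abs(p_next - p))
    p <- p_next
    if (delta < tol) break
}

# Nodes ranked by their proximity to the seed
rwr_scores <- sort(p, decreasing = TRUE)
rwr_scores
```

In this toy example the seed keeps the largest score, and node D, which is reachable only through C, ends up last.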

The user can integrate single networks (monoplex networks) to create a multiplex network. The multiplex network can also be integrated, thanks to bipartite relationships, with another multiplex network whose nodes are of a different nature. Proceeding this way, a network that is both multiplex and heterogeneous will be generated. To do so, follow the instructions detailed below.

Please note that this version of the package does not deal with directed networks. New features will be included in future versions of RandomWalkRestartMH.

2 Installation of the RandomWalkRestartMH package

First of all, you need a current version of R. RandomWalkRestartMH is a freely available package deposited on Bioconductor and GitHub. You can install it by running the following commands on an R console:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("RandomWalkRestartMH")

Alternatively, to install the latest version from GitHub before it is released on Bioconductor:

devtools::install_github("alberto-valdeolivas/RandomWalkRestartMH")

3 A Detailed Workflow

In the following paragraphs, we describe how to use the RandomWalkRestartMH package to perform RWR on different types of biological networks. Concretely, we use a protein-protein interaction (PPI) network, a pathway network, a disease-disease similarity network and combinations thereof. These networks are obtained as detailed in Valdeolivas et al. (2018). The PPI and the pathway networks were reduced by only considering genes/proteins expressed in the adipose tissue, to reduce the computation time of this vignette.

The goal in the example presented here is, as described in Valdeolivas et al. (2018), to find candidate genes potentially associated with diseases by a guilt-by-association approach. This is based on the fact that genes/proteins with similar functions or similar phenotypes tend to lie closer in biological networks. Therefore, the larger the RWR score of a gene, the more likely it is to be functionally related to the seeds.

We focus on a real biological example: the SHORT syndrome (MIM code: 269880) and its causative gene PIK3R1, as described in Valdeolivas et al. (2018). We will see throughout the following paragraphs how the RWR results evolve due to the integration and exploration of additional networks.

3.1 Random Walk with Restart on a Monoplex Network

RWR has usually been applied within the framework of single PPI networks in bioinformatics (Kohler et al. 2008). A gene or a set of genes, the so-called seed(s), known to be implicated in a concrete function or in a specific disease, are chosen as the starting point(s) of the algorithm. The RWR particle explores the neighbourhood of the seeds and the algorithm computes a score for all the nodes of the network. The larger the score of a node, the closer it is to the seed(s).

Let us generate an object of the class Multiplex from our PPI network, even though it is a monoplex network.

library(RandomWalkRestartMH)
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
data(PPI_Network) # We load the PPI_Network

## We create a Multiplex object composed of 1 layer (it is a monoplex network)
## and display what it looks like
PPI_MultiplexObject <- create.multiplex(list(PPI=PPI_Network))
PPI_MultiplexObject
## Number of Layers:
## [1] 1
## 
## Number of Nodes:
## [1] 4317
## 
## IGRAPH 6799c0e UNW- 4317 18062 -- 
## + attr: name (v/c), weight (e/n), type (e/c)
## + edges from 6799c0e (vertex names):
##  [1] AAMP   --VPS52     AAMP   --BHLHE40   AAMP   --GABARAPL2 AAMP   --MAP1LC3B 
##  [5] VPS52  --TXN2      VPS52  --DDX6      VPS52  --MFAP1     VPS52  --PRKAA1   
##  [9] VPS52  --LMO4      VPS52  --STX11     VPS52  --KANK2     VPS52  --PPP1R18  
## [13] VPS52  --TXLNA     VPS52  --KIAA1217  VPS52  --VPS28     VPS52  --ATP6V1D  
## [17] VPS52  --TPM3      VPS52  --KIF5B     VPS52  --NOP2      VPS52  --RNF41    
## [21] VPS52  --WTAP      VPS52  --MAPK3     VPS52  --ZMAT2     VPS52  --VPS51    
## [25] BHLHE40--AES       BHLHE40--PRKAA1    BHLHE40--CCNK      BHLHE40--RBPMS    
## [29] BHLHE40--COX5B     BHLHE40--UBE2I     BHLHE40--MAGED1    BHLHE40--PLEKHB2  
## + ... omitted several edges

To apply the RWR on a monoplex network, we need to compute the adjacency matrix of the network and normalize it by column (Kohler et al. 2008), as follows:

AdjMatrix_PPI <- compute.adjacency.matrix(PPI_MultiplexObject)
AdjMatrixNorm_PPI <- normalize.multiplex.adjacency(AdjMatrix_PPI)
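To make the normalization step concrete, here is a minimal base R sketch of column normalization on a small weighted adjacency matrix. The gene names and weights are invented for illustration, and the handling of isolated nodes (columns left at zero) is an assumption of this sketch, not necessarily what normalize.multiplex.adjacency does internally.

```r
# Hypothetical weighted, symmetric adjacency matrix (weights are made up)
genes <- c("g1", "g2", "g3", "g4")
adj_w <- matrix(c(0, 2,   1,   0,
                  2, 0,   0.5, 0,
                  1, 0.5, 0,   0,
                  0, 0,   0,   0),
                nrow = 4, byrow = TRUE, dimnames = list(genes, genes))

# Scaling factor per column: 1 / column sum, or 0 for isolated nodes
col_totals <- colSums(adj_w)
scaling <- ifelse(col_totals > 0, 1 / col_totals, 0)

# Multiply every column by its scaling factor
adj_norm <- sweep(adj_w, 2, scaling, "*")

# Columns of connected nodes now sum to 1; the isolated node g4 stays at 0
colSums(adj_norm)
```

After this step, each column of a connected node can be read as the probabilities of moving from that node to each of its neighbours, with larger edge weights giving larger probabilities.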

Then, we need to define the seed(s) before running the RWR algorithm on this PPI network. As mentioned above, we are focusing on the example of the SHORT syndrome. Therefore, we take the PIK3R1 gene as the seed and execute RWR.

SeedGene <- c("PIK3R1")
## We launch the algorithm with the default parameters (see the manual for details)
RWR_PPI_Results <- Random.Walk.Restart.Multiplex(AdjMatrixNorm_PPI,
                        PPI_MultiplexObject,SeedGene)
# We display the results
RWR_PPI_Results
## Top 10 ranked Nodes:
##    NodeNames       Score
## 1       GRB2 0.006845881
## 2       EGFR 0.006169129
## 3        CRK 0.005674261
## 4       ABL1 0.005617041
## 5        FYN 0.005611086
## 6      CDC42 0.005594680
## 7       SHC1 0.005577900
## 8       CRKL 0.005509182
## 9    KHDRBS1 0.005443541
## 10     TYRO3 0.005441887
## 
## Seed Nodes used:
## [1] "PIK3R1"

Finally, we can create a network (an igraph object) with the top scored genes. Visualizing the top results within their interaction network is always a good idea when prioritizing genes, since it gives a global view of all the potential candidates. The results are presented in Figure 1.

## In this case we induce a network from the top 15 genes.
TopResults_PPI <-
    create.multiplexNetwork.topResults(RWR_PPI_Results,PPI_MultiplexObject,
        k=15)
## We plot that network with its interactions.
par(mar=c(0.1,0.1,0.1,0.1))
plot(TopResults_PPI, vertex.label.color="black",vertex.frame.color="#ffffff",
    vertex.size= 20, edge.curved=.2,
    vertex.color = ifelse(igraph::V(TopResults_PPI)$name == "PIK3R1","yellow",
    "#00CCFF"), edge.color="blue",edge.width=0.8)
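The idea of inducing a network from the top-ranked nodes can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code behind create.multiplexNetwork.topResults: the seed name, scores and edges are made up, and the seed is added to the selection because the plot above highlights PIK3R1 alongside the top genes.

```r
# Made-up seed, scores of the non-seed nodes, and edges (not package output)
seed <- "s"
scores <- c(n1 = 0.40, n2 = 0.25, n3 = 0.20, n4 = 0.10)
nodes <- c(seed, names(scores))
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
adj["s", "n1"] <- adj["n1", "s"] <- 1
adj["n1", "n2"] <- adj["n2", "n1"] <- 1
adj["n2", "n4"] <- adj["n4", "n2"] <- 1
adj["n3", "n4"] <- adj["n4", "n3"] <- 1

# Keep the seed plus the k best-scoring nodes, and only the edges among them
k <- 2
top_nodes <- c(seed, names(sort(scores, decreasing = TRUE))[seq_len(k)])
top_adj <- adj[top_nodes, top_nodes]

# Edge list of the induced subnetwork (upper triangle avoids duplicates)
idx <- which(top_adj > 0 & upper.tri(top_adj), arr.ind = TRUE)
top_edges <- data.frame(from = rownames(top_adj)[idx[, "row"]],
                        to = colnames(top_adj)[idx[, "col"]],
                        stringsAsFactors = FALSE)
top_edges
```

Edges that touch a node outside the selection (here n2-n4 and n3-n4) are dropped, which is why the induced network shows only the interactions among the seed and the top candidates.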