Using CONDOR for community detection in bipartite graphs

Introduction

COmplex Network Description Of Regulators (CONDOR) implements methods for clustering bipartite networks and estimating the contribution of each node to its community’s modularity. For an application of this method to identify diesease-associated single nucleotide polymorphisms, see (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005033).

Installation

CONDOR can be installed through netZooR as follows:

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("netZooR")

Implementing the Bipartite Modularity Maximization

The code in condorModularityMax is an implementation of the method described in Michael Barber’s paper Modularity and community detection in bipartite networks (Phys. Rev. E 76, 066102 (2007)). A few general comments:

Workflow

library(netZooR)

condor works with an edgelist (elist in the code below) as its input.

r = c(1,1,1,2,2,2,3,3,3,4,4);
b = c(1,2,3,1,2,4,2,3,4,3,4);
reds <- c("Alice","Sue","Janine","Mary")
blues <- c("Bob","John","Ed","Hank")
elist <- data.frame(red=reds[r], blue=blues[b])

In elist, notice all nodes of the same type–women and men in this case–appear in the same column together. This is a requirement. createCondorObject will throw an error if a node appears in both columns.

condor.object <- createCondorObject(elist)

A condor.object is just a list. You can look at the different items using names

names(condor.object)

## [1] "G"          "edges"      "Qcoms"      "modularity" "red.memb"  
## [6] "blue.memb"  "qscores"

condorCluster will cluster the nodes and produce the overall modularity along with two community membership data.frames:

condor.object <- condorCluster(condor.object)

## [1] "modularity of projected graph 0"
## [1] "Q = 0"
## [1] "Q = 0.132231404958678"
## [1] "Q = 0.198347107438017"
## [1] "Q = 0.231404958677686"
## [1] "Q = 0.231404958677686"

print(condor.object$red.memb)

##   red.names com
## 1     Alice   2
## 2    Janine   1
## 3      Mary   1
## 4       Sue   2

print(condor.object$blue.memb)

##   blue.names com
## 1        Bob   2
## 2         Ed   1
## 3       Hank   1
## 4       John   2

Nodes in first community are {Alice, John, Bob, Sue}, nodes in second community are {Ed, Janine, Hank, Mary} based on the modularity maximization. Here’s a picture:

gtoy = graph.edgelist(as.matrix(elist),directed=FALSE)
set.graph.attribute(gtoy, "layout", layout.kamada.kawai(gtoy))

## IGRAPH 1f87767 UN-- 8 11 -- 
## + attr: layout (g/n), name (v/c)
## + edges from 1f87767 (vertex names):
##  [1] Alice--Bob    Alice--John   Alice--Ed     Bob  --Sue    John --Sue   
##  [6] Sue  --Hank   John --Janine Ed   --Janine Hank --Janine Ed   --Mary  
## [11] Hank --Mary

V(gtoy)[c(reds,blues)]$color <- c(rep("red",4),rep("blue",4))

plot(gtoy,vertex.label.dist=2)

To get each node’s modularity contribution (as a fraction of the community’s modularity), run

condor.object <- condorQscore(condor.object)

If you have a subset of nodes that you think are more likely to lie at the cores of your communities, you can test this using condorCoreEnrich:

q_women <- condor.object$qscores$red.qscore
core_stats <- condorCoreEnrich(test_nodes=c("Alice","Mary"),
                                 q=q_women,perm=TRUE,plot.hist=TRUE)

condor also works on weighted bipartite networks. The package comes with a quantitative pollination network data set (Small 1976) taken from the NCEAS interaction webs data base, containing interactions between 13 plants and 34 pollinators.

data(small1976)
condor.object <- createCondorObject(small1976)
condor.object <- condorCluster(condor.object, project=FALSE)

## [1] "modularity of projected graph 0.525346928655047"
## [1] "Q = 0.52666696475026"
## [1] "Q = 0.52666696475026"

condorPlotHeatmap(condor.object)

Session Information

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /media/volume/teran2_disk/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] netZooR_1.10.0      matrixcalc_1.0-6    yarn_1.32.0        
## [4] pandaR_1.38.0       Biobase_2.66.0      BiocGenerics_0.52.0
## [7] reticulate_1.39.0   igraph_2.1.1       
## 
## loaded via a namespace (and not attached):
##   [1] STRINGdb_2.18.0             fs_1.6.5                   
##   [3] matrixStats_1.4.1           bitops_1.0-9               
##   [5] httr_1.4.7                  RColorBrewer_1.1-3         
##   [7] doParallel_1.0.17           Rgraphviz_2.50.0           
##   [9] repr_1.1.7                  tools_4.4.1                
##  [11] doRNG_1.8.6                 backports_1.5.0            
##  [13] utf8_1.2.4                  R6_2.5.1                   
##  [15] vegan_2.6-8                 HDF5Array_1.34.0           
##  [17] mgcv_1.9-1                  rhdf5filters_1.18.0        
##  [19] permute_0.9-7               prettyunits_1.2.0          
##  [21] base64_2.0.2                preprocessCore_1.68.0      
##  [23] cli_3.6.3                   penalized_0.9-52           
##  [25] sass_0.4.9                  readr_2.1.5                
##  [27] genefilter_1.88.0           askpass_1.2.1              
##  [29] Rsamtools_2.22.0            pbdZMQ_0.3-13              
##  [31] siggenes_1.80.0             illuminaio_0.48.0          
##  [33] rentrez_1.2.3               AnnotationForge_1.48.0     
##  [35] scrime_1.3.5                plotrix_3.8-4              
##  [37] limma_3.62.0                RSQLite_2.3.7              
##  [39] generics_0.1.3              GOstats_2.72.0             
##  [41] quantro_1.40.0              BiocIO_1.16.0              
##  [43] gtools_3.9.5                dplyr_1.1.4                
##  [45] GO.db_3.20.0                Matrix_1.7-1               
##  [47] fansi_1.0.6                 S4Vectors_0.44.0           
##  [49] abind_1.4-8                 lifecycle_1.0.4            
##  [51] yaml_2.3.10                 edgeR_4.4.0                
##  [53] SummarizedExperiment_1.36.0 gplots_3.2.0               
##  [55] rhdf5_2.50.0                SparseArray_1.6.0          
##  [57] BiocFileCache_2.14.0        grid_4.4.1                 
##  [59] blob_1.2.4                  crayon_1.5.3               
##  [61] lattice_0.22-6              GenomicFeatures_1.58.0     
##  [63] annotate_1.84.0             KEGGREST_1.46.0            
##  [65] pillar_1.9.0                knitr_1.48                 
##  [67] beanplot_1.3.1              GenomicRanges_1.58.0       
##  [69] rjson_0.2.23                codetools_0.2-20           
##  [71] glue_1.8.0                  downloader_0.4             
##  [73] data.table_1.16.2           vctrs_0.6.5                
##  [75] png_0.1-8                   gtable_0.3.6               
##  [77] assertthat_0.2.1            gsubfn_0.7                 
##  [79] cachem_1.1.0                xfun_0.48                  
##  [81] S4Arrays_1.6.0              survival_3.7-0             
##  [83] iterators_1.0.14            statmod_1.5.0              
##  [85] nlme_3.1-166                Category_2.72.0            
##  [87] bit64_4.5.2                 progress_1.2.3             
##  [89] filelock_1.0.3              GenomeInfoDb_1.42.0        
##  [91] bslib_0.8.0                 nor1mix_1.3-3              
##  [93] KernSmooth_2.23-24          colorspace_2.1-1           
##  [95] DBI_1.2.3                   nnet_7.3-19                
##  [97] tidyselect_1.2.1            bit_4.5.0                  
##  [99] compiler_4.4.1              curl_5.2.3                 
## [101] chron_2.3-61                httr2_1.0.5                
## [103] graph_1.84.0                xml2_1.3.6                 
## [105] ggdendro_0.2.0              DelayedArray_0.32.0        
## [107] rtracklayer_1.66.0          scales_1.3.0               
## [109] caTools_1.18.3              hexbin_1.28.4              
## [111] quadprog_1.5-8              RBGL_1.82.0                
## [113] rappdirs_0.3.3              stringr_1.5.1              
## [115] digest_0.6.37               rmarkdown_2.28             
## [117] GEOquery_2.74.0             XVector_0.46.0             
## [119] htmltools_0.5.8.1           pkgconfig_2.0.3            
## [121] base64enc_0.1-3             sparseMatrixStats_1.18.0   
## [123] MatrixGenerics_1.18.0       highr_0.11                 
## [125] dbplyr_2.5.0                fastmap_1.2.0              
## [127] rlang_1.1.4                 UCSC.utils_1.2.0           
## [129] DelayedMatrixStats_1.28.0   jquerylib_0.1.4            
## [131] jsonlite_1.8.9              BiocParallel_1.40.0        
## [133] mclust_6.1.1                RCurl_1.98-1.16            
## [135] magrittr_2.0.3              GenomeInfoDbData_1.2.13    
## [137] Rhdf5lib_1.28.0             IRkernel_1.3.2             
## [139] munsell_0.5.1               Rcpp_1.0.13                
## [141] proto_1.0.0                 sqldf_0.4-11               
## [143] stringi_1.8.4               RJSONIO_1.3-1.9            
## [145] zlibbioc_1.52.0             MASS_7.3-61                
## [147] plyr_1.8.9                  bumphunter_1.48.0          
## [149] org.Hs.eg.db_3.20.0         minfi_1.52.0               
## [151] parallel_4.4.1              Biostrings_2.74.0          
## [153] IRdisplay_1.1               splines_4.4.1              
## [155] multtest_2.62.0             hash_2.2.6.3               
## [157] hms_1.1.3                   locfit_1.5-9.10            
## [159] uuid_1.2-1                  RUnit_0.4.33               
## [161] base64url_1.4               rngtools_1.5.2             
## [163] reshape2_1.4.4              biomaRt_2.62.0             
## [165] stats4_4.4.1                XML_3.99-0.17              
## [167] evaluate_1.0.1              tzdb_0.4.0                 
## [169] foreach_1.5.2               tidyr_1.3.1                
## [171] openssl_2.2.2               purrr_1.0.2                
## [173] reshape_0.8.9               ggplot2_3.5.1              
## [175] xtable_1.8-4                restfulr_0.0.15            
## [177] viridisLite_0.4.2           RCy3_2.26.0                
## [179] tibble_3.2.1                memoise_2.0.1              
## [181] AnnotationDbi_1.68.0        GenomicAlignments_1.42.0   
## [183] IRanges_2.40.0              cluster_2.1.6              
## [185] GSEABase_1.68.0