Contents

1 Introduction

This vignette demonstrates the scGraphVerse workflow on a single-cell RNA-seq dataset. We show how to:

  1. Load and preprocess public PBMC data.
  2. Infer gene regulatory networks with GENIE3.
  3. Build consensus networks and detect communities.
  4. Validate inferred edges using STRINGdb.

2 1. Dataset and Preprocessing

In this real-data analysis, we’ll work with two public PBMC (Peripheral Blood Mononuclear Cell) datasets from 10X Genomics. Our strategy is to focus specifically on B cells and identify a common set of highly expressed genes across both datasets. This approach allows us to compare regulatory networks between different experimental batches while controlling for cell type and gene selection effects.

Data Processing Workflow: 1. Load two PBMC datasets (3k and 4k cells) from TENxPBMCData 2. Use SingleR for automated cell type annotation 3. Select top 500 most expressed genes in B cells from each dataset 4. Find intersection of expressed genes to ensure comparable gene sets 5. Subset datasets to B cells only with common gene set

This preprocessing ensures we have clean, comparable data for network inference.

# Note: This vignette requires external packages from Suggests
# Install if needed:
# BiocManager::install(c("TENxPBMCData", "scater", "SingleR", "celldex"))

# Helper function to preprocess PBMC data
preprocess_pbmc <- function(pbmc_obj) {
    sce <- scater::logNormCounts(pbmc_obj)
    symbols_tenx <- SummarizedExperiment::rowData(sce)$Symbol_TENx
    valid <- !is.na(symbols_tenx) & symbols_tenx != ""
    sce <- sce[valid, ]
    rownames(sce) <- make.unique(symbols_tenx[valid])
    SummarizedExperiment::assay(sce, "logcounts") <-
        as.matrix(SummarizedExperiment::assay(sce, "logcounts"))
    colnames(sce) <- paste0("cell_", seq_len(ncol(sce)))
    return(sce)
}

# 1. Load and preprocess PBMC data
if (requireNamespace("TENxPBMCData", quietly = TRUE)) {
    pbmc_obj <- TENxPBMCData::TENxPBMCData("pbmc3k")
    pbmc_obj1 <- TENxPBMCData::TENxPBMCData("pbmc4k")

    sce <- preprocess_pbmc(pbmc_obj)
    sce1 <- preprocess_pbmc(pbmc_obj1)

    # 2. Cell type annotation
    if (requireNamespace("celldex", quietly = TRUE) &&
        requireNamespace("SingleR", quietly = TRUE)) {
        ref <- celldex::HumanPrimaryCellAtlasData()
        pred1 <- SingleR::SingleR(test = sce1, ref = ref, labels=ref$label.main)
        SummarizedExperiment::colData(sce1)$pcell <- pred1$labels

        pred <- SingleR::SingleR(test = sce, ref = ref, labels=ref$label.main)
        SummarizedExperiment::colData(sce)$pcell <- pred$labels

        # 3. Select top 100 B-cell genes
        genes <- selgene(
            object = sce,
            top_n = 100,
            cell_type = "B_cell",
            cell_type_col = "pcell",
            remove_rib = TRUE,
            remove_mt = TRUE
        )

        genes1 <- selgene(
            object = sce1,
            top_n = 100,
            cell_type = "B_cell",
            cell_type_col = "pcell",
            remove_rib = TRUE,
            remove_mt = TRUE
        )

        # 4. Find intersection
        commong <- intersect(genes, genes1)

        # 5. Subset data
        pbmc1_sub <- sce[
            commong,
            SummarizedExperiment::colData(sce)$pcell == "B_cell"
        ]
        pbmc2_sub <- sce1[
            commong,
            SummarizedExperiment::colData(sce1)$pcell == "B_cell"
        ]

        pbmc1_sub <- pbmc1_sub[, 1:85]
        pbmc2_sub <- pbmc2_sub[, 1:85]

        # Create MultiAssayExperiment for multi-sample analysis
        bcell_mae <- create_mae(list(pbmc1 = pbmc1_sub, pbmc2 = pbmc2_sub))
    }
}
#> see ?TENxPBMCData and browseVignettes('TENxPBMCData') for documentation
#> loading from cache
#> see ?TENxPBMCData and browseVignettes('TENxPBMCData') for documentation
#> loading from cache
#> Using SCE assay 'logcounts' (log-normalized).
#> Subsetted to 344 cells where pcell = 'B_cell'.
#> Removed mitochondrial genes matching '^MT-'.
#> Removed ribosomal genes matching '^RP[SL]'.
#> Top 100 genes selected based on mean expression.
#> Using SCE assay 'logcounts' (log-normalized).
#> Subsetted to 607 cells where pcell = 'B_cell'.
#> Removed mitochondrial genes matching '^MT-'.
#> Removed ribosomal genes matching '^RP[SL]'.
#> Top 100 genes selected based on mean expression.

3 2. Network Inference

Now we’ll infer gene regulatory networks from our preprocessed B cell data using GENIE3. Unlike the simulation study where we had control over all parameters, here we’re working with real biological data that presents unique challenges: B cells have distinct expression patterns, lower total gene counts compared to the full transcriptome, and batch effects between the two PBMC datasets.

The function now accepts a MultiAssayExperiment object as input, which provides a structured way to handle multiple experimental conditions.

# Choose method: "GENIE3", "GRNBoost2", "ZILGM", "PCzinb" or "JRF"
method <- "GENIE3"
if (exists("bcell_mae")) {
    networks <- infer_networks(
        count_matrices_list = bcell_mae,
        method = method,
        nCores = 1
    )
}

3.1 2.1. Building Adjacency Matrices

Here we apply a more stringent threshold (99th percentile) compared to the simulation study (95th percentile). This is appropriate for real data analysis where we expect higher noise levels and want to focus on the most confident regulatory relationships. With B cell data, we’re particularly interested in high-confidence edges controlling B cell function.

The functions now return and work with SummarizedExperiment objects for better data organization and metadata tracking.

if (exists("networks")) {
    # Weighted adjacency (returns SummarizedExperiment)
    wadj_se <- generate_adjacency(networks)

    # Symmetrize (returns SummarizedExperiment)
    swadj_se <- symmetrize(wadj_se, weight_function = "mean")

    # Binary cutoff (top 1%, returns SummarizedExperiment)
    binary_se <- cutoff_adjacency(
        count_matrices = bcell_mae,
        weighted_adjm_list = swadj_se,
        n = 1,
        method = method,
        quantile_threshold = 0.99,
        nCores = 1
    )

    # Plot
    if (requireNamespace("ggraph", quietly = TRUE)) {
        plotg(binary_se)
    }
}

plot Graphs

4 3. Consensus and Community Detection

With only two PBMC datasets, our consensus network will include edges that appear in both batches, providing high confidence in the regulatory relationships.

if (exists("binary_se")) {
    # Consensus by vote
    consensus <- create_consensus(binary_se, method = "vote")

    # Plot consensus network
    if (requireNamespace("ggraph", quietly = TRUE)) {
        plotg(consensus)
    }
}

plot consensus Graph

Here we demonstrate the use of classify_edges() and plot_network_comparison() for detailed network comparison analysis. The false_plot parameter controls whether to include false positive/negative plots.

if (exists("consensus")) {
    # Note: For comparison with external databases like STRINGdb,
    # use stringdb_adjacency() to fetch reference networks

    # Community detection
    if (requireNamespace("igraph", quietly = TRUE)) {
        communities <- community_path(consensus)
    }
}
#> 
#> 
#> Detecting communities...

comparing Graphs with reference

#> Running pathway enrichment...
#> 'select()' returned 1:1 mapping between keys and columns
#> Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
#> Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns

5 4. Validation with STRINGdb

Unlike the simulation study where we used STRINGdb to create ground truth, here we use it for validation of our inferred B cell regulatory network. The keep_all_genes = TRUE parameter ensures we retain information for all genes in our dataset, even those not found in STRING, which is important for comprehensive edge mining analysis.

# Note: This requires internet connection for STRINGdb downloads
# This section requires the STRINGdb package from Suggests
if (exists("consensus") && requireNamespace("STRINGdb", quietly = TRUE)) {
    str_result <- stringdb_adjacency(
        genes = rownames(consensus),
        species = 9606,
        required_score = 900,
        keep_all_genes = TRUE
    )

    # Extract binary adjacency and symmetrize
    str_binary <- str_result$binary
    ground_truth_se <- symmetrize(
        build_network_se(list(string_network = str_binary)),
        weight_function = "mean"
    )
    ground_truth <- SummarizedExperiment::assay(ground_truth_se, 1)

    # Edge mining: TP rates (returns list of dataframes)
    em <- edge_mining(
        consensus,
        ground_truth = ground_truth,
        query_edge_types = "TP"
    )

    # Display results
    if (length(em) > 0) {
        print(head(em[[1]]))
    }
}
#> Registered S3 method overwritten by 'gplots':
#>   method         from     
#>   reorder.factor DescTools
#> Initializing STRINGdb...
#> Mapping genes to STRING IDs...
#> Mapped 85 genes to STRING IDs.
#> Retrieving **physical** interactions from STRING API...
#> Found 68 STRING physical interactions.
#> Adjacency matrices constructed successfully.
#>    gene1   gene2 edge_type pubmed_hits
#> 2  ACTG1   ARPC2        TP           1
#> 41 HLA-A   HLA-B        TP        2922
#> 43 HLA-A   HLA-C        TP        1190
#> 44 HLA-B   HLA-C        TP        1194
#> 67  CD74 HLA-DRA        TP          38
#> 70 HLA-B HLA-DRA        TP          21
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  PMIDs
#> 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             34938284
#> 41 41069639,41050417,41046214,41035653,41034087,41011272,41003814,41001029,40962674,40944463,40941618,40910546,40896631,40868188,40873549,40866583,40835749,40820324,40791414,40770645,40739281,40721532,40668377,40637775,40617889,40577315,40575193,40574847,40565849,40535491,40517559,40510810,40508101,40508057,40504263,28520367,40475155,40469303,40451247,40405507,40390041,40387827,40366340,40360437,40342296,40314039,40303409,40284905,40239304,40230599,40200142,40171627,40170583,40149438,40127639,40087529,40074802,40071333,40052889,40020215,40010968,39979805,39937552,39933824,39928675,39914254,39901240,39898909,39892763,39890193,39836838,39833815,39824805,39804685,39768617,39738923,39728756,39727970,39726387,39723380,39716692,39715961,39713404,39694865,39684614,39672954,39642776,39605664,39586371,39558667,39530176,39469985,39459542,39456988,39447013,39435845,39374941,39337366,39336927,39306605
#> 43 41097894,41035653,41011272,41003814,40941618,40912174,40896631,40873549,40842414,40820324,40804474,40791414,40770645,40739281,40721532,40668377,40642077,40637873,40577315,40574847,40535491,40508057,40469303,40429368,40405507,40402323,40390041,40366340,40361234,40284905,40239304,40223959,40223381,40200142,40074802,40020215,39979805,39933824,39928675,39901240,39836838,39835139,39804685,39768617,39731480,39723380,39715961,39694865,39672954,39642776,39621955,39605664,39558667,39469985,39459542,39447013,39374941,39337964,39306605,39294236,39234850,39230170,39222660,39201523,39201460,39185409,39173558,39167735,39141684,39111186,39107575,39085621,39044005,39015089,38978376,38940584,38936012,38924641,38907894,38900146,38840925,38825445,38785545,38758119,38757301,38680423,38674456,38667507,38657099,38645783,38581695,38575661,38568509,38547417,38535155,38524775,38485627,38478091,38469033,38459565
#> 44 41104340,41035653,41032234,41011272,41003814,40941618,40896631,40873549,40862422,40820673,40820324,40806754,40791414,40784610,40770645,40744770,40739281,40721532,40706259,40693714,40668377,40663089,40577315,40574847,40557158,40535491,40508057,40492078,40469303,40405507,40390041,40366340,40345160,40284905,40239304,40200142,40193244,40074802,40065352,40020215,40004750,39979805,39933824,39928675,39901240,39836838,39807693,39804685,39790007,39768617,39723380,39715961,39694865,39672954,39663186,39651149,39642776,39605664,39558667,39533856,39483265,39470005,39469985,39459542,39457610,39447013,39436105,39374941,39364548,39329950,39307952,39306605,39294236,39234850,39230170,39222660,39201523,39201460,39185409,39173558,39167735,39158076,39125781,39111186,39107575,39101335,39085621,39060355,39041326,39019884,39015482,39015089,38988514,38978376,38978245,38940584,38936012,38907894,38900146,38887196
#> 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               40993240,40806540,40800081,40717170,40517735,40338916,40226614,40223959,39937308,39865657,39694280,39483471,39280905,38367852,38129488,37150240,36975409,36211422,35845067,35804895,35626874,35378753,35208514,34589742,34137173,33235138,33042148,32305991,32259312,32218455,30365966,29867922,28862321,26500140,26246143,24321376,23127183,15467430
#> 70                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        40087529,37780578,37317627,35991636,35759748,35739510,35592524,35356004,34275915,32547543,32218410,31009447,29867922,25901754,24278156,23028341,22952805,21543121,19143813,14564482,12445311
#>    query_status
#> 2    hits_found
#> 41   hits_found
#> 43   hits_found
#> 44   hits_found
#> 67   hits_found
#> 70   hits_found

6 5. Conclusion

This case study illustrates how scGraphVerse enables end-to-end GRN reconstruction and validation in single-cell data. Users can swap inference algorithms, tune thresholds, and incorporate external prior networks.

7 Session Information

sessionInfo()
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] TENxPBMCData_1.27.0         HDF5Array_1.37.0           
#>  [3] h5mread_1.1.1               rhdf5_2.53.6               
#>  [5] DelayedArray_0.35.3         SparseArray_1.9.1          
#>  [7] S4Arrays_1.9.1              abind_1.4-8                
#>  [9] Matrix_1.7-4                SingleCellExperiment_1.31.1
#> [11] SummarizedExperiment_1.39.2 Biobase_2.69.1             
#> [13] GenomicRanges_1.61.5        Seqinfo_0.99.2             
#> [15] IRanges_2.43.5              S4Vectors_0.47.4           
#> [17] BiocGenerics_0.55.4         generics_0.1.4             
#> [19] MatrixGenerics_1.21.0       matrixStats_1.5.0          
#> [21] scGraphVerse_0.99.20        BiocStyle_2.37.1           
#> 
#> loaded via a namespace (and not attached):
#>   [1] R.methodsS3_1.8.2           dichromat_2.0-0.1          
#>   [3] gld_2.6.8                   Biostrings_2.77.2          
#>   [5] vctrs_0.6.5                 ggtangle_0.0.7             
#>   [7] perturbR_0.1.3              digest_0.6.37              
#>   [9] png_0.1-8                   shape_1.4.6.1              
#>  [11] proxy_0.4-27                Exact_3.3                  
#>  [13] pcaPP_2.0-5                 BiocBaseUtils_1.11.2       
#>  [15] gypsum_1.5.0                ggrepel_0.9.6              
#>  [17] bst_0.3-24                  magick_2.9.0               
#>  [19] hdrcde_3.4                  MASS_7.3-65                
#>  [21] fontLiberation_0.1.0        reshape2_1.4.4             
#>  [23] foreach_1.5.2               qvalue_2.41.0              
#>  [25] withr_3.0.2                 xfun_0.53                  
#>  [27] ggfun_0.2.0                 survival_3.8-3             
#>  [29] doRNG_1.8.6.2               memoise_2.0.1              
#>  [31] ggbeeswarm_0.7.2            clusterProfiler_4.17.0     
#>  [33] gson_0.1.0                  systemfonts_1.3.1          
#>  [35] tidytree_0.4.6              networkD3_0.4.1            
#>  [37] gtools_3.9.5                R.oo_1.27.1                
#>  [39] WeightSVM_1.7-16            KEGGREST_1.49.2            
#>  [41] httr_1.4.7                  GENIE3_1.31.0              
#>  [43] hash_2.2.6.3                rhdf5filters_1.21.4        
#>  [45] rstudioapi_0.17.1           DOSE_4.3.0                 
#>  [47] curl_7.0.0                  ScaledMatrix_1.17.0        
#>  [49] ggraph_2.2.2                polyclip_1.10-7            
#>  [51] ExperimentHub_2.99.5        stringr_1.5.2              
#>  [53] pracma_2.4.4                doParallel_1.0.17          
#>  [55] evaluate_1.0.5              BiocFileCache_2.99.6       
#>  [57] hms_1.1.4                   glmnet_4.1-10              
#>  [59] bookdown_0.45               irlba_2.3.5.1              
#>  [61] colorspace_2.1-2            filelock_1.0.3             
#>  [63] reticulate_1.43.0           readxl_1.4.5               
#>  [65] magrittr_2.0.4              readr_2.1.5                
#>  [67] viridis_0.6.5               ggtree_3.99.1              
#>  [69] lattice_0.22-7              XML_3.99-0.19              
#>  [71] scuttle_1.19.0              cowplot_1.2.0              
#>  [73] class_7.3-23                pillar_1.11.1              
#>  [75] nlme_3.1-168                iterators_1.0.14           
#>  [77] caTools_1.18.3              compiler_4.5.1             
#>  [79] beachmat_2.25.5             stringi_1.8.7              
#>  [81] DescTools_0.99.60           plyr_1.8.9                 
#>  [83] mpath_0.4-2.26              fda_6.3.0                  
#>  [85] crayon_1.5.3                scater_1.37.0              
#>  [87] gbm_2.2.2                   gridGraphics_0.5-1         
#>  [89] chron_2.3-62                haven_2.5.5                
#>  [91] graphlayouts_1.2.2          org.Hs.eg.db_3.22.0        
#>  [93] bit_4.6.0                   rootSolve_1.8.2.4          
#>  [95] dplyr_1.1.4                 fastmatch_1.1-6            
#>  [97] codetools_0.2-20            BiocSingular_1.25.0        
#>  [99] bslib_0.9.0                 e1071_1.7-16               
#> [101] lmom_3.2                    alabaster.ranges_1.9.1     
#> [103] fds_1.8                     MultiAssayExperiment_1.35.9
#> [105] splines_4.5.1               Rcpp_1.1.0                 
#> [107] dbplyr_2.5.1                sparseMatrixStats_1.21.0   
#> [109] cellranger_1.1.0            knitr_1.50                 
#> [111] blob_1.2.4                  BiocVersion_3.22.0         
#> [113] robin_2.0.0                 fs_1.6.6                   
#> [115] DelayedMatrixStats_1.31.0   pscl_1.5.9                 
#> [117] expm_1.0-0                  ggplotify_0.1.3            
#> [119] sqldf_0.4-11                tibble_3.3.0               
#> [121] tzdb_0.5.0                  tweenr_2.0.3               
#> [123] pkgconfig_2.0.3             tools_4.5.1                
#> [125] cachem_1.1.0                RSQLite_2.4.3              
#> [127] viridisLite_0.4.2           DBI_1.2.3                  
#> [129] numDeriv_2016.8-1.1         distributions3_0.2.3       
#> [131] celldex_1.19.0              fastmap_1.2.0              
#> [133] rmarkdown_2.30              scales_1.4.0               
#> [135] grid_4.5.1                  AnnotationHub_3.99.6       
#> [137] sass_0.4.10                 patchwork_1.3.2            
#> [139] BiocManager_1.30.26         graph_1.87.0               
#> [141] alabaster.schemas_1.9.0     SingleR_2.11.4             
#> [143] rpart_4.1.24                farver_2.1.2               
#> [145] tidygraph_1.3.1             gsubfn_0.7                 
#> [147] yaml_2.3.10                 deSolve_1.40               
#> [149] cli_3.6.5                   purrr_1.1.0                
#> [151] lifecycle_1.0.4             askpass_1.2.1              
#> [153] rainbow_3.8                 mvtnorm_1.3-3              
#> [155] BiocParallel_1.43.4         gtable_0.3.6               
#> [157] parallel_4.5.1              ape_5.8-1                  
#> [159] jsonlite_2.0.0              bitops_1.0-9               
#> [161] ggplot2_4.0.0               bit64_4.6.0-1              
#> [163] yulab.utils_0.2.1           alabaster.matrix_1.9.0     
#> [165] BiocNeighbors_2.3.1         proto_1.0.0                
#> [167] jquerylib_0.1.4             alabaster.se_1.9.0         
#> [169] GOSemSim_2.35.2             R.utils_2.13.0             
#> [171] lazyeval_0.2.2              alabaster.base_1.9.5       
#> [173] htmltools_0.5.8.1           enrichplot_1.29.2          
#> [175] GO.db_3.22.0                rappdirs_0.3.3             
#> [177] data.tree_1.2.0             tinytex_0.57               
#> [179] glue_1.8.0                  STRINGdb_2.21.0            
#> [181] httr2_1.2.1                 XVector_0.49.1             
#> [183] gdtools_0.4.4               RCurl_1.98-1.17            
#> [185] qpdf_1.4.1                  treeio_1.33.0              
#> [187] mclust_6.1.1                ks_1.15.1                  
#> [189] gridExtra_2.3               boot_1.3-32                
#> [191] igraph_2.2.0                R6_2.6.1                   
#> [193] tidyr_1.3.1                 fdatest_2.1.1              
#> [195] ggiraph_0.9.2               gplots_3.2.0               
#> [197] forcats_1.0.1               labeling_0.4.3             
#> [199] cluster_2.1.8.1             rngtools_1.5.2             
#> [201] Rhdf5lib_1.31.1             aplot_0.2.9                
#> [203] plotrix_3.8-4               tidyselect_1.2.1           
#> [205] vipor_0.4.7                 ggforce_0.5.0              
#> [207] fontBitstreamVera_0.1.1     AnnotationDbi_1.71.2       
#> [209] rsvd_1.0.5                  KernSmooth_2.23-26         
#> [211] S7_0.2.0                    fontquiver_0.2.1           
#> [213] data.table_1.17.8           htmlwidgets_1.6.4          
#> [215] fgsea_1.35.8                RColorBrewer_1.1-3         
#> [217] rlang_1.1.6                 rentrez_1.2.4              
#> [219] beeswarm_0.4.0