1 ggkegg

This package aims to import, parse, and analyze KEGG data such as KEGG PATHWAY and KEGG MODULE. It visualizes KEGG information with ggplot2 and ggraph using the grammar of graphics, can directly visualize results from various omics analysis packages, and connects to other tidy data-manipulation packages. This document presents the basic usage of ggkegg; please refer to the full documentation for detailed usage.

1.1 Introduction

There are many great packages for KEGG PATHWAY analysis in R. pathview fetches KEGG PATHWAY information and outputs images that reflect various user-defined values on the map. KEGGlincs can overlay LINCS data onto KEGG PATHWAY and examine the map using Cytoscape. graphite acquires pathways from sources including KEGG and Reactome, converts them into graphNEL format, and provides an interface for topological analysis. KEGGgraph also downloads KEGG PATHWAY information and converts it into a format analyzable in R. Building on these packages, ggkegg was developed to allow tidy manipulation of KEGG information with tidygraph, to plot the relevant information in flexible and customizable ways using the grammar of graphics, and to examine the global and overview maps consisting of compounds and reactions.

1.2 Pathway

Users can obtain a KEGG PATHWAY as a tbl_graph with the pathway function. To cache the downloaded file, specify use_cache=TRUE; if you already have the XML (KGML) file of the pathway, specify its directory with the directory argument. Here, we obtain the Cell cycle pathway (hsa04110) using the cache. By default, a pathway_id column is added to both the node and edge tables, which lets other functions identify the pathway each node and edge belongs to.

library(ggkegg)
library(tidygraph)
library(dplyr)
graph <- ggkegg::pathway("hsa04110", use_cache=TRUE)
graph
## # A tbl_graph: 134 nodes and 157 edges
## #
## # A directed acyclic multigraph with 40 components
## #
## # Node Data: 134 × 18 (active)
##    name    type  reaction graphics_name     x     y width height fgcolor bgcolor
##    <chr>   <chr> <chr>    <chr>         <dbl> <dbl> <dbl>  <dbl> <chr>   <chr>  
##  1 hsa:10… gene  <NA>     CDKN2A, ARF,…   532  -218    46     17 #000000 #BFFFBF
##  2 hsa:51… gene  <NA>     FZR1, CDC20C…   981  -630    46     17 #000000 #BFFFBF
##  3 hsa:41… gene  <NA>     MCM2, BM28, …   553  -681    46     17 #000000 #BFFFBF
##  4 hsa:23… gene  <NA>     ORC6, ORC6L.…   494  -681    46     17 #000000 #BFFFBF
##  5 hsa:10… gene  <NA>     ANAPC10, APC…   981  -392    46     17 #000000 #BFFFBF
##  6 hsa:10… gene  <NA>     ANAPC10, APC…   981  -613    46     17 #000000 #BFFFBF
##  7 hsa:65… gene  <NA>     SKP1, EMC19,…   188  -613    46     17 #000000 #BFFFBF
##  8 hsa:65… gene  <NA>     SKP1, EMC19,…   432  -285    46     17 #000000 #BFFFBF
##  9 hsa:983 gene  <NA>     CDK1, CDC2, …   780  -562    46     17 #000000 #BFFFBF
## 10 hsa:701 gene  <NA>     BUB1B, BUB1b…   873  -392    46     17 #000000 #BFFFBF
## # ℹ 124 more rows
## # ℹ 8 more variables: graphics_type <chr>, coords <chr>, xmin <dbl>,
## #   xmax <dbl>, ymin <dbl>, ymax <dbl>, orig.id <chr>, pathway_id <chr>
## #
## # Edge Data: 157 × 6
##    from    to type  subtype_name    subtype_value pathway_id
##   <int> <int> <chr> <chr>           <chr>         <chr>     
## 1   118    39 GErel expression      -->           hsa04110  
## 2    50    61 PPrel inhibition      --|           hsa04110  
## 3    50    61 PPrel phosphorylation +p            hsa04110  
## # ℹ 154 more rows
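Each gene node is drawn as a box, and the node table stores the KGML position (x, y) and size (width, height) next to xmin, xmax, ymin, and ymax. The base-R sketch below shows how such extents could be derived, assuming that x and y mark the box center and that each extent is simply the center plus or minus half the box size. This is an assumption for illustration, not code taken from ggkegg, so compare it against the xmin to ymax columns of your own graph.

```r
# Values copied from the first node row printed above (CDKN2A).
node <- data.frame(x = 532, y = -218, width = 46, height = 17)

# Assumption: (x, y) is the box center, so each extent is center +/- half size.
node <- transform(node,
  xmin = x - width / 2,
  xmax = x + width / 2,
  ymin = y - height / 2,
  ymax = y + height / 2
)
node
```

The printed y values are negative, which suggests the y axis has been flipped relative to KGML, where y grows downward from the top of the map image; the half-size arithmetic is the same either way.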

The output can readily be analyzed with tidygraph and dplyr verbs. For example, the code below computes degree and betweenness centrality on the whole graph, then keeps only the gene nodes and sorts them by degree.

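graph |> 
    # Centrality is computed on the full graph, before any filtering
    mutate(degree=centrality_degree(mode="all"),
        betweenness=centrality_betweenness()) |> 
    activate(nodes) |>
    # Keep gene nodes only and sort them by degree
    filter(type=="gene") |>
    arrange(desc(degree)) |>
    # Return the node table as a tibble with the centrality columns first
    as_tibble() |>
    relocate(degree, betweenness)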
## # A tibble: 112 × 20
##    degree betweenness name        type  reaction graphics_name     x     y width
##     <dbl>       <dbl> <chr>       <chr> <chr>    <chr>         <dbl> <dbl> <dbl>
##  1     11       144   hsa:7157    gene  <NA>     TP53, BCC7, …   590  -337    46
##  2     10         8   hsa:993     gene  <NA>     CDC25A, CDC2…   614  -496    46
##  3      9         0   hsa:983     gene  <NA>     CDK1, CDC2, …   689  -562    46
##  4      9        78.7 hsa:5925    gene  <NA>     RB1, OSRC, P…   353  -630    46
##  5      8        15   hsa:5347    gene  <NA>     PLK1, PLK, S…   862  -562    46
##  6      8         7   hsa:1111 h… gene  <NA>     CHEK1, CHK1.…   696  -393    46
##  7      7         0   hsa:983     gene  <NA>     CDK1, CDC2, …   780  -562    46
##  8      7       161.  hsa:1026    gene  <NA>     CDKN1A, CAP2…   459  -407    46
##  9      7         5   hsa:994 hs… gene  <NA>     CDC25B...       830  -496    46
## 10      6         7   hsa:9088    gene  <NA>     PKMYT1, MYT1…   763  -622    46
## # ℹ 102 more rows
## # ℹ 11 more variables: height <dbl>, fgcolor <chr>, bgcolor <chr>,
## #   graphics_type <chr>, coords <chr>, xmin <dbl>, xmax <dbl>, ymin <dbl>,
## #   ymax <dbl>, orig.id <chr>, pathway_id <chr>
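centrality_degree(mode="all") counts incoming and outgoing edges together, and because the KEGG graph is a multigraph, parallel edges between the same pair of nodes each add to the count. The base-R sketch below reproduces that counting rule on a small edge list; it illustrates the definition only and is not how tidygraph computes degree internally. The edge list copies just the first three rows of the edge table printed earlier, so the resulting numbers are not the full-graph degrees shown above.

```r
# Toy edge list: the first three rows of the hsa04110 edge table only.
edges <- data.frame(
  from = c(118, 50, 50),
  to   = c(39, 61, 61)
)
n_nodes <- 134  # node count reported for hsa04110 above

# mode = "all": every edge adds 1 to its source and 1 to its target.
# The two parallel 50 -> 61 edges (inhibition, phosphorylation) both count.
degree_all <- tabulate(c(edges$from, edges$to), nbins = n_nodes)
degree_all[c(39, 50, 61, 118)]
```

This also explains why a gene such as CDK1 (hsa:983) can appear twice in the ranking: it is drawn as two separate nodes on the map, and degree is counted per node rather than per gene.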

1.2.1 Plot the pathway using ggraph

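The parsed tbl_graph can be plotted with ggraph using the grammar of graphics, adding components such as nodes, edges, and text layer by layer. Because the node table stores the KEGG map coordinates, layout="manual" with x=x and y=y places each node where it appears on the original map.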

# Use the first name in graphics_name (e.g. "CDK1") as the node label
graph <- graph |> mutate(showname=strsplit(graphics_name, ",") |>
                    vapply("[", 1, FUN.VALUE=character(1)))

library(ggraph)

ggraph(graph, layout="manual", x=x, y=y)+
    # Edges: line type encodes the relation subtype (e.g. expression, inhibition)
    geom_edge_parallel(aes(linetype=subtype_name),
        arrow=arrow(length=unit(1,"mm"), type="closed"),
        end_cap=circle(1,"cm"),
        start_cap=circle(1,"cm"))+
    # Gene boxes filled with the background color from the KEGG map
    geom_node_rect(aes(fill=I(bgcolor),
                       filter=type == "gene"),
                   color="black")+
    # Gene labels
    geom_node_text(aes(label=showname,
                       filter=type == "gene"),
                   size=2)+
    theme_void()
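The showname column used for the labels keeps only the text before the first comma of graphics_name. The base-R check below runs that step on strings shortened from the node tables printed above (the inputs are illustrative, not complete graphics_name values) and compares it with an equivalent regular-expression version.

```r
# Illustrative inputs, shortened from the graphics_name values printed above
graphics_name <- c("TP53, BCC7", "RB1, OSRC", "CDC25B...")

# Same approach as the mutate() call: split on "," and keep the first piece
showname <- vapply(strsplit(graphics_name, ","), `[`, 1, FUN.VALUE = character(1))

# Equivalent vectorized alternative: drop everything from the first comma on
showname_sub <- sub(",.*$", "", graphics_name)

showname
showname_sub
```

A name without a comma, such as CDC25B..., is returned unchanged, including its trailing dots, so those would need a separate substitution if they are unwanted in the plot labels.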