1 PaxtoolsR Tutorial
2 Overview
- 2.1 BioPAX, Paxtools, Pathway Commons, and the Simple Interaction Format
- 2.2 Limitations
3 Basics
4 Handling BioPAX OWL Files
5 Searching Pathway Commons
6 Extracting Information from BioPAX Datasets Using traverse()
7 Common Data Visualization Pathways and Network Analysis
8 Gene Set Enrichment Analysis with Pathway Commons
9 ID Mapping
10 Troubleshooting
- 10.1 File Paths
- 10.2 Memory Limits: Specify JVM Maximum Heap Size
11 Session Information
12 References
Appendix

1 PaxtoolsR Tutorial

The paxtoolsr package exposes a number of the algorithms and functions provided by the Paxtools Java library and the Pathway Commons web service, allowing them to be used in R.

2 Overview

2.1 BioPAX, Paxtools, Pathway Commons, and the Simple Interaction Format

The Biological Pathway Exchange (BioPAX) format is a community-driven standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions, and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form supports visualization, analysis, and biological discovery. The BioPAX format uses a syntax for data exchange based on the Web Ontology Language (OWL), which aids pathway data integration; the classes in the BioPAX ontology are described in the BioPAX documentation. Ontologies are formal systems for knowledge representation that make pathway data machine-readable; one well-known example of a biological ontology is the Gene Ontology for biological terms.

Paxtools is a Java library that allows users to interact with biological pathways represented in the BioPAX language. Pathway Commons is a resource that integrates biological pathway information from a number of public pathway databases, including Reactome, PantherDB, and HumanCyc, that are represented using the BioPAX language.

NOTE: BioPAX can encode very detailed information about biological processes. Analysis of this data, however, can be complicated as one needs to consider a wide array of n-ary relationships, different states of entities and generics. An alternative approach is to derive higher order relations based on a set of templates to define a simple binary network between biological entities and use conventional graph algorithms to analyze it. For many users of this package, the binary representation termed the Simple Interaction Format (SIF) will be the main entry point to the usage of BioPAX data. Conversion of BioPAX datasets to the SIF format is done through a series of simplification rules.
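
As a minimal sketch of what a binary SIF network looks like once it is in R, the example below builds a small data.frame by hand and summarizes it with base R. The three rows are made-up toy data chosen for illustration, not output from paxtoolsr; only the three column names follow the SIF convention used later in this tutorial.

```r
# Toy SIF network; the rows are illustrative assumptions, not real paxtoolsr output
toySif <- data.frame(
    PARTICIPANT_A = c("A", "A", "B"),
    INTERACTION_TYPE = c("controls-state-change-of", "reacts-with", "reacts-with"),
    PARTICIPANT_B = c("B", "C", "C"),
    stringsAsFactors = FALSE
)

# Unique entities (nodes) appearing on either side of an interaction
nodes <- sort(unique(c(toySif$PARTICIPANT_A, toySif$PARTICIPANT_B)))

# Number of interactions of each type
typeCounts <- table(toySif$INTERACTION_TYPE)

# Number of interactions each entity participates in (undirected degree)
nodeDegree <- table(c(toySif$PARTICIPANT_A, toySif$PARTICIPANT_B))

print(nodes)
print(typeCounts)
print(nodeDegree)
```

Because every row is a single pairwise relation, conventional tabulation and graph tools apply directly, which is the motivation for the simplification described in the note above.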

2.2 Limitations

The Paxtools Java library produces a full model of a given BioPAX dataset that can be searched via code. The paxtoolsr package provides a limited subset of this functionality, mainly to produce SIF representations of networks that can be analyzed in R.

3 Basics

3.1 Installation

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("paxtoolsr")

3.2 Getting Started

Load paxtoolsr package:

library(paxtoolsr)

A list of all accessible vignettes and methods is available with the following command:

help.search("paxtoolsr")

For help on any paxtoolsr package functions, use one of the following command formats:

help(graphPc)
?graphPc

3.3 Common Function Return Types

paxtoolsr functions return two main types of values: data.frame and XMLInternalDocument. A data.frame is a table-like data structure. An XMLInternalDocument is a representation provided by the XML package; it is returned by functions that search or return raw BioPAX results. An XMLInternalDocument can be used as the input for any function requiring a BioPAX file.

4 Handling BioPAX OWL Files

paxtoolsr provides several functions for handling BioPAX OWL files: merging, validation, and conversion to other formats. Many databases with protein-protein interactions and pathway information export data in the BioPAX format; databases that support the BioPAX format can be found on PathGuide, a resource for pathway information.

4.1 Merging BioPAX Files

We illustrate how to merge two BioPAX files. Only entities that share IDs will be merged; no additional merging occurs on cross-references. The merging occurs as described further in the Java library documentation. Throughout this BioPAX and Pathway Commons tutorial we use the system.file() command to access sample BioPAX files included with the paxtoolsr package. Merging may produce warning messages that result from redundant actions being checked by the Java library; these messages may be ignored.

file1 <- system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package = "paxtoolsr")
file2 <- system.file("extdata", "biopax3-short-metabolic-pathway.owl", package = "paxtoolsr")

mergedFile <- mergeBiopax(file1, file2)

Here we summarize information about the BioPAX files provided in the paxtoolsr package. The summarize() function produces counts for various BioPAX classes and can be used to filter for BioPAX files matching particular characteristics. In the example below, we show that the merged file contains the sum of the Catalysis elements from the original two BioPAX files. This can be used to iterate over files and quickly identify those with particular properties, or to summarize across a set of files.

s1 <- summarize(file1)
s2 <- summarize(file2)
s3 <- summarize(mergedFile)

s1$Catalysis
s2$Catalysis
s3$Catalysis

## [1] "5"
## [1] "2"
## [1] "7"
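
The counts printed above are quoted, which indicates that they are character strings. A minimal sketch, using the three values copied from that output rather than a live call to summarize(), shows the conversion needed before checking the sum numerically. The vector names are labels chosen for this example.

```r
# Values copied from the output above (character strings, not numbers)
catalysisCounts <- c(file1 = "5", file2 = "2", merged = "7")

# as.numeric() drops names, so restore them after converting
n <- as.numeric(catalysisCounts)
names(n) <- names(catalysisCounts)

# The merged count equals the sum of the two input counts
mergedMatchesSum <- n[["file1"]] + n[["file2"]] == n[["merged"]]
print(mergedMatchesSum)
```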

4.2 Validating BioPAX Files

The validate() function validates BioPAX files; the types of validation performed are described in the BioPAX Validator publication by Rodchenkov I, et al.

errorLog <- validate(system.file("extdata", "raf_map_kinase_cascade_reactome.owl",
    package = "paxtoolsr"), onlyErrors = TRUE)

4.3 Converting BioPAX Files to Other Formats

It is often useful to convert BioPAX into other formats. Currently, paxtoolsr supports conversion to Gene Set Enrichment Analysis (GSEA, .gmt), Systems Biology Graphical Notation (SBGN, .sbgn), and Simple Interaction Format (SIF).

4.3.1 Simple Interaction Format (SIF) Network

The basic SIF format includes three columns: PARTICIPANT_A, INTERACTION_TYPE, and PARTICIPANT_B; the possible INTERACTION_TYPE values are described in the Pathway Commons format documentation.

sif <- toSif(system.file("extdata", "biopax3-short-metabolic-pathway.owl", package = "paxtoolsr"))

SIF representations of networks are returned as data.frame objects. They can readily be visualized in network analysis tools, such as Cytoscape, which can be accessed from R through the RCytoscape package.

head(sif)

##               PARTICIPANT_A INTERACTION_TYPE              PARTICIPANT_B
## 1  Adenosine 5'-diphosphate  used-to-produce  Adenosine 5'-triphosphate
## 2  Adenosine 5'-diphosphate      reacts-with beta-D-glucose 6-phosphate
## 3  Adenosine 5'-diphosphate  used-to-produce             beta-D-glucose
## 4 Adenosine 5'-triphosphate  used-to-produce   Adenosine 5'-diphosphate
## 5 Adenosine 5'-triphosphate  used-to-produce beta-D-glucose 6-phosphate
## 6 Adenosine 5'-triphosphate      reacts-with             beta-D-glucose
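
One way to move a SIF data.frame into an external tool such as Cytoscape is to write it as a tab-delimited text file. The sketch below assumes a data.frame with the three SIF columns; the rows are copied by hand from the first three lines of the output above so that the example runs without paxtoolsr, and the file name is a temporary placeholder.

```r
# First three rows of the head(sif) output above, entered by hand
sifExample <- data.frame(
    PARTICIPANT_A = rep("Adenosine 5'-diphosphate", 3),
    INTERACTION_TYPE = c("used-to-produce", "reacts-with", "used-to-produce"),
    PARTICIPANT_B = c("Adenosine 5'-triphosphate", "beta-D-glucose 6-phosphate",
        "beta-D-glucose"),
    stringsAsFactors = FALSE
)

# Write a tab-delimited file without quotes, row names, or a header line
sifFile <- tempfile(fileext = ".sif")
write.table(sifExample, file = sifFile, sep = "\t", quote = FALSE,
    row.names = FALSE, col.names = FALSE)

# Read the file back to confirm the round trip; quote = "" keeps the
# apostrophes in the molecule names from being treated as quote characters
roundTrip <- read.delim(sifFile, header = FALSE, stringsAsFactors = FALSE,
    quote = "", col.names = names(sifExample))
print(roundTrip)
```

Tabs are used as the separator because the participant names here contain spaces.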

4.3.2 Extended Simple Interaction Format (SIF) Network

Analysis often requires additional items of information, such as the literature references related to a resource or the name of the data source from which an interaction was derived. This information can be retrieved as part of an extended SIF network, and a BioPAX dataset can be converted to an extended SIF network.

# Select additional node and edge properties
inputFile <- system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package = "paxtoolsr")

results <- toSifnx(inputFile = inputFile)

The results object is a list with two entries: nodes and edges. nodes will be a data.table where each row corresponds to a biological entity, an EntityReference, and will contain any user-selected node properties as additional columns. Similarly, edges will be a data.table with a SIF extended with any user-selected properties for an Interaction as additional columns. Information on possible properties for an EntityReference or Interaction is available through the BioPAX ontology. It is also possible to download a pre-computed extended SIF representation for the entire Pathway Commons database that includes information about the data sources for interactions and identifiers for nodes; refer to documentation of the method for more details about the returned entries.

NOTE: Conversion of results entries from data.table to data.frame can be done using setDF in the data.table package.
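
A minimal sketch of the setDF conversion mentioned in the note follows. The one-row table is a made-up stand-in for results$edges, so the example runs without paxtoolsr; only the data.table package is required.

```r
library(data.table)

# Stand-in for results$edges; the single row is a made-up example
edges <- data.table(
    PARTICIPANT_A = "A",
    INTERACTION_TYPE = "reacts-with",
    PARTICIPANT_B = "B"
)

# setDF() converts in place (by reference), so no assignment is needed
setDF(edges)
print(class(edges))
```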

NOTE: downloadPc2 may take several minutes to complete.

results <- downloadPc2(version = "12")

It is suggested that the results of this command be saved locally, rather than running the command repeatedly, to speed up work. Caching is attempted automatically; the location of the downloaded files for this cache is available with this command:

Sys.getenv("PAXTOOLSR_CACHE")
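
Independently of the automatic cache, one simple way to keep a local copy is base R serialization with saveRDS() and readRDS(). The sketch below uses a small placeholder list in place of the object returned by downloadPc2(), whose real structure may differ, and a hypothetical file name in the session's temporary directory.

```r
# Placeholder standing in for the object returned by downloadPc2();
# its real structure may differ
results <- list(
    nodes = data.frame(PARTICIPANT = c("A", "B"), stringsAsFactors = FALSE),
    edges = data.frame(PARTICIPANT_A = "A", INTERACTION_TYPE = "reacts-with",
        PARTICIPANT_B = "B", stringsAsFactors = FALSE)
)

# Hypothetical local file; choose a permanent location in real use
localFile <- file.path(tempdir(), "pc_results.rds")

# Save once, then reload in later sessions instead of downloading again
if (!file.exists(localFile)) {
    saveRDS(results, localFile)
}
reloaded <- readRDS(localFile)
print(identical(reloaded, results))
```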

5 Searching Pathway Commons

Networks can also be loaded using Pathway Commons rather than from local BioPAX files. First, we show how Pathway Commons can be searched.

## Search Pathway Commons for 'glycolysis'-related pathways
searchResults <- searchPc(q = "glycolysis", type = "pathway")

All functions that query Pathway Commons include a flag verbose that allows users to see the query URL sent to Pathway Commons for debugging purposes.

## Search Pathway Commons for 'glycolysis'-related pathways
searchResults <- searchPc(q = "glycolysis", type = "pathway", verbose = TRUE)

## URL:  http://www.pathwaycommons.org/pc2/search.xml?q=glycolysis&page=0&type=pathway

## No encoding supplied: defaulting to UTF-8.

Pathway Commons search results are returned as an XML object.

str(searchResults)

## Classes 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr>

These results can be filtered with XPath expressions using the XML package; examples of XPath expressions and syntax are widely available online. The examples here show how to pull the name of the pathway and the URI that contains information about the pathway in the BioPAX format.

xpathSApply(searchResults, "/searchResponse/searchHit/name", xmlValue)[1]

## [1] "glycolysis I"

xpathSApply(searchResults, "/searchResponse/searchHit/pathway", xmlValue)[1]

## [1] "http://identifiers.org/reactome/R-HSA-1430728"
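
To experiment with XPath expressions without querying Pathway Commons, the sketch below parses a hand-written XML string that imitates the element names used above (searchResponse, searchHit, name, uri, biopaxClass). The document content and the example.org URIs are invented for illustration and are not a real search response.

```r
library(XML)

# Hand-written document imitating the element names used above; not a real response
xmlText <- "<searchResponse>
  <searchHit>
    <uri>http://example.org/pathway1</uri>
    <name>toy pathway one</name>
    <biopaxClass>Pathway</biopaxClass>
  </searchHit>
  <searchHit>
    <uri>http://example.org/pathway2</uri>
    <name>toy pathway two</name>
    <biopaxClass>Pathway</biopaxClass>
  </searchHit>
</searchResponse>"

doc <- xmlParse(xmlText, asText = TRUE)

# One value per searchHit element
hitNames <- xpathSApply(doc, "/searchResponse/searchHit/name", xmlValue)
hitUris <- xpathSApply(doc, "/searchResponse/searchHit/uri", xmlValue)

# A predicate selects the hit whose name matches exactly
oneUri <- xpathSApply(doc, "/searchResponse/searchHit[name='toy pathway two']/uri",
    xmlValue)

print(data.frame(name = hitNames, uri = hitUris))
print(oneUri)
```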

Alternatively, these XML results can be converted to data.frames using the XML and plyr libraries.

library(plyr)
searchResultsDf <- ldply(xmlToList(searchResults), data.frame)

# Simplified results
simplifiedSearchResultsDf <- searchResultsDf[, c("name", "uri", "biopaxClass")]
head(simplifiedSearchResultsDf)

##                                                                name
## 1                                                      glycolysis I
## 2                                                        Glycolysis
## 3                                                        Glycolysis
## 4                                                        Glycolysis
## 5 Glycolysis and Gluconeogenesis ( Glycolysis and Gluconeogenesis )
## 6  Regulation of glycolysis by fructose 2,6-bisphosphate metabolism
##                                                                       uri
## 1 http://pathwaycommons.org/pc12/Pathway_140db615921bfd883faad5acccf6474c
## 2                           http://identifiers.org/panther.pathway/P00024
## 3                             http://identifiers.org/reactome/R-HSA-70171
## 4                                 http://identifiers.org/smpdb/SMP0000040
## 5 http://pathwaycommons.org/pc12/Pathway_6f8fa42c30306904f19f8df99a6594a7
## 6                           http://identifiers.org/reactome/R-HSA-9634600
##   biopaxClass
## 1     Pathway
## 2     Pathway
## 3     Pathway
## 4     Pathway
## 5     Pathway
## 6     Pathway

This type of searching can be used to locally save BioPAX files retrieved from Pathway Commons.

## Use an XPath expression to extract the results of interest. In this case,
## the URIs (IDs) for the pathways from the results
tmpSearchResults <- xpathSApply(searchResults, "/searchResponse/searchHit/uri", xmlValue)

## Generate temporary file to save content into
biopaxFile <- tempfile()

## Extract a URI for a pathway in the search results and save into a file
idx <- which(grepl("panther", simplifiedSearchResultsDf$uri) & grepl("glycolysis",
    simplifiedSearchResultsDf$name, ignore.case = TRUE))
uri <- simplifiedSearchResultsDf$uri[idx]
saveXML(getPc(uri, format = "BIOPAX"), biopaxFile)

6 Extracting Information from BioPAX Datasets Using traverse()

The traverse function allows the extraction of specific entries from BioPAX records. traverse() functionality should be available for any uniprot.org or purl.org URI.

# Convert the UniProt ID to a URI that would be found in Pathway Commons
id <- "P31749"
uri <- paste0("http://identifiers.org/uniprot/", id)

# Get URIs for only the ModificationFeatures of the protein
xml <- traverse(uri = uri, path = "ProteinReference/entityFeature:ModificationFeature")

# Extract all the URIs
uris <- xpathSApply(xml, "//value/text()", xmlValue)

# For the first URI get the modification position and type
tmpXml <- traverse(uri = uris[1], path = "ModificationFeature/featureLocation:SequenceSite/sequencePosition")
cat("Modification Position: ", xpathSApply(tmpXml, "//value/text()", xmlValue))

## Modification Position:  14

tmpXml <- traverse(uri = uris[1], path = "ModificationFeature/modificationType/term")
cat("Modification Type: ", xpathSApply(tmpXml, "//value/text()", xmlValue))

## Modification Type:  N6-acetyllysine MOD_RES N6-acetyllysine

7 Common Data Visualization Pathways and Network Analysis

7.1 Visualizing SIF Interactions from Pathway Commons using R Graph Libraries

A common use case for paxtoolsr is to retrieve a network or sub-network from a pathway derived from a BioPAX file or a Pathway Commons query. Next, we show how to visualize, directly in R, subnetworks loaded from BioPAX files and retrieved using the Pathway Commons web service. To do this, we use the igraph R graph library because it has simple methods for loading edge lists, analyzing networks, and visualizing them.

Alternatively, these graphical plots can be made using Cytoscape, either by exporting the SIF format and then importing it into Cytoscape or by using the RCytoscape package to work with Cytoscape directly from R.

library(igraph)

We load the network from a BioPAX file using the SIF format:

sif <- toSif(system.file("extdata", "biopax3-short-metabolic-pathway.owl", package = "paxtoolsr"))

# graph_from_edgelist requires a matrix
g <- graph_from_edgelist(as.matrix(sif[, c(1, 3)]), directed = FALSE)

plot(g, layout = layout.fruchterman.reingold)

7.2 Pathway Commons Graph Query

Next, we show how to perform a graph search using Pathway Commons, which is useful for finding connections between elements and their neighborhoods. This can be used to extract the neighborhood of a single gene, which is then filtered for a specific interaction type: “controls-state-change-of”. State change here indicates a modification of another molecule (e.g. post-translational modifications). This interaction type is directed, and it is read as “A controls a state change of B”.

gene <- "BDNF"
t1 <- graphPc(source = gene, kind = "neighborhood", format = "SIF", verbose = TRUE)

## URL:  http://www.pathwaycommons.org/pc2/graph?kind=neighborhood&source=BDNF&format=SIF

t2 <- t1[which(t1[, 2] == "controls-state-change-of"), ]

# Show only 100 interactions for simplicity
g <- graph_from_edgelist(as.matrix(head(t2[, c(1, 3)], 100)), directed = FALSE)
plot(g, layout = layout.fruchterman.reingold)

The example below shows the extraction of a sub-network connecting a set of proteins:

genes <- c("AKT1", "IRS1", "MTOR", "IGF1R")
t1 <- graphPc(source = genes, kind = "PATHSBETWEEN", format = "SIF", verbose = TRUE)

## URL:  http://www.pathwaycommons.org/pc2/graph?kind=PATHSBETWEEN&source=AKT1&source=IRS1&source=MTOR&source=IGF1R&format=SIF

t2 <- t1[which(t1[, 2] == "controls-state-change-of"), ]

# Show only 100 interactions for simplicity
g <- graph_from_edgelist(as.matrix(head(t2[, c(1, 3)], 100)), directed = FALSE)
plot(g, layout = layout.fruchterman.reingold)
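
The “controls-state-change-of” interaction is described above as directed, while the plots in this section build undirected graphs. As a hedged sketch of the alternative, the example below builds a directed igraph object that keeps the interaction type as an edge attribute. The gene names and edges are toy values invented for illustration, not results from Pathway Commons.

```r
library(igraph)

# Toy SIF rows; gene names and interactions are illustrative assumptions
toySif <- data.frame(
    PARTICIPANT_A = c("GENE1", "GENE1", "GENE2"),
    INTERACTION_TYPE = rep("controls-state-change-of", 3),
    PARTICIPANT_B = c("GENE2", "GENE3", "GENE3"),
    stringsAsFactors = FALSE
)

# graph_from_data_frame() expects source and target in the first two columns;
# remaining columns become edge attributes
edgeDf <- toySif[, c("PARTICIPANT_A", "PARTICIPANT_B", "INTERACTION_TYPE")]
gDirected <- graph_from_data_frame(edgeDf, directed = TRUE)

# Outgoing and incoming edge counts per node
outDeg <- degree(gDirected, mode = "out")
inDeg <- degree(gDirected, mode = "in")

print(outDeg)
print(inDeg)
print(E(gDirected)$INTERACTION_TYPE)
```

Keeping direction lets the controller (outgoing edges) be distinguished from the controlled molecule (incoming edges), at the cost of statistics that differ from the undirected examples in this tutorial.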

7.3 Overlaying Experimental Data on Pathway Commons Networks

Often, it is useful not only to visualize a biological pathway, but also to overlay a given network with some form of biological data, such as gene expression values for genes in the network.

library(RColorBrewer)

# Generate a color palette that goes from white to red that contains 10 colors
numColors <- 10
colors <- colorRampPalette(brewer.pal(9, "Reds"))(numColors)

# Generate values that could represent some experimental values
values <- runif(length(V(g)$name))

# Scale values to generate indices from the color palette
xrange <- range(values)
newrange <- c(1, numColors)

scaleFactor <- (newrange[2] - newrange[1])/(xrange[2] - xrange[1])
scaledValues <- newrange[1] + (values - xrange[1]) * scaleFactor
indices <- as.integer(scaledValues)

# Color the nodes using the indices and the color palette created above
g <- set_vertex_attr(g, "color", value = colors[indices])

# vertex_attr(g, 'color')

plot(g, layout = layout.fruchterman.reingold)
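
An alternative way to map values onto palette indices is base R's cut(), which assigns each value to one of a fixed number of equal-width bins. This is a sketch of a different binning approach, not the scaling used above; the palette is built with base R only, and the values are fixed toy numbers standing in for experimental measurements.

```r
# Palette from white to red (base R only; RColorBrewer is not needed here)
numColors <- 10
palette10 <- colorRampPalette(c("white", "red"))(numColors)

# Fixed toy values standing in for experimental measurements
values <- c(0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95)

# cut() splits the observed range into numColors equal-width bins and
# returns the bin number of each value when labels = FALSE
bins <- cut(values, breaks = numColors, labels = FALSE)
nodeColors <- palette10[bins]

print(bins)
print(nodeColors)
```

With the truncation used above, only the maximum value receives the last color; with cut(), the values are spread over equal-width bins, so the top bin can hold more than one value.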

7.4 Network Statistics

Often it is useful to produce statistics on a network. Here we show how to determine SIF network statistics and statistics on BioPAX files.

7.4.1 SIF Network Statistics

Once Pathway Commons and BioPAX networks are loaded as graphs using the igraph R package, it is possible to analyze these networks. Here we show how to get common statistics for the current network retrieved from Pathway Commons:

# Degree for each node in the igraph network
degree(g)

##   ACVR1    AKT1    AKT2    AKT3  ACVR1B  ACVR2A  ACVR2B  ACVRL1  AKT1S1  DEPTOR 
##       3      42      31      27       3       3       3       3       2       2 
##    IRS1 LAMTOR1 LAMTOR2 LAMTOR3 LAMTOR4 LAMTOR5   MLST8    MTOR    RHEB    RORC 
##       1       1       1       1       1       1       2       3       1       3 
##   RPTOR   RRAGA   RRAGB   RRAGD SLC38A9     ALK   AMHR2    ARAF  BMPR1A  BMPR1B 
##       2       1       1       1       1       3       3       3       3       3 
##   BMPR2    BRAF     BTK  CAMK2A  CAMK2B  CAMK2D  CAMK2G    CCNH    CDC7    CDK1 
##       3       3       3       3       3       3       3       3       3       3 
##    CDK2    CDK3    CDK4    CDK5    CDK6 
##       3       3       3       3       3

# Number of nodes
length(V(g)$name)

## [1] 45

# Clustering coefficient
transitivity(g)

## [1] 0

# Network density
edge_density(g)

## [1] 0.1010101

# Network diameter
diameter(g)

## [1] 3

Another common task is to determine paths between nodes in a network.

# Get the first shortest path between two nodes
tmp <- shortest_paths(g, from = "IRS1", to = "MTOR")

# shortest_paths() returns a list; its vpath element holds one vertex
# sequence per target node
path <- tmp$vpath[[1]]

# Convert from indices to vertex names
V(g)$name[path]

## [1] "IRS1" "AKT1" "MTOR"
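
When only the length of the shortest path is needed, igraph's distances() function returns it directly. The sketch below uses a small toy network whose edges are assumptions chosen to mirror the path shown above; it is not the network retrieved from Pathway Commons.

```r
library(igraph)

# Toy undirected network; edges are assumptions chosen to mirror the path above
edgeMatrix <- matrix(c(
    "IRS1", "AKT1",
    "AKT1", "MTOR",
    "AKT1", "IGF1R"
), ncol = 2, byrow = TRUE)
gToy <- graph_from_edgelist(edgeMatrix, directed = FALSE)

# Number of edges on the shortest path between two nodes
nSteps <- distances(gToy, v = "IRS1", to = "MTOR")[1, 1]

# Vertex names along that path
pathNames <- as_ids(shortest_paths(gToy, from = "IRS1", to = "MTOR")$vpath[[1]])

print(nSteps)
print(pathNames)
```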

8 Gene Set Enrichment Analysis with Pathway Commons

The processing of the microarray data follows the Bioconductor Tutorial on Microarray Processing and Gene Set Analysis, adapted here to take gene sets from a Pathway Commons pathway. It uses the same data as that tutorial, as stored in the estrogen R package.

To access microarray data sets, users should consider retrieving data from the NCBI Gene Expression Omnibus (GEO) using the GEOquery package.

The first thing we’ll do is load up the necessary packages.

library(paxtoolsr)  # To retrieve data from Pathway Commons
library(clusterProfiler)  # Enrichment analysis
library(org.Hs.eg.db)
library(XML)  # To parse XML files

We then retrieve a pathway of interest using the Pathway Commons search functionality.

# Search Pathway Commons for 'glycolysis'-related pathways to generate a gene set
searchResults <- searchPc(q = "glycolysis", type = "pathway")

## Use an XPath expression to extract the results of interest. In this case,
## the URIs (IDs) for the pathways from the results
searchResults <- xpathSApply(searchResults, "/searchResponse/searchHit/uri", xmlValue)

## Generate temporary files to save content into
biopaxFile <- tempfile()

## Extract the URI for the second pathway in the search results and save it
## into a file
uri <- searchResults[2]
saveXML(getPc(uri, "BIOPAX"), biopaxFile)

And then, we convert this pathway to a gene set.

## Generate temporary files to save content into
gseaFile <- tempfile()

## Generate a gene set for the BioPAX pathway with gene symbols
## NOTE: Not all search results are guaranteed to result in gene sets
tmp <- toGSEA(biopaxFile, gseaFile, "HGNC Symbol", FALSE)
geneSet <- tmp$geneSet

Finally, we analyze a gene list with the enrichment analysis functions of the clusterProfiler Bioconductor package, using Pathway Commons gene sets obtained from either the toGSEA or downloadPc2 functions.

library(clusterProfiler)

# Example gene list, such as one produced at the end of an analysis
geneList <- c("ALDOA", "ENO1", "GAPDH", "GPI", "HK1", "PFKL", "PGK1", "PKM")

# Read the Pathway Commons V12 KEGG dataset included with the package
gmt <- readGmt(system.file("extdata", "test_PathwayCommons12.kegg.hgnc.gmt", package = "paxtoolsr"),
    returnInfo = TRUE)

geneSetList <- lapply(seq_along(gmt), function(x, n, i) {
    tmp <- x[[i]]
    data.frame(id = n[i], name = tmp[["name"]], gene = tmp[["geneSet"]], stringsAsFactors = FALSE)
}, x = gmt, n = names(gmt))

tmp <- do.call("rbind", geneSetList)
rownames(tmp) <- 1:nrow(tmp)  # For convenience 

pc2gene <- tmp[, c("id", "gene")]
pc2name <- tmp[, c("id", "name")]

enrichOutput <- clusterProfiler::enricher(geneList, pvalueCutoff = 0.05, minGSSize = 10,
    maxGSSize = 500, TERM2GENE = pc2gene, TERM2NAME = pc2name)
enrichOutput@result

##                                                                                        ID
## http://identifiers.org/kegg.pathway/hsa00010 http://identifiers.org/kegg.pathway/hsa00010
## http://identifiers.org/kegg.pathway/hsa00500 http://identifiers.org/kegg.pathway/hsa00500
## http://identifiers.org/kegg.pathway/hsa00520 http://identifiers.org/kegg.pathway/hsa00520
## http://identifiers.org/kegg.pathway/hsa00030 http://identifiers.org/kegg.pathway/hsa00030
## http://identifiers.org/kegg.pathway/hsa00052 http://identifiers.org/kegg.pathway/hsa00052
## http://identifiers.org/kegg.pathway/hsa00051 http://identifiers.org/kegg.pathway/hsa00051
## http://identifiers.org/kegg.pathway/hsa00620 http://identifiers.org/kegg.pathway/hsa00620
## http://identifiers.org/kegg.pathway/hsa00230 http://identifiers.org/kegg.pathway/hsa00230
##                                                                              Description
## http://identifiers.org/kegg.pathway/hsa00010                Glycolysis / Gluconeogenesis
## http://identifiers.org/kegg.pathway/hsa00500               Starch and sucrose metabolism
## http://identifiers.org/kegg.pathway/hsa00520 Amino sugar and nucleotide sugar metabolism
## http://identifiers.org/kegg.pathway/hsa00030                   Pentose phosphate pathway
## http://identifiers.org/kegg.pathway/hsa00052                        Galactose metabolism
## http://identifiers.org/kegg.pathway/hsa00051             Fructose and mannose metabolism
## http://identifiers.org/kegg.pathway/hsa00620                         Pyruvate metabolism
## http://identifiers.org/kegg.pathway/hsa00230                           Purine metabolism
##                                              GeneRatio BgRatio       pvalue
## http://identifiers.org/kegg.pathway/hsa00010       5/5  35/866 8.091085e-08
## http://identifiers.org/kegg.pathway/hsa00500       2/5  30/866 1.087885e-02
## http://identifiers.org/kegg.pathway/hsa00520       2/5  40/866 1.905166e-02
## http://identifiers.org/kegg.pathway/hsa00030       1/5  18/866 9.991613e-02
## http://identifiers.org/kegg.pathway/hsa00052       1/5  21/866 1.157623e-01
## http://identifiers.org/kegg.pathway/hsa00051       1/5  25/866 1.365426e-01
## http://identifiers.org/kegg.pathway/hsa00620       1/5  27/866 1.467852e-01
## http://identifiers.org/kegg.pathway/hsa00230       1/5  56/866 2.847009e-01
##                                                  p.adjust       qvalue
## http://identifiers.org/kegg.pathway/hsa00010 6.472868e-07 4.258466e-07
## http://identifiers.org/kegg.pathway/hsa00500 4.351539e-02 2.862855e-02
## http://identifiers.org/kegg.pathway/hsa00520 5.080444e-02 3.342397e-02
## http://identifiers.org/kegg.pathway/hsa00030 1.677545e-01 1.103648e-01
## http://identifiers.org/kegg.pathway/hsa00052 1.677545e-01 1.103648e-01
## http://identifiers.org/kegg.pathway/hsa00051 1.677545e-01 1.103648e-01
## http://identifiers.org/kegg.pathway/hsa00620 1.677545e-01 1.103648e-01
## http://identifiers.org/kegg.pathway/hsa00230 2.847009e-01 1.873032e-01
##                                                              geneID Count
## http://identifiers.org/kegg.pathway/hsa00010 ENO1/GAPDH/GPI/HK1/PKM     5
## http://identifiers.org/kegg.pathway/hsa00500                GPI/HK1     2
## http://identifiers.org/kegg.pathway/hsa00520                GPI/HK1     2
## http://identifiers.org/kegg.pathway/hsa00030                    GPI     1
## http://identifiers.org/kegg.pathway/hsa00052                    HK1     1
## http://identifiers.org/kegg.pathway/hsa00051                    HK1     1
## http://identifiers.org/kegg.pathway/hsa00620                    PKM     1
## http://identifiers.org/kegg.pathway/hsa00230                    PKM     1
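
Over-representation tests of this kind are commonly computed with the hypergeometric distribution. As a sketch under that assumption, the example below recomputes the p-value for the first row of the output above from its GeneRatio (5/5) and BgRatio (35/866) using base R's phyper(). The variable names are chosen for this example and are not part of clusterProfiler.

```r
# Numbers taken from the first row of the output above
k <- 5    # genes from the list found in the gene set (GeneRatio numerator)
n <- 5    # genes from the list present in the background (GeneRatio denominator)
M <- 35   # genes in the gene set (BgRatio numerator)
N <- 866  # genes in the background (BgRatio denominator)

# Probability of observing k or more gene-set members among n draws
pValue <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(signif(pValue, 7))
```

The result should agree with the pvalue column for that row. Note that the GeneRatio denominator is 5 rather than 8, which suggests that only the genes from the list that appear in the background gene sets are counted.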

9 ID Mapping

paxtoolsr functions and their results can be used in conjunction with the ID mapping functions of the clusterProfiler Bioconductor package.

sif <- toSif(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package = "paxtoolsr"))

ids <- c(sif$PARTICIPANT_A, sif$PARTICIPANT_B)

output <- clusterProfiler::bitr(ids, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Hs.eg.db")
output

10 Troubleshooting

10.1 File Paths

Use properly delimited, full paths to files with the paxtoolsr package; do not use relative paths, such as ../directory/file or ~/directory/file.

toSif("/directory/file")
# or
toSif("X:\\directory\\file")
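
A relative or home-relative path can be resolved to a full path with base R before it is passed on. The sketch below defines a hypothetical helper, toAbsolutePath(), which is not part of paxtoolsr, and demonstrates it on a throwaway empty file standing in for a BioPAX file.

```r
# Hypothetical helper (not part of paxtoolsr): resolve a path to a full path
toAbsolutePath <- function(path) {
    # path.expand() handles a leading "~"; mustWork = TRUE raises an error
    # early if the file does not exist
    normalizePath(path.expand(path), winslash = "/", mustWork = TRUE)
}

# Throwaway empty file standing in for a BioPAX file
tmpDir <- tempfile("biopax_demo_")
dir.create(tmpDir)
file.create(file.path(tmpDir, "example.owl"))

# Resolve a relative path against the current working directory
oldWd <- setwd(tmpDir)
absPath <- toAbsolutePath("example.owl")
setwd(oldWd)

print(absPath)
```

The resolved path could then be used in place of the relative one, for example as the argument to toSif().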

10.2 Memory Limits: Specify JVM Maximum Heap Size

By default paxtoolsr uses a maximum heap size limit of 512MB. For large BioPAX files, this limit may be insufficient. The code below shows how to change this limit and observe that the change was made.

NOTE: This limit cannot be changed once the virtual machine has been initialized by loading the library, so the memory heap size limit must be changed beforehand.

options(java.parameters = "-Xmx1024m")

library(paxtoolsr)

# Megabyte size
mbSize <- 1048576

runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
maxMemory <- .jcall(runtime, "J", "maxMemory")
maxMemoryMb <- maxMemory/mbSize
cat("Max Memory: ", maxMemoryMb, "\n")

11 Session Information

sessionInfo()

## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
##  [3] LC_TIME=en_GB                 LC_COLLATE=C                 
##  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
##  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
##  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
## [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] org.Hs.eg.db_3.19.1    AnnotationDbi_1.67.0   IRanges_2.39.0        
##  [4] S4Vectors_0.43.0       Biobase_2.65.0         BiocGenerics_0.51.0   
##  [7] clusterProfiler_4.13.0 RColorBrewer_1.1-3     igraph_2.0.3          
## [10] plyr_1.8.9             paxtoolsr_1.39.0       XML_3.99-0.16.1       
## [13] rJava_1.0-11           knitr_1.47             BiocStyle_2.33.0      
## 
## loaded via a namespace (and not attached):
##   [1] jsonlite_1.8.8          magrittr_2.0.3          magick_2.8.3           
##   [4] farver_2.1.2            rmarkdown_2.27          fs_1.6.4               
##   [7] zlibbioc_1.51.0         vctrs_0.6.5             memoise_2.0.1          
##  [10] ggtree_3.13.0           tinytex_0.51            htmltools_0.5.8.1      
##  [13] curl_5.2.1              gridGraphics_0.5-1      sass_0.4.9             
##  [16] bslib_0.7.0             cachem_1.1.0            lifecycle_1.0.4        
##  [19] pkgconfig_2.0.3         gson_0.1.0              Matrix_1.7-0           
##  [22] R6_2.5.1                fastmap_1.2.0           GenomeInfoDbData_1.2.12
##  [25] digest_0.6.35           aplot_0.2.2             enrichplot_1.25.0      
##  [28] colorspace_2.1-0        patchwork_1.2.0         RSQLite_2.3.7          
##  [31] fansi_1.0.6             httr_1.4.7              polyclip_1.10-6        
##  [34] compiler_4.4.0          bit64_4.0.5             withr_3.0.0            
##  [37] BiocParallel_1.39.0     viridis_0.6.5           DBI_1.2.2              
##  [40] highr_0.11              ggforce_0.4.2           R.utils_2.12.3         
##  [43] MASS_7.3-60.2           rappdirs_0.3.3          rjson_0.2.21           
##  [46] HDO.db_0.99.1           tools_4.4.0             scatterpie_0.2.2       
##  [49] ape_5.8                 R.oo_1.26.0             glue_1.7.0             
##  [52] nlme_3.1-164            GOSemSim_2.31.0         shadowtext_0.1.3       
##  [55] grid_4.4.0              reshape2_1.4.4          fgsea_1.31.0           
##  [58] generics_0.1.3          gtable_0.3.5            tzdb_0.4.0             
##  [61] R.methodsS3_1.8.2       tidyr_1.3.1             data.table_1.15.4      
##  [64] hms_1.1.3               tidygraph_1.3.1         utf8_1.2.4             
##  [67] XVector_0.45.0          ggrepel_0.9.5           pillar_1.9.0           
##  [70] stringr_1.5.1           yulab.utils_0.1.4       vroom_1.6.5            
##  [73] splines_4.4.0           dplyr_1.1.4             tweenr_2.0.3           
##  [76] treeio_1.29.0           lattice_0.22-6          bit_4.0.5              
##  [79] tidyselect_1.2.1        GO.db_3.19.1            Biostrings_2.73.1      
##  [82] gridExtra_2.3           bookdown_0.39           xfun_0.44              
##  [85] graphlayouts_1.1.1      stringi_1.8.4           UCSC.utils_1.1.0       
##  [88] lazyeval_0.2.2          ggfun_0.1.5             yaml_2.3.8             
##  [91] evaluate_0.23           codetools_0.2-20        ggraph_2.2.1           
##  [94] archive_1.1.8           tibble_3.2.1            qvalue_2.37.0          
##  [97] BiocManager_1.30.23     ggplotify_0.1.2         cli_3.6.2              
## [100] munsell_0.5.1           jquerylib_0.1.4         Rcpp_1.0.12            
## [103] GenomeInfoDb_1.41.1     png_0.1-8               parallel_4.4.0         
## [106] ggplot2_3.5.1           readr_2.1.5             blob_1.2.4             
## [109] DOSE_3.31.1             viridisLite_0.4.2       tidytree_0.4.6         
## [112] scales_1.3.0            purrr_1.0.2             crayon_1.5.2           
## [115] rlang_1.1.3             cowplot_1.1.3           fastmatch_1.1-4        
## [118] KEGGREST_1.45.0         formatR_1.14

12 References

Appendix

Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N, Schultz N, Bader GD, Sander C. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 2011 Jan;39(Database issue):D685-90. doi: 10.1093/nar/gkq1039. Epub 2010 Nov 10.
Rodchenkov I, Demir E, Sander C, Bader GD. The BioPAX Validator. Bioinformatics. 2013 Oct 15;29(20):2659-60. doi: 10.1093/bioinformatics/btt452. Epub 2013 Aug 5.

Using PaxtoolsR: A BioPAX and Pathway Commons Tutorial in R

02 June, 2024
