MetaGxOvarian: A Package for Ovarian Cancer Gene Expression Analysis

Michael Zon, Bioinformatics and Computational Genomics Laboratory, Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada

Deena M.A. Gendoo, Bioinformatics and Computational Genomics Laboratory, Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Canada

Natchar Ratanasirigulchai, Bioinformatics and Computational Genomics Laboratory, Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada

Gregory Chen, Department of Medical Biophysics, University of Toronto, Toronto, Canada

Levi Waldron, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA

Benjamin Haibe-Kains (benjamin.haibe.kains@utoronto.ca), Department of Medical Biophysics, University of Toronto, Toronto, Canada

Installing the Package

The MetaGxOvarian package is a compendium of ovarian cancer datasets. The package is publicly available and can be installed from Bioconductor in R version 3.6.0 or later.

To install the MetaGxOvarian package from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MetaGxOvarian")
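
Before installing, it can be useful to confirm that the running R session meets the minimum version stated above. The following is a minimal sketch using base R only; the variable name `meets_minimum` is chosen for this example.

```r
# Compare the running R version against the stated minimum (3.6.0).
meets_minimum <- getRversion() >= "3.6.0"

if (!meets_minimum) {
  message("R ", as.character(getRversion()),
          " is older than 3.6.0; please upgrade before installing.")
}

meets_minimum
```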

Loading Datasets

First, load the MetaGxOvarian package into the workspace and retrieve the expression datasets with the following commands:

library(MetaGxOvarian)
# Keep the first element of the returned list, which holds the expression datasets
esets <- MetaGxOvarian::loadOvarianEsets()[[1]]

This will load 26 expression datasets. Users can modify the parameters of the function to exclude datasets that do not meet certain criteria.
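
To illustrate what a loading criterion does, the sketch below applies a minimum sample size to a list of datasets after they have been loaded. It uses toy matrices in place of the real datasets. The names `toy_esets` and `min_samples` are hypothetical and chosen for this example; they are not part of the package.

```r
# Toy stand-ins for expression datasets: rows are genes, columns are samples.
# (Assumption: the real datasets would come from loadOvarianEsets().)
toy_esets <- list(
  studyA = matrix(rnorm(5 * 40), nrow = 5),
  studyB = matrix(rnorm(5 * 12), nrow = 5),
  studyC = matrix(rnorm(5 * 75), nrow = 5)
)

# Hypothetical threshold: keep datasets with at least this many samples.
min_samples <- 30

# Filter() keeps the list elements for which the predicate returns TRUE.
kept <- Filter(function(x) ncol(x) >= min_samples, toy_esets)

names(kept)
```

Filtering after loading keeps the criterion visible in the analysis script, at the cost of loading datasets that are then discarded.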

Obtaining Sample Counts in Datasets

numSamples <- vapply(seq_along(esets), FUN=function(i, esets) {
    length(sampleNames(esets[[i]]))
    }, numeric(1), esets=esets)


SampleNumberSummaryAll <- data.frame(NumberOfSamples = numSamples,
                                     row.names = names(esets))
total <- sum(SampleNumberSummaryAll[,"NumberOfSamples"])
SampleNumberSummaryAll <- rbind(SampleNumberSummaryAll, total)
rownames(SampleNumberSummaryAll)[nrow(SampleNumberSummaryAll)] <- "Total"

library(xtable)  # provides xtable() for rendering the summary as a LaTeX table
xtable(SampleNumberSummaryAll, digits = 2)
## % latex table generated in R 4.4.0 by xtable 1.8-4 package
## % Tue Oct 31 10:41:04 2023
## \begin{table}[ht]
## \centering
## \begin{tabular}{rr}
##   \hline
##  & NumberOfSamples \\ 
##   \hline
## E.MTAB.386 & 129.00 \\ 
##   GSE2109 & 202.00 \\ 
##   GSE6008 & 101.00 \\ 
##   GSE6822 & 62.00 \\ 
##   GSE8842 & 83.00 \\ 
##   GSE9891 & 276.00 \\ 
##   GSE12418 & 54.00 \\ 
##   GSE12470 & 49.00 \\ 
##   GSE13876 & 157.00 \\ 
##   GSE14764 & 79.00 \\ 
##   GSE17260 & 110.00 \\ 
##   GSE18520 & 59.00 \\ 
##   GSE20565 & 135.00 \\ 
##   GSE26193 & 14.00 \\ 
##   GSE26712 & 191.00 \\ 
##   GSE30009 & 103.00 \\ 
##   GSE30161 & 58.00 \\ 
##   GSE32062 & 257.00 \\ 
##   GSE32063 & 40.00 \\ 
##   GSE44104 & 47.00 \\ 
##   GSE49997 & 204.00 \\ 
##   GSE51088 & 172.00 \\ 
##   PMID15897565 & 63.00 \\ 
##   PMID17290060 & 117.00 \\ 
##   PMID19318476 & 42.00 \\ 
##   TCGAOVARIAN & 536.00 \\ 
##   Total & 3340.00 \\ 
##    \hline
## \end{tabular}
## \end{table}
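
The same kind of per-dataset count can also be obtained by iterating over the list itself rather than over its indices, which keeps the dataset names on the result. The sketch below shows the pattern on toy matrices whose columns are samples; `toy_esets` is a hypothetical stand-in for `esets`.

```r
# Toy stand-ins for expression datasets (columns are samples).
toy_esets <- list(
  studyA = matrix(0, nrow = 3, ncol = 4),
  studyB = matrix(0, nrow = 3, ncol = 6)
)

# vapply() over the named list returns a named integer vector.
num_samples <- vapply(toy_esets, ncol, integer(1))

# Build the summary table; row names are taken from the named vector.
summary_table <- data.frame(NumberOfSamples = num_samples)

# A named argument to rbind() becomes the row name of the appended row.
summary_table <- rbind(summary_table,
                       Total = sum(summary_table$NumberOfSamples))
summary_table
```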

Access Phenotype Data

We can also obtain a summary of the phenotype data (pData) for each expression dataset. Here, we assess the percentage of samples in each dataset that have a value for a specific pData variable.

pDataID <- c("sample_type", "histological_type", "primarysite", "summarygrade",
             "summarystage", "tumorstage", "grade",
             "age_at_initial_pathologic_diagnosis", "pltx", "tax",
             "neo", "days_to_tumor_recurrence", "recurrence_status",
             "days_to_death", "vital_status")


pDataPercentSummaryTable <- NULL
pDataSummaryNumbersTable <- NULL

pDataSummaryNumbersList = lapply(esets, function(x)
  vapply(pDataID, function(y) sum(!is.na(pData(x)[,y])), numeric(1)))

pDataPercentSummaryList = lapply(esets, function(x)
  vapply(pDataID, function(y)
    sum(!is.na(pData(x)[,y]))/nrow(pData(x)), numeric(1))*100)

pDataSummaryNumbersTable = sapply(pDataSummaryNumbersList, function(x) x)
pDataPercentSummaryTable = sapply(pDataPercentSummaryList, function(x) x)

rownames(pDataSummaryNumbersTable) <- pDataID
rownames(pDataPercentSummaryTable) <- pDataID
colnames(pDataSummaryNumbersTable) <- names(esets)
colnames(pDataPercentSummaryTable) <- names(esets)

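# Append the number of samples in each dataset as a final row
# (the scalar `total` would be recycled identically across all columns)
pDataSummaryNumbersTable <- rbind(pDataSummaryNumbersTable, numSamples)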
rownames(pDataSummaryNumbersTable)[nrow(pDataSummaryNumbersTable)] <- "Total"


# Generate a heatmap representation of the pData
pDataPercentSummaryTable<-t(pDataPercentSummaryTable)
pDataPercentSummaryTable<-cbind(Name=(rownames(pDataPercentSummaryTable))
                                ,pDataPercentSummaryTable)

nba<-pDataPercentSummaryTable
gradient_colors = c("#ffffff","#ffffd9","#edf8b1","#c7e9b4","#7fcdbb",
                    "#41b6c4","#1d91c0","#225ea8","#253494","#081d58")

library(lattice)
nbamat<-as.matrix(nba)
rownames(nbamat)<-nbamat[,1]
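nbamat<-nbamat[,-1]
# cbind() with the Name column coerced the values to character; restore numeric
mode(nbamat) <- "numeric"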
Interval<-as.numeric(c(10,20,30,40,50,60,70,80,90,100))

levelplot(nbamat, col.regions = gradient_colors,
          main = "Available Clinical Annotation",
          scales = list(x = list(rot = 90, cex = 0.5),
                        y = list(cex = 0.5)),
          # 11 break points define 10 bins, one per colour
          at = seq(from = 0, to = 100, length.out = 11),
          ylab = "", xlab = "",
          colorkey = list(at = seq(from = 0, to = 100, by = 10),
                          labels = list(at = seq(from = 0, to = 100, by = 10),
                                        labels = c("0", "10%", "20%", "30%",
                                                   "40%", "50%", "60%", "70%",
                                                   "80%", "90%", "100%"),
                                        cex = 0.5),
                          col = gradient_colors))

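Figure: heatmap titled "Available Clinical Annotation", showing the percentage of samples with each clinical variable available in each dataset.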
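
The percentage calculation behind this summary can be sketched on a small toy table. This is an illustration only: `toy_pdata` is a made-up phenotype table, and its values are not drawn from any dataset in the package.

```r
# Toy phenotype table; NA marks a missing annotation.
toy_pdata <- data.frame(
  grade        = c(1, 2, NA, 3),
  vital_status = c("living", NA, NA, "deceased"),
  tumorstage   = c(3, 3, 4, 2)
)

# is.na() on a data frame returns a logical matrix, so colMeans() of its
# negation gives the fraction of non-missing entries in each column.
percent_available <- colMeans(!is.na(toy_pdata)) * 100
percent_available
```

Here `grade` is available for 75% of the toy samples, `vital_status` for 50%, and `tumorstage` for 100%.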

Session Info

## R Under development (unstable) (2023-10-22 r85388)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] lattice_0.22-5              MetaGxOvarian_1.23.0       
##  [3] SummarizedExperiment_1.33.0 GenomicRanges_1.55.1       
##  [5] GenomeInfoDb_1.39.0         IRanges_2.37.0             
##  [7] S4Vectors_0.41.1            MatrixGenerics_1.15.0      
##  [9] matrixStats_1.0.0           ExperimentHub_2.11.0       
## [11] AnnotationHub_3.11.0        BiocFileCache_2.11.1       
## [13] dbplyr_2.4.0                Biobase_2.63.0             
## [15] BiocGenerics_0.49.0         xtable_1.8-4               
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.43.0               impute_1.77.0                
##  [3] xfun_0.40                     vctrs_0.6.4                  
##  [5] tools_4.4.0                   bitops_1.0-7                 
##  [7] generics_0.1.3                curl_5.1.0                   
##  [9] tibble_3.2.1                  fansi_1.0.5                  
## [11] AnnotationDbi_1.65.0          RSQLite_2.3.2                
## [13] highr_0.10                    blob_1.2.4                   
## [15] pkgconfig_2.0.3               Matrix_1.6-1.1               
## [17] lifecycle_1.0.3               GenomeInfoDbData_1.2.11      
## [19] compiler_4.4.0                Biostrings_2.71.1            
## [21] httpuv_1.6.12                 htmltools_0.5.6.1            
## [23] RCurl_1.98-1.12               yaml_2.3.7                   
## [25] interactiveDisplayBase_1.41.0 later_1.3.1                  
## [27] pillar_1.9.0                  crayon_1.5.2                 
## [29] ellipsis_0.3.2                DelayedArray_0.29.0          
## [31] cachem_1.0.8                  abind_1.4-5                  
## [33] mime_0.12                     tidyselect_1.2.0             
## [35] digest_0.6.33                 purrr_1.0.2                  
## [37] dplyr_1.1.3                   BiocVersion_3.19.1           
## [39] grid_4.4.0                    fastmap_1.1.1                
## [41] SparseArray_1.3.0             cli_3.6.1                    
## [43] magrittr_2.0.3                S4Arrays_1.3.0               
## [45] utf8_1.2.4                    withr_2.5.2                  
## [47] filelock_1.0.2                promises_1.2.1               
## [49] rappdirs_0.3.3                bit64_4.0.5                  
## [51] XVector_0.43.0                httr_1.4.7                   
## [53] bit_4.0.5                     png_0.1-8                    
## [55] memoise_2.0.1                 shiny_1.7.5.1                
## [57] evaluate_0.22                 knitr_1.45                   
## [59] rlang_1.1.1                   Rcpp_1.0.11                  
## [61] glue_1.6.2                    DBI_1.1.3                    
## [63] BiocManager_1.30.22           R6_2.5.1                     
## [65] zlibbioc_1.49.0