1 Introduction

The depmap package aims to provide a reproducible research framework for the cancer dependency data described by Tsherniak, Aviad, et al. “Defining a cancer dependency map.” Cell 170.3 (2017): 564-576. The data in the depmap package have been formatted to facilitate the use of common R packages such as dplyr and ggplot2. We hope that this package will allow researchers to more easily mine, explore and visualise dependency data taken from the Depmap cancer genomic dependency study.

2 Installation instructions

To install depmap, the BiocManager package (the Bioconductor project package manager) is required. If BiocManager is not already installed, install it first from within R; this needs to be done only once.

install.packages("BiocManager")
BiocManager::install("UCLouvain-CBIO/depmap")

The depmap package depends on the ExperimentHub Bioconductor package, which allows the data accessed through this package to be stored in and retrieved from the cloud.

library("depmap")
library("ExperimentHub")

3 Available data

The depmap package currently provides seven datasets through ExperimentHub. The query below returns 14 records because the datasets are listed once per release (here 19Q1 and 19Q2).

The data in this R package have been converted from “wide” format .csv files to “long” format .rda files. None of the values taken from the original datasets have been changed, although the columns have been re-arranged. The changes are described in the Details section returned when querying the relevant dataset.
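
As a rough illustration of the wide-to-long conversion described above, the following sketch reshapes a small made-up table using base R. The identifiers and values are invented, and this is not the code used to build the package; it only mimics the gene, gene_name and entrez_id layout of the tibbles shown in the following sections.

```r
## toy "wide" table: one row per cell line, one column per gene (invented values)
wide <- data.frame(
  depmap_id   = c("ACH-900001", "ACH-900002"),
  "A1BG (1)"  = c(0.10, -0.20),
  "NAT2 (10)" = c(NA, 0.05),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

## every column except the identifier holds one gene
gene_cols <- setdiff(names(wide), "depmap_id")

## "long" table: one row per (cell line, gene) pair
long <- data.frame(
  depmap_id  = rep(wide$depmap_id, times = length(gene_cols)),
  gene       = rep(gene_cols, each = nrow(wide)),
  dependency = unlist(wide[gene_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

## split "SYMBOL (ID)" into its two parts
long$gene_name <- sub(" \\(.*\\)$", "", long$gene)
long$entrez_id <- sub("^.*\\((.*)\\)$", "\\1", long$gene)
long
```

The values themselves are untouched by the reshaping; only their arrangement changes, which is the property the package description relies on.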

## create ExperimentHub query object
eh <- ExperimentHub()
## snapshotDate(): 2019-07-10
query(eh, "depmap")
## ExperimentHub with 14 records
## # snapshotDate(): 2019-07-10 
## # $dataprovider: Broad Institute
## # $species: Homo sapiens
## # $rdataclass: tibble
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass,
## #   tags, rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH2260"]]' 
## 
##            title             
##   EH2260 | rnai_19Q1         
##   EH2261 | crispr_19Q1       
##   EH2262 | copyNumber_19Q1   
##   EH2263 | RPPA_19Q1         
##   EH2264 | TPM_19Q1          
##   ...      ...               
##   EH2552 | copyNumber_19Q2   
##   EH2553 | RPPA_19Q2         
##   EH2554 | TPM_19Q2          
##   EH2555 | mutationCalls_19Q2
##   EH2556 | metadata_19Q2

Each dataset has an ExperimentHub accession number (e.g. EH2260).

3.1 RNA interference knockdown data

## see ?depmap and browseVignettes('depmap') for documentation
## downloading 0 resources
## loading from cache 
##     'EH2260 : 2260'

The rnai dataset contains the combined genetic dependency data for RNAi-induced gene knockdown for select genes and cancer cell lines. This data corresponds to the D2_combined_genetic_dependency_scores.csv file found in the 19Q1 depmap release and includes 17309 genes, 712 cell lines, 30 primary diseases and 31 lineages.

## access `rnai_19Q1` by EH number
rnai <- eh[["EH2260"]]
rnai
## # A tibble: 12,324,008 x 6
##    depmap_id  cell_line        gene            gene_name entrez_id dependency
##    <chr>      <chr>            <chr>           <chr>     <chr>          <dbl>
##  1 ACH-001270 127399_SOFT_TIS… A1BG (1)        A1BG      1             NA    
##  2 ACH-001270 127399_SOFT_TIS… NAT2 (10)       NAT2      10            NA    
##  3 ACH-001270 127399_SOFT_TIS… ADA (100)       ADA       100           NA    
##  4 ACH-001270 127399_SOFT_TIS… CDH2 (1000)     CDH2      1000          -0.195
##  5 ACH-001270 127399_SOFT_TIS… AKT3 (10000)    AKT3      10000         -0.256
##  6 ACH-001270 127399_SOFT_TIS… MED6 (10001)    MED6      10001         -0.174
##  7 ACH-001270 127399_SOFT_TIS… NR2E3 (10002)   NR2E3     10002         -0.140
##  8 ACH-001270 127399_SOFT_TIS… NAALAD2 (10003) NAALAD2   10003         NA    
##  9 ACH-001270 127399_SOFT_TIS… DUXB (10003341… DUXB      100033411     NA    
## 10 ACH-001270 127399_SOFT_TIS… PDZK1P1 (10003… PDZK1P1   100034743     NA    
## # … with 12,323,998 more rows
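
Because the data are in long format, the scores for a single gene can be pulled out with ordinary dplyr verbs. The sketch below uses a small invented stand-in (rnai_toy, with made-up cell lines and scores) that shares the column names of the rnai tibble above; with the real data, rnai would take its place.

```r
library(dplyr)

## toy stand-in with the same column names as `rnai` (all values invented)
rnai_toy <- tibble(
  depmap_id  = c("ACH-900001", "ACH-900002", "ACH-900003", "ACH-900001"),
  cell_line  = c("LINE_A", "LINE_B", "LINE_C", "LINE_A"),
  gene       = c("CDH2 (1000)", "CDH2 (1000)", "CDH2 (1000)", "AKT3 (10000)"),
  gene_name  = c("CDH2", "CDH2", "CDH2", "AKT3"),
  entrez_id  = c("1000", "1000", "1000", "10000"),
  dependency = c(-0.195, NA, -0.80, -0.256)
)

## keep one gene, drop missing scores, most negative score first
cdh2 <- rnai_toy %>%
  filter(gene_name == "CDH2", !is.na(dependency)) %>%
  arrange(dependency) %>%
  select(cell_line, gene_name, dependency)
cdh2
```

Filtering on gene_name rather than gene avoids having to type the combined "SYMBOL (ID)" label.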

3.2 CRISPR-Cas9 knockout data

## see ?depmap and browseVignettes('depmap') for documentation
## downloading 0 resources
## loading from cache 
##     'EH2261 : 2261'

The crispr dataset contains the CRISPR-Cas9 knockout data (batch-corrected, CERES-inferred gene effect) of select genes and cancer cell lines. This data corresponds to the gene_effect_corrected.csv file from the 19Q1 depmap release and includes 17634 genes, 558 cell lines, 26 primary diseases and 28 lineages.

## access `crispr_19Q1` by EH number
crispr <- eh[["EH2261"]]
crispr
## # A tibble: 9,839,772 x 6
##    depmap_id  cell_line                 gene   gene_name entrez_id dependency
##    <chr>      <chr>                     <chr>  <chr>     <chr>          <dbl>
##  1 ACH-000004 HEL_HAEMATOPOIETIC_AND_L… A1BG … A1BG      1             0.135 
##  2 ACH-000005 HEL9217_HAEMATOPOIETIC_A… A1BG … A1BG      1            -0.212 
##  3 ACH-000007 LS513_LARGE_INTESTINE     A1BG … A1BG      1             0.0433
##  4 ACH-000009 C2BBE1_LARGE_INTESTINE    A1BG … A1BG      1             0.0705
##  5 ACH-000011 253J_URINARY_TRACT        A1BG … A1BG      1             0.191 
##  6 ACH-000012 HCC827_LUNG               A1BG … A1BG      1            -0.0104
##  7 ACH-000013 ONCODG1_OVARY             A1BG … A1BG      1             0.0210
##  8 ACH-000014 HS294T_SKIN               A1BG … A1BG      1             0.113 
##  9 ACH-000015 NCIH1581_LUNG             A1BG … A1BG      1            -0.0742
## 10 ACH-000017 SKBR3_BREAST              A1BG … A1BG      1             0.133 
## # … with 9,839,762 more rows

3.3 WES copy number data

## see ?depmap and browseVignettes('depmap') for documentation
## downloading 0 resources
## loading from cache 
##     'EH2262 : 2262'

The copyNumber dataset contains the WES copy number data, expressed as the log-fold copy number change measured against the baseline copy number of select genes and cell lines. This dataset corresponds to the public_19Q1_gene_cn.csv file from the 19Q1 depmap release and includes 23299 genes, 1604 cell lines, 38 primary diseases and 33 lineages.

## access `copyNumber_19Q1` by EH number
copyNumber <- eh[["EH2262"]]
copyNumber
## # A tibble: 37,371,596 x 6
##    depmap_id  cell_line            gene   gene_name entrez_id log_copy_number
##    <chr>      <chr>                <chr>  <chr>     <chr>               <dbl>
##  1 ACH-000011 253J_URINARY_TRACT   A1BG … A1BG      1                 0.131  
##  2 ACH-000026 253JBV_URINARY_TRACT A1BG … A1BG      1                -0.237  
##  3 ACH-000086 ACCMESO1_PLEURA      A1BG … A1BG      1                 0.134  
##  4 ACH-000557 AML193_HAEMATOPOIET… A1BG … A1BG      1                -0.0208 
##  5 ACH-000838 AMO1_HAEMATOPOIETIC… A1BG … A1BG      1                 0.170  
##  6 ACH-000080 BDCM_HAEMATOPOIETIC… A1BG … A1BG      1                 0.00703
##  7 ACH-000992 BICR18_UPPER_AERODI… A1BG … A1BG      1                -0.376  
##  8 ACH-000228 BICR31_UPPER_AERODI… A1BG … A1BG      1                 1.16   
##  9 ACH-000771 BICR56_UPPER_AERODI… A1BG … A1BG      1                 0.0197 
## 10 ACH-000415 BICR6_UPPER_AERODIG… A1BG … A1BG      1                 0.280  
## # … with 37,371,586 more rows

3.4 CCLE Reverse Phase Protein Array data

## see ?depmap and browseVignettes('depmap') for documentation
## downloading 0 resources
## loading from cache 
##     'EH2263 : 2263'

The RPPA dataset contains the CCLE Reverse Phase Protein Array (RPPA) data, which corresponds to the CCLE_RPPA_20180123.csv file from the 19Q1 depmap release. This dataset includes 214 antibodies, 899 cell lines, 28 primary diseases and 28 lineages.

## access `RPPA_19Q1` by EH number
RPPA <- eh[["EH2263"]]
RPPA
## # A tibble: 192,386 x 4
##    depmap_id  cell_line                                 antibody   expression
##    <chr>      <chr>                                     <chr>           <dbl>
##  1 ACH-000698 DMS53_LUNG                                14-3-3_be…    -0.105 
##  2 ACH-000489 SW1116_LARGE_INTESTINE                    14-3-3_be…     0.359 
##  3 ACH-000431 NCIH1694_LUNG                             14-3-3_be…     0.0287
##  4 ACH-000707 P3HR1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  14-3-3_be…     0.120 
##  5 ACH-000509 HUT78_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  14-3-3_be…    -0.269 
##  6 ACH-000522 UMUC3_URINARY_TRACT                       14-3-3_be…    -0.171 
##  7 ACH-000613 HOS_BONE                                  14-3-3_be…    -0.0253
##  8 ACH-000829 HUNS1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  14-3-3_be…    -0.170 
##  9 ACH-000557 AML193_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_be…     0.0819
## 10 ACH-000614 RVH421_SKIN                               14-3-3_be…     0.222 
## # … with 192,376 more rows

3.5 CCLE RNAseq gene expression data

## see ?depmap and browseVignettes('depmap') for documentation
## downloading 0 resources
## loading from cache 
##     'EH2264 : 2264'

The TPM dataset contains the CCLE RNAseq gene expression data. It contains expression data for protein-coding genes only, on a log2(TPM+1) scale. This data corresponds to the CCLE_depMap_19Q1_TPM.csv file from the 19Q1 depmap release and includes 55825 genes, 1165 cell lines, 33 primary diseases and 32 lineages.

## access `TPM_19Q1` by EH number
TPM <- eh[["EH2264"]]
TPM
## # A tibble: 67,360,300 x 6
##    depmap_id  cell_line           gene       gene_name ensembl_id  expression
##    <chr>      <chr>               <chr>      <chr>     <chr>            <dbl>
##  1 ACH-000956 22RV1_PROSTATE      TSPAN6 (E… TSPAN6    ENSG000000…      2.65 
##  2 ACH-000948 2313287_STOMACH     TSPAN6 (E… TSPAN6    ENSG000000…      3.00 
##  3 ACH-000026 253JBV_URINARY_TRA… TSPAN6 (E… TSPAN6    ENSG000000…      4.57 
##  4 ACH-000011 253J_URINARY_TRACT  TSPAN6 (E… TSPAN6    ENSG000000…      4.58 
##  5 ACH-000323 42MGBA_CENTRAL_NER… TSPAN6 (E… TSPAN6    ENSG000000…      4.59 
##  6 ACH-000905 5637_URINARY_TRACT  TSPAN6 (E… TSPAN6    ENSG000000…      5.88 
##  7 ACH-000520 59M_OVARY           TSPAN6 (E… TSPAN6    ENSG000000…      4.11 
##  8 ACH-000973 639V_URINARY_TRACT  TSPAN6 (E… TSPAN6    ENSG000000…      5.05 
##  9 ACH-000896 647V_URINARY_TRACT  TSPAN6 (E… TSPAN6    ENSG000000…      5.94 
## 10 ACH-000070 697_HAEMATOPOIETIC… TSPAN6 (E… TSPAN6    ENSG000000…      0.151
## # … with 67,360,290 more rows
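
Since the expression column is on a log2(TPM + 1) scale, a TPM value can be recovered as 2^expression - 1. A brief sketch, using the first three (rounded) expression values printed above as input:

```r
## expression values as printed above (rounded), on the log2(TPM + 1) scale
expression <- c(2.65, 3.00, 4.57)

## invert the transform: TPM = 2^expression - 1
tpm <- 2^expression - 1

## applying the forward transform again returns the input
stopifnot(isTRUE(all.equal(log2(tpm + 1), expression)))
round(tpm, 2)
```

The same expression works on the full column, e.g. inside a dplyr mutate() call, if values on the original TPM scale are needed.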

3.6 Cancer cell lines

## see ?depmap and browseVignettes('depmap') for documentation
## downloading 0 resources
## loading from cache 
##     'EH2266 : 2266'

The metadata dataset contains the metadata for all of the cancer cell lines. It corresponds to the depmap_19Q1_cell_lines.csv file found in the 19Q1 depmap release. This dataset contains no gene-level data and includes 1676 cell lines, 38 primary diseases and 33 lineages.

## access `metadata_19Q1` by EH number
metadata <- eh[["EH2266"]]
metadata
## # A tibble: 1,676 x 9
##    depmap_id cell_line aliases cosmic_id sanger_id primary_disease
##    <chr>     <chr>     <chr>       <dbl>     <dbl> <chr>          
##  1 ACH-0000… NIHOVCAR… NIH:OV…    905933      2201 Ovarian Cancer 
##  2 ACH-0000… HL60_HAE… HL-60      905938        55 Leukemia       
##  3 ACH-0000… CACO2_LA… CACO2;…        NA        NA Colon/Colorect…
##  4 ACH-0000… HEL_HAEM… HEL        907053       783 Leukemia       
##  5 ACH-0000… HEL9217_… HEL 92…        NA        NA Leukemia       
##  6 ACH-0000… MONOMAC6… MONO-M…    908148      2167 Leukemia       
##  7 ACH-0000… LS513_LA… LS513      907795       569 Colon/Colorect…
##  8 ACH-0000… C2BBE1_L… C2BBe1     910700      2104 Colon/Colorect…
##  9 ACH-0000… NCIH2077… NCI-H2…        NA        NA Lung Cancer    
## 10 ACH-0000… 253J_URI… 253J           NA        NA Bladder Cancer 
## # … with 1,666 more rows, and 3 more variables: subtype_disease <chr>,
## #   gender <chr>, source <chr>
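
The depmap_id column is shared with the other datasets, so cell line annotations can be attached to them with a join. A minimal sketch with invented stand-in tables (crispr_toy and metadata_toy are hypothetical and hold made-up values; the real crispr and metadata objects carry the same key column):

```r
library(dplyr)

## toy stand-ins sharing the `depmap_id` key (all values invented)
crispr_toy <- tibble(
  depmap_id  = c("ACH-900001", "ACH-900002", "ACH-900003"),
  gene_name  = c("A1BG", "A1BG", "A1BG"),
  dependency = c(0.135, -0.212, 0.043)
)
metadata_toy <- tibble(
  depmap_id       = c("ACH-900001", "ACH-900002", "ACH-900003"),
  primary_disease = c("Leukemia", "Leukemia", "Lung Cancer")
)

## attach the disease annotation to every score
annotated <- crispr_toy %>%
  left_join(metadata_toy, by = "depmap_id")

## summarise the scores per primary disease
disease_means <- annotated %>%
  group_by(primary_disease) %>%
  summarise(mean_dependency = mean(dependency, na.rm = TRUE), n = n())
disease_means
```

A left join keeps every row of the score table even when a cell line has no metadata entry, which makes missing annotations visible as NA rather than silently dropping rows.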

3.7 Mutation calls

## see ?depmap and browseVignettes('depmap') for documentation
## downloading 0 resources
## loading from cache 
##     'EH2265 : 2265'

The mutationCalls dataset contains all merged mutation calls (coding region, germline filtered) found in the depmap dependency study. This dataset corresponds to the depmap_19Q1_mutation_calls.csv file found in the 19Q1 depmap release and includes 19350 genes, 1601 cell lines, 37 primary diseases and 33 lineages.

## access `mutationCalls_19Q1` by EH number
mutationCalls <- eh[["EH2265"]]
mutationCalls
## # A tibble: 1,243,145 x 35
##    depmap_id gene_name entrez_id ncbi_build chromosome start_pos end_pos
##    <chr>     <chr>         <dbl>      <dbl> <chr>          <dbl>   <dbl>
##  1 ACH-0000… VPS13D        55187         37 1           12359347  1.24e7
##  2 ACH-0000… AADACL4      343066         37 1           12726308  1.27e7
##  3 ACH-0000… IFNLR1       163702         37 1           24484172  2.45e7
##  4 ACH-0000… TMEM57        55219         37 1           25785018  2.58e7
##  5 ACH-0000… ZSCAN20        7579         37 1           33954141  3.40e7
##  6 ACH-0000… POU3F1         5453         37 1           38512139  3.85e7
##  7 ACH-0000… MAST2         23139         37 1           46498028  4.65e7
##  8 ACH-0000… GBP4         115361         37 1           89657103  8.97e7
##  9 ACH-0000… VAV3          10451         37 1          108247170  1.08e8
## 10 ACH-0000… NBPF20    100288142         37 1          148346689  1.48e8
## # … with 1,243,135 more rows, and 28 more variables: strand <chr>,
## #   var_class <chr>, var_type <chr>, ref_allele <chr>,
## #   tumor_seq_allele1 <chr>, dbSNP_RS <chr>, dbSNP_val_status <chr>,
## #   genome_change <chr>, annotation_transcript <chr>,
## #   tumor_sample_barcode <chr>, cDNA_change <chr>, codon_change <chr>,
## #   protein_change <chr>, is_deleterious <lgl>, is_tcga_hotspot <lgl>,
## #   tcga_hsCnt <dbl>, is_cosmic_hotspot <lgl>, cosmic_hsCnt <dbl>,
## #   ExAC_AF <dbl>, VA_WES_AC <chr>, CGA_WES_AC <chr>, sanger_WES_AC <chr>,
## #   sanger_recalib_WES_AC <chr>, RNAseq_AC <chr>, HC_AC <chr>, RD_AC <chr>,
## #   WGS_AC <chr>, var_annotation <chr>

4 The Broad Institute data

If desired, the original data from which the depmap package was derived can be downloaded from the Broad Institute website. Instructions for downloading these files, and a description of how the data were transformed and loaded into the depmap package, can be found in the make_data.R file in ./inst/scripts. (Note that the original uncompressed .csv files total 1.5 GB and take a moderate amount of time to download.)

5 Session information

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] depmap_0.99.5        ExperimentHub_1.11.1 AnnotationHub_2.17.3
## [4] BiocFileCache_1.9.1  dbplyr_1.4.2         BiocGenerics_0.31.5 
## [7] dplyr_0.8.3          BiocStyle_2.13.2    
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.5              xfun_0.8                     
##  [3] purrr_0.3.2                   vctrs_0.2.0                  
##  [5] htmltools_0.3.6               stats4_3.6.1                 
##  [7] yaml_2.2.0                    utf8_1.1.4                   
##  [9] interactiveDisplayBase_1.23.0 blob_1.2.0                   
## [11] rlang_0.4.0                   pillar_1.4.2                 
## [13] later_0.8.0                   glue_1.3.1                   
## [15] DBI_1.0.0                     rappdirs_0.3.1               
## [17] bit64_0.9-7                   stringr_1.4.0                
## [19] memoise_1.1.0                 evaluate_0.14                
## [21] Biobase_2.45.0                knitr_1.23                   
## [23] IRanges_2.19.10               httpuv_1.5.1                 
## [25] curl_3.3                      AnnotationDbi_1.47.0         
## [27] fansi_0.4.0                   Rcpp_1.0.1                   
## [29] xtable_1.8-4                  backports_1.1.4              
## [31] promises_1.0.1                BiocManager_1.30.4           
## [33] S4Vectors_0.23.17             mime_0.7                     
## [35] bit_1.1-14                    digest_0.6.20                
## [37] stringi_1.4.3                 bookdown_0.12                
## [39] shiny_1.3.2                   cli_1.1.0                    
## [41] tools_3.6.1                   magrittr_1.5                 
## [43] tibble_2.1.3                  RSQLite_2.1.1                
## [45] crayon_1.3.4                  pkgconfig_2.0.2              
## [47] zeallot_0.1.0                 assertthat_0.2.1             
## [49] rmarkdown_1.14                httr_1.4.0                   
## [51] R6_2.4.0                      compiler_3.6.1