This vignette illustrates use cases and visualizations of the data found in the depmap package. See the depmap vignette for details about the datasets.

1 Introduction

The depmap package aims to provide a reproducible research framework for the cancer dependency data described by Tsherniak, Aviad, et al. “Defining a cancer dependency map.” Cell 170.3 (2017): 564-576. The data in the depmap package have been formatted to facilitate the use of common R packages such as dplyr and ggplot2. We hope that this package will allow researchers to more easily mine, explore and visually illustrate dependency data taken from the Depmap cancer genomic dependency study.

2 Use cases

Perhaps the most interesting datasets in the depmap package are those that relate to the cancer gene dependency score, such as rnai and crispr. These datasets contain a score expressing how vital a particular gene is to a target cell line, measured by how lethal the knockout/knockdown of that gene is. For example, a highly negative dependency score implies that a cell line is highly dependent on that gene.

Load the necessary libraries.

library("dplyr")
library("ggplot2")
library("viridis")
library("tibble")
library("gridExtra")
library("stringr")
library("depmap")
library("ExperimentHub")

Load the rnai, crispr and copyNumber datasets for visualization.

## create ExperimentHub query object
eh <- ExperimentHub()
query(eh, "depmap")
## ExperimentHub with 22 records
## # snapshotDate(): 2019-10-22 
## # $dataprovider: Broad Institute
## # $species: Homo sapiens
## # $rdataclass: tibble
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass,
## #   tags, rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH2260"]]' 
## 
##            title                
##   EH2260 | rnai_19Q1            
##   EH2261 | crispr_19Q1          
##   EH2262 | copyNumber_19Q1      
##   EH2263 | RPPA_19Q1            
##   EH2264 | TPM_19Q1             
##   ...      ...                  
##   EH3083 | RPPA_19Q3            
##   EH3084 | TPM_19Q3             
##   EH3085 | mutationCalls_19Q3   
##   EH3086 | metadata_19Q3        
##   EH3087 | drug_sensitivity_19Q3
rnai <- eh[["EH2260"]]
crispr <- eh[["EH2261"]]
copyNumber <- eh[["EH2262"]]
# note: the datasets loaded above are from the 19Q1 release. Newer releases,
# such as 19Q2 and 19Q3, are available.

2.1 Find dependency score for “BRCA1” on “184A1_BREAST”

We will demonstrate how to obtain an individual dependency score for a specific gene and cell line. For example, the code below retrieves from the rnai dataset the dependency of a breast cancer cell line, 184A1_BREAST, on the human tumor suppressor gene BRCA1 when that gene is knocked down via RNAi. The score is slightly positive, indicating that knockdown of this gene is slightly beneficial to the vitality of this cell line. However, it may be insightful to put this single dependency score in context.

dep_score_BRCA1_184A1Breast <- rnai %>%
                                select(cell_line, gene_name, dependency) %>%
                                filter(cell_line == "184A1_BREAST",
                                       gene_name == "BRCA1")

dep_score_BRCA1_184A1Breast
## # A tibble: 1 x 3
##   cell_line    gene_name dependency
##   <chr>        <chr>          <dbl>
## 1 184A1_BREAST BRCA1         0.0144
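
One possible way to put a single score in context is to compare it with the mean and rank of the scores for the same gene across cell lines. The sketch below uses a small hypothetical tibble: toy_rnai, the TOYLINE cell lines and their scores are made up for illustration, and only the 184A1_BREAST value is taken from the output above. It is an illustrative approach, not necessarily the one the package authors intend.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the same columns as the rnai subset used above;
# only the 184A1_BREAST score comes from the vignette output
toy_rnai <- tibble(
  cell_line  = c("184A1_BREAST", "TOYLINE1_BREAST", "TOYLINE2_KIDNEY", "TOYLINE3_LUNG"),
  gene_name  = "BRCA1",
  dependency = c(0.0144, -0.30, -0.10, -0.25)
)

# Add the per-gene mean, the difference from that mean, and the rank
# (rank 1 = most negative score), then keep the cell line of interest
brca1_context <- toy_rnai %>%
  filter(gene_name == "BRCA1") %>%
  mutate(gene_mean      = mean(dependency, na.rm = TRUE),
         diff_from_mean = dependency - gene_mean,
         rank_in_gene   = min_rank(dependency)) %>%
  filter(cell_line == "184A1_BREAST")

brca1_context
```

On the real rnai tibble the same pipeline would apply unchanged, with the mean and rank computed over all cell lines that have a BRCA1 score.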

2.2 Average gene dependency for “BRCA1”

Shown below is the average dependency score for BRCA1 across all cancer cell lines in the rnai dataset.

brca1_dep_score_avg_rnai <- rnai %>%
                                select(gene_name, dependency) %>%
                                filter(gene_name == "BRCA1") %>%
                                summarise(mean_dependency_brca1 =
                                              mean(dependency, na.rm=TRUE))

brca1_dep_score_avg_rnai
## # A tibble: 1 x 1
##   mean_dependency_brca1
##                   <dbl>
## 1                -0.158

2.3 Average gene dependency for all genes in the rnai dataset

We can also compute the average dependency across all genes in the entire rnai dataset. As shown below, the average dependency score in the rnai dataset is slightly negative but close to zero.

all_gene_dep_score_avg_rnai <- rnai %>%
                            select(gene_name, dependency) %>%
                            summarise(mean_dependency_all_genes_rnai =
                                          mean(dependency, na.rm=TRUE))
all_gene_dep_score_avg_rnai
## # A tibble: 1 x 1
##   mean_dependency_all_genes_rnai
##                            <dbl>
## 1                        -0.0659
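
The single-gene and whole-dataset averages above can be generalised to one average per gene with group_by() and summarise(). The sketch below runs on a hypothetical toy tibble (toy_rnai, the LINE and GENE names and all values are invented) so that it is self-contained. The same pipeline should work on the real rnai tibble, assuming it has the gene_name and dependency columns used above.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: three genes measured in two cell lines, one missing value
toy_rnai <- tibble(
  cell_line  = rep(c("LINE_A", "LINE_B"), times = 3),
  gene_name  = rep(c("GENE_A", "GENE_B", "GENE_C"), each = 2),
  dependency = c(-0.2, -0.1, -3.0, NA, 1.0, 2.0)
)

# One row per gene: mean dependency (ignoring NA) and number of scored cell lines,
# sorted so that the most negative average comes first
mean_dep_by_gene <- toy_rnai %>%
  group_by(gene_name) %>%
  summarise(mean_dependency = mean(dependency, na.rm = TRUE),
            n_cell_lines    = sum(!is.na(dependency))) %>%
  arrange(mean_dependency)

mean_dep_by_gene
```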

2.4 Cell lines in the rnai dataset with “soft tissue” in the name

Suppose we are interested in researching soft tissue sarcomas and want to find the cell lines within the rnai dataset that have “SOFT_TISSUE” in their CCLE name, sorted from the most negative dependency score (the strongest dependency) upward. The results of such a search are shown below. Note: CCLE names are in ALL CAPS, with words separated by underscores.

soft_tissue_dependency_rnai <- rnai %>%
                                select(cell_line, gene_name, dependency) %>%
                                filter(stringr::str_detect(cell_line,
                                                           "SOFT_TISSUE")) %>%
                                arrange(dependency)

soft_tissue_dependency_rnai
## # A tibble: 432,725 x 3
##    cell_line          gene_name dependency
##    <chr>              <chr>          <dbl>
##  1 FUJI_SOFT_TISSUE   RPL14          -3.60
##  2 SJRH30_SOFT_TISSUE RAN            -3.41
##  3 SJRH30_SOFT_TISSUE RPL14          -3.36
##  4 SJRH30_SOFT_TISSUE RBX1           -3.31
##  5 HS729_SOFT_TISSUE  PSMA3          -3.22
##  6 SJRH30_SOFT_TISSUE RUVBL2         -3.13
##  7 KYM1_SOFT_TISSUE   RPL14          -3.03
##  8 RH41_SOFT_TISSUE   RBX1           -3.01
##  9 HS729_SOFT_TISSUE  NUTF2          -2.90
## 10 SJRH30_SOFT_TISSUE NUTF2          -2.85
## # … with 432,715 more rows
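
Because the search above returns one row per gene and cell line, it can help to collapse the result to one row per matching cell line. The sketch below does this on a toy tibble. The three SOFT_TISSUE rows reuse values from the output above, while TOYLINE_LUNG is a hypothetical non-matching row. Anchoring the pattern with "$" is an assumption that the tissue label always ends the CCLE name.

```r
library(dplyr)
library(tibble)
library(stringr)

# Toy data: three rows copied from the output above plus one hypothetical lung row
toy_rnai <- tibble(
  cell_line  = c("FUJI_SOFT_TISSUE", "SJRH30_SOFT_TISSUE", "SJRH30_SOFT_TISSUE", "TOYLINE_LUNG"),
  gene_name  = c("RPL14", "RAN", "RPL14", "RPL14"),
  dependency = c(-3.60, -3.41, -3.36, -1.00)
)

# Keep cell lines whose name ends in _SOFT_TISSUE, then summarise per cell line
soft_tissue_lines <- toy_rnai %>%
  filter(str_detect(cell_line, "_SOFT_TISSUE$")) %>%
  group_by(cell_line) %>%
  summarise(n_genes        = n(),
            min_dependency = min(dependency, na.rm = TRUE))

soft_tissue_lines
```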

2.5 Cell lines with dependency for an entrez_id of interest

Sometimes it is difficult to subset the data by the exact gene name one has in mind. In this case, it is better to search by entrez_id. For example, a recent paper describes how knockdown of NRF2 increases chemosensitivity in certain types of cancer. It might be interesting to see what effect knockdown of this gene has on other cancer cell lines. However, searching by filter(gene_name == "NRF2") will not yield any results. We know from NCBI that the Entrez ID for this gene is "4780", and it is possible to search this dataset by that criterion. The result shows that the gene name for NRF2 in the rnai dataset is NFE2L2.

entrez_id_NRF2 <- rnai %>%
                select(entrez_id, cell_line, gene_name, dependency) %>%
                filter(entrez_id == "4780")

entrez_id_NRF2
## # A tibble: 712 x 4
##    entrez_id cell_line                              gene_name dependency
##    <chr>     <chr>                                  <chr>          <dbl>
##  1 4780      127399_SOFT_TISSUE                     NFE2L2        0.0788
##  2 4780      1321N1_CENTRAL_NERVOUS_SYSTEM          NFE2L2       -0.105 
##  3 4780      143B_BONE                              NFE2L2        0.0617
##  4 4780      184A1_BREAST                           NFE2L2       -0.0333
##  5 4780      184B5_BREAST                           NFE2L2       -0.0360
##  6 4780      22RV1_PROSTATE                         NFE2L2        0.116 
##  7 4780      2313287_STOMACH                        NFE2L2       -0.0752
##  8 4780      600MPE_BREAST                          NFE2L2       -0.195 
##  9 4780      697_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE NFE2L2        0.0128
## 10 4780      769P_KIDNEY                            NFE2L2       -0.0951
## # … with 702 more rows
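
If the goal is only to find out which gene symbol the dataset uses for a given Entrez ID, the result can be reduced to the distinct ID and symbol pairs. The sketch below uses a toy tibble built from rows shown in this vignette. The BRCA1 Entrez ID "672" is not shown in the outputs above and is supplied here from NCBI as an assumption about how the dataset encodes it.

```r
library(dplyr)
library(tibble)

# Toy data: NFE2L2 rows copied from the output above, plus the BRCA1 score from
# section 2.1 (its entrez_id "672" is taken from NCBI, not from the vignette)
toy_rnai <- tibble(
  entrez_id  = c("4780", "4780", "672"),
  cell_line  = c("143B_BONE", "184A1_BREAST", "184A1_BREAST"),
  gene_name  = c("NFE2L2", "NFE2L2", "BRCA1"),
  dependency = c(0.0617, -0.0333, 0.0144)
)

# Look up the gene symbol(s) recorded for one Entrez ID
gene_symbol_for_id <- toy_rnai %>%
  filter(entrez_id == "4780") %>%
  distinct(entrez_id, gene_name)

gene_symbol_for_id
```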

2.6 Cell lines with dependency for “NFE2L2”

Below, the strongest (most negative) dependency scores from RNAi knockdown of a specific gene, NFE2L2, are obtained, together with the cancer cell lines associated with those values. It appears that knockdown of this gene is most strongly associated with cell death in lung and kidney cancer cell lines.

top_dep_score_NFE2L2_rnai <- rnai %>%
                        select(cell_line, gene_name, dependency) %>%
                        filter(gene_name == "NFE2L2") %>%
                        arrange(dependency)

top_dep_score_NFE2L2_rnai
## # A tibble: 712 x 3
##    cell_line     gene_name dependency
##    <chr>         <chr>          <dbl>
##  1 NCIH2066_LUNG NFE2L2        -1.29 
##  2 NCIH2122_LUNG NFE2L2        -0.886
##  3 CAKI2_KIDNEY  NFE2L2        -0.865
##  4 NCIH1792_LUNG NFE2L2        -0.860
##  5 NCIH28_PLEURA NFE2L2        -0.802
##  6 A498_KIDNEY   NFE2L2        -0.779
##  7 LC1SQSF_LUNG  NFE2L2        -0.743
##  8 LK2_LUNG      NFE2L2        -0.689
##  9 NCIH1437_LUNG NFE2L2        -0.603
## 10 AU565_BREAST  NFE2L2        -0.601
## # … with 702 more rows
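
The impression that lung and kidney lines dominate could be checked by averaging the scores per tissue. The sketch below assumes that the tissue is everything after the first underscore in the CCLE name, which holds for the names shown in this vignette but is an assumption about the naming scheme in general. The toy tibble reuses five rows from the output above, so the averages describe only those rows, not the full dataset.

```r
library(dplyr)
library(tibble)
library(stringr)

# Toy data: five NFE2L2 rows copied from the output above
toy_nfe2l2 <- tibble(
  cell_line  = c("NCIH2066_LUNG", "NCIH2122_LUNG", "CAKI2_KIDNEY",
                 "A498_KIDNEY", "AU565_BREAST"),
  gene_name  = "NFE2L2",
  dependency = c(-1.29, -0.886, -0.865, -0.779, -0.601)
)

# Derive the tissue by dropping the cell line prefix up to the first underscore,
# then average the dependency per tissue (most negative first)
mean_dep_by_tissue <- toy_nfe2l2 %>%
  mutate(tissue = str_replace(cell_line, "^[^_]+_", "")) %>%
  group_by(tissue) %>%
  summarise(n_cell_lines    = n(),
            mean_dependency = mean(dependency, na.rm = TRUE)) %>%
  arrange(mean_dependency)

mean_dep_by_tissue
```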

2.7 Genes for cell line “NCIH2066_LUNG”

To obtain the lowest dependency scores for a particular cell line (for example NCIH2066_LUNG), along with the genes associated with those values, we filter for that cell line and sort by dependency. The first 10 rows are printed:

top_dep_score_NCIH2066_LUNG_rnai <- rnai %>%
                                select(cell_line, gene_name, dependency) %>%
                                filter(cell_line == "NCIH2066_LUNG") %>%
                                arrange(dependency)

top_dep_score_NCIH2066_LUNG_rnai
## # A tibble: 17,309 x 3
##    cell_line     gene_name dependency
##    <chr>         <chr>          <dbl>
##  1 NCIH2066_LUNG KIF11          -3.46
##  2 NCIH2066_LUNG ATP6V0C        -3.02
##  3 NCIH2066_LUNG CKAP5          -3.02
##  4 NCIH2066_LUNG CASP8AP2       -2.87
##  5 NCIH2066_LUNG RAN            -2.81
##  6 NCIH2066_LUNG SF3B2          -2.76
##  7 NCIH2066_LUNG USP39          -2.72
##  8 NCIH2066_LUNG SNRNP200       -2.65
##  9 NCIH2066_LUNG TACC3          -2.61
## 10 NCIH2066_LUNG MAD2L1         -2.59
## # … with 17,299 more rows
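
The pipeline above sorts all rows for the cell line and relies on tibble printing to show only the first 10. If only a fixed number of rows is wanted in the result itself, dplyr's slice_min() (available in dplyr 1.0.0 and later) is one option. The sketch below uses a toy tibble with five rows copied from the output above and keeps the three lowest scores.

```r
library(dplyr)
library(tibble)

# Toy data: five rows copied from the NCIH2066_LUNG output above
toy_rnai <- tibble(
  cell_line  = "NCIH2066_LUNG",
  gene_name  = c("KIF11", "ATP6V0C", "CASP8AP2", "RAN", "MAD2L1"),
  dependency = c(-3.46, -3.02, -2.87, -2.81, -2.59)
)

# Keep only the n rows with the lowest dependency for the chosen cell line
top3_NCIH2066_LUNG <- toy_rnai %>%
  filter(cell_line == "NCIH2066_LUNG") %>%
  slice_min(dependency, n = 3)

top3_NCIH2066_LUNG
```

By default slice_min() keeps ties, so it can return more than n rows when several genes share the cut-off score. Setting with_ties = FALSE enforces exactly n rows.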

2.8 Most and least RNAi dependency genes

Shown below are the genes whose knockdown most strongly depletes cancer cell lines, with their dependency scores, across the entire rnai dataset.

greatest_dep_score_gene_rnai <- rnai %>%
                            select(cell_line, gene_name, dependency) %>%
                            arrange(dependency)

greatest_dep_score_gene_rnai
## # A tibble: 12,324,008 x 3
##    cell_line                     gene_name dependency
##    <chr>                         <chr>          <dbl>
##  1 SW1088_CENTRAL_NERVOUS_SYSTEM UBC            -5.93
##  2 COV318_OVARY                  UBC            -5.42
##  3 MEL285_UVEA                   PSMB5          -4.97
##  4 MEL285_UVEA                   PSMA3          -4.79
##  5 COLO678_LARGE_INTESTINE       NXF1           -4.74
##  6 CW2_LARGE_INTESTINE           CTNNB1         -4.71
##  7 COV318_OVARY                  PUF60          -4.65
##  8 CCK81_LARGE_INTESTINE         MCL1           -4.60
##  9 CW2_LARGE_INTESTINE           USP39          -4.60
## 10 MEL285_UVEA                   VARS           -4.59
## # … with 12,323,998 more rows

Shown below are the genes whose knockdown most increases cancer cell line vitality (the highest dependency scores) across the entire rnai dataset. Unsurprisingly, we see a high incidence of “TP53”, a well-known tumor suppressor and cancer driver gene.

lowest_dep_score_gene_rnai <- rnai %>%
                            select(cell_line, gene_name, dependency) %>%
                            arrange(desc(dependency))

lowest_dep_score_gene_rnai
## # A tibble: 12,324,008 x 3
##    cell_line                     gene_name dependency
##    <chr>                         <chr>          <dbl>
##  1 SKNSH_AUTONOMIC_GANGLIA       TP53            2.77
##  2 OVTOKO_OVARY                  TP53            2.38
##  3 NB1_AUTONOMIC_GANGLIA         UBBP4           2.07
##  4 RVH421_SKIN                   TP53            2.01
##  5 SNU738_CENTRAL_NERVOUS_SYSTEM COPB2           1.96
##  6 SNU1079_BILIARY_TRACT         TP53            1.95
##  7 C32_SKIN                      TP53            1.93
##  8 JHUEM2_ENDOMETRIUM            CDKN2A          1.92
##  9 MEL285_UVEA                   TP53            1.91
## 10 NCIH28_PLEURA                 MED12           1.89
## # … with 12,323,998 more rows
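
The "high incidence" of a gene among the top scores can be made concrete by counting gene names in the top rows. The sketch below rebuilds the ten rows printed above as a toy tibble (cell lines omitted) and counts how often each gene appears. On the real data, the same count() call could be applied to a larger top-N slice.

```r
library(dplyr)
library(tibble)

# Toy data: the ten highest rnai scores copied from the output above
top10_rnai <- tibble(
  gene_name  = c("TP53", "TP53", "UBBP4", "TP53", "COPB2",
                 "TP53", "TP53", "CDKN2A", "TP53", "MED12"),
  dependency = c(2.77, 2.38, 2.07, 2.01, 1.96, 1.95, 1.93, 1.92, 1.91, 1.89)
)

# Count occurrences of each gene among the top rows, most frequent first
gene_counts <- top10_rnai %>%
  slice_max(dependency, n = 10) %>%
  count(gene_name, sort = TRUE)

gene_counts
```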

2.9 Most and least CRISPR-Cas9 dependency genes

Below we apply some of the same selections shown in the examples above to the crispr gene knockout dataset and observe how that dataset differs from rnai. First we look at the most negative dependency scores in the crispr dataset. As can be seen below, a different set of genes has the strongest dependency scores.

greatest_dep_score_gene_crispr <- crispr %>%
                                select(cell_line, gene_name, dependency) %>%
                                arrange(dependency)

greatest_dep_score_gene_crispr
## # A tibble: 9,839,772 x 3
##    cell_line                                gene_name dependency
##    <chr>                                    <chr>          <dbl>
##  1 NCIH446_LUNG                             HIST2H3A       -3.18
##  2 KE97_STOMACH                             BUB3           -3.09
##  3 HT1376_URINARY_TRACT                     RAN            -3.04
##  4 EN_ENDOMETRIUM                           HIST2H3A       -2.91
##  5 KE97_STOMACH                             CCT3           -2.84
##  6 HSB2_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  RAN            -2.82
##  7 KE97_STOMACH                             SNRPD1         -2.80
##  8 EN_ENDOMETRIUM                           RAN            -2.78
##  9 SR786_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE TBC1D3C        -2.75
## 10 NCIH2887_LUNG                            RAN            -2.75
## # … with 9,839,762 more rows

Next we will look at the highest (most cancer-inducing) dependency scores in the crispr dataset.

lowest_dep_score_gene_crispr <- crispr %>%
                            select(cell_line, gene_name, dependency) %>%
                            arrange(desc(dependency))

lowest_dep_score_gene_crispr
## # A tibble: 9,839,772 x 3
##    cell_line                                gene_name dependency
##    <chr>                                    <chr>          <dbl>
##  1 TC32_BONE                                PTEN            5.44
##  2 KE97_STOMACH                             UBA52           4.97
##  3 KE97_STOMACH                             HNRNPA1         4.58
##  4 KS1_CENTRAL_NERVOUS_SYSTEM               TP53            4.07
##  5 KE97_STOMACH                             RPS27           4.03
##  6 DKMG_CENTRAL_NERVOUS_SYSTEM              TP53            3.88
##  7 SR786_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE TBC1D3          3.71
##  8 KE97_STOMACH                             HNRNPA3         3.64
##  9 KE97_STOMACH                             PSME1           3.61
## 10 KE97_STOMACH                             RPL34           3.31
## # … with 9,839,762 more rows
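
To compare the two datasets gene by gene rather than by eye, one option is to join them on cell_line and gene_name. The sketch below uses two hypothetical toy tibbles (toy_rnai, toy_crispr and all names and values in them are invented) and assumes the real rnai and crispr tibbles share those two key columns, as the selections above suggest. Whether RNAi and CRISPR scores are on directly comparable scales is not addressed here.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: only LINE_A is present in both datasets
toy_rnai <- tibble(
  cell_line  = c("LINE_A", "LINE_A", "LINE_B"),
  gene_name  = c("GENE_1", "GENE_2", "GENE_1"),
  dependency = c(-3.0, 2.0, -2.5)
)
toy_crispr <- tibble(
  cell_line  = c("LINE_A", "LINE_A", "LINE_C"),
  gene_name  = c("GENE_1", "GENE_2", "GENE_1"),
  dependency = c(-2.8, 4.0, -2.7)
)

# Keep gene/cell line pairs measured in both datasets and compute the difference
rnai_vs_crispr <- toy_rnai %>%
  inner_join(toy_crispr,
             by = c("cell_line", "gene_name"),
             suffix = c("_rnai", "_crispr")) %>%
  mutate(score_diff = dependency_crispr - dependency_rnai)

rnai_vs_crispr
```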

2.10 Differences in RNAi and CRISPR-Cas9 dependency scores

Here we will plot the difference in dependency scores between the most significant genes found in the crispr and rnai datasets.