Note: if you use MAGeCKFlute in published research, please cite: Binbin Wang, Mei Wang, Wubing Zhang. “Integrative analysis of pooled CRISPR genetic screens using MAGeCKFlute.” Nature Protocols (2019), doi: 10.1038/s41596-018-0113-7.

Input data - results from MAGeCK

MAGeCK (Wei Li and Liu. 2014) and MAGeCK-VISPR (Wei Li and Liu. 2015) are developed by our lab previously, to analyze CRISPR/Cas9 screen data in different scenarios(Tim Wang 2014, Hiroko Koike-Yusa (2014), Ophir Shalem1 (2014), Luke A.Gilbert (2014), Silvana Konermann (2015)). Both algorithms use negative binomial models to model the variances of sgRNAs, and use Robust Rank Aggregation (for MAGeCK) or maximum likelihood framework (for MAGeCK-VISPR) for a robust identification of selected genes.

The command mageck mle computes beta scores and the associated statistics for all genes in multiple conditions. The beta score describes how the gene is selected: a positive beta score indicates a positive selection, and a negative beta score indicates a negative selection.

The command mageck test uses Robust Rank Aggregation (RRA) for robust identification of CRISPR-screen hits, and outputs the summary results at both sgRNA and gene level.

Quick start

Here we show the most basic steps for integrative analysis pipeline. MAGeCKFlute package includes the MAGeCK results from one intrinsic dataset, including countsummary, rra.gene_summary, rra.sgrna_summary, and mle.gene_summary. We will work with them in this document.

Downstream analysis pipeline for MAGeCK RRA

All pipeline results are written into local directory “./MAGeCKFlute_PLX/”, and all figures are integrated into file “PLX_Flute.rra_summary.pdf”.

Downstream analysis pipeline for MAGeCK MLE CRISPR screens can be conducted using three different cell populations: the day 0 population, a drug-treated population (treatment) and a control population (mockdrug control, typically treated with a vehicle such as DMSO). The CRISPR screens are required to analyze with MAGeCK MLE, which generates a gene summary file like mle.gene_summary below. FluteMLE is useful in this scenario to perform downstream analysis.

If CRISPR screens don’t include a control cell population, FluteMLE supports users to use Depmap screens as control.

All pipeline results are written into local directory “./MAGeCKFlute_PLX/”, and all figures are integrated into file “PLX_Flute.mle_summary.pdf”.

Section I: Quality control

** Count summary ** MAGeCK Count in MAGeCK/MAGeCK-VISPR generates a count summary file, which summarizes some basic QC scores at raw count level, including map ratio, Gini index, and NegSelQC. Use function ‘data’ to load the dataset, and have a look at the file with a text editor to see how it is formatted.

##                                   File    Label    Reads   Mapped Percentage
## 1 ../data/GSC_0131_Day23_Rep1.fastq.gz day23_r1 62818064 39992777     0.6366
## 2  ../data/GSC_0131_Day0_Rep2.fastq.gz  day0_r2 47289074 31709075     0.6705
## 3  ../data/GSC_0131_Day0_Rep1.fastq.gz  day0_r1 51190401 34729858     0.6784
## 4 ../data/GSC_0131_Day23_Rep2.fastq.gz day23_r2 58686580 37836392     0.6447
##   TotalsgRNAs Zerocounts GiniIndex NegSelQC NegSelQCPval
## 1       64076         57   0.08510        0            1
## 2       64076         17   0.07496        0            1
## 3       64076         14   0.07335        0            1
## 4       64076         51   0.08587        0            1
##   NegSelQCPvalPermutation NegSelQCPvalPermutationFDR NegSelQCGene
## 1                       1                          1            0
## 2                       1                          1            0
## 3                       1                          1            0
## 4                       1                          1            0

## Warning in data.table::melt(countsummary[, c("Label", "Mapped", "Unmapped")], :
## The melt generic in data.table has been passed a data.frame and will attempt
## to redirect to the relevant reshape2 method; please note that reshape2 is
## deprecated, and this redirection is now deprecated as well. To continue using
## melt methods from reshape2 while both libraries are attached, e.g. melt.list,
## you can prepend the namespace like reshape2::melt(countsummary[, c("Label",
## "Mapped", "Unmapped")]). In the next version, this warning will become an error.

Section II: Downstream analysis of MAGeCK RRA

For experiments with two experimental conditions, we recommend MAGeCK-RRA for identification of essential genes from CRISPR/Cas9 knockout screens. Gene summary file in MAGeCK-RRA results summarizes the statistical significance of positive selections and negative selections.

##       id num  neg.score neg.p.value  neg.fdr neg.rank neg.goodsgrna neg.lfc
## 1    NF2   4 4.1770e-12  2.9738e-07 0.000275        1             4 -1.3580
## 2 SRSF10   4 4.4530e-11  2.9738e-07 0.000275        2             4 -1.8544
## 3 EIF2B4   8 2.8994e-10  2.9738e-07 0.000275        3             8 -1.5325
## 4  LAS1L   6 1.4561e-09  2.9738e-07 0.000275        4             6 -2.2402
## 5   RPL3  15 2.3072e-09  2.9738e-07 0.000275        5            12 -1.0663
## 6 ATP6V0   7 3.8195e-09  2.9738e-07 0.000275        6             7 -1.6380
##   pos.score pos.p.value pos.fdr pos.rank pos.goodsgrna pos.lfc
## 1   1.00000     1.00000       1    16645             0 -1.3580
## 2   1.00000     1.00000       1    16647             0 -1.8544
## 3   1.00000     1.00000       1    16646             0 -1.5325
## 4   0.99999     0.99999       1    16570             0 -2.2402
## 5   0.95519     0.99205       1    15359             2 -1.0663
## 6   1.00000     1.00000       1    16644             0 -1.6380

Visualization of negative selections and positive selections

Read the MAGeCK-RRA results

Two type of scores are optional, lfc (logrithm fold change) and rra (Robust Rank Aggregation score).

##       id   Score      FDR
## 1    NF2 -1.3580 0.000275
## 2 SRSF10 -1.8544 0.000275
## 3 EIF2B4 -1.5325 0.000275
## 4  LAS1L -2.2402 0.000275
## 5   RPL3 -1.0663 0.000275
## 6 ATP6V0 -1.6380 0.000275

Compute the similarity between the CRISPR screen with Depmap screens

##          estimate p.value
## A2780   0.5791483       0
## OVCAR8  0.5789564       0
## R262    0.5755967       0
## HS578T  0.5754257       0
## SF295   0.5751215       0
## KYSE180 0.5749847       0

Omit common essential genes from the data

##          estimate       p.value
## TE11    0.2394741 5.392597e-149
## OVCAR8  0.2390556 6.636024e-154
## YH13    0.2386189 2.470833e-153
## HUPT3   0.2380013 3.869874e-147
## NCIH661 0.2379885 1.640237e-152
## LXF289  0.2360406 5.494129e-150