CRISPR/Cas9 is a prokaryotic immune system turned into a genome engineering tool, with a Cas9/gRNA complex at the heart of its functioning. The gRNA (guide RNA) contains a 20 nucleotide (proto)spacer which guides the complex to a genomic locus with identical sequence, but only when that locus is followed by an NGG PAM (protospacer adjacent motif, see Figure 1A). The Cas9 enzyme then performs an effector action at that locus: wildtype Cas9 cuts both strands after spacer nucleotide 17, Cas9Nickase cuts only a single strand (two variants exist, each cutting one of the two strands), while catalytically dead dCas9, fused to an effector domain (KRAB, VPR, APEX), performs an alternative action (e.g. KRAB represses, VPR activates, APEX biotinylates).
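The targeting rule described above (a 20 nucleotide spacer directly followed by an NGG PAM, with the wildtype cut after spacer nucleotide 17) can be sketched in base R. This is only an illustration on a made-up toy sequence: the function name find_ngg_spacers is hypothetical, it scans a single strand only, and it is not how multicrispr itself searches genomes.

```r
# Toy sequence (hypothetical, not a real genomic locus)
dna <- "ACGTTGCAAGCTTAGCCATGAGGTTCA"

# Find every spacer that is directly followed by an NGG PAM on the given strand.
# find_ngg_spacers is an illustrative helper, not a multicrispr function.
find_ngg_spacers <- function(dna, spacer_len = 20L) {
    n <- nchar(dna)
    if (n < spacer_len + 3L) {
        # Sequence too short to hold a spacer plus PAM: return an empty table
        return(data.frame(start = integer(0), spacer = character(0),
                          pam = character(0), cut_after = integer(0),
                          stringsAsFactors = FALSE))
    }
    starts <- seq_len(n - spacer_len - 3L + 1L)
    spacer <- substring(dna, starts, starts + spacer_len - 1L)
    pam    <- substring(dna, starts + spacer_len, starts + spacer_len + 2L)
    keep   <- grepl("^[ACGT]GG$", pam)
    data.frame(
        start     = starts[keep],
        spacer    = spacer[keep],
        pam       = pam[keep],
        # Cut lies 3 nt upstream of the PAM, i.e. after spacer nucleotide 17
        # when spacer_len is 20
        cut_after = starts[keep] + spacer_len - 4L,
        stringsAsFactors = FALSE)
}

hits <- find_ngg_spacers(dna)
print(hits)
```

A real search would also scan the reverse complement, since spacers can lie on either strand.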
Prime Editing (Anzalone et al., 2019) extends this further, by coupling Cas9 to a Reverse Transcriptase (RT). As shown in Figure 1, the guide RNA is extended with a 3’ extension that contains a primer binding site (which binds a complementary DNA region that acts as a primer for reverse transcription) and a reverse transcription template (the template used for reverse transcription). The spacer still acts as the targeting agent, guiding the Cas9/RT complex to the targeted genomic locus.
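To make the layout of the 3’ extension concrete, the following base R sketch derives a primer binding site and a reverse transcription template from an edited toy sequence. Several things here are assumptions for illustration: the helper names revcomp and build_extension are hypothetical, the lengths of 13 and 16 nucleotides are example values, the sequence is made up, and only substitutions (no insertions or deletions) downstream of the nick are handled.

```r
# Reverse complement of a DNA string (illustrative helper)
revcomp <- function(x) {
    paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# Build the 3' extension for a spacer starting at spacer_start.
# edited_plus: desired (edited) sequence on the spacer strand, 5' -> 3'.
# build_extension is a hypothetical name, not a multicrispr function.
build_extension <- function(edited_plus, spacer_start, nprimer = 13L, nrt = 16L) {
    nick <- spacer_start + 16L   # spacer nucleotide 17: last base before the nick
    stopifnot(nick - nprimer + 1L >= 1L, nick + nrt <= nchar(edited_plus))
    primer_region <- substr(edited_plus, nick - nprimer + 1L, nick)
    rt_region     <- substr(edited_plus, nick + 1L, nick + nrt)
    pbs <- revcomp(primer_region)  # binds the nicked strand's 3' end
    rtt <- revcomp(rt_region)      # template encoding the edited sequence
    # On the guide RNA (5' -> 3') the template precedes the primer binding site
    list(pbs = pbs, rtt = rtt, extension = paste0(rtt, pbs))
}

# Toy locus: 20-nt spacer, AGG PAM at positions 21-23, plus flanking sequence
target <- "ACGTTGCAAGCTTAGCCATGAGGTTCAGGCTAACGTACCA"
edited <- target
substr(edited, 19L, 19L) <- "A"   # substitute T -> A, 2 nt downstream of the nick

ext <- build_extension(edited, spacer_start = 1L)
print(ext)
```

Because the extension is the reverse complement of the contiguous region spanning the primer and the edited stretch, the edit has to lie within the first nrt nucleotides after the nick.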
Guide RNA design involves finding good guide RNAs to hit the targets of interest. Plasmids with guide RNA scaffolds, ready for cloning desired spacers into, can be readily ordered from AddGene. The actual task, therefore, boils down to finding good spacers for the targets of interest. For prime editing, it additionally involves finding an appropriate primer binding site and reverse transcription template to edit the target site of interest. In general, a good spacer needs to fulfill two requirements:
Minimal off-target (mis)matches, so that only intended targets are hit.
Maximal on-target efficiency. Over the years, several sequence-to-efficiency prediction models have been developed, of which the Doench 2016 score has become the community standard. The score is not perfect, and examples can be found where the prediction and the actual outcome differ. Yet, to date it is the best heuristic for predicting on-target efficiency, and worth making use of.
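The first requirement can be illustrated with a naive mismatch scan in base R. This is a deliberately simplified sketch, not the method multicrispr uses: it slides the spacer along one strand of a toy sequence, ignores the PAM, and counts sites per number of mismatches. The function names count_mismatches and scan_offtargets are hypothetical.

```r
# Number of mismatching positions between two equal-length sequences
count_mismatches <- function(spacer, site) {
    stopifnot(nchar(spacer) == nchar(site))
    sum(strsplit(spacer, "")[[1]] != strsplit(site, "")[[1]])
}

# Count sites in dna with 0, 1, ..., max_mismatch mismatches to the spacer.
# Illustrative only: single strand, no PAM check, quadratic in sequence size.
scan_offtargets <- function(spacer, dna, max_mismatch = 2L) {
    k <- nchar(spacer)
    n <- nchar(dna)
    if (n < k) return(table(factor(integer(0), levels = 0:max_mismatch)))
    starts <- seq_len(n - k + 1L)
    sites  <- substring(dna, starts, starts + k - 1L)
    mm <- vapply(sites, count_mismatches, integer(1),
                 spacer = spacer, USE.NAMES = FALSE)
    table(factor(mm[mm <= max_mismatch], levels = 0:max_mismatch))
}

# Toy example with a short 4-nt "spacer" for readability
counts <- scan_offtargets("ACGT", "ACGTACGAACGT")
print(counts)
```

A spacer whose table shows exactly one perfect match and no low-mismatch sites would be the ideal case under this simplified view.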
Multicrispr was developed to make guide RNA design easier. As shown below, it contains functions to define and transform targets, find spacers, compute and add off-target counts and efficiency scores (Doench 2016), and finally return all of this as a GRanges object.
Installing multicrispr is simple:
# From BioC
install.packages("BiocManager")
BiocManager::install(version='devel')
BiocManager::install("multicrispr")
# From gitlab:
#url <- 'https://gitlab.gwdg.de/loosolab/software/multicrispr.git'
#remotes::install_git(url, repos = BiocManager::repositories())
The Python package azimuth by Doench et al. (2016), which implements their on-target efficiency prediction method, can be easily installed and activated using reticulate:
# Install once
# reticulate::conda_create('azienv', 'python=2.7')
# reticulate::conda_install('azienv', 'azimuth', pip = TRUE)
# reticulate::conda_install('azienv', 'scikit-learn==0.17.1', pip = TRUE)
# Then activate
reticulate::use_condaenv('azienv')
Bowtie-indexed genomes for quick off-target analysis can be installed using index_genome. For the two genomes used in the examples, mm10 and hg38, the function downloads pre-built indexes from our data server, allowing a quick exploration (set download=FALSE to build the index anew):
index_genome(BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10)
index_genome(BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38)
bed_to_granges converts a (0-based) BED coordinate file into a (1-based) GRanges.
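The coordinate shift behind this conversion is easy to get wrong, so here is a minimal base R sketch of it. BED intervals are 0-based and half-open, so only the start moves by one while the end stays the same. The helper bed_to_one_based and the toy table are hypothetical and only mimic the arithmetic; they do not reproduce what bed_to_granges returns.

```r
# Toy BED-like table (0-based, half-open intervals); values are illustrative
bed <- data.frame(
    chrom      = c("chr1", "chr20"),
    chromStart = c(0L, 4699599L),
    chromEnd   = c(10L, 4699600L),
    strand     = c("+", "+"),
    stringsAsFactors = FALSE)

# Convert to 1-based, closed coordinates as used by GRanges.
# bed_to_one_based is an illustrative helper, not a multicrispr function.
bed_to_one_based <- function(bed) {
    data.frame(
        seqnames = bed$chrom,
        start    = bed$chromStart + 1L,  # 0-based start -> 1-based start
        end      = bed$chromEnd,         # half-open end equals closed 1-based end
        strand   = bed$strand,
        stringsAsFactors = FALSE)
}

ranges <- bed_to_one_based(bed)
print(ranges)
```

The interval width is preserved: chromEnd - chromStart in BED equals end - start + 1 in the 1-based representation.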
genes_to_granges and genefile_to_granges convert Entrez/Ensembl gene identifiers into GRanges using txdb gene models.
char_to_granges uses a 1-based coordinate vector to specify the GRanges. It can be used, for example, to prime edit the prion disease locus in the PRNP gene (Anzalone et al., 2019):
# char_to_granges: Anzalone et al. (2019) prime editing targets
require(multicrispr)
bsgenome <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
targets <- char_to_granges(c(PRNP = 'chr20:4699600:+'), bsgenome)
plot_intervals(targets)
find_primespacers finds prime editing spacers, 3’ extensions and nicking spacers. On- and off-target analysis is performed for both spacer types. Desired edits must always be specified with respect to the default (+) strand.
spacers <- find_primespacers(targets, bsgenome=bsgenome, edits = "T")