The identification of RNA-binding protein targets is methodologically very similar to the detection of differentially expressed regions. Protocols such as eCLIP or iCLIP sequencing provide count data for individual nucleotides. DEWSeq implements a sliding-window approach for the analysis of regions enriched in the immunoprecipitation (IP) sample compared to its size-matched input (SMI) control samples. This vignette explains properties of eCLIP and iCLIP data and related methods, and shows how to apply a differential sliding-window approach to detect the binding sites of RNA-binding proteins.
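To make the sliding-window idea concrete, the sketch below sums per-nucleotide counts into overlapping windows and compares IP against SMI with a naive pseudocounted log2 ratio. It is a toy illustration only, not DEWSeq's implementation: DEWSeq tests window counts statistically across replicates, whereas the function names `sliding_window_counts` and `log2_ratio`, the window size, the step and the pseudocount used here are all hypothetical choices for illustration.

```python
import numpy as np


def sliding_window_counts(counts, window_size, step):
    """Sum per-nucleotide counts within overlapping windows.

    Returns (starts, sums), where starts are 0-based window start positions.
    Hypothetical helper for illustration; not part of DEWSeq.
    """
    counts = np.asarray(counts)
    starts = np.arange(0, len(counts) - window_size + 1, step)
    # A cumulative sum with a leading zero gives each window sum in O(1).
    csum = np.concatenate(([0], np.cumsum(counts)))
    sums = csum[starts + window_size] - csum[starts]
    return starts, sums


def log2_ratio(ip, smi, pseudocount=1.0):
    """Naive log2(IP / SMI) per window; a stand-in for a proper statistical test."""
    return np.log2((np.asarray(ip) + pseudocount) / (np.asarray(smi) + pseudocount))


# Toy per-nucleotide truncation counts for one short region.
ip_counts = [0, 0, 5, 8, 3, 0, 0, 0, 1, 0]
smi_counts = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]

starts, ip_windows = sliding_window_counts(ip_counts, window_size=4, step=2)
_, smi_windows = sliding_window_counts(smi_counts, window_size=4, step=2)
ratios = log2_ratio(ip_windows, smi_windows)
best_start = int(starts[np.argmax(ratios)])
```

Because consecutive windows overlap when `step` is smaller than `window_size`, a single binding site contributes to several windows; a real analysis has to account for this when merging significant windows into regions.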
DEWSeq 1.18.0
Wolfgang Huber and Matthias Hentze for mentoring, advice and discussion. Benjamin Lang and Gian Tartaglia for great help with functional analysis and benchmarking, as well as feedback on the vignette. Ina Huppertz for helpful feedback and language improvement on the vignette. Mike Love, Simon Anders, Bernd Klaus and Frederick Ziebell for comments and discussion.
RNA-binding proteins (RBPs) play a key role throughout the lifetime of RNAs. They are involved in RNA synthesis, stability, degradation, transport and translation, and add an important layer of regulation in the cell. Over 1,900 murine and over 1,400 human RBPs have been detected in different high-throughput studies, many of them without a known RNA-binding function (Hentze et al. 2018).
Detecting an RBP’s binding sites is of great interest for studying the mechanisms underlying its regulatory potential. Individual-nucleotide resolution crosslinking and immunoprecipitation (iCLIP) (König et al. 2010) and the further enhanced CLIP (eCLIP) protocol (Van Nostrand et al. 2016) rely on UV crosslinking, which induces covalent bonds between RNA and proteins in close proximity. When the RNA fragment bound to the protein is reverse transcribed, the reverse transcriptase terminates at the crosslink site in the majority of cases. Besides updates to the chemistry, eCLIP introduces a size-matched input (SMI) control sample; this is an essential addition to the protocol, and it can also be adapted to iCLIP or similar protocols.
In iCLIP and eCLIP, truncation events are extracted as the single nucleotide position immediately upstream of the cDNA fragment (aligned read). In the classical protocols, genuine truncations cannot be distinguished from read-through reads or from reads truncated for other reasons, for example by RNA modifications or by the crosslinking sites of other proteins. The extent of this may differ for each individual protein (and for the remaining polypeptide of the digested protein). Other protocols, such as HITS-CLIP and PAR-CLIP (Hafner et al. 2010), rely exclusively on read-through events (albeit using other reverse transcriptases). While hybrid approaches exist, the technical difficulty of these protocols requires many optimization steps, which makes them rather hard to combine.
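The extraction of truncation events can be sketched as follows, assuming each alignment is given as 0-based, half-open genomic coordinates plus strand, and that the read's 5' end marks where the cDNA stopped. Which mate carries this end depends on the library design (for paired-end eCLIP it is commonly read 2), so that choice, the tuple layout and the helper names `crosslink_site` and `count_truncations` are hypothetical. Note that the sketch counts every read end, so, exactly as described above, it cannot tell genuine truncations from read-through or otherwise truncated reads.

```python
from collections import Counter
from typing import Iterable, Tuple

# (chromosome, start, end, strand) with 0-based, half-open coordinates.
Alignment = Tuple[str, int, int, str]


def crosslink_site(start: int, end: int, strand: str) -> int:
    """Return the position one nucleotide upstream (5') of an aligned read.

    Hypothetical helper: assumes the read's 5' end is the cDNA truncation point.
    """
    if strand == "+":
        # Upstream on the plus strand is the next lower coordinate.
        return start - 1
    if strand == "-":
        # The read's 5' end is at end - 1; one nucleotide upstream is `end`.
        return end
    raise ValueError(f"unknown strand: {strand!r}")


def count_truncations(alignments: Iterable[Alignment]) -> Counter:
    """Aggregate per-nucleotide truncation counts keyed by (chrom, position, strand)."""
    counts: Counter = Counter()
    for chrom, start, end, strand in alignments:
        counts[(chrom, crosslink_site(start, end, strand), strand)] += 1
    return counts


reads = [
    ("chr1", 100, 130, "+"),
    ("chr1", 100, 125, "+"),
    ("chr1", 200, 230, "-"),
]
truncations = count_truncations(reads)
```

Two plus-strand reads starting at the same coordinate collapse onto one site, which is how single-nucleotide count data arise.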
In summary, iCLIP and eCLIP protocols provide count data for single-nucleotide positions which might be the result of many heuristic events. These are described in the next chapter.
Unlike transcription factors, RNA-binding proteins have many different binding modes (Hentze et al. 2018): some bind in a sequence-specific manner, some prefer structures (such as stem-loops), some prefer to bind RNA modifications, and others are mostly found in UTRs. A large portion of RBPs do not have a known RNA-binding domain and bind via disordered regions with unknown target preferences.