In this vignette, we demonstrate the block bootstrap functionality implemented in nullranges. See the main nullranges vignette for an overview of the idea of bootstrapping, or the diagram below.
nullranges contains an implementation of a block bootstrap for genomic data, as proposed by Bickel et al. (2010), in which features (ranges) are sampled from the genome in blocks. The original block bootstrapping algorithm for genomic data is implemented in the Python software Genome Structure Correlation (GSC).
The bootRanges methods are described in Mu et al. (2023).
Minimal code for running bootRanges() is shown below. The genome segmentation seg and the excluded regions exclude are optional.
    library(nullranges)
    library(GenomeInfoDb) # keepStandardChromosomes(), seqlevels()
    library(ExperimentHub)
    library(AnnotationHub)

    eh <- ExperimentHub()
    ah <- AnnotationHub()
    seg <- eh[["EH7307"]] # genome segmentation for hg38
    exclude <- ah[["AH107305"]] # Kundaje excluded regions for hg38, see below

    set.seed(5) # set seed for reproducibility
    blockLength <- 5e5 # size of blocks to bootstrap
    R <- 10 # number of iterations of the bootstrap

    # `ranges` is the GRanges object of features to bootstrap (supplied by the user)
    # remove non-standard chromosomes and mitochondrial genome
    ranges <- keepStandardChromosomes(ranges, pruning.mode="coarse")
    seqlevels(ranges, pruning.mode="coarse") <- setdiff(seqlevels(ranges), "MT")

    # generate bootstraps
    boots <- bootRanges(ranges, blockLength=blockLength, R=R, seg=seg, exclude=exclude)
    # `boots` can then be used with plyranges commands
bootRanges() implements several algorithms, both segmented and unsegmented. In the segmented version, blocks are sampled with respect to a particular genome segmentation. Overall, we recommend the segmented block bootstrap given the heterogeneity of structure across the genome. If the purpose is to block bootstrap ranges within a smaller set of sequences, such as motifs within transcript sequences, then the unsegmented algorithm is sufficient.
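As a rough illustration of the unsegmented case, the following sketch bootstraps a small set of simulated ranges on a single toy sequence. The sequence name tx1, its length, and the feature positions are invented for illustration. Only bootRanges() arguments from the minimal code above are used, with seg and exclude left out. Setting the sequence length on the input is an assumption made here so that blocks can be placed along the whole sequence.

```r
library(GenomicRanges)
library(nullranges)

set.seed(1)

# Hypothetical toy data: 50 features of width 10 on one sequence of length 10,000.
toy <- GRanges(
  seqnames = "tx1",
  ranges = IRanges(start = sort(sample(9990, 50)), width = 10),
  seqlengths = c(tx1 = 1e4)
)

# Unsegmented block bootstrap: neither `seg` nor `exclude` is supplied.
toy_boots <- bootRanges(toy, blockLength = 1000, R = 3)
toy_boots
```

The block length of 1000 is arbitrary here. In practice it would be chosen relative to the scale of clustering in the features.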
In a segmented block bootstrap, the blocks are sampled and placed within regions of a genome segmentation. That is, for a genome segmented into states \(1,2, \dots, S\), blocks from state \(s\) are used to tile the ranges of state \(s\) in each bootstrap sample. The process is visualized in (A): a block of length \(L_b\) is randomly sampled with replacement from state “red”, and the features (ranges) that overlap this block are then copied to the first tile (which is in the “red” state). Sampling is allowed across chromosomes (as shown here), as long as the two blocks are in the same state.
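The tiling idea can be sketched in a few lines of base R. This is a toy illustration under simplifying assumptions, not the nullranges implementation: a single sequence of length 1000, a hypothetical two-state segmentation, features selected by their start position rather than by overlap, and sampled blocks constrained to lie entirely within one segment. All names (toy_seg, toy_features, state_of, segmented_block_bootstrap) are invented for this sketch.

```r
set.seed(1)

# Hypothetical segmentation of one sequence (length 1000) into two states.
toy_seg <- data.frame(
  start = c(1, 301, 601, 801),
  end   = c(300, 600, 800, 1000),
  state = c("red", "blue", "red", "blue")
)

# Hypothetical features: 60 start positions, each of width 5.
toy_features <- data.frame(start = sort(sample(990, 60)), width = 5)

block_len <- 100  # L_b

# Look up the state of a position from the segment start coordinates.
state_of <- function(pos) toy_seg$state[findInterval(pos, toy_seg$start)]

segmented_block_bootstrap <- function(features, seg, block_len) {
  seg_len <- seg$end - seg$start + 1
  out <- list()
  for (i in seq_len(nrow(seg))) {
    # Tile segment i; the last tile may be shorter than block_len.
    tile_starts <- seq(seg$start[i], seg$end[i], by = block_len)
    # Source segments are those sharing the state of segment i.
    same <- which(seg$state == seg$state[i])
    for (ts in tile_starts) {
      tile_len <- min(block_len, seg$end[i] - ts + 1)
      # Choose a source segment (probability proportional to its length),
      # then a block start such that the block fits inside that segment.
      src <- same[sample.int(length(same), 1, prob = seg_len[same])]
      n_starts <- seg_len[src] - tile_len + 1
      bs <- seg$start[src] + sample.int(n_starts, 1) - 1
      # Copy features starting inside the block, shifted onto the tile.
      hit <- features$start >= bs & features$start < bs + tile_len
      if (any(hit)) {
        out[[length(out) + 1]] <- data.frame(
          start = features$start[hit] - bs + ts,
          width = features$width[hit],
          orig_start = features$start[hit]
        )
      }
    }
  }
  do.call(rbind, out)
}

boot <- segmented_block_bootstrap(toy_features, toy_seg, block_len)
head(boot)
```

Because each block is drawn from a segment with the same state as the tile it fills, every bootstrapped feature lands in the state it came from. This is the property that lets the segmented bootstrap preserve state-specific feature density.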
An example workflow of bootRanges() used in combination with plyranges (Lee, Cook, and Lawrence 2019) is diagrammed in (B), and can be summarized as:
bootRanges() with optional arguments seg (genome segmentation) and exclude (excluded regions as compiled by Ogata et al. (2023)) to create a BootRanges object (\(y'\))