1 Lists

To efficiently work with several datasets, we recommend storing the GRanges objects within a standard, named list, e.g. grl <- list(a_rep1 = gr1, b_rep1 = gr2, ...).

library(BRGenomics)
data("PROseq")
data("txs_dm6_chr4")
# make 3 datasets
ps1 <- PROseq[seq(1, length(PROseq), 3)]
ps2 <- PROseq[seq(2, length(PROseq), 3)]
ps3 <- PROseq[seq(3, length(PROseq), 3)]

# use the "=" assignment in list() to give names to the list elements
ps_list <- list(ps1 = ps1, ps2 = ps2, ps3 = ps3)
names(ps_list)
## [1] "ps1" "ps2" "ps3"
ps_list
## $ps1
## GRanges object with 15794 ranges and 1 metadata column:
##           seqnames    ranges strand |     score
##              <Rle> <IRanges>  <Rle> | <integer>
##       [1]     chr4      1295      + |         1
##       [2]     chr4     42590      + |         2
##       [3]     chr4     42595      + |         1
##       [4]     chr4     42618      + |         1
##       [5]     chr4     42622      + |         2
##       ...      ...       ...    ... .       ...
##   [15790]     chr4   1307114      - |         1
##   [15791]     chr4   1307122      - |         1
##   [15792]     chr4   1307300      - |         1
##   [15793]     chr4   1316537      - |         1
##   [15794]     chr4   1319369      - |         1
##   -------
##   seqinfo: 7 sequences from an unspecified genome
## 
## $ps2
## GRanges object with 15793 ranges and 1 metadata column:
##           seqnames    ranges strand |     score
##              <Rle> <IRanges>  <Rle> | <integer>
##       [1]     chr4     41428      + |         1
##       [2]     chr4     42593      + |         5
##       [3]     chr4     42596      + |         1
##       [4]     chr4     42619      + |         2
##       [5]     chr4     42652      + |         3
##       ...      ...       ...    ... .       ...
##   [15789]     chr4   1307032      - |         1
##   [15790]     chr4   1307115      - |         2
##   [15791]     chr4   1307126      - |         1
##   [15792]     chr4   1307301      - |         1
##   [15793]     chr4   1318960      - |         1
##   -------
##   seqinfo: 7 sequences from an unspecified genome
## 
## $ps3
## GRanges object with 15793 ranges and 1 metadata column:
##           seqnames    ranges strand |     score
##              <Rle> <IRanges>  <Rle> | <integer>
##       [1]     chr4     42588      + |         1
##       [2]     chr4     42594      + |         2
##       [3]     chr4     42601      + |         1
##       [4]     chr4     42621      + |         1
##       [5]     chr4     42657      + |         1
##       ...      ...       ...    ... .       ...
##   [15789]     chr4   1307075      - |         1
##   [15790]     chr4   1307120      - |         1
##   [15791]     chr4   1307283      - |         1
##   [15792]     chr4   1307742      - |         1
##   [15793]     chr4   1319004      - |         1
##   -------
##   seqinfo: 7 sequences from an unspecified genome

A named list like the one above can be passed as an argument to nearly every function in BRGenomics, and many functions will automatically return dataframes, or melted dataframes that use the list names as the sample names (which can simplify plotting with ggplot2 or lattice).


Note that BRGenomics does also support the use of GRangesList or CompressedGRangesList classes for grouping multiple datasets. However, care should be taken when using these, as many functions with methods for GRanges objects also have methods for GRangesList objects. Furthermore, GRangesList objects can be automatically coerced into CompressedGRangesList objects, which can lower memory usage but can also incur significant performance penalties.


1.1 Example: Summarizing Signal from Multiple Samples

We can pass our list of GRanges directly to getCountsByRegions to simultaneously count reads for each dataset, and melt the result for plotting with ggplot:

getCountsByRegions(ps_list, txs_dm6_chr4[1:5], ncores = 1)
##   ps1 ps2 ps3
## 1   1   0   0
## 2  20  22  17
## 3   4   4   5
## 4  36  47  43
## 5  84  90  89
# melt, and use the optional region_names argument
txs_counts <- getCountsByRegions(ps_list, txs_dm6_chr4, melt = TRUE,
                                 region_names = txs_dm6_chr4$tx_name,
                                 ncores = 1)
head(txs_counts)
##        region signal sample
## 1 FBtr0346692      1    ps1
## 2 FBtr0344900     20    ps1
## 3 FBtr0340499      4    ps1
## 4 FBtr0333704     36    ps1
## 5 FBtr0333705     84    ps1
## 6 FBtr0100246   1017    ps1
library(ggplot2)
ggplot(txs_counts, aes(x = sample, y = signal, fill = sample)) + 
  geom_violin() + 
  theme_bw()

1.2 Example: Making Browser Shots in R

By using the getCountsByPositions() on several datasets over a single region-of-interest, we can bake our own genome browser shots within R. Using the same region we plotted earlier:

cbp_maxtx <- getCountsByPositions(ps_list, txs_dm6_chr4[135], 
                                  melt = TRUE, ncores = 1)
head(cbp_maxtx)
##   region position signal sample
## 1      1        1      0    ps1
## 2      1        2      0    ps1
## 3      1        3      0    ps1
## 4      1        4      0    ps1
## 5      1        5      0    ps1
## 6      1        6      0    ps1
ggplot(cbp_maxtx, aes(x = position, y = signal)) + 
    facet_wrap(~sample, ncol = 1, strip.position = "right") + 
    geom_col(size = 0.5, color = "darkgray") +
    coord_cartesian(expand = FALSE) +
    labs(title = txs_dm6_chr4$tx_name[135],
         x = "Distance from TSS", y = "PRO-seq Signal") + 
    theme_classic() + 
    theme(strip.text.y = element_text(angle = 0),
          strip.background = element_blank(),
          axis.line.x = element_blank(),
          axis.ticks.x = element_blank())
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.