HicAggR package provides a set of tools for the analysis of genomic 3D conformation data (HiC). It is designed around Aggregation Peak Analysis (APA), with the intent to provide easy-to-use sets of functions that would allow users to integrate 1D-genomics data with 3D-genomics data. This package does not perform TAD calling, loop calling, or compartment analysis. There are other packages available for these analyses (eg: TopDom (from CRAN), HiTC (bioconductor), HiCDOC (bioconductor)…). The package offers however tools to perform downstream analyses on called loops,or make use of called TADs to explore interactions within loops or explore intra/inter-TAD interactions. To this end, pixels of interest surrounding called loops or genomic features are extracted from whole HiC matrices (as submatrices), abiding to TAD and/or compartmental constraints in order to give user an insight into the interaction patterns of features of interest or called loops.
The straight-forward workflow with this package would be:
GRanges
We also propose (since version 0.99.3) a function to find couples that demonstrate a non-random (or non-background) like interaction signals. The function performs z.test to compare target couples to less-plausible or background couples to identify target couples with significantly higher interaction values than the background signal (compareToBackground
).
Please visit dedicated github pages and the github repository.
The principal objective is to simplify the extraction of interaction signals from HiC data, by making submatrices easy to access and treat if necessary. As such the contact matrices are imported as a list of ContactMatrix
objects per combination of chromosomes(eg: “chr1_chr1” or “chr1_chr2”). This allows the user to access the data easily for dowstream analyses.
Data were obtained from Drosophila melanogaster S2 cells. 1. HiC test dataset Directly downloaded from the 4DN platform. * Control Condition
* Heat Shock Condition
2. Genomic coordinates:
* ChIPseq peaks of Beaf-32 protein in wild type cells (GSM1278639). * Reference annotation data for TSS from the UCSC database.
* Topologically associating domains (TAD) annotations were defined as described (F. Ramirez, 2018).
For a test, please download HiC data in .hic format (Juicer) and .mcool format (HiCExplorer). Examples for each format are provided below.
if(length(BiocFileCache::bfcinfo(bfc)$rname)==0 ||
!"Control_HIC.hic"%in%BiocFileCache::bfcinfo(bfc)$rname){
Hic.url <- paste0("https://4dn-open-data-public.s3.amazonaws.com/",
"fourfront-webprod/wfoutput/7386f953-8da9-47b0-acb2-931cba810544/",
"4DNFIOTPSS3L.hic")
if(.Platform$OS.type == "windows"){
HicOutput.pth <- BiocFileCache::bfcadd(
x = bfc,rname = "Control_HIC.hic",
fpath = Hic.url,
download = TRUE,
config = list(method="auto",mode="wb"))
}else{
HicOutput.pth <- BiocFileCache::bfcadd(
x = bfc, rname = "Control_HIC.hic",
fpath = Hic.url,
download = TRUE,
config = list(method="auto"))
}
}else{
HicOutput.pth <- BiocFileCache::bfcpath(bfc)[
which(BiocFileCache::bfcinfo(bfc)$rname=="Control_HIC.hic")]
}
if(length(BiocFileCache::bfcinfo(bfc)$rname)==0 ||
!"HeatShock_HIC.mcool"%in%BiocFileCache::bfcinfo(bfc)$rname){
Mcool.url <- paste0("https://4dn-open-data-public.s3.amazonaws.com/",
"fourfront-webprod/wfoutput/4f1479a2-4226-4163-ba99-837f2c8f4ac0/",
"4DNFI8DRD739.mcool")
if(.Platform$OS.type == "windows"){
McoolOutput.pth <- BiocFileCache::bfcadd(
x = bfc, rname = "HeatShock_HIC.mcool",
fpath = Mcool.url,
download = TRUE,
config = list(method="auto",mode="wb"))
}else{
McoolOutput.pth <- BiocFileCache::bfcadd(
x = bfc, rname = "HeatShock_HIC.mcool",
fpath = Mcool.url,
download = TRUE,
config = list(method="auto"))
}
}else{
McoolOutput.pth <- as.character(BiocFileCache::bfcpath(bfc)[
which(BiocFileCache::bfcinfo(bfc)$rname=="HeatShock_HIC.mcool")])
}
These kind of data can be imported in R with rtracklayer package.
View
seq | start | end | strand | name | score |
---|---|---|---|---|---|
2L | 35594 | 35725 | * | Beaf32_2 | 76 |
2L | 47296 | 47470 | * | Beaf32_3 | 44 |
2L | 65770 | 65971 | * | Beaf32_5 | 520 |
View
seq | start | end | strand | name | class |
---|---|---|---|---|---|
2L | 71757 | 71757 | + | FBgn0031213 | active |
2L | 76348 | 76348 | + | FBgn0031214 | inactive |
2L | 106903 | 106903 | + | FBgn0005278 | active |
Required genomic informations used by the functions during the entire pipeline are a data.frame containing chromosomes names and sized and the binSize, corresponding to the HiC matrices at the same resolution.
The package supports the import and normalization of HiC data.
NOTE: Since version 0.99.2, the package supports import of balanced HiC matrices in .hic, .cool/.mcool formats. It also supports the import of ‘o/e’ matrices in .hic format.
HicAggR can import HiC data stored in the main formats: .hic, .cool, .mcool, .h5 (since version 0.99.2). The pacakage imports by default the raw counts. Therefore, it is necessary to perform the balancing and observed/expected correction steps.
The balancing is done such that every bin of the matrix has approximately the same number of contacts within the contactMatrix.
To correct effects due to genomic distance the matrix is corrected by the expected values for each genomic distance. The expected values are by default calculated as the average values of contacts per chromosome and per distance.
NOTE: Since version 0.99.3, 2 more options to calculate expected values have been implemented. We designated the methods as the “lieberman” and the “mean_total”. These methods were implemented based on the options proposed by HiCExplorer’s hicTransform program. The “lieberman” method computes per distance (d) expected values by dividing the sum of all contacts by the difference of chromosome length and distance(d).
The “mean_total” is simply the average of all contact values including 0 values, which are ignored in the default method (“mean_non_zero”)
OverExpectedHiC
function, expected counts can be plotted as a function of genomic distances with tibble by taking the expected attributes.Each element of the list corresponds to a ContactMatrix object (dgCMatrix object, sparse matrix format) storing contact frequencies for one chromosome (cis-interactions, ex: “2L_2L”) or between two chromosomes (trans-interactions, ex: “2L_2R”). HiC data format is based on InteractionSet and Matrix packages.
str(HiC_Ctrl.cmx_lst,max.level = 4)
#> List of 3
#> $ 2L_2L:Formal class 'ContactMatrix' [package "InteractionSet"] with 5 slots
#> .. ..@ matrix :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> .. ..@ anchor1 : int [1:4703] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..@ anchor2 : int [1:4703] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..@ regions :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#> .. ..@ metadata:List of 10
#> .. .. ..$ name : chr "2L_2L"
#> .. .. ..$ type : chr "cis"
#> .. .. ..$ kind : chr "U"
#> .. .. ..$ symmetric : logi TRUE
#> .. .. ..$ resolution : num 5000
#> .. .. ..$ expected : num [1:2410023] 719 673 719 370 673 ...
#> .. .. ..$ removedCounts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> .. .. ..$ observed : num [1:2410023] 1329 1721 1852 1076 2191 ...
#> .. .. ..$ normalizer : num [1:2410023] 1.399 1.115 0.888 1.139 0.908 ...
#> .. .. ..$ mtx : chr "norm"
#> $ 2L_2R:Formal class 'ContactMatrix' [package "InteractionSet"] with 5 slots
#> .. ..@ matrix :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> .. ..@ anchor1 : int [1:4703] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..@ anchor2 : int [1:5058] 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 ...
#> .. ..@ regions :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#> .. ..@ metadata:List of 9
#> .. .. ..$ name : chr "2L_2R"
#> .. .. ..$ type : chr "trans"
#> .. .. ..$ kind : chr NA
#> .. .. ..$ symmetric : logi FALSE
#> .. .. ..$ resolution : num 5000
#> .. .. ..$ removedCounts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> .. .. ..$ observed : num [1:1292193] 1 1 1 1 1 1 1 1 1 1 ...
#> .. .. ..$ normalizer : num [1:1292193] 2.73 2.82 2.78 2.48 4.4 ...
#> .. .. ..$ expected : num 0.0596
#> $ 2R_2R:Formal class 'ContactMatrix' [package "InteractionSet"] with 5 slots
#> .. ..@ matrix :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> .. ..@ anchor1 : int [1:5058] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..@ anchor2 : int [1:5058] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..@ regions :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#> .. ..@ metadata:List of 10
#> .. .. ..$ name : chr "2R_2R"
#> .. .. ..$ type : chr "cis"
#> .. .. ..$ kind : chr "U"
#> .. .. ..$ symmetric : logi TRUE
#> .. .. ..$ resolution : num 5000
#> .. .. ..$ expected : num [1:2769177] 746 703 746 152 192 ...
#> .. .. ..$ removedCounts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> .. .. ..$ observed : num [1:2769177] 374 368 614 95 131 420 18 17 26 332 ...
#> .. .. ..$ normalizer : num [1:2769177] 2.62 2.19 1.84 2.94 2.46 ...
#> .. .. ..$ mtx : chr "norm"
#> - attr(*, "resolution")= num 5000
#> - attr(*, "chromSize")= tibble [2 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ name : chr [1:2] "2L" "2R"
#> ..$ length : num [1:2] 23513712 25286936
#> ..$ dimension: num [1:2] 4703 5058
#> - attr(*, "matricesKind")= tibble [3 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ name : chr [1:3] "2L_2L" "2L_2R" "2R_2R"
#> ..$ type : chr [1:3] "cis" "trans" "cis"
#> ..$ kind : chr [1:3] "U" NA "U"
#> ..$ symmetric: logi [1:3] TRUE FALSE TRUE
#> - attr(*, "mtx")= chr "o/e"
#> - attr(*, "expected")= tibble [4,961 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ distance: num [1:4961] 1 5001 10001 15001 20001 ...
#> ..$ expected: num [1:4961] 733 689 377 257 191 ...
#>
The list has the attributes described below. These attributes are accessible via:
attributes(HiC_Ctrl.cmx_lst)
#> $names
#> [1] "2L_2L" "2L_2R" "2R_2R"
#>
#> $resolution
#> [1] 5000
#>
#> $chromSize
#> # A tibble: 2 × 3
#> name length dimension
#> <chr> <dbl> <dbl>
#> 1 2L 23513712 4703
#> 2 2R 25286936 5058
#>
#> $matricesKind
#> # A tibble: 3 × 4
#> name type kind symmetric
#> <chr> <chr> <chr> <lgl>
#> 1 2L_2L cis U TRUE
#> 2 2L_2R trans <NA> FALSE
#> 3 2R_2R cis U TRUE
#>
#> $mtx
#> [1] "o/e"
#>
#> $expected
#> # A tibble: 4,961 × 2
#> distance expected
#> <dbl> <dbl>
#> 1 1 733.
#> 2 5001 689.
#> 3 10001 377.
#> 4 15001 257.
#> # ℹ 4,957 more rows
#>
OverExpectedHiC
function. It gives a tibble with the expected counts as a function of genomic distance.Each contactmatrix in the list have metadata. These are accessible via:
str(S4Vectors::metadata(HiC_Ctrl.cmx_lst[["2L_2L"]]))
#> List of 10
#> $ name : chr "2L_2L"
#> $ type : chr "cis"
#> $ kind : chr "U"
#> $ symmetric : logi TRUE
#> $ resolution : num 5000
#> $ expected : num [1:2410023] 719 673 719 370 673 ...
#> $ removedCounts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> .. ..@ i : int [1:266585] 2 10 18 19 20 28 32 37 38 40 ...
#> .. ..@ p : int [1:4704] 0 0 0 0 0 0 0 0 0 0 ...
#> .. ..@ Dim : int [1:2] 4703 4703
#> .. ..@ Dimnames:List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : NULL
#> .. ..@ x : num [1:266585] 1 1 1 2 1 1 1 1 1 2 ...
#> .. ..@ factors : list()
#> $ observed : num [1:2410023] 1329 1721 1852 1076 2191 ...
#> $ normalizer : num [1:2410023] 1.399 1.115 0.888 1.139 0.908 ...
#> $ mtx : chr "norm"
#>
BalanceHiC
).OverExpectedHiC
function. It gives the expected vector that convert the normalized counts into the observed/expected counts (normalized counts / expected = observed/expected).This part of the data corresponds to the positioning data (ChIPseq peaks, genomic features and annotations, genes, etc) on the genome. To integrate such annotations with HiC data in 2D matrices, annotations must be processed as followed.
The first step is the indexing of the features. It allows the features to be splitted and grouped into bins corresponding to the HiC bin size.
anchors_Index.gnr <- IndexFeatures(
gRangeList = list(Beaf=Beaf32_Peaks.gnr),
genomicConstraint = TADs_Domains.gnr,
chromSizes = chromSizes,
binSize = binSize,
metadataColName = "score",
method = "max"
)
View
seqnames | start | end | width | strand | name | bin | constraint | Beaf.score | Beaf.name | Beaf.bln |
---|---|---|---|---|---|---|---|---|---|---|
2L | 70001 | 75000 | 5000 | * | 2L:15:Tad_1 | 2L:15 | Tad_1 | 205 | Beaf32_8 | TRUE |
2L | 100001 | 105000 | 5000 | * | 2L:21:Tad_2 | 2L:21 | Tad_2 | 1830 | Beaf32_11 | TRUE |
2L | 110001 | 115000 | 5000 | * | 2L:23:Tad_3 | 2L:23 | Tad_3 | 1707 | Beaf32_14 | TRUE |
baits_Index.gnr <- IndexFeatures(
gRangeList = list(Tss=TSS_Peaks.gnr),
genomicConstraint = TADs_Domains.gnr,
chromSizes = chromSizes,
binSize = binSize,
metadataColName = "score",
method = "max"
)
View
seqnames | start | end | width | strand | name | bin | constraint | Tss.class | Tss.name | Tss.bln |
---|---|---|---|---|---|---|---|---|---|---|
2L | 75001 | 80000 | 5000 | * | 2L:16:Tad_1 | 2L:16 | Tad_1 | inactive | FBgn0031214 | TRUE |
2L | 105001 | 110000 | 5000 | * | 2L:22:Tad_3 | 2L:22 | Tad_3 | active | FBgn0026…. | TRUE |
2L | 115001 | 120000 | 5000 | * | 2L:24:Tad_3 | 2L:24 | Tad_3 | active | FBgn0031219 | TRUE |
By using features names and bin IDs, it is possible to filter a subset of features. Example: Subset TSS that are not in the same bin than a Beaf32 peak.
non_Overlaps.ndx <- match(baits_Index.gnr$bin,
anchors_Index.gnr$bin, nomatch=0L)==0L
baits_Index.gnr <- baits_Index.gnr[non_Overlaps.ndx,]
View
seqnames | start | end | width | strand | name | bin | constraint | Tss.class | Tss.name | Tss.bln |
---|---|---|---|---|---|---|---|---|---|---|
2L | 75001 | 80000 | 5000 | * | 2L:16:Tad_1 | 2L:16 | Tad_1 | inactive | FBgn0031214 | TRUE |
2L | 105001 | 110000 | 5000 | * | 2L:22:Tad_3 | 2L:22 | Tad_3 | active | FBgn0026…. | TRUE |
2L | 115001 | 120000 | 5000 | * | 2L:24:Tad_3 | 2L:24 | Tad_3 | active | FBgn0031219 | TRUE |
metadataColName
) can be summarized according to the defined method (method
), Example: Max peak score of the bin is kept in metadata column score
.SearchPairs
function takes as input one or two indexed features and returns all putative pairs within the same constraint (ex: wihtin the same TAD).
If only one indexed features is defined in indexAnchor, SearchPairs
will return symetrical homotypic pairs (A<->A), if indexAnchor and indexBait are defined, it will return asymetrical heterotypic pairs (A<->B).
seq | start | end | seq | start | end | name | constraint | distance | orientation | submatrix.name | name | bin | Beaf.name | Beaf.score | Beaf.bln | name | bin | Tss.name | Tss.class | Tss.bln |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2L | 70001 | 75000 | 2L | 75001 | 80000 | 2L:15_2L:16 | Tad_1 | 5000 | TRUE | 2L:15_2L:16 | 2L:15:Tad_1 | 2L:15 | Beaf32_8 | 205 | TRUE | 2L:16:Tad_1 | 2L:16 | FBgn0031214 | inactive | TRUE |
2L | 110001 | 115000 | 2L | 105001 | 110000 | 2L:23_2L:22 | Tad_3 | 5000 | FALSE | 2L:22_2L:23 | 2L:23:Tad_3 | 2L:23 | Beaf32_14 | 1707 | TRUE | 2L:22:Tad_3 | 2L:22 | FBgn0026…. | active | TRUE |
2L | 120001 | 125000 | 2L | 105001 | 110000 | 2L:25_2L:22 | Tad_3 | 15000 | FALSE | 2L:22_2L:25 | 2L:25:Tad_3 | 2L:25 | Beaf32_15 | 484 | TRUE | 2L:22:Tad_3 | 2L:22 | FBgn0026…. | active | TRUE |
indexBait
is NULL, SearchPairs
will return homotypic pairs with indexAnchor
.In extracted matrices, the middle of the Y axis corresponds to the center of the first element and interacts with the center of second element in the middle of the X axis.
The middle of the Y axis corresponds to the start of the range and interacts with the middle of the X axis which corresponds to the end of the range.
In this case, extracted matrices are resized and scaled in order to fit all regions into the same area.
The region’s start is defined by the center of the first element and the region’s end by the center of the second element.
Step 1: generate a GRanges object of TAD boundaries by concatenating starts and ends of TADs.
domains_Border.gnr <- c(
GenomicRanges::resize(TADs_Domains.gnr, 1, "start"),
GenomicRanges::resize(TADs_Domains.gnr, 1, "end" )
) |>
sort()
Step 2: Filter and reduce TAD boundaries GRanges object according to HiC resolution (binSize) + Store TAD names.
domains_Border_Bin.gnr <- BinGRanges(
gRange = domains_Border.gnr,
binSize = binSize,
verbose = FALSE
)
domains_Border_Bin.gnr$subname <- domains_Border_Bin.gnr$name
domains_Border_Bin.gnr$name <- domains_Border_Bin.gnr$bin
View
seq | start | end | strand | name | score | class | bin | subname |
---|---|---|---|---|---|---|---|---|
2L | 70001 | 75000 | * | 2L:15 | 3 | active | 2L:15 | Tad_1 |
2L | 90001 | 95000 | * | 2L:19 | 3, 8 | active | 2L:19 | Tad_1, Tad_2 |
2L | 100001 | 105000 | * | 2L:21 | 8 | active | 2L:21 | Tad_2, Tad_3 |
Step 3: This defines a GRanges object. In the folowing examples, the same information is needed in a GInteraction object class.
domains_Border_Bin.gni <-
InteractionSet::GInteractions(
domains_Border_Bin.gnr,domains_Border_Bin.gnr)
seq | start | end | name | score | class | bin | subname | seq | start | end | name | score | class | bin | subname |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2L | 70001 | 75000 | 2L:15 | 3 | active | 2L:15 | Tad_1 | 2L | 70001 | 75000 | 2L:15 | active | 3 | 2L:15 | Tad_1 |
2L | 90001 | 95000 | 2L:19 | 3, 8 | active | 2L:19 | Tad_1, Tad_2 | 2L | 90001 | 95000 | 2L:19 | active | 3, 8 | 2L:19 | Tad_1, Tad_2 |
2L | 100001 | 105000 | 2L:21 | 8 | active | 2L:21 | Tad_2, Tad_3 | 2L | 100001 | 105000 | 2L:21 | active | 8 | 2L:21 | Tad_2, Tad_3 |
Here the start and the end of each ranges are in a same bin.
hicResolution
is NULL, the function will atuomatically use the resolution of the hicLst
attributes.The modularity of the workflow allows the user to filter interactions, pairs or extracted submatrices at any step of the analysis. FilterInteractions
function takes as input either a GInteraction object or a list of submatrices, and a list of targets of choice and a selectionFunction defining how targets are filtered.
Target list must be defined by a named list corresponding to the same names of each element and correspond to the column of the GInteraction (or the attributes “interactions” of the matrices to be filtered). Then each element must be a character list to match this column or a function that will test each row in the column and return a bolean.
structureTarget.lst <- list(
first_colname_of_GInteraction = c("value"),
second_colname_of_GInteraction = function(eachElement){
min_th<value && value<max_th}
)
Interactions, pairs or extracted submatrices are filtered by metadata elements from GRanges objects used in SearchPairs
. Those metadata are stored in the attributes of the list of submatrices that are accessible as follow:
attributes(interactions_RFmatrix_ctrl.lst)$interactions
names(S4Vectors::mcols(attributes(interactions_RFmatrix_ctrl.lst)$interactions))
View
seq | start | end | seq | start | end | name | constraint | distance | orientation | submatrix.name | name | bin | Beaf.name | Beaf.score | Beaf.bln | name | bin | Tss.name | Tss.class | Tss.bln |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2L | 120001 | 125000 | 2L | 105001 | 110000 | 2L:25_2L:22 | Tad_3 | 15000 | FALSE | 2L:22_2L:25 | 2L:25:Tad_3 | 2L:25 | Beaf32_15 | 484 | TRUE | 2L:22:Tad_3 | 2L:22 | FBgn0026…. | active | TRUE |
2L | 470001 | 475000 | 2L | 420001 | 425000 | 2L:95_2L:85 | Tad_17 | 50000 | FALSE | 2L:85_2L:95 | 2L:95:Tad_17 | 2L:95 | Beaf32_62 | 37 | TRUE | 2L:85:Tad_17 | 2L:85 | FBgn0031253 | active | TRUE |
2L | 470001 | 475000 | 2L | 450001 | 455000 | 2L:95_2L:91 | Tad_17 | 20000 | FALSE | 2L:91_2L:95 | 2L:95:Tad_17 | 2L:95 | Beaf32_62 | 37 | TRUE | 2L:91:Tad_17 | 2L:91 | FBgn0015924 | active | TRUE |
2L | 2490001 | 2495000 | 2L | 2530001 | 2535000 | 2L:499_2L:507 | Tad_64 | 40000 | TRUE | 2L:499_2L:507 | 2L:499:Tad_64 | 2L:499 | Beaf32_204 | 231 | TRUE | 2L:507:Tad_64 | 2L:507 | FBgn0264943 | inactive | TRUE |
2L | 2675001 | 2680000 | 2L | 2650001 | 2655000 | 2L:536_2L:531 | Tad_67 | 25000 | FALSE | 2L:531_2L:536 | 2L:536:Tad_67 | 2L:536 | Beaf32_210 | 124 | TRUE | 2L:531:Tad_67 | 2L:531 | FBgn0031442 | inactive | TRUE |
2L | 2800001 | 2805000 | 2L | 2770001 | 2775000 | 2L:561_2L:555 | Tad_70 | 30000 | FALSE | 2L:555_2L:561 | 2L:561:Tad_70 | 2L:561 | Beaf32_227 | 185 | TRUE | 2L:555:Tad_70 | 2L:555 | FBgn0019…. | active | TRUE |
2L | 2805001 | 2810000 | 2L | 2770001 | 2775000 | 2L:562_2L:555 | Tad_70 | 35000 | FALSE | 2L:555_2L:562 | 2L:562:Tad_70 | 2L:562 | Beaf32_227 | 185 | TRUE | 2L:555:Tad_70 | 2L:555 | FBgn0019…. | active | TRUE |
2L | 2975001 | 2980000 | 2L | 2990001 | 2995000 | 2L:596_2L:599 | Tad_76 | 15000 | TRUE | 2L:596_2L:599 | 2L:596:Tad_76 | 2L:596 | Beaf32_244 | 98 | TRUE | 2L:599:Tad_76 | 2L:599 | FBgn0025681 | active | TRUE |
2L | 3345001 | 3350000 | 2L | 3330001 | 3335000 | 2L:670_2L:667 | Tad_86 | 15000 | FALSE | 2L:667_2L:670 | 2L:670:Tad_86 | 2L:670 | Beaf32_2…. | 179 | TRUE | 2L:667:Tad_86 | 2L:667 | FBgn0031523 | inactive | TRUE |
2L | 3350001 | 3355000 | 2L | 3330001 | 3335000 | 2L:671_2L:667 | Tad_86 | 20000 | FALSE | 2L:667_2L:671 | 2L:671:Tad_86 | 2L:671 | Beaf32_281 | 100 | TRUE | 2L:667:Tad_86 | 2L:667 | FBgn0031523 | inactive | TRUE |
In this example, Pairs will be filtered on anchor.Beaf.name, bait.Tss.name, name (which correponds to the submatrix IDs) and distance. The aim of the example is to filter Pairs or submatrices that have:
The selectionFunction defines which operations (union(), intersect(), setdiff()…) are used to filter the set of Pairs with target elements. For more examples, see Selection function tips and examples section.
With a GInteraction object as input, FilterInteractions
will return the indices of filtered elements.
With the targets and selectionFun defined above:
With a matrices list as input, FilterInteractions
will return the filtered matrices list, with updated attributes.
With the targets and selectionFun defined above:
For example, to filter the top 100 first elements, select the top 100 first names
first100_interactions_RFmatrix_ctrl.lst <- FilterInteractions(
matrices = interactions_RFmatrix_ctrl.lst,
targets = first100_targets,
selectionFun = NULL
)
attributes(first100_interactions_RFmatrix_ctrl.lst)$interactions
#> GInteractions object with 100 interactions and 15 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 |
#> <Rle> <IRanges> <Rle> <IRanges> |
#> 2L:25_2L:22 2L 120001-125000 --- 2L 105001-110000 |
#> 2L:95_2L:85 2L 470001-475000 --- 2L 420001-425000 |
#> 2L:95_2L:91 2L 470001-475000 --- 2L 450001-455000 |
#> 2L:499_2L:507 2L 2490001-2495000 --- 2L 2530001-2535000 |
#> 2L:536_2L:531 2L 2675001-2680000 --- 2L 2650001-2655000 |
#> ... ... ... ... ... ... .
#> 2L:4313_2L:4299 2L 21560001-21565000 --- 2L 21490001-21495000 |
#> 2L:4315_2L:4299 2L 21570001-21575000 --- 2L 21490001-21495000 |
#> 2L:4312_2L:4305 2L 21555001-21560000 --- 2L 21520001-21525000 |
#> 2L:4313_2L:4305 2L 21560001-21565000 --- 2L 21520001-21525000 |
#> 2L:4315_2L:4305 2L 21570001-21575000 --- 2L 21520001-21525000 |
#> name constraint distance orientation
#> <character> <character> <integer> <logical>
#> 2L:25_2L:22 2L:25_2L:22 Tad_3 15000 FALSE
#> 2L:95_2L:85 2L:95_2L:85 Tad_17 50000 FALSE
#> 2L:95_2L:91 2L:95_2L:91 Tad_17 20000 FALSE
#> 2L:499_2L:507 2L:499_2L:507 Tad_64 40000 TRUE
#> 2L:536_2L:531 2L:536_2L:531 Tad_67 25000 FALSE
#> ... ... ... ... ...
#> 2L:4313_2L:4299 2L:4313_2L:4299 Tad_486 70000 FALSE
#> 2L:4315_2L:4299 2L:4315_2L:4299 Tad_486 80000 FALSE
#> 2L:4312_2L:4305 2L:4312_2L:4305 Tad_486 35000 FALSE
#> 2L:4313_2L:4305 2L:4313_2L:4305 Tad_486 40000 FALSE
#> 2L:4315_2L:4305 2L:4315_2L:4305 Tad_486 50000 FALSE
#> submatrix.name anchor.bin anchor.name bait.bin
#> <character> <character> <character> <character>
#> 2L:25_2L:22 2L:22_2L:25 2L:25 2L:25:Tad_3 2L:22
#> 2L:95_2L:85 2L:85_2L:95 2L:95 2L:95:Tad_17 2L:85
#> 2L:95_2L:91 2L:91_2L:95 2L:95 2L:95:Tad_17 2L:91
#> 2L:499_2L:507 2L:499_2L:507 2L:499 2L:499:Tad_64 2L:507
#> 2L:536_2L:531 2L:531_2L:536 2L:536 2L:536:Tad_67 2L:531
#> ... ... ... ... ...
#> 2L:4313_2L:4299 2L:4299_2L:4313 2L:4313 2L:4313:Tad_486 2L:4299
#> 2L:4315_2L:4299 2L:4299_2L:4315 2L:4315 2L:4315:Tad_486 2L:4299
#> 2L:4312_2L:4305 2L:4305_2L:4312 2L:4312 2L:4312:Tad_486 2L:4305
#> 2L:4313_2L:4305 2L:4305_2L:4313 2L:4313 2L:4313:Tad_486 2L:4305
#> 2L:4315_2L:4305 2L:4305_2L:4315 2L:4315 2L:4315:Tad_486 2L:4305
#> bait.name anchor.Beaf.score anchor.Beaf.name
#> <character> <numeric> <list>
#> 2L:25_2L:22 2L:22:Tad_3 484 Beaf32_15
#> 2L:95_2L:85 2L:85:Tad_17 37 Beaf32_62
#> 2L:95_2L:91 2L:91:Tad_17 37 Beaf32_62
#> 2L:499_2L:507 2L:507:Tad_64 231 Beaf32_204
#> 2L:536_2L:531 2L:531:Tad_67 124 Beaf32_210
#> ... ... ... ...
#> 2L:4313_2L:4299 2L:4299:Tad_486 748 Beaf32_1348
#> 2L:4315_2L:4299 2L:4299:Tad_486 529 Beaf32_1349
#> 2L:4312_2L:4305 2L:4305:Tad_486 748 Beaf32_1348
#> 2L:4313_2L:4305 2L:4305:Tad_486 748 Beaf32_1348
#> 2L:4315_2L:4305 2L:4305:Tad_486 529 Beaf32_1349
#> anchor.Beaf.bln bait.Tss.class bait.Tss.name
#> <logical> <list> <list>
#> 2L:25_2L:22 TRUE active FBgn0026787,FBgn0005278
#> 2L:95_2L:85 TRUE active FBgn0031253
#> 2L:95_2L:91 TRUE active FBgn0015924
#> 2L:499_2L:507 TRUE inactive FBgn0264943
#> 2L:536_2L:531 TRUE inactive FBgn0031442
#> ... ... ... ...
#> 2L:4313_2L:4299 TRUE inactive FBgn0053837
#> 2L:4315_2L:4299 TRUE inactive FBgn0053837
#> 2L:4312_2L:4305 TRUE inactive FBgn0053873
#> 2L:4313_2L:4305 TRUE inactive FBgn0053873
#> 2L:4315_2L:4305 TRUE inactive FBgn0053873
#> bait.Tss.bln
#> <logical>
#> 2L:25_2L:22 TRUE
#> 2L:95_2L:85 TRUE
#> 2L:95_2L:91 TRUE
#> 2L:499_2L:507 TRUE
#> 2L:536_2L:531 TRUE
#> ... ...
#> 2L:4313_2L:4299 TRUE
#> 2L:4315_2L:4299 TRUE
#> 2L:4312_2L:4305 TRUE
#> 2L:4313_2L:4305 TRUE
#> 2L:4315_2L:4305 TRUE
#> -------
#> regions: 335 ranges and 0 metadata columns
#> seqinfo: 2 sequences from an unspecified genome
Warning! A selection of some matrices removes attributes.
nSample.num = 3
set.seed(123)
targets = list(name=sample(
attributes(interactions_RFmatrix_ctrl.lst)$interactions$name,nSample.num))
sampled_interactions_RFmatrix_ctrl.lst <- FilterInteractions(
matrices = interactions_RFmatrix_ctrl.lst,
targets = targets,
selectionFun = NULL
)
attributes(sampled_interactions_RFmatrix_ctrl.lst)$interactions
#> GInteractions object with 3 interactions and 15 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 |
#> <Rle> <IRanges> <Rle> <IRanges> |
#> 2L:1012_2L:1015 2L 5055001-5060000 --- 2L 5070001-5075000 |
#> 2L:3002_2L:2978 2L 15005001-15010000 --- 2L 14885001-14890000 |
#> 2R:1824_2R:1820 2R 9115001-9120000 --- 2R 9095001-9100000 |
#> name constraint distance orientation
#> <character> <character> <integer> <logical>
#> 2L:1012_2L:1015 2L:1012_2L:1015 Tad_132 15000 TRUE
#> 2L:3002_2L:2978 2L:3002_2L:2978 Tad_356 120000 FALSE
#> 2R:1824_2R:1820 2R:1824_2R:1820 Tad_608 20000 FALSE
#> submatrix.name anchor.bin anchor.name bait.bin
#> <character> <character> <character> <character>
#> 2L:1012_2L:1015 2L:1012_2L:1015 2L:1012 2L:1012:Tad_132 2L:1015
#> 2L:3002_2L:2978 2L:2978_2L:3002 2L:3002 2L:3002:Tad_356 2L:2978
#> 2R:1824_2R:1820 2R:1820_2R:1824 2R:1824 2R:1824:Tad_608 2R:1820
#> bait.name anchor.Beaf.score anchor.Beaf.name
#> <character> <numeric> <list>
#> 2L:1012_2L:1015 2L:1015:Tad_132 660 Beaf32_374,Beaf32_375
#> 2L:3002_2L:2978 2L:2978:Tad_356 49 Beaf32_1001
#> 2R:1824_2R:1820 2R:1820:Tad_608 977 Beaf32_1734
#> anchor.Beaf.bln bait.Tss.class bait.Tss.name bait.Tss.bln
#> <logical> <list> <list> <logical>
#> 2L:1012_2L:1015 TRUE active FBgn0261608 TRUE
#> 2L:3002_2L:2978 TRUE inactive FBgn0266840 TRUE
#> 2R:1824_2R:1820 TRUE active FBgn0082927 TRUE
#> -------
#> regions: 335 ranges and 0 metadata columns
#> seqinfo: 2 sequences from an unspecified genome
Without any selectionFunction, FilterInteractions
will return all indices corresponding to each target in the list. Then, the indices of interest can be selected in a second step. For the examples we take the folowing targets:
targets <- list(
anchor.Beaf.name = c("Beaf32_8","Beaf32_15"),
bait.Tss.name = c("FBgn0031214","FBgn0005278"),
name = c("2L:74_2L:77"),
distance = function(columnElement){
return(14000==columnElement || columnElement == 3000)
}
)
Reduce(intersect, list(a,b,c)) |> sort()
#> [1] "G"
intersect(a,b) |> intersect(c) |> sort()
#> [1] "G"
Reduce(union, list(a,b,c)) |> sort()
#> [1] "A" "B" "C" "D" "E" "F" "G"
union(a,b) |> union(c) |> sort()
#> [1] "A" "B" "C" "D" "E" "F" "G"
ExtractSubmatrix
returns submatrices orientated according to 5’->3’ orientation of the chromosome. In the case of heterotypic or asymetric pairs (anchor != bait), anchors and baits are thus mixed on Y and X axis of the matrices.
OrientateMatrix
function allows to force all matrices to be orientated in a way that anchors will be systematically on Y axis and baits on X axis.
PrepareMtxList
can be used to perform operations on the matrices list. This function prepares the matrices list by performing a per matrix operation by transforming values. For example values can be quantilized inorder to rank local interactions and highlight on the contacts with the highest values. This function can also be used to correct orientations, just as OrientateMatrix
. The reason for giving access for the user to this originally hidden function is to have a uniformly prepared matrices list for both quantifications and visualisations.
oriented_quantiled_first100_interactions_RFmatrix_ctrl.lst <-
PrepareMtxList(
first100_interactions_RFmatrix_ctrl.lst,
transFun = 'quantile',
orientate = TRUE)
#> 69 matrices are oriented
oriented_first100_interactions_RFmatrix_ctrl.lst <-
PrepareMtxList(
first100_interactions_RFmatrix_ctrl.lst,
orientate = TRUE)
#> 69 matrices are oriented
GetQuantif
function takes as input a list of submatrices and returns a vector of contact frequencies in a given typeof regions where these contacts are computed with a function.
The GetQuantif
function extracts per submatrix the average values of the 3*3 central pixels by default (see GetQuantif
).
Example: Average of the values in the centered 3x3 square.
The GetQuantif
function also takes custom area and operation in parameter.
Example: Interactions values on the matrice.mtx[33:35,67:69] area, averaged after removing all zeros.
GetQuantif(
matrices = oriented_first100_interactions_RFmatrix_ctrl.lst,
areaFun = function(matrice.mtx){matrice.mtx[33:35,67:69]},
operationFun = function(area.mtx){
area.mtx[which(area.mtx==0)]<-NA;
return(mean(area.mtx,na.rm=TRUE))
}
) |>
c() |>
unlist() |>
head()
#> 2L:25_2L:22 2L:95_2L:85 2L:95_2L:91 2L:499_2L:507 2L:536_2L:531
#> 1.2652268 1.0275631 1.8478690 0.8648371 1.1975185
#> 2L:561_2L:555
#> 1.6126441
By default, returned values are named with submatrix ID. If varName
is set with an element metadata column name from GInteraction attributes, values are returned values are named according to this element.
Example: Named quantifications with anchor.Beaf.name
namedCenter.num <- GetQuantif(
matrices = oriented_first100_interactions_RFmatrix_ctrl.lst,
areaFun = "center",
operationFun = "mean",
varName = "anchor.Beaf.name"
)
Note that changing submatrix ID for other names can create name duplicates:
Example: The 46th matrix is correspond to two Beaf32 peaks, i.e. it has two anchor.Beaf.name
name | anchor.Beaf.name | |
---|---|---|
45 | 2L:2676_2L:2680 | Beaf32_950 |
46 | 2L:2768_2L:2765 | Beaf32_981 |
47 | 2L:2971_2L:2977 | Beaf32_1000 |
48 | 2L:3002_2L:2977 | Beaf32_1001 |
49 | 2L:2971_2L:2978 | Beaf32_1000 |
50 | 2L:3002_2L:2978 | Beaf32_1001 |
As a consequence, the value in center.num is duplicated in namedCenter.num
unlist(c(center.num))[45:50]
#> 2L:2676_2L:2680 2L:2768_2L:2765 2L:2971_2L:2977 2L:3002_2L:2977 2L:2971_2L:2978
#> 0.7318490 0.8071858 0.5238616 0.7522744 0.5438497
#> 2L:3002_2L:2978
#> 0.7135216
unlist(c(namedCenter.num))[45:51]
#> Beaf32_378 Beaf32_408 Beaf32_408 Beaf32_408 Beaf32_437 Beaf32_518 Beaf32_521
#> 1.1762447 1.0476893 0.8868763 0.7318490 0.8071858 0.5238616 0.7522744
Duplicated value index are stored in attributes.
GetQuantif(
matrices = oriented_first100_interactions_RFmatrix_ctrl.lst,
areaFun = function(matrice.mtx){matrice.mtx[5,5]},
operationFun = function(area.mtx){area.mtx}
) |>
head()
#> 2L:25_2L:22 2L:95_2L:85 2L:95_2L:91 2L:499_2L:507 2L:536_2L:531
#> 1.6036209 1.2231158 1.2129980 0.8516612 0.3858703
#> 2L:561_2L:555
#> 0.8043792
GetQuantif(
matrices = oriented_first100_interactions_RFmatrix_ctrl.lst,
areaFun = function(matrice.mtx){matrice.mtx[4:6,4:6]},
operationFun = function(area){area}
) |>
head()
#> 2L:25_2L:22 2L:95_2L:85 2L:95_2L:91 2L:499_2L:507 2L:536_2L:531
#> 1.599228 NA NA 1.716875 1.603621
#> 2L:561_2L:555
#> NA
operationFun
is NULL then it will return values of the selected region without NA.GetQuantif(
matrices = oriented_first100_interactions_RFmatrix_ctrl.lst,
areaFun = function(matrice.mtx){matrice.mtx[4:6,4:6]},
operationFun = NULL
) |>
head()
#> 2L:25_2L:22 2L:95_2L:85 2L:95_2L:91 2L:499_2L:507 2L:536_2L:531
#> 1.599228 1.716875 1.603621 1.834523 1.731368
#> 2L:561_2L:555
#> 1.628212
Aggregation
function takes as input a list of submatrices and returns an aggregated matrix using the aggregation function defined by the user.
Aggregation
function has some default aggregation functions like sum
, mean
or median
(see Aggregation
)
# rm0 argument can be added to PrepareMtxList to assign NA to 0 values.
oriented_first100_interactions_RFmatrix_ctrl.lst =
PrepareMtxList(
oriented_first100_interactions_RFmatrix_ctrl.lst,
rm0 = FALSE)
agg_sum.mtx <- Aggregation(
matrices = oriented_first100_interactions_RFmatrix_ctrl.lst,
aggFun = "sum"
)
Aggregation
function can take as input two list of submatrices from two samples or conditions and returns a differential aggregated matrix. Two ways to obtain differential aggregation are applied, first is by assessing differences on each individual pairs of submatrices then aggregate the differences; second is by aggregating matrices and assess differences on the aggregated matrices (see examples below).
first100_targets = list(
submatrix.name = names(interactions_RFmatrix_ctrl.lst)[1:100]
)
first100_interactions_RFmatrix_ctrl.lst <- FilterInteractions(
matrices = interactions_RFmatrix_ctrl.lst,
targets = first100_targets,
selectionFun = NULL
)
Orientation
oriented_first100_interactions_RFmatrix_ctrl.lst <-
OrientateMatrix(first100_interactions_RFmatrix_ctrl.lst)
interactions_RFmatrix.lst <- ExtractSubmatrix(
genomicFeature = interactions.gni,
hicLst = HiC_HS.cmx_lst,
referencePoint = "rf",
matriceDim = 101
)
Filtration
first100_interactions_RFmatrix.lst <- FilterInteractions(
matrices = interactions_RFmatrix.lst,
targets = first100_targets,
selectionFun = NULL
)
Orientation
oriented_first100_interactions_RFmatrix_ctrl.lst =
PrepareMtxList(first100_interactions_RFmatrix_ctrl.lst,
minDist = NULL,
maxDist = NULL,
rm0 = FALSE,
orientate = TRUE
)
#> 69 matrices are oriented
oriented_first100_interactions_RFmatrix.lst =
PrepareMtxList(first100_interactions_RFmatrix.lst,
minDist = NULL,
maxDist = NULL,
rm0 = FALSE,
orientate = TRUE
)
#> 69 matrices are oriented
diffAggreg.mtx <- Aggregation(
ctrlMatrices = oriented_first100_interactions_RFmatrix_ctrl.lst,
matrices = oriented_first100_interactions_RFmatrix.lst,
aggFun = "mean",
diffFun = "substraction",
scaleCorrection = TRUE,
correctionArea = list(
i = c(1:30),
j = c(72:101)
),
statCompare = TRUE)
On PrepareMtxList
function:
PrepareMtxList
acts as a one stop function to perform value treatment and orientation correction allowing to have consistent matrices list for both quantification and visualization process to come.rm0
is TRUE
all zeros in matrices list will be replaced by NA.OrientateMatrix
function.PrepareMtxList
keeps former attributes matrices list and adds new ones:
On Aggregation
function:
matrices
or ctrlMatrices
parametersstatCompare
may not be set TRUE
every time (due to memory requirement).Aggregation
on one sample keeps former attributes of the matrices list and add new ones:
Aggregation
on two samples adds additional attributes:
oriented_interactions_RFmatrix_ctrl.lst <-
OrientateMatrix(interactions_RFmatrix_ctrl.lst)
#> 95 matrices are oriented
orientedAggreg.mtx <- Aggregation(
ctrlMatrices=oriented_interactions_RFmatrix_ctrl.lst,
aggFun="mean"
)
oriented_interactions_RFmatrix.lst <-
OrientateMatrix(interactions_RFmatrix.lst)
#> 95 matrices are oriented
diffAggreg.mtx <- Aggregation(
ctrlMatrices = oriented_interactions_RFmatrix_ctrl.lst,
matrices = oriented_interactions_RFmatrix.lst,
aggFun = "mean",
diffFun = "log2+1",
scaleCorrection = TRUE,
correctionArea = list( i=c(1:30) , j=c(72:101) ),
statCompare = TRUE
)
ggAPA
function creates a ggplot object (ggplot2::geom_raster)
It is possible to set a specific range of values of the scale, for this remove a percentage of values using upper tail, lower tail or both tails of the distribution.
ggAPA(
aggregatedMtx = orientedAggreg.mtx,
title = "APA 30% trimmed on upper side",
trim = 30,
tails = "upper"
)
#> Warning in max(unlist(bounds.num_lst[1]), na.rm = TRUE): no non-missing
#> arguments to max; returning -Inf