% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils_annotation.R
\name{calculateLigationPvalues}
\alias{calculateLigationPvalues}
\title{Calculate p-values and abundance fractions for RNA duplexes}
\usage{
calculateLigationPvalues(gi, df_counts, id_col = "gene_id")
}
\arguments{
\item{gi}{\code{GInteractions} object annotated with gene/transcript names}

\item{df_counts}{\code{data.frame} A two-column data frame with gene/transcript
counts. The first column should match the 'gene_id' feature in anno_gr.
The second column is the respective count.}

\item{id_col}{The prefix of the gene/transcript id metadata fields in the input \code{gi}.
Two fields, <id_col>.A and <id_col>.B, are expected; an error is thrown otherwise.}
}
\value{
\code{GInteractions} object with new fields
}
\description{
Calculates p-values by applying Fisher's exact test to each gene/transcript pair
and adjusts them with the BH correction. Also outputs the duplex abundance relative
to the per-gene/transcript count, and the counts of other RNA duplexes formed by
either or neither gene/transcript of the pair.
}
\details{
H0: the RNA duplex does not exist and is reported due to random ligation of fragments.

H1: the RNA duplex is real and formed because of an existing RNA-RNA interaction.

The probability of random ligation is modeled as \eqn{P(a, b)}, defined as:

\eqn{
P(a, b) \propto
\begin{cases}
    2 \cdot P(a) \cdot P(b) & \textnormal{if } a:b \textnormal{ is observed and } a \neq b \\
    P(a) \cdot P(b) & \textnormal{if } a:b \textnormal{ is observed and } a = b \\
    0 & \textnormal{else}
\end{cases}
}

where the probability \eqn{P(a)} (and likewise \eqn{P(b)}) is calculated as:
\eqn{
P(a) = \frac{\textnormal{N reads(a)}}{\textnormal{total N reads}}
}

The p-value is calculated by comparing the observed duplex abundance to the expected
one, as the area under the distribution curve to the right of the observed value.
\eqn{P(a, b)} is normalized to sum to one.
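
As a worked illustration with made-up numbers (not taken from any dataset):
for \eqn{P(a) = 0.5} and \eqn{P(b) = 0.2}, the unnormalized weight of an observed
pair a:b is

\deqn{2 \cdot P(a) \cdot P(b) = 2 \cdot 0.5 \cdot 0.2 = 0.2}

whereas an observed self-pair a:a would get \eqn{P(a) \cdot P(a) = 0.25}
before normalization over all observed pairs.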
}
\examples{
data("RNADuplexesSampleData")
gi <- calculateLigationPvalues(RNADuplexSampleDGs, df_counts = RNADuplexesGeneCounts)
hist(gi$p.adj, breaks = 20)
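
## Illustrative sketch only: toy data and base R, NOT the package implementation.
## It mirrors the random-ligation model described in Details. All object names
## below (toy_counts, toy_pairs, toy_tab, ...) are hypothetical.
toy_counts <- c(geneA = 60, geneB = 30, geneC = 10)
p_single <- toy_counts / sum(toy_counts)   # P(a) = N reads(a) / total N reads
toy_pairs <- data.frame(
    a = c("geneA", "geneA", "geneB"),
    b = c("geneB", "geneA", "geneC"),
    stringsAsFactors = FALSE
)
# Weight 2 * P(a) * P(b) when a != b, and P(a) * P(b) when a == b
w <- ifelse(toy_pairs$a == toy_pairs$b, 1, 2)
p_pair <- w * p_single[toy_pairs$a] * p_single[toy_pairs$b]
# Normalize over the observed pairs so that P(a, b) sums to one
toy_pairs$p_random <- unname(p_pair / sum(p_pair))
toy_pairs

## One-sided Fisher test for a single pair. The 2x2 layout is an assumption
## inferred from the Description (duplexes formed by both, either, or neither
## gene of the pair); the counts are made up.
toy_tab <- matrix(c(12, 30, 25, 900), nrow = 2,
    dimnames = list(a = c("yes", "no"), b = c("yes", "no")))
p_raw <- fisher.test(toy_tab, alternative = "greater")$p.value
# BH adjustment across pairs; the two extra p-values are placeholders
p.adjust(c(p_raw, 0.2, 0.8), method = "BH")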
}
