% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Run_QC.R
\name{Run_QC}
\alias{Run_QC}
\title{Quality Control After Upstream Pre-Processing for Sequencing-Based Spatial Transcriptomics}
\usage{
Run_QC(config, matched.data, gene.matrix, show.config = TRUE)
}
\arguments{
\item{config}{Path to the YAML configuration file.}

\item{matched.data}{A data frame of spatial transcriptomics data, including UMI counts and spatial coordinates. This is usually the output of the \code{Run_loc_match} function.}

\item{gene.matrix}{A gene count matrix, usually the output of the \code{Run_ST} function.}

\item{show.config}{Logical value indicating whether to print the configuration. Defaults to \code{TRUE}.}
}
\value{
A list containing the filtered gene counts and the matched spatial coordinates of the spots that pass QC.
}
\description{
Quality Control After Upstream Pre-Processing for Sequencing-Based Spatial Transcriptomics
}
\details{
This function performs quality control (QC) on sequencing-based spatial transcriptomics data after upstream pre-processing steps such as \code{Run_ST}. Make sure the output directory is the same as the one used for \code{Run_ST}.
Spots are filtered either with a UMI-count threshold or by letting \code{DropletUtils} determine the threshold.

"max_slope"

In this approach, filtering is based on UMI counts.
Spots with counts below a threshold are treated as low quality and removed.
This retains only spots with substantial transcriptomic signal and reduces noise
from spots that carry little or no meaningful biological information.
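
As a rough illustration of UMI-based filtering, the sketch below drops spots whose UMI count falls below a given threshold from both inputs, so that counts and coordinates stay aligned. It is a hypothetical helper (\code{filter_by_umi} is not part of this package): the column names \code{spatial_name} and \code{UMI_count} are taken from the example in this page, and the threshold is assumed to be known already.

\preformatted{
# Hypothetical helper, not the Run_QC implementation.
# Keeps spots with UMI_count >= threshold in both the spot table
# and the gene count matrix (spots are matched by spatial_name).
filter_by_umi <- function(matched_data, gene_matrix, threshold) {
  # Spots with a missing UMI count are treated as failing the filter
  keep <- !is.na(matched_data$UMI_count) & matched_data$UMI_count >= threshold
  spots <- matched_data$spatial_name[keep]
  # Columns of the gene matrix that belong to the kept spots
  gene_cols <- !is.na(match(colnames(gene_matrix), spots))
  list(
    matched = matched_data[keep, , drop = FALSE],
    genes   = gene_matrix[, gene_cols, drop = FALSE]
  )
}
}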

Threshold Determination:
The threshold is computed from the distribution of UMI counts across spots
by locating the point of maximum slope in the cumulative UMI distribution curve.
This point often corresponds to the transition between background noise and genuine biological signal.
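
A threshold search of this kind can be sketched in R as follows. This is a hypothetical illustration, not the code used by \code{Run_QC}: it reads "maximum slope" as the steepest drop of the log-log curve of UMI count against spot rank (a common knee-plot heuristic), and the function name \code{max_slope_threshold} is made up for this example.

\preformatted{
# Hypothetical sketch, not the Run_QC implementation.
# Returns the UMI count of the last spot before the steepest drop
# in the log10(UMI) vs log10(rank) curve.
max_slope_threshold <- function(umi) {
  # Rank spots by UMI count, highest first; zeros are dropped to allow log10
  umi <- sort(umi[umi > 0], decreasing = TRUE)
  log_rank <- log10(seq_along(umi))
  log_umi <- log10(umi)
  # Slope between neighbouring ranks; the most negative value is the steepest drop
  slope <- diff(log_umi) / diff(log_rank)
  umi[which.min(slope)]
}
}

Spots with \code{UMI_count >= max_slope_threshold(UMI_count)} would then be kept. The log-log scale is used because UMI counts typically span several orders of magnitude, which makes the drop from real spots to background easier to isolate.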

"EmptyDropletUtils"

Alternatively, the \code{DropletUtils} package offers a more sophisticated, statistical approach
that distinguishes droplets or spots containing real cells from empty droplets or those containing only background RNA.
It computes a p-value for each droplet and a corresponding false discovery rate (FDR) to assess whether the droplet is likely to contain a real cell.
For more details on \code{DropletUtils}, see
\href{https://bioconductor.org/packages/release/bioc/html/DropletUtils.html}{its Bioconductor page}.

Multiple Thresholds:
This method determines two thresholds from the parameters in the configuration file.
Filtering can therefore be tuned with both a p-value threshold and an FDR threshold, which gives more flexibility in separating noise from meaningful signal.
Spots are retained if they meet the significance criterion for either the p-value or the FDR.
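
The rule that a spot passes if it meets either cut-off can be written as a small R helper. This is a sketch under assumptions: \code{keep_spots}, \code{p_cut} and \code{fdr_cut} are hypothetical names (they are not the configuration keys read by \code{Run_QC}), the default cut-offs of 0.01 are placeholders, and the input is assumed to be a table with \code{PValue} and \code{FDR} columns, such as the one returned by \code{DropletUtils::emptyDrops()}.

\preformatted{
# Hypothetical helper: a spot is kept if it passes EITHER cut-off.
# NA values (emptyDrops() reports NA for spots at or below its 'lower'
# total count) are treated as failing both tests.
keep_spots <- function(res, p_cut = 0.01, fdr_cut = 0.01) {
  pass_p   <- !is.na(res$PValue) & res$PValue <= p_cut
  pass_fdr <- !is.na(res$FDR) & res$FDR <= fdr_cut
  pass_p | pass_fdr
}

# Possible use with DropletUtils (counts: genes x spots matrix):
# res  <- DropletUtils::emptyDrops(counts)
# kept <- counts[, keep_spots(res)]
}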
}
\examples{
# Write a minimal YAML configuration file to a temporary location
output_dir <- tempdir()
config_list <- list(
  output_directory = output_dir,
  qc_filter = "slope_max",
  qc_per = "0.4_0.8"
)
config_path <- tempfile(fileext = ".yml")
yaml::write_yaml(config_list, config_path)

# Simulate a 100-gene x 100-spot count matrix (genes in rows, spots in columns)
set.seed(123)
gene_ids <- paste0("gene", seq_len(100))
spatial_names <- paste0("SPATIAL_", seq_len(100))
count_matrix <- matrix(rpois(100 * 100, lambda = 20), nrow = 100, ncol = 100)
colnames(count_matrix) <- spatial_names
gene_matrix <- data.frame(row.names = gene_ids, count_matrix, stringsAsFactors = FALSE)

# Simulate matched spatial data: random coordinates and one barcode per spot
matched_data <- data.frame(
  X_coordinate = runif(100, min = 0, max = 1000),
  Y_coordinate = runif(100, min = 0, max = 1000),
  barcode_sequence = paste0("BC", seq_len(100)),
  spatial_name = spatial_names,
  stringsAsFactors = FALSE
)

# Total UMI count per spot; every column of gene_matrix is a spot
umi_counts <- colSums(gene_matrix)
matched_data$UMI_count <- umi_counts[match(matched_data$spatial_name, names(umi_counts))]

# Run QC without printing the configuration
qc_results <- Run_QC(
  config = config_path,
  matched.data = matched_data,
  gene.matrix = gene_matrix,
  show.config = FALSE
)
}
