BERT (Batch-Effect Removal with Trees) offers flexible and efficient batch effect correction of omics data, while providing maximum tolerance to missing values. Tested on multiple datasets from proteomic analyses, BERT offered a typical 5-10x runtime improvement over existing methods, while retaining more numeric values and preserving batch effect reduction quality.
As such, BERT is a valuable preprocessing tool for data analysis workflows, in particular for proteomic data. By providing BERT via Bioconductor, we make this tool available to a wider research community. An accompanying research paper is currently under preparation and will be made public soon.
BERT addresses the same fundamental data integration challenges as the [HarmonizR](https://github.com/HSU-HPC/HarmonizR) package, which was released on Bioconductor in November 2023. However, various algorithmic modifications and optimizations allow BERT to achieve faster execution times and better data coverage than HarmonizR. Moreover, BERT offers a more user-friendly design and a less error-prone input format.
Please note that our package BERT is neither affiliated with nor related to Bidirectional Encoder Representations from Transformers as published by Google.
Please report any questions and issues in the GitHub forum or the Bioconductor forum, or contact the authors directly.
Please download and install a current version of R (e.g. the Windows binaries). You might want to consider installing a development environment as well, e.g. RStudio. Finally, BERT can be installed via Bioconductor using
if (!require("BiocManager", quietly = TRUE)){
install.packages("BiocManager")
}
BiocManager::install("BERT")
which will install all required dependencies. To install the development version of BERT, you can use devtools as follows
devtools::install_github("HSU-HPC/BERT")
which may require the manual installation of the dependencies sva and limma:
if (!require("BiocManager", quietly = TRUE)){
install.packages("BiocManager")
}
BiocManager::install("sva")
BiocManager::install("limma")
As input, BERT requires a dataframe (matrices and SummarizedExperiments work as well, but will automatically be converted to dataframes) with samples in rows and features in columns.
For each sample, the respective batch should be indicated by an integer or string in a corresponding column labelled Batch. Missing values should be labelled as NA. A valid example dataframe could look like this:
example = data.frame(feature_1 = stats::rnorm(5), feature_2 = stats::rnorm(5), Batch=c(1,1,2,2,2))
example
#> feature_1 feature_2 Batch
#> 1 -0.4434270 0.4045337 1
#> 2 -0.4278170 -0.8523657 1
#> 3 -0.4103102 0.5505798 2
#> 4 2.0282830 0.6216209 2
#> 5 1.6862133 -0.9767376 2
Note that each batch should contain at least two samples. Optional columns that can be passed are (see the example after this list):
- Label: A column with integers or strings indicating the (known) class for each sample. NA is not allowed. BERT may use this column together with Batch to compute quality metrics after batch effect correction.
- Sample: A sample name. This column is ignored by BERT and can be used to provide meta-information for further processing.
- Cov_1, Cov_2, …, Cov_x: One or multiple columns with integers, indicating one or several covariate levels. NA is not allowed. If present, BERT will pass these columns as covariates to the underlying batch effect correction method. As an example, this functionality can be used to preserve differences between healthy/tumorous samples, if some of the batches exhibit strongly variable class distributions. Note that BERT requires at least two numeric values per batch and unique covariate level to adjust a feature. Features that don't satisfy this condition in a specific batch are set to NA for that batch.
- Reference: A column with integers or strings from \(\mathbb{N}_0\) that indicates whether a sample should be used for "learning" the transformation for batch effect correction or whether the sample should be co-adjusted using the learned transformation from the other samples. NA is not allowed. This feature can be used if some batches contain unique classes or samples with unknown classes, which would prohibit the usage of covariate columns. If the column contains a 0 for a sample, this sample will be co-adjusted. Otherwise, the sample should contain the respective class (encoded as integer or string). Note that BERT requires at least two references of common class per adjustment step and that the Reference column is mutually exclusive with covariate columns.
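For illustration, a dataframe using some of these optional columns might look as follows (the values are purely illustrative; Reference is omitted here because it is mutually exclusive with covariate columns):
example = data.frame(feature_1 = stats::rnorm(6),
                     feature_2 = stats::rnorm(6),
                     Batch = c(1, 1, 1, 2, 2, 2),   # mandatory batch assignment
                     Label = c(1, 2, 1, 1, 2, 2),   # optional known classes
                     Sample = paste0("S", 1:6),     # optional sample names, ignored by BERT
                     Cov_1 = c(1, 2, 1, 2, 1, 2))   # optional categorical covariate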
Note that BERT tries to find all metadata information for a SummarizedExperiment, including the mandatory batch information, using colData.
For instance, a valid SummarizedExperiment might be defined as
nrows <- 200
ncols <- 8
expr_values <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
# colData also takes all other metadata information, such as Label, Sample,
# Covariables etc.
colData <- data.frame(Batch=c(1,1,1,1,2,2,2,2), Reference=c(1,1,0,0,1,1,0,0))
dataset_raw = SummarizedExperiment::SummarizedExperiment(assays=list(expr=expr_values), colData=colData)
BERT can be invoked by importing the BERT library and calling the BERT function.
The batch effect corrected data is returned as a dataframe that mirrors the input dataframe; in particular, the row and column names are in the same order and the optional columns are preserved.
library(BERT)
# generate test data with 10% missing values as provided by the BERT library
dataset_raw <- generate_dataset(features=60, batches=10, samplesperbatch=10, mvstmt=0.1, classes=2)
# apply BERT
dataset_adjusted <- BERT(dataset_raw)
#> 2025-06-04 17:34:09.171552 INFO::Formatting Data.
#> 2025-06-04 17:34:09.182438 INFO::Replacing NaNs with NAs.
#> 2025-06-04 17:34:09.192294 INFO::Removing potential empty rows and columns
#> 2025-06-04 17:34:09.641276 INFO::Found 600 missing values.
#> 2025-06-04 17:34:09.656597 INFO::Introduced 0 missing values due to singular proteins at batch/covariate level.
#> 2025-06-04 17:34:09.657766 INFO::Done
#> 2025-06-04 17:34:09.658732 INFO::Acquiring quality metrics before batch effect correction.
#> 2025-06-04 17:34:09.678164 INFO::Starting hierarchical adjustment
#> 2025-06-04 17:34:09.67964 INFO::Found 10 batches.
#> 2025-06-04 17:34:09.680612 INFO::Cores argument is not defined or BPPARAM has been specified. Argument corereduction will not be used.
#> 2025-06-04 17:34:11.151268 INFO::Using default BPPARAM
#> 2025-06-04 17:34:11.152229 INFO::Processing subtree level 1
#> 2025-06-04 17:34:22.07632 INFO::Processing subtree level 2
#> 2025-06-04 17:34:32.397254 INFO::Adjusting the last 1 batches sequentially
#> 2025-06-04 17:34:32.39986 INFO::Done
#> 2025-06-04 17:34:32.401392 INFO::Acquiring quality metrics after batch effect correction.
#> 2025-06-04 17:34:32.41015 INFO::ASW Batch was 0.485093760799866 prior to batch effect correction and is now -0.115820321570524 .
#> 2025-06-04 17:34:32.411801 INFO::ASW Label was 0.321264698669755 prior to batch effect correction and is now 0.80404867745765 .
#> 2025-06-04 17:34:32.413545 INFO::Total function execution time is 23.3192839622498 s and adjustment time is 22.7205741405487 s ( 97.43 )
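Since the returned dataframe mirrors the input, basic consistency checks can be run directly on the result; a minimal sketch (no output shown):
# the corrected dataframe has the same shape and column order as the input
stopifnot(all(dim(dataset_adjusted) == dim(dataset_raw)))
stopifnot(all(colnames(dataset_adjusted) == colnames(dataset_raw)))
# metadata columns such as Batch are preserved
stopifnot(all(dataset_adjusted$Batch == dataset_raw$Batch))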
BERT uses the logging library to convey live information to the user during the adjustment procedure.
The algorithm first verifies the shape and suitability of the input dataframe (lines 1-6) before continuing with the actual batch effect correction (lines 8-14).
BERT measures batch effects before and after the correction step by means of the average silhouette width (ASW) with respect to batch and labels (lines 7 and 15).
The ASW Label should increase in a successful batch effect correction, whereas low values (\(\leq 0\)) are desirable for the ASW Batch. Note that the optimum of ASW Label is 1, which is, however, typically not achieved on real-world datasets; also, the optimum of ASW Batch can vary, depending on the class distributions of the batches.
Finally, BERT prints the total function execution time (including the computation time for the quality metrics).
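As an illustration of this metric, the ASW with respect to batch could be computed manually with the cluster package roughly as follows (a sketch only; BERT performs these computations internally, and dist excludes missing values pairwise):
# keep only the numeric feature columns, dropping metadata columns
meta_cols <- c("Batch", "Label", "Sample", "Reference")
features <- dataset_adjusted[, !(colnames(dataset_adjusted) %in% meta_cols)]
# silhouette widths of the samples with respect to their batch assignment
sil_batch <- cluster::silhouette(as.integer(factor(dataset_adjusted$Batch)),
                                 dist(features))
# ASW Batch is the mean silhouette width; values <= 0 indicate well-mixed batches
mean(sil_batch[, "sil_width"])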
BERT offers a large number of parameters to customize the batch effect adjustment. The full function call, including all defaults is
BERT(data, cores = NULL, combatmode = 1, corereduction=2, stopParBatches=2, backend="default", method="ComBat", qualitycontrol=TRUE, verify=TRUE, labelname="Label", batchname="Batch", referencename="Reference", samplename="Sample", covariatename=NULL, BPPARAM=NULL, assayname=NULL)
In the following, we list the respective meaning of each parameter (a combined example call follows this list):
- data: The input dataframe/matrix/SummarizedExperiment for batch effect correction. See Data Preparation for detailed formatting instructions. Must contain at least two samples per batch and two features.
- cores: BERT uses BiocParallel for parallelization. If the user specifies a value for cores, BERT internally creates and uses a new instance of BiocParallelParam, which is, however, not exposed to the user. Setting this parameter can speed up the batch effect adjustment considerably, in particular for large datasets and on unix-based operating systems. A value between \(2\) and \(4\) is a reasonable choice for typical commodity hardware. Multi-node computations are not supported as of now. If, however, cores is not specified, BERT will default to BiocParallel::bpparam(), which may have been set by the user or the system. Additionally, the user can directly specify a specific instance of BiocParallelParam via the BPPARAM argument.
- combatmode: An integer that encodes the parameters to use for ComBat.

Value | par.prior | mean.only |
---|---|---|
1 | TRUE | FALSE |
2 | TRUE | TRUE |
3 | FALSE | FALSE |
4 | FALSE | TRUE |

The value of this parameter will be ignored if method!="ComBat".
- corereduction: Positive integer indicating the factor by which the number of processes should be reduced, once no further adjustment is possible for the current number of batches. For example, consider a BERT call with 8 batches and 8 processes: further adjustment is not possible with this number of processes, since batches are always processed in pairs. With corereduction=2, the number of processes for the following adjustment steps would be set to \(8/2=4\), which is the maximum number of usable processes for this example. This parameter is used only if the user specified a custom value for the parameter cores.
- stopParBatches: Positive integer indicating the minimum number of batches required at a hierarchy level to proceed with parallelized adjustment. If the number of batches is smaller, adjustment will be performed sequentially to avoid communication overheads.
- backend: The backend to use for inter-process communication. Possible choices are default and file, where the former refers to the default communication backend of the requested parallelization mode and the latter will create temporary .rds files for data communication. 'default' is usually faster for small to medium-sized datasets.
- method: The method to use for the underlying batch effect correction steps. Should be either ComBat, limma for limma::removeBatchEffects, or ref for adjustment using specified references (cf. Data Preparation). The underlying batch effect adjustment method for ref is a modified version of the limma method.
- qualitycontrol: A boolean to (de)activate the ASW computation. Deactivating the ASW computation accelerates the computations.
- verify: A boolean to (de)activate the initial format check of the input data. Deactivating this verification step accelerates the computations.
- labelname: A string containing the name of the column to use as class labels. The default is "Label".
- batchname: A string containing the name of the column to use as batch labels. The default is "Batch".
- referencename: A string containing the name of the column to use as reference labels. The default is "Reference".
- covariatename: A vector containing the names of columns with categorical covariables. The default is NULL, in which case all column names are matched against the pattern "Cov".
- BPPARAM: An instance of BiocParallelParam that will be used for parallelization. The default is NULL, in which case the value of cores determines the behaviour of BERT.
- assayname: If the user chooses to pass a SummarizedExperiment object, they need to specify the name of the assay that they want to apply BERT to here. BERT then returns the input SummarizedExperiment with an additional assay labeled assayname_BERTcorrected.
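For illustration, here is a hedged sketch of a customized call that combines several of these parameters (the chosen values are assumptions for demonstration and should be adapted to your data and hardware):
# combatmode 2 corresponds to par.prior=TRUE, mean.only=TRUE (cf. table above);
# the ASW computation and the initial format check are skipped to save time;
# parallelization uses an explicitly constructed BiocParallelParam instance
dataset_adjusted <- BERT(dataset_raw,
                         combatmode = 2,
                         qualitycontrol = FALSE,
                         verify = FALSE,
                         BPPARAM = BiocParallel::SnowParam(workers = 2))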
BERT utilizes the logging package for output. The user can easily specify the verbosity of BERT by setting the global logging level in the script. For instance
logging::setLevel("WARN") # set level to warn and upwards
result <- BERT(data,cores = 1) # BERT executes silently
BERT exposes a large number of parameters for parallelization so as to provide users with maximum flexibility. For typical scenarios, however, the default parameters are well suited. For very large experiments (\(>15\) batches), we recommend increasing the number of cores (a reasonable value is \(4\), but larger values may be possible on your hardware). Most users should leave all parameters at their respective defaults.
In the following, we present simple cookbook examples for BERT usage. Note that ASWs (and runtime) will most likely differ on your machine, since the data generating process involves multiple random choices.
Here, BERT uses limma as the underlying batch effect correction algorithm (method="limma") and performs all computations on a single process (the cores parameter is left at its default).
# import BERT
library(BERT)
# generate data with 20 batches, 60 features, 15 samples per batch, 15% missing values and 2 classes
dataset_raw <- generate_dataset(features=60, batches=20, samplesperbatch=15, mvstmt=0.15, classes=2)
# BERT
dataset_adjusted <- BERT(dataset_raw, method="limma")
#> 2025-06-04 17:34:32.580999 INFO::Formatting Data.
#> 2025-06-04 17:34:32.582918 INFO::Replacing NaNs with NAs.
#> 2025-06-04 17:34:32.58536 INFO::Removing potential empty rows and columns
#> 2025-06-04 17:34:32.589525 INFO::Found 2700 missing values.
#> 2025-06-04 17:34:32.627872 INFO::Introduced 0 missing values due to singular proteins at batch/covariate level.
#> 2025-06-04 17:34:32.629416 INFO::Done
#> 2025-06-04 17:34:32.630806 INFO::Acquiring quality metrics before batch effect correction.
#> 2025-06-04 17:34:32.655975 INFO::Starting hierarchical adjustment
#> 2025-06-04 17:34:32.657781 INFO::Found 20 batches.
#> 2025-06-04 17:34:32.659231 INFO::Cores argument is not defined or BPPARAM has been specified. Argument corereduction will not be used.
#> 2025-06-04 17:34:32.662114 INFO::Using default BPPARAM
#> 2025-06-04 17:34:32.663451 INFO::Processing subtree level 1
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#>
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> 2025-06-04 17:34:36.17338 INFO::Processing subtree level 2
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> 2025-06-04 17:34:39.581998 INFO::Processing subtree level 3
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> design matrix of interest not specified. Assuming a one-group experiment.
#> 2025-06-04 17:34:43.001521 INFO::Adjusting the last 1 batches sequentially
#> 2025-06-04 17:34:43.004106 INFO::Done
#> 2025-06-04 17:34:43.005417 INFO::Acquiring quality metrics after batch effect correction.
#> 2025-06-04 17:34:43.027553 INFO::ASW Batch was 0.523114140670311 prior to batch effect correction and is now -0.127184493000365 .
#> 2025-06-04 17:34:43.028989 INFO::ASW Label was 0.264386618446501 prior to batch effect correction and is now 0.807919771914894 .
#> 2025-06-04 17:34:43.030547 INFO::Total function execution time is 10.449676990509 s and adjustment time is 10.3466250896454 s ( 99.01 )
Here, BERT uses ComBat as the underlying batch effect correction algorithm (method is left at its default) and performs all computations on 2 processes (cores=2).
# import BERT
library(BERT)
# generate data with 20 batches, 60 features, 15 samples per batch, 15% missing values and 2 classes
dataset_raw <- generate_dataset(features=60, batches=20, samplesperbatch=15, mvstmt=0.15, classes=2)
# BERT
dataset_adjusted <- BERT(dataset_raw, cores=2)
#> 2025-06-04 17:34:43.12408 INFO::Formatting Data.
#> 2025-06-04 17:34:43.125706 INFO::Replacing NaNs with NAs.
#> 2025-06-04 17:34:43.127153 INFO::Removing potential empty rows and columns
#> 2025-06-04 17:34:43.129458 INFO::Found 2700 missing values.
#> 2025-06-04 17:34:43.150649 INFO::Introduced 0 missing values due to singular proteins at batch/covariate level.
#> 2025-06-04 17:34:43.151736 INFO::Done
#> 2025-06-04 17:34:43.152567 INFO::Acquiring quality metrics before batch effect correction.
#> 2025-06-04 17:34:43.170662 INFO::Starting hierarchical adjustment
#> 2025-06-04 17:34:43.171888 INFO::Found 20 batches.
#> 2025-06-04 17:34:43.828816 INFO::Set up parallel execution backend with 2 workers
#> 2025-06-04 17:34:43.830358 INFO::Processing subtree level 1 with 20 batches using 2 cores.
#> 2025-06-04 17:34:55.479333 INFO::Adjusting the last 2 batches sequentially
#> 2025-06-04 17:34:55.481703 INFO::Adjusting sequential tree level 1 with 2 batches
#> 2025-06-04 17:34:57.848205 INFO::Done
#> 2025-06-04 17:34:57.849661 INFO::Acquiring quality metrics after batch effect correction.
#> 2025-06-04 17:34:57.873079 INFO::ASW Batch was 0.476526106368758 prior to batch effect correction and is now -0.131990042279593 .
#> 2025-06-04 17:34:57.874643 INFO::ASW Label was 0.312483246869969 prior to batch effect correction and is now 0.821721261506118 .
#> 2025-06-04 17:34:57.876319 INFO::Total function execution time is 14.7522168159485 s and adjustment time is 14.6758248806 s ( 99.48 )
Here, BERT takes the input data as a SummarizedExperiment instead.
Batch effect correction is then performed using ComBat as the underlying algorithm (method is left at its default) and all computations are performed on a single process (the cores parameter is left at its default).
nrows <- 200
ncols <- 8
# SummarizedExperiments store samples in columns and features in rows (in contrast to BERT).
# BERT will automatically account for this.
expr_values <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
# colData also takes further metadata information, such as Label, Sample,
# Reference or Covariables
colData <- data.frame("Batch"=c(1,1,1,1,2,2,2,2), "Label"=c(1,2,1,2,1,2,1,2), "Sample"=c(1,2,3,4,5,6,7,8))
dataset_raw = SummarizedExperiment::SummarizedExperiment(assays=list(expr=expr_values), colData=colData)
dataset_adjusted = BERT(dataset_raw, assayname = "expr")
#> 2025-06-04 17:34:58.008658 INFO::Formatting Data.
#> 2025-06-04 17:34:58.010814 INFO::Recognized SummarizedExperiment
#> 2025-06-04 17:34:58.012531 INFO::Typecasting input to dataframe.
#> 2025-06-04 17:34:58.070837 INFO::Replacing NaNs with NAs.
#> 2025-06-04 17:34:58.073266 INFO::Removing potential empty rows and columns
#> 2025-06-04 17:34:58.079468 INFO::Found 0 missing values.
#> 2025-06-04 17:34:58.09169 INFO::Introduced 0 missing values due to singular proteins at batch/covariate level.
#> 2025-06-04 17:34:58.093328 INFO::Done
#> 2025-06-04 17:34:58.094809 INFO::Acquiring quality metrics before batch effect correction.
#> 2025-06-04 17:34:58.103505 INFO::Starting hierarchical adjustment
#> 2025-06-04 17:34:58.105491 INFO::Found 2 batches.
#> 2025-06-04 17:34:58.10826 INFO::Cores argument is not defined or BPPARAM has been specified. Argument corereduction will not be used.
#> 2025-06-04 17:34:58.109374 INFO::Using default BPPARAM
#> 2025-06-04 17:34:58.110328 INFO::Adjusting the last 2 batches sequentially
#> 2025-06-04 17:34:58.111912 INFO::Adjusting sequential tree level 1 with 2 batches
#> 2025-06-04 17:34:58.184401 INFO::Done
#> 2025-06-04 17:34:58.186135 INFO::Acquiring quality metrics after batch effect correction.
#> 2025-06-04 17:34:58.195066 INFO::ASW Batch was -0.0128458446110842 prior to batch effect correction and is now -0.0911581902779435 .
#> 2025-06-04 17:34:58.196737 INFO::ASW Label was 0.0136607111537808 prior to batch effect correction and is now 0.0253034542769397 .
#> 2025-06-04 17:34:58.198689 INFO::Total function execution time is 0.189998865127563 s and adjustment time is 0.0792069435119629 s ( 41.69 )
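Because the input was a SummarizedExperiment, the corrected values are stored in an additional assay whose name carries the _BERTcorrected suffix described above; a minimal sketch for retrieving them:
# list all assays; the corrected one is expected to be named "expr_BERTcorrected"
SummarizedExperiment::assayNames(dataset_adjusted)
# extract the batch effect corrected values
corrected <- SummarizedExperiment::assay(dataset_adjusted, "expr_BERTcorrected")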
BERT can utilize categorical covariables that are specified in columns Cov_1, Cov_2, ....
These columns are automatically detected and integrated into the batch effect correction process.
# import BERT
library(BERT)
# set seed for reproducibility
set.seed(1)
# generate data with 5 batches, 60 features, 30 samples per batch, 15% missing values and 2 classes
dataset_raw <- generate_dataset(features=60, batches=5, samplesperbatch=30, mvstmt=0.15, classes=2)
# create covariable column with 2 possible values, e.g. male/female condition
dataset_raw["Cov_1"] = sample(c(1,2), size=dim(dataset_raw)[1], replace=TRUE)
# BERT
dataset_adjusted <- BERT(dataset_raw)
#> 2025-06-04 17:34:58.30089 INFO::Formatting Data.
#> 2025-06-04 17:34:58.302875 INFO::Replacing NaNs with NAs.
#> 2025-06-04 17:34:58.305271 INFO::Removing potential empty rows and columns
#> 2025-06-04 17:34:58.309484 INFO::Found 1350 missing values.
#> 2025-06-04 17:34:58.311757 INFO::BERT requires at least 2 numeric values per batch/covariate level. This may reduce the number of adjustable features considerably, depending on the quantification technique.
#> 2025-06-04 17:34:58.346557 INFO::Introduced 0 missing values due to singular proteins at batch/covariate level.
#> 2025-06-04 17:34:58.34827 INFO::Done
#> 2025-06-04 17:34:58.349851 INFO::Acquiring quality metrics before batch effect correction.
#> 2025-06-04 17:34:58.36203 INFO::Starting hierarchical adjustment
#> 2025-06-04 17:34:58.365503 INFO::Found 5 batches.
#> 2025-06-04 17:34:58.367181 INFO::Cores argument is not defined or BPPARAM has been specified. Argument corereduction will not be used.
#> 2025-06-04 17:34:58.368816 INFO::Using default BPPARAM
#> 2025-06-04 17:34:58.370242 INFO::Processing subtree level 1
#> 2025-06-04 17:35:10.433308 INFO::Adjusting the last 2 batches sequentially
#> 2025-06-04 17:35:10.435471 INFO::Adjusting sequential tree level 1 with 2 batches
#> 2025-06-04 17:35:10.487074 INFO::Done
#> 2025-06-04 17:35:10.488036 INFO::Acquiring quality metrics after batch effect correction.
#> 2025-06-04 17:35:10.494503 INFO::ASW Batch was 0.492773245691086 prior to batch effect correction and is now -0.0377157224767566 .
#> 2025-06-04 17:35:10.495444 INFO::ASW Label was 0.40854766060101 prior to batch effect correction and is now 0.895560693013661 .
#> 2025-06-04 17:35:10.496491 INFO::Total function execution time is 12.195839881897 s and adjustment time is 12.1220738887787 s ( 99.4 )
In rare cases, class distributions across experiments may be severely skewed. In particular, a batch might contain classes that other batches don't contain.
In these cases, samples of common conditions may serve as references (bridges) between the batches (method="ref").
BERT uses those samples as references whose condition is specified in the "Reference" column of the input. All other samples are co-adjusted.
Please note that this strategy implicitly uses limma as the underlying batch effect correction algorithm.
# import BERT
library(BERT)
# generate data with 4 batches, 6 features, 15 samples per batch, 15% missing values and 2 classes
dataset_raw <- generate_dataset(features=6, batches=4, samplesperbatch=15, mvstmt=0.15, classes=2)
# create reference column with default value 0. The 0 indicates that the respective sample should be co-adjusted only.
dataset_raw[, "Reference"] <- 0
# randomly select 2 references per batch and class - in practice, this choice will be determined by external requirements (e.g. class known for only these samples)
batches <- unique(dataset_raw$Batch) # all the batches
for(b in batches){ # iterate over all batches
# references from class 1
ref_idx = sample(which((dataset_raw$Batch==b)&(dataset_raw$Label==1)), size=2, replace=FALSE)
dataset_raw[ref_idx, "Reference"] <- 1
# references from class 2
ref_idx = sample(which((dataset_raw$Batch==b)&(dataset_raw$Label==2)), size=2, replace=FALSE)
dataset_raw[ref_idx, "Reference"] <- 2
}
# BERT
dataset_adjusted <- BERT(dataset_raw, method="ref")
#> 2025-06-04 17:35:10.934181 INFO::Formatting Data.
#> 2025-06-04 17:35:10.936153 INFO::Replacing NaNs with NAs.
#> 2025-06-04 17:35:10.938077 INFO::Removing potential empty rows and columns
#> 2025-06-04 17:35:10.940045 INFO::Found 60 missing values.
#> 2025-06-04 17:35:10.945517 INFO::Introduced 0 missing values due to singular proteins at batch/covariate level.
#> 2025-06-04 17:35:10.946847 INFO::Done
#> 2025-06-04 17:35:10.948049 INFO::Acquiring quality metrics before batch effect correction.
#> 2025-06-04 17:35:10.952872 INFO::Starting hierarchical adjustment
#> 2025-06-04 17:35:10.954117 INFO::Found 4 batches.
#> 2025-06-04 17:35:10.954911 INFO::Cores argument is not defined or BPPARAM has been specified. Argument corereduction will not be used.
#> 2025-06-04 17:35:10.973221 INFO::Using default BPPARAM
#> 2025-06-04 17:35:10.974737 INFO::Processing subtree level 1
#> 2025-06-04 17:35:14.16495 INFO::Adjusting the last 2 batches sequentially
#> 2025-06-04 17:35:14.167169 INFO::Adjusting sequential tree level 1 with 2 batches
#> 2025-06-04 17:35:14.193548 INFO::Done
#> 2025-06-04 17:35:14.195073 INFO::Acquiring quality metrics after batch effect correction.
#> 2025-06-04 17:35:14.200404 INFO::ASW Batch was 0.440355021914032 prior to batch effect correction and is now -0.087480278736629 .
#> 2025-06-04 17:35:14.201648 INFO::ASW Label was 0.373906827748893 prior to batch effect correction and is now 0.919791677398366 .
#> 2025-06-04 17:35:14.20335 INFO::Total function execution time is 3.26923298835754 s and adjustment time is 3.23952102661133 s ( 99.09 )
Issues can be reported in the GitHub forum, the Bioconductor forum, or directly to the authors.
This code is published under the GPLv3.0 License and is available for non-commercial academic purposes.
Please cite our manuscript, if you use BERT for your research: Schumann Y, Gocke A, Neumann J (2024). Computational Methods for Data Integration and Imputation of Missing Values in Omics Datasets. PROTEOMICS. ISSN 1615-9861, doi:10.1002/pmic.202400100
sessionInfo()
#> R version 4.5.0 (2025-04-11 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows Server 2022 x64 (build 20348)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] BERT_1.5.0 BiocStyle_2.37.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 blob_1.2.4
#> [3] Biostrings_2.77.1 fastmap_1.2.0
#> [5] janitor_2.2.1 XML_3.99-0.18
#> [7] digest_0.6.37 timechange_0.3.0
#> [9] lifecycle_1.0.4 cluster_2.1.8.1
#> [11] survival_3.8-3 statmod_1.5.0
#> [13] KEGGREST_1.49.0 invgamma_1.1
#> [15] RSQLite_2.4.0 magrittr_2.0.3
#> [17] genefilter_1.91.0 compiler_4.5.0
#> [19] rlang_1.1.6 sass_0.4.10
#> [21] tools_4.5.0 yaml_2.3.10
#> [23] knitr_1.50 S4Arrays_1.9.1
#> [25] bit_4.6.0 DelayedArray_0.35.1
#> [27] abind_1.4-8 BiocParallel_1.43.3
#> [29] BiocGenerics_0.55.0 grid_4.5.0
#> [31] stats4_4.5.0 xtable_1.8-4
#> [33] edgeR_4.7.2 iterators_1.0.14
#> [35] logging_0.10-108 SummarizedExperiment_1.39.0
#> [37] cli_3.6.5 rmarkdown_2.29
#> [39] crayon_1.5.3 generics_0.1.4
#> [41] httr_1.4.7 DBI_1.2.3
#> [43] cachem_1.1.0 stringr_1.5.1
#> [45] splines_4.5.0 parallel_4.5.0
#> [47] AnnotationDbi_1.71.0 BiocManager_1.30.25
#> [49] XVector_0.49.0 matrixStats_1.5.0
#> [51] vctrs_0.6.5 Matrix_1.7-3
#> [53] jsonlite_2.0.0 sva_3.57.0
#> [55] bookdown_0.43 comprehenr_0.6.10
#> [57] IRanges_2.43.0 S4Vectors_0.47.0
#> [59] bit64_4.6.0-1 locfit_1.5-9.12
#> [61] foreach_1.5.2 limma_3.65.1
#> [63] jquerylib_0.1.4 snow_0.4-4
#> [65] annotate_1.87.0 glue_1.8.0
#> [67] codetools_0.2-20 lubridate_1.9.4
#> [69] stringi_1.8.7 GenomeInfoDb_1.45.4
#> [71] GenomicRanges_1.61.0 UCSC.utils_1.5.0
#> [73] htmltools_0.5.8.1 R6_2.6.1
#> [75] evaluate_1.0.3 lattice_0.22-7
#> [77] Biobase_2.69.0 png_0.1-8
#> [79] memoise_2.0.1 snakecase_0.11.1
#> [81] bslib_0.9.0 SparseArray_1.9.0
#> [83] nlme_3.1-168 mgcv_1.9-3
#> [85] xfun_0.52 MatrixGenerics_1.21.0