The Tricky Problem of ‘Batch Effects’ in Biological Data
Published on March 8th, 2024
Last updated on November 12th, 2024
Written by Antonino Zito, PhD
In this Tech Blog, we discuss and address the notorious issue of ‘Batch Effects’ in biological datasets.
Batch effects are variations in the data caused by, or associated with, technical factors. We will walk you through this complex problem step by step and show how the Omics Playground platform can fully take care of it.
Current biomedical research heavily employs large datasets to test hypotheses and advance knowledge.
In this modern era of big data, it is standard practice to distribute sample acquisition and data generation across multiple laboratories. This enables time-effective research, as data generation proceeds in parallel. It also protects research quality: experimental errors arising in a single data-processing center are more likely to propagate to the entire dataset, with damaging (including financial) consequences.
By contrast, data generated across multiple labs can serve as an excellent basis for internal quality control. Last but not least, outsourcing data generation to multiple labs promotes interdisciplinary collaboration, leading to better science.
However, these advantages come with important drawbacks.
Distinct laboratories often employ distinct protocols, technologies, and personnel with different skill sets and experience. Together, these technical sources of variation inevitably introduce unwanted variation into the measurements, commonly referred to as ‘batch effects’.
Sometimes, batch effects also arise from hidden technical sources and go unnoticed. But why does this matter? Put simply, they confound the real underlying biological signals, potentially leading to spurious findings. This has been demonstrated by extensive previous research (e.g., Leek et al., 2010; Lauss et al., 2013; Hicks et al., 2018; Cuklina et al., 2021).
Importantly, batch effects can also arise within a single laboratory, for example across distinct sequencing runs, from different sample donors, or when samples are processed on separate days. Each of these variables can be defined as a ‘batch’.
These challenges underscore the importance of thoughtful experimental design from the outset. This approach is detailed in Planning for Success: A Strategic Design Guide for RNA-Seq Experiments in Drug Discovery by Lexogen, which outlines the key factors behind successful RNA-Seq experiments and how strategic planning upfront can significantly impact outcomes across the drug discovery pipeline.
To minimize batch effects, it’s crucial that the study design involves a balanced representation of samples across batches (e.g., processing units). Unfortunately, study designs are often imperfect.
When the variable of interest is highly imbalanced between batches, it can be very challenging to disentangle biological and batch effects [Fig.1].
The ability to correct batch effects largely depends on the experimental design. In a fully balanced scenario, where the phenotype classes of interest are equally distributed across the batches, batch effects can be ‘averaged out’ when comparing the phenotypes [Fig.1, left].
Conversely, in a fully imbalanced design, where the phenotype classes are completely separated by batch, the phenotype perfectly correlates with the batch and is said to be ‘fully confounded’ with it [Fig.1, right].
In a fully confounded study, it may not be possible to attribute differences between the two conditions to either true underlying biological signals or technical effects.
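The difference between the two designs can be made concrete with a linear model. The minimal Python/NumPy sketch below is our own illustration, not code from Omics Playground; the function names and the 8-sample layouts are hypothetical. It builds a design matrix with an intercept, a phenotype indicator and a batch indicator. In the balanced layout the matrix has full column rank, so least squares can separate the two effects; in the fully confounded layout the phenotype and batch columns are identical, the matrix is rank-deficient, and no fit can tell the two effects apart.

```python
import numpy as np

def design_matrix(phenotype, batch):
    """Build an n x 3 design matrix: intercept, phenotype indicator, batch indicator."""
    phenotype = np.asarray(phenotype, dtype=float)
    batch = np.asarray(batch, dtype=float)
    return np.column_stack([np.ones_like(phenotype), phenotype, batch])

def is_identifiable(phenotype, batch):
    """True if phenotype and batch effects can be estimated separately,
    i.e. the design matrix has full column rank."""
    X = design_matrix(phenotype, batch)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Hypothetical 8-sample layouts with two phenotype classes and two batches.
balanced_pheno = [0, 1, 0, 1, 0, 1, 0, 1]    # each batch holds both classes
balanced_batch = [0, 0, 0, 0, 1, 1, 1, 1]
confounded_pheno = [0, 0, 0, 0, 1, 1, 1, 1]  # class 0 only in batch 0, class 1 only in batch 1
confounded_batch = [0, 0, 0, 0, 1, 1, 1, 1]

print(is_identifiable(balanced_pheno, balanced_batch))      # True
print(is_identifiable(confounded_pheno, confounded_batch))  # False

# Simulate one gene in the balanced design: phenotype effect 2, batch shift 5, small noise.
rng = np.random.default_rng(0)
X = design_matrix(balanced_pheno, balanced_batch)
y = X @ np.array([10.0, 2.0, 5.0]) + rng.normal(0.0, 0.1, size=X.shape[0])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated phenotype effect: {coef[1]:.2f}")  # close to 2
```

Rank deficiency is the formal version of "fully confounded": any batch shift can be traded for an equal phenotype shift without changing the fitted values, so the data alone cannot decide between them.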
As a representative example, we use a publicly available microarray dataset from a retrospective study of diffuse large B-cell lymphoma (DLBCL), comprising 181 clinical samples from patients pre-treated with the CHOP regimen and 233 samples from patients pre-treated with the R-CHOP regimen (Lenz et al., 2008).
Because treatment was performed before data generation and samples were categorized into the two groups for processing, this dataset illustrates well how batch effects can appear and confound the data [Fig.2].
Samples cluster by pharmacological treatment (A) rather than by DLBCL class (B). When a batch correction method is applied, the clustering of the samples changes substantially: with limma, a widely used batch-effect correction method, the DLBCL class becomes the variable that distinguishes the samples, while the batch variable no longer does (C,D). Similar results are obtained with our new method (undisclosed) (E,F). This indicates that the batch effects have been successfully corrected and the data are now suitable for further analyses.
At BigOmics Analytics, we think batch effects should not be neglected. The Omics Playground platform integrates several effective computational methods for batch-effect correction [Fig.3].
These include, among others, the widely used limma ‘removeBatchEffect’, ComBat and SVA.
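The core idea behind a regression-based correction such as limma's ‘removeBatchEffect’ is to fit a linear model with both biological covariates and batch terms, then subtract only the fitted batch component. The NumPy function below is a simplified sketch of that idea under our own assumptions, not the limma package itself: `remove_batch_effect` is a hypothetical name, it uses sum-to-zero coding for batch, and it omits the extra features of ComBat (empirical Bayes shrinkage and scale adjustment) and SVA (estimation of hidden factors).

```python
import numpy as np

def remove_batch_effect(expr, batch, design=None):
    """Subtract estimated batch effects from a genes x samples matrix.

    expr   : genes x samples array (e.g. log-expression values)
    batch  : length-n sequence of batch labels
    design : optional n x p matrix of biological covariates to protect;
             defaults to an intercept-only model
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    n = expr.shape[1]
    levels = np.unique(batch)

    # Sum-to-zero coding: one column per batch except the last,
    # whose samples are coded -1 in every column.
    X_batch = np.zeros((n, len(levels) - 1))
    for j, level in enumerate(levels[:-1]):
        X_batch[:, j] = batch == level
    X_batch[batch == levels[-1], :] = -1.0

    design = np.ones((n, 1)) if design is None else np.asarray(design, dtype=float).reshape(n, -1)
    X = np.column_stack([design, X_batch])

    # Ordinary least squares for every gene at once: expr.T ~ X @ beta
    beta, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    beta_batch = beta[design.shape[1]:, :]  # batch coefficients, (k-1) x genes

    # Remove only the batch component; the protected biological terms stay in the data.
    return expr - (X_batch @ beta_batch).T

# Hypothetical example: 8 samples, two batches, phenotype balanced within each batch.
phenotype = np.array([0, 1, 0, 1, 0, 1, 0, 1])
batch = np.array(["A"] * 4 + ["B"] * 4)
# Two hypothetical genes: phenotype effect of 2; batch B shifted by +5 (gene 1) or -3 (gene 2).
expr = np.vstack([2.0 * phenotype + 5.0 * (batch == "B"),
                  2.0 * phenotype - 3.0 * (batch == "B")])
design = np.column_stack([np.ones(8), phenotype])  # protect the phenotype
corrected = remove_batch_effect(expr, batch, design)
print(np.round(corrected, 2))
```

After correction, both batches have the same mean for each gene while the phenotype difference of 2 is preserved, because the phenotype column is part of the protected design. As the limma documentation itself advises, corrected values like these are best used for visualization and clustering; for differential testing it is usually preferable to include batch as a covariate in the model.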
We also developed NPmatch, a new method that corrects batch effects through sample matching and pairing and has shown superior performance compared with current methods. This in-house method is currently submitted for publication. You can read more about it here.
The Omics Playground platform can detect batch effects using the sample information supplied by the user, and also in the absence of any information about batches. Once detected, the batch effects are corrected to generate a new, clean dataset that is free of batch effects and well suited for downstream analyses [Fig.3].
Figure 3 shows heatmaps of sample clustering annotated with the datasets’ metadata (A,B) and bar plots of F-tests for association between the available experimental variables and principal components (C,D). Omics Playground also shows t-SNE scatter plots of sample clustering, coloured by the main phenotype under investigation (E,F).
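The kind of association shown in the F-test bar plots (panels C and D) can be approximated with a short sketch: compute principal components of the samples, then run a one-way ANOVA F-test of each PC score against each experimental variable. We do not know exactly how Omics Playground computes these plots; the functions below (`pca_scores`, `anova_f`, `pc_association`) and the simulated data are hypothetical, written in plain NumPy.

```python
import numpy as np

def pca_scores(expr, n_components=3):
    """PC scores (samples x n_components) of a genes x samples matrix."""
    X = np.asarray(expr, dtype=float).T  # samples x genes
    X = X - X.mean(axis=0)               # center each gene
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def anova_f(values, groups):
    """One-way ANOVA F statistic of `values` across the levels of `groups`."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    grand_mean = values.mean()
    ss_between, ss_within = 0.0, 0.0
    for g in levels:
        v = values[groups == g]
        ss_between += len(v) * (v.mean() - grand_mean) ** 2
        ss_within += ((v - v.mean()) ** 2).sum()
    df_between = len(levels) - 1
    df_within = len(values) - len(levels)
    return (ss_between / df_between) / (ss_within / df_within)

def pc_association(expr, variables, n_components=3):
    """F statistic for each (variable, PC) pair; larger means stronger association."""
    scores = pca_scores(expr, n_components)
    return {name: [anova_f(scores[:, k], labels) for k in range(n_components)]
            for name, labels in variables.items()}

# Hypothetical data: 200 genes x 12 samples, phenotype balanced within two batches.
rng = np.random.default_rng(1)
phenotype = np.array([0, 1] * 6)
batch = np.array([0] * 6 + [1] * 6)
expr = rng.normal(size=(200, 12))
expr[:20, phenotype == 1] += 1.0                            # modest biological signal in 20 genes
expr[:, batch == 1] += rng.normal(0.0, 3.0, size=(200, 1))  # strong gene-specific batch shift

fstats = pc_association(expr, {"phenotype": phenotype, "batch": batch})
for name, f in fstats.items():
    print(name, [round(x, 1) for x in f])
```

In this simulation the first PC is dominated by batch, so the batch F statistic on PC1 is far larger than the phenotype one, which is the pattern that flags a batch effect. If p-values are needed, the F statistics can be converted with the F distribution's survival function (for example `scipy.stats.f.sf`).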
The t-SNE plots are particularly useful: they enable the user to directly assess how the samples cluster according to the experimental variable being investigated (e.g., phenotype, condition). Samples in the same cluster tend to be more similar to each other than to samples in other clusters.
At BigOmics Analytics, we aim to maximize discovery power. To do so, we need to deal with the sources of unwanted variability that affect biological data. Batch effects mostly arise from non-biological factors and, as extensive previous research has demonstrated, can confound real biological signals, hamper reproducibility and lead to inconsistencies across studies.
The powerful Omics Playground platform takes care of batch effects without requiring users to write complex, error-prone code, making bioinformatics accessible to everyone, regardless of programming skills.
When you upload your data into Omics Playground, you’ll find batch correction options available in the QC/BC step of the guided data upload process. For more information on the unsupervised batch correction option, click here.

Antonino is a senior bioinformatics engineer at BigOmics with a strong background in bioinformatics and biostatistics. With a PhD in genetics and bioinformatics and an MSc in biotechnology, he has made significant contributions to computational analysis in numerous projects during his previous research at Harvard Medical School and King’s College London.
Leek et al.: Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 2010
Lauss et al.: Monitoring of Technical Variation in Quantitative High-Throughput Datasets. Cancer Informatics, 2013
Cuklina et al.: Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial. Molecular Systems Biology, 2021
Hicks et al.: Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics, 2018
