Heuristic normalization procedure for batch effect correction

Arthur Yosef*, Eli Shnaider, Moti Schneider, Michael Gurevich

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we introduce a heuristic method for addressing the batch effect in transcriptome (gene expression) data. Batch effect refers to distortions in a data set that arise when measurements are performed under different conditions. While data from the same batch (measurements performed under the same conditions) are compatible, combining several batches creates a new set of incompatible data elements, and thus the combined data are inappropriate for reliable analysis. Therefore, the combined data must be corrected (normalized) before gene expression analysis is performed. There are numerous normalization methods designed to correct gene expression data for batch effect. These methods rely on various assumptions regarding the distribution of the measurements. However, forcing the data elements into a distribution assumed in advance can severely distort biological signals, leading to incorrect interpretations and conclusions. The wider the discrepancy between the assumptions and the actual data, the greater the biases introduced by such “correction methods”. The method we introduce has several clear advantages over the alternative methods presently in use. It does not rely on assumptions regarding the distribution and behavior of data elements. Hence, it does not generate new biases due to unrealistic assumptions. In addition, it strictly maintains the integrity of measurements within the original batches. An additional major advantage of the introduced method is its conceptual simplicity. It is based on common sense and human reasoning, and thus users can easily understand the consequences of normalization, in contrast to the alternative methods.

Original language: English
Pages (from-to): 7813-7829
Number of pages: 17
Journal: Soft Computing
Volume: 27
Issue number: 12
DOIs
State: Published - Jun 2023

Keywords

  • Batch effect
  • Cluster construction
  • Data mining
  • Gene expression data
  • Heuristic methods
  • Soft computing
