TY - JOUR
T1 - Heuristic normalization procedure for batch effect correction
AU - Yosef, Arthur
AU - Shnaider, Eli
AU - Schneider, Moti
AU - Gurevich, Michael
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023/6
Y1 - 2023/6
N2 - In this paper, we introduce a heuristic method for addressing the batch effect in transcriptome (gene expression) data. Batch effect refers to distortions in a data set that arise when measurements are performed under different conditions. While data from the same batch (measurements performed under the same conditions) are compatible, combining batches creates a new set of incompatible data elements, and thus the combined data are unsuitable for reliable analysis. Therefore, the combined data must be corrected (normalized) before gene expression analysis is performed. There are numerous normalization methods designed to correct gene expression data for batch effect. These methods rely on various assumptions regarding the distribution of the measurements. However, forcing the data elements into a distribution assumed in advance can severely distort biological signals, thus leading to incorrect interpretations and conclusions. The wider the discrepancy between the assumptions and the actual data, the greater the biases introduced by such “correction methods”. The method we introduce has several clear advantages over the alternative methods presently in use. It does not rely on assumptions regarding the distribution and behavior of data elements. Hence, it does not generate any new biases due to unrealistic assumptions. In addition, it strictly maintains the integrity of measurements within the original batches. An additional major advantage of the introduced method is its conceptual simplicity. It is based on common sense and human reasoning, and thus users can easily understand the consequences of normalization, in contrast to the alternative methods.
AB - In this paper, we introduce a heuristic method for addressing the batch effect in transcriptome (gene expression) data. Batch effect refers to distortions in a data set that arise when measurements are performed under different conditions. While data from the same batch (measurements performed under the same conditions) are compatible, combining batches creates a new set of incompatible data elements, and thus the combined data are unsuitable for reliable analysis. Therefore, the combined data must be corrected (normalized) before gene expression analysis is performed. There are numerous normalization methods designed to correct gene expression data for batch effect. These methods rely on various assumptions regarding the distribution of the measurements. However, forcing the data elements into a distribution assumed in advance can severely distort biological signals, thus leading to incorrect interpretations and conclusions. The wider the discrepancy between the assumptions and the actual data, the greater the biases introduced by such “correction methods”. The method we introduce has several clear advantages over the alternative methods presently in use. It does not rely on assumptions regarding the distribution and behavior of data elements. Hence, it does not generate any new biases due to unrealistic assumptions. In addition, it strictly maintains the integrity of measurements within the original batches. An additional major advantage of the introduced method is its conceptual simplicity. It is based on common sense and human reasoning, and thus users can easily understand the consequences of normalization, in contrast to the alternative methods.
KW - Batch effect
KW - Cluster construction
KW - Data mining
KW - Gene expression data
KW - Heuristic methods
KW - Soft computing
UR - http://www.scopus.com/inward/record.url?scp=85151262302&partnerID=8YFLogxK
U2 - 10.1007/s00500-023-08049-4
DO - 10.1007/s00500-023-08049-4
M3 - Article
AN - SCOPUS:85151262302
SN - 1432-7643
VL - 27
SP - 7813
EP - 7829
JO - Soft Computing
JF - Soft Computing
IS - 12
ER -