TY - JOUR
T1 - The importance of nonlinear transformations use in medical data analysis
AU - Shachar, Netta
AU - Mitelpunkt, Alexis
AU - Kozlovski, Tal
AU - Galili, Tal
AU - Frostig, Tzviel
AU - Brill, Barak
AU - Marcus-Kalish, Mira
AU - Benjamini, Yoav
N1 - Publisher Copyright: © Netta Shachar, Alexis Mitelpunkt, Tal Kozlovski, Tal Galili, Tzviel Frostig, Barak Brill, Mira Marcus-Kalish, Yoav Benjamini.
PY - 2018/5
Y1 - 2018/5
N2 - Background: The accumulation of data and its accessibility through easier-to-use platforms will allow data scientists and practitioners who are less sophisticated data analysts to get answers by using big data for many purposes in multiple ways. Data scientists working with medical data are aware of the importance of preprocessing, yet in many cases, the potential benefits of using nonlinear transformations are overlooked. Objective: Our aim is to present a semi-automated approach to symmetry-aiming transformations tailored for medical data analysis, together with its advantages. Methods: We describe 10 commonly encountered data types used in the medical field and the relevant transformations for each data type. Data from the Alzheimer's Disease Neuroimaging Initiative study, a Parkinson's disease hospital cohort, and disease-simulating data were used to demonstrate the approach and its benefits. Results: Symmetry-targeted monotone transformations were applied, and the advantages gained in variance, stability, linearity, and clustering are demonstrated. An open source application implementing the described methods was developed. Both linearity of relationships and stability of variability improved after applying a proper nonlinear transformation. Clustering simulated nonsymmetric data gave low agreement with the generating clusters (Rand value=0.681), whereas clustering after a nonlinear transformation to symmetry captured the original structure (Rand value=0.986). Conclusions: This work presents the use of nonlinear transformations for medical data and the importance of their semi-automated choice. Using the described approach, the data analyst increases the ability to create simpler, more robust, and translational models, thereby facilitating the interpretation and implementation of the analysis by medical practitioners. Applying nonlinear transformations as part of the preprocessing is essential to the quality and interpretability of results.
KW - Big data
KW - Data mining
KW - Health informatics
KW - Medical informatics
KW - Preprocessing
KW - Statistics
KW - Transformations
UR - http://www.scopus.com/inward/record.url?scp=85047518783&partnerID=8YFLogxK
U2 - 10.2196/medinform.7992
DO - 10.2196/medinform.7992
M3 - Article
C2 - 29752251
AN - SCOPUS:85047518783
SN - 2291-9694
VL - 6
JO - JMIR Medical Informatics
JF - JMIR Medical Informatics
IS - 2
M1 - e27
ER -