TY - JOUR
T1 - Data processing pipeline for cardiogenic shock prediction using machine learning
AU - Jajcay, Nikola
AU - Bezak, Branislav
AU - Segev, Amitai
AU - Matetzky, Shlomi
AU - Jankova, Jana
AU - Spartalis, Michael
AU - El Tahlawi, Mohammad
AU - Guerra, Federico
AU - Friebel, Julian
AU - Thevathasan, Tharusan
AU - Berta, Imrich
AU - Pölzl, Leo
AU - Nägele, Felix
AU - Pogran, Edita
AU - Cader, F. Aaysha
AU - Jarakovic, Milana
AU - Gollmann-Tepeköylü, Can
AU - Kollarova, Marta
AU - Petrikova, Katarina
AU - Tica, Otilia
AU - Krychtiuk, Konstantin A.
AU - Tavazzi, Guido
AU - Skurk, Carsten
AU - Huber, Kurt
AU - Böhm, Allan
N1 - Publisher Copyright: 2023 Jajcay, Bezak, Segev, Matetzky, Jankova, Spartalis, El Tahlawi, Guerra, Friebel, Thevathasan, Berta, Pölzl, Nägele, Pogran, Cader, Jarakovic, Gollmann-Tepeköylü, Kollarova, Petrikova, Tica, Krychtiuk, Tavazzi, Skurk, Huber and Böhm.
PY - 2023
Y1 - 2023
AB - Introduction: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC III database of intensive cardiac care unit patients with acute coronary syndrome. The ability to identify high-risk patients could possibly allow taking pre-emptive measures and thus prevent the development of CS. Methods: We mainly focus on techniques for the imputation of missing data by generating a pipeline for imputation and comparing the performance of various multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations. After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of the gradient-boosted framework that uses a tree-based classifier for cardiogenic shock prediction. Results: We achieved good classification performance thanks to data cleaning and imputation (cross-validated mean area under the curve 0.805) without hyperparameter optimization. Conclusion: We believe our pre-processing pipeline would prove helpful also for other classification and regression experiments.
KW - cardiogenic shock
KW - classification
KW - machine learning
KW - missing data imputation
KW - prediction model
KW - processing pipeline
UR - http://www.scopus.com/inward/record.url?scp=85152014080&partnerID=8YFLogxK
U2 - 10.3389/fcvm.2023.1132680
DO - 10.3389/fcvm.2023.1132680
M3 - Article
C2 - 37034352
AN - SCOPUS:85152014080
SN - 2297-055X
VL - 10
JO - Frontiers in Cardiovascular Medicine
JF - Frontiers in Cardiovascular Medicine
M1 - 1132680
ER -