TY - JOUR
T1 - Prediction of Hematopoietic Stem Cell Transplantation Related Mortality- Lessons Learned from the In-Silico Approach
T2 - A European Society for Blood and Marrow Transplantation Acute Leukemia Working Party Data Mining Study
AU - Shouval, Roni
AU - Labopin, Myriam
AU - Unger, Ron
AU - Giebel, Sebastian
AU - Ciceri, Fabio
AU - Schmid, Christoph
AU - Esteve, Jordi
AU - Baron, Frederic
AU - Gorin, Norbert Claude
AU - Savani, Bipin
AU - Shimoni, Avichai
AU - Mohty, Mohamad
AU - Nagler, Arnon
N1 - Publisher Copyright:
© 2016 Shouval et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2016/3
Y1 - 2016/3
AB - Models for predicting allogeneic hematopoietic stem cell transplantation (HSCT) related mortality only partially account for transplant risk. Improving predictive accuracy requires an understanding of prediction-limiting factors, such as the statistical methodology used, the number and quality of features collected, or simply the population size. Using an in-silico approach (i.e., iterative computerized simulations) based on machine learning (ML) algorithms, we set out to analyze these factors. A cohort of 25,923 adult acute leukemia patients from the European Society for Blood and Marrow Transplantation (EBMT) registry was analyzed. The prediction target was non-relapse mortality (NRM) at 100 days following HSCT. Thousands of prediction models were developed under varying conditions: increasing sample size, specific subpopulations, and an increasing number of variables, which were selected and ranked by separate feature selection algorithms. Depending on the algorithm, predictive performance plateaued at a population size of 6,611-8,814 patients, reaching a maximal area under the receiver operating characteristic curve (AUC) of 0.67. AUCs of models developed on specific subpopulations ranged from 0.59 (patients in second complete remission) to 0.67 (patients receiving reduced intensity conditioning). Only 3-5 variables were necessary to achieve near-maximal AUCs. The top three ranking variables, shared by all algorithms, were disease stage, donor type, and conditioning regimen. Our findings empirically demonstrate that, with regard to NRM prediction, few variables "carry the weight" and that traditional HSCT data has been "worn out". "Breaking through" the predictive boundaries will likely require additional types of inputs.
UR - http://www.scopus.com/inward/record.url?scp=84961285352&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0150637
DO - 10.1371/journal.pone.0150637
M3 - Article
C2 - 26942424
AN - SCOPUS:84961285352
SN - 1932-6203
VL - 11
JO - PLoS ONE
JF - PLoS ONE
IS - 3
M1 - e0150637
ER -