TY - JOUR
T1 - Prediction of progression from pre-diabetes to diabetes
T2 - Development and validation of a machine learning model
AU - Cahn, Avivit
AU - Shoshan, Avi
AU - Sagiv, Tal
AU - Yesharim, Rachel
AU - Goshen, Ran
AU - Shalev, Varda
AU - Raz, Itamar
N1 - Publisher Copyright: © 2020 John Wiley & Sons, Ltd.
PY - 2020/2/1
Y1 - 2020/2/1
N2 - Aims: Identification, a priori, of those at high risk of progression from pre-diabetes to diabetes may enable targeted delivery of interventional programmes while avoiding the burden of prevention and treatment in those at low risk. We studied whether the use of a machine-learning model can improve the prediction of incident diabetes utilizing patient data from electronic medical records. Methods: A machine-learning model predicting the progression from pre-diabetes to diabetes was developed using a gradient boosted trees model. The model was trained on data from The Health Improvement Network (THIN) database cohort, internally validated on THIN data not used for training, and externally validated on the Canadian AppleTree and the Israeli Maccabi Health Services (MHS) data sets. The model's predictive ability was compared with that of a logistic-regression model within each data set. Results: A cohort of 852 454 individuals with pre-diabetes (glucose ≥ 100 mg/dL and/or HbA1c ≥ 5.7%) was used for model training, including 4.9 million time points using 900 features. The full model was eventually implemented using 69 variables, generated from 11 basic signals. The machine-learning model demonstrated superiority over the logistic-regression model, which was maintained at all sensitivity levels – comparing AUC [95% CI] between the models; in the THIN data set (0.865 [0.860, 0.869] vs 0.778 [0.773, 0.784] P <.05), the AppleTree data set (0.907 [0.896, 0.919] vs 0.880 [0.867, 0.894] P <.05) and the MHS data set (0.925 [0.923, 0.927] vs 0.876 [0.872, 0.879] P <.05). Conclusions: Machine-learning models preserve their performance across populations in diabetes prediction, and can be integrated into large clinical systems, leading to judicious selection of persons for interventional programmes.
KW - electronic medical records
KW - machine learning
KW - pre-diabetes
UR - http://www.scopus.com/inward/record.url?scp=85077992077&partnerID=8YFLogxK
U2 - 10.1002/dmrr.3252
DO - 10.1002/dmrr.3252
M3 - Article
C2 - 31943669
AN - SCOPUS:85077992077
VL - 36
JO - Diabetes/Metabolism Research and Reviews
JF - Diabetes/Metabolism Research and Reviews
SN - 1520-7552
IS - 2
M1 - e3252
ER -