TY - JOUR
T1 - A Prediction Model of Autism Spectrum Diagnosis from Well-Baby Electronic Data Using Machine Learning
AU - Ben-Sasson, Ayelet
AU - Guedalia, Joshua
AU - Nativ, Liat
AU - Ilan, Keren
AU - Shaham, Meirav
AU - Gabis, Lidia V.
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/4
Y1 - 2024/4
N2 - Early detection of autism spectrum disorder (ASD) is crucial for timely intervention, yet diagnosis typically occurs after age three. This study aimed to develop a machine learning model to predict ASD diagnosis using infants’ electronic health records obtained through a national screening program and evaluate its accuracy. A retrospective cohort study analyzed health records of 780,610 children, including 1163 with ASD diagnoses. Data encompassed birth parameters, growth metrics, developmental milestones, and familial and post-natal variables from routine wellness visits within the first two years. Using a gradient boosting model with 3-fold cross-validation, 100 parameters predicted ASD diagnosis with an average area under the ROC curve of 0.86 (SD < 0.002). Feature importance was quantified using the Shapley Additive explanation tool. The model identified a high-risk group with a 4.3-fold higher ASD incidence (0.006) compared to the cohort (0.001). Key predictors included failing six milestones in language, social, and fine motor domains during the second year, male gender, parental developmental concerns, non-nursing, older maternal age, lower gestational age, and atypical growth percentiles. Machine learning algorithms capitalizing on preventative care electronic health records can facilitate ASD screening considering complex relations between familial and birth factors, post-natal growth, developmental parameters, and parent concern.
KW - autism spectrum disorders
KW - development
KW - electronic health records
KW - machine learning
KW - screening
UR - http://www.scopus.com/inward/record.url?scp=85191323953&partnerID=8YFLogxK
U2 - 10.3390/children11040429
DO - 10.3390/children11040429
M3 - Article
C2 - 38671647
AN - SCOPUS:85191323953
SN - 2227-9067
VL - 11
JO - Children
JF - Children
IS - 4
M1 - 429
ER -