TY - JOUR
T1 - Explainable multimodal machine learning model for classifying pregnancy drug safety
AU - Shtar, Guy
AU - Rokach, Lior
AU - Shapira, Bracha
AU - Kohn, Elkana
AU - Berkovitch, Matitiahu
AU - Berlin, Maya
N1 - Publisher Copyright: © 2022 Oxford University Press. All rights reserved.
PY - 2022/2/15
Y1 - 2022/2/15
AB - Motivation: Teratogenic drugs can cause severe fetal malformation and therefore have critical impact on the health of the fetus, yet the teratogenic risks are unknown for most approved drugs. This article proposes an explainable machine learning model for classifying pregnancy drug safety based on multimodal data and suggests an orthogonal ensemble for modeling multimodal data. To train the proposed model, we created a set of labeled drugs by processing over 100 000 textual responses collected by a large teratology information service. Structured textual information is incorporated into the model by applying clustering analysis to textual features. Results: We report an area under the receiver operating characteristic curve (AUC) of 0.891 using cross-validation and an AUC of 0.904 for cross-expert validation. Our findings suggest the safety of two drugs during pregnancy, Varenicline and Mebeverine, and suggest that Meloxicam, an NSAID, is of higher risk; according to existing data, the safety of these three drugs during pregnancy is unknown. We also present a web-based application that enables physicians to examine a specific drug and its risk factors.
UR - http://www.scopus.com/inward/record.url?scp=85130419799&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btab769
DO - 10.1093/bioinformatics/btab769
M3 - Article
C2 - 34791058
AN - SCOPUS:85130419799
SN - 1367-4803
VL - 38
SP - 1102
EP - 1109
JO - Bioinformatics
JF - Bioinformatics
IS - 4
ER -