TY - JOUR
T1 - A foundational vision transformer improves diagnostic performance for electrocardiograms
AU - Vaid, Akhil
AU - Jiang, Joy
AU - Sawant, Ashwin
AU - Lerakis, Stamatios
AU - Argulian, Edgar
AU - Ahuja, Yuri
AU - Lampert, Joshua
AU - Charney, Alexander
AU - Greenspan, Hayit
AU - Narula, Jagat
AU - Glicksberg, Benjamin
AU - Nadkarni, Girish N.
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
AB - The electrocardiogram (ECG) is a ubiquitous diagnostic modality. Convolutional neural networks (CNNs) applied towards ECG analysis require large sample sizes, and transfer learning approaches for biomedical problems may result in suboptimal performance when pre-training is done on natural images. We leveraged masked image modeling to create a vision-based transformer model, HeartBEiT, for electrocardiogram waveform analysis. We pre-trained this model on 8.5 million ECGs and then compared performance vs. standard CNN architectures for diagnosis of hypertrophic cardiomyopathy, low left ventricular ejection fraction and ST elevation myocardial infarction using differing training sample sizes and independent validation datasets. We find that HeartBEiT has significantly higher performance at lower sample sizes compared to other models. We also find that HeartBEiT improves explainability of diagnosis by highlighting biologically relevant regions of the EKG vs. standard CNNs. Domain specific pre-trained transformer models may exceed the classification performance of models trained on natural images especially in very low data regimes. The combination of the architecture and such pre-training allows for more accurate, granular explainability of model predictions.
UR - http://www.scopus.com/inward/record.url?scp=85161316544&partnerID=8YFLogxK
U2 - 10.1038/s41746-023-00840-9
DO - 10.1038/s41746-023-00840-9
M3 - Article
C2 - 37280346
AN - SCOPUS:85161316544
SN - 2398-6352
VL - 6
JO - npj Digital Medicine
JF - npj Digital Medicine
IS - 1
M1 - 108
ER -