TY - GEN
T1 - Multi-modal learning from video, eye tracking, and pupillometry for operator skill characterization in clinical fetal ultrasound
AU - Sharma, Harshita
AU - Drukker, Lior
AU - Papageorghiou, Aris T.
AU - Noble, J. Alison
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/4/13
AB - This paper presents a novel multi-modal learning approach for automated skill characterization of obstetric ultrasound operators using heterogeneous spatio-temporal sensory cues, namely, scan video, eye-tracking data, and pupillometric data, acquired in the clinical environment. We address pertinent challenges, such as combining heterogeneous, small-scale, and variable-length sequential datasets, to train deep convolutional neural networks in real-world scenarios. We propose spatial encoding for multi-modal analysis using sonography standard plane images, spatial gaze maps, gaze trajectory images, and pupillary response images. We present and compare five multi-modal learning network architectures using late, intermediate, hybrid, and tensor fusion. We build models for the Heart and the Brain scanning tasks; performance evaluation suggests that multi-modal learning networks outperform uni-modal networks, with the best-performing model achieving accuracies of 82.4% (Brain task) and 76.4% (Heart task) on the operator skill classification problem.
KW - Convolutional neural networks
KW - Eye tracking
KW - Multi-modal learning
KW - Pupillometry
KW - Ultrasound
UR - http://www.scopus.com/inward/record.url?scp=85107208043&partnerID=8YFLogxK
DO - 10.1109/ISBI48211.2021.9433863
M3 - Conference contribution
C2 - 34413933
AN - SCOPUS:85107208043
T3 - Proceedings - International Symposium on Biomedical Imaging
SP - 1646
EP - 1649
BT - 2021 IEEE 18th International Symposium on Biomedical Imaging, ISBI 2021
PB - IEEE Computer Society
T2 - 18th IEEE International Symposium on Biomedical Imaging, ISBI 2021
Y2 - 13 April 2021 through 16 April 2021
ER -