Ultrasound Image Representation Learning by Modeling Sonographer Visual Attention

Richard Droste*, Yifan Cai, Harshita Sharma, Pierre Chatelain, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

15 Scopus citations


Image representations are commonly learned from class labels, which are a simplistic approximation of human image understanding. In this paper we demonstrate that transferable representations of images can be learned without manual annotations by modeling human visual attention. The basis of our analyses is a unique gaze tracking dataset of sonographers performing routine clinical fetal anomaly screenings. Models of sonographer visual attention are learned by training a convolutional neural network (CNN) to predict gaze on ultrasound video frames through visual saliency prediction or gaze-point regression. We evaluate the transferability of the learned representations to the task of ultrasound standard plane detection in two contexts. Firstly, we perform transfer learning by fine-tuning the CNN with a limited number of labeled standard plane images. We find that fine-tuning the saliency predictor is superior to training from random initialization, with an average F1-score improvement of 9.6% overall and 15.3% for the cardiac planes. Secondly, we train a simple softmax regression on the feature activations of each CNN layer in order to evaluate the representations independently of transfer learning hyper-parameters. We find that the attention models derive strong representations, approaching the precision of a fully-supervised baseline model for all but the last layer.
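The second evaluation described above, a softmax regression trained on fixed feature activations, can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D Gaussian clusters are a toy stand-in for frozen CNN layer activations, and every name here (`train_softmax_probe`, `class_logits`, `predict`), along with the learning rate and epoch count, is a hypothetical choice made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_logits(weights, biases, x):
    """Linear scores w_k . x + b_k for every class k."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_softmax_probe(features, labels, num_classes, lr=0.1, epochs=200):
    """Fit softmax regression on fixed features with batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(class_logits(weights, biases, x))
            for k in range(num_classes):
                # Cross-entropy gradient w.r.t. logit k: p_k - 1[y == k].
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for j in range(dim):
                    grad_w[k][j] += err * x[j]
        for k in range(num_classes):
            biases[k] -= lr * grad_b[k] / n
            for j in range(dim):
                weights[k][j] -= lr * grad_w[k][j] / n
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    logits = class_logits(weights, biases, x)
    return max(range(len(logits)), key=lambda k: logits[k])


# Assumption: toy stand-in for frozen CNN activations (three 2-D clusters).
rng = random.Random(0)
centers = [(3.0, 0.0), (0.0, 3.0), (-3.0, -3.0)]
features, labels = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(30):
        features.append([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)])
        labels.append(label)

weights, biases = train_softmax_probe(features, labels, num_classes=3)
correct = sum(predict(weights, biases, x) == y for x, y in zip(features, labels))
accuracy = correct / len(labels)
print(f"training accuracy of the probe: {accuracy:.2f}")
```

Because only the linear classifier is trained while the features stay fixed, the resulting accuracy reflects the quality of the representation rather than the choice of fine-tuning hyper-parameters, which is the motivation the abstract gives for this evaluation.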

Original language: English
Title of host publication: Information Processing in Medical Imaging - 26th International Conference, IPMI 2019, Proceedings
Editors: Siqi Bao, James C. Gee, Paul A. Yushkevich, Albert C.S. Chung
Publisher: Springer Verlag
Number of pages: 13
ISBN (Print): 9783030203504
State: Published - 2019
Externally published: Yes
Event: 26th International Conference on Information Processing in Medical Imaging, IPMI 2019 - Hong Kong, China
Duration: 2 Jun 2019 to 7 Jun 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11492 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 26th International Conference on Information Processing in Medical Imaging, IPMI 2019
City: Hong Kong


Funders (funder number)
Horizon 2020 Framework Programme: 694581
Engineering and Physical Sciences Research Council: EP/M013774/1, EP/R013853/1
NIHR Oxford Biomedical Research Centre


    • Convolutional neural networks
    • Fetal ultrasound
    • Gaze tracking
    • Representation learning
    • Saliency prediction
    • Self-supervised learning
    • Transfer learning

