Learning Spatio-Temporal Aggregation for Fetal Heart Analysis in Ultrasound Video

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10553)


We investigate recent deep convolutional architectures for automatically describing multiple clinically relevant properties of the fetal heart in ultrasound (US) videos, with the goal of learning spatio-temporal aggregation of deep representations. We examine multiple temporal encoding models that combine spatial and temporal features tailored to US video representation. We cast our task as a multi-task learning problem within a hierarchical convolutional model that jointly predicts the visibility, view plane and localization of the fetal heart at the frame level. We study deep convolutional networks developed for video classification, and analyse their architectures and multi-task losses in the specific modality of US video. We experimentally verify that the developed hierarchical convolutional model, which progressively encodes temporal information throughout the network, retains both spatial detail and rich temporal features, leading to high performance on a real-world clinical dataset.
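
To make the multi-task formulation concrete, the sketch below shows one simple instantiation of the ideas in the abstract: per-frame features are aggregated over time (here by mean pooling, a basic late-fusion baseline rather than the paper's hierarchical encoder), and three task heads predict visibility, view plane and localization, combined into a joint loss. All dimensions, weights and targets are hypothetical placeholders, not the authors' actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_mean_pool(frame_feats):
    """Aggregate per-frame features over time (simple late-fusion baseline)."""
    return frame_feats.mean(axis=0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical dimensions: 16 frames, 128-d per-frame descriptors, 3 view planes.
T, D, K = 16, 128, 3
frame_feats = rng.normal(size=(T, D))

# Random placeholder weights for the three linear task heads.
W_vis = rng.normal(size=D)        # visibility head
W_view = rng.normal(size=(K, D))  # view-plane head
W_loc = rng.normal(size=(2, D))   # (x, y) localization head

pooled = temporal_mean_pool(frame_feats)
p_visible = 1.0 / (1.0 + np.exp(-(W_vis @ pooled)))  # sigmoid visibility score
p_view = softmax(W_view @ pooled)                    # view-plane posterior
loc = W_loc @ pooled                                 # predicted heart centre

# Joint multi-task loss against dummy targets: BCE + cross-entropy + MSE.
y_vis, y_view, y_loc = 1.0, 0, np.zeros(2)
loss = (-(y_vis * np.log(p_visible) + (1 - y_vis) * np.log(1 - p_visible))
        - np.log(p_view[y_view])
        + np.mean((loc - y_loc) ** 2))
```

In the paper's hierarchical variant, the single pooling step would be replaced by temporal encoding distributed across network layers; the frame-level multi-task structure is the same.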



This work was supported by the EPSRC Programme Grant Seebibyte (EP/M013774/1). Arijit Patra is supported by the Rhodes Trust.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Institute of Biomedical Engineering, University of Oxford, Oxford, UK
