Less is More: Simultaneous View Classification and Landmark Detection for Abdominal Ultrasound Images

  • Zhoubing Xu
  • Yuankai Huo
  • JinHyeong Park
  • Bennett Landman
  • Andy Milkowski
  • Sasa Grbic
  • Shaohua Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11071)


An abdominal ultrasound examination, the most common type of ultrasound examination, requires substantial manual effort to acquire standard abdominal organ views, annotate the views with text, and record clinically relevant organ measurements. Automatic view classification and landmark detection of the organs can therefore be instrumental in streamlining the examination workflow. However, this is a challenging problem, not only because of the inherent difficulties of the ultrasound modality, e.g., low contrast and large variations, but also because of the heterogeneity across tasks, i.e., one classification task for all views followed by one landmark detection task for each relevant view. While convolutional neural networks (CNNs) have shown more promising results on ultrasound image analytics than traditional machine learning approaches, deploying multiple networks (one for each task) is impractical given the limited computational and memory resources of most existing ultrasound scanners. To overcome these limits, we propose a multi-task learning framework that handles all tasks with a single network. The network performs view classification and landmark detection simultaneously; it is also equipped with global convolutional kernels, coordinate constraints, and a conditional adversarial module to improve performance. In an experimental study based on 187,219 ultrasound images, the proposed simplified approach achieves (1) view classification accuracy better than the agreement between two clinical experts and (2) landmark-based measurement errors on par with inter-user variability. The multi-task approach also benefits from sharing feature extraction across all tasks during training and, as a result, outperforms approaches that address each task individually.
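
The idea of training one network on both tasks can be illustrated with a combined objective. The following is a minimal sketch only, not the loss used in the paper: the abstract does not specify the loss terms, so the cross-entropy classification term, the squared-distance landmark term, the weighting parameter `landmark_weight`, and all function names are assumptions made for illustration. The global convolutional kernels and the conditional adversarial module are omitted. The sketch shows one plausible way to apply the landmark term only for the view an image actually belongs to, and only when that view defines landmarks.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Classification loss for the view label (assumed loss, for illustration)."""
    return -math.log(softmax(logits)[target_index])


def landmark_error(pred_points, true_points):
    """Mean squared Euclidean distance between predicted and true landmarks."""
    squared = [
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred_points, true_points)
    ]
    return sum(squared) / len(squared)


def multitask_loss(view_logits, true_view, pred_landmarks, true_landmarks,
                   landmark_weight=1.0):
    """Hypothetical combined objective for a single shared network.

    view_logits:    scores for every view class
    true_view:      index of the annotated view
    pred_landmarks: dict mapping view index -> list of predicted (x, y) points
    true_landmarks: list of annotated (x, y) points, or None when the
                    annotated view has no landmark task
    """
    loss = cross_entropy(view_logits, true_view)
    if true_landmarks is not None:
        # Only the landmark output belonging to the annotated view is penalized.
        loss += landmark_weight * landmark_error(
            pred_landmarks[true_view], true_landmarks
        )
    return loss


if __name__ == "__main__":
    # Three hypothetical views; view 1 has two landmarks (e.g. a length measurement).
    logits = [0.2, 1.5, -0.3]
    predicted = {1: [(40.0, 52.0), (88.0, 61.0)]}
    annotated = [(42.0, 50.0), (90.0, 60.0)]
    print(multitask_loss(logits, 1, predicted, annotated, landmark_weight=0.01))
```

Because both terms are summed into one value, gradients from either task would flow into the same shared feature extractor, which is the property the abstract credits for the improvement over per-task networks.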



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Zhoubing Xu (1), email author
  • Yuankai Huo (2)
  • JinHyeong Park (1)
  • Bennett Landman (2)
  • Andy Milkowski (3)
  • Sasa Grbic (1)
  • Shaohua Zhou (1)

  1. Medical Imaging Technologies, Siemens Healthineers, Princeton, USA
  2. Electrical Engineering, Vanderbilt University, Nashville, USA
  3. Ultrasound, Siemens Healthineers, Issaquah, USA
