Regressing Heatmaps for Multiple Landmark Localization Using CNNs

  • Christian PayerEmail author
  • Darko Štern
  • Horst Bischof
  • Martin Urschler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9901)


We explore the applicability of deep convolutional neural networks (CNNs) for multiple landmark localization in medical image data. Exploiting the idea of regressing heatmaps for individual landmark locations, we investigate several fully convolutional 2D and 3D CNN architectures by training them in an end-to-end manner. We further propose a novel SpatialConfiguration-Net architecture that effectively combines accurate local appearance responses with spatial landmark configurations that model anatomical variation. Evaluation of our different architectures on 2D and 3D hand image datasets show that heatmap regression based on CNNs achieves state-of-the-art landmark localization performance, with SpatialConfiguration-Net being robust even in case of limited amounts of training data.


Network Weight Convolutional Neural Network Statistical Shape Model Deep Network Local Appearance 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)Google Scholar
  2. 2.
    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient based learning applied to document recognition. Proc. IEEE 86(11), 2278–2323 (1998)CrossRefGoogle Scholar
  3. 3.
    Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E., Nielsen, M.: Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol. 8150, pp. 246–253. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-40763-5_31 CrossRefGoogle Scholar
  4. 4.
    Roth, H.R., Lu, L., Seff, A., Cherry, K.M., Hoffman, J., Wang, S., Liu, J., Turkbey, E., Summers, R.M.: A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8673, pp. 520–527. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10404-1_65 Google Scholar
  5. 5.
    Zheng, Y., Liu, D., Georgescu, B., Nguyen, H., Comaniciu, D.: 3D deep learning for efficient and robust landmark detection in volumetric data. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 565–572. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-24553-9_69 CrossRefGoogle Scholar
  6. 6.
    Liu, D., Zhou, K.S., Bernhardt, D., Comaniciu, D.: Search strategies for multiple landmark detection by submodular maximization. In: CVPR, pp. 2831–2838 (2010)Google Scholar
  7. 7.
    Donner, R., Menze, B.H., Bischof, H., Langs, G.: Global localization of 3D anatomical structures by pre-filtered hough forests and discrete optimization. Med. Image Anal. 17(8), 1304–1314 (2013)CrossRefGoogle Scholar
  8. 8.
    Lindner, C., Bromiley, P.A., Ionita, M.C., Cootes, T.: Robust and accurate shape model matching using random forest regression-voting. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1862–1874 (2015)CrossRefGoogle Scholar
  9. 9.
    Pfister, T., Charles, J., Zisserman, A.: Flowing convnets for human pose estimation in videos. In: ICCV, pp. 1913–1921 (2015)Google Scholar
  10. 10.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR, pp. 3431–3440 (2015)Google Scholar
  11. 11.
    Liu, Z., Li, X., Luo, P., Change, C., Tang, L.X.: Semantic image segmentation via deep parsing network. In: ICCV, pp. 1377–1385 (2015)Google Scholar
  12. 12.
    Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-24574-4_28 CrossRefGoogle Scholar
  13. 13.
    Ebner, T., Stern, D., Donner, R., Bischof, H., Urschler, M.: Towards automatic bone age estimation from MRI: localization of 3D anatomical landmarks. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 421–428. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10470-6_53 Google Scholar
  14. 14.
    Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of ACM International Conference on Multimedia (MM 2014), pp. 675–678 (2014)Google Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Christian Payer
    • 1
    Email author
  • Darko Štern
    • 2
  • Horst Bischof
    • 1
  • Martin Urschler
    • 2
    • 3
  1. 1.Institute for Computer Graphics and VisionGraz University of TechnologyGrazAustria
  2. 2.Ludwig Boltzmann Institute for Clinical Forensic ImagingGrazAustria
  3. 3.BioTechMed-GrazGrazAustria

Personalised recommendations