Structured Landmark Detection via Topology-Adapting Deep Graph Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12354)


Image landmark detection aims to automatically identify the locations of predefined fiducial points. Despite recent success in this field, higher-order structural modeling to capture implicit or explicit relationships among anatomical landmarks has not been adequately exploited. In this work, we present a new topology-adapting deep graph learning approach for accurate anatomical landmark detection in facial and medical (e.g., hand, pelvis) images. The proposed method constructs graph signals leveraging both local image features and global shape features. The adaptive graph topology naturally explores and lands on task-specific structures, which are learned end-to-end with two Graph Convolutional Networks (GCNs). Extensive experiments are conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as well as three real-world X-ray medical datasets (Cephalometric (public), Hand, and Pelvis). Quantitative comparisons with previous state-of-the-art approaches across all studied datasets indicate superior performance in both robustness and accuracy. Qualitative visualizations of the learned graph topologies demonstrate a physically plausible connectivity lying behind the landmarks.
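As a rough illustration of the idea summarized above, and not the authors' implementation, the following sketch builds a per-landmark graph signal from local features plus a shape descriptor and passes it through one graph convolution layer whose adjacency is a free parameter. The function names, the choice of pairwise offsets as the shape descriptor, the softmax normalization of the adjacency, and the self/neighbor weight split are all assumptions made for the example.

```python
import math
import random


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_softmax(m):
    """Normalize each row so its entries are positive and sum to 1 (assumed normalization)."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def build_graph_signal(local_feats, coords):
    """Concatenate each landmark's local feature with its offsets to all landmarks.

    The offsets stand in for a 'global shape feature'; this choice is hypothetical.
    """
    signal = []
    for feat, (x, y) in zip(local_feats, coords):
        shape = [c for (u, v) in coords for c in (u - x, v - y)]
        signal.append(list(feat) + shape)
    return signal


def graph_conv(features, adj_logits, w_self, w_neigh):
    """One layer: h_i' = ReLU(h_i W_self + sum_j a_ij h_j W_neigh).

    adj_logits plays the role of a learnable topology: in training it would be
    updated by gradient descent together with the weights.
    """
    adj = row_softmax(adj_logits)
    self_term = matmul(features, w_self)
    neigh_term = matmul(adj, matmul(features, w_neigh))
    return [[max(0.0, s + n) for s, n in zip(srow, nrow)]
            for srow, nrow in zip(self_term, neigh_term)]


# Toy example: 4 landmarks, 3-dim local features, 2-dim output per landmark.
random.seed(0)
num_landmarks, feat_dim, out_dim = 4, 3, 2
local = [[random.random() for _ in range(feat_dim)] for _ in range(num_landmarks)]
coords = [(0.2, 0.3), (0.8, 0.3), (0.5, 0.6), (0.5, 0.9)]

signal = build_graph_signal(local, coords)
in_dim = len(signal[0])  # feat_dim + 2 * num_landmarks

# Zero logits give a uniform, fully connected starting topology.
adj_logits = [[0.0] * num_landmarks for _ in range(num_landmarks)]
w_self = [[random.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]
w_neigh = [[random.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]

out = graph_conv(signal, adj_logits, w_self, w_neigh)
```

Treating the adjacency as a parameter rather than a fixed skeleton is what lets the connectivity adapt to the task; a real model would also stack several such layers and regress coordinate updates from the output.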


Keywords: Landmark detection, GCN, Adaptive topology



This work is supported in part by NSF through award IIS-1722847 and by NIH through the Morris K. Udall Center of Excellence in Parkinson’s Disease Research. The main work was done when Weijian Li was a research intern at PAII Inc.




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. PAII Inc., Bethesda, USA
  2. Department of Computer Science, University of Rochester, Rochester, USA
  3. Department of Computer Science and Engineering, University of South Carolina, Columbia, USA
  4. Chang Gung Memorial Hospital, Linkou, Taiwan
  5. Ping An Technology, Shenzhen, China
