Region-Based Face Alignment with Convolution Neural Network Cascade

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10636)


Most face alignment approaches perform landmark detection over the entire face. However, it has been shown that the difficulty of landmark detection varies across facial parts. In this paper, we therefore propose a novel region-based facial landmark detection algorithm built on a two-level cascade of convolutional neural networks (CNNs). In the first level, we partition the whole face into four regions: three facial components (eyebrow-eyes, nose, and mouth) and the face contour. The regions are detected by an improved CNN model that incorporates a feature fusion scheme. To detect the three facial components and the face contour landmarks simultaneously, we present a novel weighted loss function that combines bounding box regression with landmark localization. In the second level, landmarks are detected separately for each of the three facial components. Experimental results on public benchmarks demonstrate the superiority of the proposed algorithm over several state-of-the-art face alignment algorithms.
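As an illustration of the first-level objective, the following is a minimal sketch of a weighted loss that sums a bounding box regression term for the three facial components and a landmark localization term for the face contour. The abstract does not give the exact formulation, so the squared-error terms, the function name `weighted_region_loss`, and the parameters `box_weight` and `landmark_weight` are assumptions made for this example, not the authors' definition.

```python
import numpy as np


def weighted_region_loss(pred_boxes, true_boxes, pred_contour, true_contour,
                         box_weight=1.0, landmark_weight=1.0):
    """Weighted sum of a box regression term and a contour landmark term.

    Hypothetical sketch: squared-error terms and the two weights are
    assumptions; the paper's actual loss may differ.

    pred_boxes, true_boxes: shape (3, 4), one (x1, y1, x2, y2) box per
        facial component (eyebrow-eyes, nose, mouth).
    pred_contour, true_contour: shape (K, 2), face contour landmarks.
    """
    pred_boxes = np.asarray(pred_boxes, dtype=float)
    true_boxes = np.asarray(true_boxes, dtype=float)
    pred_contour = np.asarray(pred_contour, dtype=float)
    true_contour = np.asarray(true_contour, dtype=float)

    # Mean squared Euclidean error over the three component boxes.
    box_term = np.mean(np.sum((pred_boxes - true_boxes) ** 2, axis=1))
    # Mean squared Euclidean error over the contour landmarks.
    landmark_term = np.mean(np.sum((pred_contour - true_contour) ** 2, axis=1))

    return box_weight * box_term + landmark_weight * landmark_term
```

Under these assumptions, raising `landmark_weight` relative to `box_weight` makes training emphasize the contour landmarks, which is one way the imbalance in difficulty among regions could be addressed.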


Face alignment, Region-based, Convolution neural network, Feature fusion



The authors would like to thank the editor and all the anonymous reviewers of this paper for their constructive suggestions and comments. This work is supported by NSFC (No.61671290) in China, the Key Program for International S&T Cooperation Project of China (No.2016YFE0129500), and the Shanghai Committee of Science and Technology, China (No.17511101903).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Minhang District, Shanghai, China
