Unified convolutional neural network for direct facial keypoints detection

  • Original Article
The Visual Computer

Abstract

We propose a novel approach that directly estimates the positions of facial keypoints with a convolutional neural network (CNN). Our method estimates the global position and the local positions from a unified CNN and combines them through a simplified optimization process. The approach has two advantages. First, the global geometrical position and the local detailed position of the facial keypoints complement each other, which helps avoid local minima caused by occlusions and pose variations. Second, unlike traditional methods such as a cascade of multiple CNNs, we propose a unified, deep and large network architecture consisting of a global position network and a local position network. Our design shares most of the facial feature computation between the two networks, and these efficiently computed high-level features substantially improve the precision of the facial keypoint estimates. We conduct comparative experiments against state-of-the-art methods and commercial services. In these experiments, our approach shows remarkable performance.
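The abstract does not specify how the global and local estimates are combined. As a rough illustration of the idea only, the sketch below assumes a per-keypoint weighted least-squares fusion, which has a closed-form solution. The function name `fuse_keypoints` and the weights `w_global` and `w_local` are hypothetical and are not taken from the article.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fuse_keypoints(
    global_pts: List[Point],
    local_pts: List[Point],
    w_global: float = 1.0,
    w_local: float = 1.0,
) -> List[Point]:
    """Fuse two sets of keypoint estimates (illustrative assumption).

    For each keypoint, this minimizes
        w_global * ||p - g||^2 + w_local * ||p - l||^2
    over p, where g is the global estimate and l is the local estimate.
    The minimizer is the weighted average of g and l.
    """
    if len(global_pts) != len(local_pts):
        raise ValueError("both estimates must cover the same keypoints")
    total = w_global + w_local
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused: List[Point] = []
    for (gx, gy), (lx, ly) in zip(global_pts, local_pts):
        # Closed-form solution of the quadratic objective above.
        fx = (w_global * gx + w_local * lx) / total
        fy = (w_global * gy + w_local * ly) / total
        fused.append((fx, fy))
    return fused
```

With equal weights the fused point is the midpoint of the two estimates. Raising `w_local` trusts the detailed local estimate more, while raising `w_global` keeps the result closer to the overall face geometry, for example when a keypoint is occluded.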


Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2016R1A2B4007608), National IT Industry Promotion Agency (NIPA) grant funded by the Korea government (MSIT) (No. S0602-17-1001) and Technology & Information Promotion Agency for SMEs (TIPA) grant funded by the Korea government (MSIT) (No. C0507460).

Author information

Corresponding author

Correspondence to Dong-Joong Kang.

About this article

Cite this article

Park, JK., Kang, DJ. Unified convolutional neural network for direct facial keypoints detection. Vis Comput 35, 1615–1626 (2019). https://doi.org/10.1007/s00371-018-1561-3

  • DOI: https://doi.org/10.1007/s00371-018-1561-3
