Abstract
We propose a novel approach that directly estimates the positions of facial keypoints with a convolutional neural network (CNN). Our method estimates the global position and the local positions from a unified CNN and combines them through a simplified optimization process. The approach has two advantages. First, the global geometric position and the local detailed positions of the facial keypoints complement each other, which helps avoid local minima caused by occlusions and pose variations. Second, unlike traditional methods such as a cascade of multiple CNNs, we propose a unified, deep, and large network architecture consisting of a global position network and a local position network. Our design shares most of the facial-feature computation between the two networks, and these efficient high-level features greatly improve the precision of the facial keypoint estimates. We conduct comparative experiments against state-of-the-art methods and commercial services. In these experiments, our approach shows remarkable performance.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2016R1A2B4007608), National IT Industry Promotion Agency (NIPA) grant funded by the Korea government (MSIT) (No. S0602-17-1001) and Technology & Information Promotion Agency for SMEs (TIPA) grant funded by the Korea government (MSIT) (No. C0507460).
Cite this article
Park, JK., Kang, DJ. Unified convolutional neural network for direct facial keypoints detection. Vis Comput 35, 1615–1626 (2019). https://doi.org/10.1007/s00371-018-1561-3
DOI: https://doi.org/10.1007/s00371-018-1561-3