Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment

  • Jie Zhang
  • Shiguang Shan
  • Meina Kan
  • Xilin Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8690)

Abstract

Accurate face alignment is a vital prerequisite for most face perception tasks, such as face recognition, facial expression analysis and non-realistic face re-rendering. It can be formulated as the nonlinear inference of facial landmarks from the detected face region. A deep network seems a good choice for modeling this nonlinearity, but applying it directly is nontrivial. In this paper, instead of a straightforward application of a deep network, we propose a Coarse-to-Fine Auto-encoder Networks (CFAN) approach, which cascades a few successive Stacked Auto-encoder Networks (SANs). Specifically, the first SAN takes a low-resolution version of the whole detected face as input and predicts the landmarks quickly, yet accurately enough to serve as a preliminary estimate. The following SANs then progressively refine the landmarks, each taking as input local features extracted around the current landmarks (the output of the previous SAN) at successively higher resolution. Extensive experiments on three challenging datasets demonstrate that CFAN outperforms the state-of-the-art methods and runs in real time (40+ fps on a desktop, excluding face detection).
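The cascade described in the abstract can be illustrated with a minimal structural sketch. It is not the authors' implementation: the helper names (`downsample`, `local_patches`, `make_stub_regressor`, `cfan_predict`), the raw-pixel patches used as local features, the image and patch sizes, and the untrained random-weight networks standing in for trained SANs are all assumptions made for illustration. Only the control flow follows the abstract: a global stage on a low-resolution face, then local stages at increasing resolution.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool a 2-D grayscale image by an integer factor."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def local_patches(image, landmarks, half):
    """Concatenate square pixel patches centered on each (x, y) landmark.
    Raw pixels are an assumed stand-in for the paper's local features."""
    padded = np.pad(image, half, mode="edge")
    size = 2 * half + 1
    feats = []
    for x, y in np.rint(landmarks).astype(int):
        x = np.clip(x, 0, image.shape[1] - 1)
        y = np.clip(y, 0, image.shape[0] - 1)
        # After padding, original pixel (y, x) sits at (y + half, x + half).
        feats.append(padded[y:y + size, x:x + size].ravel())
    return np.concatenate(feats)

def make_stub_regressor(in_dim, out_dim, rng, hidden=32):
    """Untrained two-layer network; a hypothetical placeholder for a trained SAN."""
    w1 = rng.normal(0.0, 0.01, (hidden, in_dim))
    w2 = rng.normal(0.0, 0.01, (out_dim, hidden))
    def predict(x):
        h = 1.0 / (1.0 + np.exp(-(w1 @ x)))  # sigmoid hidden layer
        return w2 @ h                         # linear output layer
    return predict

def cfan_predict(face, global_stage, global_factor, local_stages):
    """Coarse-to-fine inference: one global stage, then local refinements."""
    # Stage 1: whole face at low resolution -> coarse landmark estimate,
    # rescaled to full-resolution pixel coordinates.
    low = downsample(face, global_factor)
    shape = global_stage(low.ravel()).reshape(-1, 2) * global_factor
    # Later stages: local features around the current landmarks,
    # each stage predicting a correction at its own resolution.
    for factor, half, stage in local_stages:
        img = downsample(face, factor)
        feats = local_patches(img, shape / factor, half)
        shape = shape + stage(feats).reshape(-1, 2) * factor
    return shape

rng = np.random.default_rng(0)
face = rng.random((64, 64))          # placeholder for a detected face crop
n_pts, half, global_factor = 5, 2, 4
global_stage = make_stub_regressor(16 * 16, 2 * n_pts, rng)
patch_dim = n_pts * (2 * half + 1) ** 2
local_stages = [(f, half, make_stub_regressor(patch_dim, 2 * n_pts, rng))
                for f in (4, 2, 1)]  # increasing resolution
landmarks = cfan_predict(face, global_stage, global_factor, local_stages)
print(landmarks.shape)
```

Because the stand-in networks are untrained, the predicted coordinates are meaningless. The sketch only shows how each stage consumes the previous stage's output and how corrections are rescaled between resolutions.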

Keywords

Face Alignment, Nonlinear, Deep Learning, Stacked Auto-encoder, Coarse-to-Fine, Real-time

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jie Zhang (1, 2)
  • Shiguang Shan (1)
  • Meina Kan (1)
  • Xilin Chen (1)
  1. Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
