Computational Visual Media, Volume 3, Issue 1, pp 33–47

Robust facial landmark detection and tracking across poses and expressions for in-the-wild monocular video

  • Shuang Liu
  • Yongqiang Zhang
  • Xiaosong Yang
  • Daming Shi
  • Jian J. Zhang
Open Access
Research Article

Abstract

We present a novel approach for automatically detecting and tracking facial landmarks across poses and expressions in in-the-wild monocular video data, e.g., YouTube videos and smartphone recordings. Our method requires no calibration or manual adjustment for new input videos or actors. First, we propose a method for robust 2D facial landmark detection across poses that combines shape-face canonical-correlation analysis with a global supervised descent method. Because 2D regression-based methods are sensitive to unstable initialization and ignore the temporal and spatial coherence of video, we use a coarse-to-dense 3D facial expression reconstruction method to refine the 2D landmarks. For the coarse stage, we employ an in-the-wild method to extract a coarse reconstruction and its corresponding texture from the detected sparse facial landmarks, followed by robust pose, expression, and identity estimation. For the dense stage, we propose a face tracking flow method that corrects the coarse reconstruction and tracks weakly textured areas; it is used to iteratively update the coarse face model. The dense reconstruction is obtained once this iteration converges. Extensive experiments on a variety of video sequences, recorded by ourselves or downloaded from YouTube, show the results of facial landmark detection and tracking under various lighting conditions, head poses, and facial expressions. The overall performance and a comparison with state-of-the-art methods demonstrate the robustness and effectiveness of our method.
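To make the idea of a supervised descent cascade concrete, the following is a minimal toy sketch, not the method of this paper. It assumes a single scalar "landmark" and a hypothetical appearance function `h`; every name in it (`h`, `features`, `fit_stage`, `train_cascade`, `run_cascade`) is an illustrative assumption. Each stage learns, by least squares, a linear map from features sampled at the current estimate to the remaining offset, and the stages are applied in sequence. The shape-face canonical-correlation analysis and the global variant of the descent method mentioned above are not modeled here.

```python
import random


def h(x):
    # Hypothetical nonlinear "appearance" function standing in for image features.
    return x ** 3 + x


def features(x_est, observation):
    # Feature sampled at the current estimate; it is zero when the estimate is exact.
    return h(x_est) - observation


def fit_stage(estimates, observations, targets):
    # Closed-form 1D least squares: (target - estimate) ~ r * feature + b.
    f = [features(x, y) for x, y in zip(estimates, observations)]
    d = [t - x for x, t in zip(estimates, targets)]
    n = len(f)
    mean_f, mean_d = sum(f) / n, sum(d) / n
    var_f = sum((fi - mean_f) ** 2 for fi in f)
    cov = sum((fi - mean_f) * (di - mean_d) for fi, di in zip(f, d))
    r = cov / var_f if var_f > 0 else 0.0
    b = mean_d - r * mean_f
    return r, b


def apply_stage(stage, estimates, observations):
    # One descent step: move each estimate by the learned linear update.
    r, b = stage
    return [x + r * features(x, y) + b for x, y in zip(estimates, observations)]


def train_cascade(targets, n_stages=4, x0=0.0):
    # Learn one linear update per stage, re-sampling features after each step.
    observations = [h(t) for t in targets]
    estimates = [x0] * len(targets)
    cascade = []
    for _ in range(n_stages):
        stage = fit_stage(estimates, observations, targets)
        cascade.append(stage)
        estimates = apply_stage(stage, estimates, observations)
    return cascade


def run_cascade(cascade, observations, x0=0.0):
    # At test time only the observations are used, never the true positions.
    estimates = [x0] * len(observations)
    for stage in cascade:
        estimates = apply_stage(stage, estimates, observations)
    return estimates


def mse(estimates, targets):
    return sum((x - t) ** 2 for x, t in zip(estimates, targets)) / len(targets)


rng = random.Random(0)
train_targets = [rng.uniform(-1, 1) for _ in range(500)]
test_targets = [rng.uniform(-1, 1) for _ in range(200)]

cascade = train_cascade(train_targets)
test_observations = [h(t) for t in test_targets]
initial_error = mse([0.0] * len(test_targets), test_targets)
final_error = mse(run_cascade(cascade, test_observations), test_targets)
print(initial_error, final_error)
```

All samples start from the same initial estimate `x0`, which mirrors the usual practice of initializing from a mean shape; the sensitivity to that initialization is the weakness that the 3D refinement described above is meant to address.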

Keywords

face tracking; facial reconstruction; landmark detection

Notes

Acknowledgements

This work was supported by the Harbin Institute of Technology Scholarship Fund 2016 and the National Centre for Computer Animation, Bournemouth University.

Supplementary material

41095_2016_68_MOESM1_ESM.mp4 (30.4 MB)

Copyright information

© The Author(s) 2016

Open Access The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.

Authors and Affiliations

  • Shuang Liu
    • 1
  • Yongqiang Zhang
    • 2
  • Xiaosong Yang
    • 1
  • Daming Shi
    • 2
  • Jian J. Zhang
    • 1
  1. Bournemouth University, Poole, UK
  2. Harbin Institute of Technology, Harbin, China
