Toward Personalized Modeling: Incremental and Ensemble Alignment for Sequential Faces in the Wild

International Journal of Computer Vision

Abstract

Fitting facial landmarks on unconstrained videos is a challenging task with broad applications. Both generic and joint alignment methods have been proposed with varying degrees of success. However, many generic methods are highly sensitive to initialization and usually rely on offline-trained static models, which limits their performance on sequential images with extensive variations. Joint methods, on the other hand, are restricted to offline applications, since they require all frames to be available for batch alignment. To address these limitations, we propose to exploit incremental learning for personalized ensemble alignment. We sample multiple initial shapes to achieve image congealing within one frame, which enables us to conduct ensemble alignment incrementally by group-sparse regularized rank minimization. At the same time, incremental subspace adaptation is performed to achieve personalized modeling in a unified framework. To alleviate drifting, we leverage a very efficient fitting evaluation network to pick out well-aligned faces for robust incremental learning. Extensive experiments on both controlled and unconstrained datasets validate our approach from several aspects and demonstrate superior fitting accuracy and efficiency compared with state-of-the-art methods.
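The abstract does not state the optimization problem, so the following is only an illustrative sketch of one generic building block commonly associated with group-sparse regularization: the proximal operator of the l2,1 norm (column-wise group soft-thresholding). The function name `group_soft_threshold`, the list-of-columns layout, and the threshold `tau` are assumptions made for illustration and are not taken from the paper.

```python
import math


def group_soft_threshold(columns, tau):
    """Proximal operator of tau * (sum of column L2 norms), i.e. the l2,1 norm.

    `columns` is a list of columns, each a list of floats (hypothetical layout).
    Each column is treated as one group: its L2 norm is reduced by `tau`,
    and the whole column becomes exactly zero when its norm is at most `tau`.
    """
    result = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        if norm <= tau:
            # Small (or all-zero) groups are removed entirely.
            result.append([0.0] * len(col))
        else:
            # Larger groups keep their direction but lose `tau` in norm.
            scale = 1.0 - tau / norm
            result.append([scale * v for v in col])
    return result


# Toy residual with three groups: one large, one small, one empty.
residual = [[3.0, 4.0], [0.1, -0.2], [0.0, 0.0]]
shrunk = group_soft_threshold(residual, tau=1.0)
```

In this toy input the first column has norm 5 and is rescaled to norm 4, while the other two columns are zeroed. Whole groups are kept or discarded together, which is the property that distinguishes group sparsity from element-wise sparsity.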

Notes

  1. (1) 0292_02_002_angelina_jolie (2) 0502_01_005_bruce_willis (3) 1198_01_012_julia_roberts (4) 1621_02_017_ronald_reagan (5) 1786_02_006_sylvester_stallone (6) 1847_01_005_victoria_beckham.

Author information

Corresponding author

Correspondence to Xi Peng.

Additional information

Communicated by Xiaoou Tang.

About this article

Cite this article

Peng, X., Zhang, S., Yu, Y. et al. Toward Personalized Modeling: Incremental and Ensemble Alignment for Sequential Faces in the Wild. Int J Comput Vis 126, 184–197 (2018). https://doi.org/10.1007/s11263-017-0996-8

  • DOI: https://doi.org/10.1007/s11263-017-0996-8
