Abstract
Fitting facial landmarks on unconstrained videos is a challenging task with broad applications. Both generic and joint alignment methods have been proposed, with varying degrees of success. However, many generic methods are highly sensitive to initialization and usually rely on offline-trained static models, which limits their performance on sequential images with extensive variations. Joint methods, on the other hand, are restricted to offline applications, since they require all frames in order to conduct batch alignment. To address these limitations, we propose to exploit incremental learning for personalized ensemble alignment. We sample multiple initial shapes to achieve image congealing within one frame, which enables us to conduct ensemble alignment incrementally by group-sparse regularized rank minimization. At the same time, incremental subspace adaptation is performed to achieve personalized modeling in a unified framework. To alleviate drift, we leverage a very efficient fitting evaluation network to pick out well-aligned faces for robust incremental learning. Extensive experiments on both controlled and unconstrained datasets validate our approach in different aspects and demonstrate its superior performance compared with state-of-the-art methods in terms of fitting accuracy and efficiency.
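As a rough illustration of the group-sparse regularization mentioned in the abstract, the sketch below shows column-wise soft-thresholding, the standard proximal operator of an l2,1 penalty. It is not the authors' implementation. The function name `group_soft_threshold`, the threshold `tau`, and the toy residual values are assumptions made for this example only, with each column standing in for the residual of one sampled initial shape.

```python
import math


def group_soft_threshold(columns, tau):
    """Proximal operator of tau * sum_j ||e_j||_2 (hypothetical helper).

    Each column is treated as one group: a column whose l2 norm is at most
    tau is set to zero, and any other column is scaled by (1 - tau / norm).
    """
    shrunk = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        # Guard against division by zero for an all-zero column.
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        shrunk.append([scale * v for v in col])
    return shrunk


# Toy residuals (assumed values): one column per sampled initial shape.
residuals = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
result = group_soft_threshold(residuals, tau=1.0)
print(result)
```

Thresholding whole columns rather than individual entries is what makes the penalty group-sparse: a column with a small residual is suppressed entirely, while a column with a large residual is kept and only shrunk.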
Notes
(1) 0292_02_002_angelina_jolie (2) 0502_01_005_bruce_willis (3) 1198_01_012_julia_roberts (4) 1621_02_017_ronald_reagan (5) 1786_02_006_sylvester_stallone (6) 1847_01_005_victoria_beckham.
Additional information
Communicated by Xiaoou Tang.
About this article
Cite this article
Peng, X., Zhang, S., Yu, Y. et al. Toward Personalized Modeling: Incremental and Ensemble Alignment for Sequential Faces in the Wild. Int J Comput Vis 126, 184–197 (2018). https://doi.org/10.1007/s11263-017-0996-8
DOI: https://doi.org/10.1007/s11263-017-0996-8