Abstract
It has been shown that a person-specific active appearance model (AAM), built to model the appearance variation of a single person across pose, illumination, and expression, performs substantially better than a generic AAM built to model the appearance variation of many faces. However, building a person-specific AAM before tracking an unseen subject is impractical. A virtual person-specific AAM is proposed to tackle this problem. The AAM is constructed from a set of virtual personal images with different poses and expressions, which are synthesized from the annotated first frame via regression. To preserve personal facial details in the virtual images, a Poisson fusion strategy is designed and applied to the virtual facial images generated via bilinear kernel ridge regression. Furthermore, the AAM subspace is sequentially updated during tracking with the sequential Karhunen–Loeve algorithm, which keeps the AAM adaptive to variations in the facial context. Experiments show that the proposed virtual person-specific AAM is robust to facial context changes during tracking and outperforms other state-of-the-art AAMs in both facial feature tracking accuracy and computational cost.
Notes
In AAM-based facial feature tracking, it is often assumed, when demonstrating the efficiency of tracking algorithms, that the model is initialized by fitting it to manually annotated points in the first frame (Cootes and Taylor 2006; Matthews 2004). In practice, the model can be initialized via a generic AAM, either by placing the mean shape within the face detection window or by warping the AAM base mesh to detected facial feature points such as the eyes, mouth center, and nose tip. Although the generic AAM does not work well over a whole video sequence, it is usually capable of fitting the first frame, which is typically frontal and neutral (Ionita et al. 2011; Saragih et al. 2011).
The images have been re-annotated following the Multi-PIE 68-point mark-up.
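The detection-based initialization described in the first note can be sketched as follows. This is a minimal illustration, assuming the mean shape is stored as an (n, 2) array of landmark coordinates and the face detector returns a box as (x, y, w, h); the function name is hypothetical.

```python
import numpy as np

def init_shape_from_detection(mean_shape, box):
    """Place the AAM mean shape inside a face-detection box (x, y, w, h)."""
    x, y, w, h = box
    s = mean_shape - mean_shape.mean(axis=0)  # center the shape at the origin
    span = s.max(axis=0) - s.min(axis=0)      # shape extent in x and y
    scale = min(w / span[0], h / span[1])     # isotropic scale fitting the box
    # Translate the scaled shape to the center of the detection window.
    return s * scale + np.array([x + w / 2, y + h / 2])
```

The generic AAM search then starts from this placed shape on the first frame; later frames are initialized from the fit of the previous frame as usual.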
References
Abdi, H., & Williams, L. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.
Asthana, A., Sanderson, C., Gedeon, T., & Goecke, R. (2009). Learning-based face synthesis for pose-robust recognition from single image. In: BMVC 2009: Proceedings of British Machine Vision Conference. London, UK.
Asthana, A., Saragih, J., Wagner, M., & Goecke, R. (2009). Evaluating AAM fitting methods for facial expression recognition. In: Proceedings of the International Conference on Affective Computing and Intelligent Interaction, pp. 1–8. Amsterdam.
Aykut, M., & Ekinci, M. (2013). AAM-based palm segmentation in unrestricted backgrounds and various postures for palmprint recognition. Pattern Recognition Letters, 9(34), 955–962.
Baker, S., & Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56, 221–255.
Bathe, K., & Wilson, E. (1971). Numerical methods in finite element analysis. Englewood Cliffs: Prentice-Hall.
Chen, Y., Yu, F., & Ai, C. (2013). Sequential active appearance model based on online instance learning. IEEE Signal Processing Letters, 20(6), 567–570.
Chen, S., Tian, Y., Liu, Q., & Metaxas, D. N. (2013). Recognizing expressions from face and body gesture by temporal normalized motion and appearance features. Image and Vision Computing, 31(2), 175–185.
Cootes, T. F., Edwards, G. J., & Taylor, C. J. (1998). Active appearance models. In: Proceedings of European Conference on Computer Vision, vol. 2, pp. 484–498.
Cootes, T. F., & Taylor, C. J. (2006). An algorithm for tuning an active appearance model to new data. In: Proceedings of BMVC, pp. 919–928.
Du, Z., & Wang, Y. (2009). Dynamic facial expression synthesis by active appearance model. Journal of Computer-Aided Design and Computer Graphics, 21(11), 1590–1594.
Goodall, C. (1991). Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society B, 53(2), 285–339.
Gross, R., Matthews, I., & Baker, S. (2005). Generic vs. person specific active appearance models. Image and Vision Computing, 23(11), 1080–1093.
Huang, D., & De la Torre, F. (2010). Bilinear kernel reduced rank regression for facial expression synthesis. In: ECCV 2010: Proceedings of European Conference on Computer Vision. Crete, Greece.
Ionita, M.C., Tresadern, P.A., & Cootes, T.F. (2011). Real time feature point tracking with automatic model selection. In: IEEE International Conference on Computer Vision Workshops, pp. 453–460.
Levy, A., & Lindenbaum, M. (2000). Sequential Karhunen-Loeve basis extraction and its application to images. IEEE Transactions on Image Processing, 9(8), 1371–1374.
Li, K., Xu, F., Wang, J., Dai, Q., & Liu, Y. (2012). A data-driven approach for facial expression synthesis in video. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 57–64.
Liu, X. (2007). Generic face alignment using boosted appearance model. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
Liu, X. (2009). Discriminative face alignment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 1941–1948.
Liu, X. (2010). Video-based face model fitting using adaptive active appearance model. Image and Vision Computing, 28, 1162–1172.
Matthews, I. (2004). The template update problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 810–815.
Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60, 135–164.
Murtaza, M., Sharif, M., Raza, M., & Shah, J. H. (2013). Analysis of face recognition under varying facial expression: A survey. The International Arab Journal of Information Technology, 10(4), 378–388.
Ross, D. A., Lim, J., Lin, R. S., & Yang, M. H. (2008). Incremental learning for robust visual tracking. International Journal of Computer Vision, 77, 125–141.
Saragih, J. M., Lucey, S., & Cohn, J. F. (2011). Deformable model fitting by regularized landmark mean-shift. International Journal of Computer Vision, 91, 200–215.
Sung, J., & Kim, D. (2009). Adaptive active appearance model with incremental learning. Pattern Recognition Letters, 30(4), 359–367.
Van der Maaten, L., & Hendriks, E. (2010). Capturing appearance variation in active appearance models. In: Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 34–41. Los Alamitos: IEEE Computer Society Press.
Yu, H., Garrod, O. G. B., & Schyns, P. G. (2012). Perception-driven facial expression synthesis. Computers and Graphics, 36(3), 152–162.
Zhao, X., Shan, S., Chai, X., & Chen, X. (2013). Locality-constrained active appearance model. In: Computer Vision-ACCV 2012, Lecture Notes in Computer Science, vol. 7724, pp. 636–647.
Zhong, B., Yao, H., Chen, S., Ji, R., Chin, T. J., & Wang, H. (2014). Visual tracking via weakly supervised learning from multiple imperfect oracles. Pattern Recognition, 47(3), 1395–1410.
This work was supported by the National Natural Science Foundation of China (61104213), Natural Science Foundation of Jiangsu Province (BK2011146).
Chen, Y., Hua, C. & Guo, X. Face model fitting on video sequences based on incremental virtual active appearance model. Multidim Syst Sign Process 28, 1–21 (2017). https://doi.org/10.1007/s11045-015-0326-7