Masked Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis

Published in: International Journal of Computer Vision

Abstract

Compared to facial expression recognition, expression synthesis requires a very high-dimensional mapping. This problem is exacerbated by increasing image size and limits existing expression synthesis approaches to relatively small images. We observe that facial expressions often constitute sparsely distributed, locally correlated changes from one expression to another. Exploiting this observation significantly reduces the number of parameters in an expression synthesis model. We therefore propose a constrained version of ridge regression that exploits the local and sparse structure of facial expressions. We refer to this model as masked regression for learning local receptive fields. In contrast to existing approaches, the proposed model can be trained efficiently on larger image sizes. Experiments on three publicly available datasets demonstrate that our model is significantly better than \(\ell_0\)-, \(\ell_1\)- and \(\ell_2\)-regression, SVD-based approaches, and kernelized regression in terms of mean squared error, visual quality, and computational and spatial complexity. The reduction in the number of parameters also allows our method to generalize better even when trained on smaller datasets. The proposed algorithm is further compared with state-of-the-art GANs, including Pix2Pix, CycleGAN, StarGAN, and GANimation. These GANs produce photo-realistic results as long as the testing and training distributions are similar; in contrast, our results demonstrate significant generalization of the proposed algorithm to out-of-dataset human photographs, pencil sketches, and even animal faces.
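
To make the idea concrete, the following is a minimal sketch of masked ridge regression as described in the abstract: each output pixel is regressed, with an \(\ell_2\) penalty, only on the input pixels inside a small window around it, which is what the receptive-field mask enforces. This is an illustration under stated assumptions, not the authors' implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def masked_ridge_regression(X, Y, img_shape, k=2, lam=1.0):
    """Illustrative sketch of masked (locally constrained) ridge regression.

    X, Y      : (n_samples, n_pixels) flattened input/target expression images
    img_shape : (height, width) of the images before flattening
    k         : half-width of the (2k+1) x (2k+1) local receptive field
    lam       : l2 (ridge) regularisation strength
    """
    height, width = img_shape
    n_pix = height * width
    W = np.zeros((n_pix, n_pix))  # row i holds the weights of output pixel i
    rows, cols = np.unravel_index(np.arange(n_pix), img_shape)
    for i in range(n_pix):
        r, c = rows[i], cols[i]
        # Mask support: input pixels inside the local window around (r, c).
        rr, cc = np.meshgrid(
            np.arange(max(r - k, 0), min(r + k + 1, height)),
            np.arange(max(c - k, 0), min(c + k + 1, width)),
            indexing="ij",
        )
        idx = np.ravel_multi_index((rr.ravel(), cc.ravel()), img_shape)
        Xi = X[:, idx]  # restrict the inputs to the masked pixels
        # Small closed-form ridge solve per output pixel.
        A = Xi.T @ Xi + lam * np.eye(idx.size)
        W[i, idx] = np.linalg.solve(A, Xi.T @ Y[:, i])
    return W  # predict with Y_hat = X @ W.T; W is sparse by construction
```

With a 5x5 receptive field (k=2), each of the n output pixels depends on at most 25 inputs, so the model has O(25n) parameters rather than the O(n^2) of an unconstrained linear map; this is the parameter reduction that lets the method scale to larger images and generalize from smaller training sets.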

Notes

  1. https://github.com/thoughtworksarts/EmoPy.

References

  • Barsoum, E., Zhang, C., Ferrer, C.C., & Zhang, Z. (2016) Training deep networks for facial expression recognition with crowd-sourced label distribution. In Proceedings of the 18th ACM international conference on multimodal interaction, ACM (pp 279–283).

  • Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711–720.

  • Bermano, A. H., Bradley, D., Beeler, T., Zund, F., Nowrouzezahrai, D., Baran, I., et al. (2014). Facial performance enhancement using dynamic shape space analysis. ACM Transactions on Graphics, 33(2), 13:1–13:12.

  • Bishop, C. M. (2006). Pattern recognition and machine learning (information science and statistics). Berlin: Springer.

  • Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. In Proceedings of SIGGRAPH '99 (pp. 187–194).

  • Choi, Y., Choi, M., Kim, M., Ha, J.W., Kim, S., & Choo, J. (2018) StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8789–8797).

  • Coates, A., & Ng, A.Y. (2011) Selecting receptive fields in deep networks. In Advances in neural information processing systems (pp. 2528–2536).

  • Cootes, T. F., Edwards, G. J., Taylor, C. J., et al. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 681–685.

  • Costigan, T., Prasad, M., & McDonnell, R. (2014) Facial retargeting using neural networks. In Proceedings of the seventh international conference on motion in games, ACM (pp. 31–38).

  • De La Hunty, M., Asthana, A., & Goecke, R. (2010) Linear facial expression transfer with active appearance models. In 2010 20th international conference on pattern recognition, IEEE (pp. 3789–3792).

  • Deng, Z., & Noh, J. (2008) Computer facial animation: A survey. In Data-driven 3D facial animation (pp. 1–28). Berlin: Springer.

  • Ekman, P., Friesen, W. V., & Ellsworth, P. (2013). Emotion in the human face: Guidelines for research and an integration of findings. Amsterdam: Elsevier.

  • Elaiwat, S., Bennamoun, M., & Boussaid, F. (2016). A spatio-temporal RBM-based model for facial expression recognition. Pattern Recognition, 49, 152–161.

  • Fabian Benitez-Quiroz, C., Srinivasan, R., & Martinez, A.M. (2016) EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5562–5570).

  • Georgakis, C., Panagakis, Y., & Pantic, M. (2016). Discriminant incoherent component analysis. IEEE Transactions on Image Processing, 25(5), 2021–2034.

  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014) Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).

  • Havaldar, P. (2006) Sony Pictures Imageworks. In ACM SIGGRAPH 2006 Courses (p. 5). New York: ACM.

  • Huang, D., & De la Torre, F. (2010) Bilinear kernel reduced rank regression for facial expression synthesis. In European conference on computer vision (pp. 364–377). Berlin: Springer.

  • Isola, P., Zhu, J.Y., Zhou, T., & Efros, A.A. (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125–1134).

  • Jain, A. K., & Li, S. Z. (2011). Handbook of face recognition. Berlin: Springer.

  • Jampour, M., Mauthner, T., & Bischof, H. (2015) Multi-view facial expressions recognition using local linear regression of sparse codes. In Proceedings of the 20th computer vision winter workshop.

  • Kim, T., Cha, M., Kim, H., Lee, J.K., & Kim, J. (2017) Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th international conference on machine learning (Vol. 70, pp. 1857–1865).

  • Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., & Matas, J. (2018) DeblurGAN: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8183–8192).

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  • Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4681–4690).

  • Lee, C.S., & Elgammal, A. (2006) Nonlinear shape and appearance models for facial expression analysis and synthesis. In 18th international conference on pattern recognition (Vol. 1, pp. 497–502). IEEE.

  • Lee, H. S., & Kim, D. (2008). Tensor-based AAM with continuous variation estimation: Application to variation-robust face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 1102–1116.

  • Lin, J. R., & Lin, I. C. (2011). Multi-layered expression synthesis. Journal of Information Science and Engineering, 27(1), 337–351.

  • Liu, M.Y., & Tuzel, O. (2016) Coupled generative adversarial networks. In Advances in neural information processing systems (pp. 469–477).

  • Liu, M.Y., Breuel, T., & Kautz, J. (2017) Unsupervised image-to-image translation networks. In Advances in neural information processing systems (pp. 700–708).

  • Liu, S., Huang, D.Y., Lin, W., Dong, M., Li, H., & Ong, E.P. (2014) Emotional facial expression transfer based on temporal restricted Boltzmann machines. In 2014 Asia-Pacific Signal and Information Processing Association annual summit and conference (APSIPA) (pp. 1–7).

  • Liu, Z., Shan, Y., & Zhang, Z. (2001) Expressive expression mapping with ratio images. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques (pp. 271–276). ACM.

  • Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010) The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In 2010 IEEE computer society conference on computer vision and pattern recognition-workshops (pp. 94–101). IEEE.

  • Lundqvist, D., Flykt, A., & Öhman, A. (1998). The Karolinska Directed Emotional Faces–KDEF, CD ROM. Stockholm: Department of Clinical Neuroscience, Psychology section, Karolinska Institutet.

  • Lyons, M., Akamatsu, S., Kamachi, M., & Gyoba, J. (1998) Coding facial expressions with Gabor wavelets. In Proceedings third IEEE international conference on automatic face and gesture recognition (pp. 200–205). IEEE.

  • Mirza, M., & Osindero, S. (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.

  • Nhan Duong, C., Luu, K., Gia Quach, K., & Bui, T.D. (2016) Longitudinal face modeling via temporal deep restricted Boltzmann machines. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5772–5780).

  • Pantic, M., & Rothkrantz, L. J. M. (2000). Automatic analysis of facial expressions: The state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1424–1445.

  • Patel, N. M., & Zaveri, M. (2010). Parametric facial expression synthesis and animation. International Journal of Computer Applications, 3, 34–40.

  • Pati, Y.C., Rezaiifar, R., & Krishnaprasad, P.S. (1993) Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of 27th Asilomar conference on signals, systems and computers (Vol. 1, pp. 40–44).

  • Pighin, F., & Lewis, J. (2006) Performance-driven facial animation. In Pighin, F., Hecker, J., Lischinski, D., Szeliski, R., & Salesin, D.H. (eds.) Synthesizing realistic facial expressions from photographs, ACM SIGGRAPH 2006 Courses, ACM (p. 19).

  • Pumarola, A., Agudo, A., Martinez, A., Sanfeliu, A., & Moreno-Noguer, F. (2019) GANimation: One-shot anatomically consistent facial animation. International Journal of Computer Vision (IJCV). https://doi.org/10.1007/s11263-019-01210-3.

  • Rizzo, A. A., Neumann, U., Enciso, R., Fidaleo, D., & Noh, J. (2004). Performance-driven facial animation: Basic research on human judgments of emotional state in facial avatars. CyberPsychology & Behavior, 4(4), 471–487.

  • Saragih, J.M., Lucey, S., & Cohn, J.F. (2011) Real-time avatar animation from a single image. In 2011 IEEE international conference on automatic face & gesture recognition and workshops (FG 2011) (pp. 117–124). IEEE.

  • Savran, A., Alyüz, N., Dibeklioğlu, H., Çeliktutan, O., Gökberk, B., Sankur, B., & Akarun, L. (2008) Bosphorus database for 3D face analysis. In European workshop on biometrics and identity management (pp. 47–56). Springer.

  • Shen, W., & Liu, R. (2017) Learning residual images for face attribute manipulation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4030–4038).

  • Susskind, J.M., Anderson, A.K., Hinton, G.E., & Movellan, J.R. (2008) Generating facial expressions with deep belief nets. In Or, J. (ed.) Affective computing (chapter 10, pp. 421–440). InTech.

  • Suwajanakorn, S., Seitz, S.M., & Kemelmacher-Shlizerman, I. (2015) What makes Tom Hanks look like Tom Hanks. In Proceedings of the IEEE international conference on computer vision (pp. 3952–3960).

  • Tenenbaum, J. B., & Freeman, W. T. (2000). Separating style and content with bilinear models. Neural Computation, 12(6), 1247–1283.

  • Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., & Nießner, M. (2016) Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2387–2395).

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological), 58, 267–288.

  • Tropp, J. A., & Gilbert, A. C. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12), 4655–4666.

  • Wang, H., et al. (2003) Facial expression decomposition. In Proceedings ninth IEEE international conference on computer vision (pp. 958–965). IEEE.

  • Wei, W., Tian, C., Maybank, S. J., & Zhang, Y. (2016). Facial expression transfer method based on frequency analysis. Pattern Recognition, 49, 115–128.

  • Yi, Z., Zhang, H., Tan, P., & Gong, M. (2017) DualGAN: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE international conference on computer vision (pp. 2849–2857).

  • Zeiler, M.D., Taylor, G.W., Sigal, L., Matthews, I., & Fergus, R. (2011) Facial expression transfer with input-output temporal restricted Boltzmann machines. In Advances in neural information processing systems (pp. 1629–1637).

  • Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1), 39–58.

  • Zhang, G., Kan, M., Shan, S., & Chen, X. (2018) Generative adversarial network with spatial attention for face attribute editing. In Proceedings of the European conference on computer vision (pp. 417–432).

  • Zhang, Q., Liu, Z., Guo, B., Terzopoulos, D., & Shum, H. Y. (2006). Geometry-driven photorealistic facial expression synthesis. IEEE Transactions on Visualization and Computer Graphics, 12(1), 48–60.

  • Zhang, Y., & Wei, W. (2012). A realistic dynamic facial expression transfer method. Neurocomputing, 89, 21–29.

  • Zhu, J.Y., Krähenbühl, P., Shechtman, E., & Efros, A.A. (2016) Generative visual manipulation on the natural image manifold. In European conference on computer vision (pp. 597–613). Springer.

  • Zhu, J.Y., Park, T., Isola, P., & Efros, A.A. (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223–2232).

Author information

Corresponding author

Correspondence to Nazar Khan.

Additional information

Communicated by Xavier Alameda-Pineda, Elisa Ricci, Albert Ali Salah, Nicu Sebe, Shuicheng Yan.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, N., Akram, A., Mahmood, A. et al. Masked Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis. Int J Comput Vis 128, 1433–1454 (2020). https://doi.org/10.1007/s11263-019-01256-3
