Disentangling Factors of Variation for Facial Expression Recognition

  • Salah Rifai
  • Yoshua Bengio
  • Aaron Courville
  • Pascal Vincent
  • Mehdi Mirza
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7577)


We propose a semi-supervised approach to emotion recognition in 2D face images, using recent ideas in deep learning for handling the factors of variation present in data. An emotion classification algorithm should be robust to both (1) the variations in pose that remain after the face has been centered and aligned in the image, and (2) the identity or morphology of the face. To achieve this invariance, we propose to learn a hierarchy of features in which we gradually filter out the factors of variation arising from both (1) and (2). We address (1) with a multi-scale contractive convolutional network (CCNET), which yields invariance to translations of the facial traits in the image. Using the feature representation produced by the CCNET, we train a Contractive Discriminative Analysis (CDA) feature extractor, a novel variant of the Contractive Auto-Encoder (CAE), designed to learn a representation that separates the emotion-related factors from the others (which mostly capture the subject identity and what is left of pose after the CCNET). This system beats the state of the art on a recently proposed dataset for facial expression recognition, the Toronto Face Database, moving the state-of-the-art accuracy from 82.4% to 85.0%, while the CCNET and CDA improve the accuracy of a standard CAE by 8%.
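As a rough illustration of the contractive idea behind the CAE mentioned above, the sketch below computes the usual contractive penalty, the squared Frobenius norm of the encoder Jacobian, for a single sigmoid layer and checks it against finite differences. It covers only a plain CAE-style penalty. It is not the CDA or the CCNET, whose details the abstract does not give. The function names, layer sizes, and random weights are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    """Hidden code h = sigmoid(W x + b) for one input vector x."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
        for row, bj in zip(W, b)
    ]


def contractive_penalty(x, W, b):
    """Squared Frobenius norm of dh/dx for a sigmoid encoder.

    The Jacobian is diag(h * (1 - h)) W, so the norm factorizes into one
    term per hidden unit: (h_j (1 - h_j))^2 * sum_i W_ji^2.
    """
    h = encode(x, W, b)
    return sum(
        (hj * (1.0 - hj)) ** 2 * sum(w * w for w in row)
        for hj, row in zip(h, W)
    )


def finite_difference_penalty(x, W, b, eps=1e-5):
    """Same quantity estimated with central differences (sanity check)."""
    total = 0.0
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        h_plus = encode(x_plus, W, b)
        h_minus = encode(x_minus, W, b)
        total += sum(
            ((p - m) / (2.0 * eps)) ** 2 for p, m in zip(h_plus, h_minus)
        )
    return total


# Hypothetical toy sizes and random parameters, for illustration only.
random.seed(0)
n_hidden, n_in = 4, 6
W = [[random.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
x = [random.gauss(0.0, 1.0) for _ in range(n_in)]

analytic = contractive_penalty(x, W, b)
numeric = finite_difference_penalty(x, W, b)
```

In a CAE-style objective this penalty is typically added to the reconstruction loss with a weighting coefficient, so that the learned features change little under small perturbations of the input.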


emotion recognition contractive convolution deep learning auto-encoder TFD 





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Salah Rifai (1)
  • Yoshua Bengio (1)
  • Aaron Courville (1)
  • Pascal Vincent (1)
  • Mehdi Mirza (1)

  1. Department of Computer Science and Operations Research, Université de Montréal, Canada
