Computer Vision – ECCV 2012

Volume 7577 of the series Lecture Notes in Computer Science pp 808-822

Disentangling Factors of Variation for Facial Expression Recognition

  • Salah Rifai, Yoshua Bengio, Aaron Courville, Pascal Vincent, Mehdi Mirza (Department of Computer Science and Operations Research, Université de Montréal)

Abstract

We propose a semi-supervised approach to emotion recognition in 2D face images, using recent ideas from deep learning to handle the factors of variation present in the data. An emotion classification algorithm should be robust to both (1) the variations in pose that remain after the face has been centered and aligned in the image, and (2) the identity or morphology of the face. To achieve this invariance, we propose learning a hierarchy of features that gradually filters out the factors of variation arising from (1) and (2). We address (1) with a multi-scale contractive convolutional network (CCNET), which yields invariance to translations of the facial traits in the image. On top of the feature representation produced by the CCNET, we train a Contractive Discriminative Analysis (CDA) feature extractor, a novel variant of the Contractive Auto-Encoder (CAE) designed to learn a representation that separates the emotion-related factors from the others (which mostly capture subject identity, and whatever pose variation remains after the CCNET). This system beats the state of the art on a recently proposed dataset for facial expression recognition, the Toronto Face Database, moving the state-of-the-art accuracy from 82.4% to 85.0%, while the CCNET and CDA improve the accuracy of a standard CAE by 8%.
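To make the building block concrete, here is a minimal sketch of the standard Contractive Auto-Encoder objective that the CDA variant builds on: reconstruction error plus a penalty on the squared Frobenius norm of the encoder's Jacobian with respect to the input. This is not the paper's CDA itself; the sigmoid units, tied decoder weights, and the function and variable names (`cae_loss`, `lam`, etc.) are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cae_loss(x, W, b, b_rec, lam=0.1):
    """Contractive auto-encoder loss for a single input vector x.

    Returns reconstruction error plus lam times the squared
    Frobenius norm of the encoder Jacobian dh/dx (the
    contractive penalty). W, b parameterize the encoder; the
    decoder uses tied weights W.T with bias b_rec.
    """
    h = sigmoid(W @ x + b)            # hidden representation
    x_hat = sigmoid(W.T @ h + b_rec)  # tied-weights reconstruction
    recon = np.sum((x - x_hat) ** 2)
    # For sigmoid units the Jacobian is diag(h * (1 - h)) @ W, so
    # ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * ||W_j||^2  (row norms of W)
    jac_norm = np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))
    return recon + lam * jac_norm
```

The penalty term encourages the learned features to be locally insensitive to small perturbations of the input, which is the property the paper exploits (via the CCNET and CDA) to filter out pose- and identity-related variation.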

Keywords

emotion recognition · contractive convolution · deep learning · auto-encoder · TFD