Learning Spatiotemporal and Geometric Features with ISA for Video-Based Facial Expression Recognition

  • Chenhan Lin
  • Fei Long
  • Junfeng Yao
  • Ming-Ting Sun
  • Jinsong Su
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10636)


Many appearance-based and geometry-based approaches have been proposed for facial expression recognition. In this paper, we propose a method that learns and combines spatiotemporal features and geometric features for video-based expression recognition. Specifically, we first adopt a multi-layer independent subspace analysis (ISA) network to learn spatiotemporal features directly from videos, and then use a separate single-layer ISA network to learn geometric features from the trajectories of facial landmark points. The learned spatiotemporal and geometric features are concatenated to form the final representation of the input video, which is classified with a linear SVM. Experiments on the CK+ and MMI facial expression databases show that incorporating geometric features into spatiotemporal features effectively improves recognition performance. Furthermore, comparisons with related methods show that the overall accuracy of our method is comparable to that of some deep-learning-based methods and that the learned features outperform popular hand-crafted features.
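The core building block in the abstract is an ISA layer: each filter (a row of a weight matrix W) gives a linear response to the input, and the responses are pooled within fixed subspaces by taking the square root of their sum of squares, which makes each pooled unit invariant to rotations of the filters inside its subspace. The following is a minimal, hypothetical sketch of that forward pass and of the feature concatenation, assuming non-overlapping subspaces of consecutive filters and an already-trained W (ISA training, which learns W under an orthonormality constraint, is not shown). The trajectory encoding (landmark displacements relative to the first frame) and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def isa_responses(W: Matrix, x: Sequence[float], subspace_size: int) -> Vector:
    """Pooled responses of one ISA layer for a single input vector.

    Rows of W are linear filters. Consecutive blocks of `subspace_size`
    filters form one subspace (an assumed fixed, non-overlapping pooling).
    Pooled output i is sqrt(sum over filters k in subspace i of (W_k . x)^2).
    """
    if len(W) % subspace_size != 0:
        raise ValueError("number of filters must be a multiple of subspace_size")
    # First stage: linear filter responses W x.
    simple = [sum(w * v for w, v in zip(row, x)) for row in W]
    # Second stage: square-root-of-sum-of-squares pooling per subspace.
    pooled = []
    for start in range(0, len(simple), subspace_size):
        block = simple[start:start + subspace_size]
        pooled.append(math.sqrt(sum(s * s for s in block)))
    return pooled


def landmark_trajectory(frames: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical geometric input: flattened landmark coordinates of every
    frame after the first, expressed as displacements from the first frame."""
    first = frames[0]
    trajectory: Vector = []
    for frame in frames[1:]:
        trajectory.extend(c - c0 for c, c0 in zip(frame, first))
    return trajectory


def combined_feature(spatiotemporal: Sequence[float], geometric: Sequence[float]) -> Vector:
    """Concatenate the two learned feature vectors into one video representation."""
    return list(spatiotemporal) + list(geometric)


if __name__ == "__main__":
    # Two filters forming one 2-D subspace; rotating them leaves the output unchanged.
    print(isa_responses([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0], 2))
    print(isa_responses([[0.0, 1.0], [-1.0, 0.0]], [3.0, 4.0], 2))
```

In this sketch, the concatenated vector returned by `combined_feature` is what would then be passed to a linear SVM; a multi-layer ISA network, as used for the spatiotemporal branch, would stack such layers rather than use a single one.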


Facial expression recognition · Independent subspace analysis · Spatiotemporal feature learning



This work is supported by the Fundamental Research Funds for the Central Universities in China (No. 20720170056), the open funding project of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (Grant No. BUAAVR-14KF-01), and the Science and Technology Project of Quanzhou City (No. 2015G62).


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Chenhan Lin (1)
  • Fei Long (1)
  • Junfeng Yao (1, 2)
  • Ming-Ting Sun (2)
  • Jinsong Su (1)
  1. Center for Digital Media Computing, Software School, Xiamen University, Xiamen, China
  2. Department of Electrical Engineering, University of Washington, Seattle, USA
