Probabilistic Subpixel Temporal Registration for Facial Expression Analysis

  • Evangelos Sariyanidi
  • Hatice Gunes
  • Andrea Cavallaro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9006)


Face images in a video sequence should be registered accurately before any analysis, otherwise registration errors may be interpreted as facial activity. Subpixel accuracy is crucial for the analysis of subtle actions. In this paper we present PSTR (Probabilistic Subpixel Temporal Registration), a framework that achieves high registration accuracy. Inspired by the human vision system, we develop a motion representation that measures registration errors between consecutive frames, a probabilistic model that learns registration errors from the proposed motion representation, and an iterative registration scheme that identifies registration failures, thus making PSTR aware of its own errors. We evaluate PSTR's temporal registration accuracy on facial action and expression datasets, and demonstrate its ability to generalise to naturalistic data even when trained with controlled data.
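The abstract does not reproduce PSTR's representation or probabilistic model, but the underlying task it addresses, estimating a subpixel translation between consecutive frames, can be illustrated with a standard technique that the paper itself does not necessarily use: phase correlation with parabolic peak refinement. The sketch below (NumPy; the function name `estimate_shift` is our own) is a generic illustration of subpixel shift estimation, not the authors' method.

```python
import numpy as np

def estimate_shift(a, b):
    """Estimate the translation (dy, dx) such that frame b is (approximately)
    frame a shifted by (dy, dx), via phase correlation with parabolic
    subpixel refinement. Illustrative only; not the PSTR algorithm."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    R = np.conj(A) * B
    R /= np.abs(R) + 1e-12                  # keep only the phase term
    corr = np.real(np.fft.ifft2(R))         # correlation surface, peak at the shift
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))

    # Fit a parabola through the peak and its two neighbours on each axis;
    # the vertex gives a subpixel offset in [-0.5, 0.5].
    offset = np.zeros(2)
    for ax in range(2):
        n = corr.shape[ax]
        pm, pp = peak.copy(), peak.copy()
        pm[ax] = (peak[ax] - 1) % n
        pp[ax] = (peak[ax] + 1) % n
        cm, c0, cp = corr[tuple(pm)], corr[tuple(peak)], corr[tuple(pp)]
        denom = cm - 2.0 * c0 + cp
        if abs(denom) > 1e-12:
            offset[ax] = 0.5 * (cm - cp) / denom

    shift = peak + offset
    # Correlation is circular: wrap large positive peaks to signed shifts.
    shape = np.array(corr.shape, dtype=float)
    shift[shift > shape / 2] -= shape[shift > shape / 2]
    return shift
```

PSTR goes beyond this geometric step: it learns a probabilistic model of the registration error and iterates until failures are detected, which is what makes the framework aware of its own errors.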


Keywords: Gabor filter · Registration error · Mean absolute error · Facial activity · Illumination variation



The work of E. Sariyanidi and H. Gunes is partially supported by the EPSRC MAPTRAITS Project (Grant Ref: EP/K017500/1).

Supplementary material

Supplementary material (avi 17,790 KB)



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Evangelos Sariyanidi
  • Hatice Gunes
  • Andrea Cavallaro
  1. Centre for Intelligent Sensing, Queen Mary University of London, London, UK
