Continuous Conditional Neural Fields for Structured Regression

  • Tadas Baltrušaitis
  • Peter Robinson
  • Louis-Philippe Morency
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8692)

Abstract

An increasing number of computer vision and pattern recognition problems require structured regression techniques. Problems like human pose estimation, unsegmented action recognition, emotion prediction and facial landmark detection have temporal or spatial output dependencies that regular regression techniques do not capture. In this paper, we present continuous conditional neural fields (CCNF), a novel structured regression model that can learn non-linear input-output dependencies and model temporal and spatial output relationships in sequences of varying length. We propose two instances of our CCNF framework: Chain-CCNF for time series modelling, and Grid-CCNF for spatial relationship modelling. We evaluate our model on five public datasets spanning three different regression problems: facial landmark detection in the wild, emotion prediction in music and facial action unit recognition. Our CCNF model demonstrates state-of-the-art performance on all of the datasets used.
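
The abstract does not give the model equations, so the following is only a minimal sketch of the general idea behind chain-structured regression, not the CCNF formulation itself. It assumes a hypothetical one-neuron sigmoid layer for the per-frame (non-linear) predictions and a simple quadratic penalty between neighbouring outputs; the names `unary_predictions`, `chain_smooth`, `alpha` and `beta` are illustrative and do not come from the paper.

```python
import math


def unary_predictions(xs, w, b):
    """Hypothetical one-neuron layer: a sigmoid of a linear function per frame."""
    return [1.0 / (1.0 + math.exp(-(w * x + b))) for x in xs]


def chain_smooth(h, alpha, beta):
    """Combine per-frame predictions h with a chain smoothness term.

    Minimises  sum_i alpha*(y_i - h_i)^2 + sum_i beta*(y_i - y_{i+1})^2.
    Setting the gradient to zero gives the tridiagonal linear system
    (alpha*I + beta*L) y = alpha*h, where L is the chain graph Laplacian.
    The system is solved with the Thomas algorithm; alpha must be > 0.
    Works for any sequence length, including length 1.
    """
    n = len(h)
    if n == 0:
        return []
    # Diagonal entries: alpha + beta * (number of chain neighbours).
    diag = [alpha + beta * ((i > 0) + (i < n - 1)) for i in range(n)]
    off = -beta  # constant sub- and super-diagonal entry
    rhs = [alpha * v for v in h]
    # Forward elimination.
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    y = [0.0] * n
    y[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        y[i] = (rhs[i] - off * y[i + 1]) / diag[i]
    return y


if __name__ == "__main__":
    frames = [-2.0, 3.0, -1.5, 2.5, -2.0]           # made-up per-frame inputs
    h = unary_predictions(frames, w=1.0, b=0.0)     # independent predictions
    y = chain_smooth(h, alpha=1.0, beta=2.0)        # temporally smoothed outputs
    print(h)
    print(y)
```

In this sketch, a larger `beta` pulls neighbouring outputs closer together, while `beta = 0` returns the independent per-frame predictions unchanged. Because the objective is quadratic in the outputs, the optimum is found by solving one linear system rather than by iterative search.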

Keywords

Structured regression, Landmark detection, Face tracking

Supplementary material

Electronic Supplementary Material: 978-3-319-10593-2_39_MOESM1_ESM.pdf (208 KB)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Tadas Baltrušaitis (1)
  • Peter Robinson (1)
  • Louis-Philippe Morency (2)

  1. Computer Laboratory, University of Cambridge, UK
  2. Institute for Creative Technologies, University of Southern California, USA
