
Audio Visual Emotion Recognition Based on Triple-Stream Dynamic Bayesian Network Models

  • Conference paper
Affective Computing and Intelligent Interaction (ACII 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6974)

Abstract

We present a triple-stream dynamic Bayesian network model (T_AsyDBN) for audio-visual emotion recognition, in which the two audio feature streams are synchronous with each other but asynchronous with the visual feature stream, within controllable constraints. MFCC features and the principal component analysis (PCA) coefficients of local prosodic features are used for the audio streams. For the visual stream, 2D facial features as well as 3D facial animation unit features are defined and concatenated, and the feature dimensionality is reduced by PCA. Emotion recognition experiments on the eNTERFACE'05 database show that by adjusting the asynchrony constraint, the proposed T_AsyDBN model achieves a recognition rate 18.73% higher than the traditional multi-stream state-synchronous HMM (MSHMM), and 10.21% higher than the two-stream asynchronous DBN model (Asy_DBN).
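The visual-stream feature preparation described in the abstract (concatenating 2D facial features with 3D facial animation unit features, then reducing dimensionality with PCA) can be sketched as follows. This is a minimal illustration only: the feature dimensions, frame counts, and variable names are assumptions for the sketch, not values taken from the paper.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project frame-level features X (n_frames x n_dims) onto the
    top principal components via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Illustrative dimensions (not from the paper): 40 values from 2D
# facial feature points and 14 facial animation unit (FAU) values
# per video frame, over 100 frames.
rng = np.random.default_rng(0)
feat_2d = rng.normal(size=(100, 40))
feat_fau = rng.normal(size=(100, 14))

visual = np.concatenate([feat_2d, feat_fau], axis=1)  # 100 x 54
visual_reduced = pca_reduce(visual, n_components=20)
print(visual_reduced.shape)  # (100, 20)
```

The reduced vectors would then serve as the per-frame observations of the visual stream, while the audio streams keep their own MFCC and prosodic-PCA observations at their own frame rate.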






Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, D., Cui, Y., Zhang, X., Fan, P., Gonzalez, I., Sahli, H. (2011). Audio Visual Emotion Recognition Based on Triple-Stream Dynamic Bayesian Network Models. In: D'Mello, S., Graesser, A., Schuller, B., Martin, J.-C. (eds) Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science, vol 6974. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24600-5_64


  • DOI: https://doi.org/10.1007/978-3-642-24600-5_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24599-2

  • Online ISBN: 978-3-642-24600-5

