Using Neural Networks for Data-Driven Backchannel Prediction: A Survey on Input Features and Training Techniques

  • Markus Mueller
  • David Leuschner
  • Lars Briem
  • Maria Schmidt
  • Kevin Kilgour
  • Sebastian Stueker
  • Alex Waibel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9170)

Abstract

To make human-computer interaction more social, supporting backchannel cues can be beneficial. Such cues can be delivered through different channels, such as vision, speech, or gestures. In this work, we focus on the prediction of acoustic backchannels in speech. Previously, this prediction has been accomplished with rule-based approaches. Like every rule-based implementation, however, such an approach depends on a fixed set of handwritten rules that have to be changed every time the mechanism is adjusted or different data is used. In this paper, we aim to overcome these limitations by making use of recent advances in machine learning. We show that backchannel predictions can be generated with a neural-network-based approach. Such a method has the advantage of depending only on the training data, without the need for handwritten rules.

Keywords

Backchannel, Neural networks, Data-driven prediction

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Markus Mueller (1), corresponding author
  • David Leuschner (1)
  • Lars Briem (1)
  • Maria Schmidt (1)
  • Kevin Kilgour (1)
  • Sebastian Stueker (1)
  • Alex Waibel (1)

  1. Interactive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany
