Speech Communication and Multimodal Interfaces

Chapter in Advanced Man-Machine Interaction





Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schuller, B., Ablaßmeier, M., Müller, R., Reifinger, S., Poitschke, T., Rigoll, G. (2006). Speech Communication and Multimodal Interfaces. In: Kraiss, KF. (eds) Advanced Man-Machine Interaction. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-30619-6_4

  • DOI: https://doi.org/10.1007/3-540-30619-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30618-4

  • Online ISBN: 978-3-540-30619-1

  • eBook Packages: Engineering, Engineering (R0)
