Speech Communication and Multimodal Interfaces

Chapter in Advanced Man-Machine Interaction





Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schuller, B., Ablaßmeier, M., Müller, R., Reifinger, S., Poitschke, T., Rigoll, G. (2006). Speech Communication and Multimodal Interfaces. In: Kraiss, KF. (eds) Advanced Man-Machine Interaction. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-30619-6_4

  • DOI: https://doi.org/10.1007/3-540-30619-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30618-4

  • Online ISBN: 978-3-540-30619-1

  • eBook Packages: Engineering, Engineering (R0)
