Abstract
A multimodal system is a system equipped with a multimodal interface through which a user can interact using natural communication modalities such as speech, gesture, and eye gaze. To understand a user’s intention, multimodal input fusion, a critical component of a multimodal interface, integrates the user’s multimodal inputs and derives their combined semantic interpretation. As powerful yet affordable input and output technologies, such as speech recognition and eye tracking, become available, it becomes possible to attach recognition technologies to existing applications through a multimodal input fusion module, and thus to build a practical multimodal system. This paper documents our experience in building a practical multimodal system with our multimodal input fusion technology. A pilot study has been conducted on the multimodal system. By outlining observations from the pilot study, implications for multimodal interface design are laid out.
© 2009 Springer-Verlag Berlin Heidelberg
Sun, Y., Shi, Y., Chen, F., Chung, V. (2009). Building a Practical Multimodal System with a Multimodal Fusion Module. In: Jacko, J.A. (ed.) Human-Computer Interaction. Novel Interaction Methods and Techniques. HCI 2009. Lecture Notes in Computer Science, vol. 5611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02577-8_11
DOI: https://doi.org/10.1007/978-3-642-02577-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02576-1
Online ISBN: 978-3-642-02577-8