MIND: A Context-Based Multimodal Interpretation Framework in Conversational Systems

  • Joyce Y. Chai
  • Shimei Pan
  • Michelle X. Zhou
Chapter
Part of the Text, Speech and Language Technology book series (TLTB, volume 30)

Abstract

In a multimodal human-machine conversation, user inputs are often abbreviated or imprecise. Simply fusing multimodal inputs together may not be sufficient to derive a complete understanding of the inputs. To handle a wide variety of multimodal inputs, we are building a context-based multimodal interpretation framework called MIND (Multimodal Interpreter for Natural Dialog). MIND is unique in its use of a variety of contexts, such as domain context and conversation context, to enhance multimodal interpretation. In this chapter, we first describe a fine-grained semantic representation that captures salient information from user inputs and the overall conversation, and then present a context-based interpretation approach that enables MIND to reach a full understanding of user inputs, including those that are abbreviated or imprecise.
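The chapter itself details MIND's semantic representation and context operations; as a loose illustration of the general idea, the sketch below shows how slots left open by an abbreviated multimodal input might be fused with a gesture and then completed from conversation context. All names here (Frame, fuse, enrich, the "MLS-27" referent) are hypothetical and are not the authors' implementation.

```python
# Illustrative sketch only, not MIND's actual representation or algorithm.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Frame:
    """A minimal semantic frame for one user turn (hypothetical)."""
    intention: Optional[str] = None   # e.g., "ShowPrice"
    object_id: Optional[str] = None   # the referent, e.g., an object on screen
    attribute: Optional[str] = None   # e.g., "price"

def fuse(speech: Frame, gesture: Frame) -> Frame:
    """Combine speech and gesture: the gesture can supply the referent."""
    return Frame(
        intention=speech.intention,
        object_id=speech.object_id or gesture.object_id,
        attribute=speech.attribute,
    )

def enrich(current: Frame, context: Frame) -> Frame:
    """Fill slots the fused input leaves open from the conversation context."""
    return Frame(
        intention=current.intention or context.intention,
        object_id=current.object_id or context.object_id,
        attribute=current.attribute or context.attribute,
    )

# User says "what about this one?" while pointing at object MLS-27.
speech = Frame()                                            # abbreviated: no verb, no object
gesture = Frame(object_id="MLS-27")
context = Frame(intention="ShowPrice", attribute="price")   # carried over from the prior turn

interpretation = enrich(fuse(speech, gesture), context)
print(interpretation)
# Frame(intention='ShowPrice', object_id='MLS-27', attribute='price')
```

The point of the sketch is only that fusion alone (speech + gesture) still leaves the intention and attribute unresolved; only after consulting the conversation context does the abbreviated input receive a full interpretation.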

Keywords

Multimodal input interpretation, multimodal interaction, conversation systems

Copyright information

© Springer 2005

Authors and Affiliations

  • Joyce Y. Chai (1)
  • Shimei Pan (2)
  • Michelle X. Zhou (2)

  1. Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
  2. IBM T. J. Watson Research Center, Hawthorne, USA