Abstract
This overview introduces the emerging set of techniques for parsing and generating multiple media (e.g., text, graphics, maps, gestures) using multiple sensory modalities (e.g., auditory, visual, tactile). We first briefly introduce these techniques and motivate their value. Next, we describe various computational methods for parsing input from heterogeneous media and modalities (e.g., natural language, gesture, gaze). We then survey complementary techniques for generating coordinated multimedia and multimodal output. Finally, we discuss systems that integrate both parsing and generation to enable multimedia dialogue in the context of intelligent interfaces. The article concludes by outlining fundamental problems that require further research.
Copyright information
© 1995 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Maybury, M.T. (1995). Research in Multimedia and Multimodal Parsing and Generation. In: Mc Kevitt, P. (ed.) Integration of Natural Language and Vision Processing. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0445-6_6
DOI: https://doi.org/10.1007/978-94-011-0445-6_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-4199-7
Online ISBN: 978-94-011-0445-6
eBook Packages: Springer Book Archive