Research in Multimedia and Multimodal Parsing and Generation

Maybury, Mark T.

doi:10.1007/978-94-011-0445-6_6

Abstract

This overview introduces the emerging set of techniques for parsing and generating multiple media (e.g., text, graphics, maps, gestures) using multiple sensory modalities (e.g., auditory, visual, tactile). We first briefly introduce and motivate the value of such techniques. Next we describe various computational methods for parsing input from heterogeneous media and modalities (e.g., natural language, gesture, gaze). We subsequently overview complementary techniques for generating coordinated multimedia and multimodal output. Finally, we discuss systems that have integrated both parsing and generation to enable multimedia dialogue in the context of intelligent interfaces. The article concludes by outlining fundamental problems which require further research.

Author information

Authors and Affiliations

Artificial Intelligence Center, The MITRE Corporation, Mail Stop K331, 202 Burlington Road, Bedford, MA, 01730, USA
Mark T. Maybury

Editor information

Editors and Affiliations

Dept. of Computer Science, University of Sheffield, UK
Paul Mc Kevitt

About this chapter

Cite this chapter

Maybury, M.T. (1995). Research in Multimedia and Multimodal Parsing and Generation. In: Mc Kevitt, P. (ed.) Integration of Natural Language and Vision Processing. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0445-6_6

DOI: https://doi.org/10.1007/978-94-011-0445-6_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-4199-7
Online ISBN: 978-94-011-0445-6
eBook Packages: Springer Book Archive
