Research in Multimedia and Multimodal Parsing and Generation

Chapter

Abstract

This overview introduces the emerging set of techniques for parsing and generating multiple media (e.g., text, graphics, maps, gestures) using multiple sensory modalities (e.g., auditory, visual, tactile). We first briefly introduce and motivate the value of such techniques. Next, we describe various computational methods for parsing input from heterogeneous media and modalities (e.g., natural language, gesture, gaze). We then survey complementary techniques for generating coordinated multimedia and multimodal output. Finally, we discuss systems that have integrated both parsing and generation to enable multimedia dialogue in the context of intelligent interfaces. The article concludes by outlining fundamental problems that require further research.

Key words

multimedia interfaces, multimodal interfaces, parsing, generation, intelligent interfaces

Copyright information

© Springer Science+Business Media Dordrecht 1995

Authors and Affiliations

  1. Artificial Intelligence Center, The MITRE Corporation, Bedford, USA
