Generating Coherent Presentations Employing Textual and Visual Material



The objective of the work described in this paper is the development of an intelligent generation system which is able to combine textual and visual material. As coherent presentations cannot be generated by simply merging verbalization and visualization results into multimedia output, the processes for content determination, medium selection and content realization in different media have to be carefully coordinated. We first show that multimedia presentations and pure text follow similar structuring principles. Based on this insight, we sketch how techniques for planning text and discourse can be generalized to allow the structure and contents of multimedia communications to be planned as well. In particular, we explain how our approach handles the crucial task of process coordination.

Key words

multimedia generation, presentation planning, communicative acts, coherence





Copyright information

© Springer Science+Business Media Dordrecht 1995

Authors and Affiliations

  1. German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
