
Generating coherent presentations employing textual and visual material

Artificial Intelligence Review

Abstract

The objective of the work described in this paper is the development of an intelligent generation system which is able to combine textual and visual material. As coherent presentations cannot be generated by simply merging verbalization and visualization results into multimedia output, the processes for content determination, medium selection and content realization in different media have to be carefully coordinated. We first show that multimedia presentations and pure text follow similar structuring principles. Based on this insight, we sketch how techniques for planning text and discourse can be generalized to allow the structure and contents of multimedia communications to be planned as well. In particular, we explain how our approach handles the crucial task of process coordination.




Cite this article

André, E., Rist, T. Generating coherent presentations employing textual and visual material. Artif Intell Rev 9, 147–165 (1995). https://doi.org/10.1007/BF00849177


  • DOI: https://doi.org/10.1007/BF00849177
