Abstract
The work described in this paper aims to develop an intelligent generation system that can combine textual and visual material. Because coherent presentations cannot be generated by simply merging verbalization and visualization results into multimedia output, the processes for content determination, medium selection and content realization in different media have to be carefully coordinated. We first show that multimedia presentations and pure text follow similar structuring principles. Based on this insight, we sketch how techniques for planning text and discourse can be generalized so that the structure and contents of multimedia communications can be planned as well. In particular, we explain how our approach handles the crucial task of process coordination.
About this article
Cite this article
André, E., Rist, T. Generating coherent presentations employing textual and visual material. Artif Intell Rev 9, 147–165 (1995). https://doi.org/10.1007/BF00849177