(Mis?)-Using DRT for generation of natural language text from image sequences

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1407)


The abundance of geometric results from image sequence evaluation that is expected to become available shortly creates a new problem: how can this material be presented to users without inundating them with unwanted details? A system design that copes not only with image sequence evaluation but also with the increasing number of abstraction steps required for efficient presentation and inspection of results appears to be necessary. The system-user interaction of a Computer Vision system should thus be designed as a natural language dialogue, assigned within the overall system to what we call the ‘Natural Language Level’. Such a decision requires the construction of a series of abstraction steps leading from geometric evaluation results to natural language text that describes the contents of an image sequence. We suggest using Discourse Representation Theory as developed by [14] to design the system-internal representation of knowledge and results at the Natural Language Level. A first implementation of this approach is described, together with results obtained by applying it to image sequences recorded from real-world traffic scenes.
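The abstract leaves the details of the representation open. The following Python is a minimal sketch, under stated assumptions, of the pipeline it outlines. A geometric result (here a per-frame speed profile) is abstracted to a conceptual motion predicate. That predicate is recorded as a condition in a Discourse Representation Structure (DRS), that is, a universe of discourse referents plus conditions over them as in [14]. The DRS is then verbalized, with the discourse state deciding between an indefinite noun phrase, a pronoun, and a definite description. The speed threshold, the four motion predicates, the segment format, and all names (`DRS`, `classify_motion`, `build_drs`, `realize`) are hypothetical illustrations, not the authors' implementation. The pronoun rule is a crude stand-in for DRT's accessibility conditions.

```python
from dataclasses import dataclass, field

# Hypothetical threshold (m/s) below which a tracked vehicle counts as standing.
STOP_THRESHOLD = 0.5
# Hypothetical mapping from conceptual motion predicates to English verb phrases.
VERB_PHRASES = {"drive": "drives along", "stop": "stops", "start": "starts", "wait": "waits"}
ORDINALS = ["first", "second", "third", "fourth", "fifth"]


@dataclass
class DRS:
    """Discourse Representation Structure after Kamp and Reyle: a universe of
    discourse referents plus a list of conditions over those referents."""
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

    def new_referent(self) -> str:
        """Introduce a fresh discourse referent x1, x2, ... into the universe."""
        ref = f"x{len(self.referents) + 1}"
        self.referents.append(ref)
        return ref


def classify_motion(speeds: list) -> str:
    """Abstraction step: map a geometric speed profile (m/s per frame) to a
    conceptual motion predicate by comparing first and last samples."""
    moving_at_start = speeds[0] >= STOP_THRESHOLD
    moving_at_end = speeds[-1] >= STOP_THRESHOLD
    if moving_at_start and moving_at_end:
        return "drive"
    if moving_at_start:
        return "stop"
    if moving_at_end:
        return "start"
    return "wait"


def build_drs(segments: list) -> DRS:
    """segments: temporally ordered (vehicle_id, speeds) pairs from a tracker.
    A vehicle seen for the first time introduces a new referent; every segment
    adds one event condition (predicate, referent, segment index)."""
    drs = DRS()
    ref_of = {}
    for t, (vehicle_id, speeds) in enumerate(segments):
        if vehicle_id not in ref_of:
            ref_of[vehicle_id] = drs.new_referent()
            drs.conditions.append(("vehicle", ref_of[vehicle_id]))
        drs.conditions.append((classify_motion(speeds), ref_of[vehicle_id], t))
    return drs


def realize(drs: DRS) -> str:
    """Generate English text from the DRS. The referring expression depends on
    the discourse state: indefinite NP for a new referent, pronoun if the
    referent was the previous subject, definite NP otherwise."""
    sentences, mentioned, previous = [], set(), None
    for pred, ref, *_ in drs.conditions:
        if pred == "vehicle":
            continue  # type conditions are expressed through the noun phrase
        if ref not in mentioned:
            subject = "Another vehicle" if mentioned else "A vehicle"
        elif ref == previous:
            subject = "It"
        else:
            idx = drs.referents.index(ref)
            subject = f"The {ORDINALS[idx]} vehicle" if idx < len(ORDINALS) else f"Vehicle {idx + 1}"
        sentences.append(f"{subject} {VERB_PHRASES[pred]}.")
        mentioned.add(ref)
        previous = ref
    return " ".join(sentences)


# Usage with made-up speed samples for two tracked vehicles.
segments = [
    ("car_7", [8.0, 7.5, 7.9]),
    ("car_9", [6.0, 3.1, 0.2]),
    ("car_9", [0.1, 0.0, 0.3]),
    ("car_7", [7.8, 8.2, 8.0]),
]
drs = build_drs(segments)
print(drs.referents)  # ['x1', 'x2']
print(realize(drs))
# A vehicle drives along. Another vehicle stops. It waits. The first vehicle drives along.
```

Keeping the referents in an explicit universe is what lets the generator decide between introducing a vehicle and referring back to it. A fuller treatment would also need temporal referents and DRT's nested sub-DRSs for constructs such as negation and conditionals.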




[1] A. Abella and J.R. Kender: Description Generation of Abnormal Densities Found in Radiographs. Proc. Workshop on Conceptual Descriptions from Images, Cambridge/UK, 19 April 1996, H. Buxton (Ed.), pp. 97–111.
[2] E. André, G. Herzog, and T. Rist: The System Soccer. Proc. of the 8th European Conference on Artificial Intelligence, Munich/Germany, 1–5 August 1988, pp. 449–454.
[3] D.S. Bloomberg and F.R. Chen: Document Image Summarization without OCR. Proc. IEEE International Conference on Image Processing (ICIP '96), Lausanne/CH, 16–19 September 1996, Vol. II, pp. 229–232.
[4] H. Buxton and S. Gong: Visual Surveillance in a Dynamic and Uncertain World. Artificial Intelligence 78 (1995) 431–459.
[5] S. Dance, T. Caelli, and Z.-Q. Liu: Picture Interpretation: A Symbolic Approach. Series in Machine Perception and Artificial Intelligence Vol. 20, World Scientific, Singapore a. o. 1995.
[6] S. Dance, T. Caelli, and Z.-Q. Liu: A Concurrent, Hierarchical Approach to Symbolic Scene Interpretation. Pattern Recognition 29:11 (1996) 1891–1903.
[7] L. Friedman: From Images to Language. Proc. Workshop on Conceptual Descriptions from Images, Cambridge/UK, 19 April 1996, H. Buxton (Ed.), pp. 70–81.
[8] R. Gerber and H.-H. Nagel: Berechnung natürlichsprachlicher Beschreibungen von Straßenverkehrsszenen aus Bildfolgen unter Verwendung von Geschehens- und Verdeckungsmodellierung. In B. Jähne, P. Geißler, H. Haußecker, and F. Hering (Eds.), Mustererkennung 1996; 18. DAGM-Symposium, Heidelberg/Germany, 11–13 September 1996, pp. 601–608 (in German).
[9] R. Gerber and H.-H. Nagel: Knowledge Representation for the Generation of Quantified Natural Language Descriptions of Vehicle Traffic in Image Sequences. Proc. IEEE International Conference on Image Processing (ICIP '96), Lausanne/CH, 16–19 September 1996, Vol. II, pp. 805–808.
[10] M. Haag and H.-H. Nagel: Beginning a Transition from a Local to a More Global Point of View in Model-Based Vehicle Tracking. In H. Burkhardt and B. Neumann (Eds.): Proc. European Conference on Computer Vision 1998 (ECCV '98), Freiburg/Germany, 2–6 June 1998.
[11] M. Haag, W. Theilmann, K.H. Schäfer, and H.-H. Nagel: Integration of Image Sequence Evaluation and Fuzzy Metric Temporal Logic Programming. KI-97: Advances in Artificial Intelligence, Proc. 21st Annual German Conference on Artificial Intelligence, Freiburg/Germany, 9–12 September 1997; G. Brewka, C. Habel, and B. Nebel (Eds.): Lecture Notes in Artificial Intelligence vol. 1303, Springer-Verlag Berlin, Heidelberg, New York 1997, pp. 301–312.
[12] G. Herzog and P. Wazinski: Visual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review Journal 8 (1994) 175–187.
[13] T. Huang, D. Koller, J. Malik, G. Ogasawara, B. Rao, S. Russell, and J. Weber: Automatic Symbolic Traffic Scene Analysis Using Belief Networks. Proc. 12th National Conference on Artificial Intelligence, Seattle/WA, 31 July–4 August 1994, pp. 966–972.
[14] H. Kamp and U. Reyle: From Discourse to Logic. Kluwer Academic Publishers, Dordrecht/NL, Boston/MA, London/UK 1993.
[15] H. Kollnig and H.-H. Nagel: Ermittlung von begrifflichen Beschreibungen von Geschehen in Straßenverkehrsszenen mit Hilfe unscharfer Mengen. Informatik – Forschung und Entwicklung 8 (1993) 186–196 (in German).
[16] H. Kollnig and H.-H. Nagel: 3D Pose Estimation by Directly Matching Polyhedral Models to Gray Value Gradients. International Journal of Computer Vision 23:3 (1997) 283–302.
[17] H.-H. Nagel, H. Kollnig, M. Haag, and H. Damm: The Association of Situation Graphs with Temporal Variations in Image Sequences. Working Notes AAAI-95 Fall Symposium Series ‘Computational Models for Integrating Language and Vision’, R.K. Srihari (Ed.), Cambridge/MA, 10–12 November 1995, pp. 1–8.
[18] B. Neumann and H.-J. Novak: NAOS: Ein System zur natürlichsprachlichen Beschreibung zeitveränderlicher Szenen. Informatik – Forschung und Entwicklung 1 (1986) 83–92 (in German).
[19] S. Satoh, Y. Nakamura, and T. Kanade: Name-It: Naming and Detecting Faces in Video by the Integration of Image and Natural Language Processing. Proc. 15th International Joint Conference on Artificial Intelligence (IJCAI '97), 23–29 August 1997, Nagoya/Japan, Vol. II, pp. 1488–1493.
[20] K.H. Schäfer: Unscharfe zeitlogische Modellierung von Situationen und Handlungen in Bildfolgenauswertung und Robotik. Dissertation, Fakultät für Informatik der Universität Karlsruhe (TH), July 1996. Published in: Dissertationen zur Künstlichen Intelligenz (DISKI), Vol. 135, infix-Verlag, St. Augustin 1996 (in German).
[21] M.A. Smith and T. Kanade: Video Skimming and Characterization through the Combination of Image and Language Understanding Techniques. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR '97), 17–19 June 1997, San Juan, Puerto Rico, pp. 775–781.
[22] R.K. Srihari: Linguistic Context in Vision. Proc. IEEE Workshop on Context-Based Vision, Cambridge/MA, 19 June 1995, pp. 100–110.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  1. Institut für Algorithmen und Kognitive Systeme, Fakultät für Informatik der Universität Karlsruhe (TH), Karlsruhe, Germany
  2. Fraunhofer-Institut für Informations- und Datenverarbeitung (IITB), Karlsruhe, Germany
