How spatial information connects visual perception and natural language generation in dynamic environments: Towards a computational model

  • Wolfgang Maaß
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 988)


Suppose that you are required to describe a route step by step to somebody who does not know the environment. A major question in this context is what kind of spatial information must be integrated into a route description. This task draws on two cognitive abilities: visual perception and natural language. For this domain, a computational model for the generation of incremental route descriptions is presented. Central to this model is a distinction between a visual, a linguistic, and a conceptual-spatial level. Based on these levels, a software agent called MOSES is introduced, who moves through a simulated 3D environment from a starting point to a destination. He selects visuo-spatial information and generates appropriate route descriptions. It is shown how MOSES adapts his linguistic behavior to spatial and temporal constraints. The generation process is based on a corpus of incremental route descriptions collected in field experiments. The agent and the 3D environment are fully implemented.


Spatial Information; Visual Perception; Spatial Relation; Street Segment; Decision Point
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Wolfgang Maaß, Department for Computer Science, Universität des Saarlandes, Saarbrücken 11, Germany
