Cognitive Processing, Volume 13, Supplement 1, pp 369–374

Exploiting spatial descriptions in visual scene analysis

  • Leon Ziegler (email author)
  • Katrin Johannsen
  • Agnes Swadzba
  • Jan P. De Ruiter
  • Sven Wachsmuth
Short Report

Abstract


The reliable automatic visual recognition of indoor scenes with complex object constellations using only sensor data is a nontrivial problem. In order to improve the construction of an accurate semantic 3D model of an indoor scene, we exploit human-produced verbal descriptions of the relative location of pairs of objects. This requires the ability to deal with different spatial reference frames (RF) that humans use interchangeably. In German, both the intrinsic and relative RF are used frequently, which often leads to ambiguities in referential communication. We assume that there are certain regularities that help in specific contexts. In a first experiment, we investigated how speakers of German describe spatial relationships between different pieces of furniture. This gave us important information about the distribution of the RFs used for furniture–predicate combinations, and by implication also about the preferred spatial predicate. The results of this experiment are compiled into a computational model that extracts partial orderings of spatial arrangements between furniture items from verbal descriptions. In the implemented system, the visual scene is initially scanned by a 3D camera system. From the 3D point cloud, we extract point clusters that suggest the presence of certain furniture objects. We then integrate the partial orderings extracted from the verbal utterances incrementally and cumulatively with the estimated probabilities about the identity and location of objects in the scene, and also estimate the probable orientation of the objects. This allows the system to significantly improve both the accuracy and richness of its visual scene representation.
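
The integration step described above can be illustrated with a minimal sketch. It is not the authors' model: the cluster positions, label probabilities, the `reliability` value, and all function names are hypothetical, and a simple Bayesian re-weighting stands in for whatever update rule the implemented system uses. The sketch assumes a relative reference frame in which "left of" means a smaller x-coordinate from the viewer's position.

```python
from itertools import permutations

# Hypothetical detector output: each point cluster has an x-position in the
# viewer's frame (smaller x = further left) and label probabilities.
clusters = {
    "c1": {"x": -1.0, "probs": {"chair": 0.5, "table": 0.5}},
    "c2": {"x": 1.0, "probs": {"chair": 0.4, "table": 0.6}},
}


def left_of(a, b):
    """Relative-frame 'left of' from the viewer's position (an assumption)."""
    return clusters[a]["x"] < clusters[b]["x"]


def joint_prior(located_label, reference_label):
    """Belief over which cluster pair plays the (located, reference) roles,
    built from the visual label probabilities alone."""
    weights = {
        (a, b): clusters[a]["probs"][located_label]
        * clusters[b]["probs"][reference_label]
        for a, b in permutations(clusters, 2)
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def integrate(belief, relation, reliability=0.9):
    """Re-weight each hypothesis by how well it satisfies one verbal
    constraint, then renormalise. `reliability` is an assumed probability
    that a description holds under the chosen reference frame."""
    weighted = {
        pair: p * (reliability if relation(*pair) else 1.0 - reliability)
        for pair, p in belief.items()
    }
    total = sum(weighted.values())
    return {pair: w / total for pair, w in weighted.items()}


# "The chair is left of the table."
prior = joint_prior("chair", "table")
posterior = integrate(prior, left_of)
# A second, consistent description is integrated cumulatively.
second = integrate(posterior, left_of)
print(prior, posterior, second)
```

Because each update takes the previous belief as its input, further descriptions can be folded in one at a time, which mirrors the incremental and cumulative integration the abstract mentions. Handling the intrinsic reference frame would additionally require an orientation estimate per object, which this sketch leaves out.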


Keywords

Spatial cognition · Reference frames · Spatial language · 3D perception · Speech perception · Scene interpretation



Acknowledgments

This work was funded by the German Research Foundation (DFG) within the Collaborative Research Center 673 “Alignment in Communication”.

Conflict of interest

This supplement was not sponsored by outside commercial interests. It was funded entirely by ECONA, Via dei Marsi, 78, 00185 Roma, Italy.



Copyright information

© Marta Olivetti Belardinelli and Springer-Verlag 2012

Authors and Affiliations

  • Leon Ziegler (1, email author)
  • Katrin Johannsen (2)
  • Agnes Swadzba (1)
  • Jan P. De Ruiter (3)
  • Sven Wachsmuth (1)

  1. Applied Informatics Group, Bielefeld University, Bielefeld, Germany
  2. Faculty of Linguistics and Literary Studies, Bielefeld University, Bielefeld, Germany
  3. Chair for Psycholinguistics, Bielefeld University, Bielefeld, Germany
