A Perceptual System for Language Game Experiments



This chapter describes key aspects of a visual perception system that serves as a central component for language game experiments on physical robots. The vision system is responsible for segmenting the continuous flow of incoming visual stimuli and computing a variety of features for each resulting segment. It does so by combining bottom-up processing, which operates on the incoming signal, with top-down processing based on expectations about what was seen before or on objects stored in memory. The chapter consists of two parts. The first is concerned with extracting and maintaining world models of spatial scenes without any prior knowledge of the objects that may be involved. The second deals with the recognition of gestures and actions that establish the joint attention and pragmatic feedback which are an important aspect of language game experiments.
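The abstract does not spell out how bottom-up segments are reconciled with objects held in memory. As a minimal sketch, assuming each segment is summarized by a normalized color histogram and that top-down expectations amount to matching new segments against previously stored objects, a world-model update might look like the following. The Bhattacharyya coefficient is used here as one plausible histogram similarity measure; `Segment`, `WorldModel`, the histogram feature and the 0.8 threshold are hypothetical illustrations, not the chapter's actual design.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical per-segment features: a normalized color histogram
    # and an image-plane position.
    histogram: list
    position: tuple


def normalize(hist):
    """Scale raw bin counts so they sum to 1."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else list(hist)


def bhattacharyya_coefficient(p, q):
    """Similarity of two normalized histograms in [0, 1]; 1 means identical."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


@dataclass
class WorldModel:
    # Hypothetical memory of tracked objects, keyed by a generated id.
    objects: dict = field(default_factory=dict)
    next_id: int = 0
    threshold: float = 0.8  # assumed minimum similarity for a match

    def update(self, segments):
        """Greedily match each bottom-up segment to a remembered object
        (the top-down step); create a new object when nothing matches."""
        labels = []
        unmatched = dict(self.objects)  # each stored object matches at most once
        for seg in segments:
            best_id, best_score = None, self.threshold
            for obj_id, obj in unmatched.items():
                score = bhattacharyya_coefficient(seg.histogram, obj.histogram)
                if score >= best_score:
                    best_id, best_score = obj_id, score
            if best_id is None:
                best_id = f"obj-{self.next_id}"
                self.next_id += 1
            else:
                del unmatched[best_id]
            self.objects[best_id] = seg  # refresh memory with the latest view
            labels.append(best_id)
        return labels


if __name__ == "__main__":
    wm = WorldModel()
    red = Segment(normalize([8, 1, 1]), (0.1, 0.2))
    blue = Segment(normalize([1, 1, 8]), (0.5, 0.5))
    print(wm.update([red, blue]))  # first frame: two new objects
    red_moved = Segment(normalize([7, 2, 1]), (0.12, 0.21))
    print(wm.update([red_moved]))  # second frame: re-identified as obj-0
```

Greedy matching on appearance alone keeps the sketch short; a fuller tracker would also use position and motion predictions, for example from a Kalman filter, to resolve ambiguous matches between similar-looking objects.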

Key words

Visual perception, Humanoid robots, World models




  1. Aherne F, Thacker NA, Rockett PI (1998) The Bhattacharyya metric as an absolute similarity measure for frequency coded data. Kybernetika 34(4):363–368MathSciNetGoogle Scholar
  2. Baddeley AD (1983) Working memory. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences (1934-1990) 302(1110):311– 324Google Scholar
  3. Baillie JC, Ganascia JG (2000) Action categorization from video sequences. In:Google Scholar
  4. Horn W (ed) Proceedings ECAI, IOS Press, pp 643–647Google Scholar
  5. Ballard DH, Hayhoe MM, Pook PK, Rao RPN (1997) Deictic codes for the embodiment of cognition. Behavioural and Brain Sciences 20(4):723–742Google Scholar
  6. Belpaeme T, Steels L, Van Looveren J (1998) The construction and acquisition ofGoogle Scholar
  7. visual categories. In: Proceedings EWLR-6, Springer, LNCS, vol 1545, pp 1–12Google Scholar
  8. Bhattacharyya A (1943) On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin Calcutta Mathematical Society 35:99–110zbMATHGoogle Scholar
  9. Breazeal C (2002) Designing Sociable Robots. MIT PressGoogle Scholar
  10. Breazeal C (2003) Toward sociable robots. Robotics and Autonomous Systems 42(3-4):167–175zbMATHCrossRefGoogle Scholar
  11. Brooks A, Arkin R (2007) Behavioral overlays for non-verbal communication expression on a humanoid robot. Autonomous Robots 22(1):55–74CrossRefGoogle Scholar
  12. Cassell J, Torres OE, Prevost S (1999) Turn taking vs. discourse structure: how best to model multimodal conversation. Machine Conversations pp 143–154Google Scholar
  13. Chella A, Frixione M, Gaglio S (2003) Anchoring symbols to conceptual spaces: the case of dynamic scenarios. Robotics and Autonomous Systems 43(2-3):175–188CrossRefGoogle Scholar
  14. Colombo C, Del Bimbo A, Valli A (2003) Visual capture and understanding of handGoogle Scholar
  15. pointing actions in a 3-D environment. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 3(4):677–686Google Scholar
  16. Coradeschi S, Saffiotti A (2003) An introduction to the anchoring problem. Robotics and Autonomous Systems 43(2-3):85–96CrossRefGoogle Scholar
  17. Cruse H, Durr V, Schmitz J (2007) Insect walking is based on a decentralized architecture revealing a simple and robust controller. Phil Trans R Soc A 365:221–250MathSciNetCrossRefGoogle Scholar
  18. Dautenhahn K, Odgen B, Quick T (2002) From embodied to socially embedded agents–implications for interaction-aware robots. Cognitive Systems Research 3(3):397–428CrossRefGoogle Scholar
  19. Dominey PF, Boucher JD (2005) Learning to talk about events from narrated video in a construction grammar framework. Artificial Intelligence 167(1-2):31–61CrossRefGoogle Scholar
  20. Fong T, Nourbakhsh I, Dautenhahn K (2002) A survey of socially interactive robots. Robotics and Autonomous Systems 42(3-4):143–166Google Scholar
  21. Fujita M, Kuroki Y, Ishida T, Doi TT (2003) Autonomous behavior control architecture of entertainment humanoid robot sdr-4x. In: Proceedings IROS ’03, pp 960–967, vol. 1Google Scholar
  22. Gardenfors P (2000) Conceptual Spaces: The Geometry of Thought. MIT PressGoogle Scholar
  23. Haasch A, Hofemann N, Fritsch J, Sagerer G (2005) A multi-modal object attention system for a mobile robot. In: Proceedings IROS ’05, pp 2712–2717Google Scholar
  24. Hafner V, Kaplan F (2005) Learning to interpret pointing gestures: experiments with four-legged autonomous robots. In: Biomimetic Neural Learning for IntelligentGoogle Scholar
  25. Robots, LNCS, vol 3575, Springer, pp 225–234Google Scholar
  26. Hager GD, Belhumeur PN (1998) Efficient region tracking with parametric models of geometry and illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(10):1025–1039CrossRefGoogle Scholar
  27. Hurford JR (2003) The neural basis of predicate-argument structure. Behavioral and Brain Sciences 26(3):261–316Google Scholar
  28. Imai M, Ono T, Ishiguro H (2004) Physical relation and expression: joint attention for human-robot interaction. IEEE Transactions on Industrial Electronics 50(4):636–643CrossRefGoogle Scholar
  29. Ishiguro H (2006) Android science: conscious and subconscious recognition. Connection Science 18(4):319–332CrossRefGoogle Scholar
  30. Jungel M, Hoffmann J, Lotzsch M (2004) A real-time auto-adjusting vision system for robotic soccer. In: Polani D, Browning B, Bonarini A (eds) RoboCup 2003:Google Scholar
  31. Robot Soccer World Cup VII, Springer, LNCS, vol 3020, pp 214–225Google Scholar
  32. Kalman RE (1960) A new approach to linear filtering and prediction problems. Transactions of the ASME-Journal of Basic Engineering 82(1):35–45CrossRefGoogle Scholar
  33. Kanda T, Kamasima M, Imai M, Ono T, Sakamoto D, Ishiguro H, Anzai Y (2007) A humanoid robot that pretends to listen to route guidance from a human. Autonomous Robots 22(1):87–100 Kaplan F, Hafner V (2006) The challenges of joint attention. Interaction Studies 7(2):129–134Google Scholar
  34. Kato H, Billinghurst M (1999) Marker tracking and HMD calibration for a videobased augmented reality conferencing system. In: Proceedings ISAR ’99, pp 85– 94Google Scholar
  35. Kopp S (2010) Social resonance and embodied coordination in face-to-face conversation with artificial interlocutors. Speech Communication 52(6):587–597CrossRefGoogle Scholar
  36. Kortenkamp D, Huber E, Bonasso RP (1996) Recognizing and interpreting gestures on a mobile robot. In: Proceedings AAAI-96, pp 915–921Google Scholar
  37. Kozima H, Yano H (2001) A robot that learns to communicate with human caregivers. In: Proceedings EPIROB ’01Google Scholar
  38. Kroger B, Kopp S, Lowit A (2009) A model for production, perception, and acquisition of actions in face-to-face communication. Cognitive ProcessingGoogle Scholar
  39. Marjanovic M, Scassellati B, Williamson M (1996) Self-taught visually-guidedGoogle Scholar
  40. pointing for a humanoid robot. In: Proceedings SAB ’96, The MIT Press, pp 35–44Google Scholar
  41. Marr D (1982) Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman, San Francisco, CAGoogle Scholar
  42. Martin C, Steege FF, Gross HM (2009) Estimation of pointing poses for visually instructing mobile robots under real world conditions. Robotics and Autonomous Systems 58(2):174–185CrossRefGoogle Scholar
  43. Mishkin M, Ungerleider LG, Macko KA (1983) Object vision and spatial vision: two cortical pathways. Trends in Neurosciences 6:414–417CrossRefGoogle Scholar
  44. Nagai Y, Hosada K, Morita A, Asada M (2003) A constructive model for the development of joint attention. Connection Science 15(4):211–229CrossRefGoogle Scholar
  45. Nickel K, Stiefelhagen R (2007) Visual recognition of pointing gestures for humanrobotGoogle Scholar
  46. interaction. Image and Vision Computing 25(12):1875–1884Google Scholar
  47. Perez P, Hue C, Vermaak J, Gangnet M (2002) Color-based probabilistic tracking. In: Proceedings ECCV ’02, Springer, LNCS, vol 2350, pp 661–675Google Scholar
  48. Pfeifer R, Lungarella M, Iida F (2007) Self-organization, embodiment, and biologically inspired robotics. Science 318:1088–1093CrossRefGoogle Scholar
  49. Pylyshyn ZW (1989) The role of location indexes in spatial perception: A sketch of the FINST spatial-index model. Cognition 32(1):65–97CrossRefGoogle Scholar
  50. Pylyshyn ZW (2001) Visual indexes, preconceptual objects, and situated vision. Cognition 80(1):127–158CrossRefGoogle Scholar
  51. Scassellati B (1999) Imitation and mechanisms of joint attention: A developmentalGoogle Scholar
  52. structure for building social skills on a humanoid robot. In: Nehaniv CL (ed)Google Scholar
  53. Computation for Metaphors, Analogy, and Agents, LNCS, vol 1562, Springer, pp 176–195Google Scholar
  54. Siskind JM (1995) Grounding language in perception. Artificial Intelligence Review 8(5-6):371–391CrossRefGoogle Scholar
  55. Siskind JM (2001) Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic. Journal of Artificial Intelligence Research 15:31–90zbMATHGoogle Scholar
  56. Soille P (2003) Morphological Image Analysis: Principles and Applications. SpringerGoogle Scholar
  57. Spelke ES (1990) Principles of object perception. Cognitive Science 14(1):29–56CrossRefGoogle Scholar
  58. Spranger M (2008) World models for grounded language gamesGoogle Scholar
  59. Spranger M, Pauw S, Loetzsch M, Steels L (2012) Open-ended procedural semantics. In: Steels L, Hild M (eds) Grounding Language in Robots, Springer Verlag, BerlinGoogle Scholar
  60. Steels L (1998) The origins of syntax in visually grounded robotic agents. Artificial Intelligence 103(1-2):133–156zbMATHCrossRefGoogle Scholar
  61. Steels L, Baillie JC (2003) Shared grounding of event descriptions by autonomous robots. Robotics and Autonomous Systems 43(2-3):163–173CrossRefGoogle Scholar
  62. Steels L, Kaplan F (1998) Stochasticity as a source of innovation in language games. In: Proceedings ALIFE ’98, MIT Press, pp 368–376Google Scholar
  63. Steels L, Vogt P (1997) Grounding adaptive language games in robotic agents. In:Google Scholar
  64. Proceedings ECAL ’97, The MIT Press, pp 473–484Google Scholar
  65. Tomasello M (1995) Joint attention as social cognition. In: Moore C, Dunham PJ (eds) Joint Attention: Its Origins and Role in Development, Lawrence Erlbaum Associates, Hillsdale, NJGoogle Scholar
  66. Tomasello M (1999) The Cultural Origins of Human Cognition. Harvard University Press, HarvardGoogle Scholar
  67. Tomasello M, Carpenter M, Call J, Behne T, Moll H (2005) Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences 28:675–691Google Scholar
  68. Treisman AM, Gelade G (1980) A feature-integration theory of attention. Cognitive Psychology 12(1):97–136CrossRefGoogle Scholar
  69. Vinciarelli A, Pantic M, Bourlard H (2009) Social signal processing: Survey of an emerging domain. Image and Vision Computing 27(12):1743–1759CrossRefGoogle Scholar
  70. Wagner D, Schmalstieg D (2007) ARToolKitPlus for pose tracking on mobile devices. In: Proceedings CVWW ’07Google Scholar
  71. Yilmaz A, Javed O, Shah M (2006) Object tracking: A survey. ACM Computing Surveys 38(13):1–45Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Sony Computer Science Laboratory Paris, Paris, France
  2. Systems Technology Laboratory, Sony Corporation, Tokyo, Japan
  3. AI Lab, Vrije Universiteit Brussel, Brussels, Belgium
  4. ICREA Institute for Evolutionary Biology (UPF-CSIC), Barcelona, Spain
