Kognitionswissenschaft, Volume 8, Issue 3, pp 101–107

A Hybrid Vision Frontend for an Artificial Communicator

  • Gunther Heidemann
  • Nils Jungclaus
  • Franz Kummert
  • Gerhard Sagerer
  • Helge Ritter

Summary

In this contribution, we present the “visual system” of an artificial communicator, which enables the communicator to recognize objects (wooden toy pieces) in camera images. The system is based on a hybrid approach that applies artificial neural nets for a holistic representation of low-level knowledge. The transition to the symbolic level is realized by coupling a knowledge base, a semantic network containing explicit object models. In the next processing step, the information extracted from single images is integrated by a scene memory into a representation suitable for the artificial communicator. The memory both stabilizes the data extracted from static scenes and achieves an efficient representation of changing scenes by computing their “difference”. In this way, other modules of the communicator, operating on different time scales, can access the scene information interactively, which is the prerequisite for a dialogue with the user.
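The scene-memory idea sketched in the summary can be illustrated with a small toy model. The following Python sketch is not the authors' implementation; the class `SceneMemory`, the `stability` threshold, and the object representation are all illustrative assumptions. It shows the two roles the summary assigns to the memory: stabilizing detections from static scenes (an object must persist over several frames before it enters the scene representation) and reporting only the “difference” between successive stable scenes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneObject:
    """Hypothetical symbolic detection, e.g. a recognized toy piece."""
    label: str        # e.g. "cube", "bolt"
    position: tuple   # coarse grid cell (x, y)

class SceneMemory:
    """Toy scene memory: accumulates per-frame detections and
    reports only the difference between successive stable scenes."""

    def __init__(self, stability=3):
        self.stability = stability  # frames an object must persist
        self.counts = {}            # candidate object -> consecutive sightings
        self.stable = set()         # current stabilized scene content

    def update(self, detections):
        """Integrate one frame of detections; return (added, removed)."""
        detections = set(detections)
        # count consecutive sightings to filter flicker in static scenes
        for obj in detections:
            self.counts[obj] = self.counts.get(obj, 0) + 1
        for obj in list(self.counts):
            if obj not in detections:
                del self.counts[obj]
        new_stable = {o for o, c in self.counts.items() if c >= self.stability}
        # the "difference": only changes are passed on to other modules
        added = new_stable - self.stable
        removed = self.stable - new_stable
        self.stable = new_stable
        return added, removed
```

With `stability=3`, a cube reported in three consecutive frames is announced once as `added`; a frame without it then announces it as `removed`. Modules working on slower time scales can thus consume compact change events instead of re-reading the full scene every frame.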

Ein hybrides Bildanalyse-System für einen künstlichen Kommunikator




Copyright information

© Springer-Verlag 1999

Authors and Affiliations

  • Gunther Heidemann (1)
  • Nils Jungclaus (1)
  • Franz Kummert (1)
  • Gerhard Sagerer (1)
  • Helge Ritter (1)
  1. SFB 360, Universität Bielefeld, Bielefeld
