Cognitive Computation, Volume 2, Issue 4, pp 316–325

Embodied Object Recognition using Adaptive Target Observations



In this paper, we study object recognition in the embodied setting. More specifically, we study whether the recognition system will benefit from acquiring another observation of the object under study, or whether it is time to give up and report the observed object as unknown. We describe the hardware and software of a system that implements recognition and object permanence as two nested perception-action cycles. We have collected three data sets of observation sequences that allow controlled evaluation of the system's behavior. Our recognition system uses a KNN classifier with bag-of-features prototypes. For this classifier, we have designed and compared three different uncertainty measures for target observation. These measures allow the system to (a) decide whether to continue observing an object or to move on, and (b) decide whether the observed object is previously seen or novel. The system successfully rejects all novel objects as “unknown”, while still recognizing most of the previously seen objects.
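The decision loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the three uncertainty measures, so the margin-based measure below, the use of cosine similarity between bag-of-features histograms, the function names, the thresholds, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length histograms (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_votes(histogram, prototypes, k):
    """Sum the similarities of the k nearest prototypes, grouped by label."""
    ranked = sorted(
        ((cosine_similarity(histogram, proto), label) for label, proto in prototypes),
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, label in ranked[:k]:
        votes[label] += similarity
    return votes


def margin_confidence(scores, n_observations, k):
    """Hypothetical measure: best score minus runner-up, divided by the
    largest score reachable after n_observations (about n_observations * k)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None, 0.0
    best_label, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, (best - runner_up) / (n_observations * k)


def recognize(observations, prototypes, k=2, accept=0.6, max_observations=5):
    """Accumulate KNN votes over observations; stop early once confident,
    otherwise give up and report "unknown". Thresholds are illustrative."""
    scores = defaultdict(float)
    used = 0
    for histogram in observations[:max_observations]:
        used += 1
        for label, vote in knn_votes(histogram, prototypes, k).items():
            scores[label] += vote
        label, confidence = margin_confidence(scores, used, k)
        if confidence >= accept:
            return label, used
    return "unknown", used


# Toy bag-of-features prototypes (made-up 4-bin histograms).
prototypes = [
    ("cup", [5, 1, 0, 0]),
    ("cup", [4, 2, 0, 0]),
    ("book", [0, 1, 5, 0]),
    ("book", [0, 2, 4, 0]),
]
known_sequence = [[5, 2, 0, 0], [4, 1, 0, 0]]
ambiguous_sequence = [[2, 2, 2, 0], [5, 2, 0, 0], [4, 1, 0, 0]]
novel_sequence = [[0, 0, 1, 6], [0, 1, 0, 5], [0, 0, 0, 4]]

print(recognize(known_sequence, prototypes))      # ('cup', 1)
print(recognize(ambiguous_sequence, prototypes))  # ('cup', 3)
print(recognize(novel_sequence, prototypes))      # ('unknown', 3)
```

In this sketch a clear view is accepted after one observation, an initially ambiguous view triggers further observations before acceptance, and a sequence that never matches any prototype well is reported as unknown once the observations run out. Dividing the margin by the maximum reachable score is one way to make weak matches count as uncertain; other choices are equally plausible.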


Keywords: Object recognition, Attention, Visual search, Fixation, Object permanence



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Linköping University, Linköping, Sweden
