Autonomous Robots, Volume 40, Issue 1, pp 33–57

From passive to interactive object learning and recognition through self-identification on a humanoid robot

  • Natalia Lyubova
  • Serena Ivaldi
  • David Filliat


Service robots working in evolving human environments need the ability to continuously learn to recognize new objects. Ideally, they should act as humans do, by observing their environment and interacting with objects, without specific supervision. Taking inspiration from infant development, we propose a developmental approach that enables a robot to progressively learn object appearances in a social environment: first only through observation, then through active object manipulation. We focus on incremental, continuous, and unsupervised learning that does not require prior knowledge about the environment or the robot. In the first phase, we analyse the visual space and detect proto-objects as units of attention that are learned and recognized as possible physical entities. The appearance of each entity is represented as a multi-view model based on complementary visual features. In the second phase, entities are classified into three categories: parts of the body of the robot, parts of a human partner, and manipulable objects. The categorization approach is based on mutual information between the visual and proprioceptive data, and on the motion behaviour of entities. The ability to categorize entities is then used during interactive object exploration to improve the previously acquired object models. The proposed system is implemented and evaluated with an iCub and a Meka robot learning 20 objects. The system is able to recognize objects with 88.5 % success and to create coherent representation models that are further improved by interactive learning.
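The abstract does not say how the mutual information between visual and proprioceptive data is estimated. As a rough illustration only, the sketch below uses a generic plug-in (frequency-count) estimator on two binary sequences. The signal names and values (`arm_moving`, `entity_a_moving`, `entity_b_moving`) are hypothetical and are not taken from the paper; they stand for "the robot's arm joints are moving" and "a tracked visual entity is moving" at each time step.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two
    equal-length sequences of discrete symbols."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # joint counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written in terms of raw counts
        ratio = count * n / (px[x] * py[y])
        mi += p_xy * math.log2(ratio)
    return mi


# Hypothetical binary signals, one value per time step (assumed data).
arm_moving = [0, 1, 1, 0, 1, 0, 0, 1]       # proprioception: arm in motion?
entity_a_moving = [0, 1, 1, 0, 1, 0, 0, 1]  # moves exactly when the arm moves
entity_b_moving = [0, 0, 1, 1, 0, 0, 1, 1]  # moves independently of the arm

mi_a = mutual_information(arm_moving, entity_a_moving)
mi_b = mutual_information(arm_moving, entity_b_moving)
print(f"entity A: {mi_a:.3f} bits, entity B: {mi_b:.3f} bits")
```

In this toy setting, an entity whose motion is strongly tied to the robot's own motor activity yields high mutual information and would be a candidate for "part of the robot's body", whereas an entity that moves independently yields a value near zero. Any threshold separating the two would be a further assumption.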


Keywords: Developmental robotics, Interactive object learning, Self-identification, Object recognition



This work was supported by the French ANR program (ANR-10-BLAN-0216) through Project MACSi, and partly by the European Commission, within the CoDyCo project (FP7-ICT-2011-9, No. 600716). The authors would like to thank the anonymous reviewers for their comments, which greatly helped improve the quality of the paper.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Natalia Lyubova (1)
  • Serena Ivaldi (2, 3, 4)
  • David Filliat (5)

  1. Aldebaran-Robotics, Paris, France
  2. Inria, Villers-lès-Nancy, France
  3. IAS, TU-Darmstadt, Darmstadt, Germany
  4. Institut des Systèmes Intelligents et de Robotique, CNRS UMR 7222 & Université Pierre et Marie Curie, Paris, France
  5. ENSTA ParisTech - INRIA FLOWERS Team, Computer Science and System Engineering Laboratory, ENSTA ParisTech, Paris, France
