Cognitive Processing

, Volume 19, Issue 2, pp 285–296 | Cite as

Integrating planning perception and action for informed object search

  • Luis J. Manso
  • Marco A. Gutierrez
  • Pablo Bustos
  • Pilar Bachiller
Research Report


This paper presents a method to reduce the time spent by a robot with cognitive abilities when looking for objects in unknown locations. It describes how machine learning techniques can be used to decide which places should be inspected first, based on images that the robot acquires passively. The proposal is composed of two concurrent processes. The first one uses the aforementioned images to generate a description of the types of objects found in each object container seen by the robot. This is done passively, regardless of the task being performed. The containers can be tables, boxes, shelves or any other kind of container of known shape whose contents can be seen from a distance. The second process uses the previously computed estimation of the contents of the containers to decide which is the most likely container having the object to be found. This second process is deliberative and takes place only when the robot needs to find an object, whether because it is explicitly asked to locate one or because it is needed as a step to fulfil the mission of the robot. Upon failure to guess the right container, the robot can continue making guesses until the object is found. Guesses are made based on the semantic distance between the object to find and the description of the types of the objects found in each object container. The paper provides quantitative results comparing the efficiency of the proposed method and two base approaches.


Active perception Informed search Perception-aware planning 



This work has been partially supported by the MICINN Project TIN2015-65686-C5-5-R, by the Extremaduran Government Project GR15120, by the Red de Excelencia “Red de Agentes Físicos” TIN2015-71693-REDT and by MEC project PHBP14/00083. Funding was provided by Junta de Extremadura (Ayudas Consolidación Grupos Investigación Catalogados).


  1. Aloimonos Y (1993) Active perception. Lawrence Erlbaum, HillsdaleGoogle Scholar
  2. Bissmarck F, Svensson M, Tolt G (2015) Efficient algorithms for next best view evaluation. In: IEEE/RSJ international conference on intelligent robots and systemsGoogle Scholar
  3. Borji A, Itti L (2013) State-of-the-art in visual attention modeling. IEEE Trans Pattern Anal Mach Intell 35(1):185–207CrossRefPubMedGoogle Scholar
  4. Canziani A, Culurciello E (2015) Visual attention with deep neural networks. In: Information sciences and systems (CISS), 2015 49th annual conference on, pp. 1–3, March 2015Google Scholar
  5. Carrasco M (2011) Visual attention: the past 25 years. Vis Res 51(13):1484–1525CrossRefPubMedPubMedCentralGoogle Scholar
  6. Connolly C (1985) The determination of next best views. In: Robotics and automation. Proceedings. 1985 IEEE international conference on, vol 2, pp 432–435. IEEEGoogle Scholar
  7. Egeth HE (1966) Parallel versus serial processes in multidimensional stimulus discrimination. Atten Percept Psychophys 1(4):245–252CrossRefGoogle Scholar
  8. Foote T (2013) TF: the transform library. In: Technologies for practical robot applications (TePRA), 2013 IEEE international conference on, open-source software workshop, pp 1–6, April 2013Google Scholar
  9. Forssén P-E, Meger D, Lai K, Helmer S, Little JJ, Lowe DG (2008) Informed visual search: combining attention and object recognition. In: Robotics and automation, 2008. icra 2008. IEEE international conference on, pp 935–942. IEEEGoogle Scholar
  10. Gutierrez MA, Banchs RE, D’Haro LF (2015) Perceptive parallel processes coordinating geometry and texture. In: Proceedings of Workshop on Multimodal Semantics for Robotic Systems 2015, Hamburg, pp 30–35Google Scholar
  11. Gutiérrez MA, Manso LJ, Pandya H, Núñez P (2017) A passive learning sensor architecture for multimodal image labeling: an application for social robots. Sensors 17(2):353CrossRefPubMedCentralGoogle Scholar
  12. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. In arXiv:1512.03385
  13. Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254–1259CrossRefGoogle Scholar
  14. Lee S, Lim J, Suh IH (2015) Incremental learning from a single seed image for object detection. In: Intelligent robots and systems (IROS), 2015 IEEE/RSJ international conference on, pp 1905–1912. IEEEGoogle Scholar
  15. Manso LJ et al (2010) RoboComp: a tool-based robotics framework. In: Simulation, modeling and programming for autonomous robots, pp 251–262. SpringerGoogle Scholar
  16. Manso LJ, Bustos P, Bachiller P, Núñez P (2015) A perception-aware architecture for autonomous robots. Int J Adv Robot Syst 12(174):13Google Scholar
  17. Manso LJ, Calderita LV, Bustos P, Bandera A (2016) Use and advances in the active grammar-based modeling architecture. In: Proceedings of the workshop of physical agents, pp 1–25Google Scholar
  18. Martinez Mozos O, Chollet F, Murakami K, Morooka K, Tsuji T, Kurazume R, Hasegawa T (2012) Tracing commodities in indoor environments for service robotics. In: IFAC Proceedings Volumes, vol 45, Elsevier, pp 71–76Google Scholar
  19. Mikolov T, Dean J (2013) Distributed representations of words and phrases and their compositionality. Advances in neural information processing systemsGoogle Scholar
  20. Milliez G, Warnier M, Clodic A, Alami R (2014) A framework for endowing an interactive robot with reasoning capabilities about perspective-taking and belief management. In: The 23rd IEEE international symposium on robot and human interactive communication, pp 1103–1109. IEEEGoogle Scholar
  21. Mnih V, Heess N, Graves A, Kavukcuoglu K (2015) Recurrent models of visual attention. In: Advances in neural information processing systems, vol 27Google Scholar
  22. Müller HJ, Krummenacher J (2006) Visual search and selective attention. Vis Cognit 14(4–8):389–410CrossRefGoogle Scholar
  23. Pillai S, Leonard J (2015) Monocular slam supported object recognition. arXiv preprint arXiv:1506.01732
  24. Quigley M et al (2009) ROS: an open-source robot operating system. In: Proc. of ICRA workshop on open source softwareGoogle Scholar
  25. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788Google Scholar
  26. Rothenstein AL, Tsotsos JK (2008) Attention links sensing to recognition. Image Vis Comput 26(1):114–126CrossRefGoogle Scholar
  27. Rusu RB, Bradski G, Thibaux R, Hsu J (2010) Fast 3d recognition and pose using the viewpoint feature histogram. In: Intelligent robots and systems (IROS), 2010 IEEE/RSJ international conference on, pp 2155–2162. IEEEGoogle Scholar
  28. Sternberg S et al (1966) High-speed scanning in human memory. Science 153(3736):652–654CrossRefPubMedGoogle Scholar
  29. Treisman AM, Gelade G (1980) A feature-integration theory of attention. Cognit Psychol 12(1):97–136CrossRefPubMedGoogle Scholar
  30. Tsotsos JK (2017) Attention and cognition: principles to guide modeling. In: Computational and cognitive neuroscience of vision, pp 277–295. SpringerGoogle Scholar
  31. Van der Maaten L, Hinton G (2012) Visualizing non-metric similarities in multiple maps. Mach Learn 87(1):33–55CrossRefGoogle Scholar
  32. Wallenberg M, Forssén P-E (2010) Embodied object recognition using adaptive target observations. Cognit Comput 2(4):316–325CrossRefGoogle Scholar
  33. Walther D, Rutishauser U, Koch C, Perona P (2005) Selective visual attention enables learning and recognition of multiple objects in cluttered scenes. Comput Vis Image Underst 100(1):41–63CrossRefGoogle Scholar
  34. Wolfe JM, Gray W (2007) Guided search 4.0. Integrated models of cognitive systems, pp 99–119Google Scholar
  35. Xu K, Ba J, Kiros R, Courville A, Salakhutdinov R, Zemel R, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044,

Copyright information

© Marta Olivetti Belardinelli and Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Luis J. Manso
    • 1
  • Marco A. Gutierrez
    • 1
  • Pablo Bustos
    • 1
  • Pilar Bachiller
    • 1
  1. 1.RoboLab - Robotics and Artificial Vision Laboratory, Escuela Politécnica de CáceresUniversidad de ExtremaduraBadajozSpain

Personalised recommendations