On the advantages of foveal mechanisms for active stereo systems in visual search tasks
In this work we study how information provided by foveated images, sampled according to the log-polar transformation, can be integrated over time in order to build accurate world representations and accomplish visual search tasks efficiently. We focus on a specific visual information modality, depth, and on how to store it in a flexible memory structure. We propose a probabilistic observational model for a stereo system that relies on the Unscented Transform to propagate the uncertainty in stereo matching, which arises from spatial quantization in the retina, to the 3D Cartesian domain. Probabilistic depth measurements are integrated in a novel Sensory Ego-Sphere whose topology can be biased with foveal-like distributions, according to the autonomous agent's short-term tasks and goals. Furthermore, we investigate an Upper Confidence Bound algorithm for the task of simultaneously finding the closest object to the observer (visual search) and learning the 3D map of the surrounding environment (mapping). The performance of task execution is assessed both with a foveated log-polar sensor and with a classical uniform one. The advantages of foveal vision and custom ego-sphere representations are illustrated in a series of experiments with a realistic simulator.
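As a rough illustration of how an Unscented Transform can carry matching uncertainty into depth, the sketch below propagates a Gaussian disparity estimate through the textbook rectified-stereo relation z = f·b/d. This is a deliberately simplified, assumed setup (parallel cameras, scalar disparity, uniform pixels), not the log-polar observation model of the paper; the function name, parameters, and the choice kappa = 2 are all hypothetical.

```python
import math


def unscented_depth(disparity_mean, disparity_var, focal_px, baseline_m, kappa=2.0):
    """Propagate a 1D Gaussian disparity through z = f * b / d.

    Assumption: rectified parallel stereo, disparity in pixels.
    Returns the approximate mean and variance of depth in metres.
    """
    n = 1  # dimension of the input (scalar disparity)
    spread = math.sqrt((n + kappa) * disparity_var)
    if disparity_mean - spread <= 0.0:
        raise ValueError("sigma points must have positive disparity")

    # Three sigma points: the mean and one point on each side of it.
    sigma_points = [disparity_mean, disparity_mean + spread, disparity_mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]

    # Push every sigma point through the nonlinear depth function.
    depths = [focal_px * baseline_m / d for d in sigma_points]

    # Recover the first two moments from the weighted transformed points.
    depth_mean = sum(w * z for w, z in zip(weights, depths))
    depth_var = sum(w * (z - depth_mean) ** 2 for w, z in zip(weights, depths))
    return depth_mean, depth_var


if __name__ == "__main__":
    mean, var = unscented_depth(10.0, 1.0, focal_px=500.0, baseline_m=0.1)
    print(f"depth mean = {mean:.3f} m, variance = {var:.3f} m^2")
```

Under these assumptions the same disparity variance yields a larger depth variance for smaller disparities, that is, for more distant points, which is the kind of non-uniform uncertainty a probabilistic depth map has to represent.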
Keywords: Stereoscopic vision, Foveal vision, Active vision, Sensory ego-sphere
This work has been partially supported by the Portuguese Foundation for Science and Technology (FCT) Project [UID/EEA/50009/2013]. Rui Figueiredo is funded by FCT Ph.D. Grant PD/BD/105779/2014. Helder Araújo would like to thank FCT (Portuguese Foundation for Science and Technology) grant UID-EEA-0048-2013.