Abstract
This paper presents a robotic head that enables social robots to attend to scene saliency with bio-inspired saccadic behaviors. Scene saliency is determined by measuring low-level static scene information, motion, and object prior knowledge. With the proposed control scheme, the robotic head shifts its gaze toward the extracted saliency spots in a saccadic manner while obeying eye–head coordination laws. Results from a simulation study and real-world applications demonstrate the effectiveness of the proposed method in discovering scene saliency and producing human-like head motion. The proposed techniques could be applied to social robots to enhance social presence and user experience in human–robot interaction.
References
Asfour, T., Welke, K., Azad, P., Ude, A., & Dillmann, R. (2008). The Karlsruhe humanoid head. In Proceedings of IEEE-RAS international conference on humanoid robots (pp. 447–453).
Breazeal, C. (2000). Sociable machines: Expressive social exchange between humans and robots. Ph.D. thesis, Massachusetts Institute of Technology.
Butko, N., Zhang, L., Cottrell, G., & Movellan J. (2008). Visual saliency model for robot cameras. In IEEE international conference on robotics and automation (pp. 2398–2403).
Choi, S.-B., Ban, S.-W., & Lee, M. (2004). Biologically motivated visual attention system using bottom-up saliency map and top-down inhibition. Neural Information Processing—Letters and Review, 2(1), 19–25.
Corbetta, M., & Shulman, G. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 201–215.
Crawford, J., Martinez-Trujillo, J., & Klier, E. (2003). Neural control of three-dimensional eye and head movements. Current Opinion in Neurobiology, 13(6), 655–662.
Crawford, J., & Vilis, T. (1991). Axes of eye rotation and Listing's law during rotations of the head. Journal of Neurophysiology, 65(3), 407–423.
Cui, R., Gao, B., & Guo, J. (2012). Pareto-optimal coordination of multiple robots with safety guarantees. Autonomous Robots, 1–17. doi:10.1007/s10514-012-9302-3.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Computer Society conference on computer vision and pattern recognition.
Donders, F. (1848). Beitrag zur lehre von den bewegungen des menschlichen auges. Holland Beitr Anat Physiol Wiss, 1(104), 384.
Doretto, G., Chiuso, A., Wu, Y., & Soatto, S. (2003). Dynamic textures. International Journal of Computer Vision, 51(2), 91–109.
Gao, D., & Vasconcelos, N. (2007). Bottom-up saliency is a discriminant process. In IEEE 11th international conference on computer vision, ICCV 2007 (pp. 1–6).
Ge, S., He, H., & Zhang, Z. (2011). Bottom-up saliency detection for attention determination. Machine Vision and Applications, 24, 1–14.
Glenn, B., & Vilis, T. (1992). Violations of Listing's law after large eye and head gaze shifts. Journal of Neurophysiology, 68(1), 309–318.
Goossens, H., & Opstal, A. (1997). Human eye–head coordination in two dimensions under different sensorimotor conditions. Experimental Brain Research, 114(3), 542–560.
Guitton, D., & Volle, M. (1987). Gaze control in humans: Eye–head coordination during orienting movements to targets within and beyond the oculomotor range. Journal of Neurophysiology, 58(3), 427–459.
Hartley, R., & Zisserman, A. (2000). Multiple view geometry in computer vision (Vol. 2). New York: Cambridge University Press.
He, H., Ge, S., & Zhang, Z. (2011). Visual attention prediction using saliency determination of scene understanding for social robots. International Journal of Social Robotics (Special Issue: Towards an Effective Design of Social Robots), 3, 457–468.
He, H., Zhang, Z., & Ge, S. (2010). Attention determination for social robots using salient region detection. In International conference on social robotics (pp. 295–304). Heidelberg: Springer.
Heuring, J., & Murray, D. (1999). Modeling and copying human head movements. IEEE Transactions on Robotics and Automation, 15(6), 1095–1108.
Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5), 25.1–25.18.
Itti, L. (2003). Realistic avatar eye and head animation using a neurobiological model of visual attention. Tech. Rep. Defense Technical Information Center Document.
Itti, L. (2005). Models of bottom-up attention and saliency. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Neurobiology of attention (pp. 576–582). San Diego, CA: Elsevier.
Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2010). Learning to predict where humans look. In International conference on computer vision.
Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). Sun: Top-down saliency using natural statistics. Visual Cognition, 17(6–7), 979–1003.
Laschi, C., Asuni, G., Guglielmelli, E., Teti, G., Johansson, R., Konosu, H., et al. (2008). A bio-inspired predictive sensory-motor coordination scheme for robot reaching and preshaping. Autonomous Robots, 25(1), 85–101.
Le Meur, O., Le Callet, P., Barba, D., & Thoreau, D. (2006). A coherent computational approach to model bottom-up visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(5), 802–817.
Lopes, M., Bernardino, A., Santos-Victor, J., Rosander, K., & von Hofsten, C. (2009). Biomimetic eye-neck coordination. In Proceedings of IEEE international conference on development and learning (pp. 1–8).
Maini, E., Manfredi, L., Laschi, C., & Dario, P. (2008). Bioinspired velocity control of fast gaze shifts on a robotic anthropomorphic head. Autonomous Robots, 25(1), 37–58.
Medendorp, W., Van Gisbergen, J., Horstink, M., & Gielen, C. (1999). Donders’ law in torticollis. Journal of Neurophysiology, 82(5), 2833.
Milanese, R., Wechsler, H., Gill, S., Bost, J.-M., & Pun, T. (1994). Integration of bottom-up and top-down cues for visual attention using non-linear relaxation. In IEEE Computer Society conference on computer vision and pattern recognition, Proceedings CVPR’94 (pp. 781–785).
Morel, J., & Yu, G. (2009). ASIFT: A new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences, 2(2), 438–469.
Nagai, Y., Hosoda, K., Morita, A., & Asada, M. (2003). A constructive model for the development of joint attention. Connection Science, 15(4), 211–229.
Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimizing detection speed. In 2006 IEEE Computer Society conference on computer vision and pattern recognition (Vol. 2, pp. 2049–2056).
Oliva, A., Torralba, A., Castelhano, M. S., & Henderson, J. M. (2003). Top-down control of visual attention in object detection. In Proceedings of 2003 IEEE international conference on image processing, ICIP 2003 (Vol. 1, pp. 1–253).
Pagel, M., Maël, E., & Von Der Malsburg, C. (1998). Self calibration of the fixation movement of a stereo camera head. Autonomous Robots, 5(3), 355–367.
Raphan, T. (1998). Modeling control of eye orientation in three dimensions. I. Role of muscle pulleys in determining saccadic trajectory. Journal of Neurophysiology, 79(5), 2653.
Seo, H. J., & Milanfar, P. (2009). Nonparametric bottom-up saliency detection by self-resemblance. In IEEE computer society conference on computer vision and pattern recognition workshops. CVPR Workshops 2009 (pp. 45–52).
Smith, R. (2007). An overview of the Tesseract OCR engine. In Proceedings of the ninth international conference on document analysis and recognition.
Tsagarakis, N., Metta, G., Sandini, G., Vernon, D., Beira, R., Becchi, F., et al. (2007). iCub: The design and realization of an open humanoid platform for cognitive and neuroscience research. Advanced Robotics, 21(10), 1151–1175.
Tweed, D. (1997). Three-dimensional model of the human eye–head saccadic system. Journal of Neurophysiology, 77(2), 654.
Viola, P., & Jones, M. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137–154.
Westheimer, G. (1957). Kinematics of the eye. Journal of the Optical Society of America, 47, 967–974.
Acknowledgments
The research is partially funded by Singapore National Research Foundation, Interactive Digital Media R&D Program, under research grant R-705-000-017-279, and the National Basic Research Program of China (973 Program) under Grant 2011CB707005.
Appendix
1.1 Computation of linear projections using corresponding points
Given one pair of corresponding points, the projective transformation map is
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \sim H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}. \quad (28)$$
To solve for the optimal linear projection, the linear projection (28) can be rewritten in matrix form (Hartley and Zisserman 2000),
where
Assuming that corresponding points in both images can be identified, and that the correspondence can be approximated by a linear map for a small camera movement, the projection matrix can be computed by
where \(V_{ij}=[\mathbf{v}_{1}^{x},\mathbf{v}_{1}^{y},\mathbf{v}_{2}^{x},\mathbf{v}_{2}^{y},\ldots ,\mathbf{v}_{k}^{x},\mathbf{v}_{k}^{y}]\) with \(k\) corresponding points between the two images.
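To make this computation concrete, the standard way to estimate a \(3\times 3\) projective transformation from \(k \ge 4\) point correspondences is the Direct Linear Transform of Hartley and Zisserman (2000). The sketch below (in NumPy; the function name `homography_dlt` and the exact stacking of the linear system are illustrative, not necessarily the matrix form used in the paper) solves the homogeneous least-squares problem via SVD:

```python
import numpy as np

def homography_dlt(pts_src, pts_dst):
    """Estimate the 3x3 projective transformation H mapping pts_src to
    pts_dst with the Direct Linear Transform (Hartley & Zisserman, 2000).

    Each correspondence (x, y) -> (u, v) contributes two rows to the
    homogeneous system A h = 0; the least-squares solution is the right
    singular vector of A associated with the smallest singular value.
    """
    A = []
    for (x, y), (u, v) in zip(pts_src, pts_dst):
        # Rows derived from u = (h1 . p) / (h3 . p), v = (h2 . p) / (h3 . p)
        # with p = (x, y, 1) and h1, h2, h3 the rows of H.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(A, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the projective scale so H[2, 2] = 1
```

With exact correspondences the recovered matrix matches the true transformation up to scale; with noisy correspondences the SVD step returns the algebraic least-squares fit.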
1.2 Quaternion element projection
Property 1
(Quaternion element projection) Let a quaternion be \(x=o\mathsf{1}+a\mathsf{i}+b\mathsf{j}+c\mathsf{k}\) with \(\mathsf{1}\), \(\mathsf{i}\), \(\mathsf{j}\), and \(\mathsf{k}\) as the element basis. If \(a=0\), then \(x*\mathsf{i}=\mathsf{i}*x^{+}\), where \(x^{+}\) is the conjugate of \(x\). Analogous identities hold for \(\mathsf{j}\) when \(b=0\) and for \(\mathsf{k}\) when \(c=0\).
Proof
The proof is straightforward by expanding the left and right sides of \(x*\mathsf i =\mathsf i *x^{+}\) with the quaternion definition. \(\square \)
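For the reader's convenience, the expansion can be spelled out. With \(a=0\), write \(x=o+b\mathsf{j}+c\mathsf{k}\) and \(x^{+}=o-b\mathsf{j}-c\mathsf{k}\). Using the Hamilton products \(\mathsf{j}\mathsf{i}=-\mathsf{k}\), \(\mathsf{k}\mathsf{i}=\mathsf{j}\), \(\mathsf{i}\mathsf{j}=\mathsf{k}\), and \(\mathsf{i}\mathsf{k}=-\mathsf{j}\):

$$x*\mathsf{i} = o\mathsf{i} + b(\mathsf{j}\mathsf{i}) + c(\mathsf{k}\mathsf{i}) = o\mathsf{i} - b\mathsf{k} + c\mathsf{j},$$
$$\mathsf{i}*x^{+} = o\mathsf{i} - b(\mathsf{i}\mathsf{j}) - c(\mathsf{i}\mathsf{k}) = o\mathsf{i} - b\mathsf{k} + c\mathsf{j}.$$

The two sides agree, which establishes the property.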
Cite this article
He, H., Ge, S.S. & Zhang, Z. A saliency-driven robotic head with bio-inspired saccadic behaviors for social robotics. Auton Robot 36, 225–240 (2014). https://doi.org/10.1007/s10514-013-9346-z