Abstract
Estimating a person's focus of attention depends largely on his or her gaze direction. Here, we propose a new method for estimating visual focus of attention using head rotation, as well as fuzzy fusion of head rotation and eye gaze estimates, in a fully automatic manner, without the need for any special hardware or a priori knowledge regarding the user, the environment or the setup. Instead, we propose a system aimed at functioning under unconstrained conditions, using only simple hardware, such as an ordinary web camera. Our system targets a human-computer interaction setting in which a person faces a monitor with a camera mounted on top. To this end, we propose in this paper two novel techniques for estimating head rotation, based on local and appearance information, and we adaptively fuse them in a common framework. The system is able to recognize head rotational movement, even under translational movements of the user in any direction, without any knowledge or a priori estimate of the user's distance from the camera or of the camera's intrinsic parameters.
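The adaptive fusion described above can be sketched as a confidence-weighted combination of the two rotation estimates. The sketch below is illustrative only: the function name, the interpretation of the confidences as fuzzy membership values, and the weighted-average rule are assumptions, not the paper's exact formulation.

```python
def fuse_estimates(yaw_a, conf_a, yaw_b, conf_b):
    """Confidence-weighted fusion of two head-yaw estimates (degrees).

    conf_a and conf_b act as fuzzy membership values in [0, 1],
    expressing how much each estimator is trusted for the current frame.
    Returns None when neither estimator is confident.
    """
    total = conf_a + conf_b
    if total == 0:
        return None
    return (conf_a * yaw_a + conf_b * yaw_b) / total
```

For example, two equally trusted estimates average out, while a zero-confidence estimate is ignored entirely.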
Notes
Here, the word 'common' is used to distinguish from other types of web camera, such as wide-angle, narrow-angle, or infrared cameras.
Here, saturation is used, although different color channels (or combinations thereof) could be used.
RMS error is also calculated here, as a stricter criterion than the mean absolute error (MAE), since it 'punishes' large errors.
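The point of this note can be illustrated numerically; the function names and the error sequences below are hypothetical examples, chosen so both sequences have the same MAE:

```python
import math

def mae(errors):
    """Mean absolute error of a sequence of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root-mean-square error: squaring weighs large residuals more."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Both sequences have MAE = 2.0, but the one with a single large
# outlier is penalised more heavily by RMSE (4.0 versus 2.0).
uniform = [2.0, 2.0, 2.0, 2.0]
outlier = [0.0, 0.0, 0.0, 8.0]
```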
As no depth information is provided in the HPEG dataset, we approximated the distance from the camera using the area formed by the LED positions when the subject faces the camera frontally.
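One way to realise this approximation is via a pinhole-camera assumption, under which the apparent area of a planar marker pattern scales with the inverse square of distance; the calibration-based sketch below is an assumption of ours, not necessarily the paper's exact procedure, and all names are illustrative.

```python
import math

def distance_from_led_area(area_px, ref_area_px, ref_distance):
    """Approximate subject-camera distance from the image area (in
    pixels) spanned by the LED markers.

    Assumes a pinhole model, so area scales as 1 / distance**2.
    (ref_area_px, ref_distance) is a single reference measurement
    taken at a known distance with the subject facing frontally.
    """
    return ref_distance * math.sqrt(ref_area_px / area_px)
```

For instance, if the LED area shrinks to a quarter of its reference value, the estimated distance doubles.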
The HPEG dataset is freely available at http://www.image.ece.ntua.gr/~stiast/
Acknowledgments
This research was supported by the FP7 ICT European project SIREN (project no. 258453). We would also like to thank all participants in the HPEG dataset recordings.
Cite this article
Asteriadis, S., Karpouzis, K. & Kollias, S. Visual Focus of Attention in Non-calibrated Environments using Gaze Estimation. Int J Comput Vis 107, 293–316 (2014). https://doi.org/10.1007/s11263-013-0691-3