Abstract
Estimating a person's focus of attention depends largely on his or her gaze direction. Here, we propose a new method for estimating visual focus of attention using head rotation, as well as fuzzy fusion of head rotation and eye gaze estimates, in a fully automatic manner, without the need for any special hardware or a priori knowledge regarding the user, the environment or the setup. Instead, we propose a system aimed at functioning under unconstrained conditions, using only simple hardware, such as an ordinary web camera. Our system targets a human-computer interaction setting in which a person faces a monitor with a camera mounted on top. To this end, we propose in this paper two novel techniques for estimating head rotation, based on local and appearance information, and we adaptively fuse them in a common framework. The system is able to recognize head rotational movement, even under translational movements of the user in any direction, without any knowledge or a priori estimate of the user's distance from the camera or of the camera's intrinsic parameters.
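The adaptive fusion described above can be sketched as a confidence-weighted combination of the two rotation estimates. The sketch below is illustrative only: the function name, the interpretation of the confidences as fuzzy membership values, and the weighted-average rule are assumptions, not the paper's exact formulation.

```python
def fuse_estimates(yaw_a, conf_a, yaw_b, conf_b):
    """Confidence-weighted fusion of two head-yaw estimates (degrees).

    conf_a and conf_b act as fuzzy membership values in [0, 1],
    expressing how much each estimator is trusted for the current frame.
    Returns None when neither estimator is confident.
    """
    total = conf_a + conf_b
    if total == 0:
        return None
    return (conf_a * yaw_a + conf_b * yaw_b) / total
```

For example, two equally trusted estimates average out, while a zero-confidence estimate is ignored entirely.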
Notes
Here, the word 'common' is used to distinguish from other types of web camera, such as wide-angle, narrow-angle, or infrared cameras.
Here, saturation is used, although different color channels (or combinations thereof) could be used.
RMS error is also calculated here, as a stricter criterion than the mean absolute error (MAE), since it 'punishes' large errors.
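The point of this note can be illustrated numerically; the function names and the error sequences below are hypothetical examples, chosen so both sequences have the same MAE:

```python
import math

def mae(errors):
    """Mean absolute error of a sequence of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root-mean-square error: squaring weighs large residuals more."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Both sequences have MAE = 2.0, but the one with a single large
# outlier is penalised more heavily by RMSE (4.0 versus 2.0).
uniform = [2.0, 2.0, 2.0, 2.0]
outlier = [0.0, 0.0, 0.0, 8.0]
```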
As no depth information is provided in the HPEG dataset, we approximated the distance from the camera using the area formed by the LED positions when the subject faces the camera frontally.
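One way to realise this approximation is via a pinhole-camera assumption, under which the apparent area of a planar marker pattern scales with the inverse square of distance; the calibration-based sketch below is an assumption of ours, not necessarily the paper's exact procedure, and all names are illustrative.

```python
import math

def distance_from_led_area(area_px, ref_area_px, ref_distance):
    """Approximate subject-camera distance from the image area (in
    pixels) spanned by the LED markers.

    Assumes a pinhole model, so area scales as 1 / distance**2.
    (ref_area_px, ref_distance) is a single reference measurement
    taken at a known distance with the subject facing frontally.
    """
    return ref_distance * math.sqrt(ref_area_px / area_px)
```

For instance, if the LED area shrinks to a quarter of its reference value, the estimated distance doubles.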
The HPEG dataset is freely available at http://www.image.ece.ntua.gr/~stiast/
Acknowledgments
This research was supported by the FP7 ICT European project SIREN (project no. 258453). We would also like to thank all participants in the HPEG dataset recordings.
Cite this article
Asteriadis, S., Karpouzis, K. & Kollias, S. Visual Focus of Attention in Non-calibrated Environments using Gaze Estimation. Int J Comput Vis 107, 293–316 (2014). https://doi.org/10.1007/s11263-013-0691-3