Deep Learning-Based Pupil Center Detection for Fast and Accurate Eye Tracking System

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12364)


In augmented reality (AR) and virtual reality (VR) systems, eye tracking is a key technology that requires both high accuracy and real-time operation. Many techniques detect pupil centers to within an error on the order of the iris radius, but few achieve the finer precision of an error within the pupil radius. In addition, because of their high complexity, conventional methods rarely guarantee real-time pupil center detection in a general-purpose computer environment. We therefore propose a more accurate pupil center detection method that improves the representation quality of the network responsible for pupil center detection. This is realized through representation learning based on mutual information. The latency of the entire system is also greatly reduced by using a non-local block and a self-attention block with large receptive fields, which enables real-time operation. The proposed system not only runs in real time at 52 FPS in a general-purpose computer environment but also achieves state-of-the-art accuracy, with fine-level index values of 96.71%, 99.84%, and 96.38% on the BioID, GI4E, and Talking Face Video datasets, respectively.
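The abstract separates errors within the iris radius from errors within the pupil radius but does not define the measure. The sketch below assumes the maximum normalized error commonly used on BioID-style benchmarks: the larger of the two per-eye Euclidean errors divided by the ground-truth inter-pupil distance, with a threshold of 0.05 usually read as roughly the pupil radius and 0.1 as roughly the iris radius. Equating the "fine level index" with accuracy at the 0.05 threshold is an assumption, and the function names `normalized_error` and `accuracy_at` are hypothetical, not taken from the paper.

```python
import math


def normalized_error(pred_left, pred_right, gt_left, gt_right):
    """Worst-eye Euclidean error divided by the ground-truth inter-pupil distance.

    Each argument is an (x, y) pixel coordinate. This follows the commonly used
    benchmark convention, which is an assumption about the paper's metric.
    """
    d_left = math.dist(pred_left, gt_left)
    d_right = math.dist(pred_right, gt_right)
    inter_pupil = math.dist(gt_left, gt_right)
    if inter_pupil == 0:
        raise ValueError("ground-truth pupil centers must be distinct")
    return max(d_left, d_right) / inter_pupil


def accuracy_at(errors, threshold):
    """Percentage of samples whose normalized error is at most `threshold`."""
    errors = list(errors)
    if not errors:
        raise ValueError("at least one error value is required")
    hits = sum(1 for e in errors if e <= threshold)
    return 100.0 * hits / len(errors)


if __name__ == "__main__":
    # Toy example: the left eye is off by 5 px, the right eye is exact,
    # and the ground-truth pupils are 60 px apart.
    e = normalized_error((103, 104), (160, 100), (100, 100), (160, 100))
    print(f"normalized error: {e:.4f}")

    # Illustrative error values only, not results from the paper.
    sample_errors = [0.02, 0.04, e, 0.20]
    print("accuracy at 0.05 (pupil-radius level):", accuracy_at(sample_errors, 0.05))
    print("accuracy at 0.10 (iris-radius level):", accuracy_at(sample_errors, 0.10))
```

Under this convention, a detector can score well at the 0.1 threshold while still missing the pupil itself, which is why the abstract reports the stricter level separately.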


Keywords: Remote eye tracking, Mobile applications



This work was supported by ‘The Cross-Ministry Giga KOREA Project’ grant funded by the Korea government (MSIT) [1711093798, Development of full-3D mobile display terminal and its contents] and by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [2020-0-01389, Artificial Intelligence Convergence Research Center (Inha University)].

Supplementary material

Supplementary material 1 (PDF, 504 KB)

Supplementary material 2 (MP4, 60264 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Electrical Engineering, Inha University, Incheon, Korea
