Attention Estimation for Input Switch in Scalable Multi-display Environments

  • Xingyuan Bu
  • Mingtao Pei
  • Yunde Jia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9950)


Multi-Display Environments (MDEs) have become commonplace on office desks, where users spread tasks such as coding, searching, reading, and video communication across displays. In this paper, we present a method for automatically switching a single input device (mouse/keyboard, touch pad, joystick, etc.) between displays in scalable MDEs, based on user attention estimation. We set up an MDE on our office desk in which each display is equipped with a webcam that captures the user’s face video to detect whether the user is looking at that display. We use Convolutional Neural Networks (CNNs) to learn the attention model from face videos with varied poses, illuminations, and occlusions, achieving high attention-estimation performance. Qualitative and quantitative experiments demonstrate the effectiveness and potential of the proposed approach, and the results of a user study show that participants found the system enjoyable, useful, and friendly.
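The routing step of the pipeline described above (per-display webcam, CNN attention score per display, input routed to the attended display) can be sketched as a small selection rule. This is a minimal illustrative sketch, not the authors' implementation: the function name, the score threshold, and the hysteresis rule are assumptions introduced here for demonstration.

```python
# Illustrative sketch of attention-based input routing in an MDE.
# The threshold and hysteresis rule below are assumptions for
# demonstration; the paper's actual model and parameters may differ.

def select_active_display(scores, current, threshold=0.6):
    """Route input to the display with the highest attention score.

    scores:    dict mapping display id -> CNN attention score in [0, 1]
    current:   display currently receiving keyboard/mouse input
    threshold: minimum score required to switch away from `current`
               (simple hysteresis so a brief glance does not steal input)
    """
    best = max(scores, key=scores.get)
    if best != current and scores[best] >= threshold:
        return best
    return current

# A low-confidence glance at display "B" does not trigger a switch.
print(select_active_display({"A": 0.3, "B": 0.5}, current="A"))  # A
# A confident look at "B" routes the input there.
print(select_active_display({"A": 0.2, "B": 0.9}, current="A"))  # B
```

The hysteresis threshold is one plausible way to keep the switch stable when the user's gaze flits between displays; smoothing scores over several frames would serve the same purpose.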


Keywords: Multi-display environment · Attention estimation · Input switch · Convolutional neural network



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, People’s Republic of China
