Gaze and motion information fusion for human intention inference


An algorithm, named gaze-based multiple model intention estimator (G-MMIE), is presented for early prediction of the goal location (intention) of human reaching actions. The trajectories of the arm motion for reaching tasks are modeled by an autonomous dynamical system that contracts toward the goal location. The dynamics of human arm reaching motion are represented by a neural network (NN) whose parameters are learned under constraints derived from contraction analysis; these constraints ensure that the trajectories of the dynamical system converge to a single equilibrium point. To use the motion model learned from a few demonstrations in new scenarios with multiple candidate goal locations, an interacting multiple-model (IMM) framework is used. For a given reaching motion, multiple models are obtained by translating the equilibrium point of the contracting system to each known candidate location, so that each model corresponds to a reaching motion ending at the respective candidate. Further, since humans tend to look toward the location they are reaching for, prior probabilities of the goal locations are computed from information about the human's gaze. The posterior probabilities of the models are then computed through interacting model-matched filtering, and the candidate location with the highest posterior probability is chosen as the estimate of the true goal location. Detailed quantitative evaluations of the G-MMIE algorithm on two different datasets involving 15 subjects, and comparisons with state-of-the-art intention inference algorithms, are presented.
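The estimation loop described above can be sketched in simplified form. The snippet below is an illustration, not the authors' implementation: it substitutes a simple linear contracting map for the learned NN dynamics, uses a hypothetical softmax form for the gaze-based prior, and omits the IMM mixing step, keeping only the model-matched likelihood update in which each model's equilibrium is translated to a candidate goal.

```python
import numpy as np

def gaze_prior(gaze_dir, head_pos, goals, kappa=5.0):
    """Softmax prior over candidate goals from gaze-goal alignment (hypothetical form)."""
    dirs = goals - head_pos                      # vectors from head to each goal
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    scores = kappa * dirs @ gaze_dir             # scaled cosine alignment with gaze ray
    w = np.exp(scores - scores.max())
    return w / w.sum()

def posterior_step(x_prev, x_curr, goals, probs, dynamics, sigma=0.05):
    """One model-matched filtering update (IMM mixing step omitted)."""
    like = np.empty(len(goals))
    for j, g in enumerate(goals):
        pred = g + dynamics(x_prev - g)          # model j: equilibrium shifted to goal j
        r = x_curr - pred                        # prediction residual
        like[j] = np.exp(-0.5 * (r @ r) / sigma**2)
    post = probs * like
    return post / post.sum()

# Demo: two candidate goals; gaze and motion both indicate goal 0.
goals = np.array([[1.0, 0.0], [0.0, 1.0]])
dynamics = lambda z: 0.8 * z                     # stand-in contracting map (|0.8| < 1)
probs = gaze_prior(np.array([0.447, 0.894]), np.array([0.0, -2.0]), goals)
x = np.array([-1.0, -1.0])
for _ in range(3):                               # simulate reaching toward goal 0
    x_next = goals[0] + dynamics(x - goals[0])
    probs = posterior_step(x, x_next, goals, probs, dynamics)
    x = x_next
print(int(np.argmax(probs)))                     # → 0
```

Because the true motion matches model 0 exactly, its residual is zero and the posterior concentrates on goal 0 within a single update; with noisy observations the concentration is gradual, which is where early prediction comes from.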





Author information



Corresponding author

Correspondence to Ashwin Dani.


About this article


Cite this article

Ravichandar, H.C., Kumar, A. & Dani, A. Gaze and motion information fusion for human intention inference. Int J Intell Robot Appl 2, 136–148 (2018).



Keywords

  • Human intention inference
  • Information fusion
  • Human-robot collaboration