Gaze and motion information fusion for human intention inference

  • Harish Chaandar Ravichandar
  • Avnish Kumar
  • Ashwin Dani
Regular Paper


An algorithm, named gaze-based multiple model intention estimator (G-MMIE), is presented for early prediction of the goal location (intention) of human reaching actions. The trajectories of the arm motion for reaching tasks are modeled by an autonomous dynamical system that contracts toward the goal location. A neural network (NN) is used to represent the dynamics of human arm reaching motion, and the NN parameters are learned under constraints derived from contraction analysis. These constraints ensure that the trajectories of the dynamical system converge to a single equilibrium point. In order to use the motion model learned from a few demonstrations in new scenarios with multiple candidate goal locations, an interacting multiple-model (IMM) framework is used. For a given reaching motion, multiple models are obtained by translating the equilibrium point of the contracting system to each of the known candidate locations, so that each model corresponds to the reaching motion ending at the respective candidate location. Further, since humans tend to look toward the location they are reaching for, prior probabilities of the goal locations are calculated from information about the human's gaze. The posterior probabilities of the models are calculated through model-matched filtering within the IMM framework, and the candidate location with the highest posterior probability is chosen as the estimate of the true goal location. Detailed quantitative evaluations of the G-MMIE algorithm on two different datasets involving 15 subjects, and comparisons with state-of-the-art intention inference algorithms, are presented.
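The fusion step described above can be sketched as a simple Bayesian update: a gaze-based prior over the candidate goal locations is combined with per-model filter likelihoods, and the candidate with the highest posterior is selected. The sketch below is illustrative only; the softmax-over-gaze-distance prior, the `beta` temperature, and the likelihood values are assumptions, not the paper's exact formulation.

```python
import numpy as np

def gaze_priors(gaze_point, candidates, beta=5.0):
    # Illustrative prior: candidates closer to the gaze point receive
    # higher probability via a softmax over negative distances.
    d = np.linalg.norm(candidates - gaze_point, axis=1)
    w = np.exp(-beta * d)
    return w / w.sum()

def posterior(priors, likelihoods):
    # Bayes' rule over the candidate goal models; in G-MMIE the
    # likelihoods would come from the model-matched IMM filters.
    p = priors * likelihoods
    return p / p.sum()

# Three hypothetical candidate goal locations in the plane.
candidates = np.array([[0.5, 0.0], [0.0, 0.5], [-0.5, 0.0]])
gaze = np.array([0.45, 0.05])        # gaze lands near candidate 0
lik = np.array([0.6, 0.3, 0.1])      # assumed per-model likelihoods

post = posterior(gaze_priors(gaze, candidates), lik)
goal_idx = int(np.argmax(post))      # estimated goal: index 0
```

The choice of a softmax prior is one common way to turn a continuous gaze estimate into a discrete distribution over goals; any prior that concentrates mass near the gazed-at location would serve the same role in the update.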


Human intention inference · Information fusion · Human-robot collaboration



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of Connecticut, Storrs, USA
