Learning Legible Motion from Human–Robot Interactions

Abstract

In collaborative tasks, displaying legible behavior enables other members of the team to anticipate intentions and thus to coordinate their actions accordingly. Behavior is therefore considered legible when an observer is able to quickly and correctly infer the intention of the agent generating it. In previous work, legible robot behavior has been generated with model-based methods that optimize task-specific models of legibility. In our work, we instead use model-free reinforcement learning with a generic, task-independent cost function. In experiments involving a joint task between thirty human subjects and a humanoid robot, we show that: (1) legible behavior arises when the efficiency of joint task completion is rewarded during human–robot interactions; (2) behavior that has been optimized for one subject is also more legible for other subjects; and (3) the universal legibility of behavior is influenced by the choice of the policy representation.



Notes

  1. The experiment in Sect. 3 was previously reported [21]. Those in Sects. 4 and 5 are novel.

References

  1. Alami R, Clodic A, Montreuil V, Sisbot EA (2006) Toward human-aware robot task planning. In: AAAI spring symposium: to boldly go where no human–robot team has gone before, pp 39–46

  2. Becchio C, Manera V, Sartori L, Cavallo A, Castiello U (2012) Grasping intentions: from thought experiments to empirical evidence. Front Hum Neurosci 6

  3. Cakmak M, Srinivasa SS, Lee MK, Kiesler S, Forlizzi J (2011) Using spatial and temporal contrast for fluent robot–human hand-overs. In: Proceedings of the 6th international conference on human–robot interaction. ACM, pp 489–496

  4. Craig JJ (2005) Introduction to robotics: mechanics and control, 3rd edn. Prentice Hall, New Jersey

  5. Dragan A, Holladay R, Srinivasa S (2014) An analysis of deceptive robot motion. In: Robotics: science and systems

  6. Dragan A, Srinivasa S (2013) Generating legible motion. In: Robotics: science and systems

  7. Dragan AD, Lee KCT, Srinivasa SS (2013) Legibility and predictability of robot motion. In: 8th ACM/IEEE international conference on human–robot interaction (HRI). IEEE, pp 301–308

  8. Flash T, Hogan N (1985) The coordination of arm movements: an experimentally confirmed mathematical model. J Neurosci 5(7):1688–1703

  9. Glasauer S, Huber M, Basili P, Knoll A, Brandt T (2010) Interacting in time and space: investigating human–human and human–robot joint action. In: IEEE international workshop on robot and human interactive communication

  10. Ijspeert A, Nakanishi J, Pastor P, Hoffmann H, Schaal S (2013) Dynamical movement primitives: learning attractor models for motor behaviors. Neural Comput 25(2):328–373

  11. Kober J, Peters J (2011) Policy search for motor primitives in robotics. Mach Learn 84(1):171–203

  12. Lee MK, Forlizzi J, Kiesler S, Cakmak M, Srinivasa S (2011) Predictability or adaptivity? Designing robot handoffs modeled from trained dogs and people. In: Proceedings of the 6th international conference on human–robot interaction. ACM, pp 179–180

  13. Lichtenthäler C, Lorenzy T, Kirsch A (2012) Influence of legibility on perceived safety in a virtual human–robot path crossing task. In: Proceedings of the IEEE international workshop on robot and human interactive communication, pp 676–681

  14. Mainprice J, Sisbot EA, Siméon T, Alami R (2010) Planning safe and legible hand-over motions for human–robot interaction. In: IARP workshop on technical

  15. Pagello E, D'Angelo A, Montesello F, Garelli F, Ferrari C (1999) Cooperative behaviors in multi-robot systems through implicit communication. Robot Auton Syst 29(1):65–77

  16. Sartori L, Becchio C, Castiello U (2011) Cues to intention: the role of movement information. Cognition 119(2):242–252

  17. Sebanz N, Bekkering H, Knoblich G (2006) Joint action: bodies and minds moving together. Trends Cogn Sci 10(2):70–76

  18. Strabala K, Lee MK, Dragan A, Forlizzi J, Srinivasa SS (2012) Learning the communication of intent prior to physical collaboration. In: RO-MAN 2012. IEEE, pp 968–973

  19. Strabala KW, Lee MK, Dragan AD, Forlizzi JL, Srinivasa S, Cakmak M, Micelli V (2013) Towards seamless human–robot handovers. J Hum Robot Interact 2(1):112–132

  20. Stulp F, Isik M, Beetz M (2006) Implicit coordination in robotic teams using learned prediction models. In: Proceedings of the 2006 IEEE international conference on robotics and automation (ICRA). IEEE, pp 1330–1335

  21. Stulp F, Grizou J, Busch B, Lopes M (2015) Facilitating intention prediction for humans by optimizing robot motions. In: International conference on intelligent robots and systems (IROS)

  22. Stulp F, Herlant L, Hoarau A, Raiola G (2014) Simultaneous on-line discovery and improvement of robotic skill options. In: International conference on intelligent robots and systems (IROS)

  23. Stulp F, Sigaud O (2012) Policy improvement methods: between black-box optimization and episodic reinforcement learning. hal-00738463

  24. Takayama L, Dooley D, Ju W (2011) Expressing thought: improving robot readability with animation principles. In: Proceedings of the 6th ACM/IEEE international conference on human–robot interaction (HRI), pp 69–76

  25. Vesper C, van der Wel RPRD, Knoblich G, Sebanz N (2011) Making oneself predictable: reduced temporal variability facilitates joint action coordination. Exp Brain Res 211(3–4):517–530

  26. Zhao M, Shome R, Yochelson I, Bekris K, Kowler E (2014) An experimental study for identifying features of legible manipulator paths. In: International symposium on experimental robotics (ISER)


Acknowledgements

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and by the EU FP7-ICT project 3rdHand under grant agreement no 610878.

Author information

Corresponding author

Correspondence to Baptiste Busch.

Additional information

This paper is an extended version of previous work by Stulp et al. [21] and contains additional experimental results and more detailed discussions.

Appendices

Appendix 1: Policy Improvement through Black-Box Optimization

Policy improvement is a form of model-free reinforcement learning, where the parameters \(\mathbf {\theta }\) of a parameterized policy \(\pi _{\mathbf {\theta }}\) are optimized through trial-and-error interaction with the environment. The optimization algorithm we use is \(\hbox {PI}^{\textsf {BB}}\), short for “Policy Improvement through Black-Box optimization” [23]. It optimizes the parameters \(\mathbf {\theta }\) with a two-step iterative procedure. The first step is to locally explore the policy parameter space by sampling K parameter vectors \(\mathbf {\theta } _k\) from the Gaussian distribution \(\mathcal {N}(\mathbf {\theta },\mathbf {\Sigma })\), to execute the policy with each \(\mathbf {\theta } _k\), and to determine the cost \(J_k\) of each execution. This exploration step is visualized in Fig. 12, where \(\mathcal {N}(\mathbf {\theta },\mathbf {\Sigma })\) is represented as the large (blue) circle, and the samples \(J_{k=1\dots 10}\) are small (blue) dots.
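As an illustration, this exploration step could be sketched as follows (a minimal Python/NumPy sketch, not the authors' implementation; execute_and_evaluate is a hypothetical placeholder for executing the policy with a given parameter vector on the robot and measuring its cost):

    import numpy as np

    def explore(theta, Sigma, execute_and_evaluate, K=8, rng=None):
        """Exploration step of PI^BB: sample K parameter vectors around theta
        and determine the cost of executing the policy with each of them."""
        if rng is None:
            rng = np.random.default_rng()
        # Sample theta_k ~ N(theta, Sigma), k = 1..K
        thetas = rng.multivariate_normal(mean=theta, cov=Sigma, size=K)
        # Execute each sampled policy and record its cost J_k
        costs = np.array([execute_and_evaluate(theta_k) for theta_k in thetas])
        return thetas, costs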

The second step is to update the policy parameters \(\mathbf {\theta }\). Here, the costs \(J_k\) are converted into weights \(P_k\) with

$$\begin{aligned} P_k&= e^{\left( \frac{-h(J_k-\min (\mathbf {J}))}{\max (\mathbf {J}) -\min (\mathbf {J})}\right) } \end{aligned}$$
(6)

where low-cost samples thus have higher weights. For the samples in Fig. 12, this mapping is visualized to the right. The weights are also represented in the left figure as filled (green) circles, where a larger circle implies a higher weight. The parameters \(\mathbf {\theta }\) are then updated with reward-weighted averaging

$$\begin{aligned} \mathbf {\theta }&\leftarrow \sum _{k=1}^{K} P_k \mathbf {\theta } _k \end{aligned}$$
(7)

Furthermore, exploration is decreased after each iteration, \(\mathbf {\Sigma } \leftarrow \lambda \mathbf {\Sigma }\), with a decay factor \(0<\lambda \le 1\). The updated policy and exploration parameters (red circle in Fig. 12) are then used for the next exploration/update step of the iteration.

In the optimization experiments described in this article, the parameters of \(\hbox {PI}^{\textsf {BB}}\) are \(K=8\) (trials per update), \(\mathbf {\Sigma } =5\mathbf {I}\) (initial exploration magnitude) and \(\lambda =0.9\) (exploration decay).
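Continuing the sketch above, one update step and the overall optimization loop might look as follows (again a minimal, hedged sketch: the eccentricity \(h\) of Eq. 6 and the explicit normalization of the weights, so that Eq. 7 is a proper weighted average, are not specified in the text and are assumptions here; theta_init and n_updates are hypothetical placeholders):

    import numpy as np

    def pibb_update(theta, Sigma, thetas, costs, h=10.0, decay=0.9):
        """One PI^BB parameter update: cost-to-weight mapping (Eq. 6),
        reward-weighted averaging (Eq. 7), and exploration decay."""
        J_min, J_max = costs.min(), costs.max()
        # Eq. 6: low-cost samples get exponentially higher weights
        # (h is an assumed eccentricity; the small constant avoids 0/0
        # when all costs in a batch happen to be identical)
        P = np.exp(-h * (costs - J_min) / (J_max - J_min + 1e-10))
        P = P / P.sum()  # assumed normalization, so Eq. 7 averages properly
        # Eq. 7: reward-weighted averaging of the sampled parameter vectors
        theta_new = P @ thetas
        # Decay the exploration covariance: Sigma <- lambda * Sigma
        Sigma_new = decay * Sigma
        return theta_new, Sigma_new

    # Optimization loop with the values used in this article:
    # K = 8 trials per update, Sigma = 5*I, lambda = 0.9
    theta = theta_init
    Sigma = 5.0 * np.eye(len(theta_init))
    for update in range(n_updates):
        thetas, costs = explore(theta, Sigma, execute_and_evaluate, K=8)
        theta, Sigma = pibb_update(theta, Sigma, thetas, costs, decay=0.9)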

Despite its simplicity, \(\hbox {PI}^{\textsf {BB}}\) is able to learn robot skills efficiently and robustly [22]. Alternatively, algorithms such as \(\hbox {PI}^{2}\), PoWER, NES, PGPE, or CMA-ES could be used; see [11, 23] for an overview and comparisons.

Fig. 13

Results for Experiment A (left column) and Experiment B (right column). The start of the optimization phase is indicated by the vertical dashed line. (Top row) Average \((\mu \pm \sigma )\) of the robot button pushing time (\(T_\mathrm{robot}\)). It varies little for the DMP policy (left) and even less for the viapoint policy (right). For the latter this is to be expected, as the duration of pressing the button does not depend on the parameters of the policy in which exploration and optimization take place. (Second row) Average \((\mu \pm \sigma )\) of the subject button pushing time \(T_\mathrm{subject}\), over all nine subjects. Variance is quite high because some subjects push quickly overall, whereas others are more careful. (Third row) Again the average subject button pushing time, but this time normalized with respect to the average value of \(T_\mathrm{subject}\) during the last eight trials of the habituation phase for each subject. This reduces the variance caused by the overall differences between subjects. For this graph, the results of the experiment in the opposite column have been added as a dashed line to facilitate comparison between Experiments A and B. (Fourth row) Average \((\mu \pm \sigma )\) of the trajectory completion at prediction time, i.e., the relative amount of the trajectory (timewise) observed by the subject when they press the button. This value is calculated as \(100\,(1 - \frac{T_\mathrm{robot} - T_\mathrm{subject}}{T_\mathrm{robot}})\). (Bottom row) Number of times the incorrect button was pushed, averaged over blocks of eight trials and all nine subjects

Fig. 14

Results for Experiment C with pre-optimized DMP (left column) and viapoint (right column) policies. The start of the optimization phase is indicated by the vertical dashed line. (Top row) Average \((\mu \pm \sigma )\) of the subject button pushing time \(T_\mathrm{subject}\), over all nine subjects. (Second row) Again the average subject button pushing time, but this time normalized with respect to the average value of \(T_\mathrm{subject}\) during the last eight trials of the habituation phase for each subject. For this graph, the results of the experiment in the opposite column have been added as a dashed line to facilitate comparison between the DMP and viapoint policies. (Third row) Average \((\mu \pm \sigma )\) of the trajectory completion at prediction time, i.e., the relative amount of the trajectory (timewise) observed by the subject when they press the button. This value is calculated as \(100\,(1 - \frac{T_\mathrm{robot} - T_\mathrm{subject}}{T_\mathrm{robot}})\). (Bottom row) Number of times the incorrect button was pushed, averaged over blocks of eight trials and all nine subjects

Appendix 2: Complete results for Experiments A and B

See Fig. 13.

Appendix 3: Complete results for Experiment C

See Fig. 14.



Cite this article

Busch, B., Grizou, J., Lopes, M. et al. Learning Legible Motion from Human–Robot Interactions. Int J of Soc Robotics 9, 765–779 (2017). https://doi.org/10.1007/s12369-017-0400-4


Keywords

  • Human–robot interaction (HRI)
  • Legible motion
  • Implicit coordination
  • Reinforcement learning (RL)