Learning proactive behavior for interactive social robots
Learning human–robot interaction logic from example interaction data has the potential to leverage “big data” to reduce the effort and time spent on designing interaction logic or crafting interaction content. Previous work has demonstrated techniques by which a robot can learn motion and speech behaviors from non-annotated human–human interaction data, but these techniques only enable a robot to respond to human-initiated inputs and do not enable the robot to proactively initiate interaction. In this work, we propose a method for learning both human-initiated and robot-initiated behavior for a social robot from human–human example interactions, which we demonstrate for a shopkeeper interacting with a customer in a camera shop scenario. We achieve this by extending an existing technique in three ways: (1) introducing the concept of a customer yield action, (2) incorporating interaction history, represented as sequences of discretized actions, as input for training and generating robot behavior, and (3) using an “attention mechanism” in our learning system that learns which parts of the interaction history are more important for generating robot behaviors. The proposed method trains a robot to generate multimodal actions, consisting of speech and locomotion behaviors. We compared the proposed technique with the previous technique in two ways. Cross-validation on the training data showed higher social appropriateness of predicted behaviors using the proposed technique, and a user study of live interaction with a robot showed that participants perceived the proposed technique to produce behaviors that were more proactive, more socially appropriate, and better in overall quality.
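To make the role of attention over an interaction history concrete, the following is a minimal sketch and not the system described in this work. It assumes each discretized action is encoded as a fixed-length feature vector and that a scoring vector `w` has already been learned; the names `attend`, `history`, and `w`, and the one-hot toy encoding, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(history, w):
    """Weight each step of an interaction history and pool it into one vector.

    history: list of equal-length feature vectors, one per discretized
             action, oldest first (hypothetical encoding).
    w:       scoring weights, assumed to have been learned elsewhere.
    Returns (attention weights, weighted-sum context vector).
    """
    # Score each history step with a dot product against w.
    scores = [sum(wi * xi for wi, xi in zip(w, step)) for step in history]
    # Normalize the scores into weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the history steps.
    dim = len(history[0])
    context = [
        sum(a * step[d] for a, step in zip(weights, history))
        for d in range(dim)
    ]
    return weights, context


# Toy history of three actions, one-hot encoded over three hypothetical action IDs.
history = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = [0.0, 2.0, 0.0]  # assumed scoring vector that favors the second action type
weights, context = attend(history, w)
```

In this toy case the second history step receives the largest weight, so it dominates the pooled context vector that a downstream behavior predictor would consume. A trained system would learn the scoring parameters jointly with the predictor rather than fixing them by hand.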
Keywords: Human–robot interaction, Data-driven learning, Learning by imitation, Social robotics, Service robots, Proactive behaviors
This work was supported in part by the JST ERATO Ishiguro Symbiotic Human-Robot Interaction Project, Grant Number JPMJER1401, and in part by JSPS KAKENHI Grant Number 25240042.
Compliance with ethical standards
This research was conducted in compliance with the standards and regulations of our company’s ethical review board, which requires each experiment to be subject to a review and approval procedure according to strict ethical guidelines.