Dynamic Reward Shaping: Training a Robot by Voice

Conference paper
Advances in Artificial Intelligence – IBERAMIA 2010 (IBERAMIA 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6433)

Abstract

Reinforcement learning is commonly used for learning tasks in robotics; however, traditional algorithms can require very long training times. Reward shaping has recently been used to incorporate domain knowledge as extra rewards that speed up convergence. These shaping functions are normally defined in advance by the user and remain static. This paper introduces a dynamic reward shaping approach in which the extra rewards are not consistently given, can vary over time, and may occasionally run contrary to what is needed to achieve the goal. In the experiments, a user provides verbal feedback while a robot performs a task, and this feedback is translated into additional rewards. It is shown that convergence can still be guaranteed as long as most of the shaping rewards given per state are consistent with the goals, and that even with fairly noisy interaction the system converges faster than traditional reinforcement learning techniques.
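
The idea sketched in the abstract can be illustrated with a minimal example (not the authors' actual system, which uses a real robot and speech recognition): tabular Q-learning on a toy 1-D corridor, where a simulated teacher intermittently adds a shaping reward for moving toward the goal, and that feedback is occasionally wrong. The parameters `feedback_prob` and `noise_prob` are illustrative assumptions standing in for how often the user speaks and how often the feedback is inconsistent.

```python
import random

def q_learning(n_states=10, episodes=300, alpha=0.5, gamma=0.9,
               epsilon=0.1, feedback_prob=0.7, noise_prob=0.1, seed=0):
    """Tabular Q-learning on a 1-D corridor with the goal at the right end.

    On each step, with probability `feedback_prob` a simulated teacher adds
    a shaping reward of +1 for moving toward the goal and -1 otherwise;
    with probability `noise_prob` that feedback is flipped, mimicking
    inconsistent or mistaken verbal feedback.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 1 if q[s][1] > q[s][0] else 0
            s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0  # sparse environment reward
            if rng.random() < feedback_prob:        # the teacher speaks up
                shaped = 1.0 if s2 > s else -1.0    # intended feedback
                if rng.random() < noise_prob:       # ...but is sometimes wrong
                    shaped = -shaped
                r += shaped
            # standard Q-learning update on the combined reward
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

Because the shaping reward is correct most of the time, its expected value still points toward the goal, so the greedy policy learned from the combined reward reaches the goal far sooner than it would from the sparse environment reward alone; this mirrors the paper's claim that convergence survives fairly noisy interaction as long as most shaping rewards per state are consistent with the goal.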




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tenorio-Gonzalez, A.C., Morales, E.F., Villaseñor-Pineda, L. (2010). Dynamic Reward Shaping: Training a Robot by Voice. In: Kuri-Morales, A., Simari, G.R. (eds) Advances in Artificial Intelligence – IBERAMIA 2010. IBERAMIA 2010. Lecture Notes in Computer Science, vol. 6433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16952-6_49

  • DOI: https://doi.org/10.1007/978-3-642-16952-6_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16951-9

  • Online ISBN: 978-3-642-16952-6
