Abstract
This paper investigates reinforcement learning problems in which the reinforcement signal is subject to a stochastic time delay that is unknown to the learning agent. Unlike previous work in the literature, which assumes reinforcements arrive in order, this work allows the agent to receive individual reinforcements out of order. To study this setting, a stochastic time delay is introduced into a mobile robot line-following application. The main contribution is a novel stochastic approximation algorithm, an extension of Q-learning, for the time-delayed reinforcement problem. The paper includes a proof of convergence, grid-world simulation results from MATLAB, line-following simulations in the Cyberbotics Webots mobile robot simulator, and experimental results in which an e-Puck mobile robot follows a real track despite large, stochastic time delays in its reinforcement signal.
Cite this article
Campbell, J.S., Givigi, S.N. & Schwartz, H.M. Multiple Model Q-Learning for Stochastic Asynchronous Rewards. J Intell Robot Syst 81, 407–422 (2016). https://doi.org/10.1007/s10846-015-0222-2