Multiple Model Q-Learning for Stochastic Asynchronous Rewards

Abstract

This paper investigates reinforcement learning problems in which the reinforcement signal is subject to a stochastic time delay that is unknown to the learning agent. Unlike previous work in the literature, the agent may receive individual reinforcements out of order, relaxing an important standing assumption. To study this setting, a stochastic time delay is introduced into a mobile robot line-following application. The main contribution of this work is a novel stochastic approximation algorithm, an extension of Q-learning, for the time-delayed reinforcement problem. The paper includes a proof of convergence, grid-world simulation results in MATLAB, line-following simulations in the Cyberbotics Webots mobile robot simulator, and experimental results in which an e-Puck mobile robot follows a real track despite large, stochastic time delays in its reinforcement signal.
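
For intuition, the sketch below is a minimal, hypothetical Python illustration of the delayed-reward setting described above; it is not the paper's multiple model algorithm. It assumes each reinforcement can be tagged with the transition (s, a, s') that produced it, so the standard Q-learning update can be applied whenever a reward finally arrives, even when rewards arrive out of order. The toy dynamics, reward, delay distribution, and all identifiers are illustrative assumptions, not details from the paper.

    import random
    from collections import defaultdict

    # Illustrative sketch only (not the paper's algorithm): tabular
    # Q-learning in which each reward is tagged with the transition that
    # generated it, so late, out-of-order rewards still update the
    # correct Q-table entry.
    ACTIONS = [0, 1, 2, 3]                  # placeholder action set
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # step size, discount, exploration

    Q = defaultdict(float)                  # Q[(state, action)] -> value estimate

    def choose_action(state):
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def apply_reward(tag, reward):
        # Standard Q-learning update, run whenever a tagged reward arrives.
        s, a, s_next = tag
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

    # Toy driver: each reward is queued with a random arrival time, so
    # reinforcements can reach the agent out of order.
    pending, state = [], 0
    for step in range(1000):
        a = choose_action(state)
        s_next = (state + a) % 10           # placeholder dynamics
        r = 1.0 if s_next == 9 else 0.0     # placeholder reward
        delay = random.randint(0, 5)        # stochastic delay, unknown to the agent
        pending.append((step + delay, (state, a, s_next), r))
        due = [p for p in pending if p[0] <= step]
        pending = [p for p in pending if p[0] > step]
        for _, tag, reward in due:
            apply_reward(tag, reward)
        state = s_next

Convergence arguments for updates of this form typically rest on the standard stochastic approximation conditions (step sizes that sum to infinity but are square-summable); the paper's proof presumably extends such an argument to the stochastic-delay setting.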

Author information

Corresponding author

Correspondence to Sidney N. Givigi.

About this article

Cite this article

Campbell, J.S., Givigi, S.N. & Schwartz, H.M. Multiple Model Q-Learning for Stochastic Asynchronous Rewards. J Intell Robot Syst 81, 407–422 (2016). https://doi.org/10.1007/s10846-015-0222-2
