Leveraging Reinforcement Learning Techniques for Effective Policy Adoption and Validation

  • Conference paper
Computational Science and Its Applications – ICCSA 2019 (ICCSA 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11620)

Abstract

Rewards and punishments, in various forms, pervade a wide variety of decision-making scenarios. By observing the outcomes of a sufficient number of repeated trials, one gradually learns the value and usefulness of a particular policy or strategy. In a given environment, however, the outcomes of different trials are subject to chance influences and variation. Learning about the usefulness of a policy by systematically undertaking sequential trials incurs significant cost, so in most learning episodes one wishes to keep that cost within bounds by adopting stopping rules. In this paper, we examine the deployment of different stopping strategies in learning environments that range from highly stringent, for mission-critical operations, to highly tolerant, for non-mission-critical operations; emphasis is placed on the former, with particular application to aviation safety. In policy evaluation, two sequential phases of learning are identified, and we describe the outcome variations using a probabilistic model, obtaining closed-form expressions for the key measures of performance. Decision rules that map trial observations to policy choices are also formulated. In addition, simulation experiments are performed, which corroborate the theoretical results.
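As a rough illustration of the kind of stopping rule the abstract refers to, the sketch below simulates one possible stringent rule. It is not the model developed in the paper. It assumes that trials are independent Bernoulli outcomes with a fixed success probability `p`, that a policy is rejected at the first failure, and that it is accepted after `n` consecutive successes. The function names `run_episode`, `simulate`, and `closed_form` are hypothetical. Under these assumptions the acceptance probability is `p**n` and the expected number of trials is `(1 - p**n) / (1 - p)`, which the simulation can be checked against.

```python
import random


def run_episode(p, n, rng):
    """Run trials until the first failure or until n consecutive successes.

    Assumed rule (illustrative only): reject on the first failure,
    accept after n successes in a row.
    Returns (accepted, trials_used).
    """
    for t in range(1, n + 1):
        if rng.random() >= p:  # failure occurs with probability 1 - p
            return False, t
    return True, n


def simulate(p, n, episodes, seed=0):
    """Estimate the acceptance rate and mean trials per episode by simulation."""
    rng = random.Random(seed)
    accepted = 0
    total_trials = 0
    for _ in range(episodes):
        ok, used = run_episode(p, n, rng)
        accepted += ok
        total_trials += used
    return accepted / episodes, total_trials / episodes


def closed_form(p, n):
    """Closed-form acceptance probability and expected trials for the assumed rule."""
    accept_prob = p ** n
    # Expected trials is the sum of p**k for k = 0 .. n-1.
    expected_trials = (1 - p ** n) / (1 - p) if p < 1 else float(n)
    return accept_prob, expected_trials


if __name__ == "__main__":
    p, n = 0.9, 10
    print("simulated  :", simulate(p, n, episodes=20000))
    print("closed form:", closed_form(p, n))
```

Raising `n` makes the rule more stringent: acceptance becomes less likely for any `p < 1`, while the expected cost in trials per episode grows toward `1 / (1 - p)`.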

Author information

Corresponding author

Correspondence to Clement H. C. Leung.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Kuang, N.L., Leung, C.H.C. (2019). Leveraging Reinforcement Learning Techniques for Effective Policy Adoption and Validation. In: Misra, S., et al. Computational Science and Its Applications – ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science, vol 11620. Springer, Cham. https://doi.org/10.1007/978-3-030-24296-1_26

  • DOI: https://doi.org/10.1007/978-3-030-24296-1_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24295-4

  • Online ISBN: 978-3-030-24296-1

  • eBook Packages: Computer Science, Computer Science (R0)
