Leveraging Reinforcement Learning Techniques for Effective Policy Adoption and Validation

Kuang, Nikki Lijing; Leung, Clement H. C.

doi:10.1007/978-3-030-24296-1_26

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11620)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Rewards and punishments in different forms are pervasive in a wide variety of decision-making scenarios. By observing the outcomes of a sufficient number of repeated trials, one gradually learns the value and usefulness of a particular policy or strategy. However, in a given environment, the outcomes of different trials are subject to chance influences and variation. Learning about the usefulness of a given policy involves significant costs in systematically undertaking the sequential trials; therefore, in most learning episodes, one would wish to keep the cost within bounds by adopting learning stopping rules. In this paper, we examine the deployment of different stopping strategies in given learning environments, which vary from highly stringent for mission-critical operations to highly tolerant for non-mission-critical operations; emphasis is placed on the former, with particular application to aviation safety. In policy evaluation, two sequential phases of learning are identified, and we describe the outcome variations using a probabilistic model, with closed-form expressions obtained for the key measures of performance. Decision rules that map the trial observations to policy choices are also formulated. In addition, simulation experiments are performed, which corroborate the theoretical results.
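
To make the idea of a learning stopping rule concrete, the following is a minimal sketch and not the model used in the paper. It assumes each trial is an independent Bernoulli outcome, and it uses a hypothetical rule: adopt the policy after a run of consecutive successes, reject it once a failure budget is exceeded, and stop undecided when the trial budget runs out. The function name `evaluate_policy` and all of its parameters are illustrative assumptions.

```python
import random


def evaluate_policy(success_prob, required_streak, max_failures, max_trials, rng):
    """Run sequential trials until a (hypothetical) stopping rule fires.

    Assumption: each trial succeeds independently with probability
    success_prob. Returns (decision, trials_used), where decision is
    "adopt", "reject", or "undecided".
    """
    streak = 0    # current run of consecutive successes
    failures = 0  # total failures observed so far
    for trial in range(1, max_trials + 1):
        if rng.random() < success_prob:
            streak += 1
            if streak >= required_streak:
                return "adopt", trial
        else:
            streak = 0
            failures += 1
            if failures > max_failures:
                return "reject", trial
    # Trial budget exhausted before either rule fired.
    return "undecided", max_trials


if __name__ == "__main__":
    rng = random.Random(2019)
    # Stringent setting: long success streak required, no failure tolerated.
    print(evaluate_policy(0.99, 50, 0, 1000, rng))
    # Tolerant setting: short streak required, several failures allowed.
    print(evaluate_policy(0.80, 5, 10, 1000, rng))
```

In this sketch, a stringent environment corresponds to a large `required_streak` with `max_failures` set to zero, while a tolerant one relaxes both; `max_trials` bounds the cost of learning.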

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
Nikki Lijing Kuang
School of Science and Engineering, Chinese University of Hong Kong, Shenzhen, China
Clement H. C. Leung

Corresponding author

Correspondence to Clement H. C. Leung.

Editor information

Editors and Affiliations

Covenant University, Ota, Nigeria
Sanjay Misra
University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Saint Petersburg State University, Saint Petersburg, Russia
Elena Stankova
Saint Petersburg State University, Saint Petersburg, Russia
Vladimir Korkhov
Polytechnic University of Bari, Bari, Italy
Carmelo Torre
University of Minho, Braga, Portugal
Ana Maria A.C. Rocha
Monash University, Clayton, VIC, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
Polytechnic University of Bari, Bari, Italy
Eufemia Tarantino

Cite this paper

Kuang, N.L., Leung, C.H.C. (2019). Leveraging Reinforcement Learning Techniques for Effective Policy Adoption and Validation. In: Misra, S., et al. (eds.) Computational Science and Its Applications – ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science, vol 11620. Springer, Cham. https://doi.org/10.1007/978-3-030-24296-1_26

DOI: https://doi.org/10.1007/978-3-030-24296-1_26
Published: 29 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24295-4
Online ISBN: 978-3-030-24296-1
eBook Packages: Computer Science, Computer Science (R0)
