
A Bayesian reinforcement learning approach in markov games for computing near-optimal policies

  • Regular Submission
  • Published in: Annals of Mathematics and Artificial Intelligence (2023)

Abstract

Bayesian learning is an inference method designed to tackle the exploration-exploitation trade-off as a function of the uncertainty of a given probability model estimated from observations within the Reinforcement Learning (RL) paradigm. It allows prior knowledge to be incorporated into the algorithms in the form of probability distributions. Finding the resulting Bayes-optimal policies is a notoriously hard problem. We focus our attention on RL for a special class of ergodic and controllable Markov games. We propose a new framework for computing the near-optimal policies of each agent, under the assumptions that the Markov chains are regular and that the inverse of the behavior strategy is well defined. A fundamental result of this paper is a theoretical method that, based on the formulation of a non-linear problem, computes the near-optimal adaptive behavior strategies and policies of the game under restrictions that maximize the expected reward. We prove that these behavior strategies and policies satisfy a Bayesian-Nash equilibrium. Another important result is that the RL process learns a model through the interaction of the agents with the environment; we show how the proposed method finitely approximates and estimates the elements of the transition matrices and utilities while maintaining an efficient long-term learning performance measure. We develop an algorithm that implements this model. A numerical example shows how the estimation process evolves as a function of the agents' experiences.
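
To make the model-estimation step concrete, the following is a minimal sketch, not the paper's actual algorithm, of a Bayesian model-based estimator for a single agent in a finite Markov setting: Dirichlet pseudo-counts approximate the transition matrices from observed transitions, running averages estimate the utilities, and a certainty-equivalent policy is computed from the posterior means. All class, function, and variable names below are hypothetical illustrations.

    # Minimal sketch (not the paper's method): Bayesian estimation of a finite
    # Markov model from agent-environment interaction. Dirichlet pseudo-counts
    # track transitions; utilities are estimated by running averages.
    import numpy as np

    class BayesianModelEstimator:
        def __init__(self, n_states, n_actions, prior=1.0):
            # Dirichlet pseudo-counts for each (state, action) transition row.
            self.counts = np.full((n_states, n_actions, n_states), prior)
            # Running-average utility estimates and visit counters.
            self.utility = np.zeros((n_states, n_actions))
            self.visits = np.zeros((n_states, n_actions))

        def observe(self, s, a, r, s_next):
            # Posterior update: increment the count of the observed transition
            # and update the sample mean of the reward for (s, a).
            self.counts[s, a, s_next] += 1.0
            self.visits[s, a] += 1.0
            self.utility[s, a] += (r - self.utility[s, a]) / self.visits[s, a]

        def transition_posterior_mean(self):
            # Posterior-mean transition probabilities, one row per (state, action).
            return self.counts / self.counts.sum(axis=2, keepdims=True)

        def greedy_policy(self, gamma=0.95, iters=200):
            # Certainty-equivalent policy: value iteration on the posterior-mean model.
            P, R = self.transition_posterior_mean(), self.utility
            V = np.zeros(R.shape[0])
            Q = R.copy()
            for _ in range(iters):
                Q = R + gamma * P @ V
                V = Q.max(axis=1)
            return Q.argmax(axis=1)

    # Usage: interact with a simulated environment and refine the estimates.
    rng = np.random.default_rng(0)
    est = BayesianModelEstimator(n_states=3, n_actions=2)
    true_P = rng.dirichlet(np.ones(3), size=(3, 2))  # hidden ground-truth model
    s = 0
    for _ in range(1000):
        a = int(rng.integers(2))
        s_next = int(rng.choice(3, p=true_P[s, a]))
        est.observe(s, a, float(s_next == 2), s_next)  # toy reward: reaching state 2
        s = s_next
    print(est.greedy_policy())

A multi-agent version would maintain one such estimator per agent and couple the policy-computation step through the equilibrium conditions described in the paper; the sketch above only illustrates the estimation of transition matrices and utilities from experience.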


Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.


Author information

Corresponding author

Correspondence to Julio B. Clempner.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Clempner, J.B. A Bayesian reinforcement learning approach in markov games for computing near-optimal policies. Ann Math Artif Intell 91, 675–690 (2023). https://doi.org/10.1007/s10472-023-09860-3

