
Approachability in Stackelberg Stochastic Games with Vector Costs


Abstract

The notion of approachability was introduced by Blackwell (Pac J Math 6(1):1–8, 1956) in the context of vector-valued repeated games. The famous 'Blackwell's approachability theorem' prescribes a strategy for approachability, i.e., for 'steering' the average vector cost of a given agent toward a given target set, irrespective of the strategies of the other agents. In this paper, motivated by multi-objective optimization and decision-making problems in dynamically changing environments, we address the approachability problem in Stackelberg stochastic games with vector-valued cost functions. We make two main contributions. First, we give a simple and computationally tractable strategy for approachability in Stackelberg stochastic games, along the lines of Blackwell's. Second, we give a reinforcement learning algorithm for learning the approachable strategy when the transition kernel is unknown. As a by-product, we also recover Blackwell's necessary and sufficient conditions for approachability for convex sets in this setup, and thus a complete characterization. We give sufficient conditions for non-convex sets.
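For intuition, the following is a minimal sketch of Blackwell's classical approachability strategy in the basic repeated-game setting of [8], not the Stackelberg stochastic-game strategy developed in this paper. Everything concrete in it is a hypothetical choice for illustration: a randomly generated vector-payoff tensor M (adjusted so that Blackwell's approachability condition holds), two players with three actions each, and the nonpositive orthant as the convex target set C. At each stage, the strategy projects the running average payoff onto C and plays a minimax mixed action for the scalar game obtained along the projection direction.

```python
# A minimal sketch of Blackwell's classical approachability strategy in the
# basic repeated-game setting (illustration only; the paper's strategy for
# Stackelberg stochastic games is more involved).  All specifics below --
# the payoff tensor M, the action-set sizes, and the target set C (the
# nonpositive orthant) -- are hypothetical choices for this example.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
M = rng.uniform(-1, 1, size=(3, 3, 2))   # vector payoffs M[i, j] in R^2
M[0] = -np.abs(M[0])                     # row 0 stays in C, so Blackwell's
                                         # condition holds and C is approachable

def project_onto_C(x):
    # Euclidean projection onto the nonpositive orthant C = {x : x <= 0}.
    return np.minimum(x, 0.0)

def minimax_mixed_action(A):
    # Solve the scalar zero-sum game min_p max_j (p^T A)_j by an LP.
    m, n = A.shape
    c = np.r_[np.zeros(m), 1.0]                   # minimize the game value v
    A_ub = np.c_[A.T, -np.ones(n)]                # (p^T A)_j - v <= 0 for all j
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)  # p is a probability vector
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    p = np.clip(res.x[:m], 0.0, None)
    return p / p.sum()

avg = np.zeros(2)                                 # running average payoff
for t in range(1, 5001):
    lam = avg - project_onto_C(avg)               # steering direction
    if np.linalg.norm(lam) > 1e-12:
        p = minimax_mixed_action(M @ lam)         # minimax in the <lam, .> game
    else:
        p = np.ones(3) / 3                        # already in C: anything works
    i = rng.choice(3, p=p)
    j = rng.integers(3)                           # arbitrary adversary play
    avg += (M[i, j] - avg) / t                    # update the running average

print("distance of average payoff to C:",
      np.linalg.norm(avg - project_onto_C(avg)))
```

Note that the strategy uses only the projection direction, not the full geometry of C; this is the feature that makes Blackwell-style constructions computationally tractable.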


Notes

  1. The estimates of ibid. are for an o.d.e. limit, but a similar argument works for differential inclusion limits.

  2. This x-dependence is the only additional feature (or point of difference) here as compared to [1].

References

  1. Abounadi J, Bertsekas D, Borkar VS (2001) Learning algorithms for Markov decision processes with average cost. SIAM J Control Optim 40(3):681–698

  2. Altman E (1999) Constrained Markov decision processes, vol 7. CRC Press, Boca Raton

  3. Aubin JP, Cellina A (1984) Differential inclusions: set-valued maps and viability theory. Springer, New York

  4. Bardi M, Capuzzo-Dolcetta I (2008) Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Springer, Berlin

  5. Barwell AD (2011) Omega-limit sets of discrete dynamical systems. Ph.D. dissertation, University of Birmingham

  6. Benaim M, Hofbauer J, Sorin S (2005) Stochastic approximations and differential inclusions. SIAM J Control Optim 44(1):328–348

  7. Benaim M, Hofbauer J, Sorin S (2006) Stochastic approximations and differential inclusions, part II: applications. Math Oper Res 31(4):673–695

  8. Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pac J Math 6(1):1–8

  9. Borkar VS (1991) Topics in controlled Markov chains. Longman Scientific & Technical, Harlow

  10. Borkar VS (1998) Asynchronous stochastic approximation. SIAM J Control Optim 36(3):840–851

  11. Borkar VS (2005) An actor-critic algorithm for constrained Markov decision processes. Syst Control Lett 54(3):207–213

  12. Borkar VS (2008) Stochastic approximation: a dynamical systems viewpoint. Hindustan Book Agency, New Delhi, and Cambridge University Press, Cambridge

  13. Borkar VS, Meyn SP (2000) The o.d.e. method for convergence of stochastic approximation and reinforcement learning. SIAM J Control Optim 38(2):447–469

  14. Even-Dar E, Kakade S, Mansour Y (2009) Online Markov decision processes. Math Oper Res 34(3):726–736

  15. Filar J, Vrieze K (1996) Competitive Markov decision processes. Springer, New York

  16. Kamal S (2010) A vector minmax problem for controlled Markov chains. arXiv preprint arXiv:1011.0675v1

  17. Mannor S, Shimkin N (2003) The empirical Bayes envelope and regret minimization in competitive Markov decision processes. Math Oper Res 28(2):327–345

  18. Mannor S, Shimkin N (2004) A geometric approach to multi-criterion reinforcement learning. J Mach Learn Res 5:325–360

  19. Milman E (2006) Approachable sets of vector payoffs in stochastic games. Games Econom Behav 56(1):135–147

  20. Patek SD (1997) Stochastic shortest path games: theory and algorithms. Ph.D. dissertation, Laboratory for Information and Decision Systems, MIT

  21. Perchet V (2014) Approachability, regret and calibration: implications and equivalences. J Dyn Games 1(2):181–254

  22. Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York

  23. Shimkin N, Shwartz A (1993) Guaranteed performance regions in Markovian systems with competing decision makers. IEEE Trans Autom Control 38(1):84–95

  24. Steuer RE (1989) Multiple criteria optimization: theory, computation, and application. Wiley, New York

  25. Tucker W (1999) The Lorenz attractor exists. Comptes Rendus Acad Sci Ser I Math 328(12):1197–1202

  26. Wagner DH (1977) Survey of measurable selection theorems. SIAM J Control Optim 15(5):859–903

  27. Yu JY, Mannor S, Shimkin N (2009) Markov decision processes with arbitrary reward processes. Math Oper Res 34(3):737–757

  28. Yu H, Bertsekas DP (2013) On boundedness of Q-learning iterates for stochastic shortest path problems. Math Oper Res 38(2):209–227


Acknowledgments

The authors are grateful to the anonymous reviewers for an outstanding job of refereeing, which has greatly improved the quality and readability of our paper.

Author information

Correspondence to Dileep Kalathil.

Additional information

The work of VSB was supported in part by a J. C. Bose Fellowship and a grant for Distributed Computation for Optimization over Large Networks and High Dimensional Data Analysis from the Department of Science and Technology, Government of India. The research of RJ and DK is supported by the Office of Naval Research (ONR) Young Investigator Award N000141210766 and the National Science Foundation (NSF) CAREER Award 0954116.

Appendix

Consider a controlled Markov chain \(\{s_n\}\) with a finite state space S, a compact metric action space U with metric d, running cost k(i, u), and transition probabilities p(j | i, u). Assume k and p to be continuous in u, and that Assumption 1 holds. The dynamic programming equation then is

$$\begin{aligned} V(i) = \min _u\left( k(i,u) - \kappa + \sum _jp(j | i,u)V(j)\right) , \quad \ i \in S. \end{aligned}$$

Then, \(\kappa \) is the optimal average cost. Let \(U^*(i)\) denote the set of minimizers on the right-hand side, which will perforce be compact and non-empty by standard arguments. Suppose

$$\begin{aligned} d(a_n, U^*(s_n)) \rightarrow 0. \end{aligned}$$

Then, we have

$$\begin{aligned} V(s_n) - \big(k(s_n, a_n) - \kappa + E[V(s_{n+1}) \mid s_n, a_n]\big) &\rightarrow 0 \\ \Longrightarrow \quad \lim _{n\uparrow \infty }\frac{1}{n}\sum _{m = 0}^{n-1}E[k(s_m, a_m)] - \kappa &= \lim _{n\uparrow \infty }\frac{1}{n}\big(E[V(s_{n})] - E[V(s_0)]\big) = 0. \end{aligned}$$

(The second line follows by summing the first relation over \(m = 0, \ldots, n-1\), taking expectations and dividing by n, so that the V-terms telescope.) Hence, \(\{a_n\}\) is average-cost optimal.
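To see the argument at work, here is a numerical sketch under wholly hypothetical data: a randomly generated cost matrix and transition kernel on a small state space. It solves the dynamic programming equation above by relative value iteration (a standard average-cost solver, not the paper's learning algorithm), extracts a selection from \(U^*(\cdot)\), and checks that the resulting empirical average cost matches \(\kappa\).

```python
# A numerical sketch of the appendix argument.  The cost matrix k and the
# transition kernel p are hypothetical (randomly generated); relative value
# iteration is a standard solver for the average-cost DP equation above.
import numpy as np

rng = np.random.default_rng(1)
S, U = 4, 3                                  # small finite state/action sets
k = rng.uniform(0.0, 1.0, size=(S, U))       # running cost k(i, u)
p = rng.dirichlet(np.ones(S), size=(S, U))   # kernel p(. | i, u), shape (S, U, S)

# Relative value iteration: iterate V <- min_u [k + pV] - (value at state 0),
# so that V(0) = 0 and the subtracted offset converges to kappa.
V = np.zeros(S)
for _ in range(100_000):
    Q = k + p @ V                            # Q[i, u] = k(i,u) + sum_j p(j|i,u) V(j)
    V_new = Q.min(axis=1)
    kappa = V_new[0]
    V_new = V_new - kappa
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

U_star = Q.argmin(axis=1)                    # one selection from U*(i)

# Simulate a_n in U*(s_n) and compare the empirical average cost with kappa.
s, total, N = 0, 0.0, 200_000
for _ in range(N):
    a = U_star[s]
    total += k[s, a]
    s = rng.choice(S, p=p[s, a])
print(f"kappa = {kappa:.4f}, empirical average cost = {total / N:.4f}")
```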


Cite this article

Kalathil, D., Borkar, V.S. & Jain, R. Approachability in Stackelberg Stochastic Games with Vector Costs. Dyn Games Appl 7, 422–442 (2017). https://doi.org/10.1007/s13235-016-0198-y
