Abstract
The notion of approachability was introduced by Blackwell (Pac J Math 6(1):1–8, 1956) in the context of vector-valued repeated games. The famous ‘Blackwell’s approachability theorem’ prescribes a strategy for approachability, i.e., for ‘steering’ the average vector cost of a given agent toward a given target set, irrespective of the strategies of the other agents. In this paper, motivated by multi-objective optimization/decision-making problems in dynamically changing environments, we address the approachability problem in Stackelberg stochastic games with vector-valued cost functions. We make two main contributions. Firstly, we give a simple and computationally tractable strategy for approachability for Stackelberg stochastic games along the lines of Blackwell’s. Secondly, we give a reinforcement learning algorithm for learning such an approachability strategy when the transition kernel is unknown. As a by-product, we also recover Blackwell’s necessary and sufficient conditions for approachability of convex sets in this setup, and thus a complete characterization. We give sufficient conditions for non-convex sets.
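Blackwell’s steering idea can be illustrated on a toy repeated game with two-dimensional vector payoffs. The payoff tensor and the negative-orthant target set below are our own made-up example (not from the paper), and the minimax step of Blackwell’s rule is brute-forced over a grid of mixed actions rather than solved exactly; this is only a minimal sketch of the mechanism.

```python
import numpy as np

# Toy vector-payoff repeated game (hypothetical payoffs).
# R[a, b] is the payoff in R^2 when the agent plays a and the adversary plays b.
# Target set C = the negative orthant {x in R^2 : x <= 0}.
R = np.array([[[-1.0, 1.0], [1.0, -1.0]],
              [[1.0, -1.0], [-1.0, 1.0]]])

def project_neg_orthant(x):
    """Euclidean projection of x onto C = {y : y <= 0}."""
    return np.minimum(x, 0.0)

def blackwell_step(x_bar, grid=np.linspace(0.0, 1.0, 101)):
    """Blackwell's rule: pick a mixed action p (probability of action 1)
    minimizing the worst-case inner product of the expected payoff with
    the outward normal lambda = x_bar - Pi_C(x_bar)."""
    lam = x_bar - project_neg_orthant(x_bar)
    if np.allclose(lam, 0.0):          # average already in the target set
        return 0.5
    best_p, best_val = 0.5, np.inf
    for p in grid:
        worst = max(lam @ ((1 - p) * R[0, b] + p * R[1, b]) for b in range(2))
        if worst < best_val:
            best_p, best_val = p, worst
    return best_p

rng = np.random.default_rng(0)
x_sum = np.zeros(2)
for n in range(1, 2001):
    p = blackwell_step(x_sum / max(n - 1, 1))
    a = int(rng.random() < p)
    b = rng.integers(2)                # arbitrary (here random) adversary
    x_sum += R[a, b]
x_bar = x_sum / 2000
dist = np.linalg.norm(x_bar - project_neg_orthant(x_bar))
print(dist)                            # distance to C shrinks roughly like 1/sqrt(n)
```

Whenever the running average lies outside the target set, the rule plays a mixed action that pushes the next expected payoff to the target side of the separating hyperplane, which is why the average is ‘steered’ toward the set regardless of the adversary.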
Notes
The estimates of ibid. are for an o.d.e. limit, but a similar argument works for differential inclusion limits.
This x-dependence is the only additional feature (or point of difference) here as compared to [1].
References
Abounadi J, Bertsekas D, Borkar VS (2001) Learning algorithms for Markov decision processes with average cost. SIAM J Control Optim 40(3):681–698
Altman E (1999) Constrained Markov decision processes, vol 7. CRC Press, Boca Raton
Aubin JP, Cellina A (1984) Differential inclusions: set-valued maps and viability theory. Springer, New York
Bardi M, Capuzzo-Dolcetta I (2008) Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Springer, Berlin
Barwell AD (2011) Omega-limit sets of discrete dynamical systems. Ph.D. dissertation, University of Birmingham
Benaim M, Hofbauer J, Sorin S (2005) Stochastic approximations and differential inclusions. SIAM J Control Optim 44(1):328–348
Benaim M, Hofbauer J, Sorin S (2006) Stochastic approximations and differential inclusions, Part II: applications. Math Oper Res 31(4):673–695
Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pac J Math 6(1):1–8
Borkar VS (1991) Topics in controlled Markov chains. Longman Scientific & Technical, Harlow
Borkar VS (1998) Asynchronous stochastic approximation. SIAM J Control Optim 36(3):840–851
Borkar VS (2005) An actor-critic algorithm for constrained Markov decision processes. Syst Control Lett 54(3):207–213
Borkar VS (2008) Stochastic approximation: a dynamical systems viewpoint. Hindustan Book Agency, New Delhi, and Cambridge University Press, Cambridge
Borkar VS, Meyn SP (2000) The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J Control Optim 38(2):447–469
Even-Dar E, Kakade S, Mansour Y (2009) Online Markov decision processes. Math Oper Res 34(3):726–736
Filar J, Vrieze K (1996) Competitive Markov decision processes. Springer, New York
Kamal S (2010) A vector minmax problem for controlled Markov chains. arXiv preprint, arXiv:1011.0675v1
Mannor S, Shimkin N (2003) The empirical Bayes envelope and regret minimization in competitive Markov decision processes. Math Oper Res 28(2):327–345
Mannor S, Shimkin N (2004) A geometric approach to multi-criterion reinforcement learning. J Mach Learn Res 5:325–360
Milman E (2006) Approachable sets of vector payoffs in stochastic games. Games Econom Behav 56(1):135–147
Patek SD (1997) Stochastic shortest path games: theory and algorithms. Ph.D. dissertation, Lab. for Information and Decision Systems, MIT
Perchet V (2014) Approachability, regret and calibration; implications and equivalences. J Dyn Games 1(2):181–254
Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Shimkin N, Shwartz A (1993) Guaranteed performance regions in Markovian systems with competing decision makers. IEEE Trans Autom Control 38(1):84–95
Steuer RE (1989) Multiple criteria optimization: theory, computation, and application. Wiley, New York
Tucker W (1999) The Lorenz attractor exists. Comptes Rendus Acad Sci Ser I Math 328(12):1197–1202
Wagner DH (1977) Survey of measurable selection theorems. SIAM J Control Optim 15(5):859–903
Yu JY, Mannor S, Shimkin N (2009) Markov decision processes with arbitrary reward processes. Math Oper Res 34(3):737–757
Yu H, Bertsekas DP (2013) On boundedness of Q-learning iterates for stochastic shortest path problems. Math Oper Res 38(2):209–227
Acknowledgments
The authors are grateful to the anonymous reviewers for an outstanding job of refereeing, which has greatly improved the quality and readability of our paper.
Additional information
The work of VSB was supported in part by a J. C. Bose Fellowship and a grant for Distributed Computation for Optimization over Large Networks and High Dimensional Data Analysis from the Department of Science and Technology, Government of India. The research of RJ and DK is supported by the Office of Naval Research (ONR) Young Investigator Award N000141210766 and the National Science Foundation (NSF) CAREER Award 0954116.
Appendix
Consider a controlled Markov chain \(\{s_n\}\) with a finite state space S, compact metric action space U with metric d, running cost k(i, u), and transition probabilities p(j | i, u). Assume that k and p are continuous in u, and that Assumption 1 holds. The dynamic programming equation then is
\[ V(i) = \min_{u \in U}\Big[ k(i,u) - \kappa + \sum_{j \in S} p(j \,|\, i, u)\, V(j) \Big], \qquad i \in S. \]
Then, \(\kappa \) is the optimal cost. Let \(U^*(i)\) denote the set of minimizers on the right-hand side, which is perforce compact and non-empty by standard arguments. Suppose the actions \(\{a_n\}\) satisfy
\[ d\big(a_n, U^*(s_n)\big) \rightarrow 0 \quad \text{a.s.} \]
Then, we have
\[ \limsup_{n \rightarrow \infty} \frac{1}{n} \sum_{m=0}^{n-1} k(s_m, a_m) \le \kappa \quad \text{a.s.} \]
Hence, \(\{a_n\}\) is optimal.
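When U is also finite, the average-cost dynamic programming equation of this appendix can be solved numerically by relative value iteration. The sketch below is our own illustration (the array layout and the toy two-state chain are assumptions, not taken from the paper).

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-8, max_iter=10_000, ref=0):
    """Relative value iteration for an average-cost MDP.

    P: array of shape (U, S, S) with P[u, i, j] = p(j | i, u)
    c: array of shape (S, U)   with c[i, u] = k(i, u)
    Returns (kappa, V, policy): optimal average cost, relative value
    function (normalized so V[ref] = 0), and a minimizing action per state.
    """
    n_states = c.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman backup: Q[i, u] = k(i, u) + sum_j p(j | i, u) V(j)
        Q = c + np.einsum('uij,j->iu', P, V)
        TV = Q.min(axis=1)
        kappa = TV[ref]            # running estimate of the average cost
        V_new = TV - kappa         # subtract to keep the iterates bounded
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmin(axis=1)
    return kappa, V, policy

# Toy chain (hypothetical): from state 0 it is optimal to pay 1 once to
# move to state 1 and then stay there at cost 0, so the optimal average
# cost is 0.
P = np.array([[[1., 0.], [0., 1.]],    # action 0: stay put
              [[0., 1.], [1., 0.]]])   # action 1: switch states
c = np.array([[2., 1.],                # k(0, stay) = 2, k(0, switch) = 1
              [0., 5.]])               # k(1, stay) = 0, k(1, switch) = 5
kappa, V, policy = relative_value_iteration(P, c)
print(kappa, policy)                   # kappa = 0.0, policy = [1, 0]
```

Subtracting the value at a fixed reference state each iteration pins down the additive offset in V, which is only determined up to a constant by the dynamic programming equation.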
Cite this article
Kalathil, D., Borkar, V.S. & Jain, R. Approachability in Stackelberg Stochastic Games with Vector Costs. Dyn Games Appl 7, 422–442 (2017). https://doi.org/10.1007/s13235-016-0198-y