Abstract
The notion of approachability was introduced by Blackwell (Pac J Math 6(1):1–8, 1956) in the context of vector-valued repeated games. The famous ‘Blackwell’s approachability theorem’ prescribes a strategy for approachability, i.e., for ‘steering’ the average vector cost of a given agent toward a given target set, irrespective of the strategies of the other agents. In this paper, motivated by multi-objective optimization/decision-making problems in dynamically changing environments, we address the approachability problem in Stackelberg stochastic games with vector-valued cost functions. We make two main contributions. Firstly, we give a simple and computationally tractable strategy for approachability for Stackelberg stochastic games along the lines of Blackwell’s. Secondly, we give a reinforcement learning algorithm for learning such an approachability strategy when the transition kernel is unknown. As a by-product, we also recover Blackwell’s necessary and sufficient conditions for approachability of convex sets in this setup, and thus a complete characterization. We give sufficient conditions for non-convex sets.
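Blackwell’s steering idea can be illustrated on a toy repeated game with two-dimensional vector payoffs. The payoff tensor and the negative-orthant target set below are our own made-up example (not from the paper), and the minimax step of Blackwell’s rule is brute-forced over a grid of mixed actions rather than solved exactly; this is only a minimal sketch of the mechanism.

```python
import numpy as np

# Toy vector-payoff repeated game (hypothetical payoffs).
# R[a, b] is the payoff in R^2 when the agent plays a and the adversary plays b.
# Target set C = the negative orthant {x in R^2 : x <= 0}.
R = np.array([[[-1.0, 1.0], [1.0, -1.0]],
              [[1.0, -1.0], [-1.0, 1.0]]])

def project_neg_orthant(x):
    """Euclidean projection of x onto C = {y : y <= 0}."""
    return np.minimum(x, 0.0)

def blackwell_step(x_bar, grid=np.linspace(0.0, 1.0, 101)):
    """Blackwell's rule: pick a mixed action p (probability of action 1)
    minimizing the worst-case inner product of the expected payoff with
    the outward normal lambda = x_bar - Pi_C(x_bar)."""
    lam = x_bar - project_neg_orthant(x_bar)
    if np.allclose(lam, 0.0):          # average already in the target set
        return 0.5
    best_p, best_val = 0.5, np.inf
    for p in grid:
        worst = max(lam @ ((1 - p) * R[0, b] + p * R[1, b]) for b in range(2))
        if worst < best_val:
            best_p, best_val = p, worst
    return best_p

rng = np.random.default_rng(0)
x_sum = np.zeros(2)
for n in range(1, 2001):
    p = blackwell_step(x_sum / max(n - 1, 1))
    a = int(rng.random() < p)
    b = rng.integers(2)                # arbitrary (here random) adversary
    x_sum += R[a, b]
x_bar = x_sum / 2000
dist = np.linalg.norm(x_bar - project_neg_orthant(x_bar))
print(dist)                            # distance to C shrinks roughly like 1/sqrt(n)
```

Whenever the running average lies outside the target set, the rule plays a mixed action that pushes the next expected payoff to the target side of the separating hyperplane, which is why the average is ‘steered’ toward the set regardless of the adversary.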
Notes
The estimates of ibid. are for an o.d.e. limit, but a similar argument works for differential inclusion limits.
This x-dependence is the only additional feature (or point of difference) here as compared to [1].
References
Abounadi J, Bertsekas D, Borkar VS (2001) Learning algorithms for Markov decision processes with average cost. SIAM J Control Optim 40(3):681–698
Altman E (1999) Constrained Markov decision processes, vol 7. CRC Press, Boca Raton
Aubin JP, Cellina A (1984) Differential inclusions: set-valued maps and viability theory. Springer, New York
Bardi M, Capuzzo-Dolcetta I (2008) Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Springer, Berlin
Barwell AD (2011) Omega-limit sets of discrete dynamical systems. Ph.D. dissertation, University of Birmingham
Benaim M, Hofbauer J, Sorin S (2005) Stochastic approximations and differential inclusions. SIAM J Control Optim 44(1):328–348
Benaim M, Hofbauer J, Sorin S (2006) Stochastic approximations and differential inclusions, Part II: applications. Math Oper Res 31(4):673–695
Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pac J Math 6(1):1–8
Borkar VS (1991) Topics in controlled Markov chains. Longman Scientific & Technical, Harlow
Borkar VS (1998) Asynchronous stochastic approximation. SIAM J Control Optim 36(3):840–851
Borkar VS (2005) An actor-critic algorithm for constrained Markov decision processes. Syst Control Lett 54(3):207–213
Borkar VS (2008) Stochastic approximation: a dynamical systems viewpoint. Hindustan Book Agency, New Delhi, and Cambridge University Press, Cambridge
Borkar VS, Meyn SP (2000) The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J Control Optim 38(2):447–469
Even-Dar E, Kakade S, Mansour Y (2009) Online Markov decision processes. Math Oper Res 34(3):726–736
Filar J, Vrieze K (1996) Competitive Markov decision processes. Springer, New York
Kamal S (2010) A vector minmax problem for controlled Markov chains. arXiv preprint, arXiv:1011.0675v1
Mannor S, Shimkin N (2003) The empirical Bayes envelope and regret minimization in competitive Markov decision processes. Math Oper Res 28(2):327–345
Mannor S, Shimkin N (2004) A geometric approach to multi-criterion reinforcement learning. J Mach Learn Res 5:325–360
Milman E (2006) Approachable sets of vector payoffs in stochastic games. Games Econom Behav 56(1):135–147
Patek SD (1997) Stochastic shortest path games: theory and algorithms. Ph.D. dissertation, Lab. for Information and Decision Systems, MIT
Perchet V (2014) Approachability, regret and calibration; implications and equivalences. J Dyn Games 1(2):181–254
Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Shimkin N, Shwartz A (1993) Guaranteed performance regions in Markovian systems with competing decision makers. IEEE Trans Autom Control 38(1):84–95
Steuer RE (1989) Multiple criteria optimization: theory, computation, and application. Wiley, New York
Tucker W (1999) The Lorenz attractor exists. Comptes Rendus Acad Sci Ser I Math 328(12):1197–1202
Wagner DH (1977) Survey of measurable selection theorems. SIAM J Control Optim 15(5):859–903
Yu JY, Mannor S, Shimkin N (2009) Markov decision processes with arbitrary reward processes. Math Oper Res 34(3):737–757
Yu H, Bertsekas DP (2013) On boundedness of Q-learning iterates for stochastic shortest path problems. Math Oper Res 38(2):209–227
Acknowledgments
The authors are grateful to the anonymous reviewers for an outstanding job of refereeing, which has greatly improved the quality and readability of our paper.
Additional information
The work of VSB was supported in part by a J. C. Bose Fellowship and a grant for Distributed Computation for Optimization over Large Networks and High Dimensional Data Analysis from the Department of Science and Technology, Government of India. The research of RJ and DK is supported by the Office of Naval Research (ONR) Young Investigator Award N000141210766 and the National Science Foundation (NSF) CAREER Award 0954116.
Appendix
Consider a controlled Markov chain \(\{s_n\}\) with a finite state space S, compact metric action space U with metric d, running cost k(i, u), and transition probabilities p(j | i, u). Assume that k and p are continuous in u, and that Assumption 1 holds. The dynamic programming equation then is
\[ V(i) = \min_{u \in U}\Big[ k(i,u) - \kappa + \sum_{j \in S} p(j \,|\, i, u)\, V(j) \Big], \qquad i \in S. \]
Then, \(\kappa \) is the optimal cost. Let \(U^*(i)\) denote the set of minimizers on the right-hand side, which is perforce compact and non-empty by standard arguments. Suppose the actions \(\{a_n\}\) satisfy
\[ d\big(a_n, U^*(s_n)\big) \rightarrow 0 \quad \text{a.s.} \]
Then, we have
\[ \limsup_{n \rightarrow \infty} \frac{1}{n} \sum_{m=0}^{n-1} k(s_m, a_m) \le \kappa \quad \text{a.s.} \]
Hence, \(\{a_n\}\) is optimal.
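When U is also finite, the average-cost dynamic programming equation of this appendix can be solved numerically by relative value iteration. The sketch below is our own illustration (the array layout and the toy two-state chain are assumptions, not taken from the paper).

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-8, max_iter=10_000, ref=0):
    """Relative value iteration for an average-cost MDP.

    P: array of shape (U, S, S) with P[u, i, j] = p(j | i, u)
    c: array of shape (S, U)   with c[i, u] = k(i, u)
    Returns (kappa, V, policy): optimal average cost, relative value
    function (normalized so V[ref] = 0), and a minimizing action per state.
    """
    n_states = c.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman backup: Q[i, u] = k(i, u) + sum_j p(j | i, u) V(j)
        Q = c + np.einsum('uij,j->iu', P, V)
        TV = Q.min(axis=1)
        kappa = TV[ref]            # running estimate of the average cost
        V_new = TV - kappa         # subtract to keep the iterates bounded
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmin(axis=1)
    return kappa, V, policy

# Toy chain (hypothetical): from state 0 it is optimal to pay 1 once to
# move to state 1 and then stay there at cost 0, so the optimal average
# cost is 0.
P = np.array([[[1., 0.], [0., 1.]],    # action 0: stay put
              [[0., 1.], [1., 0.]]])   # action 1: switch states
c = np.array([[2., 1.],                # k(0, stay) = 2, k(0, switch) = 1
              [0., 5.]])               # k(1, stay) = 0, k(1, switch) = 5
kappa, V, policy = relative_value_iteration(P, c)
print(kappa, policy)                   # kappa = 0.0, policy = [1, 0]
```

Subtracting the value at a fixed reference state each iteration pins down the additive offset in V, which is only determined up to a constant by the dynamic programming equation.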
Cite this article
Kalathil, D., Borkar, V.S. & Jain, R. Approachability in Stackelberg Stochastic Games with Vector Costs. Dyn Games Appl 7, 422–442 (2017). https://doi.org/10.1007/s13235-016-0198-y