Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory

Published in Zeitschrift für Operations Research.

Abstract

This paper gives an overview of linear programming methods for solving standard and nonstandard Markovian control problems. Standard problems are problems with the usual criteria, such as expected total (discounted) rewards and average expected rewards; we also discuss a particular class of stochastic games. Nonstandard problems involve additional considerations such as side constraints, multiple criteria, or mean-variance tradeoffs. In a second companion paper, efficient linear programming algorithms are discussed for some applications.
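To illustrate the kind of method the survey covers, the classical primal LP formulation for the expected total discounted reward criterion (going back to Manne and d'Epenoux) can be sketched as follows. This is a minimal sketch, not the paper's own development: the two-state, two-action MDP data below are made up purely for illustration, and `scipy.optimize.linprog` stands in for any LP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical two-state, two-action discounted MDP.
# P[a, i, j] = probability of moving to state j when action a is taken in state i.
# r[a, i]    = immediate reward for action a in state i.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])
r = np.array([[1.0, 2.0],
              [0.5, 3.0]])
alpha = 0.9                      # discount factor
n_states, n_actions = 2, 2

# Primal LP: minimize sum_i v_i subject to
#   v_i >= r(i,a) + alpha * sum_j p(j|i,a) v_j   for all state-action pairs (i,a).
# In linprog's A_ub x <= b_ub form each constraint becomes
#   alpha * P[a,i,:] . v - v_i <= -r(i,a).
A_ub, b_ub = [], []
for a in range(n_actions):
    for i in range(n_states):
        A_ub.append(alpha * P[a, i] - np.eye(n_states)[i])
        b_ub.append(-r[a, i])

res = linprog(c=np.ones(n_states),
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states)
v = res.x                        # optimal discounted value vector
print(v)
```

The LP optimum is the value function of the MDP: it satisfies the Bellman optimality equation, and an optimal stationary policy can be read off from the constraints that are tight at the optimum (or from the dual variables, which form an occupation measure).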



Cite this article

Kallenberg, L.C.M. Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory. ZOR - Methods and Models of Operations Research 40, 1–42 (1994). https://doi.org/10.1007/BF01414028
