Abstract
We present a linear programming (LP) formulation of Markov decision processes (MDPs) with countable state and action spaces and no unichain assumption. It extends the formulation of Hordijk and Kallenberg (1979) for finite state and action spaces. We provide sufficient conditions for both the existence of optimal solutions to the primal LP and the absence of a duality gap. Under these conditions, the existence of a (possibly randomized) average-optimal policy is also guaranteed. The existence of a deterministic stationary average-optimal policy is also investigated.
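For orientation, the finite state and action formulation of Hordijk and Kallenberg (1979) that the paper extends is commonly written as the pair of LPs below. This is a sketch of the standard finite-space statement in the reward-maximization convention, not the countable-space program of this paper. The symbols S, A(i), p(j | i, a), r(i, a) and the initial distribution β (with β_j > 0 and Σ_j β_j = 1) are assumed notation. The abstract states neither the paper's cost/reward convention nor which of the two programs it calls "primal".

```latex
\begin{aligned}
&\text{(P)}\quad \min_{g,h}\ \sum_{j\in S}\beta_j\, g_j \\
&\quad\text{s.t.}\quad g_i \ge \sum_{j\in S} p(j\mid i,a)\, g_j,
  && i\in S,\ a\in A(i),\\
&\qquad\quad\ \ g_i + h_i \ge r(i,a) + \sum_{j\in S} p(j\mid i,a)\, h_j,
  && i\in S,\ a\in A(i);\\[4pt]
&\text{(D)}\quad \max_{x,y\ge 0}\ \sum_{i\in S}\sum_{a\in A(i)} r(i,a)\, x_{ia} \\
&\quad\text{s.t.}\quad \sum_{a\in A(j)} x_{ja}
  - \sum_{i\in S}\sum_{a\in A(i)} p(j\mid i,a)\, x_{ia} = 0,
  && j\in S,\\
&\qquad\quad\ \ \sum_{a\in A(j)} \bigl(x_{ja}+y_{ja}\bigr)
  - \sum_{i\in S}\sum_{a\in A(i)} p(j\mid i,a)\, y_{ia} = \beta_j,
  && j\in S.
\end{aligned}
```

The gain vector g is allowed to vary with the state, which is what dropping the unichain assumption requires. The variables x_{ia} play the role of long-run state-action frequencies. Because β_j > 0, the second constraint of (D) forces Σ_a y_{ja} > 0 at every state j where Σ_a x_{ja} = 0.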
References
Altman E, Shwartz A (1991) Markov decision problems and state-action frequencies. SIAM J Contr Opt 29:786–809
Anderson EJ, Nash P (1987) Linear programming in infinite dimensional spaces. Wiley, Chichester
Borkar V (1988) A convex analytic approach to Markov decision processes. Prob Th Rel Fields 78:583–602
DeGhellinck GT (1960) Les problèmes de décisions séquentielles. Cah Cent Etud Rech Oper 2:161–179
Denardo EV, Fox BL (1968) Multichain Markov renewal programs. SIAM J Appl Math 16:468–487
Denardo EV (1970) On linear programming in a Markov decision problem. Manag Sci 16:281–288
D'Epenoux F (1960) Sur un problème de production et de stockage dans l'aléatoire. Rev Fr Rech Oper 14:3–16
Derman C (1970) Finite state Markovian decision processes. Academic Press, New York
Heilmann WR (1978) Solving stochastic dynamic programming by linear programming: an annotated bibliography. Zeit Oper Res 22:43–53
Hernández-Lerma O, Lasserre JB (1993) Linear programming and average optimality of Markov control processes on Borel spaces: unbounded costs. SIAM J Contr Opt, to appear
Hordijk A, Kallenberg LCM (1979) Linear programming and Markov decision chains. Manag Sci 25:352–362
Kallenberg LCM (1983) Linear programming and finite Markovian control problems. Mathematical Centre Tracts 148, Mathematical Centre, Amsterdam
Kurano M (1989) The existence of a minimum pair of state and policy for Markov decision processes under the hypothesis of Doeblin. SIAM J Contr Opt 27:296–307
Lasserre JB (1993) Average optimal policies and linear programming in countable state Markov decision processes. J Math Anal Appl, to appear
Manne AS (1960) Linear programming and sequential decisions. Manag Sci 6:259–267
Spieksma F (1990) Geometrically ergodic Markov chains and the optimal control of queues. PhD Thesis, University of Leiden
Yamada K (1975) Duality theorem in Markovian decision problems. J Math Anal Appl 50:579–595
Yosida K (1978) Functional analysis. 5th Ed., Springer-Verlag, Berlin
About this article
Cite this article
Hordijk, A., Lasserre, J.B. Linear programming formulation of MDPs in countable state space: The multichain case. ZOR - Methods and Models of Operations Research 40, 91–108 (1994). https://doi.org/10.1007/BF01414031