Abstract
As an extension of the discrete-time case, this note investigates the variance of the total cumulative reward for the embedded Markov chain of semi-Markov processes. Under the assumption that the chain is aperiodic and contains a single class of recurrent states, recursive formulae for the variance are obtained. These formulae show that the variance grows asymptotically linearly in time, and expressions are provided for computing the growth rate.
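To illustrate the kind of recursion the abstract refers to, here is a minimal sketch for the simpler discrete-time case only. It is not the paper's semi-Markov formulae. It assumes a finite chain with transition matrix `P` and a deterministic one-step reward `r[i]` earned in state `i`; the function name `reward_moments` and the two-state example are hypothetical. The first and second moments of the n-step cumulative reward are propagated together, and the variance is their difference.

```python
def reward_moments(P, r, n_steps):
    """Mean and variance of the n-step cumulative reward for each start state.

    Assumes a discrete-time chain with transition matrix P (list of rows)
    and a deterministic reward r[i] collected on each visit to state i.
    """
    m = len(P)
    v = [0.0] * m  # first moments E[xi_n | X_0 = i]
    s = [0.0] * m  # second moments E[xi_n^2 | X_0 = i]
    for _ in range(n_steps):
        Pv = [sum(P[i][j] * v[j] for j in range(m)) for i in range(m)]
        Ps = [sum(P[i][j] * s[j] for j in range(m)) for i in range(m)]
        # E[(r_i + xi_{n-1}(j))^2] averaged over the next state j
        s = [r[i] ** 2 + 2.0 * r[i] * Pv[i] + Ps[i] for i in range(m)]
        v = [r[i] + Pv[i] for i in range(m)]
    var = [s[i] - v[i] ** 2 for i in range(m)]
    return v, var


# Hypothetical aperiodic two-state chain with a single recurrent class.
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [1.0, 0.0]

_, var_400 = reward_moments(P, r, 400)
_, var_401 = reward_moments(P, r, 401)
# One-step variance increment; for large n this approximates the growth rate.
growth_rate = var_401[0] - var_400[0]
```

In this sketch the one-step increment of the variance settles to a constant that does not depend on the starting state, which is the asymptotically linear growth described above. Subtracting the squared mean from the second moment loses precision for very long horizons, so a production version would propagate the variance directly.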
Sladký, K. On mean reward variance in semi-Markov processes. Math Meth Oper Res 62, 387–397 (2005). https://doi.org/10.1007/s00186-005-0039-z
DOI: https://doi.org/10.1007/s00186-005-0039-z