Abstract
In this paper, we consider nonstationary Markov decision processes (MDPs, for short) with an average variance criterion on a countable state space, with finite action spaces and bounded one-step rewards. Using the optimality equations provided in this paper, we translate the average variance criterion into a new average expected cost criterion. We then prove that there exists a Markov policy which is optimal for the original average expected reward criterion and which minimizes the average variance within the class of policies optimal for that criterion.
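To make the two criteria concrete, the following sketch computes the long-run average expected reward and the long-run average variance of the one-step rewards for a fixed stationary policy on a small Markov chain. The transition matrix, rewards, and two-state setup are hypothetical illustrations, not taken from the paper; the paper itself treats nonstationary policies on a countable state space.

```python
import numpy as np

# Hypothetical two-state chain induced by fixing a stationary policy:
# P is the transition matrix, r the bounded one-step rewards.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
r = np.array([1.0, 0.0])

# Stationary distribution pi solving pi P = pi with sum(pi) = 1,
# found as a least-squares solution of the augmented linear system.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

g = pi @ r             # long-run average expected reward
v = pi @ (r - g) ** 2  # long-run average variance of the rewards
print(g, v)            # g = 4/7, v = 12/49 for this chain
```

Among all policies achieving the maximal average reward g, the paper's result guarantees a Markov policy that also minimizes the quantity corresponding to v.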
Manuscript received: October 1997/final version received: March 1998
Guo, X. Nonstationary denumerable state Markov decision processes – with average variance criterion. Mathematical Methods of OR 49, 87–96 (1999). https://doi.org/10.1007/PL00020908