Markov Decision Processes with Multiple Long-Run Average Objectives

  • Krishnendu Chatterjee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4855)


We consider Markov decision processes (MDPs) with multiple long-run average objectives. Such MDPs occur in design problems where one wishes to simultaneously optimize several criteria, for example, latency and power. The possible trade-offs between the different objectives are characterized by the Pareto curve. We show that every Pareto optimal point can be ε-approximated by a memoryless strategy, for all ε > 0. In contrast to the single-objective case, the memoryless strategy may require randomization. We show that the Pareto curve can be approximated (a) in polynomial time in the size of the MDP for irreducible MDPs; and (b) in polynomial space in the size of the MDP for all MDPs. Additionally, we study the problem of whether a given value vector is realizable by any strategy, and show that it can be decided in polynomial time for irreducible MDPs and in NP for all MDPs. These results provide algorithms for design exploration in MDP models with multiple long-run average objectives.
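
The remark that a memoryless strategy may need randomization can be illustrated with a minimal sketch. The encoding below (nested lists indexed by state and action) and the function names are hypothetical and not taken from the paper. The evaluation assumes the Markov chain induced by the strategy is irreducible, so that the long-run average reward vector is the stationary distribution weighted by the expected one-step rewards.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy encoding (an assumption for illustration, not the paper's):
#   transitions[s][a][t] = probability of moving from state s to t under action a
#   rewards[s][a]        = tuple of k reward components, one per objective
#   strategy[s][a]       = probability of playing action a in state s (memoryless)


def stationary_distribution(P: List[List[float]], iters: int = 2000) -> List[float]:
    """Approximate the stationary distribution of an irreducible chain P.

    Iterates the lazy chain (P + I) / 2, which has the same stationary
    distribution as P but is aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.5 * pi[t] for t in range(n)]  # "stay put" half of the lazy chain
        for s in range(n):
            for t in range(n):
                nxt[t] += 0.5 * pi[s] * P[s][t]
        pi = nxt
    return pi


def long_run_average(
    transitions: Sequence[Sequence[Sequence[float]]],
    rewards: Sequence[Sequence[Tuple[float, ...]]],
    strategy: Sequence[Sequence[float]],
) -> Tuple[float, ...]:
    """Long-run average reward vector of a memoryless (possibly randomized) strategy."""
    n = len(transitions)
    # Markov chain induced by fixing the strategy.
    P = [
        [
            sum(strategy[s][a] * transitions[s][a][t] for a in range(len(transitions[s])))
            for t in range(n)
        ]
        for s in range(n)
    ]
    pi = stationary_distribution(P)
    k = len(rewards[0][0])
    return tuple(
        sum(
            pi[s] * strategy[s][a] * rewards[s][a][i]
            for s in range(n)
            for a in range(len(transitions[s]))
        )
        for i in range(k)
    )


# One state, two self-loop actions: action 0 pays (1, 0), action 1 pays (0, 1).
transitions = [[[1.0], [1.0]]]
rewards = [[(1.0, 0.0), (0.0, 1.0)]]

pure_a = long_run_average(transitions, rewards, [[1.0, 0.0]])
pure_b = long_run_average(transitions, rewards, [[0.0, 1.0]])
mixed = long_run_average(transitions, rewards, [[0.5, 0.5]])
print(pure_a, pure_b, mixed)
```

In this toy MDP the two deterministic memoryless strategies yield only the extreme value vectors, while the balanced trade-off between them is reached by the memoryless strategy that randomizes uniformly over the two actions.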


Keywords: Feasible Solution; Transient State; Markov Decision Process; Reward Function; Recurrent State
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Krishnendu Chatterjee (UC Berkeley, USA)
