Lagrange Dual Decomposition for Finite Horizon Markov Decision Processes

  • Thomas Furmston
  • David Barber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)


Solving finite-horizon Markov Decision Processes with stationary policies is a computationally difficult problem. Our dynamic dual decomposition approach uses Lagrange duality to decouple this hard problem into a sequence of tractable sub-problems. The resulting procedure is a straightforward modification of standard non-stationary Markov Decision Process solvers and gives an upper bound on the total expected reward. The empirical performance of the method suggests that it not only converges rapidly, but also performs favourably compared to standard planning algorithms such as policy gradients and lower-bound procedures such as Expectation Maximisation.
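For orientation, the kind of "standard non-stationary solver" the abstract refers to can be sketched as finite-horizon backward induction. The sketch below is not the authors' dual decomposition procedure. The function name `backward_induction`, the argument layout, and the two-state toy problem are all assumptions made for illustration. It computes the optimal time-dependent policy; since stationary policies are a special case of time-dependent ones, no stationary policy can achieve a higher value on the same problem.

```python
def backward_induction(P, R, H):
    """Finite-horizon dynamic programming with a time-dependent policy.

    P[a][s][s2]: probability of moving from s to s2 under action a (assumed layout)
    R[s][a]:     immediate reward for taking action a in state s (assumed layout)
    H:           horizon (number of decision steps)
    Returns (V, pi) with V[t][s] the optimal value-to-go and pi[t][s] a greedy action.
    """
    n_actions, n_states = len(P), len(R)
    V = [[0.0] * n_states for _ in range(H + 1)]  # V[H] is the terminal value, zero
    pi = [[0] * n_states for _ in range(H)]
    for t in range(H - 1, -1, -1):  # sweep backwards in time
        for s in range(n_states):
            # One-step lookahead value of each action at time t
            q = [
                R[s][a] + sum(P[a][s][s2] * V[t + 1][s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])
            pi[t][s] = best
            V[t][s] = q[best]
    return V, pi


# Hypothetical toy problem: action 0 stays put, action 1 swaps the two states;
# being in state 0 pays reward 1, being in state 1 pays nothing.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: swap
]
R = [[1.0, 1.0], [0.0, 0.0]]
V, pi = backward_induction(P, R, 3)
print(V[0], pi[0])
```

In this toy problem the policy found happens not to depend on time in any way that matters, but in general `pi[t]` varies with `t`, which is what makes the stationary-policy problem the harder one.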


Keywords: Markov Decision Processes, Planning, Lagrange Duality





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Thomas Furmston (1)
  • David Barber (1)
  1. Department of Computer Science, University College London, London, UK
