
On Solving a Stochastic Shortest-Path Markov Decision Process as Probabilistic Inference

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)


Previous work on planning as active inference addresses finite-horizon problems and solutions valid for online planning. We propose solving the general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as probabilistic inference. Furthermore, we discuss online and offline methods for planning under uncertainty. In an SSP MDP, the horizon is indefinite and unknown a priori. SSP MDPs generalize finite- and infinite-horizon MDPs and are widely used in the artificial intelligence community. Additionally, we highlight some of the differences between solving an MDP with the dynamic programming approaches widely used in the artificial intelligence community and with the approaches used in the active inference community.
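The dynamic programming side of this comparison can be made concrete with value iteration on a small SSP MDP. The sketch below is a minimal illustration, not the paper's model: it assumes a hypothetical 4×4 grid world with an absorbing, cost-free goal, a cost of 1 per action, and moves that succeed with probability 0.8 and otherwise leave the agent in place. The names `successors`, `q_value` and `value_iteration` are illustrative. No horizon is fixed in advance; the Bellman equation is simply iterated to a fixed point.

```python
from itertools import product

# Hypothetical 4x4 grid-world SSP MDP (an assumption, not the paper's model):
# states are (row, col) cells, the goal is absorbing and cost-free, every
# other action costs 1, and a move succeeds with probability 0.8; otherwise
# the agent stays where it is.
SIZE = 4
GOAL = (3, 3)
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
P_SUCCESS = 0.8
STATES = list(product(range(SIZE), range(SIZE)))


def successors(state, action):
    """Return a list of (next_state, probability) pairs."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    # Bumping into the boundary leaves the agent in place.
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        return [(state, 1.0)]
    return [((r, c), P_SUCCESS), (state, 1.0 - P_SUCCESS)]


def q_value(values, state, action, cost=1.0):
    """Expected cost of taking `action` in `state`, then following `values`."""
    return cost + sum(p * values[s2] for s2, p in successors(state, action))


def value_iteration(tol=1e-10, max_iters=100_000):
    """In-place iteration of V(s) = min_a [c(s, a) + sum_s' P(s' | s, a) V(s')]."""
    values = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue  # the goal is absorbing with zero cost
            best = min(q_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Extract a greedy, stationary policy (a state-to-action mapping).
    policy = {s: min(ACTIONS, key=lambda a: q_value(values, s, a))
              for s in STATES if s != GOAL}
    return values, policy


values, policy = value_iteration()
print(round(values[(0, 0)], 6))  # prints 7.5
```

Under these assumptions each move needs 1/0.8 = 1.25 attempts on average, so the optimal cost of a cell is 1.25 times its Manhattan distance to the goal, and the start cell (0, 0) converges to 7.5. The greedy policy extracted at the end assigns an action to every state rather than prescribing a fixed sequence of actions.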



Notes

  1.

    Linear programming approaches are also popular methods for solving MDPs [2, 9, 12, 22]. Other methods, such as policy gradient methods [14, 27, 28], are used in the reinforcement learning community.

  2.

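    An expectation-maximization (EM) algorithm can be viewed as performing free-energy minimization [16, 21]. In the E-step, the free energy is minimized with respect to the distribution over the latent variables, which amounts to setting that distribution to their posterior under the current parameters; in the M-step, the free energy is minimized with respect to the parameters while that distribution is held fixed.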

  3.

    The distinction between a plan and a policy in active inference has been briefly discussed in [20]. Methods that compute plans as probabilistic inference were also proposed before active inference [1, 33].
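Note 1 mentions linear programming as an alternative to dynamic programming. For reference, a standard LP formulation of an SSP MDP, in the spirit of [9, 22], is sketched below. The notation is assumed here rather than taken from the paper: \(S\) is the state set, \(A\) the action set, \(G \subseteq S\) the set of goal states, \(C(s,a)\) the cost function and \(P(s' \mid s,a)\) the transition function. Under the usual SSP assumptions (a proper policy exists and every improper policy incurs infinite cost), the optimal solution of this LP is the optimal value function \(V^*\).

```latex
\begin{aligned}
\max_{V} \quad & \sum_{s \in S} V(s) \\
\text{s.t.} \quad & V(s) \le C(s,a) + \sum_{s' \in S} P(s' \mid s,a)\, V(s') && \forall s \in S \setminus G,\ \forall a \in A, \\
& V(g) = 0 && \forall g \in G.
\end{aligned}
```

A greedy policy can then be read off by choosing, in each non-goal state, an action whose constraint is tight at the optimum.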

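Note 2 can be checked numerically. The sketch below is a hypothetical example, not taken from the paper: it fits a two-component, unit-variance Gaussian mixture to eight made-up points with EM and records the free energy \(F(q,\theta) = \sum_n \sum_k q_{nk}\,[\log q_{nk} - \log p(x_n, z_n = k \mid \theta)]\) after every E-step and M-step. Because each step minimizes \(F\) over one argument with the other held fixed, the recorded values should never increase. The data, the starting parameters and the function names are illustrative assumptions.

```python
import math

# Hypothetical toy data: two well-separated clusters (illustrative only).
DATA = [-2.1, -1.9, -2.0, -1.5, 2.2, 1.8, 2.0, 2.5]


def log_joint(x, k, weights, means):
    """log p(x, z = k | theta) for a unit-variance Gaussian mixture."""
    return (math.log(weights[k]) - 0.5 * math.log(2 * math.pi)
            - 0.5 * (x - means[k]) ** 2)


def free_energy(data, q, weights, means):
    """F(q, theta) = sum_n sum_k q_nk [log q_nk - log p(x_n, z_n = k | theta)]."""
    total = 0.0
    for x, q_n in zip(data, q):
        for k, q_nk in enumerate(q_n):
            if q_nk > 0.0:  # 0 * log 0 is taken to be 0
                total += q_nk * (math.log(q_nk) - log_joint(x, k, weights, means))
    return total


def e_step(data, weights, means):
    """Minimize F over q: set q_n to the exact posterior p(z_n | x_n, theta)."""
    q = []
    for x in data:
        logs = [log_joint(x, k, weights, means) for k in range(len(means))]
        top = max(logs)  # subtract the max for numerical stability
        unnorm = [math.exp(v - top) for v in logs]
        norm = sum(unnorm)
        q.append([u / norm for u in unnorm])
    return q


def m_step(data, q):
    """Minimize F over theta = (weights, means) with q held fixed."""
    n_comp = len(q[0])
    mass = [sum(q_n[k] for q_n in q) for k in range(n_comp)]
    weights = [m / len(data) for m in mass]
    means = [sum(q_n[k] * x for x, q_n in zip(data, q)) / mass[k]
             for k in range(n_comp)]
    return weights, means


weights, means = [0.5, 0.5], [-0.5, 0.5]
q = [[0.5, 0.5] for _ in DATA]  # arbitrary starting q
history = [free_energy(DATA, q, weights, means)]
for _ in range(20):
    q = e_step(DATA, weights, means)
    history.append(free_energy(DATA, q, weights, means))
    weights, means = m_step(DATA, q)
    history.append(free_energy(DATA, q, weights, means))

# Coordinate descent: F never increases (up to floating-point rounding).
print(all(b <= a + 1e-9 for a, b in zip(history, history[1:])))  # prints True
```

Right after each E-step, \(F\) equals the negative log-likelihood \(-\log p(x \mid \theta)\), because the KL divergence between \(q\) and the exact posterior vanishes.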

References

  1. Attias, H.: Planning by probabilistic inference. In: AISTATS (2003)

  2. Bertsekas, D.P., Tsitsiklis, J.N.: An analysis of stochastic shortest path problems. Math. Oper. Res. 16(3), 580–595 (1991)

  3. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-dynamic programming: an overview. In: Proceedings of 1995 34th IEEE Conference on Decision and Control, vol. 1, pp. 560–564. IEEE (1995)

  4. Campbell, M., Hoane, A.J., Hsu, F.: Deep Blue. Artif. Intell. 134, 57–83 (2002)

  5. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press (2009)

  6. Crites, R.H., Barto, A.G., et al.: Improving elevator performance using reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 1017–1023 (1996)

  7. Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., Friston, K.: Active inference on discrete state-spaces: a synthesis. arXiv preprint arXiv:2001.07203 (2020)

  8. Da Costa, L., Sajid, N., Parr, T., Friston, K., Smith, R.: The relationship between dynamic programming and active inference: The discrete, finite-horizon case. arXiv preprint arXiv:2009.08111 (2020)

  9. d’Epenoux, F.: A probabilistic production and inventory problem. Manage. Sci. 10(1), 98–108 (1963)

  10. Duckworth, P., Lacerda, B., Hawes, N.: Time-bounded mission planning in time-varying domains with semi-MDPs and Gaussian processes (2021)

  11. Etessami, K., Kwiatkowska, M., Vardi, M.Y., Yannakakis, M.: Multi-objective model checking of Markov decision processes. In: Grumberg, O., Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 50–65. Springer, Heidelberg (2007)

  12. Forejt, V., Kwiatkowska, M., Norman, G., Parker, D.: Automated verification techniques for probabilistic systems. In: Bernardo, M., Issarny, V. (eds.) SFM 2011. LNCS, vol. 6659, pp. 53–113. Springer, Heidelberg (2011)

  13. Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., Pezzulo, G.: Active inference: a process theory. Neural Comput. 29(1), 1–49 (2017)

  14. Grondman, I., Busoniu, L., Lopes, G.A., Babuska, R.: A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(6), 1291–1307 (2012)

  15. Kaplan, R., Friston, K.J.: Planning and navigation as active inference. Biol. Cybern. 112(4), 323–343 (2018)

  16. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge (2009)

  17. Kolobov, A.: Planning with Markov Decision Processes: An AI Perspective, vol. 6. Morgan & Claypool Publishers, San Rafael (2012)

  18. Kumar, A., Zilberstein, S., Toussaint, M.: Probabilistic inference techniques for scalable multiagent decision making. J. Artif. Intell. Res. 53, 223–270 (2015)

  19. Lacerda, B., Faruq, F., Parker, D., Hawes, N.: Probabilistic planning with formal performance guarantees for mobile service robots. Int. J. Robot. Res. 38(9), 1098–1123 (2019)

  20. Millidge, B., Tschantz, A., Seth, A.K., Buckley, C.L.: On the relationship between active inference and control as inference. In: IWAI 2020. CCIS, vol. 1326, pp. 3–11. Springer, Cham (2020)

  21. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Massachusetts (2012)

  22. Nazareth, J.L., Kulkarni, R.B.: Linear programming formulations of Markov decision processes. Oper. Res. Lett. 5(1), 13–16 (1986)

  23. Painter, M., Lacerda, B., Hawes, N.: Convex hull Monte-Carlo tree-search. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 30, pp. 217–225 (2020)

  24. Pezzato, C., Hernandez, C., Wisse, M.: Active inference and behavior trees for reactive action planning and execution in robotics. arXiv preprint arXiv:2011.09756 (2020)

  25. Silver, D., et al.: Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815 (2017)

  26. Sutton, R.S., Barto, A.G., et al.: Introduction to Reinforcement Learning, vol. 135. MIT Press, Cambridge (1998)

  27. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems, pp. 1057–1063 (2000)

  28. Thomas, P.S., Brunskill, E.: Policy gradient methods for reinforcement learning with function approximation and action-dependent baselines. arXiv preprint arXiv:1706.06643 (2017)

  29. Tomy, M., Lacerda, B., Hawes, N., Wyatt, J.L.: Battery charge scheduling in long-life autonomous mobile robots via multi-objective decision making under uncertainty. Robot. Auton. Syst. 133, 103629 (2020)

  30. Toussaint, M., Charlin, L., Poupart, P.: Hierarchical POMDP controller optimization by likelihood maximization. In: UAI, vol. 24, pp. 562–570 (2008)

  31. Toussaint, M., Harmeling, S., Storkey, A.: Probabilistic inference for solving (PO)MDPs. University of Edinburgh, School of Informatics Research Report EDI-INF-RR-0934 (2006)

  32. Toussaint, M., Storkey, A.: Probabilistic inference for solving discrete and continuous state Markov decision processes. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 945–952. ACM (2006)

  33. Verma, D., Rao, R.P.: Goal-based imitation as probabilistic inference over graphical models. In: Advances in Neural Information Processing Systems, pp. 1393–1400 (2006)

  34. Yoon, S.W., Fern, A., Givan, R.: FF-Replan: a baseline for probabilistic planning. In: ICAPS, vol. 7, pp. 352–359 (2007)

Author information

Corresponding author

Correspondence to Mohamed Baioumy.

A Appendix: Illustrations

Fig. 1. An illustration of a \(4\times 4\) grid world (right), in which the initial state is shown in blue and the goal state in green, together with an illustration of a policy (left) and a plan (middle).
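Fig. 1 contrasts a policy with a plan. One way to see why the difference matters under uncertainty is to compute exactly how likely each is to reach the goal within a fixed number of steps. The sketch below assumes a hypothetical slippery 4×4 grid in which a move succeeds with probability 0.8 and otherwise leaves the agent in place; the start and goal cells, the 10-step budget and the action sequence `PLAN` are illustrative choices, not those of the figure. The plan is open-loop and ignores the current state, whereas the policy chooses its action from the state it is in.

```python
from collections import defaultdict

# Hypothetical slippery 4x4 grid: a move succeeds with probability 0.8,
# otherwise the agent stays put; bumping into the boundary also stays put.
SIZE, START, GOAL, P_SUCCESS = 4, (0, 0), (3, 3), 0.8
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def successors(state, action):
    """Return (next_state, probability) pairs for one action."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        return [(state, 1.0)]
    return [((r, c), P_SUCCESS), (state, 1.0 - P_SUCCESS)]


def reach_probability(controller, horizon):
    """Exact probability of having reached GOAL after `horizon` steps.

    `controller(t, state)` returns the action executed at time t.
    """
    dist = {START: 1.0}
    for t in range(horizon):
        nxt = defaultdict(float)
        for s, p in dist.items():
            if s == GOAL:  # the goal is absorbing
                nxt[s] += p
                continue
            for s2, q in successors(s, controller(t, s)):
                nxt[s2] += p * q
        dist = nxt
    return dist.get(GOAL, 0.0)


# Open-loop plan: a fixed (hypothetical) action sequence; the state is ignored.
PLAN = list("EEESSSESES")


def follow_plan(t, state):
    return PLAN[t]


# Closed-loop policy: go east until the last column, then go south.
def follow_policy(t, state):
    return "E" if state[1] < SIZE - 1 else "S"


plan_p = reach_probability(follow_plan, len(PLAN))
policy_p = reach_probability(follow_policy, len(PLAN))
print(f"plan: {plan_p:.4f}  policy: {policy_p:.4f}")
```

Under these assumptions the policy succeeds exactly when at least 6 of its 10 moves succeed, with probability of about 0.967, whereas the plan can waste steps bumping into the boundary after a slip and therefore does strictly worse.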

Fig. 2. Annotated world states (left) and a posterior over a temporal state (right).
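Fig. 2 shows a posterior over a temporal state. The sketch below is only one way to obtain such a posterior and is not necessarily the construction used in the paper: it borrows the geometric horizon prior \(P(T = t) = (1-\gamma)\gamma^t\) from the mixture-of-finite-horizons view of Toussaint and Storkey [32] and, as an assumption, uses as likelihood the probability that a fixed policy first reaches the goal at step \(t\). The slippery 4×4 grid, the policy, \(\gamma = 0.95\), the truncation at \(t_{\max} = 60\) and the function names are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical setup: slippery 4x4 grid, moves succeed with probability 0.8,
# and a fixed policy that heads east along the top row, then south.
SIZE, START, GOAL, P_SUCCESS = 4, (0, 0), (3, 3), 0.8


def intended_next(state):
    """Cell the fixed policy tries to move to (it never hits a wall)."""
    if state[1] < SIZE - 1:
        return (state[0], state[1] + 1)  # east
    return (state[0] + 1, state[1])      # south


def first_hit_probs(t_max):
    """f[t] = P(goal is first reached at step t) under the fixed policy."""
    dist = {START: 1.0}
    f = [0.0] * (t_max + 1)
    for t in range(1, t_max + 1):
        nxt = defaultdict(float)
        for s, p in dist.items():
            nxt[intended_next(s)] += p * P_SUCCESS
            nxt[s] += p * (1.0 - P_SUCCESS)  # slip: stay in place
        f[t] = nxt.pop(GOAL, 0.0)  # mass arriving now is removed (first hit)
        dist = nxt
    return f


def horizon_posterior(gamma=0.95, t_max=60):
    """P(T = t | goal) proportional to (1 - gamma) * gamma**t * f[t], t <= t_max."""
    f = first_hit_probs(t_max)
    unnorm = [(1.0 - gamma) * gamma ** t * f[t] for t in range(t_max + 1)]
    z = math.fsum(unnorm)
    return [u / z for u in unnorm]


posterior = horizon_posterior()
print(max(range(len(posterior)), key=posterior.__getitem__))  # prints 7
```

Here the first-hitting time follows a negative binomial distribution (the 6th success in a sequence of trials that each succeed with probability 0.8), so the posterior peaks at \(t = 7\), one step after the shortest possible time \(t = 6\), and the geometric prior thins out its tail.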

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Baioumy, M., Lacerda, B., Duckworth, P., Hawes, N. (2021). On Solving a Stochastic Shortest-Path Markov Decision Process as Probabilistic Inference. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93735-5

  • Online ISBN: 978-3-030-93736-2

  • eBook Packages: Computer Science, Computer Science (R0)
