Efficient Planning in Large POMDPs through Policy Graph Based Factorized Approximations

  • Joni Pajarinen
  • Jaakko Peltonen
  • Ari Hottinen
  • Mikko A. Uusitalo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6323)


Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightforward optimization of plans (policies) computationally intractable. To address this, we introduce an efficient POMDP planning algorithm. Many current methods store the policy partly through a set of "value vectors" that is updated at each iteration by planning one step further; the size of such vectors grows with the size of the state space, making computation intractable for large POMDPs. We instead store the policy as a graph only, which allows tractable approximations in each policy update step: for a state space described by several variables, we approximate beliefs over future states with factorized forms, minimizing the Kullback-Leibler divergence to the non-factorized distributions. Our other speedup approximations include bounding potential rewards. We demonstrate the advantage of our method in several reinforcement learning problems, compared to four previous methods.
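The factorized belief approximation described in the abstract can be illustrated with a small example. For a belief over several state variables, projecting the joint distribution onto the product of its per-variable marginals is the fully factorized distribution that minimizes the Kullback-Leibler divergence KL(p || q) to the joint. The snippet below is a minimal sketch of this projection for two state variables using NumPy; the example belief and all names are illustrative, not the paper's implementation:

```python
import numpy as np

# Joint belief over two state variables: rows index variable A (3 values),
# columns index variable B (2 values). Entries sum to 1.
joint = np.array([[0.10, 0.05],
                  [0.30, 0.15],
                  [0.25, 0.15]])

def factorize(joint):
    """Project a joint belief onto the product of its marginals.

    Among all fully factorized distributions q(a)q(b), the product of
    marginals minimizes KL(p || q) to the joint p.
    """
    pa = joint.sum(axis=1)  # marginal over variable A
    pb = joint.sum(axis=0)  # marginal over variable B
    return pa, pb

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for flat distributions."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

pa, pb = factorize(joint)
approx = np.outer(pa, pb)  # factorized approximation of the joint belief
```

Storing only the marginals keeps the belief representation linear, rather than exponential, in the number of state variables, which is what makes the per-step policy updates tractable for large factored state spaces.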


Keywords: Cognitive Radio, Partially Observable Markov Decision Process, Path Probability, Opportunistic Spectrum Access, Observation Probability
These keywords were added by machine and not by the authors. This process is experimental, and the keywords may be updated as the learning algorithm improves.


References

  1. Sondik, E.J.: The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research 26(2), 282–304 (1978)
  2. Cassandra, A.R.: A survey of POMDP applications. Technical report, presented at the AAAI Fall Symposium, Austin, USA (1998)
  3. Spaan, M., Vlassis, N.: Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research 24, 195–220 (2005)
  4. Smith, T., Simmons, R.: Point-based POMDP algorithms: Improved analysis and implementation. In: Twenty-First Annual Conf. on Uncertainty in Artif. Int., Arlington, Virginia, pp. 542–549. AUAI Press (2005)
  5. Poupart, P.: Exploiting structure to efficiently solve large scale partially observable Markov decision processes. PhD thesis, Univ. of Toronto, Toronto, Canada (2005)
  6. Pajarinen, J., Peltonen, J., Uusitalo, M.A., Hottinen, A.: Latent state models of primary user behavior for opportunistic spectrum access. In: 20th Intl. Symposium on Personal, Indoor and Mobile Radio Communications, pp. 1267–1271. IEEE, Los Alamitos (2009)
  7. Zhao, Q., Tong, L., Swami, A., Chen, Y.: Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework. IEEE J. Sel. Areas Commun. 25(3), 589–600 (2007)
  8. Boutilier, C., Poole, D.: Computing optimal policies for partially observable decision processes using compact representations. In: Thirteenth National Conf. on Artif. Int., pp. 1168–1175. The AAAI Press, Menlo Park (1996)
  9. Cassandra, A., Littman, M., Zhang, N.: Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. In: 13th Annual Conf. on Uncertainty in Artif. Int., pp. 54–61. Morgan Kaufmann, San Francisco (1997)
  10. Poupart, P., Boutilier, C.: Value-directed compression of POMDPs. In: Becker, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems, vol. 15, pp. 1547–1554. MIT Press, Cambridge (2003)
  11. Li, X., Cheung, W., Liu, J., Wu, Z.: A novel orthogonal NMF-based belief compression for POMDPs. In: Ghahramani, Z. (ed.) 24th Annual International Conference on Machine Learning, pp. 537–544. Omnipress (2007)
  12. Boyen, X., Koller, D.: Tractable inference for complex stochastic processes. In: Fourteenth Annual Conf. on Uncertainty in Artif. Int., pp. 33–42. Morgan Kaufmann, San Francisco (1998)
  13. McAllester, D., Singh, S.: Approximate planning for factored POMDPs using belief state simplification. In: Fifteenth Annual Conf. on Uncertainty in Artif. Int., pp. 409–417. Morgan Kaufmann, San Francisco (1999)
  14. Murphy, K., Weiss, Y.: The factored frontier algorithm for approximate inference in DBNs. In: Seventeenth Annual Conf. on Uncertainty in Artif. Int., pp. 378–385. Morgan Kaufmann, San Francisco (2001)
  15. Paquet, S., Tobin, L., Chaib-draa, B.: An online POMDP algorithm for complex multiagent environments. In: Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 970–977. ACM, New York (2005)
  16. Poupart, P., Vlassis, N.: Model-based Bayesian reinforcement learning in partially observable domains. In: Tenth Intl. Symp. on Artif. Intelligence and Math. (2008)
  17. Boger, J., Poupart, P., Hoey, J., Boutilier, C., Fernie, G., Mihailidis, A.: A decision-theoretic approach to task assistance for persons with dementia. In: Nineteenth Intl. Joint Conf. on Artif. Int., vol. 19, pp. 1293–1299 (2005)
  18. Haykin, S.: Cognitive radio: Brain-empowered wireless communications. IEEE J. Sel. Areas Commun. 23, 201–220 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Joni Pajarinen (1)
  • Jaakko Peltonen (1)
  • Ari Hottinen (2)
  • Mikko A. Uusitalo (2)
  1. Department of Information and Computer Science, Aalto University School of Science and Technology, Aalto, Finland
  2. Nokia Research Center, NOKIA GROUP, Finland