Feature Extraction for Decision-Theoretic Planning in Partially Observable Environments

  • Hajime Fujita
  • Yutaka Nakamura
  • Shin Ishii
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4131)


Abstract

In this article, we propose a feature extraction technique for decision-theoretic planning problems in partially observable stochastic domains and present a novel approach for solving them. To maximize the expected future reward, the agent needs only to estimate a Markov chain over a statistic related to rewards. In our approach, an auxiliary state variable whose stochastic process satisfies the Markov property, called the internal state, is introduced into the model under the assumption that rewards depend on the pair of an internal state and an action. The agent then estimates the dynamics of the internal-state model by maximum likelihood inference while acquiring its policy; the internal-state model represents an essential feature for decision-making. Computer simulation results show that our technique can find an appropriate feature for acquiring a good policy, and can achieve faster learning with fewer policy parameters than a conventional algorithm in a reasonably sized partially observable problem.


Keywords: Hidden Markov Model, Internal State, Reinforcement Learning, Belief State, Dynamic Bayesian Network
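The central idea of the abstract, filtering a belief over a small discrete internal state whose transitions are Markov and whose (state, action) pairs determine the reward, can be illustrated with a minimal sketch. The Python code below is not the authors' implementation: the state, action, and observation sizes, the randomly initialized transition, observation, and reward tables, and the REINFORCE-style policy update are illustrative assumptions, and the paper's maximum-likelihood estimation of the internal-state model (for example, an EM-style re-estimation from observed sequences) is only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy sizes (not taken from the paper).
K, A, O = 4, 2, 3          # internal states, actions, observations

# Internal-state model: transitions T[a, z, z'], observations Obs[z, o],
# rewards R[z, a].  In the paper these parameters are estimated by maximum
# likelihood; here they are random placeholders shared with the simulator.
T = rng.dirichlet(np.ones(K), size=(A, K))   # shape (A, K, K)
Obs = rng.dirichlet(np.ones(O), size=K)      # shape (K, O)
R = rng.normal(size=(K, A))

theta = np.zeros((A, K))                     # softmax policy over the belief


def belief_update(b, a, o):
    """Bayes filter over the internal state: predict with T, correct with Obs."""
    b_pred = b @ T[a]                  # sum_z b(z) * T[a](z, z')
    b_new = b_pred * Obs[:, o]
    return b_new / b_new.sum()


def policy(b):
    """Softmax policy whose features are the belief over internal states."""
    logits = theta @ b
    p = np.exp(logits - logits.max())
    return p / p.sum()


def episode(steps=30, lr=0.01):
    """Run one episode in a simulated environment driven by the same model."""
    global theta
    z = rng.integers(K)                # true hidden internal state
    b = np.full(K, 1.0 / K)            # uniform initial belief
    grads, rewards = [], []
    for _ in range(steps):
        p = policy(b)
        a = rng.choice(A, p=p)
        r = R[z, a] + 0.1 * rng.normal()
        z = rng.choice(K, p=T[a, z])
        o = rng.choice(O, p=Obs[z])
        # Gradient of log pi(a | b) for the softmax policy (REINFORCE).
        g = -np.outer(p, b)
        g[a] += b
        grads.append(g)
        rewards.append(r)
        b = belief_update(b, a, o)
    # Undiscounted return-to-go as the baseline-free REINFORCE signal.
    G = np.cumsum(rewards[::-1])[::-1]
    for g, ret in zip(grads, G):
        theta += lr * ret * g
    # A full implementation would also re-estimate T, Obs, and R by EM from
    # the observed action/observation/reward sequences; that step is omitted.
    return sum(rewards)


returns = [episode() for _ in range(300)]
print(f"mean return, first 30 episodes: {np.mean(returns[:30]):.2f}, "
      f"last 30: {np.mean(returns[-30:]):.2f}")
```

In this sketch the belief over the internal state plays the role of the reward-relevant statistic mentioned in the abstract: the policy never sees raw observation histories, only the K-dimensional belief vector.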





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hajime Fujita ¹
  • Yutaka Nakamura ¹
  • Shin Ishii ¹

  1. Graduate School of Information Science, Nara Institute of Science and Technology (NAIST), Ikoma, Japan
