Active Learning in Partially Observable Markov Decision Processes

  • Robin Jaulmes
  • Joelle Pineau
  • Doina Precup
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)


This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is unknown or only poorly specified. We propose two approaches to this problem. The first incorporates a model of the uncertainty directly into the POMDP planning problem. This has theoretical guarantees, but is impractical when many of the parameters are uncertain. The second, called MEDUSA, incrementally improves the POMDP model using selected queries, while still optimizing reward. Results show good performance of the algorithm even in large problems: the most useful parameters of the model are learned quickly and the agent still accumulates high reward throughout the process.
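The abstract (and the keyword list) points to query-based model learning with Dirichlet distributions. As an illustration only — not the authors' implementation — the sketch below shows the general idea: keep one Dirichlet posterior per (state, action) pair over the transition distribution, query an oracle where posterior variance is highest, and update the counts. All names here (`DirichletTransitionModel`, the 2-state toy dynamics) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class DirichletTransitionModel:
    """Tracks uncertainty over T(s'|s,a) with one Dirichlet per (s,a) pair."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # alpha[s, a, s'] are Dirichlet concentration parameters.
        self.alpha = np.full((n_states, n_actions, n_states), prior)

    def mean(self):
        # Expected transition probabilities under the current posterior.
        return self.alpha / self.alpha.sum(axis=2, keepdims=True)

    def sample(self):
        # One plausible model drawn from the posterior; algorithms in this
        # family plan with several such sampled POMDPs.
        flat = self.alpha.reshape(-1, self.alpha.shape[2])
        draws = np.array([rng.dirichlet(a) for a in flat])
        return draws.reshape(self.alpha.shape)

    def variance(self, s, a):
        # Total posterior variance of T(.|s,a): a simple query-selection
        # signal -- query where the model is still most uncertain.
        a0 = self.alpha[s, a].sum()
        p = self.alpha[s, a] / a0
        return (p * (1 - p) / (a0 + 1)).sum()

    def update(self, s, a, s_next):
        # A query reveals the true next state; increment the matching count.
        self.alpha[s, a, s_next] += 1.0


# Hypothetical 2-state, 1-action problem whose true dynamics answer queries.
model = DirichletTransitionModel(n_states=2, n_actions=1)
true_T = np.array([[0.9, 0.1], [0.2, 0.8]])
initial_var = model.variance(0, 0)
for _ in range(200):
    s = int(rng.integers(2))
    s_next = int(rng.choice(2, p=true_T[s]))  # oracle answers the query
    model.update(s, 0, s_next)
```

After a couple of hundred queries the posterior mean concentrates near the true transition matrix and the variance signal shrinks, which is what lets an active learner stop querying parameters it already knows well.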


Keywords: Active Learning · Belief State · Reward Function · Dirichlet Distribution · Partially Observable Markov Decision Process


  1. Anderson, B., Moore, A.: Active Learning in HMMs. In: ICML 2005 (2005)
  2. Brafman, R.I., Shani, G.: Resolving perceptual aliasing with noisy sensors. In: NIPS 2005 (2005)
  3. Cohn, D.A., Ghahramani, Z., Jordan, M.I.: Active Learning with Statistical Models. In: NIPS 1996 (1996)
  4. Dearden, R., Friedman, N., Andre, D.: Model Based Bayesian Exploration. In: UAI 1999 (1999)
  5. Kaelbling, L., Littman, M., Cassandra, A.: Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence 101 (1998)
  6. Littman, M., Cassandra, A., Kaelbling, L.: Learning policies for partially observable environments: Scaling up. Technical Report. Brown University (1995)
  7. McCallum, A.K.: Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Thesis. University of Rochester (1996)
  8. Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for POMDPs. In: IJCAI 2003 (2003)
  9. Poupart, P., Boutilier, C.: VDCBPI: an Approximate Scalable Algorithm for Large Scale POMDPs. In: NIPS 2005 (2005)
  10. Singh, S., Littman, M., Jong, N.K., Pardoe, D., Stone, P.: Learning Predictive State Representations. In: ICML 2003 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Robin Jaulmes (1)
  • Joelle Pineau (1)
  • Doina Precup (1)

  1. School of Computer Science, McGill University, Montreal
