Generation of Search Behavior by a Modification of Q-MDP Value Method

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 302)


We modify the Q-MDP value method and observe the behavior of a robot using the modified method in an environment where the robot's state information is essentially indefinite. In the Q-MDP value method, the action at each time step is chosen by computing expectation values under a probability distribution, which is the output of a probabilistic state estimator. The modified method applies a weighting function to this probability distribution in the calculation so as to give precedence to states near the goal of the task. We applied our method to a simple robot navigation problem in an environment with incomplete sensing. As a result, the method makes the robot take a kind of searching behavior without any explicit implementation of search.
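The action-selection rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the belief is represented by particles (sampled state indices), the Q-table is assumed to come from the underlying MDP, and the exponential-distance weighting function `exp(-alpha * |s - goal|)` is a hypothetical choice standing in for the paper's unspecified weighting; with `goal_state=None` it reduces to the plain Q-MDP rule.

```python
import numpy as np

def qmdp_action(particles, q_table, goal_state=None, alpha=1.0):
    """Select an action from a particle-filter belief.

    particles : 1-D int array of sampled state indices (the belief).
    q_table   : Q[s, a] state-action values of the underlying MDP.
    goal_state: if given, each particle is weighted by
                exp(-alpha * |s - goal_state|), giving precedence to
                states near the goal (the modification); if None, this
                is the plain Q-MDP expectation over the belief.
    """
    weights = np.ones(len(particles), dtype=float)
    if goal_state is not None:
        # Hypothetical weighting: closer to the goal -> larger weight.
        weights = np.exp(-alpha * np.abs(particles - goal_state))
    weights /= weights.sum()
    # Expected Q-value of each action under the (weighted) belief.
    expected_q = weights @ q_table[particles]  # shape: (n_actions,)
    return int(np.argmax(expected_q))
```

With most particles far from the goal, the plain rule follows the belief's majority, while the weighted rule lets the few near-goal particles dominate the choice, which is what biases the robot toward goal-directed, search-like motion.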


Keywords: Q-MDP value method · Particle filters · Belief states · Partially observable Markov decision process



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

Advanced Institute of Industrial Technology, Tokyo, Japan
