An Online POMDP Algorithm Used by the PoliceForce Agents in the RoboCupRescue Simulation

  • Sébastien Paquet
  • Ludovic Tobin
  • Brahim Chaib-draa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4020)


In the RoboCupRescue simulation, the PoliceForce agents have to decide which roads to clear to help other agents navigate in the city. In this article, we present how we have modelled their environment as a POMDP and, more importantly, we present our new online POMDP algorithm that enables them to make good decisions in real time during the simulation. Our algorithm is based on a look-ahead search to find the best action to execute at each cycle. We thus avoid the overwhelming complexity of computing a policy for every possible situation. To show the efficiency of our algorithm, we present results on standard POMDP benchmarks and in the RoboCupRescue simulation environment.
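The online approach described above (a depth-limited look-ahead search over belief states, executed at each decision cycle) can be illustrated with a minimal sketch. The example below uses the classic tiger problem, not the authors' RoboCupRescue model, and all parameters (observation accuracy 0.85, rewards, discount 0.95) are illustrative assumptions:

```python
# A minimal sketch of online look-ahead search for a POMDP.
# Hypothetical tiger-problem parameters; NOT the authors' RoboCupRescue model.

S = ["tiger-left", "tiger-right"]          # hidden states
A = ["listen", "open-left", "open-right"]  # actions
O = ["hear-left", "hear-right"]            # observations
GAMMA = 0.95

def transition(s, a):
    # Opening a door resets the tiger's position; listening leaves it unchanged.
    if a == "listen":
        return {s: 1.0}
    return {s2: 0.5 for s2 in S}

def observation(s2, a):
    # Listening yields a noisy hint (85% accurate); opening is uninformative.
    if a == "listen":
        correct = "hear-left" if s2 == "tiger-left" else "hear-right"
        return {o: (0.85 if o == correct else 0.15) for o in O}
    return {o: 0.5 for o in O}

def reward(s, a):
    if a == "listen":
        return -1.0
    opened = "tiger-left" if a == "open-left" else "tiger-right"
    return -100.0 if s == opened else 10.0

def belief_update(b, a, o):
    # Bayes filter: b'(s') proportional to O(o|s',a) * sum_s T(s'|s,a) b(s)
    b2 = {}
    for s2 in S:
        p = sum(transition(s, a).get(s2, 0.0) * b[s] for s in S)
        b2[s2] = observation(s2, a)[o] * p
    z = sum(b2.values())
    return {s: v / z for s, v in b2.items()} if z > 0 else b

def q_value(b, a, depth):
    # Expected immediate reward plus discounted expectation over observations.
    v = sum(b[s] * reward(s, a) for s in S)
    if depth > 1:
        for o in O:
            po = sum(b[s] * sum(transition(s, a).get(s2, 0.0)
                                * observation(s2, a)[o] for s2 in S)
                     for s in S)  # P(o | b, a)
            if po > 0:
                b2 = belief_update(b, a, o)
                v += GAMMA * po * max(q_value(b2, a2, depth - 1) for a2 in A)
    return v

def best_action(b, depth=2):
    # The online step: search the look-ahead tree, return the best root action.
    return max(A, key=lambda a: q_value(b, a, depth))

b0 = {"tiger-left": 0.5, "tiger-right": 0.5}
print(best_action(b0))  # with a uniform belief, listening is preferred
```

Because the search is run online from the current belief only, the agent never enumerates a policy over the whole belief space, which is the complexity the paper's abstract refers to avoiding; the trade-off is a per-cycle search cost that grows with the look-ahead depth.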


Keywords: Multiagent System · Markov Decision Process · Online Algorithm · Belief State · Reward Function



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sébastien Paquet¹
  • Ludovic Tobin¹
  • Brahim Chaib-draa¹

  1. DAMAS Laboratory, Laval University
