Policy Improvements for Probabilistic Pursuit-Evasion Game

  • Dong Jun Kwak
  • H. Jin Kim


This paper addresses a pursuit-evasion game (PEG) between two teams: pursuers, who try to minimize the time required to capture the evaders, and evaders, who try to maximize the capture time by escaping the pursuers. We propose a hybrid pursuit policy for a probabilistic PEG that combines the merits of the local-max and global-max pursuit policies proposed in the previous literature. We also propose a method for finding optimal pursuit and evasion policies for the two competing teams. To this end, we employ an episodic parameter optimization (EPO) algorithm to learn good values for the weighting parameters of the hybrid pursuit policy and of an intelligent evasion policy. The EPO algorithm runs over many repeated simulations of the PEG: the reward of each episode is updated using reinforcement learning, and the optimal weighting parameters are selected using particle swarm optimization. We analyze how the optimal parameter values vary with the number of pursuers and evaders. The proposed strategy is validated both in simulations and in experiments with small ground robots.
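
As a rough illustration of the idea of tuning policy weights from repeated episodes, the sketch below runs a basic particle swarm optimization over weighting parameters in [0, 1], scoring each candidate by a running average of noisy episode rewards. It is not the authors' EPO algorithm: the functions `simulate_episode`, `average_reward`, and `pso`, the quadratic toy reward with its optimum at 0.3, the running-average update, and all coefficient values are assumptions made only for this example.

```python
import random


def simulate_episode(weights, rng):
    """Hypothetical stand-in for one PEG simulation run.

    Returns a noisy reward (think: negative capture time). The quadratic
    shape and the optimum at 0.3 are invented purely for illustration.
    """
    target = 0.3
    cost = sum((w - target) ** 2 for w in weights)
    return -cost + rng.gauss(0.0, 0.005)


def average_reward(weights, rng, episodes=20):
    """Estimate the reward of a weight vector with an incremental mean."""
    estimate = 0.0
    for k in range(1, episodes + 1):
        reward = simulate_episode(weights, rng)
        estimate += (reward - estimate) / k  # running-average update
    return estimate


def pso(dim=2, particles=10, iterations=40, seed=0):
    """Basic global-best PSO that maximizes the averaged episode reward."""
    rng = random.Random(seed)
    # Assumed coefficients: inertia, personal-best pull, global-best pull.
    inertia, c_personal, c_global = 0.7, 1.5, 1.5

    pos = [[rng.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_val = [average_reward(p, rng) for p in pos]

    g = max(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Keep every weight inside the assumed [0, 1] range.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = average_reward(pos[i], rng)
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

Calling `pso()` returns the best weight vector found and its estimated reward. In a real setting, `simulate_episode` would be replaced by an actual PEG simulation run, and averaging over several episodes per candidate is one simple way to reduce the effect of randomness in individual runs.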


Pursuit-evasion game, Probabilistic game, Multiple robots, Reinforcement learning, Particle swarm optimization





Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea
