Implementing Parametric Reinforcement Learning in Robocup Rescue Simulation

  • Omid Aghazadeh
  • Maziar Ahmad Sharbafi
  • Abolfazl Toroghi Haghighat
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5001)

Abstract

Decision making in complex, multi-agent, and dynamic environments such as Rescue Simulation is a challenging problem in Artificial Intelligence. Uncertainty, noisy input data, and stochastic behavior, which are common difficulties of real-time environments, make decision making in such settings even harder. Our approach to the bottleneck of dynamicity and the variety of conditions in these situations is reinforcement learning. Classic reinforcement learning methods usually maintain state and action value functions and apply temporal-difference (TD) updates; function approximation is an alternative way to represent those value functions directly rather than storing them in tables. Many reinforcement learning methods for continuous action and state spaces, such as TD, LSTD, and iLSTD, combine function approximation with TD updates. This paper presents a new approach to online reinforcement learning in continuous action or state spaces that does not rely on TD updates, which we call Parametric Reinforcement Learning. The method is applied to the decision-making process of the Police Force agents in RoboCup Rescue Simulation, and the results of this application are reported in the paper. Our simulation results show that the method learns quickly, is simple to use, and requires very little memory and computation time.
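For context, the TD-based family that the abstract contrasts against (TD, LSTD, iLSTD) can be illustrated with a minimal sketch of classic TD(0) value estimation with linear function approximation. This is the standard baseline, not the Parametric Reinforcement Learning method introduced in the paper; the env, features, and sample_action interfaces below are hypothetical placeholders assumed only for illustration.

    import numpy as np

    def td0_linear_value(env, features, n_features,
                         episodes=100, alpha=0.05, gamma=0.95):
        """TD(0) value estimation with linear function approximation.

        Assumed placeholder interfaces:
          env.reset() -> state
          env.step(action) -> (next_state, reward, done)
          env.sample_action() -> action
          features(state) -> np.ndarray of length n_features
        """
        w = np.zeros(n_features)                 # linear value-function weights
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                a = env.sample_action()          # behaviour policy: random action
                s_next, r, done = env.step(a)
                phi = features(s)
                v = w @ phi
                v_next = 0.0 if done else w @ features(s_next)
                td_error = r + gamma * v_next - v    # temporal-difference error
                w += alpha * td_error * phi          # semi-gradient TD(0) update
                s = s_next
        return w

The weight vector w plays the role that a lookup table plays in discrete settings; the paper's contribution is a parametric scheme that avoids this kind of TD update altogether.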

Keywords

Reinforcement Learning · Multi-Agent Coordination · Decision Making

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Omid Aghazadeh (1)
  • Maziar Ahmad Sharbafi (1, 2)
  • Abolfazl Toroghi Haghighat (1)

  1. Mechatronic Research Lab, Azad University of Qazvin, Qazvin, Iran
  2. Electrical and Computer Engineering Department, University of Tehran, Tehran, Iran
