Implementing Parametric Reinforcement Learning in RoboCup Rescue Simulation
Decision making in complex, dynamic, multi-agent environments such as the Rescue Simulation is a challenging problem in Artificial Intelligence. Uncertainty, noisy input data, and stochastic behavior, which are common difficulties of real-time environments, make decision making in such settings even more complicated. Our approach to overcoming the dynamicity and variety of conditions in these situations is reinforcement learning. Classic reinforcement learning methods usually maintain state and action value functions and apply temporal-difference (TD) updates; function approximation is an alternative way to represent these value functions compactly. Many reinforcement learning methods for continuous action and state spaces combine function approximation with TD updates, such as TD, LSTD, and iLSTD. This paper presents a new approach to online reinforcement learning in continuous action or state spaces that does not rely on TD updates; we call it Parametric Reinforcement Learning. The method is applied to the decision-making process of the Police Force agents in RoboCup Rescue Simulation, and the results of this application are presented. Our simulation results show that the method increases learning speed while remaining simple to use, with very low memory usage and very low computation cost.
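For background, the classic baseline the abstract contrasts against can be sketched as a linear TD(0) update: the value function is represented through function approximation as a dot product of a weight vector and a feature vector, and the weights move along the TD error after each transition. This is a minimal illustration of that standard technique only; it is not the paper's Parametric Reinforcement Learning method, whose details are not given in this excerpt, and the toy two-state chain below is an invented example.

```python
import numpy as np

def td0_update(w, phi_s, phi_s_next, reward, gamma=0.9, alpha=0.1):
    """One linear TD(0) step: V(s) ~ w . phi(s), updated along the TD error.

    delta = r + gamma * V(s') - V(s); w moves by alpha * delta * phi(s).
    Returns the updated weight vector.
    """
    delta = reward + gamma * np.dot(w, phi_s_next) - np.dot(w, phi_s)
    return w + alpha * delta * phi_s

# Hypothetical two-state chain a -> b -> a with one-hot features:
# transition a->b yields reward 1, b->a yields reward 0.
w = np.zeros(2)
phi_a, phi_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for _ in range(2000):
    w = td0_update(w, phi_a, phi_b, reward=1.0)
    w = td0_update(w, phi_b, phi_a, reward=0.0)

# The weights approach the Bellman fixed point V(a) = 1 / (1 - gamma^2),
# V(b) = gamma / (1 - gamma^2), i.e. roughly [5.26, 4.74] for gamma = 0.9.
```

With one-hot features this reduces to tabular TD(0); the same update applies unchanged to richer feature vectors over continuous state spaces, which is the setting the abstract discusses.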
Keywords: Reinforcement Learning, Multi-Agent, Coordination, Decision Making