Abstract
Policy gradient methods are useful approaches to reinforcement learning. In our policy gradient approach to agent behavior learning, we define an agent’s decision problem at each time step as the minimization of an objective function. In this paper, we give an objective function built from two types of parameters: one representing environmental dynamics and the other representing state-value functions. We derive separate learning rules for the two types so that each set of parameters can be learned independently. This separation makes it possible to reuse state-value functions for agents operating under different environmental dynamics, even when the dynamics are stochastic. Simulation experiments on learning hunter-agent policies in pursuit problems show the effectiveness of our method.
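The abstract does not state the objective function or the learning rules, so the following is only a minimal sketch of how such a separation could look, not the authors' formulation. It assumes a Boltzmann policy over actions and an objective equal to the negative expected next-state value; the dynamics table, the value list, and every function name are hypothetical.

```python
import math

def objective(dynamics, values, state, action):
    """Assumed form: E(s, a) = -sum over s' of P(s' | s, a) * V(s'); lower is better."""
    return -sum(p * values[s2] for s2, p in dynamics[state][action].items())

def policy(dynamics, values, state, temperature=1.0):
    """Boltzmann distribution over actions: pi(a | s) proportional to exp(-E(s, a) / T)."""
    actions = sorted(dynamics[state])
    energies = [objective(dynamics, values, state, a) for a in actions]
    lowest = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - lowest) / temperature) for e in energies]
    total = sum(weights)
    return {a: w / total for a, w in zip(actions, weights)}

def value_eligibility(dynamics, values, state, action, temperature=1.0):
    """Gradient of log pi(action | state) with respect to each V(s')."""
    probs = policy(dynamics, values, state, temperature)
    grad = {}
    # Subtract the policy-weighted average transition probability ...
    for b, pb in probs.items():
        for s2, p in dynamics[state][b].items():
            grad[s2] = grad.get(s2, 0.0) - pb * p / temperature
    # ... and add the transition probability of the action actually taken.
    for s2, p in dynamics[state][action].items():
        grad[s2] = grad.get(s2, 0.0) + p / temperature
    return grad

# One shared set of state values (hypothetical three-state chain).
values = [0.0, 0.0, 1.0]

# Two hypothetical dynamics tables for state 1: dynamics[s][a] maps s' -> P(s' | s, a).
deterministic = {1: {"left": {0: 1.0}, "right": {2: 1.0}}}
slippery = {1: {"left": {0: 0.8, 2: 0.2}, "right": {2: 0.8, 0: 0.2}}}

# The same values are reused under both dynamics without retraining.
pi_det = policy(deterministic, values, state=1)
pi_slip = policy(slippery, values, state=1)
elig = value_eligibility(slippery, values, state=1, action="right")
```

Under these assumptions, a REINFORCE-style update would adjust each V(s') in proportion to the episode return times `elig[s']`, leaving the dynamics table untouched. That is the sense in which the value parameters could be carried over to an agent facing different, possibly stochastic, dynamics.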
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ishihara, S., Igarashi, H. (2008). Behavior Learning Based on a Policy Gradient Method: Separation of Environmental Dynamics and State Values in Policies. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_18
DOI: https://doi.org/10.1007/978-3-540-89197-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89196-3
Online ISBN: 978-3-540-89197-0
eBook Packages: Computer Science, Computer Science (R0)