Two Dimensional Evaluation Reinforcement Learning
To address the tradeoff between exploration and exploitation actions in reinforcement learning, the authors propose two-dimensional evaluation reinforcement learning, which distinguishes between reward evaluation forecasts and punishment evaluation forecasts. The proposed method uses the difference between the reward evaluation and the punishment evaluation to determine the action, and their sum as a parameter that sets the ratio of exploration to exploitation. In this paper we describe an experiment in which a mobile robot searches for a path and must resolve the resulting conflict between exploration and exploitation actions. The experimental results show that the proposed method, which evaluates actions along the two dimensions of reward and punishment, generates a better path than the conventional reinforcement learning method.
Keywords: Artificial Intelligence, Mobile Robot, Problem Complexity, Learning Method, Exploitation Action
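The abstract states only that the difference between the two evaluations selects the action and that their sum sets the exploration ratio; it does not give formulas. The sketch below is one possible reading under loudly stated assumptions, not the authors' method. It assumes softmax (Boltzmann) selection over the preference `reward - punishment`, with a temperature that shrinks as the mean of `reward + punishment` grows, so that little-evaluated states are explored and well-evaluated ones are exploited. The mapping `tau_max / (1 + k * mean_sum)`, the TD(0)-style update that splits one scalar signal into a reward channel and a punishment channel, and the names `select_action`, `update`, `tau_max`, `k`, `alpha` and `gamma` are all hypothetical.

```python
import math
import random


def select_action(reward_eval, punish_eval, tau_max=1.0, k=1.0, rng=random):
    """Choose an action for one state from two evaluation vectors.

    reward_eval[a] / punish_eval[a]: non-negative forecasts of reward and of
    punishment for action a. Hypothetical mapping (not given in the paper):
      preference  = reward - punishment        -> which action is favoured
      temperature = tau_max / (1 + k * mean(reward + punishment))
                                               -> how much to explore
    Returns (chosen_action, probabilities).
    """
    n = len(reward_eval)
    prefs = [reward_eval[a] - punish_eval[a] for a in range(n)]
    mean_sum = sum(reward_eval[a] + punish_eval[a] for a in range(n)) / n
    tau = tau_max / (1.0 + k * mean_sum)  # small sum -> high tau -> explore
    top = max(prefs)  # subtract the max before exp() for numerical stability
    weights = [math.exp((p - top) / tau) for p in prefs]
    z = sum(weights)
    probs = [w / z for w in weights]
    action = rng.choices(range(n), weights=probs)[0]
    return action, probs


def update(reward_eval, punish_eval, s, a, signal, s_next, alpha=0.1, gamma=0.9):
    """Assumed TD(0)-style update that splits a scalar signal into two channels.

    reward_eval and punish_eval are per-state lists of per-action values.
    """
    gain = max(signal, 0.0)   # positive part feeds the reward evaluation
    loss = max(-signal, 0.0)  # magnitude of the negative part feeds punishment
    # Bootstrap both channels from the next state's best action by preference.
    nxt = max(range(len(reward_eval[s_next])),
              key=lambda b: reward_eval[s_next][b] - punish_eval[s_next][b])
    reward_eval[s][a] += alpha * (gain + gamma * reward_eval[s_next][nxt] - reward_eval[s][a])
    punish_eval[s][a] += alpha * (loss + gamma * punish_eval[s_next][nxt] - punish_eval[s][a])


# Usage: an untried state explores uniformly; a well-evaluated state with the
# same preference gap (1.0) puts more probability on the preferred action.
rng = random.Random(0)
_, p_new = select_action([0.0, 0.0], [0.0, 0.0], rng=rng)
_, p_few = select_action([1.0, 0.0], [0.0, 0.0], rng=rng)
_, p_many = select_action([5.0, 4.0], [4.0, 4.0], rng=rng)
print(p_new, p_few[0], p_many[0])
```

A note on this design choice: because `gain - loss` equals the signal and both channels bootstrap from the same greedy next action, the difference `reward - punishment` follows the ordinary Q-learning target. The extra information carried by the two-channel representation is therefore the sum, which in this sketch is used only to set the temperature.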