Two Dimensional Evaluation Reinforcement Learning

  • Hiroyuki Okada
  • Hiroshi Yamakawa
  • Takashi Omori
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2084)


To address the tradeoff between exploration and exploitation actions in reinforcement learning, the authors have proposed two-dimensional evaluation reinforcement learning, which forecasts reward evaluation and punishment evaluation separately. The proposed method uses the difference between the reward evaluation and the punishment evaluation as the factor for determining the action, and their sum as a parameter for determining the ratio of exploration to exploitation. In this paper we describe an experiment in which a mobile robot searches for a path and faces the resulting conflict between exploration and exploitation actions. The results of the experiment show that the proposed method, which uses the two dimensions of reward and punishment, can generate a better path than the conventional reinforcement learning method.
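The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax form, the function names, the `base_temperature` and `gain` parameters, and in particular the assumption that a larger sum of evaluations means more exploration are all hypothetical choices made for illustration. The abstract only states that the difference drives the choice of action and the sum sets the exploration ratio.

```python
import math
import random


def action_probabilities(reward, punishment, base_temperature=1.0, gain=1.0):
    """Softmax over (reward - punishment) per action.

    The softmax temperature is derived from the summed evaluations.
    Assumption (not from the paper): evaluations are non-negative and a
    larger total raises the temperature, i.e. produces more exploration.
    """
    if len(reward) != len(punishment) or not reward:
        raise ValueError("reward and punishment must be non-empty and equal length")
    if min(reward) < 0 or min(punishment) < 0:
        raise ValueError("this sketch assumes non-negative evaluations")

    # Difference of the two evaluations: decides which action is preferred.
    preferences = [r - p for r, p in zip(reward, punishment)]
    # Sum of the two evaluations: decides how much to explore.
    total = sum(r + p for r, p in zip(reward, punishment))
    temperature = base_temperature * (1.0 + gain * total)

    # Subtract the maximum preference for numerical stability.
    top = max(preferences)
    weights = [math.exp((d - top) / temperature) for d in preferences]
    norm = sum(weights)
    return [w / norm for w in weights]


def choose_action(reward, punishment, rng=random, **kwargs):
    """Sample an action index from the probabilities above."""
    probs = action_probabilities(reward, punishment, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With two actions whose differences are equal, for example `reward=[1.0, 0.0], punishment=[0.0, 0.0]` versus `reward=[3.0, 2.0], punishment=[2.0, 2.0]`, the second case has the larger sum, so under the stated assumption its distribution is flatter and the agent explores more.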


Artificial Intelligence, Mobile Robot, Problem Complexity, Learning Method, Exploitation Action
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Hiroyuki Okada (1)
  • Hiroshi Yamakawa (1)
  • Takashi Omori (2)
  1. Real World Computing Partnership, Chiba, Japan
  2. Hokkaido University, Sapporo, Japan