Fuzzy reinforcement learning and dynamic programming

  • Hamid R. Berenji
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 847)


In this paper, we develop a new algorithm called Fuzzy Q-Learning (FQ-Learning), which extends Watkins' Q-Learning method. It can be used for decision processes in which the goals and/or the constraints, but not necessarily the system under control, are fuzzy in nature. An example of a fuzzy constraint is: “the weight of object A must not be substantially heavier than w”, where w is a specified weight. Similarly, an example of a fuzzy goal is: “the robot must be in the vicinity of door k”. We show that FQ-Learning provides an alternative solution to such problems that is simpler than Bellman and Zadeh's fuzzy dynamic programming approach. We apply the algorithm to a multistage decision-making problem.
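To make the idea of learning under a fuzzy goal and a fuzzy constraint concrete, the sketch below shows a minimal tabular Q-learning loop in which the scalar reward is the degree to which a fuzzy goal and a fuzzy constraint are jointly satisfied (their minimum, in the spirit of Bellman and Zadeh's confluence of goals and constraints). This is an illustrative assumption, not the paper's FQ-Learning algorithm; the toy environment, membership functions, and parameters are hypothetical.

```python
# Hypothetical sketch: Watkins-style Q-learning where the terminal reward is the
# joint satisfaction (min) of a fuzzy goal and a fuzzy constraint.
import random

N_STATES, N_ACTIONS = 10, 2            # toy multistage problem: move left/right on a line
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration rate

def mu_goal(state):
    """Fuzzy goal 'be in the vicinity of state 7': membership decays with distance."""
    return max(0.0, 1.0 - abs(state - 7) / 3.0)

def mu_constraint(n_steps):
    """Fuzzy constraint 'do not take substantially more than 5 steps'."""
    return max(0.0, 1.0 - max(0, n_steps - 5) / 5.0)

def step(state, action):
    """Deterministic toy dynamics: action 1 moves right, action 0 moves left."""
    return min(N_STATES - 1, max(0, state + (1 if action == 1 else -1)))

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(2000):
    state, steps = 0, 0
    for _ in range(12):                       # finite-horizon episode
        if random.random() < EPSILON:         # epsilon-greedy exploration
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        nxt = step(state, action)
        steps += 1
        done = mu_goal(nxt) >= 1.0            # episode ends when the goal is fully met
        # Reward = joint satisfaction of fuzzy goal and fuzzy constraint (their min),
        # granted only at termination; zero otherwise.
        reward = min(mu_goal(nxt), mu_constraint(steps)) if done else 0.0
        target = reward if done else reward + GAMMA * max(Q[nxt])
        Q[state][action] += ALPHA * (target - Q[state][action])   # Q-learning update
        state = nxt
        if done:
            break

print("Greedy action per state:", [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

Under these assumptions, the learned greedy policy moves toward the goal state while the step-count constraint shapes how much reward late arrivals receive; the min-combination is one simple way to fold fuzzy goals and constraints into a single reinforcement signal.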


References

  1. A. G. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Submitted to AI Journal special issue on Computational Theories of Interaction and Agency, 1993.
  2. A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13:834–846, 1983.
  3. R. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
  4. R. E. Bellman and L. A. Zadeh. Decision-making in a fuzzy environment. Management Science, 17(4):B-141–B-164, 1970.
  5. H. R. Berenji and P. Khedkar. Learning and tuning fuzzy logic controllers through reinforcements. IEEE Transactions on Neural Networks, 3(5), 1992.
  6. H. R. Berenji, Y. Jani, R. N. Lea, P. Khedkar, A. Malkani, and J. Hoblit. Space shuttle attitude control by fuzzy logic and reinforcement learning. In Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, March 1993.
  7. L.-J. Lin. Programming robots using reinforcement learning and teaching. In Proceedings of the Ninth National Conference on Artificial Intelligence, 1991.
  8. A. Moore and C. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, to appear.
  9. R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.
  10. R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, 1990.
  11. G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257–277, 1992.
  12. G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215–219, 1994.
  13. C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8:279–292, 1992.
  14. C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, Psychology Department, 1989.

Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Hamid R. Berenji
    1. Intelligent Inference Systems Corp., Artificial Intelligence Research Branch, MS 269-2, NASA Ames Research Center, Mountain View, CA
