Cooperativity in Networks of Pattern Recognizing Stochastic Learning Automata

  • Andrew G. Barto
  • P. Anandan
  • Charles W. Anderson

Abstract

A class of learning tasks is described that combines aspects of learning automaton tasks and supervised learning pattern-classification tasks. We call these associative reinforcement learning tasks. An algorithm is presented, called the associative reward-penalty, or AR−P, algorithm, for which a form of optimal performance has been proved. This algorithm simultaneously generalizes a class of stochastic learning automata and a class of supervised learning pattern-classification methods. Simulation results are presented that illustrate the associative reinforcement learning task and the performance of the AR−P algorithm. Additional simulation results show how cooperative activity in networks of interconnected AR−P automata can solve difficult nonlinear associative learning problems.
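To make the associative reinforcement learning setting concrete, the following is a minimal sketch of a single AR−P-style unit on a two-context task. The task, the logistic action-probability function, and all constants (`rho`, `lam`, the reward probabilities) are illustrative assumptions, not values taken from the paper: the environment presents one of two context vectors, the unit emits a binary action, and a success/failure signal arrives with a context- and action-dependent probability.

```python
import math
import random

def train_arp_unit(steps=5000, rho=0.5, lam=0.02, seed=0):
    """Train one AR-P-style unit on an illustrative two-context task."""
    rng = random.Random(seed)
    w = [0.0, 0.0]  # one weight per context feature

    # P(reward | context, action): in context 0 action +1 is better,
    # in context 1 action -1 is better (hypothetical values).
    p_reward = {0: {+1: 0.9, -1: 0.1}, 1: {+1: 0.1, -1: 0.9}}

    for _ in range(steps):
        ctx = rng.randrange(2)
        x = [1.0, 0.0] if ctx == 0 else [0.0, 1.0]
        s = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-s))        # P(y = +1 | x)
        y = +1 if rng.random() < p else -1    # stochastic action
        rewarded = rng.random() < p_reward[ctx][y]

        e_y = 2.0 * p - 1.0                   # expected action E[y | x]
        # Reward: move E[y|x] toward the action just emitted.
        # Penalty: move it toward the opposite action, attenuated by lam.
        step = rho * (y - e_y) if rewarded else rho * lam * (-y - e_y)
        for i in range(len(w)):
            w[i] += step * x[i]
    return w
```

After training, the weight for context 0 should be positive (favoring action +1) and the weight for context 1 negative. Note the role of `lam`: with `lam = 0` the scheme ignores penalties entirely (reward-inaction), while `lam = 1` weights failures as heavily as successes; the small intermediate value is what distinguishes the reward-penalty family.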



Copyright information

© Springer Science+Business Media New York 1986

Authors and Affiliations

  • Andrew G. Barto
  • P. Anandan
  • Charles W. Anderson

  Department of Computer and Information Science, University of Massachusetts, Amherst, MA, USA
