ICANN ’94, pp 651–654

A Comparison Study of Unbounded and Real-valued Reinforcement Associative Reward-Penalty Algorithms

  • R. Neville
  • T. J. Stonham

Abstract

A comparison study was carried out between two Associative Reward-Penalty, or A_R-P, algorithms. Both regimes solve nonlinear supervised learning tasks using multi-layer feedforward networks. We introduce a variant of the A_R-P algorithm, called the ’Unbounded’ reinforcement A_R-P algorithm, and compare it with the real-valued reinforcement A_R-P algorithm. The ’Unbounded’ reinforcement method utilises a quantised real-valued reinforcement, a payoff metric optimised by an Associated Critic Net.
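As context for the update rule being compared, the following is a minimal sketch of the classic associative reward-penalty rule for a single stochastic binary unit, in Barto and Anandan's general form. It is illustrative only: the function names, constants, and the OR training task are assumptions, not taken from the paper, and the paper's multi-layer networks and critic net are not reproduced here. A binary reinforcement r is used, but a real-valued r in [0, 1] (for example, one supplied by a critic net) drops into the same expression.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def arp_step(w, x, target, rho=0.5, lam=0.02):
    """One A_R-P update for a single stochastic binary unit.

    rho is the learning rate; lam scales the penalty term relative
    to the reward term (lam = 0 gives the pure reward-only rule).
    """
    p = sigmoid(np.dot(w, x))             # firing probability
    y = 1.0 if rng.random() < p else 0.0  # stochastic binary output
    r = 1.0 if y == target else 0.0       # reinforcement: 1 = reward, 0 = penalty
    # Reward moves p toward the emitted action; penalty (scaled by lam)
    # moves p toward the complementary action.
    w = w + rho * (r * (y - p) + lam * (1.0 - r) * ((1.0 - y) - p)) * x
    return w

# Train one unit on logical OR (inputs augmented with a bias of 1).
patterns = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
w = np.zeros(3)
for _ in range(10000):
    x, t = patterns[rng.integers(len(patterns))]
    w = arp_step(w, np.array(x, dtype=float), t)
```

Because a single unit is linearly separable-limited, OR is chosen as a task it can solve; the paper's interest is in extending such rules to multi-layer feedforward networks, where the reinforcement signal replaces back-propagated error.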

Keywords

Output Error · Training Vector · Input Address · Boltzmann Machine · Training · Artificial Neural Network


Copyright information

© Springer-Verlag London Limited 1994

Authors and Affiliations

  • R. Neville (1)
  • T. J. Stonham (1)

  1. Dept. of Electrical Engineering, Brunel University, Uxbridge, Middx, UK
