A Comparison Study of Unbounded and Real-valued Reinforcement Associative Reward-Penalty Algorithms
This study compares two Associative Reward-Penalty, or A R-P, algorithms. Both solve nonlinear supervised learning tasks using multi-layer feedforward networks. We introduce a variant of the A R-P algorithm, called the ‘Unbounded’ reinforcement A R-P algorithm, and compare it with the real-valued reinforcement A R-P algorithm. The ‘Unbounded’ reinforcement method uses a quantised real-valued reinforcement, which is a payoff metric optimised by an Associated Critic Net.