A Sequential Training Strategy for Locally Recurrent Neural Networks
In locally recurrent neural networks, the output of a dynamic neuron is fed back only to that neuron itself. This particular structure makes it possible to train the network sequentially. A sequential orthogonal training method is developed in this chapter to train locally recurrent neural networks. The networks considered here contain a single hidden layer, in which the dynamic neurons are located. During network training, the first hidden neuron is used to model the relationship between the inputs and the outputs, while subsequent hidden neurons are added sequentially to model the relationship between the inputs and the model residuals. When a hidden neuron is added, its contribution is due to the part of its output vector that is orthogonal to the space spanned by the output vectors of the previous hidden neurons. The Gram-Schmidt orthogonalisation technique is used at each training step to form an orthogonal basis for the space spanned by the hidden neuron outputs. The optimum hidden layer weights can be obtained through a gradient-based optimisation method, while the output layer weights can be found using least squares regression. Hidden neurons are added sequentially and the training procedure terminates when the model error falls below a predefined level. Using this training method, the necessary number of hidden neurons can be determined and the problem of overfitting is thereby avoided. Neurons with mixed types of activation functions and dynamic orders can be incorporated into a single network. Such mixed-node networks can offer improved representation capability and greater parsimony in network size. The excellent performance of the proposed technique is demonstrated by application examples.
Keywords: Hidden Neuron, Model Predictive Control, Recurrent Neural Network, Distillation Column, Continuous Stirred Tank Reactor
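The sequential orthogonal procedure outlined above can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions, not the chapter's implementation: it uses a first-order self-feedback tanh neuron as the locally recurrent unit, replaces the gradient-based optimisation of the hidden layer weights with a simple random candidate search, and omits the final back-substitution that recovers the output layer weights in the original (non-orthogonal) basis. All function and variable names are illustrative.

```python
# Minimal sketch of sequential orthogonal training for a locally recurrent,
# single-hidden-layer network.  Assumptions: first-order output self-feedback,
# tanh activation, random candidate search instead of gradient optimisation.
import numpy as np

rng = np.random.default_rng(0)


def dynamic_neuron_output(U, w, b, a):
    """Locally recurrent neuron: its output is fed back only to itself.
    v(t) = tanh(w . u(t) + b + a * v(t-1)), with v(0) = 0."""
    v = np.zeros(len(U))
    prev = 0.0
    for t in range(len(U)):
        prev = np.tanh(U[t] @ w + b + a * prev)
        v[t] = prev
    return v


def train_sequentially(U, y, max_neurons=10, tol=1e-3, n_candidates=200):
    """Add hidden neurons one at a time.  Each new neuron is fitted to the
    current residual; only the part of its output orthogonal to the span of
    the previous neurons' outputs (Gram-Schmidt) contributes."""
    d = U.shape[1]
    residual = y.copy()
    Q = []          # orthonormal basis of hidden-neuron output vectors
    neurons = []    # (w, b, a) parameters of the accepted neurons
    for _ in range(max_neurons):
        best = None
        # Crude stand-in for gradient-based optimisation of hidden weights:
        # draw random candidates and keep the one explaining most residual.
        for _ in range(n_candidates):
            w = rng.normal(size=d)
            b, a = rng.normal(), rng.uniform(-0.9, 0.9)
            v = dynamic_neuron_output(U, w, b, a)
            # Gram-Schmidt: remove the components along previous outputs.
            for q in Q:
                v = v - (q @ v) * q
            norm = np.linalg.norm(v)
            if norm < 1e-8:
                continue                      # nothing new in this candidate
            q = v / norm
            gain = (q @ residual) ** 2        # residual energy it can remove
            if best is None or gain > best[0]:
                best = (gain, w, b, a, q)
        if best is None:
            break
        _, w, b, a, q = best
        neurons.append((w, b, a))
        Q.append(q)
        residual = residual - (q @ residual) * q   # least-squares update
        if np.mean(residual ** 2) < tol:           # stop: error low enough
            break
    return neurons, residual


# Toy usage: identify a simple nonlinear mapping from input-output data.
t = np.arange(400)
U = np.column_stack([np.sin(0.05 * t), np.cos(0.02 * t)])
y = np.tanh(1.5 * U[:, 0]) + 0.3 * U[:, 1] ** 2
neurons, residual = train_sequentially(U, y)
print(f"hidden neurons used: {len(neurons)}, final MSE: {np.mean(residual**2):.4f}")
```

Because each accepted neuron removes only the residual component along a new orthogonal direction, the stopping rule on the model error directly controls how many hidden neurons are added, mirroring the termination criterion described above.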