Annealed RNN learning of finite state automata
In recurrent neural network (RNN) learning of finite state automata (FSA), we discuss how the neuro gain β influences the stability of the state representation and the performance of learning. We formally show the existence of a critical neuro gain β0: any β larger than β0 makes an RNN maintain a stable representation of the states of an acquired FSA. Considering the existence of β0 and the need to avoid local minima, we propose a new RNN learning method that schedules β during training, called annealed RNN learning. Our experiments show that annealed RNN learning outperformed learning with a constant β.
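To make the role of β concrete, here is a minimal Python sketch under stated assumptions. It uses a toy single self-connected sigmoid neuron, x ← σ(βw(x − 1/2)), not the paper's full RNN. For this toy the critical gain is β0 = 4/w, because the slope of the map at the central fixed point x = 1/2 is βw/4. Below β0 every trajectory collapses to 1/2. Above it, two stable, nearly saturated fixed points appear, which is the kind of stable binary state code the abstract describes. The function names and the geometric schedule in `annealed_beta` are hypothetical illustrations, not the authors' schedule.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def critical_gain(w: float) -> float:
    """Toy critical gain for x <- sigmoid(beta * w * (x - 0.5)), with w > 0.

    The slope of the map at the central fixed point x = 0.5 is beta * w / 4,
    so that point loses stability (and two saturated fixed points appear)
    once beta exceeds 4 / w. This is a toy model, not the paper's theorem.
    """
    return 4.0 / w


def iterate_neuron(beta: float, w: float, x0: float, steps: int = 200) -> float:
    """Run the toy self-connected neuron for `steps` updates starting from x0."""
    x = x0
    for _ in range(steps):
        x = sigmoid(beta * w * (x - 0.5))
    return x


def annealed_beta(epoch: int, n_epochs: int,
                  beta_start: float = 0.5, beta_end: float = 8.0) -> float:
    """Hypothetical geometric schedule: beta_start at epoch 0, beta_end at the last epoch."""
    if n_epochs <= 1:
        return beta_end
    frac = epoch / (n_epochs - 1)
    return beta_start * (beta_end / beta_start) ** frac


if __name__ == "__main__":
    w = 2.0
    b0 = critical_gain(w)  # 2.0 for w = 2
    for beta in (1.0, 5.0):
        hi = iterate_neuron(beta, w, 0.6)
        lo = iterate_neuron(beta, w, 0.4)
        print(f"beta={beta}: from 0.6 -> {hi:.4f}, from 0.4 -> {lo:.4f}")
    # Assumed rationale: low gain early in training, gain above beta0 by the end.
    for epoch in range(0, 10, 3):
        beta = annealed_beta(epoch, 10)
        print(f"epoch {epoch}: beta={beta:.3f}, above beta0: {beta > b0}")
```

With β = 1 (below β0 = 2) both starting points converge to 0.5, so the two states merge. With β = 5 they separate toward values near 1 and near 0. Ending the schedule above β0 mirrors the abstract's point that a sufficiently large gain keeps acquired states stable. How fast β should grow is a tuning choice this sketch does not settle.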