Inferring stochastic regular grammars with recurrent neural networks
Recent work has shown that the extraction of symbolic rules improves the generalization performance of recurrent neural networks trained with complete (positive and negative) samples of regular languages. This paper explores the possibility of inferring the rules of the language when the network is trained instead with stochastic, positive-only data. For this purpose, a recurrent network with two layers is used. If, instead of using the network itself, an automaton is extracted from the network after training and the transition probabilities of the extracted automaton are estimated from the sample, the relative entropy with respect to the true distribution is reduced.
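The two quantities the abstract relies on can be made concrete. Below is a minimal sketch, not the paper's actual procedure: it estimates the transition (and end-of-string) probabilities of a given deterministic automaton by maximum-likelihood frequency counts over a positive-only sample, and computes the relative entropy (Kullback-Leibler divergence) between two distributions over a finite support. All names (`estimate_transition_probs`, `relative_entropy`, the `'#'` end marker) are illustrative assumptions.

```python
import math
from collections import defaultdict

def estimate_transition_probs(dfa, start, sample):
    """Maximum-likelihood estimate of the transition probabilities of a
    deterministic automaton from a positive-only sample.

    dfa    -- dict mapping (state, symbol) -> next state
    start  -- initial state
    sample -- iterable of strings accepted by the automaton
    """
    counts = defaultdict(lambda: defaultdict(int))
    for string in sample:
        state = start
        for symbol in string:
            counts[state][symbol] += 1
            state = dfa[(state, symbol)]
        counts[state]['#'] += 1  # '#' marks end of string at the final state
    # Normalize counts at each state into a probability distribution.
    return {state: {sym: n / sum(outs.values()) for sym, n in outs.items()}
            for state, outs in counts.items()}

def relative_entropy(p, q):
    """D(p || q) = sum_x p(x) * log2(p(x) / q(x)) over a finite support."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)
```

For example, with `dfa = {(0, 'a'): 0, (0, 'b'): 1}` and the sample `['ab', 'aab', 'b']`, the estimate at state 0 assigns probability 0.5 to each of `'a'` and `'b'`; `relative_entropy` is zero when both arguments are the same distribution and positive otherwise, so a smaller value against the true distribution indicates a better model.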
Keywords: Neuron · Relative Entropy · Recurrent Neural Network · Generalization Performance · Regular Language