Abstract
In this paper, we propose techniques for injecting finite state automata into Recurrent Radial Basis Function networks (R2BF). We show that, when proper hints are provided and the weight space is suitably constrained, these networks behave as automata. We also suggest a technique for forcing the learning process to develop automata representations, based on adding a proper penalty function to the ordinary cost. Successful experimental results are reported for inductive inference of regular grammars.
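The penalty-function idea described above can be sketched in code. The following is a minimal illustration, not the paper's exact formulation: it assumes a quadratic penalty that vanishes only when each recurrent state activation saturates at 0 or 1, so that minimizing the augmented cost pushes the network's continuous dynamics toward the discrete states of an automaton. The function names and the weighting parameter `lam` are hypothetical.

```python
import numpy as np

def automaton_penalty(states):
    """Penalty that is zero only when every state activation is 0 or 1.

    Intermediate (non-saturated) activations are penalized, encouraging
    the recurrent state units to behave like discrete automaton states.
    """
    states = np.asarray(states, dtype=float)
    return float(np.sum((states * (1.0 - states)) ** 2))

def total_cost(task_cost, states, lam=0.1):
    """Ordinary cost plus the automaton-forcing penalty, weighted by lam."""
    return task_cost + lam * automaton_penalty(states)

# Saturated states incur no penalty; fuzzy states do.
assert automaton_penalty([0.0, 1.0, 1.0, 0.0]) == 0.0
assert automaton_penalty([0.5, 0.3, 0.9, 0.1]) > 0.0
```

During training, the gradient of `total_cost` would be backpropagated in place of the ordinary cost's gradient, trading a small amount of task error for more automaton-like internal representations.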
Frasconi, P., Gori, M., Maggini, M. et al. Representation of finite state automata in Recurrent Radial Basis Function networks. Mach Learn 23, 5–32 (1996). https://doi.org/10.1007/BF00116897