Abstract
Neural networks have shown promising performance on tasks such as classification, but they still suffer from an insufficient focus on the structure of the knowledge they represent. In this paper, we analyze various knowledge extraction techniques in detail, and we develop new transducer extraction techniques for interpreting recurrent neural network learning. First, we provide an overview of different ways to express structured knowledge using neural networks. Then, we rigorously analyze one type of recurrent network, applying a broad range of techniques. We argue that analysis techniques such as weight analysis using Hinton diagrams, hierarchical cluster analysis, and principal component analysis may be useful for providing certain views of the underlying knowledge. However, we demonstrate that these techniques are too static and too low-level for interpreting recurrent network classifications. The contribution of this paper is a particularly broad analysis of knowledge extraction techniques. Furthermore, we propose dynamic learning analysis and transducer extraction as two new dynamic interpretation techniques. Dynamic learning analysis provides a better understanding of how the network learns, while transducer extraction provides a better understanding of what the network represents.
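To give a flavor of the kind of transducer extraction the abstract refers to, the sketch below shows one common family of approaches (in the spirit of state-space quantization, not necessarily the paper's exact algorithm): recorded hidden-state trajectories of a trained recurrent network are discretized into a finite set of states, from which a transition/output table of a finite-state transducer is read off. The trajectory data here is a hypothetical stand-in for activations a trained network would produce; `quantize`, `extract_transducer`, and the grid size are illustrative names and choices, not part of the original paper.

```python
# Illustrative sketch of transducer extraction by quantizing the
# continuous hidden-state space of a recurrent network into discrete
# states, then tabulating observed (state, input) -> (next state, output)
# transitions. All data below is a toy stand-in for real activations.

def quantize(state, grid=0.5):
    """Map a continuous hidden-state vector to a discrete grid cell."""
    return tuple(round(x / grid) for x in state)

def extract_transducer(trajectories, grid=0.5):
    """Build a finite-state transducer table from recorded trajectories.

    Each trajectory is a list of steps
    (hidden_state, input_symbol, next_hidden_state, output_symbol).
    Returns {(discrete_state, input): (next_discrete_state, output)}.
    """
    table = {}
    for steps in trajectories:
        for hidden, sym, hidden_next, out in steps:
            key = (quantize(hidden, grid), sym)
            # Keep the first observed transition per (state, input) pair;
            # conflicting entries would signal that the grid is too coarse.
            table.setdefault(key, (quantize(hidden_next, grid), out))
    return table

# Hypothetical recorded steps from a "trained" two-unit network.
traj = [[
    ((0.1, 0.9), "a", (0.8, 0.2), "X"),
    ((0.8, 0.2), "b", (0.1, 0.9), "Y"),
]]
fst = extract_transducer(traj)
```

With this toy data the two continuous states collapse to the discrete cells `(0, 2)` and `(2, 0)`, yielding a two-state transducer that emits "X" on input "a" and "Y" on input "b". The grid size trades off fidelity against the number of extracted states, which is exactly the kind of choice such extraction methods must make.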
Wermter, S. Knowledge Extraction from Transducer Neural Networks. Applied Intelligence 12, 27–42 (2000). https://doi.org/10.1023/A:1008320219610