Machine Learning, Volume 13, Issue 1, pp 71–101

Extracting refined rules from knowledge-based neural networks

  • Geoffrey G. Towell
  • Jude W. Shavlik

Abstract

Neural networks, despite their empirically proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge must be inserted into a neural network. Second, the network must be refined. Third, the refined knowledge must be extracted from the network. We have previously described a method for the first step of this process. Standard neural learning techniques can accomplish the second step. In this article, we propose and empirically evaluate a method for the final, and possibly most difficult, step. Our method efficiently extracts symbolic rules from trained neural networks. The four major results of empirical tests of this method are that the extracted rules 1) closely reproduce the accuracy of the network from which they are extracted; 2) are superior to the rules produced by methods that directly refine symbolic rules; 3) are superior to those produced by previous techniques for extracting rules from trained neural networks; and 4) are “human comprehensible.” Thus, this method demonstrates that neural networks can be used to effectively refine symbolic knowledge. Moreover, the rule-extraction technique developed herein contributes to the understanding of how symbolic and connectionist approaches to artificial intelligence can be profitably integrated.
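The extraction step the abstract describes works, at a high level, by turning each trained unit back into a symbolic rule: weights with similar values are grouped, each weight is replaced by its group's average, insignificant groups are discarded, and the unit is then read off as a threshold ("M-of-N"-style) rule over its remaining inputs. The sketch below illustrates that idea for a single unit; the clustering tolerance, significance cutoff, and all names are illustrative assumptions, not the paper's actual algorithm or parameters.

```python
# Illustrative sketch: recover a threshold rule from one trained unit.
# `tol` (weight-grouping tolerance) and `eps` (significance cutoff) are
# assumed values for demonstration only.

def cluster_weights(weights, tol=0.25):
    """Greedily group indices of weights whose values lie within
    `tol` of an existing group's mean (a stand-in for real clustering)."""
    groups = []
    for i, w in sorted(enumerate(weights), key=lambda p: p[1]):
        for g in groups:
            mean = sum(weights[j] for j in g) / len(g)
            if abs(w - mean) <= tol:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def extract_rule(weights, bias, names, tol=0.25, eps=0.3):
    """Replace each weight by its cluster average, drop insignificant
    antecedents, and return (antecedents, threshold): the unit 'fires'
    when the weighted sum of true antecedents exceeds the threshold."""
    avg = {}
    for g in cluster_weights(weights, tol):
        mean = sum(weights[j] for j in g) / len(g)
        for j in g:
            avg[j] = mean
    kept = [j for j in range(len(weights)) if abs(avg[j]) >= eps]
    antecedents = [(names[j], round(avg[j], 2)) for j in kept]
    return antecedents, -bias

# Three near-identical weights collapse to one cluster of weight 1.0 and
# the weak fourth input is dropped, leaving a "3-of-{a, b, c}" rule.
antecedents, threshold = extract_rule(
    [1.1, 0.9, 1.0, -0.1], bias=-2.5, names=["a", "b", "c", "d"])
```

Because the surviving antecedents share a single cluster weight, the rule reads naturally in M-of-N form: the unit is satisfied when at least three of {a, b, c} are true.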

Keywords

theory refinement, integrated learning, representational shift, rule extraction from neural networks


Copyright information

© Kluwer Academic Publishers 1993

Authors and Affiliations

  • Geoffrey G. Towell (1)
  • Jude W. Shavlik (1)

  1. University of Wisconsin, Madison