The Extraction of Information and Knowledge from Trained Neural Networks

  • David J. Livingstone
  • Antony Browne
  • Raymond Crichton
  • Brian D. Hudson
  • David Whitley
  • Martyn G. Ford
Part of the Methods in Molecular Biology™ book series (MIMB, volume 458)

Abstract

In the past, neural networks were viewed as classification and regression systems whose internal representations were incomprehensible. It is now becoming apparent that algorithms can be designed to extract comprehensible representations from trained neural networks, enabling them to be used for data mining and knowledge discovery, that is, the discovery and explanation of previously unknown relationships present in data. This chapter reviews existing algorithms for extracting comprehensible representations from neural networks and outlines research to generalize and extend the capabilities of one of these algorithms, TREPAN. The generalized algorithm has been applied to bioinformatics data sets, including the prediction of splice site junctions in human DNA sequences, and to cheminformatics data sets. The results obtained on these data sets are compared with those generated by a conventional data mining technique (C5), and conclusions are drawn from the comparison.
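TREPAN grows a decision tree from a trained network using M-of-N splitting tests, where a test is satisfied when at least m of its n Boolean conditions hold. The chapter's own implementation is not reproduced here. What follows is only a minimal sketch of how such a rule is evaluated. The function name `m_of_n` and the nucleotide-position conditions are hypothetical illustrations, not rules extracted in the chapter.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def m_of_n(m: int, conditions: Sequence[Callable[[T], bool]], example: T) -> bool:
    """Return True when at least m of the n conditions hold for the example."""
    if not 0 <= m <= len(conditions):
        raise ValueError("m must lie between 0 and n (the number of conditions)")
    # Count how many of the n conditions the example satisfies.
    satisfied = sum(1 for cond in conditions if cond(example))
    return satisfied >= m


# Hypothetical conditions on fixed positions of a DNA window (illustrative only).
conditions = [
    lambda seq: seq[3] == "G",
    lambda seq: seq[4] == "T",
    lambda seq: seq[5] == "A",
]

print(m_of_n(2, conditions, "CAGGTAAG"))  # all 3 conditions hold, prints True
print(m_of_n(2, conditions, "CAGCTCAG"))  # only 1 condition holds, prints False
```

An M-of-N test is more compact than the equivalent disjunction of conjunctions. For example, "2 of 3" stands in for three separate two-condition conjunctions, which is one reason such rules can remain readable.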

Keywords

Bioinformatics, cheminformatics, rule extraction, C5, TREPAN, data mining, M-of-N rule

Notes

Acknowledgement

This chapter is dedicated to our dear friend and colleague, Martyn, who passed away after a brave fight with cancer on June 7, 2007.


Copyright information

© Humana Press, a part of Springer Science + Business Media, LLC 2008

Authors and Affiliations

  • David J. Livingstone (1)
  • Antony Browne (2)
  • Raymond Crichton (3)
  • Brian D. Hudson (4)
  • David Whitley (5)
  • Martyn G. Ford
  1. ChemQuest, Sandown, UK, and Centre for Molecular Design, University of Portsmouth, Portsmouth, UK
  2. Department of Computing, School of Engineering and Physical Sciences, University of Surrey, Guildford, UK
  3. Centre for Molecular Design, University of Portsmouth, Portsmouth, UK
  4. Centre for Molecular Design, University of Portsmouth, UK
  5. Centre for Molecular Design, University of Portsmouth, Portsmouth, UK
