Rule Extraction from Support Vector Machines: An Overview of Issues and Application in Credit Scoring

  • David Martens
  • Johan Huysmans
  • Rudy Setiono
  • Jan Vanthienen
  • Bart Baesens
Part of the Studies in Computational Intelligence book series (SCI, volume 80)

Summary

Innovative storage technology and the rising popularity of the Internet have generated an ever-growing amount of data. Much valuable knowledge is hidden within this vast amount of data. The Support Vector Machine (SVM) is a state-of-the-art classification technique that generally provides accurate models, as it is able to capture non-linearities in the data. However, this strength is also its main weakness: the generated non-linear models are typically regarded as incomprehensible black-box models. By extracting rules that mimic the black box as closely as possible, we can provide some insight into the logic of the SVM model. This explanation capability is of crucial importance in any domain where the model needs to be validated before being implemented, such as credit scoring (loan default prediction) and medical diagnosis. If the SVM is the current state of the art, SVM rule extraction may well be the state of the art of the (near) future. This chapter provides an overview of recently proposed SVM rule extraction techniques, complemented with pedagogical Artificial Neural Network (ANN) rule extraction techniques that are also suitable for SVMs. The issues related to this topic include the different rule outputs and their corresponding expressiveness; the focus on high-dimensional data, on which SVM models typically perform well; and the requirement that the extracted rules be consistent with existing domain knowledge. These issues are explained and further illustrated with a credit scoring case, in which we extract a Trepan tree and a RIPPER rule set from the generated SVM model. The benefit of decision tables in a rule extraction context is also demonstrated. Finally, some interesting alternatives to SVM rule extraction are listed.
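
The following sketch makes the pedagogical approach concrete. It is a minimal illustration in Python, assuming scikit-learn and a synthetic dataset as a stand-in for real credit scoring data; the CART decision tree serves only as an illustrative surrogate, not as the actual Trepan or RIPPER implementation discussed in the chapter. The surrogate is trained on the SVM's predictions rather than on the original labels, so that its rules mimic the black box; fidelity, the agreement between the extracted rules and the SVM, is then measured on held-out data.

```python
# Minimal sketch of pedagogical rule extraction from an SVM:
# (1) train the opaque SVM, (2) relabel the data with its
# predictions, (3) induce a transparent surrogate model on those
# predictions, (4) measure fidelity to the black box.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for a credit scoring dataset (hypothetical).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The black box: an RBF-kernel SVM that captures non-linearities.
svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

# Pedagogical step: the tree learns the SVM's input-output behaviour,
# not the original labels (a CART tree stands in for Trepan/RIPPER).
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, svm.predict(X_train))

# Fidelity: how closely the extracted rules mimic the SVM.
fidelity = (surrogate.predict(X_test) == svm.predict(X_test)).mean()
print(f"Fidelity to the SVM on test data: {fidelity:.3f}")
print(export_text(surrogate))  # the extracted, human-readable rules
```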

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • David Martens (1)
  • Johan Huysmans (1)
  • Rudy Setiono (2)
  • Jan Vanthienen (1)
  • Bart Baesens (1, 3)

  1. Department of Decision Sciences and Information Management, K.U.Leuven, Leuven, Belgium
  2. School of Computing, National University of Singapore, Singapore
  3. School of Management, University of Southampton, Highfield, Southampton, UK
