Data Mining and Knowledge Discovery, Volume 21, Issue 1, pp 52–90

ENDER: a statistical framework for boosting decision rules

  • Krzysztof Dembczyński
  • Wojciech Kotłowski
  • Roman Słowiński

Abstract

Induction of decision rules plays an important role in machine learning. The main advantage of decision rules is their simplicity and human-interpretable form. Moreover, they are capable of modeling complex interactions between attributes. In this paper, we thoroughly analyze a learning algorithm, called ENDER, which constructs an ensemble of decision rules. This algorithm is tailored for regression and binary classification problems. It uses the boosting approach for learning, which can be treated as a generalization of sequential covering. Each new rule is fitted by focusing on the examples that were hardest to classify correctly by the rules already present in the ensemble. We consider different loss functions and minimization techniques often encountered in the boosting framework. The minimization techniques are used to derive impurity measures that control the construction of single decision rules. Properties of four different impurity measures are analyzed with respect to the trade-off between misclassification (discrimination) and coverage (completeness) of the rule. Moreover, we consider regularization consisting of shrinkage and sampling. Finally, we compare the ENDER algorithm with other well-known decision rule learners, such as SLIPPER, LRI and RuleFit.
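
To make the learning loop concrete, the following is a minimal Python sketch of forward stagewise additive modeling with decision rules in the spirit described above; it is not the authors' ENDER implementation. The single-condition rules, the exponential loss with its margin-based impurity, and names such as grow_rule and boost_rules are illustrative assumptions (ENDER builds rules as conjunctions of elementary conditions and derives several impurity measures from different loss functions).

    # Illustrative sketch only: single-condition rules, exponential loss,
    # brute-force threshold search. Not the authors' ENDER code.
    import numpy as np

    def grow_rule(X, w, y):
        # Greedily choose one condition "x[:, j] <= t" or "x[:, j] > t"
        # minimizing a simple impurity: minus the absolute weighted
        # margin of the covered examples.
        best_score, best_rule = np.inf, None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for op in ("<=", ">"):
                    cover = X[:, j] <= t if op == "<=" else X[:, j] > t
                    if not cover.any():
                        continue
                    score = -abs(np.dot(w[cover], y[cover]))
                    if score < best_score:
                        best_score, best_rule = score, (j, t, op)
        return best_rule

    def covers(rule, X):
        j, t, op = rule
        return X[:, j] <= t if op == "<=" else X[:, j] > t

    def boost_rules(X, y, n_rules=50, shrinkage=0.1, subsample=0.5, seed=0):
        # Forward stagewise additive modeling, one rule per iteration,
        # binary labels y in {-1, +1}, exponential loss exp(-y * F).
        rng = np.random.default_rng(seed)
        n = len(y)
        F = np.zeros(n)                 # current ensemble score
        ensemble = []
        for _ in range(n_rules):
            w = np.exp(-y * F)          # hardest examples get largest weights
            idx = rng.choice(n, size=max(1, int(subsample * n)), replace=False)
            rule = grow_rule(X[idx], w[idx], y[idx].astype(float))
            c = covers(rule, X)
            # Closed-form line search for the exponential loss on the
            # covered examples, smoothed so the response stays finite.
            w_pos = w[c & (y > 0)].sum()
            w_neg = w[c & (y < 0)].sum()
            alpha = 0.5 * np.log((w_pos + 1.0) / (w_neg + 1.0))
            F += shrinkage * alpha * c  # shrunk update on covered examples only
            ensemble.append((rule, shrinkage * alpha))
        return ensemble

    def predict(ensemble, X):
        F = np.zeros(len(X))
        for rule, a in ensemble:
            F += a * covers(rule, X)
        return np.where(F >= 0, 1, -1)

Setting shrinkage below 1 and subsample below 1 corresponds to the shrinkage-and-sampling regularization mentioned above; with both at 1 the loop reduces to plain forward stagewise minimization of the exponential loss over single rules.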

Keywords

Decision rules · Impurity measures · Ensemble · Boosting · Forward stagewise additive modeling

References

  1. Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
  2. Bazan JG (1998) Discovery of decision rules by matching new objects against data tables. In: Polkowski L, Skowron A (eds) Rough sets and current trends in computing, volume 1424 of Lecture notes in artificial intelligence. Springer, Warsaw, pp 521–528
  3. Błaszczyński J, Dembczyński K, Kotłowski W, Słowiński R, Szeląg M (2006) Ensembles of decision rules. Found Comput Decis Sci 31(3–4): 221–232
  4. Boros E, Hammer PL, Ibaraki T, Kogan A, Mayoraz E, Muchnik I (2000) An implementation of logical analysis of data. IEEE Trans Knowl Data Eng 12: 292–306
  5. Breiman L (1996) Bagging predictors. Mach Learn 24(2): 123–140
  6. Brzezińska I, Greco S, Słowiński R (2007) Mining Pareto-optimal rules with respect to support and confirmation or support and anti-support. Eng Appl Artif Intell 20(5): 587–600
  7. Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3: 261–283
  8. Cohen WW (1995) Fast effective rule induction. In: Proceedings of the twelfth international conference on machine learning (ICML 1995). Morgan Kaufmann, Tahoe City, pp 115–123
  9. Cohen WW, Singer Y (1999) A simple, fast, and effective rule learner. In: Proceedings of the sixteenth national conference on artificial intelligence. AAAI Press/The MIT Press, Orlando, pp 335–342
  10. Dembczyński K, Kotłowski W, Słowiński R (2008a) Maximum likelihood rule ensembles. In: Proceedings of the twenty-fifth international conference on machine learning (ICML 2008). Omnipress, Helsinki, pp 224–231
  11. Dembczyński K, Kotłowski W, Słowiński R (2008b) Solving regression by learning an ensemble of decision rules. In: Rutkowski L, Tadeusiewicz R, Zadeh LA, Zurada JM (eds) Artificial intelligence and soft computing, volume 5097 of Lecture notes in artificial intelligence. Springer, Zakopane, pp 533–544
  12. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7: 1–30
  13. Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40(2): 139–158
  14. Domingos P (1996) Unifying instance-based and rule-based induction. Mach Learn 24(2): 141–168
  15. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1): 119–139
  16. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5): 1189–1232
  17. Friedman JH, Popescu BE (2003) Importance sampled learning ensembles. Technical report, Department of Statistics, Stanford University
  18. Friedman JH, Popescu BE (2008) Predictive learning via rule ensembles. Ann Appl Stat 2(3): 916–954
  19. Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28(2): 337–407
  20. Fürnkranz J (1999) Separate-and-conquer rule learning. Artif Intell Rev 13(1): 3–54
  21. Góra G, Wojna A (2002a) Local attribute value grouping for lazy rule induction. In: Peters JF, Skowron A, Zhong N (eds) Rough sets and current trends in computing, volume 2475 of Lecture notes in artificial intelligence. Springer, Malvern, pp 405–412
  22. Góra G, Wojna A (2002b) A new classification system combining rule induction and instance-based learning. Fundam Inform 54(4): 369–390
  23. Greco S, Matarazzo B, Słowiński R, Stefanowski J (2000) An algorithm for induction of decision rules consistent with the dominance principle. In: Ziarko W, Yao Y (eds) Rough sets and current trends in computing, volume 2005 of Lecture notes in artificial intelligence. Springer, Banff, pp 304–313
  24. Greco S, Matarazzo B, Słowiński R (2001) Rough sets theory for multicriteria decision analysis. Eur J Oper Res 129: 1–47
  25. Greco S, Pawlak Z, Słowiński R (2004) Can Bayesian confirmation measures be useful for rough set decision rules? Eng Appl Artif Intell 17(4): 345–361
  26. Grzymala-Busse JW (1992) LERS—a system for learning from examples based on rough sets. In: Słowiński R (ed) Intelligent decision support, handbook of applications and advances of the rough sets theory. Kluwer, Dordrecht, pp 3–18
  27. Hastie T, Tibshirani R, Friedman JH (2003) Elements of statistical learning: data mining, inference, and prediction. Springer, New York
  28. Hilderman RJ, Hamilton HJ (2001) Knowledge discovery and measures of interest. Kluwer, Boston
  29. Janssen F, Fürnkranz J (2008) An empirical investigation of the trade-off between consistency and coverage in rule learning heuristics. In: Boulicaut J-F, Berthold MR, Horváth T (eds) Discovery science, volume 5255 of Lecture notes in artificial intelligence. Springer, Budapest, pp 40–51
  30. Jovanoski V, Lavrac N (2001) Classification rule learning with APRIORI-C. In: Brazdil P, Jorge A (eds) Progress in artificial intelligence, volume 2258 of Lecture notes in artificial intelligence. Springer, Berlin, pp 111–135
  31. Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge
  32. Knobbe A, Crémilleux B, Fürnkranz J, Scholz M (2008) From local patterns to global models: the LeGo approach to data mining. In: Fürnkranz J, Knobbe A (eds) Proceedings of the ECML/PKDD 2008 workshop “From local patterns to global models”, Antwerp, Belgium
  33. Koltchinskii V, Panchenko D (2005) Complexities of convex combinations and bounding the generalization error in classification. Ann Stat 33(4): 1455–1496
  34. Marchand M, Shawe-Taylor J (2002) The set covering machine. J Mach Learn Res 3: 723–746
  35. Mason L, Baxter J, Bartlett P, Frean M (1999) Functional gradient techniques for combining hypotheses. In: Bartlett P, Schölkopf B, Schuurmans D, Smola AJ (eds) Advances in large margin classifiers. MIT Press, Cambridge, pp 33–58
  36. Michalski RS (1983) A theory and methodology of inductive learning. In: Michalski RS, Carbonell JG, Mitchell TM (eds) Machine learning: an artificial intelligence approach. Tioga Publishing, Palo Alto, pp 83–129
  37. Pawlak Z (1991) Rough sets. Theoretical aspects of reasoning about data. Kluwer, Dordrecht
  38. Rückert U, Kramer S (2008) Margin-based first-order rule learning. Mach Learn 70(2–3): 189–206
  39. Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3): 297–336
  40. Schapire RE, Freund Y, Bartlett P, Lee WS (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5): 1651–1686
  41. Skowron A (1995) Extracting laws from decision tables—a rough set approach. Comput Intell 11: 371–388
  42. Słowiński R (ed) (1992) Intelligent decision support. Handbook of applications and advances of the rough set theory. Kluwer, Dordrecht
  43. Stefanowski J (1998) On rough set based approach to induction of decision rules. In: Skowron A, Polkowski L (eds) Rough sets in knowledge discovery. Physica Verlag, Heidelberg, pp 500–529
  44. Stefanowski J, Vanderpooten D (2001) Induction of decision rules in classification and discovery-oriented perspectives. Int J Intell Syst 16(1): 13–27
  45. Weiss SM, Indurkhya N (2000) Lightweight rule induction. In: Proceedings of the seventeenth international conference on machine learning (ICML 2000). Morgan Kaufmann, Stanford, pp 1135–1142

Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Krzysztof Dembczyński (1)
  • Wojciech Kotłowski (1)
  • Roman Słowiński (1, 2)

  1. Poznań University of Technology, Poznań, Poland
  2. Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
