
Rule Stacking: An Approach for Compressing an Ensemble of Rule Sets into a Single Classifier

  • Jan-Nikolas Sulzmann
  • Johannes Fürnkranz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6926)

Abstract

In this paper, we present an approach for compressing a rule-based pairwise classifier ensemble into a single rule set that can be used directly for classification. The key idea is to re-encode each training example using information about which of the original rules of the ensemble cover it, and to use the re-encoded examples for training a rule-based meta-level classifier. We not only show that this approach is more accurate than using the same rule learner at the base level (which could have been expected for such a variant of stacking), but also demonstrate that the resulting meta-level rule set can be straightforwardly translated back into a rule set at the base level. Our key result is that the rule sets obtained in this way are of comparable complexity to those of the original rule learner, but considerably more accurate.
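The two mechanical steps described above, re-encoding examples by rule coverage and translating a meta-level rule back to the base level, can be illustrated with a minimal Python sketch. It rests on stated assumptions: rules are conjunctions of simple attribute tests, each ensemble rule becomes one binary meta-feature, and a meta-level rule only tests meta-features for the value 1. The names `Rule`, `reencode`, `build_meta_dataset`, and `translate_back`, the attributes `x1`/`x2`, and the class labels are hypothetical illustrations, not the authors' implementation; the meta-level rule learner itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a condition is (attribute, operator, threshold).
Condition = Tuple[str, str, object]
Example = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    conditions: Tuple[Condition, ...]  # conjunction of attribute tests
    label: str                         # class predicted when the rule fires


def holds(cond: Condition, example: Example) -> bool:
    """Evaluate a single attribute test on an example."""
    attr, op, value = cond
    x = example[attr]
    if op == "<=":
        return x <= value
    if op == ">":
        return x > value
    if op == "==":
        return x == value
    raise ValueError(f"unsupported operator {op!r}")


def covers(rule: Rule, example: Example) -> bool:
    """A rule covers an example if all of its conditions hold."""
    return all(holds(c, example) for c in rule.conditions)


def reencode(ensemble: Sequence[Rule], example: Example) -> List[int]:
    """Meta-level encoding: feature k is 1 iff ensemble rule k covers the example."""
    return [int(covers(r, example)) for r in ensemble]


def build_meta_dataset(ensemble, examples, labels):
    """Pair each re-encoded example with its original class label (meta-level training data)."""
    return [(reencode(ensemble, x), y) for x, y in zip(examples, labels)]


def translate_back(rule_ids: Sequence[int], ensemble: Sequence[Rule], label: str) -> Rule:
    """Map the meta-level rule 'if r_k = 1 for all k in rule_ids then label' to a
    base-level rule by conjoining the conditions of the referenced ensemble rules."""
    merged: List[Condition] = []
    for k in rule_ids:
        for c in ensemble[k].conditions:
            if c not in merged:  # drops verbatim duplicates only, not subsumed tests
                merged.append(c)
    return Rule(tuple(merged), label)


# Toy ensemble, as if collected from several pairwise rule sets (invented for illustration).
ensemble = [
    Rule((("x1", "<=", 2.5),), "a"),
    Rule((("x2", ">", 1.7),), "c"),
    Rule((("x1", ">", 4.9), ("x2", ">", 1.5)), "c"),
]

if __name__ == "__main__":
    x = {"x1": 5.1, "x2": 1.9}
    print(reencode(ensemble, x))  # [0, 1, 1]
    base_rule = translate_back([1, 2], ensemble, "c")
    print(base_rule.conditions)   # (('x2', '>', 1.7), ('x1', '>', 4.9), ('x2', '>', 1.5))
    print(covers(base_rule, x))   # True
```

Because a meta-level condition r_k = 1 holds exactly when every condition of ensemble rule k holds, a conjunction of such conditions corresponds to the union of the referenced base-level conditions. The translated rule may contain redundant tests (here `x2 > 1.5` is implied by `x2 > 1.7`) that a simplification pass could remove; meta-level conditions of the form r_k = 0 would negate a conjunction and are outside the scope of this sketch.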



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jan-Nikolas Sulzmann (1)
  • Johannes Fürnkranz (1)
  1. Knowledge Engineering, TU Darmstadt, Darmstadt, Germany
