
Extending Rule-Based Classifiers to Improve Recognition of Imbalanced Classes

  • Chapter
Advances in Data Management

Part of the book series: Studies in Computational Intelligence (SCI, volume 223)

Abstract

Knowledge discovery in general, and data mining in particular, have received growing interest from both research and industry in recent years. Their main aim is to find previously unknown relationships or patterns that represent knowledge hidden in real-life data sets [16]. Knowledge discovered from data is typically represented as associations, trees or rules, relational logic clauses, functions, clusters or taxonomies, or characteristic descriptions of concepts [16, 29, 21]. In this paper we focus on the rule-based representation. More precisely, we are interested in decision (or classification) rules, which are used in classification problems. Data mining also considers other types of rules, e.g., association rules or action rules [16, 29, 34]; however, in the remainder of the text we use the general term “rules” to refer specifically to decision rules.

References

  1. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html

  2. Batista, G., Prati, R., Monard, M.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter 6(1), 20–29 (2004)

  3. Chawla, N.: Data mining for imbalanced datasets: an overview. In: Maimon, O., Rokach, L. (eds.) The Data Mining and Knowledge Discovery Handbook, pp. 853–867. Springer, Heidelberg (2005)

  4. Chawla, N., Bowyer, K., Hall, L., Kegelmeyer, W.: SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, 341–378 (2002)

  5. Chmielewski, M.R., Grzymala-Busse, J.W.: Global discretization of continuous attributes as preprocessing for machine learning. In: Lin, T.Y., Wildberger, A. (eds.) Soft Computing: Rough Sets, Fuzzy Logic, Neural Networks, Uncertainty Management, Knowledge Discovery, pp. 294–297. Simulation Councils Inc. (1995)

  6. Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–283 (1989)

  7. Cohen, W.: Fast effective rule induction. In: Proc. of the 12th International Conference on Machine Learning (ICML 1995), pp. 115–123 (1995)

  8. Cohen, W., Singer, Y.: A simple, fast and effective rule learner. In: Proc. of the 16th National Conference on Artificial Intelligence (AAAI 1999), pp. 335–342. AAAI Press, Menlo Park (1999)

  9. Furnkranz, J.: Pruning algorithms for rule learning. Machine Learning 27(2), 139–171 (1997)

  10. Furnkranz, J.: Separate and conquer rule learning. Artificial Intelligence Review 13(1), 3–54 (1999)

  11. Dzeroski, S., Cestnik, B., Petrovski, I.: Using the m-estimate in rule induction. Journal of Computing and Information Technology 1, 37–46 (1993)

  12. Grzymala-Busse, J.W.: LERS - a system for learning from examples based on rough sets. In: Slowinski, R. (ed.) Intelligent Decision Support. Handbook of Applications and Advances of the Rough Sets Theory, pp. 3–18. Kluwer, Dordrecht (1992)

  13. Grzymala-Busse, J.W.: Managing uncertainty in machine learning from examples. In: Proc. of the 3rd International Symposium in Intelligent Systems, Wigry, Poland, pp. 70–84. IPI PAN Press (1994)

  14. Grzymala-Busse, J.W., Goodwin, L.K., Grzymala-Busse, W.J., Zheng, X.: An approach to imbalanced data sets based on changing rule strength. In: AAAI Workshop at the 17th Conference on AI, AAAI 2000, Learning from Imbalanced Data Sets, Austin, TX, July 30–31, pp. 69–74 (2000)

  15. Grzymala-Busse, J.W., Stefanowski, J., Wilk, S.: A comparison of two approaches to data mining from imbalanced data. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds.) KES 2004. LNCS, vol. 3213, pp. 757–763. Springer, Heidelberg (2004)

  16. Han, J., Kamber, M.: Data mining: Concepts and techniques. Morgan Kaufmann, San Francisco (2000)

  17. Hilderman, R.J., Hamilton, H.J.: Knowledge Discovery and Measures of Interest. Kluwer Academic, Boston (2002)

  18. Holsheimer, M., Kersten, M.L., Siebes, A.: Data Surveyor: searching the nuggets in parallel. In: Fayyad, U.M., et al. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 447–467. AAAI/MIT Press, Cambridge (1996)

  19. Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intelligent Data Analysis 6(5), 429–450 (2002)

  20. Japkowicz, N.: Learning from imbalanced data sets: a comparison of various strategies. In: AAAI Workshop at the 17th Conference on AI, AAAI 2000, Learning from Imbalanced Data Sets, Austin, TX, July 30–31, pp. 10–17 (2000)

  21. Klosgen, W., Żytkow, J.M.: Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Oxford (2002)

  22. Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-side selection. In: Proc. of the 14th International Conference on Machine Learning (ICML 1997), pp. 179–186 (1997)

  23. Langley, P., Simon, H.A.: Fielded applications of machine learning. In: Michalski, R.S., Bratko, I., Kubat, M. (eds.) Machine learning and data mining, pp. 113–129. John Wiley & Sons, Chichester (1998)

  24. Laurikkala, J.: Improving identification of difficult small classes by balancing class distribution. Technical Report A-2001-2, University of Tampere (2001)

  25. Lewis, D., Catlett, J.: Heterogeneous uncertainty sampling for supervised learning. In: Proc. of 11th International Conference on Machine Learning (ICML 1994), pp. 148–156 (1994)

  26. Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, KDD 1998 (1998)

  27. Michalowski, W., Wilk, S., Farion, K., Pike, J., Rubin, S., Slowinski, R.: Development of a decision algorithm to support emergency triage of scrotal pain and its implementation in the MET system. INFOR 43(4), 287–301 (2005)

  28. Michalski, R.S.: A theory and methodology of inductive learning. In: Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (eds.) Machine Learning: An Artificial Intelligence Approach, pp. 83–134. Morgan Kaufmann, San Francisco (1983)

  29. Michalski, R.S., Bratko, I., Kubat, M. (eds.): Machine learning and data mining. John Wiley & Sons, Chichester (1998)

  30. Mienko, R., Stefanowski, J., Toumi, K., Vanderpooten, D.: Discovery-oriented induction of decision rules. Cahier du Lamsade no. 141, Paris, Université Paris Dauphine (September 1996)

  31. Mitchell, T.: Machine learning. McGraw-Hill, New York (1997)

  32. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991)

  33. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1992)

  34. Ras, Z., Wieczorkowska, A.: Action rules: how to increase profit of a company. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 587–592. Springer, Heidelberg (2000)

  35. Riddle, P., Segal, R., Etzioni, O.: Representation design and brute-force induction in a Boeing manufacturing domain. Applied Artificial Intelligence Journal 8, 125–147 (1994)

  36. Skowron, A.: Boolean reasoning for decision rules generation. In: Komorowski, J., Raś, Z.W. (eds.) ISMIS 1993. LNCS (LNAI), vol. 689, pp. 295–305. Springer, Heidelberg (1993)

  37. Stefanowski, J.: The rough set based rule induction technique for classification problems. In: Proc. of the 6th European Conference on Intelligent Techniques and Soft Computing EUFIT 1998, Aachen, pp. 109–113 (1998)

  38. Stefanowski, J.: Handling continuous attributes in discovery of strong decision rules. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS (LNAI), vol. 1424, pp. 394–401. Springer, Heidelberg (1998)

  39. Stefanowski, J.: Algorithms of rule induction for knowledge discovery. Habilitation Thesis published as Series Rozprawy no. 361. Poznan University of Technology Press, Poznan (2001) (in Polish)

  40. Stefanowski, J.: On combined classifiers, rule induction and rough sets. In: Peters, J., et al. (eds.) Transactions on Rough Sets VI. LNCS, vol. 4374, pp. 329–350. Springer, Heidelberg (2007)

  41. Stefanowski, J., Borkiewicz, R.: Interactive rule discovery of decision rules. In: Proc. of the VIIIth Intelligent Information Systems, June 1999, pp. 112–116. Wyd. Instytutu Podstaw Informatyki PAN, Warszawa (1999)

  42. Stefanowski, J., Vanderpooten, D.: Induction of decision rules in classification and discovery-oriented perspectives. International Journal of Intelligent Systems 16(1), 13–28 (2001)

  43. Stefanowski, J., Wilk, S.: Evaluating business credit risk by means of approach integrating decision rules and case based learning. International Journal of Intelligent Systems in Accounting, Finance and Management 10, 97–114 (2001)

  44. Stefanowski, J., Wilk, S.: Rough sets for handling imbalanced data: combining filtering and rule-based classifiers. Fundamenta Informaticae 72, 379–391 (2006)

  45. Stefanowski, J., Wilk, S.: Improving rule based classifiers induced by MODLEM by selective pre-processing of imbalanced data. In: Proc. of the RSKD Workshop at ECML/PKDD, Warsaw, pp. 54–65 (2007)

  46. Stefanowski, J., Wilk, S.: Selective pre-processing of imbalanced data for improving classification performance. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2008. LNCS, vol. 5182, pp. 283–292. Springer, Heidelberg (2008)

  47. Van Hulse, J., Khoshgoftaar, T., Napolitano, A.: Experimental perspectives on learning from imbalanced data. In: Proc. of the 24th International Conference on Machine Learning (ICML 2007), pp. 935–942 (2007)

  48. Wang, B., Japkowicz, N.: Boosting support vector machines for imbalanced data sets. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds.) Foundations of Intelligent Systems. LNCS (LNAI), vol. 4994, pp. 38–47. Springer, Heidelberg (2008)

  49. Weiss, G.M.: Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter 6(1), 7–19 (2004)

  50. Weiss, S.M., Indurkhya, N.: Predictive Data Mining. Morgan Kaufmann, San Francisco (1999)

  51. Wilk, S., Slowinski, R., Michalowski, W., Greco, S.: Supporting triage of children with abdominal pain in the emergency room. European Journal of Operational Research 160(3), 696–709 (2005)

  52. Zak, J., Stefanowski, J.: Determining maintenance activities of motor vehicles using rough sets approach. In: Proc. of Euromaintenance 1994 Conference, Amsterdam, pp. 39–42 (1994)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Stefanowski, J., Wilk, S. (2009). Extending Rule-Based Classifiers to Improve Recognition of Imbalanced Classes. In: Ras, Z.W., Dardzinska, A. (eds) Advances in Data Management. Studies in Computational Intelligence, vol 223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02190-9_7

  • DOI: https://doi.org/10.1007/978-3-642-02190-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02189-3

  • Online ISBN: 978-3-642-02190-9

  • eBook Packages: Engineering, Engineering (R0)
