Extending Rule-Based Classifiers to Improve Recognition of Imbalanced Classes

Stefanowski, Jerzy; Wilk, Szymon

doi:10.1007/978-3-642-02190-9_7

Part of the book series: Studies in Computational Intelligence (SCI, volume 223)

Abstract

Knowledge discovery in general, and data mining in particular, have received growing interest from both research and industry in recent years. Their main aim is to find previously unknown relationships or patterns that represent knowledge hidden in real-life data sets [16]. Typical representations of knowledge discovered from data include associations, trees or rules, relational logic clauses, functions, clusters or taxonomies, and characteristic descriptions of concepts [16, 29, 21]. In this paper we focus on the rule-based representation. More precisely, we are interested in decision (also called classification) rules, which are used in classification problems. Other types of rules are also studied in data mining, e.g., association rules or action rules [16, 29, 34]; however, hereafter we use the general term “rules” to refer specifically to decision rules.
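
To make the notion of a decision rule concrete, the following is a minimal Python sketch, assuming a simple representation in which a rule is a conjunction of attribute-value conditions ("if (a1 = v1) and (a2 = v2) then class = k") with a single decision class. The names `DecisionRule` and `classify`, the weather-style attributes, and the first-matching-rule (decision list) strategy are illustrative assumptions, not the method of this chapter; classifiers built from unordered rule sets instead need a strategy for resolving conflicts when several rules match the same example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Mapping, Optional


@dataclass(frozen=True)
class DecisionRule:
    # Hypothetical representation: each attribute maps to the set of
    # values it may take for the rule's condition part to hold.
    conditions: Mapping[str, FrozenSet[str]]
    decision: str

    def covers(self, example: Mapping[str, str]) -> bool:
        """True if every elementary condition (attribute in allowed values) holds."""
        return all(example.get(attr) in allowed
                   for attr, allowed in self.conditions.items())


def classify(rules: List[DecisionRule], example: Mapping[str, str],
             default: Optional[str] = None) -> Optional[str]:
    """Decision-list strategy (an assumption of this sketch): return the
    decision of the first covering rule, or the default class if none matches."""
    for rule in rules:
        if rule.covers(example):
            return rule.decision
    return default


# Hypothetical rule set over illustrative attributes.
rules = [
    DecisionRule({"outlook": frozenset({"sunny"}),
                  "humidity": frozenset({"high"})}, "no"),
    DecisionRule({"outlook": frozenset({"overcast"})}, "yes"),
]

print(classify(rules, {"outlook": "sunny", "humidity": "high"}, default="yes"))
print(classify(rules, {"outlook": "rainy", "humidity": "normal"}, default="yes"))
```

The first call prints `no` because the first rule covers the example; the second prints `yes` only because no rule matches and the default class is returned.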

References

[1] Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html
[2] Batista, G., Prati, R., Monard, M.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter 6(1), 20–29 (2004)
[3] Chawla, N.: Data mining for imbalanced datasets: an overview. In: Maimon, O., Rokach, L. (eds.) The Data Mining and Knowledge Discovery Handbook, pp. 853–867. Springer, Heidelberg (2005)
[4] Chawla, N., Bowyer, K., Hall, L., Kegelmeyer, W.: SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, 341–378 (2002)
[5] Chmielewski, M.R., Grzymala-Busse, J.W.: Global discretization of continuous attributes as preprocessing for machine learning. In: Lin, T.Y., Wildberger, A. (eds.) Soft Computing: Rough Sets, Fuzzy Logic, Neural Networks, Uncertainty Management, Knowledge Discovery, pp. 294–297. Simulation Councils Inc. (1995)
[6] Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–283 (1989)
[7] Cohen, W.: Fast effective rule induction. In: Proc. of the 12th International Conference on Machine Learning (ICML 1995), pp. 115–123 (1995)
[8] Cohen, W., Singer, Y.: A simple, fast and effective rule learner. In: Proc. of the 16th National Conference on Artificial Intelligence (AAAI 1999), pp. 335–342. AAAI Press, Menlo Park (1999)
[9] Fürnkranz, J.: Pruning algorithms for rule learning. Machine Learning 27(2), 139–171 (1997)
[10] Fürnkranz, J.: Separate-and-conquer rule learning. Artificial Intelligence Review 13(1), 3–54 (1999)
[11] Džeroski, S., Cestnik, B., Petrovski, I.: Using the m-estimate in rule induction. Journal of Computing and Information Technology 1, 37–46 (1993)
[12] Grzymala-Busse, J.W.: LERS: a system for learning from examples based on rough sets. In: Slowinski, R. (ed.) Intelligent Decision Support. Handbook of Applications and Advances of the Rough Sets Theory, pp. 3–18. Kluwer, Dordrecht (1992)
[13] Grzymala-Busse, J.W.: Managing uncertainty in machine learning from examples. In: Proc. of the 3rd International Symposium in Intelligent Systems, Wigry, Poland, pp. 70–84. IPI PAN Press (1994)
[14] Grzymala-Busse, J.W., Goodwin, L.K., Grzymala-Busse, W.J., Zheng, X.: An approach to imbalanced data sets based on changing rule strength. In: AAAI Workshop at the 17th Conference on AI, AAAI 2000, Learning from Imbalanced Data Sets, Austin, TX, July 30–31, pp. 69–74 (2000)
[15] Grzymala-Busse, J.W., Stefanowski, J., Wilk, S.: A comparison of two approaches to data mining from imbalanced data. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds.) KES 2004. LNCS, vol. 3213, pp. 757–763. Springer, Heidelberg (2004)
[16] Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2000)
[17] Hilderman, R.J., Hamilton, H.J.: Knowledge Discovery and Measures of Interest. Kluwer Academic, Boston (2002)
[18] Holsheimer, M., Kersten, M.L., Siebes, A.: Data Surveyor: searching the nuggets in parallel. In: Fayyad, U.M., et al. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 447–467. AAAI/MIT Press, Cambridge (1996)
[19] Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intelligent Data Analysis 6(5), 429–450 (2002)
[20] Japkowicz, N.: Learning from imbalanced data sets: a comparison of various strategies. In: AAAI Workshop at the 17th Conference on AI, AAAI 2000, Learning from Imbalanced Data Sets, Austin, TX, July 30–31, pp. 10–17 (2000)
[21] Klösgen, W., Żytkow, J.M.: Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Oxford (2002)
[22] Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: Proc. of the 14th International Conference on Machine Learning (ICML 1997), pp. 179–186 (1997)
[23] Langley, P., Simon, H.A.: Fielded applications of machine learning. In: Michalski, R.S., Bratko, I., Kubat, M. (eds.) Machine Learning and Data Mining, pp. 113–129. John Wiley & Sons, Chichester (1998)
[24] Laurikkala, J.: Improving identification of difficult small classes by balancing class distribution. Technical Report A-2001-2, University of Tampere (2001)
[25] Lewis, D., Catlett, J.: Heterogeneous uncertainty sampling for supervised learning. In: Proc. of the 11th International Conference on Machine Learning (ICML 1994), pp. 148–156 (1994)
[26] Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: Proc. of the 4th International Conference on Knowledge Discovery and Data Mining (KDD 1998) (1998)
[27] Michalowski, W., Wilk, S., Farion, K., Pike, J., Rubin, S., Slowinski, R.: Development of a decision algorithm to support emergency triage of scrotal pain and its implementation in the MET system. INFOR 43(4), 287–301 (2005)
[28] Michalski, R.S.: A theory and methodology of inductive learning. In: Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (eds.) Machine Learning: An Artificial Intelligence Approach, pp. 83–134. Morgan Kaufmann, San Francisco (1983)
[29] Michalski, R.S., Bratko, I., Kubat, M. (eds.): Machine Learning and Data Mining. John Wiley & Sons, Chichester (1998)
[30] Mienko, R., Stefanowski, J., Toumi, K., Vanderpooten, D.: Discovery-oriented induction of decision rules. Cahier du Lamsade no. 141, Université Paris Dauphine, Paris (September 1996)
[31] Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
[32] Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991)
[33] Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1992)
[34] Ras, Z., Wieczorkowska, A.: Action rules: how to increase profit of a company. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 587–592. Springer, Heidelberg (2000)
[35] Riddle, P., Segal, R., Etzioni, O.: Representation design and brute-force induction in a Boeing manufacturing domain. Applied Artificial Intelligence Journal 8, 125–147 (1994)
[36] Skowron, A.: Boolean reasoning for decision rules generation. In: Komorowski, J., Raś, Z.W. (eds.) ISMIS 1993. LNCS (LNAI), vol. 689, pp. 295–305. Springer, Heidelberg (1993)
[37] Stefanowski, J.: The rough set based rule induction technique for classification problems. In: Proc. of the 6th European Conference on Intelligent Techniques and Soft Computing (EUFIT 1998), Aachen, pp. 109–113 (1998)
[38] Stefanowski, J.: Handling continuous attributes in discovery of strong decision rules. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS (LNAI), vol. 1424, pp. 394–401. Springer, Heidelberg (1998)
[39] Stefanowski, J.: Algorithms of rule induction for knowledge discovery. Habilitation thesis, published as Series Rozprawy no. 361. Poznań University of Technology Press, Poznań (2001) (in Polish)
[40] Stefanowski, J.: On combined classifiers, rule induction and rough sets. In: Peters, J., et al. (eds.) Transactions on Rough Sets VI. LNCS, vol. 4374, pp. 329–350. Springer, Heidelberg (2007)
[41] Stefanowski, J., Borkiewicz, R.: Interactive rule discovery of decision rules. In: Proc. of the VIIIth Intelligent Information Systems, June 1999, pp. 112–116. Wyd. Instytutu Podstaw Informatyki PAN, Warszawa (1999)
[42] Stefanowski, J., Vanderpooten, D.: Induction of decision rules in classification and discovery-oriented perspectives. International Journal of Intelligent Systems 16(1), 13–28 (2001)
[43] Stefanowski, J., Wilk, S.: Evaluating business credit risk by means of approach integrating decision rules and case based learning. International Journal of Intelligent Systems in Accounting, Finance and Management 10, 97–114 (2001)
[44] Stefanowski, J., Wilk, S.: Rough sets for handling imbalanced data: combining filtering and rule-based classifiers. Fundamenta Informaticae 72, 379–391 (2006)
[45] Stefanowski, J., Wilk, S.: Improving rule based classifiers induced by MODLEM by selective pre-processing of imbalanced data. In: Proc. of the RSKD Workshop at ECML/PKDD, Warsaw, pp. 54–65 (2007)
[46] Stefanowski, J., Wilk, S.: Selective pre-processing of imbalanced data for improving classification performance. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2008. LNCS, vol. 5182, pp. 283–292. Springer, Heidelberg (2008)
[47] Van Hulse, J., Khoshgoftaar, T., Napolitano, A.: Experimental perspectives on learning from imbalanced data. In: Proc. of the 24th International Conference on Machine Learning (ICML 2007), pp. 935–942 (2007)
[48] Wang, B., Japkowicz, N.: Boosting support vector machines for imbalanced data sets. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds.) Foundations of Intelligent Systems. LNCS (LNAI), vol. 4994, pp. 38–47. Springer, Heidelberg (2008)
[49] Weiss, G.M.: Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter 6(1), 7–19 (2004)
[50] Weiss, S.M., Indurkhya, N.: Predictive Data Mining. Morgan Kaufmann, San Francisco (1999)
[51] Wilk, S., Slowinski, R., Michalowski, W., Greco, S.: Supporting triage of children with abdominal pain in the emergency room. European Journal of Operational Research 160(3), 696–709 (2005)
[52] Zak, J., Stefanowski, J.: Determining maintenance activities of motor vehicles using rough sets approach. In: Proc. of Euromaintenance 1994 Conference, Amsterdam, pp. 39–42 (1994)

Author information

Authors and Affiliations

Institute of Computing Science, Poznań University of Technology, ul. Piotrowo 2, 60-965, Poznań, Poland
Jerzy Stefanowski & Szymon Wilk

Editor information

Editors and Affiliations

College of Computing and Informatics, University of North Carolina at Charlotte, 28223, Charlotte, N.C., USA
Zbigniew W. Ras
Wydzial Informatyki, Politechnika Bialostocka, ul. Wiejska 45a, 15-351, Bialystok, Poland
Agnieszka Dardzinska

About this chapter

Cite this chapter

Stefanowski, J., Wilk, S. (2009). Extending Rule-Based Classifiers to Improve Recognition of Imbalanced Classes. In: Ras, Z.W., Dardzinska, A. (eds) Advances in Data Management. Studies in Computational Intelligence, vol 223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02190-9_7

DOI: https://doi.org/10.1007/978-3-642-02190-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02189-3
Online ISBN: 978-3-642-02190-9
eBook Packages: Engineering (R0)