
A Comparison of Two Approaches to Data Mining from Imbalanced Data

Journal of Intelligent Manufacturing

Abstract

Our objective is to compare two data mining approaches for dealing with imbalanced data sets. The first approach keeps the original rule set induced by the LEM2 (Learning from Examples Module, version 2) algorithm and changes the strength of every rule for the smaller class (concept) during classification. In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, a different rule induction algorithm. Our experiments show that both approaches increase sensitivity compared with the original LEM2. However, the difference in performance between the two approaches is not statistically significant. Therefore, the approach for dealing with an imbalanced data set should be selected individually for each specific data set.
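To illustrate how changing rule strength can raise sensitivity without re-inducing rules, here is a minimal sketch. It assumes a LERS-style voting scheme in which every rule that fully matches a case adds strength × specificity (the number of conditions) to the support of its concept. The `Rule` class, the `multiplier` parameter, and the toy rules are hypothetical illustrations, not the authors' code, and LERS partial matching is omitted.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict  # attribute -> required value (hypothetical representation)
    concept: str      # decision value the rule predicts
    strength: float   # assumed: number of training cases the rule classifies correctly

    def matches(self, case: dict) -> bool:
        # A rule fully matches a case when every condition holds.
        return all(case.get(attr) == val for attr, val in self.conditions.items())


def classify(case, rules, minority, multiplier=1.0):
    """Return the concept with the largest total support, or None if no rule matches.

    Assumed LERS-style voting: each fully matching rule adds strength * specificity.
    Strengths of rules for the minority concept are scaled by `multiplier`
    (a hypothetical knob standing in for the first approach's strength change)."""
    support = {}
    for rule in rules:
        if rule.matches(case):
            strength = rule.strength * (multiplier if rule.concept == minority else 1.0)
            support[rule.concept] = support.get(rule.concept, 0.0) + strength * len(rule.conditions)
    if not support:
        return None  # real LERS would fall back to partial matching here
    return max(support, key=support.get)


def sensitivity(y_true, y_pred, positive):
    """Fraction of actual positive (minority) cases that were predicted as positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in positives) / len(positives) if positives else 0.0


if __name__ == "__main__":
    rules = [Rule({"a": 1}, "majority", 10), Rule({"b": 1}, "minority", 3)]
    cases = [{"a": 1, "b": 1}, {"a": 1, "b": 0}, {"a": 0, "b": 1}]
    y_true = ["minority", "majority", "minority"]
    for m in (1.0, 4.0):
        y_pred = [classify(c, rules, "minority", multiplier=m) for c in cases]
        print(m, sensitivity(y_true, y_pred, "minority"))
```

With the toy rules, a multiplier of 1.0 gives a sensitivity of 0.5 and a multiplier of 4.0 gives 1.0, because the scaled minority rule (3 × 4 = 12) now outvotes the majority rule (10) on the first case. The second approach would leave the voting untouched and instead replace the minority-class rules in `rules` with a separately induced set.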



Author information

Corresponding author

Correspondence to Jerzy W. Grzymala-Busse.

About this article

Cite this article

Grzymala-Busse, J.W., Stefanowski, J. & Wilk, S. A Comparison of Two Approaches to Data Mining from Imbalanced Data. J Intell Manuf 16, 565–573 (2005). https://doi.org/10.1007/s10845-005-4362-2

  • DOI: https://doi.org/10.1007/s10845-005-4362-2
