Extension I: BMPM for Imbalanced Learning

Chapter in Machine Learning

Part of the book series: Advanced Topics in Science and Technology in China (ATSTC)

Abstract

In this chapter, we consider the imbalanced learning problem: binary classification on imbalanced data, where nearly all instances carry one label and far fewer carry the other, usually the more important class. Traditional machine learning methods that seek accurate performance over the full range of instances are ill-suited to this problem, since they tend to classify all the data into the majority class, which is usually the less important one. Moreover, many current methods manipulate intermediate factors, e.g., the distribution of the training set, the decision thresholds, or the cost matrix, to impose a bias towards the important class; it remains unclear, however, whether these indirect methods improve performance in a systematic way. In this chapter, we apply the Biased Minimax Probability Machine (BMPM), a special case of the Minimum Error Minimax Probability Machine, to imbalanced learning tasks. Unlike previous methods, this model derives the biased classifier in a worst-case setting by directly controlling the classification accuracy on each class. More precisely, BMPM builds an explicit connection between the classification accuracy and the bias, and thus provides a rigorous treatment of imbalanced data. We examine different models and compare BMPM with three other competitive methods, i.e., the Naive Bayesian classifier, the k-Nearest Neighbor method, and the decision tree method C4.5. The experimental results demonstrate the superiority of this model.
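To make the formulation concrete, the sketch below illustrates the BMPM idea numerically; it is a minimal illustration under standard assumptions, not the solver developed in this chapter. It assumes the usual worst-case bound for a class with mean mu and covariance Sigma: over all distributions sharing those moments, the class is classified correctly with probability at least alpha whenever the margin exceeds kappa(alpha) * sqrt(w' Sigma w), with kappa(alpha) = sqrt(alpha / (1 - alpha)). The function name bmpm_fit and the parameter beta0 (the required worst-case accuracy on the majority class) are our own labels, not the chapter's notation.

import numpy as np
from scipy.optimize import minimize

def bmpm_fit(X_pos, X_neg, beta0=0.8):
    # X_pos: minority (important) class samples; X_neg: majority class.
    # beta0: required worst-case accuracy on the majority class.
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    d = X_pos.shape[1]
    # Regularize covariances so the square roots below stay well defined.
    cov_p = np.cov(X_pos, rowvar=False) + 1e-6 * np.eye(d)
    cov_n = np.cov(X_neg, rowvar=False) + 1e-6 * np.eye(d)
    kappa0 = np.sqrt(beta0 / (1.0 - beta0))  # majority-class margin factor

    def neg_kappa(w):
        # Choosing b to make the majority-class constraint tight turns the
        # problem into maximizing this ratio, which equals kappa(alpha).
        num = w @ (mu_p - mu_n) - kappa0 * np.sqrt(w @ cov_n @ w)
        den = np.sqrt(w @ cov_p @ w)
        return -num / den

    res = minimize(neg_kappa, mu_p - mu_n, method="Nelder-Mead")
    w = res.x
    kappa = max(-res.fun, 0.0)
    alpha = kappa**2 / (1.0 + kappa**2)  # invert kappa(a) = sqrt(a/(1-a))
    b = w @ mu_n + kappa0 * np.sqrt(w @ cov_n @ w)
    return w, b, alpha

A point x is then assigned to the important class when w @ x >= b, and alpha is the resulting worst-case accuracy guarantee on that class. This is the sense in which the bias is imposed directly through the class accuracies rather than through intermediate factors.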

Copyright information

© 2008 Zhejiang University Press, Hangzhou and Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

(2008). Extension I: BMPM for Imbalanced Learning. In: Machine Learning. Advanced Topics in Science and Technology in China. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79452-3_5

  • DOI: https://doi.org/10.1007/978-3-540-79452-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-79451-6

  • Online ISBN: 978-3-540-79452-3

  • eBook Packages: Computer Science; Computer Science (R0)
