Extension I: BMPM for Imbalanced Learning

Chapter in Machine Learning

Part of the book series: Advanced Topics in Science and Technology in China (ATSTC)

Abstract

In this chapter, we consider the imbalanced learning problem: binary classification on imbalanced data, where nearly all instances carry one label and far fewer carry the other, usually the more important class. Traditional machine learning methods that seek accurate performance over the full range of instances are ill-suited to this problem, since they tend to classify all the data into the majority class, which is usually the less important one. Moreover, many current methods manipulate intermediate factors, e.g., the distribution of the training set, the decision thresholds, or the cost matrix, to impose a bias towards the important class; it remains unclear, however, whether these indirect methods improve performance in a systematic way. In this chapter, we apply the Biased Minimax Probability Machine (BMPM), a special case of the Minimum Error Minimax Probability Machine, to imbalanced learning tasks. Unlike previous methods, this model derives the biased classifier in a worst-case setting by directly controlling the classification accuracy on each class. More precisely, BMPM builds an explicit connection between the classification accuracy and the bias, and thus provides a rigorous treatment of imbalanced data. We examine different models and compare BMPM with three other competitive methods, i.e., the Naive Bayesian classifier, the k-Nearest Neighbor method, and the decision tree method C4.5. The experimental results demonstrate the superiority of this model.
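To make the formulation concrete, the sketch below illustrates the BMPM idea numerically; it is a minimal illustration under standard assumptions, not the solver developed in this chapter. It assumes the usual worst-case bound for a class with mean mu and covariance Sigma: over all distributions sharing those moments, the class is classified correctly with probability at least alpha whenever the margin exceeds kappa(alpha) * sqrt(w' Sigma w), with kappa(alpha) = sqrt(alpha / (1 - alpha)). The function name bmpm_fit and the parameter beta0 (the required worst-case accuracy on the majority class) are our own labels, not the chapter's notation.

import numpy as np
from scipy.optimize import minimize

def bmpm_fit(X_pos, X_neg, beta0=0.8):
    # X_pos: minority (important) class samples; X_neg: majority class.
    # beta0: required worst-case accuracy on the majority class.
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    d = X_pos.shape[1]
    # Regularize covariances so the square roots below stay well defined.
    cov_p = np.cov(X_pos, rowvar=False) + 1e-6 * np.eye(d)
    cov_n = np.cov(X_neg, rowvar=False) + 1e-6 * np.eye(d)
    kappa0 = np.sqrt(beta0 / (1.0 - beta0))  # majority-class margin factor

    def neg_kappa(w):
        # Choosing b to make the majority-class constraint tight turns the
        # problem into maximizing this ratio, which equals kappa(alpha).
        num = w @ (mu_p - mu_n) - kappa0 * np.sqrt(w @ cov_n @ w)
        den = np.sqrt(w @ cov_p @ w)
        return -num / den

    res = minimize(neg_kappa, mu_p - mu_n, method="Nelder-Mead")
    w = res.x
    kappa = max(-res.fun, 0.0)
    alpha = kappa**2 / (1.0 + kappa**2)  # invert kappa(a) = sqrt(a/(1-a))
    b = w @ mu_n + kappa0 * np.sqrt(w @ cov_n @ w)
    return w, b, alpha

A point x is then assigned to the important class when w @ x >= b, and alpha is the resulting worst-case accuracy guarantee on that class. This is the sense in which the bias is imposed directly through the class accuracies rather than through intermediate factors.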

Copyright information

© 2008 Zhejiang University Press, Hangzhou and Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

(2008). Extension I: BMPM for Imbalanced Learning. In: Machine Learning. Advanced Topics in Science and Technology in China. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79452-3_5

  • DOI: https://doi.org/10.1007/978-3-540-79452-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-79451-6

  • Online ISBN: 978-3-540-79452-3

  • eBook Packages: Computer Science; Computer Science (R0)
