Binormal Precision–Recall Curves for Optimal Classification of Imbalanced Data

  • Zhongkai Liu
  • Howard D. Bondell
Article

Abstract

Binary classification on imbalanced data, i.e., data with a large skew in the class distribution, is a challenging problem. Classifiers are commonly evaluated via the receiver operating characteristic (ROC) curve, and techniques have been proposed to develop classifiers that optimize the area under the ROC curve. For imbalanced data, however, the ROC curve tends to give an overly optimistic view of performance. To address this shortcoming, we propose an approach based on the Precision–Recall (PR) curve under the binormal assumption: we choose the classifier that maximizes the area under the binormal PR curve. The asymptotic distribution of the resulting estimator is shown. Simulations and real data results indicate that the binormal Precision–Recall method outperforms approaches based on the area under the ROC curve.
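
To make the contrast between the two curves concrete, the following is a minimal sketch, assuming the standard binormal model in which classifier scores are normally distributed within each class. It is an illustration only, not the estimation procedure of the article: the function names, the trapezoidal integration, and the parameter values are all hypothetical choices. Under these assumptions, the area under the ROC curve does not depend on the class prevalence, while the area under the PR curve does.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binormal_auc(mu_pos, sd_pos, mu_neg, sd_neg):
    """Area under the binormal ROC curve (closed form, prevalence-free)."""
    return normal_cdf((mu_pos - mu_neg) / math.sqrt(sd_pos ** 2 + sd_neg ** 2))


def binormal_pr_point(t, mu_pos, sd_pos, mu_neg, sd_neg, prevalence):
    """Recall and precision at threshold t when scores above t are called positive."""
    recall = 1.0 - normal_cdf((t - mu_pos) / sd_pos)
    fpr = 1.0 - normal_cdf((t - mu_neg) / sd_neg)
    denom = prevalence * recall + (1.0 - prevalence) * fpr
    # Convention (an assumption): precision is 1 when nothing is called positive.
    precision = prevalence * recall / denom if denom > 0 else 1.0
    return recall, precision


def binormal_auprc(mu_pos, sd_pos, mu_neg, sd_neg, prevalence, n_grid=4000):
    """Area under the binormal PR curve by trapezoidal integration over thresholds."""
    lo = min(mu_pos - 8 * sd_pos, mu_neg - 8 * sd_neg)
    hi = max(mu_pos + 8 * sd_pos, mu_neg + 8 * sd_neg)
    # Sweep the threshold from high to low so that recall increases from 0 to 1.
    prev_r, prev_p = binormal_pr_point(hi, mu_pos, sd_pos, mu_neg, sd_neg, prevalence)
    area = 0.0
    for i in range(1, n_grid + 1):
        t = hi - (hi - lo) * i / n_grid
        r, p = binormal_pr_point(t, mu_pos, sd_pos, mu_neg, sd_neg, prevalence)
        area += 0.5 * (p + prev_p) * (r - prev_r)
        prev_r, prev_p = r, p
    return area


if __name__ == "__main__":
    # Hypothetical score distributions: positives N(1, 1), negatives N(0, 1).
    print("AUC:", round(binormal_auc(1.0, 1.0, 0.0, 1.0), 4))
    for prevalence in (0.5, 0.1, 0.01):
        area = binormal_auprc(1.0, 1.0, 0.0, 1.0, prevalence)
        print("prevalence", prevalence, "AUPRC:", round(area, 4))
```

In this sketch the ROC area stays fixed as the positive class becomes rarer, whereas the PR area shrinks toward the prevalence, which is one way to read the remark that the ROC curve can look overly optimistic on imbalanced data.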

Keywords

Binary classification; Binormal assumption; Imbalanced data; Precision–Recall curve; ROC curve

Copyright information

© International Chinese Statistical Association 2019

Authors and Affiliations

  1. North Carolina State University, Raleigh, USA
  2. University of Melbourne, Melbourne, Australia
