Skip to main content

Classifier Evaluation with Missing Negative Class Labels

  • Conference paper
Advances in Intelligent Data Analysis XII (IDA 2013)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8207))

Included in the following conference series:

Abstract

The concept of a negative class does not apply to many problems for which classification is increasingly utilized. In this study we investigate the reliability of evaluation metrics when the negative class contains an unknown proportion of mislabeled positive class instances. We examine how evaluation metrics can inform us about potential systematic biases in the data. We provide a motivating case study and a general framework for approaching evaluation when the negative class contains mislabeled positive class instances. We show that the behavior of evaluation metrics is unstable in the presence of uncertainty in class labels and that the stability of evaluation metrics depends on the kind of bias in the data. Finally, we show that the type and amount of bias present in data can have a significant effect on the ranking of evaluation metrics and the degree to which they over- or underestimate the true performance of classifiers.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Pandey, G., Zhang, B., Chang, A.N., Myers, C.L., Zhu, J., Kumar, V., Schadt, E.E.: An integrative multi-network and multi-classifier approach to predict genetic interactions. PLoS Comput. Biol. 6(9), e1000928+ (2010)

    Google Scholar 

  2. Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 213–220. ACM (2008)

    Google Scholar 

  3. Qi, Y., Bar-Joseph, Z., Klein-Seetharaman, J.: Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins 63(3), 490–500 (2006)

    Article  Google Scholar 

  4. Breitkreutz, B.J., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A., Livstone, M., Oughtred, R., Lackner, D.H., Bähler, J., Wood, V., Dolinski, K., Tyers, M.: The BioGRID Interaction Database: 2008 update. Nucleic Acids Research 36(suppl. 1), D637–D640 (2008)

    Google Scholar 

  5. Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N.J., Chung, S., Emili, A., Snyder, M., Greenblatt, J.F., Gerstein, M.: A bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302(5644), 449–453 (2003)

    Article  Google Scholar 

  6. Brem, R.B., Kruglyak, L.: The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proceedings of the National Academy of Sciences of the United States of America 102(5), 1572–1577 (2005)

    Article  Google Scholar 

  7. Hughes, T.R., Marton, M.J., Jones, A.R., Roberts, C.J., Stoughton, R., Armour, C.D., Bennett, H.A., Coffey, E., Dai, H., He, Y.D., Kidd, M.J., King, A.M., Meyer, M.R., Slade, D., Lum, P.Y., Stepaniants, S.B., Shoemaker, D.D., Gachotte, D., Chakraburtty, K., Simon, J., Bard, M., Friend, S.H.: Functional discovery via a compendium of expression profiles. Cell 102(1), 109–126 (2000)

    Article  Google Scholar 

  8. Christie, K.R., Hong, E.L., Cherry, J.M.: Functional annotations for the Saccharomyces cerevisiae genome: the knowns and the known unknowns. Trends in Microbiology 17(7), 286–294 (2009)

    Article  Google Scholar 

  9. Myers, C., Barrett, D., Hibbs, M., Huttenhower, C., Troyanskaya, O.: Finding function: evaluation methods for functional genomic data. BMC Genomics 7(1), 187+ (2006)

    Google Scholar 

  10. Zhang, S., Zhang, C., Yang, Q.: Data preparation for data mining. Applied Artificial Intelligence 17(5-6), 375–381 (2003)

    Article  Google Scholar 

  11. Allison, P.D.: Missing data: Quantitative applications in the social sciences. British Journal of Mathematical and Statistical Psychology 55, 193–196 (2002)

    Article  Google Scholar 

  12. Forman, G.: An extensive empirical study of feature selection metrics for text classification. The Journal of Machine Learning Research 3, 1289–1305 (2003)

    MATH  Google Scholar 

  13. Davis, J., Goadrich, M.: The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, ICML 2006, pp. 233–240. ACM, New York (2006)

    Google Scholar 

  14. Drummond, C., Holte, R.C.: Explicitly representing expected cost: an alternative to ROC representation. In: Knowledge Discovery and Data Mining, pp. 198–207 (2000)

    Google Scholar 

  15. Landgrebe, T.C.W., Paclik, P., Duin, R.P.W., Bradley, A.P.: Precision-recall operating characteristic (P-ROC) curves in imprecise environments. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 4, pp. 123–127. IEEE (2006)

    Google Scholar 

  16. Cieslak, D.A., Hoens, T.R., Chawla, N.V., Kegelmeyer, W.P.: Hellinger distance decision trees are robust and skew-insensitive. In: Data Mining and Knowledge Discovery, pp. 1–23 (2012)

    Google Scholar 

  17. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The WEKA data mining software: an update. Special Interest Group on Knowledge Discovery and Data Mining Explorer Newsletter 11(1), 10–18 (2009)

    Google Scholar 

  18. Bache, K., Lichman, M.: UCI machine learning repository (2013)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rider, A.K., Johnson, R.A., Davis, D.A., Hoens, T.R., Chawla, N.V. (2013). Classifier Evaluation with Missing Negative Class Labels. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds) Advances in Intelligent Data Analysis XII. IDA 2013. Lecture Notes in Computer Science, vol 8207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41398-8_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-41398-8_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41397-1

  • Online ISBN: 978-3-642-41398-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics