Classifier Evaluation with Missing Negative Class Labels

Rider, Andrew K.; Johnson, Reid A.; Davis, Darcy A.; Hoens, T. Ryan; Chawla, Nitesh V.

doi:10.1007/978-3-642-41398-8_33

Andrew K. Rider¹⁹,
Reid A. Johnson¹⁹,
Darcy A. Davis¹⁹,
T. Ryan Hoens¹⁹ &
…
Nitesh V. Chawla¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8207))

Included in the following conference series:

International Symposium on Intelligent Data Analysis

2402 Accesses
2 Citations

Abstract

The concept of a negative class does not apply to many problems for which classification is increasingly utilized. In this study we investigate the reliability of evaluation metrics when the negative class contains an unknown proportion of mislabeled positive class instances. We examine how evaluation metrics can inform us about potential systematic biases in the data. We provide a motivating case study and a general framework for approaching evaluation when the negative class contains mislabeled positive class instances. We show that the behavior of evaluation metrics is unstable in the presence of uncertainty in class labels and that the stability of evaluation metrics depends on the kind of bias in the data. Finally, we show that the type and amount of bias present in data can have a significant effect on the ranking of evaluation metrics and the degree to which they over- or underestimate the true performance of classifiers.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Pandey, G., Zhang, B., Chang, A.N., Myers, C.L., Zhu, J., Kumar, V., Schadt, E.E.: An integrative multi-network and multi-classifier approach to predict genetic interactions. PLoS Comput. Biol. 6(9), e1000928+ (2010)
Google Scholar
Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 213–220. ACM (2008)
Google Scholar
Qi, Y., Bar-Joseph, Z., Klein-Seetharaman, J.: Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins 63(3), 490–500 (2006)
Article Google Scholar
Breitkreutz, B.J., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A., Livstone, M., Oughtred, R., Lackner, D.H., Bähler, J., Wood, V., Dolinski, K., Tyers, M.: The BioGRID Interaction Database: 2008 update. Nucleic Acids Research 36(suppl. 1), D637–D640 (2008)
Google Scholar
Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N.J., Chung, S., Emili, A., Snyder, M., Greenblatt, J.F., Gerstein, M.: A bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302(5644), 449–453 (2003)
Article Google Scholar
Brem, R.B., Kruglyak, L.: The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proceedings of the National Academy of Sciences of the United States of America 102(5), 1572–1577 (2005)
Article Google Scholar
Hughes, T.R., Marton, M.J., Jones, A.R., Roberts, C.J., Stoughton, R., Armour, C.D., Bennett, H.A., Coffey, E., Dai, H., He, Y.D., Kidd, M.J., King, A.M., Meyer, M.R., Slade, D., Lum, P.Y., Stepaniants, S.B., Shoemaker, D.D., Gachotte, D., Chakraburtty, K., Simon, J., Bard, M., Friend, S.H.: Functional discovery via a compendium of expression profiles. Cell 102(1), 109–126 (2000)
Article Google Scholar
Christie, K.R., Hong, E.L., Cherry, J.M.: Functional annotations for the Saccharomyces cerevisiae genome: the knowns and the known unknowns. Trends in Microbiology 17(7), 286–294 (2009)
Article Google Scholar
Myers, C., Barrett, D., Hibbs, M., Huttenhower, C., Troyanskaya, O.: Finding function: evaluation methods for functional genomic data. BMC Genomics 7(1), 187+ (2006)
Google Scholar
Zhang, S., Zhang, C., Yang, Q.: Data preparation for data mining. Applied Artificial Intelligence 17(5-6), 375–381 (2003)
Article Google Scholar
Allison, P.D.: Missing data: Quantitative applications in the social sciences. British Journal of Mathematical and Statistical Psychology 55, 193–196 (2002)
Article Google Scholar
Forman, G.: An extensive empirical study of feature selection metrics for text classification. The Journal of Machine Learning Research 3, 1289–1305 (2003)
MATH Google Scholar
Davis, J., Goadrich, M.: The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, ICML 2006, pp. 233–240. ACM, New York (2006)
Google Scholar
Drummond, C., Holte, R.C.: Explicitly representing expected cost: an alternative to ROC representation. In: Knowledge Discovery and Data Mining, pp. 198–207 (2000)
Google Scholar
Landgrebe, T.C.W., Paclik, P., Duin, R.P.W., Bradley, A.P.: Precision-recall operating characteristic (P-ROC) curves in imprecise environments. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 4, pp. 123–127. IEEE (2006)
Google Scholar
Cieslak, D.A., Hoens, T.R., Chawla, N.V., Kegelmeyer, W.P.: Hellinger distance decision trees are robust and skew-insensitive. In: Data Mining and Knowledge Discovery, pp. 1–23 (2012)
Google Scholar
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The WEKA data mining software: an update. Special Interest Group on Knowledge Discovery and Data Mining Explorer Newsletter 11(1), 10–18 (2009)
Google Scholar
Bache, K., Lichman, M.: UCI machine learning repository (2013)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
Andrew K. Rider, Reid A. Johnson, Darcy A. Davis, T. Ryan Hoens & Nitesh V. Chawla

Authors

Andrew K. Rider
View author publications
You can also search for this author in PubMed Google Scholar
Reid A. Johnson
View author publications
You can also search for this author in PubMed Google Scholar
Darcy A. Davis
View author publications
You can also search for this author in PubMed Google Scholar
T. Ryan Hoens
View author publications
You can also search for this author in PubMed Google Scholar
Nitesh V. Chawla
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Information Systems, Computing and Mathematics, Brunel University, UB8 3PH, Uxbridge, Middlesex, UK
Allan Tucker & Stephen Swift &
Faculty of Computer Science/IT, Ostfalia University of Applied Sciences, Am Exer 2, 38302, Wolfenbüttel, Germany
Frank Höppner
Faculty of Science, Department of Information and Computing Science, Buys Ballot Laboratory, Universiteit Utrecht, Princetonplein 5, 3584 CC, Utrecht, The Netherlands
Arno Siebes

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Rider, A.K., Johnson, R.A., Davis, D.A., Hoens, T.R., Chawla, N.V. (2013). Classifier Evaluation with Missing Negative Class Labels. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds) Advances in Intelligent Data Analysis XII. IDA 2013. Lecture Notes in Computer Science, vol 8207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41398-8_33

Download citation

DOI: https://doi.org/10.1007/978-3-642-41398-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41397-1
Online ISBN: 978-3-642-41398-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics