Learning when negative examples abound

  • Miroslav Kubat
  • Robert Holte
  • Stan Matwin
Part II: Regular Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1224)


Existing concept learning systems can fail when negative examples heavily outnumber positive examples. The paper discusses one key problem caused by imbalanced training sets and presents a learning algorithm that addresses it. The experiments, with synthetic and real-world data, focus on 2-class problems whose examples are described by binary and continuous attributes.
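The abstract does not spell out which problem imbalance causes. A common illustration of how heavily skewed data can mislead is that plain accuracy rewards a learner that simply ignores the minority (positive) class. The sketch below is our own illustration, not the paper's algorithm. It assumes a hypothetical test set of 10 positives and 990 negatives, and it compares accuracy with the geometric mean of the per-class accuracies, one widely used imbalance-aware measure that we chose for the example. All function names and data here are hypothetical.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN, FP for a 2-class problem (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp, fn, tn, fp


def accuracy(y_true, y_pred):
    """Fraction of all examples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean(y_true, y_pred, positive=1):
    """sqrt(accuracy on positives * accuracy on negatives)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    acc_pos = tp / (tp + fn) if tp + fn else 0.0
    acc_neg = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(acc_pos * acc_neg)


# Hypothetical imbalanced test set: 10 positives, 990 negatives.
y_true = [1] * 10 + [0] * 990

# Degenerate "learner": always predicts the majority (negative) class.
always_negative = [0] * len(y_true)

# Hypothetical learner: finds 8 of 10 positives but raises 50 false alarms.
finds_positives = [1] * 8 + [0] * 2 + [1] * 50 + [0] * 940

print(accuracy(y_true, always_negative))        # 0.99
print(geometric_mean(y_true, always_negative))  # 0.0
print(accuracy(y_true, finds_positives))
print(geometric_mean(y_true, finds_positives))
```

The majority-class predictor scores 99% accuracy while recognizing no positive example at all. The second learner has lower accuracy (0.948) but a much higher geometric mean (about 0.87), which matches the intuition that it is the more useful classifier when positives are rare.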



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Miroslav Kubat, Robert Holte, Stan Matwin: Department of Computer Science, University of Ottawa, Ottawa, Canada
