Machine Learning, Volume 57, Issue 1–2, pp 177–195

A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000

  • Peter van der Putten
  • Maarten van Someren

Abstract

The CoIL Challenge 2000 data mining competition attracted a wide variety of solutions, both in terms of approaches and in terms of performance. The goal of the competition was to predict who would be interested in buying a specific insurance product and to explain why people would buy it. Unlike most other competitions, the majority of participants provided a report describing the path to their solution. In this article we use the framework of bias-variance decomposition of error to analyze what caused the wide range of prediction performance. We characterize the challenge problem to make it comparable to other problems and evaluate why certain methods worked well and others did not. We also include an evaluation of the submitted explanations by a marketing expert. We find that variance is the key component of error for this problem. Participants used various strategies in data preparation and model development that reduce variance error, such as feature selection and the use of simple, robust, low-variance learners like Naive Bayes. Adding constructed features, modeling with complex, weak-bias learners, and extensive fine-tuning by the participants often increased the variance error.

Keywords: bias-variance decomposition, real world applications, overfitting
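For reference, a minimal sketch of the decomposition in its familiar squared-loss form (the zero-one-loss variants the article draws on, e.g. Kohavi & Wolpert, 1996, and Domingos, 2000, are more involved but share the same structure): writing f_D(x) for the prediction of a model trained on dataset D,

\mathbb{E}_{D,y}\big[(f_D(x) - y)^2\big]
  = \underbrace{\big(\mathbb{E}_D[f_D(x)] - \mathbb{E}[y \mid x]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[\big(f_D(x) - \mathbb{E}_D[f_D(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\mathbb{E}\big[(y - \mathbb{E}[y \mid x])^2\big]}_{\text{noise}}

The variance term measures how much predictions fluctuate across training samples; simple, strongly biased learners such as Naive Bayes keep this term small, which is consistent with the finding that such methods do well on this problem.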


Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Peter van der Putten (1)
  • Maarten van Someren (2)
  1. Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
  2. Department of Social Science Informatics, University of Amsterdam, Amsterdam, The Netherlands
