Machine Learning, Volume 57, Issue 1, pp 177–195

A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000

  • Peter van der Putten
  • Maarten van Someren
DOI: 10.1023/B:MACH.0000035476.95130.99

Cite this article as:
van der Putten, P. & van Someren, M. Machine Learning (2004) 57: 177. doi:10.1023/B:MACH.0000035476.95130.99

Abstract

The CoIL Challenge 2000 data mining competition attracted a wide variety of solutions, both in terms of approaches and performance. The goal of the competition was to predict who would be interested in buying a specific insurance product and to explain why people would buy. Unlike in most other competitions, the majority of participants provided a report describing the path to their solution. In this article we use the framework of bias-variance decomposition of error to analyze what caused the wide range of prediction performance. We characterize the challenge problem to make it comparable to other problems and evaluate why certain methods work or not. We also include an evaluation of the submitted explanations by a marketing expert. We find that variance is the key component of error for this problem. Participants use various strategies in data preparation and model development that reduce variance error, such as feature selection and the use of simple, robust and low variance learners like Naive Bayes. Adding constructed features, modeling with complex, weak bias learners and extensive fine tuning by the participants often increase the variance error.
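To make the notion of bias and variance error concrete, the following is a minimal sketch of one common decomposition for two-class 0-1 loss (the formulation usually attributed to Domingos), which may differ in detail from the one used in the article. Everything here is an assumption for illustration: the synthetic data stands in for the CoIL data, `fit_stump` and `fit_1nn` are hypothetical stand-ins for a simple and a complex learner, and fresh training samples are drawn instead of resampling a fixed data set.

```python
import random
from collections import Counter


def decompose_01(predictions, truth):
    """Decompose mean 0-1 loss into bias and net variance (noise-free labels).

    predictions[r][i] is the label predicted for test example i by the model
    trained on training sample r; truth[i] is the true label of example i.
    """
    n_runs, n_test = len(predictions), len(truth)
    loss = bias = net_var = 0.0
    for i in range(n_test):
        votes = [predictions[r][i] for r in range(n_runs)]
        main = Counter(votes).most_common(1)[0][0]    # main (modal) prediction
        b = 1.0 if main != truth[i] else 0.0          # bias: main prediction is wrong
        v = sum(p != main for p in votes) / n_runs    # variance: disagreement with main
        loss += sum(p != truth[i] for p in votes) / n_runs
        bias += b
        net_var += (1 - 2 * b) * v                    # variance hurts if unbiased, helps if biased
    return loss / n_test, bias / n_test, net_var / n_test


def make_data(n, rng, n_noise=5):
    """Synthetic two-class data: the label depends only on the first feature."""
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(1 + n_noise)]
        rows.append((x, 1 if x[0] > 0.5 else 0))
    return rows


def fit_stump(train):
    """Simple learner (assumed low variance): one threshold on feature 0."""
    pos = [x[0] for x, y in train if y == 1]
    neg = [x[0] for x, y in train if y == 0]
    if not pos or not neg:                            # degenerate sample: one class only
        only = 1 if pos else 0
        return lambda x: only
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] > cut else 0


def fit_1nn(train):
    """Complex learner (assumed high variance): 1-nearest neighbour on all features."""
    def predict(x):
        nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
        return nearest[1]
    return predict


def run(fit, n_runs=30, n_train=40, n_test=200, seed=0):
    """Train on n_runs independent samples and decompose the loss on one test set."""
    rng = random.Random(seed)
    test = make_data(n_test, rng)
    truth = [y for _, y in test]
    preds = []
    for _ in range(n_runs):
        model = fit(make_data(n_train, rng))
        preds.append([model(x) for x, _ in test])
    return decompose_01(preds, truth)


if __name__ == "__main__":
    for name, fit in [("stump", fit_stump), ("1-NN", fit_1nn)]:
        loss, bias, var = run(fit)
        print(f"{name}: loss={loss:.3f} bias={bias:.3f} net variance={var:.3f}")
```

In this formulation the mean loss equals bias plus net variance when labels are noise-free and there are two classes, so the two components can be compared directly across learners or data preparation choices.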

Keywords: bias-variance decomposition, real world applications, overfitting
Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Peter van der Putten (1)
  • Maarten van Someren (2)
  1. Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
  2. Department of Social Science Informatics, University of Amsterdam, Amsterdam, The Netherlands