The Need for Low Bias Algorithms in Classification Learning from Large Data Sets

  • Damien Brain
  • Geoffrey I. Webb
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2431)


This paper reviews the appropriateness for application to large data sets of standard machine learning algorithms, which were mainly developed in the context of small data sets. Sampling and parallelisation have proved useful means of reducing computation time when learning from large data sets. However, such methods assume that algorithms designed for what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm than optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management rather than variance management.
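The bias plus variance decomposition referred to in the abstract can be estimated empirically by training a classifier on many random subsamples of a data pool and splitting each test point's 0-1 loss into a bias term (disagreement between the "main" majority-vote prediction and the true label) and a variance term (disagreement of individual trials with the main prediction). The sketch below is illustrative only, not the authors' experimental protocol; the `train_1nn` learner and the synthetic data are assumptions for the demonstration.

```python
# Minimal sketch of an empirical bias/variance estimate for 0-1 loss,
# in the style of main-prediction-based decompositions. Not the paper's
# exact procedure; the learner and data here are placeholders.
from collections import Counter
import random

def majority_vote(predictions):
    """Main prediction: the most frequent label across trials."""
    return Counter(predictions).most_common(1)[0][0]

def bias_variance(train_fn, pool, test, n_trials=20, sample_size=50, seed=0):
    rng = random.Random(seed)
    # Train on n_trials random subsamples; record predictions on a fixed test set.
    all_preds = []
    for _ in range(n_trials):
        sample = rng.sample(pool, sample_size)
        model = train_fn(sample)
        all_preds.append([model(x) for x, _ in test])
    bias_sum, var_sum = 0.0, 0.0
    for i, (_, y_true) in enumerate(test):
        preds = [trial[i] for trial in all_preds]
        main = majority_vote(preds)
        bias_sum += int(main != y_true)                       # systematic error
        var_sum += sum(p != main for p in preds) / n_trials   # instability
    n = len(test)
    return bias_sum / n, var_sum / n

def train_1nn(sample):
    """Toy learner: 1-nearest-neighbour on a single numeric attribute."""
    def predict(x):
        return min(sample, key=lambda p: abs(p[0] - x))[1]
    return predict
```

A usage example on a simple synthetic threshold concept: build a pool of labelled points, then report the estimated bias and variance of the 1-NN learner. Larger training samples would typically shrink the variance term, which is the kind of effect the paper's argument turns on.

```python
pool = [(i / 100.0, int(i >= 0)) for i in range(-100, 100)]
bias, var = bias_variance(train_1nn, pool, pool[::20], n_trials=10)
print(bias, var)
```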


Keywords (machine-generated, not supplied by the authors): Classification Learning, Variance Management, Variance Decomposition, Variance Profile, Hypothesis Space



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Damien Brain
  • Geoffrey I. Webb

School of Computing and Mathematics, Deakin University, Geelong, Victoria, Australia
