Making Early Predictions of the Accuracy of Machine Learning Classifiers

Smith, James Edward; Tahir, Muhammad Atif; Sannen, Davy; Van Brussel, Hendrik

doi:10.1007/978-1-4419-8020-5_6

Abstract

The accuracy of machine learning systems is a widely studied research topic. Established techniques such as cross-validation predict the accuracy, on unseen data, of the classifier produced by applying a given learning method to a given training data set. However, they do not predict whether incurring the cost of obtaining more data and undergoing further training will lead to higher accuracy. In this chapter, we investigate techniques for making such early predictions. We note that when a machine learning algorithm is presented with a training set, the classifier produced, and hence its error, will depend on the characteristics of the algorithm, on the training set’s size, and on its specific composition. In particular, we hypothesize that if a number of classifiers are produced and their observed error is decomposed into bias and variance terms, then although these components may behave differently, their behavior may be predictable. Experimental results confirm this hypothesis and show that our predictions are very highly correlated with the values observed after undertaking the extra training. This has particular relevance to learning in nonstationary environments: our characterization of bias and variance can be used to detect whether perceived changes in the data stream arise from sampling variability or from changes in the underlying data distributions, the latter appearing as changes in bias.
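To make the idea of decomposing observed error into bias and variance terms concrete, here is a minimal sketch in Python. The abstract does not say which decomposition the chapter uses, so this sketch assumes definitions in the style of Domingos's unified decomposition for 0/1 loss, with noise-free labels. The function name `bias_variance_01`, its argument layout, and the toy data are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def bias_variance_01(predictions, labels):
    """Split the mean 0/1 loss of several classifiers into bias and variance.

    predictions[m][i] is the label predicted by classifier m for test point i;
    labels[i] is the true label (assumed noise-free in this sketch).
    """
    n_models = len(predictions)
    n_points = len(labels)
    error = bias = var_unbiased = var_biased = 0.0
    for i, y in enumerate(labels):
        votes = [predictions[m][i] for m in range(n_models)]
        counts = Counter(votes)
        # Main prediction: the most frequent vote (ties go to the smallest label).
        main = max(sorted(counts), key=counts.get)
        # Variance at this point: share of classifiers disagreeing with the main prediction.
        disagreement = 1.0 - counts[main] / n_models
        # Observed error at this point: share of classifiers that are wrong.
        error += sum(v != y for v in votes) / n_models
        if main == y:
            var_unbiased += disagreement
        else:
            bias += 1.0
            var_biased += disagreement
    return {
        "error": error / n_points,
        "bias": bias / n_points,
        "variance_unbiased": var_unbiased / n_points,
        "variance_biased": var_biased / n_points,
    }


# Toy data (made up): three classifiers, four test points, two classes.
preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
truth = [0, 1, 1, 0]
parts = bias_variance_01(preds, truth)
print(parts)
```

In the two-class, noise-free case these terms satisfy error = bias + unbiased variance − biased variance, which is what lets each component be tracked separately as the training set grows. In practice, the rows of `preds` would come from classifiers trained on different samples of the available data.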

Notes

1. www.dynavis.org

Acknowledgements

This work was supported by the European Commission (project Contract No. STRP016429, acronym DynaVis). This publication reflects only the authors’ views.

Author information

Authors and Affiliations

Department of Computer Science and Creative Technologies, University of the West of England, Bristol, BS16 1QY, UK
James Edward Smith
School of Computing, Engineering and Information Sciences, University of Northumbria, Newcastle, UK
Muhammad Atif Tahir
Department of Mechanical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
Davy Sannen & Hendrik Van Brussel

Corresponding author

Correspondence to James Edward Smith.

Editor information

Editors and Affiliations

Département Informatique et Automatique, Ecole des Mines de Douai, 941, Rue Charles Bourseul, Douai cedex, 59508, France
Moamar Sayed-Mouchaweh
University of Linz, Weissdornweg 16, Linz, 4232, Austria
Edwin Lughofer

About this chapter

Cite this chapter

Smith, J.E., Tahir, M.A., Sannen, D., Van Brussel, H. (2012). Making Early Predictions of the Accuracy of Machine Learning Classifiers. In: Sayed-Mouchaweh, M., Lughofer, E. (eds) Learning in Non-Stationary Environments. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8020-5_6

DOI: https://doi.org/10.1007/978-1-4419-8020-5_6
Published: 13 March 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-8019-9
Online ISBN: 978-1-4419-8020-5
eBook Packages: Engineering, Engineering (R0)
