Summary
Boosting is a method for constructing a committee of weak learners that lowers the error rate in classification and the prediction error in regression. Boosting works by iteratively constructing weak learners whose training set is conditioned on the performance of the previous members of the ensemble. In classification we train the neural network weak learners using stochastic gradient descent, and in regression we train them using conjugate gradient descent. We compare ensembles of neural networks to ensembles of trees and show that neural networks are superior. We also compare ensembles constructed using boosting to those constructed using bagging and show that boosting is generally superior. Finally, we stress the importance of using separate training, validation, and test sets in order to obtain good generalisation.
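The following is a minimal sketch of the boosting loop described above: each new weak learner is fitted to a reweighted training set that emphasises the examples the existing committee gets wrong. The chapter's weak learners are neural networks (trained with stochastic gradient descent for classification and conjugate gradient descent for regression); the decision stump used here, the function names, and the choice of AdaBoost-style exponential reweighting are illustrative assumptions rather than the chapter's implementation.

```python
# Sketch of AdaBoost-style boosting for binary classification, labels in {-1, +1}.
# A one-dimensional decision stump stands in for the chapter's neural network
# weak learners, purely for illustration.
import numpy as np

def fit_stump(X, y, w):
    """Return (feature, threshold, polarity) minimising the weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] > t, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[3]:
                    best = (j, t, s, err)
    return best[:3]

def stump_predict(stump, X):
    j, t, s = stump
    return s * np.where(X[:, j] > t, 1, -1)

def adaboost(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)              # start with a uniform training distribution
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        pred = stump_predict(stump, X)
        err = max(np.sum(w[pred != y]), 1e-10)
        if err >= 0.5:                   # weak learner no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # condition the next training distribution on this member's mistakes
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(score)

# Usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
model = adaboost(X, y, n_rounds=20)
print("training accuracy:", np.mean(predict(model, X) == y))
```

In a bagged ensemble, by contrast, each member would be trained on an independent bootstrap resample of the data rather than on a distribution conditioned on the previous members' errors.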
Copyright information
© 1999 Springer-Verlag London Limited
Cite this chapter
Sharkey, A.J.C. (1999). Boosting Using Neural Networks. In: Sharkey, A.J.C. (eds) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_3
DOI: https://doi.org/10.1007/978-1-4471-0793-4_3
Publisher Name: Springer, London
Print ISBN: 978-1-85233-004-0
Online ISBN: 978-1-4471-0793-4