Machine Learning, Volume 38, Issue 3, pp 243–255

Improved Generalization Through Explicit Optimization of Margins

  • Llew Mason
  • Peter L. Bartlett
  • Jonathan Baxter

Abstract

Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large margins on the training data (i.e., correct examples are classified well away from the decision boundary). Neural network algorithms and AdaBoost have been shown to implicitly maximize margins, thus providing some theoretical justification for their remarkably good generalization performance. In this paper we are concerned with maximizing the margin explicitly. In particular, we prove a theorem bounding the generalization performance of convex combinations in terms of general cost functions of the margin, in contrast to previous results, which were stated in terms of the particular cost function sgn(θ − margin). We then present a new algorithm, DOOM, for directly optimizing a piecewise-linear family of cost functions satisfying the conditions of the theorem. Experiments on several of the datasets in the UC Irvine database are presented in which AdaBoost was used to generate a set of base classifiers and then DOOM was used to find the optimal convex combination of those classifiers. In all but one case the convex combination generated by DOOM had lower test error than AdaBoost's combination. In many cases DOOM achieves these lower test errors by sacrificing training error, in the interests of reducing the new cost function. In our experiments the margin plots suggest that the size of the minimum margin is not the critical factor in determining generalization performance.

Keywords: voting methods, ensembles, margins analysis, boosting
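
To make the quantities in the abstract concrete, the sketch below (illustrative only, not code from the paper; the array shapes, the value of θ, and the clipped-ramp shape are assumptions) computes the margins of a convex combination of ±1-valued base classifiers and contrasts the step cost sgn(θ − margin) with one example of a piecewise-linear margin cost of the kind DOOM is designed to optimize directly.

```python
import numpy as np

def margins(H, y, w):
    """Margins of a thresholded convex combination of base classifiers.

    H : (n_examples, n_classifiers) array of base predictions in {-1, +1}
    y : (n_examples,) array of labels in {-1, +1}
    w : (n_classifiers,) non-negative weights (normalized here to sum to 1)
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()          # enforce the convex-combination constraint
    return y * (H @ w)       # margin lies in [-1, 1]; positive means correctly classified

def step_cost(m, theta=0.1):
    """Fraction of examples with margin below theta (the sgn(theta - margin) criterion)."""
    return np.mean(m < theta)

def ramp_cost(m, theta=0.1):
    """An illustrative piecewise-linear margin cost: 1 for margin <= 0,
    0 for margin >= theta, linear in between. Not the exact family used by DOOM."""
    return np.mean(np.clip(1.0 - m / theta, 0.0, 1.0))
```

With base-classifier outputs H generated by, for example, AdaBoost, comparing step_cost and ramp_cost over candidate weight vectors w illustrates why a continuous, piecewise-linear cost of the margin is amenable to direct optimization, whereas the discontinuous step cost is not.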

References

  1. Bartlett, P. L. (1998). The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2), 525–536.
  2. Blake, C., Keogh, E., & Merz, C. J. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  3. Breiman, L. (1997). Prediction games and arcing algorithms. Technical Report 504, Department of Statistics, University of California, Berkeley.
  4. Frean, M., & Downs, T. (1998). A simple cost function for boosting. Technical Report, Department of Computer Science and Electrical Engineering, University of Queensland.
  5. Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
  6. Grove, A., & Schuurmans, D. (1998). Boosting in the limit: maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (pp. 692–699).
  7. Mason, L., Baxter, J., Bartlett, P. L., & Frean, M. (to appear). Functional gradient techniques for combining hypotheses. In A. J. Smola, P. L. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers. Cambridge, MA: MIT Press.
  8. Schapire, R. E., Freund, Y., Bartlett, P. L., & Lee, W. S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5), 1651–1686.

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Llew Mason (1)
  • Peter L. Bartlett (1)
  • Jonathan Baxter (1)

  1. Research School of Information Sciences and Engineering, Australian National University, Canberra, Australia
