Machine Learning

Volume 61, Issue 1–3, pp. 71–103

The Synergy Between PAV and AdaBoost


Abstract

Schapire and Singer's improved version of AdaBoost for handling weak hypotheses with confidence-rated predictions represents an important advance in the theory and practice of boosting. Its success results from a more efficient use of information in weak hypotheses during updating. Instead of simple binary voting, a weak hypothesis is allowed to vote for or against a classification with a variable strength or confidence. The Pool Adjacent Violators (PAV) algorithm is a method for converting a score into a probability. We show how PAV may be applied to a weak hypothesis to yield a new weak hypothesis which is, in a sense, an ideal confidence-rated prediction, and that this leads to an optimal updating for AdaBoost. The result is a new algorithm which we term PAV-AdaBoost. We give several examples illustrating problems for which this new algorithm provides advantages in performance.
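
For intuition, below is a minimal sketch of the Pool Adjacent Violators algorithm in the score-to-probability role mentioned in the abstract. It is not the paper's PAV-AdaBoost procedure; the function name, the unweighted pooling, and the example data are illustrative assumptions only.

    # Minimal PAV sketch (illustrative; not the paper's PAV-AdaBoost code).
    # Given binary labels ordered by increasing classifier score, pool adjacent
    # "violator" blocks until the block means form a non-decreasing sequence of
    # probability estimates (isotonic regression on the labels).
    def pav(labels):
        blocks = []  # each block is [sum of labels, count]
        for y in labels:
            blocks.append([float(y), 1])
            # Pool while an earlier block has a higher mean than a later one.
            while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                       > blocks[-1][0] / blocks[-1][1]):
                s, n = blocks.pop()
                blocks[-1][0] += s
                blocks[-1][1] += n
        # Expand each pooled block back to one probability per input item.
        probs = []
        for s, n in blocks:
            probs.extend([s / n] * n)
        return probs

    # Hypothetical example: labels ordered by score; the violator is pooled.
    print(pav([0, 1, 0, 1, 1]))  # -> [0.0, 0.5, 0.5, 1.0, 1.0]
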

Keywords

boosting, isotonic regression, convergence, document classification, k nearest neighbors

References

  1. Apte, C., Damerau, F., & Weiss, S. (1998). Text mining with decision rules and decision trees. In Proceedings of the Conference on Automated Learning and Discovery, CMU.
  2. Aslam, J. (2000). Improving algorithms for boosting. In Proceedings of the 13th COLT, Palo Alto, California.
  3. Ayer, M., Brunk, H. D., Ewing, G. M., Reid, W. T., & Silverman, E. (1954). An empirical distribution function for sampling with incomplete information. Annals of Mathematical Statistics, 26, 641–647.
  4. Bennett, K. P., Demiriz, A., & Shawe-Taylor, J. (2000). A column generation algorithm for boosting. In Proceedings of the 17th ICML.
  5. Buja, A., Hastie, T., & Tibshirani, R. (1989). Linear smoothers and additive models. The Annals of Statistics, 17:2, 453–555.
  6. Burges, C. J. C. (1999). A tutorial on support vector machines for pattern recognition (available electronically from the author). Bell Laboratories, Lucent Technologies.
  7. Carreras, X., & Marquez, L. (2001). Boosting trees for anti-spam email filtering. In Proceedings of RANLP 2001, Tzigov Chark, Bulgaria, September 5–7, 2001.
  8. Collins, M., Schapire, R. E., & Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48:1, 253–285.
  9. Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern Classification (2nd edn.). New York: John Wiley & Sons, Inc.
  10. Duffy, N., & Helmbold, D. (1999). Potential boosters? In Advances in Neural Information Processing Systems 11.
  11. Duffy, N., & Helmbold, D. (2000). Leveraging for regression. In Proceedings of the 13th Annual Conference on Computational Learning Theory, San Francisco.
  12. Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:1, 119–139.
  13. Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28:2, 337–374.
  14. Hardle, W. (1991). Smoothing Techniques: With Implementation in S. New York: Springer-Verlag.
  15. Johnson, M., Geman, S., Canon, S., Chi, Z., & Riezler, S. (1999). Estimators for stochastic “unification-based” grammars. In Proceedings of ACL'99, Univ. Maryland.
  16. Kim, W., Aronson, A. R., & Wilbur, W. J. (2001). Automatic MeSH term assignment and quality assessment. In Proceedings of the AMIA Symposium, Washington, D.C.
  17. Kim, W. G., & Wilbur, W. J. (2001). Corpus-based statistical screening for content-bearing terms. Journal of the American Society for Information Science, 52:3, 247–259.
  18. Langley, P., & Sage, S. (1994). Induction of selective Bayesian classifiers. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, WA.
  19. Maclin, R. (1998). Boosting classifiers locally. In Proceedings of AAAI.
  20. Mason, L., Bartlett, P. L., & Baxter, J. (2000). Improved generalizations through explicit optimizations of margins. Machine Learning, 38, 243–255.
  21. McCallum, A., & Nigam, K. (1998). A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization.
  22. Meir, R., El-Yaniv, R., & Ben-David, S. (2000). Localized boosting. In Proceedings of the 13th COLT, Palo Alto, California.
  23. Mitchell, T. M. (1997). Machine Learning. Boston: WCB/McGraw-Hill.
  24. Moerland, P., & Mayoraz, E. (1999). DynamBoost: Combining boosted hypotheses in a dynamic way (Technical Report RR 99-09). IDIAP, Switzerland.
  25. Nock, R., & Sebban, M. (2001). A Bayesian boosting theorem. Pattern Recognition Letters, 22, 413–419.
  26. Pardalos, P. M., & Xue, G. (1999). Algorithms for a class of isotonic regression problems. Algorithmica, 23, 211–222.
  27. Ratsch, G., Mika, S., & Warmuth, M. K. (2001). On the Convergence of Leveraging (NeuroCOLT2 Technical Report 98). London: Royal Holloway College.
  28. Ratsch, G., Onoda, T., & Muller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning, 42, 287–320.
  29. Robertson, S. E., & Walker, S. (1994). Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
  30. Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:3, 297–336.
  31. Vapnik, V. (1998). Statistical Learning Theory. New York: John Wiley & Sons, Inc.
  32. Witten, I. H., Moffat, A., & Bell, T. C. (1999). Managing Gigabytes (2nd edn.). San Francisco: Morgan-Kaufmann Publishers, Inc.
  33. Zhang, T., & Oles, F. J. (2001). Text categorization based on regularized linear classification methods. Information Retrieval, 4:1, 5–31.

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, U.S.A.
