Article

Machine Learning, Volume 61, Issue 1, pp. 71–103

The Synergy Between PAV and AdaBoost

  • W. John Wilbur (corresponding author), National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
  • Lana Yeganova, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
  • Won Kim, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health

Abstract

Schapire and Singer's improved version of AdaBoost for handling weak hypotheses with confidence-rated predictions represents an important advance in the theory and practice of boosting. Its success results from a more efficient use of the information in weak hypotheses during updating: instead of simple binary voting, a weak hypothesis is allowed to vote for or against a classification with variable strength or confidence. The Pool Adjacent Violators (PAV) algorithm is a method for converting a score into a probability. We show how PAV may be applied to a weak hypothesis to yield a new weak hypothesis that is, in a sense, an ideal confidence-rated prediction, and that this leads to an optimal updating for AdaBoost. The result is a new algorithm, which we term PAV-AdaBoost. We give several examples illustrating problems for which this new algorithm provides advantages in performance.
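To make the PAV step concrete, the following is a minimal Python sketch of pool-adjacent-violators isotonic regression, which fits a monotone non-decreasing map from scores to probability estimates. It is an illustration under simple assumptions (binary 0/1 labels, unweighted examples, a hypothetical function name pav), not the authors' implementation from the paper.

    def pav(scores, labels):
        """Pool Adjacent Violators: fit an isotonic (monotone
        non-decreasing) map from classifier scores to probabilities.

        scores -- list of real-valued classifier outputs
        labels -- list of 0/1 class labels, aligned with scores
        Returns (order, probs): example indices sorted by score, and
        the fitted probability for each example in that order.
        """
        # Process examples in increasing score order; the fitted
        # probabilities must then be non-decreasing along this order.
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        # Each block is [label_sum, count]; its fitted value is the mean.
        blocks = []
        for i in order:
            blocks.append([float(labels[i]), 1.0])
            # Pool adjacent blocks while they violate monotonicity,
            # i.e. while the previous block's mean exceeds the last's.
            # (Cross-multiplied comparison avoids division.)
            while (len(blocks) > 1 and
                   blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
                s, n = blocks.pop()
                blocks[-1][0] += s
                blocks[-1][1] += n
        # Expand pooled block means back to one probability per example.
        probs = []
        for s, n in blocks:
            probs.extend([s / n] * int(n))
        return order, probs

    # Example: four scored examples, two from each class.
    order, probs = pav([0.2, 0.9, 0.4, 0.7], [0, 1, 0, 1])
    print(order, probs)  # [0, 2, 3, 1] [0.0, 0.0, 1.0, 1.0]

In the confidence-rated setting discussed above, such a probability estimate p would naturally feed a vote of the form (1/2) ln(p/(1−p)), the style of updating analyzed by Schapire and Singer; the paper develops how the PAV-derived hypothesis makes this updating optimal.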

Keywords

boosting, isotonic regression, convergence, document classification, k nearest neighbors