Increasing the Robustness of Boosting Algorithms within the Linear-programming Framework

Article

Abstract

AdaBoost has been successfully used in many signal classification systems. However, it has been observed that AdaBoost easily overfits highly noisy data, which seriously limits its applicability. In this paper, we address this problem by proposing a new regularized boosting algorithm, LPnorm2-AdaBoost (LPNA), which arises from a close connection between AdaBoost and linear programming. During training, the algorithm controls the skewness of the data distribution to prevent outliers from spoiling the decision boundary. To this end, a smooth convex penalty function (the l2 norm) is introduced into the objective function of a minimax problem. A stabilized column generation technique then transforms the optimization problem into a simple linear program. The effectiveness of the proposed algorithm is demonstrated through experiments on a wide variety of datasets.
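
For a concrete picture of the kind of formulation the abstract describes, the following is a minimal sketch of a regularized minimax problem in the spirit of LPNA. It assumes the standard AdaBoost minimax game over the sample distribution d = (d_1, ..., d_N) and base hypotheses h_1, ..., h_T; the penalty constant C and the exact placement of the l2 term are illustrative assumptions, not taken from the paper.

\[
\min_{\mathbf{d}\,\in\,\Delta_N} \;\; \max_{1 \le t \le T} \; \sum_{i=1}^{N} d_i\, y_i\, h_t(\mathbf{x}_i) \;+\; C\, \lVert \mathbf{d} \rVert_2^2,
\qquad
\Delta_N = \Bigl\{ \mathbf{d} : \textstyle\sum_{i=1}^{N} d_i = 1,\; d_i \ge 0 \Bigr\}.
\]

The quadratic penalty keeps the distribution d from concentrating its mass on a few (possibly mislabeled) examples, which is one way to read the abstract's statement that the skewness of the data distribution is controlled; the stabilized column generation step described in the abstract would then solve this problem through a sequence of restricted linear programs over the hypotheses generated so far.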

Keywords

pattern classification · large margin classifier · AdaBoost · linear programming · minimax problem · soft margin · regularization

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
  2. Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, USA
  3. Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA
