Increasing the Robustness of Boosting Algorithms within the Linear-programming Framework
AdaBoost has been used successfully in many signal classification systems. However, on highly noisy data AdaBoost is prone to overfitting, which seriously limits its applicability. In this paper, we address this problem by proposing a new regularized boosting algorithm, LPnorm2-AdaBoost (LPNA), which arises from a close connection between AdaBoost and linear programming. The algorithm controls the skewness of the data distribution during training so that outliers cannot spoil the decision boundary. To this end, a smooth convex penalty function (the l2 norm) is introduced into the objective function of a minimax problem. A stabilized column generation technique then transforms the optimization problem into a simple linear programming problem. The effectiveness of the proposed algorithm is demonstrated through experiments on a wide range of datasets.
Keywords: pattern classification · large margin classifier · AdaBoost · linear programming · minimax problem · soft margin · regularization
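For readers unfamiliar with the linear-programming view of boosting, the following is the standard soft-margin LP formulation (as in LPBoost); it is background context, not a restatement of the paper's own objective, whose l2 penalty term and exact notation are given in the paper. Here ρ is the margin, ξ_i are slack variables, ν ∈ (0, 1] controls the softness, h_t are the weak hypotheses, and α_t their weights.

```latex
% Standard soft-margin LP formulation of boosting (background sketch).
\begin{aligned}
\max_{\alpha,\,\rho,\,\xi} \quad & \rho - \frac{1}{\nu N}\sum_{i=1}^{N} \xi_i \\
\text{s.t.} \quad & y_i \sum_{t=1}^{T} \alpha_t h_t(x_i) \;\ge\; \rho - \xi_i,
  \qquad i = 1,\dots,N, \\
& \sum_{t=1}^{T} \alpha_t = 1, \qquad \alpha_t \ge 0, \qquad \xi_i \ge 0 .
\end{aligned}
```

Because the set of candidate weak hypotheses is huge, such problems are solved by column generation: a restricted master LP over the hypotheses found so far is alternated with a weak-learner call that searches for a violated constraint, i.e. a hypothesis whose edge exceeds the current dual value. The sketch below illustrates this loop for plain soft-margin LPBoost with decision stumps; all function and variable names are illustrative, and the smooth l2 stabilization described in the abstract is deliberately not reproduced.

```python
# Minimal column-generation sketch of soft-margin LP boosting with decision
# stumps. This follows the standard LPBoost dual, NOT the paper's LPNA
# objective: the l2 penalty term described in the abstract is omitted.
import numpy as np
from scipy.optimize import linprog

def best_stump(X, y, d):
    """Decision stump maximizing the edge sum_i d_i * y_i * h(x_i)."""
    best_edge, best_h = -np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                h = s * np.where(X[:, j] <= thr, 1, -1)
                edge = float(np.sum(d * y * h))
                if edge > best_edge:
                    best_edge, best_h = edge, h
    return best_edge, best_h

def lpboost(X, y, nu=0.2, max_iter=50, tol=1e-4):
    N = len(y)
    D = 1.0 / (nu * N)              # box constraint on weights: soft margin
    d = np.full(N, 1.0 / N)         # sample distribution, initially uniform
    U, beta, res = [], 0.0, None    # U holds columns u_t = y * h_t(x)
    for _ in range(max_iter):
        edge, h = best_stump(X, y, d)
        if edge <= beta + tol:      # no hypothesis violates the dual: stop
            break
        U.append(y * h)
        # Restricted master (dual) LP over variables (d_1, ..., d_N, beta):
        #   min beta   s.t.  sum_i d_i u_{ti} <= beta  for all t so far,
        #   sum_i d_i = 1,  0 <= d_i <= D.
        c = np.r_[np.zeros(N), 1.0]
        A_ub = np.array([np.r_[u, -1.0] for u in U])
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(U)),
                      A_eq=np.r_[np.ones(N), 0.0].reshape(1, -1), b_eq=[1.0],
                      bounds=[(0.0, D)] * N + [(None, None)],
                      method="highs")
        d, beta = res.x[:N], res.x[N]
    if res is None:
        raise RuntimeError("no hypothesis with positive edge was found")
    # Hypothesis weights are the duals of the edge constraints.
    alpha = -res.ineqlin.marginals
    return alpha / alpha.sum()
```

In a real implementation one would also record the stump parameters so the final classifier sign(Σ_t α_t h_t(x)) can be evaluated on new data; the paper's stabilized variant additionally constrains how the distribution d may change between iterations, which is what keeps outliers from dominating it.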