Fast Training of Effective Multi-class Boosting Using Coordinate Descent Optimization
We present a novel column generation based boosting method for multi-class classification. Our multi-class boosting is formulated as a single optimization problem, as in [1]. Unlike most existing multi-class boosting methods, which use the same set of weak learners for all classes, we train class-specified weak learners (i.e., each class has its own set of weak learners). We show that using a separate weak-learner set for each class leads to fast convergence, without introducing additional computational overhead in the training procedure. To make training more efficient and scalable, we also propose a fast coordinate descent method for solving the optimization problem at each boosting iteration. The proposed coordinate descent method is conceptually simple and easy to implement, since each coordinate update has a closed-form solution. Experimental results on a variety of datasets show that, compared with a range of existing multi-class boosting methods, the proposed method achieves a much faster convergence rate and better generalization performance in most cases. We also show empirically that the proposed fast coordinate descent algorithm requires less training time than the MultiBoost algorithm of [1].
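To make the closed-form coordinate update concrete, the following is a minimal sketch of coordinate descent on weak-learner coefficients for a simplified binary exponential-loss objective. This is an illustrative analogue, not the paper's multi-class formulation: the function name `coordinate_descent_boost` and the fixed pool of weak-learner outputs `H` are assumptions for the example. Each coordinate update uses the classic closed form α = ½ log((1 − ε)/ε), where ε is the weighted error of the corresponding weak learner.

```python
import numpy as np

def coordinate_descent_boost(H, y, n_sweeps=10):
    """Coordinate descent on weak-learner coefficients for binary
    exponential-loss boosting (illustrative sketch, not the paper's
    multi-class method).

    H: (n_samples, n_learners) array of weak-learner outputs in {-1, +1}.
    y: (n_samples,) array of labels in {-1, +1}.
    Returns the coefficient vector alpha.
    """
    n, m = H.shape
    alpha = np.zeros(m)
    margins = np.zeros(n)  # y_i * F(x_i), with F = sum_j alpha_j h_j
    for _ in range(n_sweeps):
        for j in range(m):
            # Remove coordinate j's current contribution to the margins.
            margins -= alpha[j] * y * H[:, j]
            # Exponential-loss sample weights from the remaining ensemble.
            w = np.exp(-margins)
            w /= w.sum()
            # Weighted error of learner j, clipped away from {0, 1}.
            eps = w[y * H[:, j] < 0].sum()
            eps = np.clip(eps, 1e-10, 1 - 1e-10)
            # Closed-form minimizer of the loss along coordinate j.
            alpha[j] = 0.5 * np.log((1 - eps) / eps)
            margins += alpha[j] * y * H[:, j]
    return alpha
```

Because each one-dimensional subproblem is solved exactly in closed form, no line search or gradient step size is needed, which is what makes this style of update cheap per iteration.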
Keywords: Training Time, Column Generation, Master Problem, Weak Learner, Coordinate Descent
- 1. Shen, C., Hao, Z.: A direct formulation for totally-corrective multi-class boosting. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (2011)
- 5. Guruswami, V., Sahai, A.: Multiclass learning, boosting, and error-correcting codes. In: Proc. Annual Conf. Computational Learning Theory, pp. 145–155. ACM, New York (1999)
- 6. Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. In: Machine Learn., pp. 80–91 (1999)
- 9. Zhu, C., Byrd, R.H., Lu, P., Nocedal, J.: Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization. ACM Trans. Math. Software (1994)
- 10. Yuan, G.X., Chang, K.W., Hsieh, C.J., Lin, C.J.: A comparison of optimization methods and software for large-scale l1-regularized linear classification. J. Mach. Learn. Res., 3183–3234 (2010)
- 12. Guillaumin, M., Verbeek, J., Schmid, C.: Multimodal semi-supervised learning for image classification. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (2010)
- 13. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., vol. 2, pp. 2169–2178 (2006)
- 14. Xiao, J., Hays, J., Ehinger, K., Oliva, A., Torralba, A.: SUN database: Large-scale scene recognition from abbey to zoo. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 3485–3492 (2010)