Machine Learning, Volume 97, Issue 3, pp 295–326

An improved multiclass LogitBoost using adaptive-one-vs-one

Abstract

LogitBoost is a popular Boosting variant that can be applied to either binary or multi-class classification. From a statistical viewpoint, LogitBoost can be seen as additive tree regression that minimizes the Logistic loss. Within this setting, devising a sound multi-class LogitBoost is still non-trivial compared with its binary counterpart. The difficulties are due to two important factors arising in the multi-class Logistic loss. The first is the invariance property implied by the Logistic loss: adding a constant to each component of the output vector does not change the loss value, so the optimal classifier output is not unique. The second is the density of the Hessian matrices that arise when computing tree node split gains and fitting node values. Oversimplifying this learning problem can lead to degraded performance. For example, the original LogitBoost algorithm is outperformed by ABC-LogitBoost thanks to the latter’s more careful treatment of the above two factors. In this paper we propose new techniques to address these two main difficulties in the multi-class LogitBoost setting: (1) we adopt a vector tree model (i.e. each node value is a vector), where a unique classifier output is guaranteed by adding a sum-to-zero constraint, and (2) we use an adaptive block coordinate descent that exploits the dense Hessian when computing tree split gains and node values. Higher classification accuracy and faster convergence rates are observed on a range of public data sets when compared with both the original and the ABC-LogitBoost implementations. We also discuss another possibility for coping with LogitBoost’s dense Hessian matrix: we derive a loss similar to the multi-class Logistic loss but which guarantees a diagonal Hessian matrix. While this makes the optimization (by Newton descent) easier, we unfortunately observe degraded performance for this modification. We argue that working with the dense Hessian is likely unavoidable, making techniques such as those proposed in this paper necessary for efficient implementations.
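To make the two difficulties concrete, the following NumPy sketch (not taken from the paper; the function and variable names are illustrative assumptions) evaluates the K-class Logistic loss for a single sample. It shows the invariance under adding a constant to the classifier output, and the dense, singular per-sample Hessian diag(p) − p pᵀ whose rows sum to zero.

```python
# Minimal sketch, assuming the standard K-class Logistic (softmax) loss
# L(F) = -sum_k r_k log p_k with p = softmax(F). It illustrates the two
# issues discussed in the abstract: (1) invariance of the loss when a
# constant is added to every component of F, and (2) the dense per-sample
# Hessian H = diag(p) - p p^T.
import numpy as np

def softmax(F):
    z = F - F.max()               # constant shift for numerical stability (loss unchanged)
    e = np.exp(z)
    return e / e.sum()

def logistic_loss(F, r):
    """Multi-class Logistic loss for one sample; r is a one-hot label vector."""
    p = softmax(F)
    return -np.sum(r * np.log(p))

def hessian(F):
    """Per-sample Hessian of the loss w.r.t. F: diag(p) - p p^T (dense)."""
    p = softmax(F)
    return np.diag(p) - np.outer(p, p)

K = 4
F = np.random.randn(K)            # classifier output (one vector per sample)
r = np.eye(K)[1]                  # one-hot label for class 1

# (1) Invariance: adding a constant to every component leaves the loss unchanged.
print(np.isclose(logistic_loss(F, r), logistic_loss(F + 3.7, r)))   # True

# (2) The Hessian is dense (off-diagonal entries -p_i * p_j are nonzero)
# and its rows sum to zero, i.e. it is singular along the all-ones direction.
H = hessian(F)
print(np.allclose(H.sum(axis=1), 0.0))                              # True
```

The zero row sums reflect the same degeneracy as the invariance: the Hessian is singular along the all-ones direction, so a Newton-style step is only well defined once an extra condition such as the sum-to-zero constraint is imposed, which is what the vector tree model and the adaptive block coordinate descent described above exploit.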

Keywords

LogitBoost · Boosting · Ensemble · Supervised learning · Convex optimization

References

  1. Bertsekas, D. P. (1982). Constrained optimization and Lagrange multiplier methods. Boston: Academic Press.
  2. Bottou, L., & Lin, C. J. (2007). Support vector machine solvers. In L. Bottou, O. Chapelle, D. DeCoste, & J. Weston (Eds.), Large scale kernel machines (pp. 301–320). Cambridge: MIT Press. http://leon.bottou.org/papers/bottou-lin-2006
  3. Freund, Y., & Schapire, R. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Computational learning theory (pp. 23–37). New York: Springer.
  4. Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232.
  5. Friedman, J., Hastie, T., & Tibshirani, R. (1998). Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28(2), 337–407.
  6. Jaynes, E. (1957). Information theory and statistical mechanics. The Physical Review, 106(4), 620–630.
  7. Kégl, B., & Busa-Fekete, R. (2009). Boosting products of base classifiers. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 497–504). New York: ACM.
  8. Kivinen, J., & Warmuth, M. K. (1999). Boosting as entropy projection. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 134–144). New York: ACM.
  9. Lafferty, J. (1999). Additive models, boosting, and inference for generalized divergences. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 125–133).
  10. Larochelle, H., Erhan, D., Courville, A., Bergstra, J., & Bengio, Y. (2007). An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning (pp. 473–480). New York: ACM.
  11. Li, P. (2008). Adaptive base class boost for multi-class classification. arXiv preprint arXiv:0811.1250.
  12. Li, P. (2009a). ABC-boost: Adaptive base class boost for multi-class classification. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 625–632). New York: ACM.
  13. Li, P. (2009b). ABC-LogitBoost for multi-class classification. arXiv preprint arXiv:0908.4144.
  14. Li, P. (2010a). An empirical evaluation of four algorithms for multi-class classification: MART, ABC-MART, robust LogitBoost, and ABC-LogitBoost. arXiv preprint arXiv:1001.1020.
  15. Li, P. (2010b). Robust LogitBoost and adaptive base class (ABC) LogitBoost. In Conference on Uncertainty in Artificial Intelligence.
  16. Magnus, J. R., & Neudecker, H. (2007). Matrix differential calculus with applications in statistics and econometrics (3rd ed.). New York: Wiley.
  17. Masnadi-Shirazi, H., & Vasconcelos, N. (2010). Risk minimization, probability elicitation, and cost-sensitive SVMs. In Proceedings of the International Conference on Machine Learning (pp. 204–213).
  18. Reid, M. D., & Williamson, R. C. (2010). Composite binary losses. The Journal of Machine Learning Research, 11, 2387–2422.
  19. Schapire, R., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336.
  20. Shen, C., & Hao, Z. (2011). A direct formulation for totally corrective multi-class boosting. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2585–2592).
  21. Shen, C., & Li, H. (2010). On the dual formulation of boosting algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2216–2231.
  22. Zou, H., Zhu, J., & Hastie, T. (2008). New multicategory boosting algorithms based on multicategory Fisher-consistent losses. The Annals of Applied Statistics, 2(4), 1290–1306.

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Automation, Tsinghua University, Beijing, China
  2. Research School of Computer Science, The Australian National University and NICTA, Canberra, Australia