
Enhance AdaBoost Algorithm by Integrating LDA Topic Model

  • Fangyu Gai
  • Zhiqiang Li
  • Xinwen Jiang
  • Hongchen Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9714)

Abstract

AdaBoost is an ensemble method that is considered one of the most influential algorithms for multi-label classification. It has been applied successfully in diverse domains owing to its simplicity and accurate predictions. To choose each weak hypothesis, however, AdaBoost must examine every feature individually, which dramatically increases computational time, especially on large-scale datasets. To tackle this problem, we introduce a Latent Dirichlet Allocation (LDA) topic model that improves the efficiency and effectiveness of AdaBoost by mapping the word matrix into a topic matrix. In this paper, we propose a framework integrating LDA and AdaBoost and evaluate it on two Chinese-language corpora. Experiments show that our method outperforms traditional AdaBoost using the bag-of-words (BOW) model.
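The efficiency argument above rests on one observation: weak-hypothesis selection scans every feature in every boosting round, so replacing a V-word vocabulary with K topics (K much smaller than V) shrinks the per-round search. The following is a minimal sketch of such a pipeline under our own assumptions, not the authors' implementation. LDA is fitted with collapsed Gibbs sampling, and the classifier is discrete binary AdaBoost with decision stumps rather than the multi-label setting the abstract describes. All function names (`lda_gibbs`, `train_adaboost`, `predict`), hyperparameters (`alpha`, `beta`, `iters`, `rounds`), and the toy corpus are hypothetical.

```python
import math
import random
from typing import List, Tuple

Stump = Tuple[float, int, float, int]  # (alpha, feature index, threshold, polarity)


def lda_gibbs(docs: List[List[int]], vocab_size: int, k: int,
              alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Collapsed Gibbs sampling for LDA; returns the D x K document-topic matrix theta."""
    rng = random.Random(seed)
    n_dk = [[0] * k for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(k)]  # word counts per topic
    n_k = [0] * k                                # total tokens per topic
    z: List[List[int]] = []                      # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            n_dk[d][t] += 1
            n_kw[t][w] += 1
            n_k[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current token, then resample its topic from the full conditional.
                n_dk[d][t] -= 1
                n_kw[t][w] -= 1
                n_k[t] -= 1
                probs = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + vocab_size * beta)
                         for j in range(k)]
                t = rng.choices(range(k), weights=probs)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][w] += 1
                n_k[t] += 1
    return [[(n_dk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]


def stump(x: List[float], f: int, thr: float, pol: int) -> int:
    """Decision stump: predict `pol` if feature f reaches the threshold, else -pol."""
    return pol if x[f] >= thr else -pol


def train_adaboost(X: List[List[float]], y: List[int], rounds: int) -> List[Stump]:
    """Discrete binary AdaBoost with decision stumps; labels must be +1/-1."""
    n, m = len(X), len(X[0])
    w = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(rounds):
        best = None  # (weighted error, feature, threshold, polarity)
        for f in range(m):  # every feature column is scanned in every round
            for thr in sorted({row[f] for row in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, row, yi in zip(w, X, y) if stump(row, f, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        a = 0.5 * math.log((1 - err) / err)
        model.append((a, f, thr, pol))
        # Increase the weight of misclassified samples, decrease the rest, then renormalise.
        w = [wi * math.exp(-a * yi * stump(row, f, thr, pol)) for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: List[Stump], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump(x, f, thr, pol) for a, f, thr, pol in model)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy corpus (word ids): ids 0-2 only appear in class +1, ids 3-5 only in class -1.
    docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [0, 0, 1, 2],
            [3, 4, 5, 3], [4, 5, 3, 5, 4], [5, 3, 4, 4]]
    labels = [1, 1, 1, -1, -1, -1]
    theta = lda_gibbs(docs, vocab_size=6, k=2)  # 6 word features -> 2 topic features
    model = train_adaboost(theta, labels, rounds=5)
    print([predict(model, row) for row in theta])
```

Because the stump search in `train_adaboost` loops over all `m` feature columns in every round, the cost of each round grows linearly with the number of columns. Training on the `K` columns of `theta` instead of `V` word-count columns is how this sketch illustrates the speed-up the abstract claims; it makes no attempt to reproduce the paper's reported results.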

Keywords

AdaBoost; Ensemble method; Text categorization

Notes

Acknowledgments

This work is supported by the National Science Foundation of China under Grant 61272010.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Fangyu Gai (1)
  • Zhiqiang Li (2)
  • Xinwen Jiang (1)
  • Hongchen Guo (2)
  1. School of Computer, National University of Defense Technology, Changsha, China
  2. Network Service Center, Beijing Institute of Technology, Beijing, China
