Enhance AdaBoost Algorithm by Integrating LDA Topic Model
AdaBoost is an ensemble method that is considered one of the most influential algorithms for multi-label classification. It has been applied successfully in diverse domains because of its simplicity and accurate predictions. To choose weak hypotheses, however, AdaBoost must examine every feature individually, which dramatically increases the computational time of classification, especially on large-scale datasets. To tackle this problem, we introduce a Latent Dirichlet Allocation (LDA) model that improves the efficiency and effectiveness of AdaBoost by mapping the word matrix to a topic matrix. In this paper, we propose a framework that integrates LDA and AdaBoost, and we test it on two Chinese-language corpora. Experiments show that our method outperforms traditional AdaBoost using the bag-of-words (BOW) model.
Keywords: AdaBoost, Ensemble method, Text categorization
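The idea in the abstract, replacing the word matrix with a much narrower topic matrix before boosting, can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn, not the authors' implementation: the toy English corpus, the labels, the topic count `n_topics`, and the number of boosting rounds are all hypothetical choices made for the example.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus; the paper itself uses two Chinese-language corpora.
docs = [
    "stock market shares trading price",
    "market investors price stock fund",
    "bank shares fund trading investors",
    "football match goal team player",
    "team player coach match season",
    "goal season football coach league",
]
labels = [0, 0, 0, 1, 1, 1]

n_topics = 2  # assumed value; chosen to be far smaller than the vocabulary

# Step 1: bag-of-words counts, one column per vocabulary word.
vectorizer = CountVectorizer()
word_matrix = vectorizer.fit_transform(docs)

# Step 2: LDA maps each document to a distribution over n_topics topics.
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
topic_matrix = lda.fit_transform(word_matrix)

# Step 3: AdaBoost (decision stumps by default) now searches n_topics
# features per boosting round instead of one feature per vocabulary word.
booster = AdaBoostClassifier(n_estimators=20, random_state=0)
booster.fit(topic_matrix, labels)

print("word matrix shape: ", word_matrix.shape)
print("topic matrix shape:", topic_matrix.shape)
print("predictions:", booster.predict(topic_matrix))
```

Each boosting round in this sketch scans only the topic columns when picking a weak hypothesis, which is where the reduction in feature-search cost would come from. Whether accuracy holds up depends on the corpus and on the number of topics.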
This work is supported by the National Science Foundation of China under Grant 61272010.