Hidden Topic Models for Multi-label Review Classification: An Experimental Study

  • Thi-Ngan Pham
  • Thi-Thom Phan
  • Phuoc-Thao Nguyen
  • Quang-Thuy Ha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8083)

Abstract

In recent years, Multi-Label Classification (MLC) has become an important task in supervised learning. MLC tasks are common in real-world problems in which an instance can belong to several classes simultaneously. In this paper, we present an MLC method that uses hidden topic models to enrich data features and mutual information for feature selection. Our experiments on classifying user reviews of one thousand Vietnamese hotels show the effectiveness of the proposed approach.
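The two ideas named in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that per-document topic distributions are already available (for example, estimated beforehand with an LDA tool), that a topic is added as a pseudo-token when its probability passes a hypothetical threshold, and that a feature is scored by its highest mutual information with any single label, which is only one of several possible multi-label scoring choices. All function names are illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def enrich_with_topics(tokens, topic_probs, threshold=0.2):
    """Append a pseudo-token for each topic whose probability passes the threshold.

    The threshold and the `topic_<k>` naming are assumptions for illustration.
    """
    return tokens + [f"topic_{k}" for k, p in enumerate(topic_probs) if p >= threshold]


def select_features(docs, label_sets, labels, top_k):
    """Rank features by their highest MI with any single label and keep top_k."""
    vocab = sorted({t for d in docs for t in d})
    scores = {}
    for term in vocab:
        present = [int(term in d) for d in docs]
        scores[term] = max(
            mutual_information(present, [int(label in ls) for ls in label_sets])
            for label in labels
        )
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    # Toy reviews with hypothetical topic distributions and label sets.
    reviews = [["clean", "room"], ["rude", "staff"], ["clean", "staff"]]
    topics = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
    label_sets = [{"room"}, {"service"}, {"room", "service"}]
    docs = [enrich_with_topics(r, t) for r, t in zip(reviews, topics)]
    print(select_features(docs, label_sets, ["room", "service"], top_k=3))
```

The selected features would then feed any multi-label classifier; the scoring rule (maximum over labels) can be swapped for a sum or for a joint estimate over label combinations without changing the rest of the sketch.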

Keywords

feature selection; hidden topic model/LDA; multi-label classification (MLC); mutual information; opinion mining/sentiment analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Thi-Ngan Pham (1, 2)
  • Thi-Thom Phan (1)
  • Phuoc-Thao Nguyen (1)
  • Quang-Thuy Ha (1)
  1. College of Technology (UET), Vietnam National University, Hanoi (VNU), Hanoi, Vietnam
  2. The Vietnamese People’s Police Academy, Hanoi, Vietnam
