Advances in Knowledge Discovery and Data Mining

Volume 6119 of the series Lecture Notes in Computer Science, pp. 261-271

Feature Subsumption for Sentiment Classification in Multiple Languages

  • Zhongwu Zhai, affiliated with the State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, CS&T Department, Tsinghua University
  • Hua Xu, affiliated with the State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, CS&T Department, Tsinghua University
  • Jun Li, affiliated with the State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, CS&T Department, Tsinghua University
  • Peifa Jia, affiliated with the State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, CS&T Department, Tsinghua University

Abstract

An open problem in machine learning-based sentiment classification is how to extract complex features that outperform simple features; determining which types of features are most valuable is another. Most studies focus on character or word n-gram features, and substring-group features have not previously been considered in sentiment classification. In this study, substring-group features are extracted and selected for sentiment classification by means of a transductive learning-based algorithm. To demonstrate generality, experiments were conducted on three open datasets in three different languages: Chinese, English and Spanish. The experimental results show that the proposed algorithm’s performance is usually superior to the best performance reported in related work, and that the proposed feature subsumption algorithm for sentiment classification is applicable to multiple languages. The results also show that, compared with the inductive learning-based algorithm, the transductive learning-based algorithm can significantly improve sentiment classification performance. As for term weighting, the experiments show that “tfidf-c” outperforms all the other term weighting approaches in the proposed algorithm.

Keywords

Sentiment, Transductive, Substring-group, Multilingual