Feature Subsumption for Sentiment Classification in Multiple Languages

Abstract

Two open problems in machine learning-based sentiment classification are how to extract complex features that outperform simple features and how to determine which types of features are most valuable. Most studies focus primarily on character or word n-gram features; substring-group features have not previously been considered in sentiment classification. In this study, substring-group features are extracted and selected for sentiment classification by means of a transductive learning-based algorithm. To demonstrate generality, experiments were conducted on three open datasets in three different languages: Chinese, English and Spanish. The experimental results show that the proposed algorithm’s performance is usually superior to the best performance reported in related work, and that the proposed feature subsumption algorithm for sentiment classification is multilingual. The results also show that, compared to the inductive learning-based algorithm, the transductive learning-based algorithm can significantly improve the performance of sentiment classification. As for term weighting, the experiments show that “tfidf-c” outperforms all other term weighting approaches in the proposed algorithm.
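
The abstract does not define “tfidf-c”. As a minimal sketch, assuming it denotes tf-idf weighting followed by cosine (L2) normalization of each document vector, the weighting could look like the following. The function name `tfidf_cosine`, the toy documents, and the unsmoothed idf formula `log(N / df)` are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter

def tfidf_cosine(docs):
    """Weight each document with tf * log(N / df), then L2-normalise it.

    docs: list of token lists (tokens may be words or substrings).
    Returns one {term: weight} dict per document.
    Assumption: this is one common reading of "tf-idf with cosine
    normalization"; the paper's exact formula may differ.
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        # Raw tf-idf weight per term in this document.
        raw = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Cosine normalization: divide by the vector's Euclidean length.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / norm if norm > 0 else 0.0) for t, w in raw.items()})
    return vectors

# Toy corpus (hypothetical, for illustration only).
docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
vectors = tfidf_cosine(docs)
```

With this normalization every non-empty document vector has unit length, so longer reviews do not dominate shorter ones when a linear classifier compares them.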