Sentiment Classification from Multi-class Imbalanced Twitter Data Using Binarization
Twitter has become one of the most dynamically developing areas of social media. Due to the concise nature of messages, rapid publication and high outreach, people increasingly share their opinions, thoughts and commentaries through this medium. Sentiment analysis is a subfield of natural language processing that focuses on automatically categorizing the opinions and attitudes expressed in a given piece of text. This requires dedicated machine learning solutions that can handle the difficulties inherent in such data. In this paper, we present an efficient framework for automatic sentiment analysis from high-dimensional and sparse datasets that suffer from multi-class imbalance. We approach this problem by applying a one-vs-one binary decomposition and reducing the dimensionality of each pairwise class subset using Multiple Correspondence Analysis. We then apply preprocessing to alleviate the skewed class distributions in the reduced feature space. Finally, we train a binary classifier on each pair of classes and combine their outputs using a weighted multi-class reconstruction that promotes minority classes. The proposal is evaluated on a large Twitter dataset, and the obtained results favor the proposed solution.
Keywords: Machine learning, Text mining, Sentiment analysis, Imbalanced learning, Multi-class imbalance
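The binarization pipeline summarized in the abstract (one-vs-one decomposition, one binary classifier per class pair, weighted reconstruction favoring minority classes) can be sketched as follows. This is a minimal, standard-library-only Python illustration under stated assumptions, not the authors' implementation: the nearest-centroid binary learner, the inverse-frequency class weights and the renormalized weighted voting are hypothetical stand-ins, since the abstract does not specify them. The per-pair Multiple Correspondence Analysis and imbalance preprocessing steps are omitted.

```python
from collections import Counter
from itertools import combinations
from math import dist


def fit_nearest_centroid(X, y, a, b):
    """Hypothetical stand-in binary learner: one centroid per class of the pair."""
    def centroid(cls):
        rows = [x for x, label in zip(X, y) if label == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]
    return centroid(a), centroid(b)


def score_first(model, x):
    """Soft score in [0, 1] that x belongs to the first class of the pair."""
    ca, cb = model
    da, db = dist(x, ca), dist(x, cb)
    return 0.5 if da + db == 0 else db / (da + db)


def train_ovo(X, y):
    """One-vs-one decomposition: one binary model per unordered class pair."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        # Keep only the samples of this pair. Per-pair dimensionality
        # reduction (MCA in the paper) and imbalance preprocessing would
        # be applied to this subset; both are omitted in this sketch.
        pair = [(x, label) for x, label in zip(X, y) if label in (a, b)]
        Xp = [x for x, _ in pair]
        yp = [label for _, label in pair]
        models[(a, b)] = fit_nearest_centroid(Xp, yp, a, b)
    return models


def inverse_frequency_weights(y):
    """Assumed class weights n / (K * n_c): rarer classes weigh more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def predict(models, weights, x):
    """Hypothetical weighted reconstruction: rescale each pairwise score by
    the class weights, renormalize within the pair, sum, take the argmax."""
    votes = Counter()
    for (a, b), model in models.items():
        p = score_first(model, x)
        sa, sb = weights[a] * p, weights[b] * (1 - p)
        votes[a] += sa / (sa + sb)
        votes[b] += sb / (sa + sb)
    return max(votes, key=votes.get)


# Toy 2-D data: "neu" is the majority class, "neg" the minority.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [6, 5], [-5, -5]]
y = ["neu", "neu", "neu", "neu", "pos", "pos", "neg"]

models = train_ovo(X, y)                 # K(K-1)/2 = 3 pairwise models
weights = inverse_frequency_weights(y)
uniform = {c: 1.0 for c in weights}

print(predict(models, uniform, [-2, -2]))   # neu
print(predict(models, weights, [-2, -2]))   # neg
```

On this toy data the point `[-2, -2]` lies closer to the majority "neu" centroid, so uniform weights select "neu", while the assumed minority-promoting weights tip the decision to "neg". Because each pair's contribution is renormalized to sum to one, every binary classifier still casts exactly one unit of vote; the weights only shift each pairwise decision toward the rarer class.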