RIFT: A Rule Induction Framework for Twitter Sentiment Analysis
The rapid evolution of microblogging and the emergence of sites such as Twitter have allowed online communities to flourish by enabling people to create, share and disseminate free-flowing messages and information globally. The exponential growth of product-based user reviews has made them an ever-increasing resource that plays a key role in emerging Twitter-based sentiment analysis (SA) techniques and applications for collecting and analysing customer trends and reviews. Existing supervised black-box sentiment analysis systems do not provide adequate information about the rules that explain why a given review was assigned to a particular class, and in some respects their accuracy falls short of human judgement. To address these shortcomings, alternative approaches, such as supervised white-box classification algorithms, need to be developed to improve the classification of Twitter-based microblogs. The purpose of this study was to develop a supervised white-box microblogging SA system that analyses user reviews of certain products using rough set theory (RST)-based rule induction algorithms. The system classifies microblog reviews of products into the positive, negative or neutral class using rules extracted from training decision tables by RST-centric rule induction algorithms. The study also focuses on the sentiment classification of product-review microblogs (i.e. tweets) using both conventional and RST-based rule induction algorithms. Two RST-centric rule induction algorithms are used: Learning from Examples Module version 2 (LEM2) and LEM2 + Corpus-based rules (LEM2 + CBR), the proposed extension of the traditional LEM2 algorithm. Corpus-based rules are generated from the tweets that the conventional LEM2 rules leave unclassified.
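To make the white-box idea concrete, here is a minimal Python sketch of LEM2-style rule induction over an RST decision table, assuming tweets have already been encoded as categorical condition attributes. The attribute names (`pos`, `neg`, `emo`), the toy table, the support-weighted voting in `classify` and the `corpus_fallback` function are all hypothetical illustrations. In particular, `corpus_fallback` only mimics the role the corpus-based rules play (labelling tweets that no LEM2 rule covers); it is not the paper's CBR procedure.

```python
from collections import defaultdict

# Hypothetical decision table: each row is (condition attributes, decision).
# Attribute names and values are illustrative, not the paper's feature set.
TABLE = [
    ({"pos": "yes", "neg": "no", "emo": "happy"}, "positive"),
    ({"pos": "yes", "neg": "no", "emo": "none"}, "positive"),
    ({"pos": "no", "neg": "yes", "emo": "sad"}, "negative"),
    ({"pos": "no", "neg": "yes", "emo": "none"}, "negative"),
    ({"pos": "no", "neg": "no", "emo": "none"}, "neutral"),
    ({"pos": "yes", "neg": "yes", "emo": "none"}, "neutral"),
]


def cover(table, conds):
    """Indices of cases matching every (attribute, value) pair in conds."""
    return {i for i, (x, _) in enumerate(table)
            if all(x.get(a) == v for a, v in conds)}


def lower_approximation(table, label):
    """RST lower approximation: union of indiscernibility classes fully inside `label`."""
    classes = defaultdict(set)
    for i, (x, _) in enumerate(table):
        classes[tuple(sorted(x.items()))].add(i)
    lower = set()
    for members in classes.values():
        if all(table[j][1] == label for j in members):
            lower |= members
    return lower


def lem2(table, concept):
    """Minimal LEM2: a local covering of `concept` as lists of (attr, value) pairs."""
    pairs = sorted({(a, v) for x, _ in table for a, v in x.items()})
    blocks = {t: cover(table, [t]) for t in pairs}
    rules, goal = [], set(concept)
    while goal:
        cond, g = [], set(goal)
        while not cond or not cover(table, cond) <= concept:
            # Most relevant pair: largest overlap with the goal, then smallest block;
            # remaining ties go to the first pair in sorted order.
            candidates = [t for t in pairs if t not in cond and blocks[t] & g]
            best = min(candidates, key=lambda t: (-len(blocks[t] & g), len(blocks[t])))
            cond.append(best)
            g &= blocks[best]
        # Drop conditions that are not needed to stay inside the concept.
        for t in list(cond):
            rest = [s for s in cond if s != t]
            if rest and cover(table, rest) <= concept:
                cond = rest
        rules.append(cond)
        goal = set(concept) - set().union(*(cover(table, r) for r in rules))
    # Drop rules whose cases are already covered by the remaining rules.
    for r in list(rules):
        others = [s for s in rules if s is not r]
        if others and set().union(*(cover(table, s) for s in others)) == set(concept):
            rules = others
    return rules


def induce(table):
    """Rules per decision class, each paired with its support on the training table."""
    rule_set = {}
    for label in sorted({y for _, y in table}):
        rules = lem2(table, lower_approximation(table, label))
        rule_set[label] = [(r, len(cover(table, r))) for r in rules]
    return rule_set


def corpus_fallback(features):
    # Hypothetical stand-in for the corpus-based rules: decide from the emoticon only.
    return {"happy": "positive", "sad": "negative"}.get(features.get("emo"), "neutral")


def classify(rule_set, features, fallback=corpus_fallback):
    """Support-weighted vote of matching LEM2 rules; the fallback runs only if none fires."""
    votes = defaultdict(int)
    for label, rules in rule_set.items():
        for cond, support in rules:
            if all(features.get(a) == v for a, v in cond):
                votes[label] += support
    if votes:
        return max(sorted(votes), key=votes.get)
    return fallback(features)


RULES = induce(TABLE)
print(classify(RULES, {"pos": "yes", "neg": "no", "emo": "sad"}))  # -> positive (LEM2 rule)
print(classify(RULES, {"emo": "sad"}))  # -> negative (no LEM2 rule fires; fallback decides)
```

Each induced rule is a readable conjunction such as `neg = no AND pos = yes => positive`, which is what makes this kind of classifier white-box. Inducing rules from lower approximations means that, on inconsistent training data, the rules remain certain rather than possible ones.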
Experimental results show that the proposed method compares favourably with the baseline methods in terms of accuracy, coverage and the number of rules employed. It achieves an average accuracy of 92.57% and an average coverage of 100%, with an average of 19.14 rules.
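The reported figures depend on how accuracy and coverage are defined. A common convention for rule-based classifiers, assumed here rather than taken from the paper, is that coverage is the fraction of test tweets for which some rule fires, and accuracy is the fraction of those covered tweets that are labelled correctly. The Python sketch below uses made-up predictions (with `None` marking an unclassified tweet) to show how a fallback that labels otherwise unclassified tweets can raise coverage to 100%.

```python
from typing import Optional, Sequence, Tuple


def accuracy_and_coverage(predicted: Sequence[Optional[str]],
                          gold: Sequence[str]) -> Tuple[float, float]:
    """Return (accuracy over covered tweets, coverage); None means no rule fired."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    covered = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    coverage = len(covered) / len(gold)
    accuracy = sum(p == g for p, g in covered) / len(covered) if covered else 0.0
    return accuracy, coverage


# Illustrative labels only; these are not results from the paper.
gold = ["positive", "negative", "neutral", "positive"]
rules_only = ["positive", "negative", None, "negative"]         # one tweet left unclassified
with_fallback = ["positive", "negative", "neutral", "negative"]  # fallback labels it

print(accuracy_and_coverage(rules_only, gold))     # accuracy 2/3, coverage 0.75
print(accuracy_and_coverage(with_fallback, gold))  # (0.75, 1.0)
```

Under this convention, a fallback that fills every gap always yields 100% coverage, but the overall accuracy then also depends on how well the fallback labels the tweets that the primary rules miss.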
Keywords: Twitter; Sentiment analysis; Rule induction; Slang; Emoticons; Rough set theory; LEM2
We are grateful to Dr. Shakeel Ahmad, Institute of Computing, Gomal University, for providing licensed software and manuals during the execution of this project.
Compliance with Ethical Standards
Conflict of interest
Muhammad Zubair Asghar, Aurangzeb Khan, Furqan Khan and Fazal Masud Kundi declare that they have no conflict of interest.
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for whom identifying information is included in this article.
Human and Animal Rights
This study did not involve any experimental research on humans or animals; hence, approval from an ethics committee was not required. The data collected from the online forums are publicly available, and no personally identifiable information about the forum users was collected or used for this study.