Abstract
The rapid evolution of microblogging and the emergence of sites such as Twitter have helped online communities flourish by enabling people to create, share and disseminate free-flowing messages and information globally. The exponentially growing body of product-based user reviews has become a key resource for emerging Twitter-based sentiment analysis (SA) techniques and applications that collect and analyse customer trends and reviews. Existing supervised black-box sentiment analysis systems do not provide adequate information about the rules that explain why a given review was assigned to a particular class, and their accuracy in some respects falls short of human judgement. To address these shortcomings, alternative approaches, such as supervised white-box classification algorithms, need to be developed to improve the classification of Twitter-based microblogs. The purpose of this study was to develop a supervised white-box microblogging SA system that analyses user reviews of products using rough set theory (RST)-based rule induction algorithms. The system classifies microblogging reviews of products into positive, negative or neutral classes using rules extracted from training decision tables by RST-centric rule induction algorithms. A further focus of this study is to perform sentiment classification of product-review microblogs (also known as tweets) using both conventional and RST-based rule induction algorithms. The proposed approach uses two RST-centric rule induction algorithms: Learning from Examples Module, version 2 (LEM2), and LEM2 + Corpus-based rules (LEM2 + CBR), which is an extension of the traditional LEM2 algorithm. Corpus-based rules are generated from the tweets that the conventional LEM2 rules leave unclassified.
Experimental results show that the proposed method compares favourably with baseline methods with regard to accuracy, coverage and the number of rules employed. It achieves an average accuracy of 92.57% and an average coverage of 100%, with an average of 19.14 rules.
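To make the evaluation terms concrete, the following is a minimal Python sketch of how a small rule set might be applied to tweets and scored. It is an illustration under stated assumptions, not the authors' implementation: the rules, the example tweets, the helper names `classify` and `evaluate`, and the constant `fallback` are all hypothetical. Coverage is assumed here to mean the share of tweets that receive any label, and accuracy the share of covered tweets labelled correctly; the paper may define these measures differently. The fallback only stands in for the role that corpus-based rules play for tweets left unclassified by the main rules.

```python
# Hypothetical induced rules: (set of required tokens, class label).
# Illustrative only; these are not rules reported in the paper.
RULES = [
    ({"love"}, "positive"),
    ({"great", "battery"}, "positive"),
    ({"broken"}, "negative"),
    ({"not", "good"}, "negative"),
]


def classify(tokens, rules):
    """Return the label of the first rule whose conditions all hold, else None."""
    for conditions, label in rules:
        if conditions <= tokens:  # every required token is present
            return label
    return None


def evaluate(tweets, rules, fallback=None):
    """Return (accuracy, coverage) over (text, gold_label) pairs.

    Assumed definitions: coverage = share of tweets that receive a label;
    accuracy = share of covered tweets whose label matches the gold label.
    """
    covered = correct = 0
    for text, gold in tweets:
        label = classify(set(text.lower().split()), rules)
        if label is None and fallback is not None:
            # Stand-in for a second rule source applied to unclassified tweets.
            label = fallback(text)
        if label is not None:
            covered += 1
            correct += int(label == gold)
    coverage = covered / len(tweets)
    accuracy = correct / covered if covered else 0.0
    return accuracy, coverage


# Hypothetical labelled tweets, for illustration only.
TWEETS = [
    ("I love this phone", "positive"),
    ("screen arrived broken", "negative"),
    ("battery is not good", "negative"),
    ("it is a phone", "neutral"),
]

base = evaluate(TWEETS, RULES)
with_fallback = evaluate(TWEETS, RULES, fallback=lambda text: "neutral")
print(base, with_fallback)
```

In this toy setting the main rules leave one tweet unlabelled, so coverage is below 100% until the fallback is applied. This mirrors, in a very simplified way, why a second rule source for unclassified tweets can raise coverage.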
Acknowledgements
We are grateful to Dr. Shakeel Ahmad, Institute of Computing, Gomal University, for providing licensed software and manuals during the execution of this project.
Ethics declarations
Conflict of interest
Muhammad Zubair Asghar, Aurangzeb Khan, Furqan Khan and Fazal Masud Kundi declare that they have no conflict of interest.
Informed Consent
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for which identifying information is included in this article.
Human and Animal Rights
This study did not involve any experimental research on humans or animals; hence, an approval from an ethics committee was not applicable in this regard. The data collected from the online forums are publicly available data, and no personally identifiable information of the forum users was collected or used for this study.
Cite this article
Asghar, M.Z., Khan, A., Khan, F. et al. RIFT: A Rule Induction Framework for Twitter Sentiment Analysis. Arab J Sci Eng 43, 857–877 (2018). https://doi.org/10.1007/s13369-017-2770-1
DOI: https://doi.org/10.1007/s13369-017-2770-1