RIFT: A Rule Induction Framework for Twitter Sentiment Analysis

  • Muhammad Zubair Asghar
  • Aurangzeb Khan
  • Furqan Khan
  • Fazal Masud Kundi
Research Article - Computer Engineering and Computer Science

Abstract

The rapid evolution of microblogging and the emergence of sites such as Twitter have allowed online communities to flourish by enabling people to create, share and disseminate free-flowing messages and information globally. The rapidly growing volume of product-based user reviews has become a key resource for emerging Twitter-based sentiment analysis (SA) techniques and applications that collect and analyse customer trends and reviews. Existing supervised black-box sentiment analysis systems do not provide adequate information, in the form of rules, about why a given review was assigned to a particular class, and their accuracy in some respects falls short of human judgement. To address these shortcomings, alternative approaches, such as supervised white-box classification algorithms, need to be developed to improve the classification of Twitter-based microblogs. The purpose of this study was to develop a supervised white-box microblogging SA system that analyses user reviews of products using rough set theory (RST)-based rule induction algorithms. The system classifies microblogging reviews of products into positive, negative or neutral classes using rules extracted from training decision tables by RST-centric rule induction algorithms. The study also performs sentiment classification of microblog posts (tweets) containing product reviews using both conventional and RST-based rule induction algorithms. Two RST-centric algorithms are used: Learning from Examples Module, version 2 (LEM2), and the proposed LEM2 + Corpus-based rules (LEM2 + CBR), an extension of the traditional LEM2 algorithm. Corpus-based rules are generated from tweets that the conventional LEM2 rules leave unclassified. Experimental results show that the proposed method compares favourably with baseline methods with regard to accuracy, coverage and the number of rules employed: it achieves an average accuracy of 92.57% and an average coverage of 100%, with an average of 19.14 rules.
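To make the reported measures concrete, the following is a minimal Python sketch of how a rule-based classifier's accuracy and coverage can be computed, and how a second set of fallback rules can raise coverage. It is not the authors' implementation and does not perform LEM2 rule induction. The attribute names (opinion_word, emoticon, slang), the toy decision table, the hand-written rules and the helper names classify and evaluate are all hypothetical. It also assumes the common convention that coverage is the fraction of objects matched by at least one rule and accuracy is the fraction of matched objects that are labelled correctly.

```python
from typing import Dict, List, Optional, Tuple

# A rule pairs a set of attribute conditions with a class label.
Rule = Tuple[Dict[str, str], str]
Row = Dict[str, str]


def classify(row: Row, rules: List[Rule]) -> Optional[str]:
    """Return the label of the first rule whose conditions all hold, else None."""
    for conditions, label in rules:
        if all(row.get(attr) == value for attr, value in conditions.items()):
            return label
    return None


def evaluate(rows: List[Row], labels: List[str], rules: List[Rule]) -> Tuple[float, float]:
    """Return (accuracy over covered rows, coverage over all rows)."""
    predictions = [classify(row, rules) for row in rows]
    covered = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(covered) / len(rows)
    accuracy = sum(p == y for p, y in covered) / len(covered) if covered else 0.0
    return accuracy, coverage


# Hypothetical toy decision table; attributes and values are illustrative only.
rows = [
    {"opinion_word": "pos", "emoticon": "pos", "slang": "none"},
    {"opinion_word": "neg", "emoticon": "none", "slang": "neg"},
    {"opinion_word": "none", "emoticon": "none", "slang": "none"},
    {"opinion_word": "none", "emoticon": "pos", "slang": "none"},
]
labels = ["positive", "negative", "neutral", "positive"]

# Hand-written stand-ins for induced rules (assumption, not LEM2 output).
base_rules = [
    ({"opinion_word": "pos"}, "positive"),
    ({"opinion_word": "neg"}, "negative"),
]
# Stand-ins for extra rules built from rows the base rules leave unclassified.
fallback_rules = [
    ({"emoticon": "pos"}, "positive"),
    ({"opinion_word": "none", "emoticon": "none", "slang": "none"}, "neutral"),
]

base_accuracy, base_coverage = evaluate(rows, labels, base_rules)
full_accuracy, full_coverage = evaluate(rows, labels, base_rules + fallback_rules)
print(base_accuracy, base_coverage)
print(full_accuracy, full_coverage)
```

In this toy table the base rules match only two of the four rows, and appending the fallback rules lets every row be matched. This mirrors, in a simplified way, the idea of adding rules for tweets that the first rule set leaves unclassified.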

Keywords

Twitter; Sentiment analysis; Rule induction; Slang; Emoticons; Rough set theory; LEM2

Notes

Acknowledgements

We are grateful to Dr. Shakeel Ahmad, Institute of Computing, Gomal University, for providing licensed software and manuals during the execution of this project.

Compliance with Ethical Standards

Conflict of interest

Muhammad Zubair Asghar, Aurangzeb Khan, Furqan Khan and Fazal Masud Kundi declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for whom identifying information is included in this article.

Human and Animal Rights

This study did not involve any experimental research on humans or animals; hence, an approval from an ethics committee was not applicable in this regard. The data collected from the online forums are publicly available data, and no personally identifiable information of the forum users was collected or used for this study.

Copyright information

© King Fahd University of Petroleum & Minerals 2017

Authors and Affiliations

  1. Institute of Computing and Information Technology, Gomal University, Dera Ismail Khan, Pakistan
  2. Department of Computer Science, University of Science and Technology, Bannu, Pakistan
