RIFT: A Rule Induction Framework for Twitter Sentiment Analysis

  • Research Article - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

The rapid evolution of microblogging and the emergence of sites such as Twitter have allowed online communities to flourish by enabling people to create, share and disseminate free-flowing messages and information globally. Exponentially growing product-based user reviews have become a key resource for emerging Twitter-based sentiment analysis (SA) techniques and applications that collect and analyse customer trends and reviews. Existing supervised black-box sentiment analysis systems do not adequately explain, in the form of rules, why a given review was assigned to a particular class, and their accuracy in some respects falls short of human judgement. To address these shortcomings, alternative approaches, such as supervised white-box classification algorithms, need to be developed to improve the classification of Twitter-based microblogs. The purpose of this study was to develop a supervised white-box microblogging SA system that analyses user reviews of certain products using rough set theory (RST)-based rule induction algorithms. RST classifies microblogging reviews of products into a positive, negative, or neutral class using rules extracted from training decision tables by RST-centric rule induction algorithms. The study also performs sentiment classification of microblogs (also known as tweets) of product reviews using both conventional and RST-based rule induction algorithms. The RST-centric rule induction algorithms used are Learning from Examples Module, version 2 (LEM2), and LEM2 + Corpus-based rules (LEM2 + CBR), an extension of the traditional LEM2 algorithm. Corpus-based rules are generated from the tweets that the conventional LEM2 rules leave unclassified.
Experimental results show that the proposed method compares favourably with the baseline methods with regard to accuracy, coverage and the number of rules employed. It achieves an average accuracy of 92.57% and an average coverage of 100%, with an average of 19.14 rules.
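
The rule-based classification and the two evaluation measures named in the abstract (accuracy and coverage) can be illustrated with a minimal Python sketch. It is not the authors' LEM2 or LEM2 + CBR implementation: the attribute names (`pos_words`, `neg_words`), the rules, the toy decision table, and the convention of measuring accuracy only over covered tweets are all assumptions made for illustration, and the rules are written by hand rather than induced.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]                 # one tweet as attribute -> value
Rule = Tuple[Dict[str, int], str]    # (conditions, decision class)

# Hypothetical hand-written rules standing in for induced rules.
PRIMARY_RULES: List[Rule] = [
    ({"pos_words": 1, "neg_words": 0}, "positive"),
    ({"pos_words": 0, "neg_words": 1}, "negative"),
]
# Hypothetical fallback rules standing in for corpus-based rules.
FALLBACK_RULES: List[Rule] = [
    ({"pos_words": 0, "neg_words": 0}, "neutral"),
]

# Toy decision table: (attribute values, true class). Illustrative only.
TABLE: List[Tuple[Row, str]] = [
    ({"pos_words": 1, "neg_words": 0}, "positive"),
    ({"pos_words": 0, "neg_words": 1}, "negative"),
    ({"pos_words": 0, "neg_words": 0}, "neutral"),
    ({"pos_words": 1, "neg_words": 0}, "negative"),  # e.g. a sarcastic tweet
]


def match(rules: List[Rule], row: Row) -> Optional[str]:
    """Return the decision of the first rule whose conditions all hold."""
    for conditions, decision in rules:
        if all(row.get(attr) == val for attr, val in conditions.items()):
            return decision
    return None


def classify(row: Row, use_fallback: bool = True) -> Optional[str]:
    """Try the primary rules first; optionally fall back to the second set."""
    decision = match(PRIMARY_RULES, row)
    if decision is None and use_fallback:
        decision = match(FALLBACK_RULES, row)
    return decision


def evaluate(table: List[Tuple[Row, str]], use_fallback: bool = True) -> Tuple[float, float]:
    """Return (accuracy over covered rows, coverage over all rows)."""
    pairs = [(classify(row, use_fallback), label) for row, label in table]
    covered = [(pred, label) for pred, label in pairs if pred is not None]
    coverage = len(covered) / len(table)
    accuracy = sum(pred == label for pred, label in covered) / len(covered) if covered else 0.0
    return accuracy, coverage


if __name__ == "__main__":
    print("primary only :", evaluate(TABLE, use_fallback=False))
    print("with fallback:", evaluate(TABLE, use_fallback=True))
```

On this toy table, the primary rules alone cover three of the four tweets (coverage 0.75) and classify two of those three correctly; adding the fallback rules raises coverage to 1.0 with three of four tweets classified correctly. This mirrors, in a simplified way, the idea of using a second rule set to handle tweets the first one leaves unclassified.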

Acknowledgements

We are grateful to Dr. Shakeel Ahmad, Institute of Computing, Gomal University, for facilitating this work by providing licensed software and manuals during the execution of this project.

Author information

Corresponding author

Correspondence to Muhammad Zubair Asghar.

Ethics declarations

Conflict of interest

Muhammad Zubair Asghar, Aurangzeb Khan, Furqan Khan and Fazal Masud Kundi declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for whom identifying information is included in this article.

Human and Animal Rights

This study did not involve any experimental research on humans or animals; hence, an approval from an ethics committee was not applicable in this regard. The data collected from the online forums are publicly available data, and no personally identifiable information of the forum users was collected or used for this study.

About this article

Cite this article

Asghar, M.Z., Khan, A., Khan, F. et al. RIFT: A Rule Induction Framework for Twitter Sentiment Analysis. Arab J Sci Eng 43, 857–877 (2018). https://doi.org/10.1007/s13369-017-2770-1

  • DOI: https://doi.org/10.1007/s13369-017-2770-1
