KL-NF technique for sentiment classification

Abstract

This work proposes a Neuro-Fuzzy technique for sentiment analysis of low-resource languages such as Hindi. Because such languages suffer from a scarcity of resources, we propose a method that can be applied to any language. We use information theory to establish relations between the terms that occur in a sentence, and we introduce a novel approach that calculates feature values using Kullback-Leibler (KL) divergence. These feature values are then used to calculate the membership values of the fuzzy logic component of the Neuro-Fuzzy technique. The novelty of the method lies in its predictive nature, which can mitigate the impact of unlabeled, unknown, or multi-domain data. We evaluate our results using accuracy, precision, recall, and F1-score. The proposed approach achieves 93.01% accuracy on the English dataset and 91.18% accuracy on the Hindi dataset, which is higher than other state-of-the-art techniques such as Naïve Bayes and SVM. Because the two datasets come from different domains, these results also indicate that the approach performs satisfactorily on multi-domain data.
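As a rough illustration of the idea described above, the following Python sketch computes a KL divergence between two smoothed term distributions and passes the resulting value through a Gaussian membership function. The abstract does not give the exact feature formula or the shape of the membership functions, so the add-one smoothing, the Gaussian form, the toy token lists, and all function names here are assumptions made only for this example, not the authors' formulation.

```python
import math
from collections import Counter


def term_distribution(tokens, vocab, alpha=1.0):
    """Smoothed probability of each vocabulary term (add-alpha smoothing is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {term: (counts[term] + alpha) / total for term in vocab}


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_t P(t) * log(P(t) / Q(t)), using the natural logarithm."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


def gaussian_membership(x, center, sigma):
    """Gaussian fuzzy membership value in (0, 1]; the Gaussian shape is an assumption."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


# Toy token lists standing in for positive and negative training text (hypothetical data).
positive = "good great good fine".split()
negative = "bad poor bad awful".split()
vocab = sorted(set(positive) | set(negative))

p = term_distribution(positive, vocab)
q = term_distribution(negative, vocab)

# Hypothetical feature value: how far the positive distribution diverges from the negative one.
feature = kl_divergence(p, q)

# Hypothetical membership of that feature in a fuzzy set centred at 0 with unit spread.
membership = gaussian_membership(feature, center=0.0, sigma=1.0)
print(feature, membership)
```

Smoothing keeps every probability strictly positive, so the logarithm is always defined. KL divergence is asymmetric, so swapping `p` and `q` generally gives a different feature value.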


Author information

Corresponding author

Correspondence to Kanika Garg.

Cite this article

Garg, K., Lobiyal, D.K. KL-NF technique for sentiment classification. Multimed Tools Appl (2021). https://doi.org/10.1007/s11042-021-10559-y

Keywords

  • Sentiment analysis
  • Neuro-fuzzy
  • KL divergence
  • Hindi
  • Low-resource language