A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet

  • Regular Paper
Knowledge and Information Systems

Abstract

The advent of social media over the last decade has made an immense amount of data available, and this data can be used for sentiment analysis and decision making. The data on blogs, news and review sites, social networks, etc., are so voluminous that manual labeling is not feasible, so an automatic approach is required for their analysis. Analyzing this large-scale, opinion-rich data makes it possible to understand the sentiment of the masses. The major issues in applying automated approaches are data unavailability, data sparsity, domain independence and inadequate performance. This research proposes a semi-supervised sentiment analysis approach that combines a lexicon-based methodology with machine learning in order to improve sentiment analysis performance. Mathematical models such as information gain and cosine similarity are employed to revise the sentiment scores defined in SentiWordNet. This research also emphasizes the importance of nouns and employs them as semantic features alongside other parts of speech. The evaluation of performance measures and a comparison with state-of-the-art techniques indicate that the proposed approach is superior.
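The abstract names information gain and cosine similarity but does not state how they are combined to revise SentiWordNet scores, so the following is only a minimal sketch of the two measures themselves, not the authors' revision procedure. The function names, the toy documents and the labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(term) = H(class) - H(class | term present or absent).

    `docs` is a list of token sets; `labels` holds one class per document.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = ((len(with_term) / total) * entropy(with_term)
                   + (len(without_term) / total) * entropy(without_term))
    return entropy(labels) - conditional


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: four documents, two per class.
docs = [{"great", "movie"}, {"great", "plot"},
        {"boring", "movie"}, {"boring", "plot"}]
labels = ["pos", "pos", "neg", "neg"]

ig_great = information_gain(docs, labels, "great")  # term separates the classes
ig_movie = information_gain(docs, labels, "movie")  # term is uninformative
```

In this toy corpus "great" occurs only in positive documents, so it carries the full one bit of class information, while "movie" is spread evenly across both classes and carries none. A lexicon revision step could use such scores as weights, but the exact weighting is not given in the abstract.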


Notes

  1. http://www.noslang.com/dictionary (Last Accessed: April 6, 2016).

  2. http://nlp.stanford.edu/IR-book/html/htmledition/dropping-common-terms-stop-words-1.html (Last Accessed: April 6, 2016).

  3. http://www.interopia.com/education/all-question-words-in-english/ (Last Accessed: April 6, 2016).

  4. http://nlp.stanford.edu/software/tagger.shtml (Last Accessed: April 6, 2016).

  5. http://download.joachims.org/svm_light/current/svm_light_windows64.zip (Last Accessed: April 7, 2016).

  6. http://sentiwordnet.isti.cnr.it/code/SentiWordNetDemoCode.java (Last Accessed: April 8, 2016).


Corresponding author

Correspondence to Farhan Hassan Khan.

Cite this article

Khan, F.H., Qamar, U. & Bashir, S. A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet. Knowl Inf Syst 51, 851–872 (2017). https://doi.org/10.1007/s10115-016-0993-1

  • DOI: https://doi.org/10.1007/s10115-016-0993-1
