Knowledge and Information Systems, Volume 51, Issue 3, pp 851–872

A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet

Regular Paper

Abstract

The rise of social media over the last decade has made an immense amount of data available. These data can be used for sentiment analysis and decision making. The data on blogs, news and review sites, social networks, etc., are so voluminous that manual labeling is not feasible, and an automatic approach is required for their analysis. The sentiment of the masses can be understood by analyzing this large-scale, opinion-rich data. The major issues in applying automated approaches are data unavailability, data sparsity, domain independence and inadequate performance. This research proposes a semi-supervised sentiment analysis approach that combines lexicon-based methodology with machine learning in order to improve sentiment analysis performance. Mathematical models such as information gain and cosine similarity are employed to revise the sentiment scores defined in SentiWordNet. This research also emphasizes the importance of nouns and employs them as semantic features alongside other parts of speech. The evaluation of performance measures and a comparison with state-of-the-art techniques indicate that the proposed approach outperforms them.
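For readers unfamiliar with the two measures named in the abstract, the following is a minimal sketch of their textbook definitions. It is not the authors' score-revision procedure, which the abstract does not specify. The function names, the set-of-words document representation and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Drop in label entropy from knowing whether `term` occurs in a document.

    `docs` is a list of word sets (an illustrative representation).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical): "good" separates the classes, "plot" does not.
docs = [{"good", "plot"}, {"good", "acting"}, {"bad", "plot"}, {"bad", "acting"}]
labels = ["pos", "pos", "neg", "neg"]
ig_good = information_gain(docs, labels, "good")
ig_plot = information_gain(docs, labels, "plot")
```

In this toy corpus, a term that appears only in one class receives the maximum gain for a balanced two-class problem, while a term spread evenly across both classes receives none. That contrast is the intuition behind using information gain to judge how strongly a word signals polarity.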

Keywords

Sentiment analysis, Polarity classification, Support vector machine, Cosine similarity, Information gain

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  1. Department of Computer Engineering, College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Islamabad, Pakistan
