Knowledge and Information Systems, Volume 53, Issue 3, pp 805–831

Document-level sentiment classification using hybrid machine learning approach

  • Abinash Tripathy
  • Abhishek Anand
  • Santanu Kumar Rath
Regular Paper


Users and customers commonly share comments or reviews about products on social networking sites. An analyst needs to process these reviews properly to obtain meaningful information from them. Classification of the sentiments associated with reviews is one of these processing steps. Reviews are usually written in text format. While processing text reviews, each word of a review is considered as a feature. Feature selection therefore needs to be carried out to choose the best features from the set of all features. In this paper, a machine learning algorithm, the support vector machine (SVM), is used to select the best features from the training data. These features are then given as input to an artificial neural network (ANN) for further processing. Different performance evaluation parameters such as precision, recall, F-measure, and accuracy are considered to evaluate the performance of the proposed approach on two different datasets: the IMDb dataset and the polarity dataset.
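The four evaluation parameters named above can be computed from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `evaluation_metrics` and the example counts are hypothetical and are not taken from the paper or its results.

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F-measure, and accuracy for a binary classifier.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives, and true negatives. Empty denominators yield 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = evaluation_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting all four values together is useful because accuracy alone can hide a classifier that favors one polarity class over the other.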


Document-level sentiment analysis; Machine learning algorithm; Support vector machine (SVM); Artificial neural network (ANN); Performance evaluation parameter



Copyright information

© Springer-Verlag London 2017

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela, India
