Sentiment analysis via semi-supervised learning: a model based on dynamic threshold and multi-classifiers

  • Yue Han
  • Yuhong Liu
  • Zhigang Jin
Original Article


Sentiment analysis has become a popular research topic, especially for retrieving valuable information from various online environments. Most existing sentiment studies are based on supervised learning, which requires a sufficient amount of labeled data. In practice, however, sentiment analysis often faces a shortage of labeled data, because labeling large amounts of data is expensive and time-consuming. To handle scenarios with insufficient initial labeled data, we propose a novel semi-supervised model based on a dynamic threshold and multiple classifiers. In particular, the training data are auto-labeled iteratively using the proposed dynamic threshold algorithm, in which a dynamic threshold function sets the thresholds for selecting auto-labeled data. This function considers both the quality and the quantity of the auto-labeled data. In addition, the proposed weighted voting strategy combines multiple support vector machine classifiers by taking into account the performance gap among them. The performance of the proposed model is validated through experiments on real datasets. Compared with two other existing models, the proposed model achieves the highest sentiment analysis accuracy across datasets with different sizes of initial labeled training data.
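The abstract names two mechanisms, an iterative auto-labeling loop whose selection threshold changes from round to round and a weighted vote over several SVM classifiers, but does not give their formulas. The following is a minimal sketch of that general shape under loudly stated assumptions: the linear threshold schedule in `dynamic_threshold`, the accuracy-gap weighting in `gap_weights`, and every parameter value are hypothetical, and a dependency-free nearest-centroid scorer (`train_centroid`) stands in for the SVMs used in the paper. It is an illustration, not the authors' algorithm.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, int]
# A classifier maps a feature vector to a score in [-1, 1]: the sign is the
# predicted polarity (+1 positive, -1 negative), the magnitude its confidence.
Classifier = Callable[[Vector], float]


def dynamic_threshold(iteration: int, t_start: float = 0.9,
                      t_min: float = 0.6, decay: float = 0.05) -> float:
    """Hypothetical schedule: strict in early rounds (favours label quality),
    relaxed linearly later so more samples qualify (favours quantity)."""
    return max(t_min, t_start - decay * iteration)


def train_centroid(data: List[Sample]) -> Classifier:
    """Stand-in for an SVM: nearest-centroid scorer (needs both classes)."""
    def centroid(label: int) -> Vector:
        pts = [x for x, y in data if y == label]
        return [sum(col) / len(pts) for col in zip(*pts)]

    pos, neg = centroid(1), centroid(-1)  # fixed at training time

    def score(x: Vector) -> float:
        d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
        total = d_pos + d_neg
        return 0.0 if total == 0 else (d_neg - d_pos) / total

    return score


def self_train(labeled: List[Sample], unlabeled: List[Vector],
               train: Callable[[List[Sample]], Classifier] = train_centroid,
               max_iter: int = 10) -> Tuple[List[Sample], List[Vector]]:
    """Iteratively auto-label confident samples; returns (labeled, leftover)."""
    labeled, pool = list(labeled), list(unlabeled)
    for k in range(max_iter):
        if not pool:
            break
        clf = train(labeled)        # retrain on the enlarged labeled set
        t = dynamic_threshold(k)    # selection threshold for this round
        leftover = []
        for x in pool:
            s = clf(x)
            if abs(s) >= t:
                labeled.append((x, 1 if s > 0 else -1))
            else:
                leftover.append(x)
        pool = leftover
    return labeled, pool


def gap_weights(accuracies: Sequence[float], eps: float = 1e-3) -> List[float]:
    """Hypothetical weighting: a classifier's weight grows with its accuracy gap
    over the weakest classifier; eps keeps the weakest from dropping to zero."""
    low = min(accuracies)
    raw = [a - low + eps for a in accuracies]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_vote(scores: Sequence[float], weights: Sequence[float]) -> int:
    """Sign of the weighted sum of per-classifier scores (+1 or -1)."""
    return 1 if sum(w * s for w, s in zip(weights, scores)) >= 0 else -1


seed = [([0.0, 0.0], -1), ([1.0, 1.0], 1)]
pool = [[0.05, 0.0], [0.95, 1.0], [0.5, 0.45], [0.9, 0.9], [0.1, 0.2]]
labeled, left = self_train(seed, pool)
print(len(labeled), left)  # prints: 6 [[0.5, 0.45]]

w = gap_weights([0.80, 0.70, 0.75])
print(weighted_vote([0.2, -0.9, -0.1], w))  # prints: 1
```

With these toy numbers, the two clearly separated points are auto-labeled in the first round, (0.9, 0.9) only once the threshold has relaxed to 0.8, (0.1, 0.2) at 0.65, and the ambiguous (0.5, 0.45) is never labeled. In the voting call, two of the three classifiers lean negative, so a plain majority would return -1, yet the gap-based weights let the most accurate classifier carry the decision.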


Keywords: Dynamic threshold, Multiple classifiers, Semi-supervised learning, Sentiment analysis, Social media, Weighted voting



This work was supported by the National Natural Science Foundation of China (NSFC) under Project 71502125.


Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. School of Electrical and Information Engineering, Tianjin University, Tianjin, People’s Republic of China
  2. Department of Computer Engineering, Santa Clara University, Santa Clara, USA
