A two-stage unsupervised sentiment analysis method

Multimedia Tools and Applications

Abstract

In this paper, the SASC (Sentiment Analysis based on Sentiment Clustering) method is proposed to address the low accuracy and poor stability of existing review sentiment clustering methods. By clustering in two stages, SASC captures the latent sentiment information in review texts and thereby improves the accuracy and stability of the results. Specifically, the first stage proposes a method for constructing review representation vectors based on the LDA topic model, and the second stage uses the K-means algorithm to further optimize the resulting sentiment clusters. In the experiments, evaluation methods for sentiment clustering are first introduced, and a series of experiments is then carried out on two widely used datasets, the Large Movie Review Dataset v1.0 and the Multi-Domain Sentiment Dataset. Experimental results indicate that, compared with other methods, the proposed SASC method achieves better clustering accuracy and stability.
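The abstract names LDA for the first stage and K-means for the second but gives no further detail, so the following is only a minimal pure-Python sketch of one way such a two-stage pipeline could be wired, not the authors' implementation. The assumptions are ours: reviews are pre-tokenized into integer word ids; LDA is fit by collapsed Gibbs sampling with as many topics as sentiment clusters; each review's topic distribution serves as its representation vector and its dominant topic as its initial cluster; K-means then refines those clusters. All function names (`lda_gibbs`, `kmeans_refine`, `clustering_accuracy`, `two_stage_sentiment_clustering`) and hyperparameters (`alpha`, `beta`, iteration counts) are hypothetical. Clustering accuracy is computed as the best one-to-one mapping of clusters to gold labels, a common definition that may differ from the evaluation methods used in the paper.

```python
import random
from itertools import permutations


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns each document's topic distribution (theta)."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # document-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # total tokens per topic
    z = []                                             # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # remove the token's current assignment
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]  # resample its topic
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
            for d, doc in enumerate(docs)]


def kmeans_refine(vectors, labels, n_clusters, iters=50):
    """Refine an initial labelling with Lloyd's K-means (squared Euclidean distance)."""
    centroids = [list(vectors[c]) for c in range(n_clusters)]  # fallback if a cluster starts empty
    for _ in range(iters):
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:                                # empty clusters keep their old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = [min(range(n_clusters),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
                      for v in vectors]
        if new_labels == labels:                       # converged
            break
        labels = new_labels
    return labels


def clustering_accuracy(pred, gold):
    """Best fraction of matches over one-to-one cluster-to-class mappings (fine for small k)."""
    clusters, classes = sorted(set(pred)), sorted(set(gold))
    best = 0
    for perm in permutations(classes):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping.get(p) == g for p, g in zip(pred, gold)))
    return best / len(gold)


def two_stage_sentiment_clustering(docs, vocab_size, n_clusters=2):
    theta = lda_gibbs(docs, n_clusters, vocab_size)    # stage 1: LDA-based review vectors
    initial = [max(range(n_clusters), key=row.__getitem__) for row in theta]  # dominant topic
    return kmeans_refine(theta, initial, n_clusters)   # stage 2: K-means refinement


if __name__ == "__main__":
    # Hypothetical toy corpus: word ids 0-2 stand for positive words, 3-5 for negative words.
    docs = [[0, 1, 2, 0], [1, 2, 2, 0], [0, 0, 1, 2], [3, 4, 5, 3], [4, 5, 5, 3], [3, 3, 4, 5]]
    gold = [1, 1, 1, 0, 0, 0]
    pred = two_stage_sentiment_clustering(docs, vocab_size=6)
    print(pred, clustering_accuracy(pred, gold))
```

Seeding K-means from the LDA assignments, rather than from random centroids, is one simple way to make the second stage "optimize" the first stage's clusters as the abstract describes, and it also removes one source of run-to-run variation; whether SASC does this is not stated in the abstract.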

Figs. 1–7 and Algorithms 1–2 are not reproduced here.

Data availability

The datasets analysed during the current study are available in the Multi-Domain Sentiment Dataset (version 2.0) [https://www.cs.jhu.edu/~mdredze/datasets/sentiment/] and the Large Movie Review Dataset v1.0 [http://ai.stanford.edu/~amaas/data/sentiment/].

Funding

The research leading to these results received funding from the Key Research and Promotion Projects of Henan Province under Grant Nos. 222102210034, 222102210178, and 212102210099, and from the Key Research Projects of Henan Higher Education Institutions under Grant No. 22A520020.

Author information

Corresponding author

Correspondence to Xin He.

Ethics declarations

Conflicts of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Y., Han, H., He, X. et al. A two-stage unsupervised sentiment analysis method. Multimed Tools Appl 82, 26527–26544 (2023). https://doi.org/10.1007/s11042-023-14864-6

  • DOI: https://doi.org/10.1007/s11042-023-14864-6
