Abstract
This paper proposes SASC (Sentiment Analysis based on Sentiment Clustering), a method that addresses the low accuracy and poor stability of existing review sentiment clustering methods. SASC applies two-stage sentiment clustering to uncover the sentiment information hidden among review texts and thereby improve the accuracy and stability of the results. In the first stage, a review representation vector is constructed using the LDA topic model. In the second stage, the K-means algorithm further optimizes the sentiment clustering results. The experimental part first introduces the evaluation methods for sentiment clustering and then reports a series of experiments on two widely used datasets, the Large Movie Review Dataset v1.0 and the Multi-Domain Sentiment Dataset. The experimental results indicate that SASC achieves better clustering accuracy and stability than the methods it is compared with.
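The second stage can be illustrated with a minimal sketch, assuming the first stage has already produced one topic-proportion vector per review (for example, LDA document-topic distributions). The toy vectors, the gold labels, the farthest-first seeding, and the function names below are all illustrative assumptions, not the paper's implementation or data. Accuracy under the best cluster-to-class mapping is one common way to score a clustering; the abstract does not specify which evaluation measures the paper uses.

```python
import itertools
import math


def farthest_first(vectors, k):
    """Deterministic seeding (an assumption here): start from the first
    vector, then repeatedly add the vector farthest from chosen centroids."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, iters=50):
    """Plain Lloyd's K-means; returns one cluster index per vector."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustering_accuracy(pred, gold, k):
    """Accuracy under the best one-to-one mapping of cluster ids to classes."""
    best = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(1 for p, g in zip(pred, gold) if perm[p] == g)
        best = max(best, hits)
    return best / len(gold)


# Hypothetical topic proportions for six reviews over three topics.
doc_topic = [
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.70, 0.10, 0.20],
    [0.10, 0.20, 0.70],
    [0.05, 0.15, 0.80],
    [0.15, 0.10, 0.75],
]
gold = [0, 0, 0, 1, 1, 1]  # hypothetical sentiment labels, used only for scoring

pred = kmeans(doc_topic, k=2)
acc = clustering_accuracy(pred, gold, k=2)
print(pred, acc)
```

The seeding is deterministic so that repeated runs give the same clusters. Random initialization is the more common default and is one source of the run-to-run instability that clustering methods can show.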
Data availability
The datasets analysed during the current study are available in the Multi-Domain Sentiment Dataset (version 2.0) [https://www.cs.jhu.edu/~mdredze/datasets/sentiment/] and the Large Movie Review Dataset v1.0 [http://ai.stanford.edu/~amaas/data/sentiment/].
Funding
The research leading to these results received funding from the Key Research and Promotion Projects of Henan Province under Grant Agreement Nos. 222102210034, 222102210178, and 212102210099, and from the Key Research Projects of Henan Higher Education Institutions under Grant Agreement No. 22A520020.
Ethics declarations
Conflicts of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
About this article
Cite this article
Wang, Y., Han, H., He, X. et al. A two-stage unsupervised sentiment analysis method. Multimed Tools Appl 82, 26527–26544 (2023). https://doi.org/10.1007/s11042-023-14864-6
DOI: https://doi.org/10.1007/s11042-023-14864-6