Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm

Wu, Di; Yang, Ruixin; Shen, Chao

doi:10.1007/s10844-020-00597-7

Published: 25 May 2020

Journal of Intelligent Information Systems

Volume 56, pages 1–23 (2021)

Di Wu¹,²,
Ruixin Yang¹ &
Chao Shen¹


Abstract

The Latent Dirichlet Allocation (LDA) topic model is widely studied in text mining. This paper proposes SKP-LDA, a Sentiment Word Co-occurrence and Knowledge Pair Feature Extraction based LDA Short Text Clustering Algorithm. First, a bag-of-words representation based on sentiment word co-occurrence is defined; the co-occurrence of sentiment words is counted across different short texts, and each microblog short text is assigned an emotional polarity. Next, knowledge pairs of topic special words and topic relation words are extracted and fed into the LDA model for clustering, so that semantic information is captured more accurately. The n hidden topics and the Top30 special-word set of each topic are then extracted from the knowledge pair set. Finally, the Top30 topic special-word sets obtained from the primary LDA clustering are clustered again by K-means (secondary clustering), with the cluster centers optimized iteratively. Compared with JST, LSM, LTM and ELDA, SKP-LDA performs better in terms of Accuracy, Precision, Recall and F-measure. The experimental results show that SKP-LDA has better semantic analysis ability and a better emotional topic clustering effect. It can be applied to microblog data to improve the accuracy of network public opinion analysis.
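
The abstract does not give the formal definition of the sentiment word co-occurrence bag, so the following is only a minimal sketch of the general idea, not the authors' method. It counts how often pairs of sentiment words appear together in the same short text and assigns each text a polarity. The toy lexicon, the summed-score polarity rule, and the names `SENTIMENT_LEXICON`, `sentiment_cooccurrence` and `text_polarity` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sentiment lexicon (word -> polarity score); not from the paper.
SENTIMENT_LEXICON = {
    "good": 1, "great": 1, "happy": 1,
    "bad": -1, "awful": -1, "sad": -1,
}


def sentiment_cooccurrence(texts, lexicon=SENTIMENT_LEXICON):
    """Count how often each pair of sentiment words shares a short text."""
    pair_counts = Counter()
    for text in texts:
        # Unique sentiment words in this text, sorted so that every
        # pair is stored in one canonical (alphabetical) order.
        words = sorted({w for w in text.lower().split() if w in lexicon})
        pair_counts.update(combinations(words, 2))
    return pair_counts


def text_polarity(text, lexicon=SENTIMENT_LEXICON):
    """Label a text by the sign of its summed lexicon scores (assumed rule)."""
    score = sum(lexicon.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "great phone and good battery",
    "good screen great camera",
    "awful service and bad support",
    "sad to say the update is bad",
]
counts = sentiment_cooccurrence(posts)
labels = [text_polarity(p) for p in posts]
```

In a pipeline like the one the abstract outlines, counts of this kind could serve as extra features alongside the word counts passed to a topic model, but how SKP-LDA actually combines them is not stated in this preview.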


References

Blei, D.M., Ng, A.Y., & Jordan, M.I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research Archive, 3, 993–1022.
Chang, P., & Ma, H. (2011). Efficient short texts keyword extraction method analysis. Computer Engineering & Applications, 47(20), 126–128, 154.
Chen, Z., & Liu, B. Topic modeling using topics from many domains, lifelong learning and big data.
Hao, J., Xie, J., Su, J.Q., et al. (2016). An unsupervised approach for sentiment classification based on weighted latent Dirichlet allocation. CAAI Transactions on Intelligent Systems, 11(4), 539–545.
He, Y. (2011). Latent sentiment model for weakly-supervised crosslingual sentiment classification. Advances in Information Retrieval, 6611, 214–225.
Huang, F.L., Yu, G., Zhang, J.L., et al. (2017). Mining topic sentiment in micro-blogging based on micro-blogger social relation. Journal of Software, 28(3), 694–707.
Kozlowski, M., & Rybinski, H. (2019). Clustering of semantically enriched short texts. Journal of Intelligent Information Systems, 53(1), 69–92.
Lin, C., & He, Y. (2009). Joint sentiment topic model for sentiment analysis. In Proceedings of the 18th ACM conference on information and knowledge management (pp. 375–384). New York: ACM Press.
Liu, B.Y., Wang, C.R., Wang, C., et al. (2017). Micro-blog community discovery algorithm based on dynamic topic model with multidimensional data fusion. Journal of Software, 28(2), 246–261.
Liu, Z., Liu, C.Y., Xia, B., & Li, T. (2018). Multiple relational topic modeling for noisy short texts. International Journal of Software Engineering and Knowledge Engineering, 28(11), 1559–1574.
Lu, L., Fuxi, Z., Rong, G., et al. (2018). Point of interest joint recommendation method based on user-content topic model. Computer Engineering & Applications, 4, 154–159.
Peng, M., Huang, J.J., Zhu, J.H., et al. (2015). Mass of short texts clustering and topic extraction based on frequent itemsets. Journal of Computer Research & Development, 52(9), 1941–1953.
Qi, J., Xun, L., Zhou, X., et al. (2018). Micro-blog user community discovery using generalized simrank edge weighting method. PLoS ONE, 13(5).
Qu, J., Chen, Z., & Zheng, Y. (2018). Research on the text clustering method of science and technology reports based on the topic model. Library & Information Service.
Shams, M., & Baraani-Dastjerdi, A. (2017). Enriched LDA (ELDA): combination of latent Dirichlet allocation with word co-occurrence analysis for aspect extraction. Expert Systems with Applications, 80, 136–146.
Sun, Y., & Zhou, X.G. (2013). Unsupervised topic and sentiment unification model for sentiment analysis. Acta Scientiarum Naturalium Universitatis Pekinensis, 49(1), 102–108.
Tago, K., & Jin, Q. (2018). Influence analysis of emotional behaviors and user relationships based on twitter data. Tsinghua Science & Technology, 23(1), 104–113.
Wan, H.X., & Peng, Y. (2018). Topic words extraction of social media based on semantic constrained and time associated LDA. Journal of Chinese Computer Systems, 39(4), 742–747.
Wang, X.W., & Zhang, K. (2012). Improved expansion algorithm based on co-occurrence relationship between short text feature. Journal of Henan University of Urban Construction, 21(4), 48–50.
Xiong, S., Wang, K., Ji, D., et al. (2018). A short text sentiment-topic model for product reviews. Neurocomputing, 297, 94–102.
Yong, M.C., Qing, C., School, B, et al. (2018). Chinese short text topic analysis by latent Dirichlet allocation model with co-word network analysis. Journal of the China Society for Scientific and Technical Information, 37(3), 305–317.

Acknowledgments

This work is supported by the Research Projects of Science and Technology in Hebei Higher Education Institutions (No. ZD2018087, ZD2016017, QN2018109, YQ2014014), the Natural Science Foundation of Hebei Province (No. F2019402-428), the National Key R&D Program of China (No. 2018YFF0301004), and the National Natural Science Foundation of China (No. 61802107).

Author information

Authors and Affiliations

Department of Information and Electronic Engineering, Hebei University of Engineering, Handan, Hebei, China
Di Wu, Ruixin Yang & Chao Shen
Hebei Key Laboratory of Security Protection Information Sensing and Processing, Hebei University of Engineering, Handan, Hebei, China
Di Wu

Corresponding author

Correspondence to Chao Shen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, D., Yang, R. & Shen, C. Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm. J Intell Inf Syst 56, 1–23 (2021). https://doi.org/10.1007/s10844-020-00597-7

Received: 02 November 2019
Revised: 21 February 2020
Accepted: 25 February 2020
Published: 25 May 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s10844-020-00597-7
