Short Text Processing for Analyzing User Portraits: A Dynamic Combination

Ding, Zhengping; Yan, Chen; Liu, Chunli; Ji, Jianrui; Liu, Yezheng

doi:10.1007/978-3-030-61616-8_59

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

The rich digital footprint that users leave on the Internet has prompted extensive research on all aspects of Internet users. In this line of work, topic modeling is used to analyze the text that users post on websites and to generate user portraits. Traditional text modeling methods such as Latent Dirichlet Allocation (LDA) suffer from serious sparsity problems when extracting topics from short texts, so researchers usually aggregate all the texts published by each user into a single pseudo-document. However, such pseudo-documents mix many unrelated topics, which is inconsistent with the documents that people actually publish. To address this, this paper introduces the LDA-RCC model, which performs dynamic text modeling based on the actual texts and is used to analyze the interests of forum users and build user portraits. Specifically, the combined model processes short texts effectively by iterating between the text modeling method LDA and the robust continuous clustering (RCC) method. The model also determines the number of topics automatically from the user’s data. Processing the clustering results then yields each user’s preferences for deeper user analysis. Extensive experiments show that the LDA-RCC model obtains good results and outperforms both traditional text modeling methods and short text clustering benchmark methods.
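The alternation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the loop structure, the stopping rule, and every function name below (`lda_gibbs`, `link_clusters`, `lda_cluster_loop`, `user_preferences`) are assumptions made for illustration. In particular, the RCC step is replaced by a much simpler stand-in (linking texts whose topic vectors have high cosine similarity), chosen only because, like RCC, it does not need the number of clusters in advance.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns one topic vector per document."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    def bump(di, w, z, delta):
        doc_topic[di][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Random initial topic assignment for every token.
    assign = [[rng.randrange(n_topics) for _ in d] for d in docs]
    for di, d in enumerate(docs):
        for w, z in zip(d, assign[di]):
            bump(di, w, z, +1)
    for _ in range(n_iter):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                bump(di, w, assign[di][wi], -1)
                weights = [(doc_topic[di][k] + alpha) * (topic_word[k][w] + beta)
                           / (topic_total[k] + vocab_size * beta)
                           for k in range(n_topics)]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[di][wi] = z
                bump(di, w, z, +1)
    return [[(c + alpha) / (len(d) + n_topics * alpha) for c in row]
            for row, d in zip(doc_topic, docs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5)

def link_clusters(vectors, threshold=0.9):
    """Stand-in for RCC (assumption): connected components of a similarity graph."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(len(vectors))})
    return [roots.index(find(i)) for i in range(len(vectors))]

def lda_cluster_loop(texts, n_topics=3, max_rounds=5, seed=0):
    """Alternate LDA and clustering; merged clusters become the next pseudo-documents."""
    pseudo = [t.lower().split() for t in texts]
    member = list(range(len(pseudo)))  # member[i]: pseudo-document holding text i
    for r in range(max_rounds):
        labels = link_clusters(lda_gibbs(pseudo, n_topics, seed=seed + r))
        n_clusters = len(set(labels))
        if n_clusters == len(pseudo):  # nothing merged in this round: stop
            break
        merged = [[] for _ in range(n_clusters)]
        for p, lab in enumerate(labels):
            merged[lab].extend(pseudo[p])
        member = [labels[m] for m in member]
        pseudo = merged
    return member

def user_preferences(users, member):
    """Share of each user's texts that falls into each final cluster."""
    counts = {}
    for u, c in zip(users, member):
        counts.setdefault(u, Counter())[c] += 1
    return {u: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for u, cnt in counts.items()}

# Hypothetical toy data: four short posts by two forum users.
texts = ["cheap flights to rome", "rome hotel deals",
         "python list sort", "sort a python dict"]
users = ["alice", "alice", "bob", "bob"]
member = lda_cluster_loop(texts, n_topics=2)
prefs = user_preferences(users, member)
```

Unlike aggregating everything a user wrote into one pseudo-document, the pseudo-documents here are built from texts that cluster together, and the number of final clusters is whatever remains when merging stops. The resulting `prefs` maps each user to a distribution over those clusters; on data this small the clusters themselves are not meaningful.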

Author information

Authors and Affiliations

School of Management, Hefei University of Technology, Hefei, 230009, Anhui, People’s Republic of China
Zhengping Ding, Chen Yan, Chunli Liu, Jianrui Ji & Yezheng Liu
Key Laboratory of Process Optimization and Intelligent Decision Making, Ministry of Education, Hefei, 230009, Anhui, China
Zhengping Ding, Chen Yan, Chunli Liu, Jianrui Ji & Yezheng Liu

Corresponding author

Correspondence to Yezheng Liu.

Editor information

Editors and Affiliations

Department of Applied Informatics, Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
Paolo Masulli
Department of Informatics, University of Hamburg, Hamburg, Germany
Stefan Wermter

About this paper

Cite this paper

Ding, Z., Yan, C., Liu, C., Ji, J., Liu, Y. (2020). Short Text Processing for Analyzing User Portraits: A Dynamic Combination. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_59

DOI: https://doi.org/10.1007/978-3-030-61616-8_59
Published: 14 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8
eBook Packages: Computer Science, Computer Science (R0)
