Abstract
The rich digital footprint that users leave on the Internet has prompted extensive research on all aspects of Internet users. Within this research, topic modeling is used to analyze the text that users post on websites and to generate user portraits. To deal with the severe sparsity problem that arises when traditional text modeling methods such as Latent Dirichlet Allocation (LDA) extract topics from short texts, researchers usually aggregate all the texts published by each user into a single pseudo-document. However, such pseudo-documents contain many unrelated topics, which is inconsistent with the documents that people actually publish. To address this, this paper introduces the LDA-RCC model, which performs dynamic text modeling on the actual texts and is used to analyze the interests of forum users and build user portraits. Specifically, the combined model processes short texts effectively by iteratively combining the text modeling method LDA with the robust continuous clustering (RCC) method. The model also determines the number of topics automatically from the user’s data. Processing the clustering results then yields each user’s preferences for in-depth user analysis. Extensive experiments show that the LDA-RCC model obtains good results and outperforms both traditional text modeling methods and benchmark short text clustering methods.
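The per-user aggregation that the abstract describes as the usual workaround for sparsity can be sketched in a few lines. This is a minimal illustration of that baseline only, not of the LDA-RCC model; the function name `build_pseudo_documents` and the sample posts are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_pseudo_documents(posts):
    """Group short texts by author and join each group into one pseudo-document.

    `posts` is an iterable of (user, text) pairs. This layout is an assumption
    made for the sketch, not a format taken from the paper.
    """
    grouped = defaultdict(list)
    for user, text in posts:
        grouped[user].append(text)
    # Joining with a space keeps token boundaries between posts intact.
    return {user: " ".join(texts) for user, texts in grouped.items()}


# Hypothetical forum posts, invented for illustration only.
posts = [
    ("alice", "new gpu benchmarks"),
    ("bob", "sourdough starter tips"),
    ("alice", "best hiking trails"),
]
pseudo_docs = build_pseudo_documents(posts)
```

In this toy input, the pseudo-document for `alice` mixes a hardware post with a hiking post, which is the kind of unrelated-topic mixing the abstract identifies as the weakness of aggregation.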
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ding, Z., Yan, C., Liu, C., Ji, J., Liu, Y. (2020). Short Text Processing for Analyzing User Portraits: A Dynamic Combination. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science(), vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_59
DOI: https://doi.org/10.1007/978-3-030-61616-8_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8
eBook Packages: Computer Science, Computer Science (R0)