Short Text Processing for Analyzing User Portraits: A Dynamic Combination

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2020 (ICANN 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)

Abstract

The rich digital footprints that users leave on the Internet have prompted extensive research on many aspects of Internet users. One line of this work applies topic modeling to the text that users post on websites in order to generate user portraits. Traditional text modeling methods such as Latent Dirichlet Allocation (LDA) suffer from severe sparsity when extracting topics from short texts, so researchers usually aggregate all the texts published by each user into a single pseudo-document. However, such pseudo-documents mix many unrelated topics, which does not match the documents that people actually publish. To address this, this paper introduces the LDA-RCC model, which performs dynamic text modeling on the actual texts and is used to analyze the interests of forum users and build user portraits. Specifically, the combined model processes short texts effectively by iteratively combining the text modeling method LDA with the robust continuous clustering (RCC) method. The model also determines the number of topics automatically from the user’s data. Processing the clustering results then yields each user’s preferences for in-depth user analysis. Extensive experiments show that the LDA-RCC model performs well and outperforms both traditional text modeling methods and short-text clustering benchmark methods.
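The pseudo-document baseline that the abstract criticizes can be illustrated with a minimal Python sketch. It shows only the per-user aggregation step and a crude bag-of-words count; it does not implement LDA, RCC, or the paper's LDA-RCC model. The function names, the `(user_id, text)` input format, and the toy posts are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter, defaultdict


def build_pseudo_documents(posts):
    """Concatenate every post by the same user into one pseudo-document.

    `posts` is an iterable of (user_id, text) pairs. This input format is
    an assumption for the sketch, not the paper's data format.
    """
    grouped = defaultdict(list)
    for user_id, text in posts:
        grouped[user_id].append(text)
    # Join each user's posts in their original order, separated by spaces.
    return {user_id: " ".join(texts) for user_id, texts in grouped.items()}


def bag_of_words(text):
    """Term counts from a deliberately crude lowercase whitespace tokenizer."""
    return Counter(text.lower().split())


# Toy data, invented for illustration only.
posts = [
    ("alice", "new gpu benchmark results"),
    ("alice", "best sourdough bread recipe"),
    ("bob", "gpu driver update"),
]

pseudo_docs = build_pseudo_documents(posts)

# A single short post yields very few terms (the sparsity problem) ...
single_post_counts = bag_of_words(posts[0][1])

# ... while the pseudo-document is longer but mixes unrelated subjects
# (hardware and baking) in one bag of words.
alice_counts = bag_of_words(pseudo_docs["alice"])
```

In this toy case the aggregation doubles the number of terms available for the user "alice", but the resulting document covers two unrelated subjects, which is the mismatch with real documents that the abstract points out.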

Author information

Corresponding author

Correspondence to Yezheng Liu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ding, Z., Yan, C., Liu, C., Ji, J., Liu, Y. (2020). Short Text Processing for Analyzing User Portraits: A Dynamic Combination. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_59

  • DOI: https://doi.org/10.1007/978-3-030-61616-8_59

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61615-1

  • Online ISBN: 978-3-030-61616-8

  • eBook Packages: Computer Science, Computer Science (R0)
