Topic dynamics in Weibo: a comprehensive study

  • Rui Fan
  • Jichang Zhao
  • Ke Xu
Original Article

Abstract

The rapid development of online social media has fundamentally changed people’s lives in recent years. Weibo, a Twitter-like service in China, has attracted more than 500 million users in less than 5 years and produces more than 100 million Chinese tweets every day. Across this massive volume of tweets, different topics reflect different user interests and daily trends. To the best of our knowledge, a systematic investigation of topic dynamics in Weibo is still missing. To fill this gap, we comprehensively examine topic dynamics from the perspectives of time, geography, demographics, emotion, retweeting and correlation. An incremental learning framework is first established to probe more than 200 million streaming tweets and an interaction network of around 90,000 active users. Many interesting patterns are then revealed, which could provide insights for topic-related applications in online social media, such as user profiling, event detection, trend tracking and content recommendation.

Keywords

Topic classification, Topic dynamics, Topic patterns, Topic correlation, Weibo

Notes

Acknowledgments

This work was supported by NSFC (Grant No. 61421003) and the fund of the State Key Lab of Software Development Environment (Grant No. SKLSDE-2015ZX-05). Jichang Zhao was partially supported by the fund of the State Key Laboratory of Software Development Environment (Grant No. SKLSDE-2015ZX-28) and the Fundamental Research Funds for the Central Universities (Grant No. YWF-15-JGXY-011).

Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  1. State Key Laboratory of Software Development Environment, Beihang University, Beijing, People’s Republic of China
  2. School of Economics and Management, Beihang University, Beijing, People’s Republic of China
