A probabilistic method for emerging topic tracking in Microblog stream

Abstract

Microblogs are a popular, open platform for discovering and sharing the latest news about social issues and daily life. Because microblog streams update so quickly, effective tools for monitoring them are urgently needed. Emerging topic tracking is one such tool: it reveals which new events are currently attracting the most online attention. However, because microblog feeds change fast, are noisy, and are short, emerging topic tracking must address two challenges. The first is detecting emerging topics early, long before they become hot; the second is effectively monitoring how topics evolve over time. In this study, we propose a novel emerging topic tracking method that combines emerging word detection from a temporal perspective with coherent topic mining from a spatial perspective. Specifically, we first design a metric that estimates word novelty and fading based on locally weighted linear regression (LWLR); it highlights the novelty of words that express an emerging topic and suppresses the novelty of words that express an existing topic. We then track emerging topics using topic novelty and fading probabilities, which are learned by formulating and solving an optimization problem. We evaluate our method on a microblog stream containing over one million feeds. Experimental results show that the proposed method performs promisingly, in terms of both effectiveness and efficiency, at detecting emerging topics and tracking topic evolution over time.
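To make the LWLR idea concrete, the following minimal sketch fits a locally weighted line to one word's per-time-slice counts and reads the local slope at the most recent slice as a rough rising/fading signal. It is an illustration only, not the metric defined in the paper: the Gaussian kernel, the bandwidth `tau`, and the function names `lwlr_fit` and `trend_score` are all assumptions introduced here.

```python
import math


def lwlr_fit(xs, ys, x0, tau=2.0):
    """Fit a weighted least-squares line centred on x0.

    Points near x0 get larger Gaussian weights (bandwidth tau, an assumed
    parameter). Returns (intercept, slope) of the local line.
    """
    ws = [math.exp(-((x - x0) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    denom = sw * sxx - sx * sx
    if abs(denom) < 1e-12:
        # Degenerate case (e.g. a single point): fall back to a flat line.
        return sy / sw, 0.0
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    return intercept, slope


def trend_score(counts, tau=2.0):
    """Local slope at the latest time slice (hypothetical proxy).

    Positive values suggest a word whose frequency is rising, negative
    values a word that is fading.
    """
    xs = list(range(len(counts)))
    _, slope = lwlr_fit(xs, counts, x0=xs[-1], tau=tau)
    return slope


if __name__ == "__main__":
    rising = [1, 1, 1, 2, 5, 12]   # made-up counts for illustration
    fading = [12, 5, 2, 1, 1, 1]
    print("rising word score:", trend_score(rising))
    print("fading word score:", trend_score(fading))
```

Because the weights decay with distance from the latest slice, recent changes in frequency dominate the fitted slope, which is what makes a local fit more responsive to a newly emerging word than a single global regression over the whole history.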

Notes

  1. http://weibo.com

  2. We use Jieba for Chinese word segmentation, which can be downloaded from https://github.com/fxsjy/jieba

Acknowledgments

The work was supported by the National Science Foundation of China (NSFC, No. 61472291 and No. 61303115), and Interdisciplinary Foundation of Independent Scientific Research (No. 2042016kf0182).

Author information

Corresponding authors

Correspondence to Jiajia Huang or Min Peng.

About this article

Cite this article

Huang, J., Peng, M., Wang, H. et al. A probabilistic method for emerging topic tracking in Microblog stream. World Wide Web 20, 325–350 (2017). https://doi.org/10.1007/s11280-016-0390-4

  • DOI: https://doi.org/10.1007/s11280-016-0390-4

Keywords

  • Microblog stream
  • Emerging topic
  • LWLR
  • Topic evolution
  • Optimization problem