Cease with Bass: A Framework for Real-Time Topic Detection and Popularity Prediction Based on Long-Text Contents

  • Quanquan Chu
  • Zhenhao Cao
  • Xiaofeng Gao (corresponding author)
  • Peng He
  • Qianni Deng
  • Guihai Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11280)

Abstract

Social networks have become a powerful information source. With the advent of new services such as WeChat Official Accounts, long-text content has become part of social networks. Compared with tweet-style content, long-text content is better organized and less prone to noise. However, existing methods for real-time topic detection on long-text data do not perform satisfactorily in terms of sensitivity and scalability, and long-text-based trend prediction methods lack a strong rationale. In this paper, we propose a framework specifically adapted to long-text-based topic analysis, covering both topic detection and popularity prediction. For topic detection, we design a novel real-time topic model, the Cost-Effective And Scalable Embedding model (CEASE), based on improved GloVe models and a keyword frequency clustering algorithm. We then propose strategies for topic tracking and renewal that take topic abortion, merging, and neology into account. For popularity prediction, we propose the Feature-Combined Bass model with Association Analysis (FCA-Bass), whose rationale is borrowed from economics. Our methods are validated by experiments on a real-world dataset from WeChat and are shown to outperform several existing mainstream methods.
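For readers unfamiliar with the economic model that FCA-Bass builds on, the following is a minimal sketch of the classical Bass diffusion curve only. It is not the authors' FCA-Bass model, whose feature combination and association analysis are not described in this abstract. The function names and the parameter values in the usage note are illustrative assumptions: `p` is the coefficient of innovation, `q` the coefficient of imitation, and `m` the eventual audience size.

```python
import math


def bass_cumulative_fraction(t: float, p: float, q: float) -> float:
    """Fraction F(t) of the eventual audience reached by time t.

    Closed-form solution of the classical Bass equation
        dF/dt = (p + q * F) * (1 - F),  F(0) = 0.
    This is the textbook model, not the paper's FCA-Bass variant.
    """
    if p <= 0 or q < 0:
        raise ValueError("expected p > 0 and q >= 0")
    if t <= 0:
        return 0.0
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_cumulative_adopters(t: float, m: float, p: float, q: float) -> float:
    """Cumulative count (e.g. reads of a topic) at time t for audience size m."""
    return m * bass_cumulative_fraction(t, p, q)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the S-shaped growth.
    for hour in (0, 6, 12, 24, 48):
        print(hour, round(bass_cumulative_adopters(hour, m=10_000, p=0.01, q=0.3)))
```

In a prediction setting, `p`, `q`, and `m` would typically be fitted to the early part of an observed popularity curve and then used to extrapolate; how the paper estimates them is not stated here.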

Keywords

Topic detection, Popularity prediction, Social network

Notes

Acknowledgements

This work is supported by the National Key R&D Program of China (2018YFB1004703), the National Natural Science Foundation of China (61872238, 61672348, 61672353), the Shanghai Science and Technology Fund (17510740200), the CCF-Tencent Open Research Fund (RAGR20170114), the Huawei Innovation Research Program (HO2018085286), and the National Key Research of China (2018YFB1003800). Quanquan Chu finished the experiments in this paper when he was an intern at Tencent Shenzhen. The authors would also like to thank Chunxia Jia, Yiming Zhang, Chao Wang, and Tianxiang Gao for their contributions to this paper.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Quanquan Chu (1)
  • Zhenhao Cao (1)
  • Xiaofeng Gao (1), corresponding author
  • Peng He (2)
  • Qianni Deng (1)
  • Guihai Chen (1)

  1. Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. Tencent, Shenzhen, China
