
CPB: a classification-based approach for burst time prediction in cascades

  • Regular Paper
Knowledge and Information Systems

Abstract

Studying the bursty nature of cascades in social media is practically important in many real applications, such as product sales prediction, disaster relief, and stock market prediction. Although both cascade size prediction and the burst patterns of cascades have been studied extensively, how to predict when a burst will come remains an open problem. It is challenging for traditional time-series-based models, such as regression models, to address this task directly. First, time-series-based prediction models focus on predicting future values from previously observed ones, which makes it hard to apply them to predicting the time of a burst with the “quick rise-and-fall” pattern. Second, besides cascade popularity, much other side information, such as user profiles and social relations, is available in social media. Although the potential utility of such information can be high, it is also hard for time-series-based models to capture this rich information, which comes in diverse formats, and integrate it seamlessly. This paper proposes a classification-based approach for burst time prediction that exploits rich knowledge in information diffusion. In particular, we first propose a time-window-based transformation that predicts in which time window the burst will appear. By dividing the time span of every cascade into the same number of time windows K, cascades with diverse time spans can be handled uniformly. To exploit the rich and heterogeneous information in social media, we next propose a scale-independent feature extraction framework that models this heterogeneous knowledge in a scale-independent manner. Systematic evaluations are conducted on the Sina Weibo reposting dataset and the MemeTracker dataset. Besides the superior performance of the proposed approach, we also observe that: (1) surprisingly, social/structure knowledge is more indicative of bursts than cascade popularity information, especially for bursts occurring further in the future; (2) larger cascades are harder to predict, as the spreading process of more popular cascades is usually more diverse and fluctuating; and (3) the proposed approach is robust in the sense that the results are not very sensitive to the popularity of the training cascades.
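The time-window-based transformation turns "when will the burst come?" into a K-way classification problem. The sketch below illustrates that idea under assumptions that are not taken from the paper: a cascade is a plain list of repost timestamps, its time span runs from its first to its last repost, and the "burst" is taken to be the window containing the most reposts. The helper names (`window_counts`, `burst_window`, `relative_profile`) are hypothetical, and the per-window share of reposts is only one simple example of a scale-independent feature, not the paper's feature extraction framework.

```python
from typing import List, Sequence


def window_counts(timestamps: Sequence[float], k: int) -> List[int]:
    """Split a cascade's own time span [first, last] into k equal windows
    and count how many reposts fall into each window."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    counts = [0] * k
    if not timestamps:
        return counts
    start, end = min(timestamps), max(timestamps)
    span = end - start
    for t in timestamps:
        if span == 0:
            idx = 0  # degenerate cascade: every repost at the same instant
        else:
            # Clamp so the final repost (t == end) falls in the last window.
            idx = min(int((t - start) / span * k), k - 1)
        counts[idx] += 1
    return counts


def burst_window(timestamps: Sequence[float], k: int) -> int:
    """Class label in {0, ..., k-1}: the window holding the most reposts.
    ASSUMPTION: 'burst' = busiest window; ties resolve to the earliest one."""
    counts = window_counts(timestamps, k)
    return max(range(k), key=lambda i: counts[i])


def relative_profile(timestamps: Sequence[float], k: int) -> List[float]:
    """Illustrative scale-independent feature: each window's share of reposts,
    so cascades of very different sizes become directly comparable."""
    counts = window_counts(timestamps, k)
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    # Hypothetical toy cascades (times in hours since the original post).
    small = [0.0, 4.2, 4.5, 4.8, 10.0]
    # Same shape, 10x more reposts and a 3x longer time span.
    large = [3 * t for t in small for _ in range(10)]
    print(burst_window(small, 5), burst_window(large, 5))  # 2 2
    print(relative_profile(small, 5) == relative_profile(large, 5))  # True
```

Because every cascade is cut into the same K windows relative to its own span, a short, small cascade and a long, large one with the same shape receive the same class label and the same normalized profile, which is the kind of uniform treatment the transformation is meant to provide.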


Figs. 1 to 10 (figure images not included in this preview)

Notes

  1. http://arnetminer.org/Influencelocality#b2354.

  2. http://www.memetracker.org/data.html.


Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (Grant Nos. 61170189, 61370126, 61202239), the National High Technology Research and Development Program of China (Grant No. 2015AA016004), the Major Projects of the National Social Science Fund of China (Grant No. 14&ZH0036), the Science and Technology Innovation Ability Promotion Project of Beijing (PXM2015-014203-000059), the Fund of the State Key Laboratory of Software Development Environment (No. SKLSDE-2015ZX-16), the Microsoft Research Asia Fund (No. FY14-RES-OPP-105), the Innovation Foundation of BUAA for PhD Graduates (No. YWF-14-YJSY-021), and the US NSF through Grants III-1526499, CNS-1115234, and OISE-1129076.

Author information

Correspondence to Zhoujun Li.

About this article

Cite this article

Wang, S., Yan, Z., Hu, X. et al. CPB: a classification-based approach for burst time prediction in cascades. Knowl Inf Syst 49, 243–271 (2016). https://doi.org/10.1007/s10115-015-0899-3

  • DOI: https://doi.org/10.1007/s10115-015-0899-3
