Abstract
Today’s social networks continuously generate massive streams of data, which provide a valuable starting point for the detection of rumours as soon as they start to propagate. However, rumour detection faces tight latency bounds, which cannot be met by contemporary algorithms, given the sheer volume of high-velocity streaming data emitted by social networks. Hence, in this paper, we argue for best-effort rumour detection that detects most rumours quickly rather than all rumours with a high delay. To this end, we combine techniques for efficient, graph-based matching of rumour patterns with effective load shedding that discards some of the input data while minimising the loss in accuracy. Experiments with large-scale real-world datasets illustrate the robustness of our approach in terms of runtime performance and detection accuracy under diverse streaming conditions.
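To make the load-shedding idea concrete, the following is a minimal sketch of input-based shedding under a latency budget. It is not the algorithm of this paper: the function name `shed_load`, the parameters `cost_per_item_ms` and `latency_budget_ms`, and the reshare-count utility are all hypothetical and chosen only for illustration. The sketch assumes a fixed processing cost per item and keeps the highest-utility items that fit within the budget.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shed_load(
    batch: Sequence[T],
    utility: Callable[[T], float],
    cost_per_item_ms: float,
    latency_budget_ms: float,
) -> List[T]:
    """Keep the highest-utility items that fit in the latency budget.

    Assumes (hypothetically) that every item costs the same to process.
    Items that are kept retain their arrival order.
    """
    if cost_per_item_ms <= 0:
        raise ValueError("cost_per_item_ms must be positive")

    # Number of items that can be processed before the deadline.
    capacity = max(0, int(latency_budget_ms // cost_per_item_ms))
    if capacity >= len(batch):
        return list(batch)  # no shedding needed

    # Indices of the `capacity` most useful items.
    keep = set(
        heapq.nlargest(capacity, range(len(batch)), key=lambda i: utility(batch[i]))
    )
    return [item for i, item in enumerate(batch) if i in keep]


# Illustrative usage: the utility of a post is its reshare count (an assumption).
posts = [
    {"id": 1, "reshares": 3},
    {"id": 2, "reshares": 50},
    {"id": 3, "reshares": 0},
    {"id": 4, "reshares": 12},
]
kept = shed_load(posts, lambda p: p["reshares"], cost_per_item_ms=2, latency_budget_ms=5)
print([p["id"] for p in kept])
```

With a budget that covers only two items, the sketch drops the two least-reshared posts. A realistic utility would instead estimate how much each item contributes to matching a rumour pattern, which is what determines the accuracy lost by discarding it.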
Acknowledgements
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2019.323.
Cite this article
Nguyen, T.T., Huynh, T.T., Yin, H. et al. Detecting rumours with latency guarantees using massive streaming data. The VLDB Journal 32, 369–387 (2023). https://doi.org/10.1007/s00778-022-00750-4
DOI: https://doi.org/10.1007/s00778-022-00750-4