Detecting rumours with latency guarantees using massive streaming data

Nguyen, Thanh Tam; Huynh, Thanh Trung; Yin, Hongzhi; Weidlich, Matthias; Nguyen, Thanh Thi; Mai, Thai Son; Nguyen, Quoc Viet Hung

doi:10.1007/s00778-022-00750-4

Regular Paper
Published: 08 June 2022

The VLDB Journal, Volume 32, pages 369–387 (2023)

Thanh Tam Nguyen ORCID: orcid.org/0000-0002-2586-7757¹,
Thanh Trung Huynh²,
Hongzhi Yin³,
Matthias Weidlich⁴,
Thanh Thi Nguyen⁵,
Thai Son Mai⁶ &
…
Quoc Viet Hung Nguyen²

Abstract

Today’s social networks continuously generate massive streams of data, which provide a valuable starting point for the detection of rumours as soon as they start to propagate. However, rumour detection faces tight latency bounds, which cannot be met by contemporary algorithms, given the sheer volume of high-velocity streaming data emitted by social networks. Hence, in this paper, we argue for best-effort rumour detection that detects most rumours quickly rather than all rumours with a high delay. To this end, we combine techniques for efficient, graph-based matching of rumour patterns with effective load shedding that discards some of the input data while minimising the loss in accuracy. Experiments with large-scale real-world datasets illustrate the robustness of our approach in terms of runtime performance and detection accuracy under diverse streaming conditions.
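
The best-effort idea in the abstract can be illustrated with a minimal sketch of load shedding under a latency bound. This is not the authors' algorithm: the utility scores, the fixed per-event processing cost, and the names `budget_for` and `shed_load` are all assumptions made for illustration. The sketch derives how many events fit into the latency bound and keeps only the highest-utility events of a window, discarding the rest.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical event representation: (utility score, payload).
# How utility would be estimated is not specified here; it is an assumption.
Event = Tuple[float, str]


def budget_for(latency_bound_ms: int, cost_per_event_ms: int) -> int:
    """Number of events that fit into the latency bound,
    assuming a fixed (hypothetical) processing cost per event."""
    if cost_per_event_ms <= 0:
        raise ValueError("cost_per_event_ms must be positive")
    return max(0, latency_bound_ms // cost_per_event_ms)


def shed_load(window: Iterable[Event], budget: int) -> List[Event]:
    """Keep at most `budget` events from one window, preferring
    events with higher utility; everything else is discarded."""
    if budget <= 0:
        return []
    return heapq.nlargest(budget, window, key=lambda e: e[0])


if __name__ == "__main__":
    window = [(0.9, "a"), (0.1, "b"), (0.5, "c"), (0.7, "d")]
    kept = shed_load(window, budget_for(300, 100))
    print(kept)
```

With a 300 ms bound and an assumed cost of 100 ms per event, the budget is three events, so the lowest-utility event in the sample window is dropped. A real system would need to estimate utility and cost from the stream itself rather than take them as given.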

Notes

https://cse.hkust.edu.hk/graphgen/.

Acknowledgements

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2019.323.

Author information

Authors and Affiliations

Faculty of Information Technology, HUTECH University, Ho Chi Minh City, Vietnam
Thanh Tam Nguyen
Griffith University, Brisbane, Australia
Thanh Trung Huynh & Quoc Viet Hung Nguyen
The University of Queensland, St Lucia, Australia
Hongzhi Yin
Humboldt-Universität zu Berlin, Berlin, Germany
Matthias Weidlich
Deakin University, Melbourne, Australia
Thanh Thi Nguyen
Queen’s University Belfast, Belfast, UK
Thai Son Mai

Corresponding author

Correspondence to Thanh Tam Nguyen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nguyen, T.T., Huynh, T.T., Yin, H. et al. Detecting rumours with latency guarantees using massive streaming data. The VLDB Journal 32, 369–387 (2023). https://doi.org/10.1007/s00778-022-00750-4

Received: 15 April 2021
Revised: 01 May 2022
Accepted: 07 May 2022
Published: 08 June 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s00778-022-00750-4
