Abstract
This paper proposes a data-driven method that forecasts groups of topic-related, overlapping online conversation trees. Our method is generative: given a group of original posts, it generates the resulting conversation threads with timing and authorship information. Using two large datasets from Reddit, we demonstrate that the microscopic properties of such groups of conversations can be accurately predicted from the original posts alone, without knowledge of the intermediate reactions to those posts. We show that our solution significantly outperforms competitive baselines in predicting the conversation structure and user engagement over time. Potential benefits of this solution include the evaluation of intervention strategies to limit disinformation.
Acknowledgements
This work is supported by the DARPA SocialSim Program and the Air Force Research Laboratory under contract FA8650-18-C-7825. The authors would like to thank Leidos for providing data.
Funding
This work is supported by the DARPA SocialSim Program and the Air Force Research Laboratory under contract FA8650-18-C-7825.
Cite this article
Horawalavithana, S., Choudhury, N., Skvoretz, J. et al. Online discussion threads as conversation pools: predicting the growth of discussion threads on reddit. Comput Math Organ Theory 28, 112–140 (2022). https://doi.org/10.1007/s10588-021-09340-1
DOI: https://doi.org/10.1007/s10588-021-09340-1