The Anatomy of Reddit: An Overview of Academic Research

Conference paper
Part of the Springer Proceedings in Complexity book series (SPCOM)


Online forums provide rich environments where users may post questions and comments on a wide range of topics. Understanding how people behave in online forums may shed light on the fundamental mechanisms by which collective thinking emerges in a group of individuals, but it also has important practical applications, for instance, to improve user experience, increase engagement, or automatically identify bullying. Importantly, the datasets generated by user activity are often openly available to researchers, in contrast to other sources of data in computational social science. In this survey, we map the main research directions that have emerged in recent years, focusing primarily on the most popular platform, Reddit. We categorize research according to whether it focuses on the posts or on the users, and point to different types of methodologies for extracting information from the structure and dynamics of the system. We emphasize the diversity and richness of the research in terms of questions and methods, and suggest future avenues of research.


Keywords: Online communities · Stochastic models · Discussion trees



This work was supported by the Concerted Research Action (ARC) funded by the Federation Wallonia-Brussels (Contract ARC 14/19-060); by the Flagship European Research Area Network (FLAG-ERA) Joint Transnational Call “FuturICT 2.0”; and by grant 16-01-00499 of the Russian Foundation for Basic Research.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. naXys, Université de Namur; ICTEAM, Université catholique de Louvain, Louvain-la-Neuve, Belgium
  2. Mathematical Institute, University of Oxford, Oxford, UK
  3. naXys, Université de Namur, Namur, Belgium
  4. ICTEAM and CORE, Université catholique de Louvain, Louvain-la-Neuve, Belgium
