
How the R community creates and curates knowledge: an extended study of stack overflow and mailing lists


One effect of social media’s prevalence in software development is the emergence of many flourishing communities of practice in which users share a common interest. These large communities use many different communication channels, but little is known about how they create, share, and curate knowledge using such channels. In this paper, we report a mixed methods study of how one community of practice, the R software development community, creates and curates knowledge associated with questions and answers (Q&A) in two of its main communication channels: the R tag in Stack Overflow and the R-Help mailing list. The results reveal that knowledge is created and curated in two main forms: participatory, where multiple users explicitly collaborate to build knowledge, and crowdsourced, where individuals primarily work independently of each other. Moreover, we take a unique approach to slicing the data, based on question score and participation activities over time. Our study reveals participation patterns that show the existence of prolific contributors: users who are active across both channels and are responsible for a large proportion of the answers, serving as a bridge of knowledge. The key contributions of this paper are: a characterization of the knowledge artifacts that are exchanged by this community of practice; the reasons why users choose one channel over the other; and insights into the community participation patterns, which indicate an evolution of the community and a shift from knowledge creation to knowledge curation.



  6. In the third phase of our study, we extended the mined datasets up to September 2016.

  9. Our scripts, sample data, and coded data are openly available at

  11. A copy of the survey is available at

  16. There is a threat to validity for this result in the R-help data: Stack Overflow separates responses into comments and answers, whereas R-help makes no such distinction. For R-help, we consider any direct reply to an email to be an answer, and a reply to an answer to be a comment.
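The labeling rule in note 16 can be illustrated with a minimal sketch. It is not the authors' script: the `Message` type, its field names, and the `classify_thread` function are hypothetical. It also assumes that "an email" in the note means the email that opens a thread, and that replies deeper than one level count as comments, which the note does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Message:
    # Hypothetical minimal view of a mailing-list email; the field names
    # are illustrative and are not taken from the study's scripts.
    msg_id: str
    in_reply_to: Optional[str]  # None for the email that starts a thread


def classify_thread(messages: List[Message]) -> Dict[str, str]:
    """Label each message as 'question', 'answer', or 'comment'."""
    labels: Dict[str, str] = {}

    # First pass: emails that reply to nothing are treated as questions.
    for m in messages:
        if m.in_reply_to is None:
            labels[m.msg_id] = "question"

    # Second pass: a direct reply to a question is an answer.
    for m in messages:
        if m.in_reply_to is None:
            continue
        if labels.get(m.in_reply_to) == "question":
            labels[m.msg_id] = "answer"
        else:
            # A reply to an answer is a comment. Assumption: deeper replies
            # and replies whose parent is missing are also comments.
            labels[m.msg_id] = "comment"

    return labels


thread = [
    Message("q1", None),
    Message("a1", "q1"),
    Message("c1", "a1"),
    Message("a2", "q1"),
]
labels = classify_thread(thread)
```

Because the questions are identified in a separate first pass, the result does not depend on the order in which the emails of a thread are listed.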




The authors would like to thank Cassandra Petrachenko for her editing support and valuable comments, which contributed to this work. We also thank Lorena Castañeda for her assistance with the data collection and analysis processes. Finally, we thank the members of the R community who responded to our survey. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information



Corresponding author

Correspondence to Alexey Zagalsky.

Additional information

Communicated by: Romain Robbes, Christian Bird, and Emily Hill


About this article


Cite this article

Zagalsky, A., German, D.M., Storey, MA. et al. How the R community creates and curates knowledge: an extended study of stack overflow and mailing lists. Empir Software Eng 23, 953–986 (2018).



  • Mining software repositories
  • Empirical study
  • Qualitative study
  • Survey
  • Stack Overflow
  • R
  • Mailing list