Empirical Software Engineering, Volume 23, Issue 2, pp 953–986

How the R community creates and curates knowledge: an extended study of stack overflow and mailing lists

  • Alexey Zagalsky
  • Daniel M. German
  • Margaret-Anne Storey
  • Carlos Gómez Teshima
  • Germán Poo-Caamaño


One of the effects of social media’s prevalence in software development is the many flourishing communities of practice where users share a common interest. These large communities use many different communication channels, but little is known about how they create, share, and curate knowledge using such channels. In this paper, we report a mixed methods study of how one community of practice, the R software development community, creates and curates knowledge associated with questions and answers (Q&A) in two of its main communication channels: the R tag in Stack Overflow and the R-Help mailing list. The results reveal that knowledge is created and curated in two main forms: participatory, where multiple users explicitly collaborate to build knowledge, and crowdsourced, where individuals primarily work independently of each other. Moreover, we take a unique approach to slicing the data based on question score and participation activities over time. Our study reveals participation patterns, showing the existence of prolific contributors: users who are active across both channels and are responsible for a large proportion of the answers, serving as a bridge of knowledge. The key contributions of this paper are: a characterization of knowledge artifacts that are exchanged by this community of practice; the reasons why users choose one channel over the other; and insights into the community participation patterns, which indicate an evolution of the community and a shift from knowledge creation to knowledge curation.


Mining software repositories; Empirical study; Qualitative study; Survey; Stack Overflow; Mailing list



The authors would like to thank Cassandra Petrachenko for the editing support and valuable comments that contributed to this work. We also thank Lorena Castañeda for her assistance with the data collection and analysis processes. Finally, we thank the R community users that responded to our survey. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. University of Victoria, Victoria, Canada
