One effect of social media’s prevalence in software development is the flourishing of many communities of practice in which users share a common interest. These large communities use many different communication channels, but little is known about how they create, share, and curate knowledge using such channels. In this paper, we report a mixed methods study of how one community of practice, the R software development community, creates and curates knowledge associated with questions and answers (Q&A) in two of its main communication channels: the R tag in Stack Overflow and the R-Help mailing list. The results reveal that knowledge is created and curated in two main forms: participatory, where multiple users explicitly collaborate to build knowledge, and crowdsourced, where individuals primarily work independently of each other. Moreover, we take a unique approach to slicing the data based on question score and participation activities over time. Our study reveals participation patterns, showing the existence of prolific contributors: users who are active across both channels and are responsible for a large proportion of the answers, serving as a bridge of knowledge. The key contributions of this paper are: a characterization of the knowledge artifacts exchanged by this community of practice; the reasons why users choose one channel over the other; and insights into the community's participation patterns, which indicate an evolution of the community and a shift from knowledge creation to knowledge curation.
In the third phase of our study, we extended the mined datasets up to September 2016.
Our scripts, sample data, and coded data are openly available at https://zenodo.org/record/831805
A copy of the survey is available at http://cagomezt.com/lime/index.php/857211?lang=en
There is a threat to validity for this result in the R-help data: Stack Overflow separates responses into comments and answers, whereas R-help makes no such distinction. For R-help, we treat any direct reply to an email as an answer, and any reply to an answer as a comment.
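The R-help labeling rule described in this note can be illustrated with a minimal sketch. It is not the authors' script (their scripts are in the Zenodo archive linked above). The function name `classify_thread` and the `parents` mapping are hypothetical. The sketch assumes that the email starting a thread is the question, that every reply's parent is present in the mapping, and that replies nested deeper than a reply to an answer are also counted as comments, a case the note does not specify.

```python
from typing import Dict, Optional


def classify_thread(parents: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Label each message in one mailing list thread.

    `parents` maps a message id to the id of the message it replies to,
    or to None for the email that starts the thread (assumed to be the
    question). The thread is assumed to be well formed: no cycles, and
    every parent id is itself a key of `parents`.
    """

    def depth(msg_id: str) -> int:
        # Count how many reply hops separate this message from the thread root.
        hops = 0
        current = parents[msg_id]
        while current is not None:
            hops += 1
            current = parents[current]
        return hops

    labels: Dict[str, str] = {}
    for msg_id in parents:
        d = depth(msg_id)
        if d == 0:
            labels[msg_id] = "question"
        elif d == 1:
            # A direct reply to the opening email counts as an answer.
            labels[msg_id] = "answer"
        else:
            # A reply to an answer counts as a comment. Deeper replies are
            # also labeled comments here, which is an assumption.
            labels[msg_id] = "comment"
    return labels


# Hypothetical thread: two answers, one comment on an answer, one deeper reply.
thread = {"q": None, "a1": "q", "a2": "q", "c1": "a1", "c2": "c1"}
labels = classify_thread(thread)
```

Because the rule depends only on reply depth, it cannot tell a late answer posted as a reply to another answer from a genuine comment, which is the threat to validity the note points out.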
The authors would like to thank Cassandra Petrachenko for her editing support and valuable comments, which contributed to this work. We also thank Lorena Castañeda for her assistance with the data collection and analysis processes. Finally, we thank the R community users who responded to our survey. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Communicated by: Romain Robbes, Christian Bird, and Emily Hill
Cite this article
Zagalsky, A., German, D.M., Storey, M. et al. How the R community creates and curates knowledge: an extended study of stack overflow and mailing lists. Empir Software Eng 23, 953–986 (2018). https://doi.org/10.1007/s10664-017-9536-y
- Mining software repositories
- Empirical study
- Qualitative study
- Stack overflow
- Mailing list