An empirical study on the teams structures in social coding using GitHub projects

Abstract

Social coding enables collaborative software development in virtual and distributed communities. Social coding platforms (e.g., GitHub) provide a pull request feature that allows developers to clone a project, make code changes, and request that the project owners review and integrate those changes into the project's main line of development. The pull request feature has been adopted by a large number of GitHub projects, as it minimizes the risk of exposing the projects to open communities. The efficiency of the pull request review process depends on both technical factors (e.g., code quality) and social factors (e.g., a contributor's connection to the project maintainer). However, it is still unclear which social factors have the most impact on the efficiency of the review process. To identify these social factors, we study the team structures formed by developers within projects that adopt the pull-based development model. We build pull-based networks, in which two developers are linked if one has integrated a pull request submitted by the other. We investigate the 7,850 most popular GitHub projects developed in ten programming languages, and we identify the network metrics that have a significant association with the speed of processing pull requests. Specifically, maintaining a strong core of contributors and denser interactions among developers is associated with faster responses to, and faster processing of, pull requests. We further find that more than 90% of the studied projects follow 8 dominant team structures out of 18 possible team structures. In larger projects, only a subset of developers is granted privileges to review and integrate pull requests, reflecting a strict decision-making process. Small to medium projects are characterized by a small number of core contributors who maintain repeated interactions and are able to process incoming pull requests more efficiently. Finally, the evolution of project team structures over time reveals that only a small percentage of projects shift toward team structures associated with faster pull request processing (e.g., stronger centralization).
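The pull-based network described above, and a few whole-network metrics of the kind the abstract refers to (density of interactions, centralization), can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the input format (one `(submitter, integrator)` pair per merged pull request), the edge direction from integrator to submitter, the handling of self-merges, and the choice of density, reciprocity, and Freeman degree centralization as example metrics are all assumptions, and every function name is hypothetical.

```python
from collections import defaultdict


def build_pull_network(merged_prs):
    """Return (nodes, edges) for a directed pull-based network (hypothetical helper).

    merged_prs: iterable of (submitter, integrator) pairs, one per merged PR.
    Assumption: an edge points from the integrator to the submitter and is
    weighted by how many of that submitter's PRs the integrator merged.
    Self-merges add the developer as a node but create no edge.
    """
    nodes, edges = set(), defaultdict(int)
    for submitter, integrator in merged_prs:
        nodes.update((submitter, integrator))
        if submitter != integrator:
            edges[(integrator, submitter)] += 1
    return nodes, dict(edges)


def density(nodes, edges):
    """Share of all possible directed links that are present."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def reciprocity(edges):
    """Share of directed links whose reverse link also exists."""
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)


def degree_centralization(nodes, edges):
    """Freeman degree centralization of the undirected version of the network.

    Equals 1.0 for a star (one developer linked to everyone else) and
    0.0 when every developer has the same degree.
    """
    n = len(nodes)
    if n < 3:
        return 0.0
    undirected = {frozenset(e) for e in edges}  # collapse u->v and v->u
    degree = {v: 0 for v in nodes}
    for pair in undirected:
        for v in pair:
            degree[v] += 1
    top = max(degree.values())
    # A star maximizes the sum of differences at (n - 1) * (n - 2).
    return sum(top - d for d in degree.values()) / ((n - 1) * (n - 2))


# Hypothetical example: alice merges PRs from bob, carol, and dave;
# bob merges one PR from alice.
prs = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"), ("alice", "bob")]
nodes, edges = build_pull_network(prs)
print(round(density(nodes, edges), 3))      # 0.333
print(reciprocity(edges))                   # 0.5
print(degree_centralization(nodes, edges))  # 1.0
```

Because each metric yields one value per project, values computed this way could then be compared against each project's pull request processing times; the example project is a pure star around `alice`, which is why its centralization reaches the maximum of 1.0.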

Figs. 1–8

Notes

  1. https://github.com/

  2. https://github.com/apache/ignite

Author information

Correspondence to Mariam El Mezouar.

Additional information

Communicated by: Jeffrey C. Carver

About this article

Cite this article

El Mezouar, M., Zhang, F. & Zou, Y. An empirical study on the teams structures in social coding using GitHub projects. Empir Software Eng 24, 3790–3823 (2019). https://doi.org/10.1007/s10664-019-09700-1

Keywords

  • Pull request
  • Social coding
  • Team structure
  • GitHub