Skip to main content
Log in

An empirical study on the teams structures in social coding using GitHub projects

  • Published:
Empirical Software Engineering Aims and scope Submit manuscript

Abstract

Social coding enables collaborative software development in virtual and distributed communities. Social coding platforms (e.g., GitHub) provide the pull request feature that allows developers to clone a project, make code changes, and request the project owners to review and integrate the code changes to the main stream of a project. The pull request feature has been widely adopted by a large number of GitHub projects, as it minimizes the risk of exposing the projects to the open communities. The efficiency of the pull requests review process depends both on technical (e.g., the code quality) and social (e.g., the connection of a contributor to the project maintainer) factors. However, it is still unclear which social factors have the most impact on the efficiency of the review process. To identify the social factors, we study the team structures formed by the developers within the projects that adopt the pull-based development model. We build the pull-based networks, where two developers are linked if one has integrated a pull request submitted by the other. We investigate the 7,850 most popular projects on GitHub that are developed in ten programming languages. We identify the network metrics that have a significant association with the speed of processing the pull requests. Specifically, maintaining a strong core of contributors and denser interactions among the developers is associated with faster response and processing of the pull requests. We further find that more than 90% of the studied projects follow 8 dominant team structures out of 18 possible team structures. In the larger projects, only a set of developers is granted review and integration privileges of the pull requests, reflecting a strict decision making process. The small to medium projects are characterized by a small number of core contributors who maintain repeated interactions, and are able to process the incoming pull requests more efficiently. The evolution of the team structures of projects over time reveals that only a low percentage of the projects witnesses a change towards team structures associated to faster pull requests processing (e.g., stronger centralization).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

Notes

  1. https://github.com/

  2. https://github.com/apache/ignite

References

  • Anderson BS, Butts C, Carley K (1999) The interaction of size and density with graph-level indices. Soc Networks 21(3):239–267

    Article  Google Scholar 

  • Barr ET, Bird C, Rigby PC, Hindle A, German DM, Devanbu P (2012) Cohesive and isolated development with branches. In: Fundamental Approaches to Software Engineering, Springer, pp 316–331

  • Bersani FS, Lindqvist D, Mellon SH, Epel ES, Yehuda R, Flory J, Henn-Hasse C, Bierer LM, Makotkine I, Abu-Amara D, Coy M, Reus VI, Lin J, Blackburn EH, Marmar C, Wolkowitz OM (2016) Association of dimensional psychological health measures with telomere length in male war veterans. J Affect Disord 190:537–542

    Article  Google Scholar 

  • Bettenburg N, Hassan AE (2010) Studying the impact of social structures on software quality. In: 2010 IEEE 18th International Conference on Program Comprehension (ICPC), pp 124–133

  • Bird C, Gourley A, Devanbu P, Swaminathan A, Hsu G (2007) Open borders? Immigration in open source projects. In: Proceedings of the Fourth International Workshop on Mining Software Repositories, IEEE Computer Society, MSR ’07, pp 6

  • Bird C, Pattison D, D’Souza R, Filkov V, Devanbu P (2008) Latent social structure in open source projects. In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering, ACM, pp 24–35

  • Butts CT, et al. (2008) Social network analysis with sna. J Stat Softw 24 (6):1–51

    Article  Google Scholar 

  • Capra E, Francalanci C, Merlo F (2008) An empirical study on the relationship between software design quality, development effort and governance in open source projects. IEEE Trans Softw Eng 34(6):765–782

    Article  Google Scholar 

  • Choi H, Varian H (2012) Predicting the present with google trends. Econ Rec 88:2–9

    Article  Google Scholar 

  • Crowston K, Howison J (2005) The social structure of free and open source software development. First Monday 10(2). https://doi.org/10.5210/fm.v0i0.1478

  • Crowston K, Howison J (2006) Hierarchy and centralization in free and open source software team communications. Knowl Technol Policy 18(4):65–85

    Article  Google Scholar 

  • Dabbish L, Stuart C, Tsay J, Herbsleb J (2012) Social coding in github: Transparency and collaboration in an open software repository. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, ACM, New York, CSCW ’12, pp 1277–1286

  • de Reus MA, van den Heuvel MP (2013) The parcellation-based connectome: Limitations and extensions. NeuroImage 80:397–404. mapping the Connectome

    Article  Google Scholar 

  • Dinh-Trong TT, Bieman JM (2005) The freebsd project: a replication case study of open source development. IEEE Trans Softw Eng 31(6):481–494

    Article  Google Scholar 

  • Ducheneaut N (2005) Socialization in an open source software community: a socio-technical analysis. Comput Supported Coop Work (CSCW) 14(4):323–368

    Article  Google Scholar 

  • Ehrlich K, Cataldo M (2012) All-for-one and one-for-all?: A multi-level analysis of communication patterns and individual performance in geographically distributed software development. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, ACM, New York, CSCW ’12, pp 945–954

  • Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry pp 35–41

    Article  Google Scholar 

  • Freeman LC (1978) Centrality in social networks conceptual clarification. Soc Networks 1(3):215–239

    Article  Google Scholar 

  • Gacek C, Arief B (2004) The many meanings of open source. IEEE Softw 21 (1):34–40

    Article  Google Scholar 

  • Garlaschelli D, Loffredo MI (2004) Patterns of link reciprocity in directed networks. Phys Rev Lett 93(26):268,701

    Article  Google Scholar 

  • Gharehyazie M, Posnett D, Vasilescu B, Filkov V (2015) Developer initiation and social interactions in oss: a case study of the apache software foundation. Empir Softw Eng 20(5):1318–1353

    Article  Google Scholar 

  • Gousios G (2013) The ghtorent dataset and tool suite. In: Proceedings of the 10th Working Conference on Mining Software Repositories, IEEE Press, pp 233–236

  • Gousios G, Pinzger M, Av Deursen (2014) An exploratory study of the pull-based software development model. In: Proceedings of the 36th International Conference on Software Engineering, ACM, New York, ICSE 2014, pp 345–355

  • Gousios G, Zaidman A, Storey MA, van Deursen A (2015) Work practices and challenges in pull-based development: The integrator’s perspective. In: Proceedings of the 37th International Conference on Software Engineering, vol 1. IEEE Press, Piscataway, ICSE ’15, pp 358–368

  • Gousios G, Storey MA, Bacchelli A (2016) Work practices and challenges in pull-based development: The contributor’s perspective. In: Proceedings of the 38th International Conference on Software Engineering, ACM, New York, ICSE ’16, pp 285–296

  • Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2008) Statnet: Software tools for the representation, visualization, analysis and simulation of network data. J Stat Softw 24(1):1548

    Article  Google Scholar 

  • Howison J, Inoue K, Crowston K (2006) Social dynamics of free and open source team communications. In: IFIP International Conference on Open Source Systems, Springer, pp 319–330

    Google Scholar 

  • Jiang Y, Adams B, German DM (2013) Will my patch make it? and how fast?: Case study on the linux kernel. In: Proceedings of the 10th Working Conference on Mining Software Repositories, IEEE Press, Piscataway, MSR ’13, pp 101–110

  • Joblin M, Apel S, Hunsen C, Mauerer W (2017) Classifying developers into core and peripheral: An empirical study on count and network metrics. In: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), IEEE, pp 164–174

  • Krackhardt D (1994) Graph theoretical dimensions of informal organizations. Computational Organization Theory 89(112):123–140

    Google Scholar 

  • Kruskal WH, Wallis WA (1952) Use of ranks in one-criterion variance analysis. J Am Stat Assoc 47(260):583–621

    Article  Google Scholar 

  • Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics pp 50–60

  • Marlow J, Dabbish L, Herbsleb J (2013) Impression formation in online peer production: Activity traces and personal profiles in github. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, ACM, New York, CSCW ’13, pp 117–128

  • Mockus A, Fielding RT, Herbsleb JD (2002) Two case studies of open source software development: Apache and mozilla. ACM Trans Softw Eng Methodol 11 (3):309–346

    Article  Google Scholar 

  • Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th International Conference on Software Engineering, ACM, New York, ICSE ’08, pp 181–190

  • Nagappan N, Ball T (2007) Using software dependencies and churn metrics to predict field failures: An empirical case study. In: Proceedings of the First International Symposium on Empirical Software Engineering and Measurement, IEEE Computer Society, Washington, ESEM ’07, pp 364–373

  • O’Mahony S, Ferraro F (2007) The emergence of governance in an open source community. Acad Manag J 50(5):1079–1106

    Article  Google Scholar 

  • Rick (2013) View long-running pull requests

  • Rigby PC, Storey MA (2011) Understanding broadcast based peer review on open source software projects. In: Proceedings of the 33rd International Conference on Software Engineering, ACM, New York, ICSE ’11, pp 541–550

  • Rigby PC, Barr ET, Bird C, Devanbu P, German DM (2013) What effect does distributed version control have on oss project organization? In: 2013 1st International Workshop on Release Engineering (RELENG), IEEE, pp 29–32

  • Robertsa J, Hann IH, Slaughter S (2006) Communication networks in an open source software project. In: IFIP International Conference on Open Source Systems, Springer, pp 297–306

    Google Scholar 

  • Schall D (2014) Who to follow recommendation in large-scale online development communities. Inf Softw Technol 56(12):1543–1555. special issue: Human Factors in Software Development

    Article  Google Scholar 

  • Sheskin DJ (2007) Handbook of Parametric and Nonparametric Statistical Procedures, 4th edn. Chapman & Hall/CRC, Boca Raton

    MATH  Google Scholar 

  • Siegel S (1956) Nonparametric Statistics for the Behavioral Sciences. McGraw-hill, New York

    MATH  Google Scholar 

  • Steel RGD, Torrie JH (1960) Principles and Procedures of Statistics: with Special Reference to the Biological Sciences. McGraw-Hill, New York

    MATH  Google Scholar 

  • Tsay J, Dabbish L, Herbsleb J (2014a) Influence of social and technical factors for evaluating contribution in github. In: Proceedings of the 36th International Conference on Software Engineering, ACM, New York, ICSE 2014, pp 356–366

  • Tsay J, Dabbish L, Herbsleb J (2014b) Let’s talk about it: Evaluating contributions through discussion in github. In: Proceedings of the 22Nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, ACM, New York, FSE 2014, pp 144–154

  • Vasilescu B, Yu Y, Wang H, Devanbu P, Filkov V (2015) Quality and productivity outcomes relating to continuous integration in github. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ACM, pp 805–816

  • von Krogh G, Spaeth S, Lakhani KR (2003) Community, joining, and specialization in open source software innovation: a case study. Res Policy 32 (7):1217–1241. open Source Software Development

    Article  Google Scholar 

  • Wasserman S, Faust K (1994) Social Network Analysis: Methods and Applications, vol 8. Cambridge University Press, Cambridge

    Book  Google Scholar 

  • Wolf T, Schroter A, Damian D, Nguyen T (2009) Predicting build failures using social network analysis on developer communication. In: Proceedings of the 31st International Conference on Software Engineering, IEEE Computer Society, Washington, ICSE ’09, pp 1–11

  • Yu Y, Wang H, Yin G, Ling CX (2014a) Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration. In: 21St Asia-pacific Software Engineering Conference, vol 1. pp 335–342

  • Yu Y, Yin G, Wang H, Wang T (2014b) Exploring the patterns of social behavior in github. In: Proceedings of the 1st International Workshop on Crowd-based Software Development Methods and Technologies, ACM, New York, CrowdSoft 2014, pp 31–36

  • Yu Y, Wang H, Filkov V, Devanbu P, Vasilescu B (2015) Wait for it: Determinants of pull request evaluation latency on github. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 367–371

  • Zanetti MS, Scholtes I, Tessone CJ, Schweitzer F (2013) Categorizing bugs with social networks: A case study on four open source software communities. In: Proceedings of the 2013 International Conference on Software Engineering, IEEE Press, Piscataway, ICSE ’13, pp 1032–1041

  • Zar JH (2005) Spearman Rank Correlation. John Wiley & Sons, Ltd

    Book  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mariam El Mezouar.

Additional information

Communicated by: Jeffrey C. Carver

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

El Mezouar, M., Zhang, F. & Zou, Y. An empirical study on the teams structures in social coding using GitHub projects. Empir Software Eng 24, 3790–3823 (2019). https://doi.org/10.1007/s10664-019-09700-1

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10664-019-09700-1

Keywords

Navigation