Developer initiation and social interactions in OSS: A case study of the Apache Software Foundation


Maintaining a productive and collaborative team of developers is essential to Open Source Software (OSS) success, and hinges upon the trust inherent among the team. Whether a project participant is initiated as a committer is a function of both his technical contributions and his social interactions with other project participants. One’s online social footprint is arguably easier to ascertain and gather than one’s technical contributions; e.g., gathering patch submission information requires mining multiple sources with different formats, and then merging the aliases from these sources. In contrast to prior work, where patch submission was found to be an essential ingredient to achieving committer status, here we investigate the extent to which the likelihood of achieving that status can be modeled solely as a social network phenomenon. For six different Apache Software Foundation OSS projects, we compile and integrate a set of social measures of the communications network among OSS project participants and a set of technical measures, i.e., OSS developers’ patch submission activities. We use these sets to predict whether a project participant will become a committer, and to characterize their socialization patterns around the time of becoming a committer. We find that the social network metrics, in particular the amount of two-way communication a person participates in, are more significant predictors of one’s likelihood of becoming a committer. Further, we find that this is true to the extent that other predictors, e.g., patch submission info, need not be included in the models. In addition, we show that future committers can be identified with high fidelity using only the first three months of data on their social activities. Moreover, the first month of their social links alone is a very useful predictor, coming within 10% of the three-month data’s predictions.
Interestingly, we find that on average, for each project, one’s level of socialization ramps up before the time of becoming a committer. After obtaining committer status, their social behavior is more individualized, falling into a few distinct modes of behavior. In a significant number of projects, immediately after the initiation there is a notable social cooling-off period. Finally, we find that it is easier to become a committer earlier in the project’s life cycle than it is later as the project matures. These results should provide insight into the social nature of gaining trust and advancing in status in distributed projects.
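
As a rough illustration of the kind of social measure mentioned above, the following sketch counts, for each participant, the number of partners with whom messages flowed in both directions. It is not the authors' implementation: the function name `two_way_link_counts`, the `(sender, recipient)` input format, and the definition of a two-way link as a reciprocated sender/recipient pair are all assumptions made for illustration.

```python
from collections import defaultdict


def two_way_link_counts(messages):
    """Count, per participant, the distinct partners with whom they
    exchanged messages in both directions.

    `messages` is an iterable of (sender, recipient) pairs; this input
    format is a hypothetical simplification of mailing-list reply data.
    """
    # Collapse repeated messages into a set of directed links.
    directed = set()
    for sender, recipient in messages:
        if sender != recipient:  # ignore replies to oneself
            directed.add((sender, recipient))

    # A link is two-way when the reverse direction also exists.
    counts = defaultdict(int)
    for sender, recipient in directed:
        if (recipient, sender) in directed:
            counts[sender] += 1
    return dict(counts)


# Toy data: alice talks both ways with bob and dave; carol only sends.
messages = [
    ("alice", "bob"), ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "dave"), ("dave", "alice"),
    ("carol", "carol"),
]
counts = two_way_link_counts(messages)
```

A count like this could serve as one input feature to a classifier such as logistic regression; participants with no reciprocated links, such as `carol` in the toy data, simply do not appear in the result.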




  6. Issue trackers also capture communication between committers and developers. We did not use those because the mailing lists contained a large enough communication sample which was not obviously biased in any way.




All authors gratefully acknowledge support from the Air Force Office of Scientific Research, award FA955-11-1-0246. Vasilescu gratefully acknowledges support from the Dutch Science Foundation (NWO), grant NWO 600.065.120.10N235. Part of this research was carried out during Vasilescu’s visits at UC Davis.

Author information



Corresponding author

Correspondence to Vladimir Filkov.

Additional information

Communicated by: Yann-Gaël Guéhéneuc and Tom Mens


About this article


Cite this article

Gharehyazie, M., Posnett, D., Vasilescu, B. et al. Developer initiation and social interactions in OSS: A case study of the Apache Software Foundation. Empir Software Eng 20, 1318–1353 (2015).



Keywords

  • Open source software
  • Email social networks
  • Logistic regression
  • Developer initiation