Empirical Software Engineering, Volume 20, Issue 5, pp 1318–1353

Developer initiation and social interactions in OSS: A case study of the Apache Software Foundation

  • Mohammad Gharehyazie
  • Daryl Posnett
  • Bogdan Vasilescu
  • Vladimir Filkov


Maintaining a productive and collaborative team of developers is essential to Open Source Software (OSS) success, and hinges upon the trust inherent among the team. Whether a project participant is initiated as a committer is a function of both their technical contributions and their social interactions with other project participants. One’s online social footprint is arguably easier to ascertain and gather than one’s technical contributions; e.g., gathering patch submission information requires mining multiple sources with different formats, and then merging the aliases from these sources. In contrast to prior work, where patch submission was found to be an essential ingredient to achieving committer status, here we investigate the extent to which the likelihood of achieving that status can be modeled solely as a social network phenomenon. For 6 different Apache Software Foundation OSS projects we compile and integrate a set of social measures of the communications network among OSS project participants and a set of technical measures, i.e., OSS developers’ patch submission activities. We use these sets to predict whether a project participant will become a committer, and to characterize their socialization patterns around the time of becoming a committer. We find that the social network metrics, in particular the amount of two-way communication a person participates in, are more significant predictors of one’s likelihood of becoming a committer. Further, we find that this is true to the extent that other predictors, e.g., patch submission info, need not be included in the models. In addition, we show that future committers are easy to identify with great fidelity when using the first three months of data on their social activities. Moreover, the first month of their social links alone is a very useful predictor, coming within 10% of the three-month data’s predictions.
Interestingly, we find that on average, for each project, one’s level of socialization ramps up before the time of becoming a committer. After obtaining committer status, their social behavior is more individualized, falling into a few distinct modes of behavior. In a significant number of projects, immediately after the initiation there is a notable social cooling-off period. Finally, we find that it is easier to become a committer earlier in the project’s life cycle than later, as the project matures. These results should provide insight into the social nature of gaining trust and advancing in status in distributed projects.
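As a rough illustration of the kind of measure the abstract refers to, the sketch below counts, for each participant, the distinct partners with whom mail flowed in both directions, and plugs that count into a generic logistic model. It is a minimal sketch under stated assumptions, not the authors' pipeline: the `(sender, recipient)` input shape, the function names, and the single-predictor model form are all hypothetical, and the coefficients must be supplied by the caller because none are given here.

```python
import math
from collections import defaultdict


def two_way_link_counts(messages):
    """Count, per person, the distinct partners with reciprocated mail.

    `messages` is an iterable of (sender, recipient) pairs. This input shape
    is an assumption made for illustration, not the paper's data format.
    """
    sent = defaultdict(set)  # sender -> set of distinct recipients
    people = set()
    for sender, recipient in messages:
        people.update((sender, recipient))
        if sender != recipient:  # ignore self-addressed mail
            sent[sender].add(recipient)
    # A link counts as two-way only if each side has mailed the other.
    return {
        person: sum(
            1
            for partner in sent.get(person, ())
            if person in sent.get(partner, ())
        )
        for person in people
    }


def committer_probability(two_way_links, intercept, slope):
    """Generic logistic model: p = 1 / (1 + exp(-(intercept + slope * x))).

    The coefficients are caller-supplied placeholders (hypothetical); in
    practice they would be estimated from labeled project data.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * two_way_links)))
```

With estimated coefficients, `committer_probability(two_way_link_counts(messages)[person], intercept, slope)` would give a per-person score; a positive slope means more reciprocated links imply a higher predicted likelihood.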


Open source software; Email social networks; Logistic regression; Developer initiation



All authors gratefully acknowledge support from the Air Force Office of Scientific Research, award FA955-11-1-0246. Vasilescu gratefully acknowledges support from the Dutch Science Foundation (NWO), grant NWO 600.065.120.10N235. Part of this research was carried out during Vasilescu’s visits at UC Davis.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Mohammad Gharehyazie (1)
  • Daryl Posnett (1)
  • Bogdan Vasilescu (2)
  • Vladimir Filkov (1)

  1. Department of Computer Science, University of California, Davis, Davis, USA
  2. Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
