Empirical Software Engineering, Volume 19, Issue 4, pp 955–1008

On the variation and specialisation of workload—A case study of the Gnome ecosystem community

  • Bogdan Vasilescu
  • Alexander Serebrenik
  • Mathieu Goeminne
  • Tom Mens


Most empirical studies of open source software repositories focus on the analysis of isolated projects, or restrict themselves to the study of the relationships between technical artifacts. In contrast, we have carried out a case study that focuses on the actual contributors to software ecosystems, that is, collections of software projects maintained by the same community. To this end, we defined a new series of workload and involvement metrics, as well as a novel approach, \(\widetilde{\mathbf{T}}\)-graphs, for reporting the results of comparing multiple distributions. We used these techniques to statistically study how the workload and involvement of ecosystem contributors vary across projects and across activity types, and we explored to what extent projects and contributors specialise in particular activity types. Using Gnome as a case study, we observed that, next to coding, the activities of localization, development documentation and building are prevalent throughout the ecosystem. We also observed notable differences between frequent and occasional contributors in terms of the activity types they are involved in and the number of projects they contribute to. Occasional contributors and contributors who are involved in many different projects tend to be more involved in the localization activity, while frequent contributors tend to be more involved in the coding activity in a limited number of projects.


Keywords: Open source, Software ecosystem, Metrics, Developer community, Case study



We thank Javier Perez and Romuald Deshayes for proofreading a draft version of this article. We are also grateful to Dr. Koo Rijpkema for a number of discussions on certain aspects of statistical analysis and Dr. Frank Konietschke for providing us with the (yet to be published) implementation of the \(\widetilde{\mathbf{T}}\) procedure. Moreover, we thank the anonymous reviewers for their numerous remarks that helped us to improve the article significantly.

This research has been partially supported by research projects FRFC 2.4515.09 financed by Fonds de la Recherche Scientifique (F.R.S-FNRS), ARC AUWB-08/12-UMH-3 and AUWB-12/17-UMONS-3 financed by the Ministère de la Communauté française—Direction générale de l’Enseignement non obligatoire et de la Recherche scientifique (Belgium), and NWO 600.065.120.10N235 financed by the Dutch Science Foundation (Nederlandse Organisatie voor Wetenschappelijk Onderzoek, NWO). Part of this research has been carried out during the second author’s stay at the Université de Mons, supported by grant BSS-2012/V 6/5/015 of the Fonds de la Recherche Scientifique (F.R.S-FNRS).



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Bogdan Vasilescu (1)
  • Alexander Serebrenik (1)
  • Mathieu Goeminne (2)
  • Tom Mens (2)

  1. MDSE, Eindhoven University of Technology, Eindhoven, The Netherlands
  2. COMPLEXYS Research Institute, Université de Mons, Mons, Belgium
