An empirical comparison of developer retention in the RubyGems and npm software ecosystems

Abstract

Software ecosystems can be viewed as socio-technical networks consisting of technical components (software packages) and social components (the communities of developers who maintain the technical components). Ecosystems evolve over time through socio-technical changes that may greatly impact the ecosystem’s sustainability. Social changes such as developer turnover may lead to technical degradation. This motivates the need to identify the factors that lead to developer abandonment, so that developers with a high abandonment risk can be identified automatically. This paper compares such factors for two software package ecosystems, RubyGems and npm. We analyse the evolution of their packages hosted on GitHub, considering development activity in terms of commits, and social interaction with other developers in terms of comments associated with commits, issues or pull requests. We analyse this socio-technical activity for more than 30k developers in RubyGems and more than 60k developers in npm. We use survival analysis to identify which factors coincide with a lower survival probability. Our results reveal that developers with a higher probability of abandoning an ecosystem: do not engage in discussions with other developers; do not have strong social and technical activity intensity; communicate or commit less frequently; and do not participate in both technical and social activities for long periods of time. Such observations could be used to automate the identification of developers with a high probability of abandoning the ecosystem and, as such, reduce the risks associated with knowledge loss.
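The abstract does not spell out the survival-analysis machinery. As a minimal sketch, assuming the common Kaplan-Meier estimator and a standard two-sample log-rank test (which may differ from the exact estimator and test variant the authors used), the Python below shows how developers could be turned into right-censored observations and how two groups of developers could be compared. The function names, the hypothetical `to_observation` rule with its 12-month inactivity threshold, and the toy data are illustrative assumptions, not the paper's definitions.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

# One observation per developer: (duration, abandoned).
#   duration:  months from first activity to last activity (if abandoned)
#              or to the end of the study (if still active)
#   abandoned: True if the developer abandoned the ecosystem (the event),
#              False if the observation is right-censored
Observation = Tuple[float, bool]


def to_observation(first_activity: float, last_activity: float,
                   end_of_study: float,
                   inactivity_months: float = 12.0) -> Observation:
    """Hypothetical abandonment rule (not the paper's definition): a developer
    with no activity during the last `inactivity_months` of the study counts
    as having abandoned; anyone else is censored at end_of_study."""
    if end_of_study - last_activity >= inactivity_months:
        return (last_activity - first_activity, True)
    return (end_of_study - first_activity, False)


def kaplan_meier(data: Iterable[Observation]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t), returned as (event time, survival) steps."""
    data = list(data)
    events = Counter(t for t, abandoned in data if abandoned)
    survival = 1.0
    curve = []
    for t in sorted(events):
        # Developers still under observation just before time t
        at_risk = sum(1 for dur, _ in data if dur >= t)
        survival *= 1.0 - events[t] / at_risk
        curve.append((t, survival))
    return curve


def log_rank(group_a: Iterable[Observation],
             group_b: Iterable[Observation]) -> Tuple[float, float]:
    """Standard two-sample log-rank test: returns (chi-square statistic, p-value)."""
    a, b = list(group_a), list(group_b)
    event_times = sorted({t for t, abandoned in a + b if abandoned})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for dur, _ in a if dur >= t)
        n_b = sum(1 for dur, _ in b if dur >= t)
        d_a = sum(1 for dur, e in a if e and dur == t)
        d_b = sum(1 for dur, e in b if e and dur == t)
        n, d = n_a + n_b, d_a + d_b
        # Observed minus expected abandonments in group A at time t
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at time t
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank test undefined: zero variance")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Toy, made-up data: (months active, abandoned?) for two developer groups
    commenters = [(3, True), (8, False), (12, True), (20, False), (24, False)]
    non_commenters = [(1, True), (2, True), (4, True), (6, False), (9, True)]
    print(kaplan_meier(commenters))
    print(kaplan_meier(non_commenters))
    print(log_rank(commenters, non_commenters))
```

With only five made-up developers per group, the log-rank p-value comes out at roughly 0.06, so it is purely illustrative; the study itself covers tens of thousands of developers per ecosystem. The explicit loops favour readability over speed; for real data, a dedicated survival library such as lifelines (`KaplanMeierFitter`, `logrank_test`) would be the more practical choice.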

Notes

  1. https://rubygems.org/ for RubyGems.

  2. https://www.npmjs.com/ for npm.

  3. We use the 2016-09-05 dump of the GHTorrent dataset.

  4. https://groups.google.com/forum/#!forum/npm-.

  5. https://groups.google.com/forum/#!forum/rubygems-org.

  6. https://groups.google.com/forum/#!forum/rubygems-developers.

Acknowledgements

This research was carried out in the context of FNRS research credit (crédit de recherche) J.0023.16 entitled “Analysis of Software Project Survival” and the bilateral collaborative research program FRQ-FNRS 30440672 entitled “Towards an Interdisciplinary Socio-Technical Methodology and Analysis of Software Ecosystem Health”.

Author information

Correspondence to Eleni Constantinou.

About this article

Cite this article

Constantinou, E., Mens, T. An empirical comparison of developer retention in the RubyGems and npm software ecosystems. Innovations Syst Softw Eng 13, 101–115 (2017). https://doi.org/10.1007/s11334-017-0303-4

Keywords

  • Software ecosystem
  • Socio-technical interaction
  • Software evolution
  • Empirical analysis
  • Survival analysis