Abstract
Nearly every popular programming language comes with one or more package managers. The software packages distributed by such package managers form large software ecosystems. These packaging ecosystems contain a large number of package releases that are updated regularly and that have many dependencies on other package releases. While packaging ecosystems are extremely useful for their respective communities of developers, they face challenges related to their scale, complexity, and rate of evolution. Typical problems are backward-incompatible package updates and the risk of (transitively) depending on packages that have become obsolete or inactive. This manuscript uses the libraries.io dataset to carry out a quantitative empirical analysis of the similarities and differences between the evolution of package dependency networks for seven packaging ecosystems of varying sizes and ages: Cargo for Rust, CPAN for Perl, CRAN for R, npm for JavaScript, NuGet for the .NET platform, Packagist for PHP, and RubyGems for Ruby. We propose novel metrics to capture the growth, changeability, reusability and fragility of these dependency networks, and use these metrics to analyze and compare their evolution. We observe that the dependency networks tend to grow over time, both in size and in number of package updates, while a minority of packages is responsible for most of the package updates. The majority of packages depend on other packages, but only a small proportion of packages accounts for most of the reverse dependencies. We observe a high proportion of “fragile” packages due to a high and increasing number of transitive dependencies. These findings are instrumental for assessing the quality of a package dependency network, and improving it through dependency management tools and imposed policies.
Notes
Creative Commons Attribution-ShareAlike 4.0 International, see https://creativecommons.org/licenses/by-sa/4.0/.
R² ∈ [0, 1]; the closer to 1, the better the model fits the data.
CPAN is twice as old as the other considered ecosystems except for CRAN.
Because the choice of a one-month period may seem arbitrary, we also computed this index for several other periods, and did not observe different behaviors.
https://www.secohealth.org (October 2017 - September 2019)
Acknowledgements
This research was carried out in the context of FRQ-FNRS collaborative research project R.60.04.18.F “SECOHealth”, ARC research project AUWB-12/17-UMONS-3 “Ecological Studies of Open Source Software Ecosystems”, and FNRS Research Credit J.0023.16 “Analysis of Software Project Survival”. We express our gratitude to Andrew Nesbitt and Ben Nickolls, both from libraries.io and dependencyci.com, for making the package manager dependency data available, and for the very useful email discussions. We thank Jesus Gonzalez-Barahona and Daniel Izquierdo from Bitergia for their relevant feedback. We thank Eleni Constantinou, Alexander Serebrenik and Damian Tamburri for proofreading this work.
Additional information
Communicated by: Gabriele Bavota and Andrian Marcus
Cite this article
Decan, A., Mens, T. & Grosjean, P. An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir Software Eng 24, 381–416 (2019). https://doi.org/10.1007/s10664-017-9589-y
DOI: https://doi.org/10.1007/s10664-017-9589-y