
An empirical comparison of dependency network evolution in seven software packaging ecosystems

Empirical Software Engineering

Abstract

Nearly every popular programming language comes with one or more package managers. The software packages distributed by such package managers form large software ecosystems. These packaging ecosystems contain a large number of package releases that are updated regularly and that have many dependencies on other package releases. While packaging ecosystems are extremely useful for their respective communities of developers, they face challenges related to their scale, complexity, and rate of evolution. Typical problems are backward-incompatible package updates and the risk of (transitively) depending on packages that have become obsolete or inactive. This manuscript uses the libraries.io dataset to carry out a quantitative empirical analysis of the similarities and differences between the evolution of package dependency networks for seven packaging ecosystems of varying sizes and ages: Cargo for Rust, CPAN for Perl, CRAN for R, npm for JavaScript, NuGet for the .NET platform, Packagist for PHP, and RubyGems for Ruby. We propose novel metrics to capture the growth, changeability, reusability and fragility of these dependency networks, and use these metrics to analyze and compare their evolution. We observe that the dependency networks tend to grow over time, both in size and in number of package updates, while a minority of packages are responsible for most of the package updates. The majority of packages depend on other packages, but only a small proportion of packages account for most of the reverse dependencies. We observe a high proportion of “fragile” packages due to a high and increasing number of transitive dependencies. These findings are instrumental for assessing the quality of a package dependency network, and for improving it through dependency management tools and imposed policies.
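
To make the notions of transitive and reverse dependencies concrete, the following is a minimal sketch on a toy network. It is not the authors' implementation, and it does not reproduce the metrics proposed in the paper; the package names, the `DEPENDENCIES` mapping and both helper functions are hypothetical and serve only as an illustration.

```python
from collections import deque

# Hypothetical toy network (not taken from the paper's data):
# each package maps to the set of packages it directly depends on.
DEPENDENCIES = {
    "app": {"web", "log"},
    "web": {"http", "log"},
    "http": {"util"},
    "log": {"util"},
    "util": set(),
}


def transitive_dependencies(graph, package):
    """Return every package reachable from `package` via dependency edges."""
    seen = set()
    queue = deque(graph.get(package, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; this also guards against cycles
        seen.add(current)
        queue.extend(graph.get(current, ()))
    return seen


def reverse_dependencies(graph):
    """Map each package to the set of packages that directly depend on it."""
    reverse = {name: set() for name in graph}
    for package, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(package)
    return reverse


if __name__ == "__main__":
    for name in sorted(DEPENDENCIES):
        direct = len(DEPENDENCIES[name])
        transitive = len(transitive_dependencies(DEPENDENCIES, name))
        print(f"{name}: {direct} direct, {transitive} transitive")
```

In this toy network, `app` declares only two direct dependencies but relies on four packages overall, which illustrates how transitive dependencies can grow well beyond what a package declares.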


Notes

  1. Creative Commons Attribution-ShareAlike 4.0 International, see https://creativecommons.org/licenses/by-sa/4.0/.

  2. https://github.com/ecos-umons/extractoR

  3. R² ∈ [0, 1], and the closer to 1, the better the model fits the data.

  4. CPAN is twice as old as the other considered ecosystems except for CRAN.

  5. Because the choice of a one-month period may seem arbitrary, we also computed this index for several other periods and did not observe different behaviors.

  6. http://grimoirelab.github.io

  7. https://bitergia.com

  8. https://github.com/ecos-umons/extractoR

  9. https://chaoss.community

  10. https://www.secohealth.org (October 2017 - September 2019)

Acknowledgements

This research was carried out in the context of FRQ-FNRS collaborative research project R.60.04.18.F “SECOHealth”, ARC research project AUWB-12/17-UMONS-3 “Ecological Studies of Open Source Software Ecosystems”, and FNRS Research Credit J.0023.16 “Analysis of Software Project Survival”. We express our gratitude to Andrew Nesbitt and Ben Nickolls, both from libraries.io and dependencyci.com, for making the package manager dependency data available, and for the very useful email discussions. We thank Jesus Gonzalez-Barahona and Daniel Izquierdo from Bitergia for their relevant feedback. We thank Eleni Constantinou, Alexander Serebrenik and Damian Tamburri for proofreading this work.

Author information

Corresponding author

Correspondence to Alexandre Decan.

Additional information

Communicated by: Gabriele Bavota and Andrian Marcus

About this article

Cite this article

Decan, A., Mens, T. & Grosjean, P. An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir Software Eng 24, 381–416 (2019). https://doi.org/10.1007/s10664-017-9589-y

  • DOI: https://doi.org/10.1007/s10664-017-9589-y
