Empirical Software Engineering, Volume 24, Issue 1, pp 381–416

An empirical comparison of dependency network evolution in seven software packaging ecosystems

  • Alexandre Decan
  • Tom Mens
  • Philippe Grosjean
Article

Abstract

Nearly every popular programming language comes with one or more package managers. The software packages distributed by such package managers form large software ecosystems. These packaging ecosystems contain a large number of package releases that are updated regularly and that have many dependencies to other package releases. While packaging ecosystems are extremely useful for their respective communities of developers, they face challenges related to their scale, complexity, and rate of evolution. Typical problems are backward incompatible package updates, and the risk of (transitively) depending on packages that have become obsolete or inactive. This manuscript uses the libraries.io dataset to carry out a quantitative empirical analysis of the similarities and differences between the evolution of package dependency networks for seven packaging ecosystems of varying sizes and ages: Cargo for Rust, CPAN for Perl, CRAN for R, npm for JavaScript, NuGet for the .NET platform, Packagist for PHP, and RubyGems for Ruby. We propose novel metrics to capture the growth, changeability, reusability and fragility of these dependency networks, and use these metrics to analyze and compare their evolution. We observe that the dependency networks tend to grow over time, both in size and in number of package updates, while a minority of packages are responsible for most of the package updates. The majority of packages depend on other packages, but only a small proportion of packages accounts for most of the reverse dependencies. We observe a high proportion of “fragile” packages due to a high and increasing number of transitive dependencies. These findings are instrumental for assessing the quality of a package dependency network, and improving it through dependency management tools and imposed policies.
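As a rough illustration of two notions the abstract mentions, transitive dependencies and reverse dependencies, the following minimal sketch computes both on a toy dependency graph. The graph, the package names, and the function names are hypothetical, and the code is not the metric definition used in the article; it only shows the underlying graph traversal.

```python
from collections import deque


def transitive_dependencies(graph, package):
    """Return every package reachable from `package` via dependency edges.

    `graph` maps a package name to the list of its direct dependencies
    (a hypothetical in-memory representation, chosen for illustration).
    """
    seen = set()
    queue = deque(graph.get(package, ()))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(graph.get(dep, ()))
    # Design choice for this sketch: a package is not counted as its own
    # dependency, even if the graph contains a cycle leading back to it.
    seen.discard(package)
    return seen


def reverse_dependency_counts(graph):
    """Count, for each package, how many packages depend on it directly."""
    counts = {pkg: 0 for pkg in graph}
    for deps in graph.values():
        for dep in set(deps):
            counts[dep] = counts.get(dep, 0) + 1
    return counts


# Toy graph with made-up package names.
toy_graph = {
    "app": ["web", "log"],
    "web": ["http", "log"],
    "http": ["log"],
    "log": [],
}

app_deps = transitive_dependencies(toy_graph, "app")
reverse_counts = reverse_dependency_counts(toy_graph)
```

In this toy graph, `app` declares two direct dependencies but reaches three packages transitively, and `log` alone receives three of the five direct dependency edges, which mirrors on a small scale the concentration of reverse dependencies described above.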

Keywords

Software repository mining, Software ecosystem, Package manager, Dependency network, Software evolution

Notes

Acknowledgements

This research was carried out in the context of FRQ-FNRS collaborative research project R.60.04.18.F “SECOHealth”, ARC research project AUWB-12/17-UMONS-3 “Ecological Studies of Open Source Software Ecosystems”, and FNRS Research Credit J.0023.16 “Analysis of Software Project Survival”. We express our gratitude to Andrew Nesbitt and Ben Nickolls, both from libraries.io and dependencyci.com, for making the package manager dependency data available, and for the very useful email discussions. We thank Jesus Gonzalez-Barahona and Daniel Izquierdo from Bitergia for their relevant feedback. We thank Eleni Constantinou, Alexander Serebrenik and Damian Tamburri for proofreading this work.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. COMPLEXYS Research Institute, University of Mons, Mons, Belgium
