Abstract
Modern programming languages such as Java, JavaScript, and Rust encourage software reuse by hosting diverse and fast-growing repositories of highly interdependent packages (i.e., reusable libraries) for their users. The standard way to study the interdependence between software packages is to infer a package dependency network by parsing manifest data. Such networks help answer questions such as “How many packages depend on packages with known security issues?” or “What are the most used packages?” However, existing studies overlook that manifest-inferred relationships do not necessarily reflect the actual usage of these dependencies in source code. To better model dependencies between packages, we developed Präzi, an approach that combines the manifests and call graphs of packages. Präzi constructs a dependency network at the more fine-grained function level instead of the manifest level. This paper discusses a prototypical Präzi implementation for the popular systems programming language Rust. We use Präzi to characterize Rust’s package repository, Crates.io, at the function level and perform a comparative study with metadata-based networks. Our results show that metadata-based networks generalize how packages use their dependencies. Using Präzi, we find that packages call only 40% of their resolved dependencies, and a manual analysis of 34 cases reveals that not all packages use a dependency the same way. We argue that researchers and practitioners interested in understanding how developers or programs use dependencies should account for their context, not the sum of all resolved dependencies.
Notes
After normalizing the networks (i.e., taking an inner join of the packages common to all three networks).
See footnote 22
For presentation reasons, we showcase only three years.
References
Abdalkareem R, Nourry O, Wehaibi S, Mujahid S, Shihab E (2017) Why do developers use trivial packages? an empirical case study on npm. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, pp 385–395
Abdalkareem R, Oda V, Mujahid S, Shihab E (2019) On the impact of using trivial packages: an empirical case study on npm and pypi. Empir Softw Eng:1–37
Albert R, Barabási A L (2002) Statistical mechanics of complex networks. Rev Modern Phys 74(1):47
Ali K, Lhoták O (2012) Application-only call graph construction. In: European Conference on Object-Oriented Programming. Springer, pp 688–712
Alimadadi S, Mesbah A, Pattabiraman K (2015) Hybrid DOM-sensitive change impact analysis for JavaScript. In: 29th European Conference on Object-Oriented Programming (ECOOP 2015), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
Aparicio J (2019) cargo-call-stack: Static, whole program stack analysis. https://github.com/japaric/cargo-call-stack
Baldwin A (2018) Details about the event-stream incident. https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident
Beller M, Bholanath R, McIntosh S, Zaidman A (2016) Analyzing the state of static analysis: A large-scale evaluation in open source software. In: Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering. IEEE, pp 470–481
Bogart C, Kästner C, Herbsleb J, Thung F (2016) How to break an API: Cost negotiation and community values in three software ecosystems. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, pp 109–120
Brian A, David T, Aaron T (2020) The rust libz blitz. https://blog.rust-lang.org/2017/05/05/libz-blitz.html
Chen L, Hassan F, Wang X, Zhang L (2020) Taming behavioral backward incompatibilities via cross-project testing and analysis. In: IEEE/ACM International Conference on Software Engineering
Chinthanet B, Ponta S E, Plate H, Sabetta A, Kula R G, Ishio T, Matsumoto K (2020) Code-based vulnerability detection in Node.js applications: How far are we? In: 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, pp 1199–1203
Cogo F R, Oliva GA, Hassan AE (2019) An empirical study of dependency downgrades in the npm ecosystem. IEEE Transactions on Software Engineering
Decan A, Mens T, Constantinou E (2018a) On the impact of security vulnerabilities in the npm package dependency network. In: International Conference on Mining Software Repositories
Decan A, Mens T, Grosjean P (2018b) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empirical Software Engineering
Decan A, Mens T, Grosjean P (2019) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir Softw Eng 24(1):381–416
Dietrich J, Pearce D, Stringer J, Tahir A, Blincoe K (2019) Dependency versioning in the wild. In: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). IEEE, pp 349–359
Duan R, Bijlani A, Xu M, Kim T, Lee W (2017) Identifying open-source license violation and 1-day security risk at large scale. In: Proceedings of the 2017 ACM SIGSAC Conference on computer and communications security, pp 2169–2185
Dunn J (2017) Pypi python repository hit by typosquatting sneak attack. https://nakedsecurity.sophos.com/2017/09/19/pypi-python-repository-hit-by-typosquatting-sneak-attack/
Emami M, Ghiya R, Hendren L J (1994) Context-sensitive interprocedural points-to analysis in the presence of function pointers. ACM SIGPLAN Not 29(6):242–256
Hejderup J (2015) In dependencies we trust: How vulnerable are dependencies in software modules? Master’s thesis, Delft University of technology
Hejderup J, van Deursen A, Gousios G (2018) Software ecosystem call graph for dependency management. In: Proceedings of the 40th International Conference on Software Engineering, New Ideas and Emerging Results. ACM, pp 101–104
Hejderup J, Beller M, Triantafyllou K, Gousios G (2021) Präzi: From Package-based to Call-based Dependency Networks. https://doi.org/10.5281/zenodo.4478981
Hopkins WG (1997) A new view of statistics. Will G. Hopkins
Katz Y (2016) Cargo: predictable dependency management. https://blog.rust-lang.org/2016/05/05/cargo-pillars.html
Kikas R, Gousios G, Dumas M, Pfahl D (2017) Structure and evolution of package dependency networks. In: Proceedings of the 14th International Conference on Mining Software Repositories, IEEE Press, pp 102–112
Kula R G, De Roover C, German D M, Ishio T, Inoue K (2018a) A generalized model for visualizing library popularity, adoption, and diffusion within a software ecosystem. In: 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, pp 288–299
Kula R G, Ouni A, German D M, Inoue K (2018b) An empirical study on the impact of refactoring activities on evolving client-used apis. Inf Softw Technol 93:186–199
Lehman M M (1980) Programs, life cycles, and laws of software evolution. Proc IEEE 68(9):1060–1076
Livshits B, Sridharan M, Smaragdakis Y, Lhoták O, Amaral J N, Chang B Y E, Guyer S Z, Khedker U P, Møller A, Vardoulakis D (2015) In defense of soundiness: a manifesto. Commun ACM 58(2):44–46
Martins P, Achar R, Lopes C V (2018) 50K-c: a dataset of compilable, and compiled, java projects. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR). IEEE, pp 1–5
Matsakis N (2016) Introducing mir. https://blog.rust-lang.org/2016/04/19/MIR.html
Mezzetti G, Møller A, Torp MT (2018) Type regression testing to detect breaking changes in Node.js libraries. In: 32nd European Conference on Object-Oriented Programming (ECOOP 2018), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
Mirhosseini S, Parnin C (2017) Can automated pull requests encourage software developers to upgrade out-of-date dependencies? In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, pp 84–94
Mones E, Vicsek L, Vicsek T (2012) Hierarchy measure for complex networks. PloS one 7(3):e33799
Mujahid S, Abdalkareem R, Shihab E, McIntosh S (2020) Using others’ tests to identify breaking updates. In: Proceedings of the 17th International Conference on Mining Software Repositories, pp 466–476
Nguyen H A, Nguyen T N, Dig D, Nguyen S, Tran H, Hilton M (2019) Graph-based mining of in-the-wild, fine-grained, semantic code change patterns. In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, pp 819–830
Ponta S E, Plate H, Sabetta A (2018) Beyond Metadata: Code-centric and usage-based analysis of known vulnerabilities in open-source software. In: 2018 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 449–460
Preston-Werner T (2013) Semantic versioning. https://semver.org/
Raemaekers S, van Deursen A, Visser J (2017) Semantic versioning and impact of breaking changes in the maven repository. J Syst Softw 129:140–158
Robbes R, Lungu M, Röthlisberger D (2012) How do developers react to api deprecation? the case of a smalltalk ecosystem. In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, pp 1–11
Ryder B G (1979) Constructing the call graph of a program. IEEE Trans Softw Eng (3):216–226
Salis V, Sotiropoulos T, Louridas P, Spinellis D, Mitropoulos D (2021) PyCG: Practical call graph generation in Python. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, pp 1646–1657
Sawant AA, Bacchelli A (2017) Fine-grape: fine-grained api usage extractor-an approach and dataset to investigate api usage. Empir Softw Eng 22(3):1348–1371
Sawant A A, Aniche M, van Deursen A, Bacchelli A (2018a) Understanding developers’ needs on deprecation as a language feature. In: 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, pp 561–571
Sawant AA, Aniche M, van Deursen A, Bacchelli A (2018b) Understanding developers’ needs on deprecation as a language feature. In: Proceedings of the 40th International Conference on Software Engineering, ICSE ’18. ACM, New York, pp 561–571
Sawant AA, Robbes R, Bacchelli A (2018c) On the reaction to deprecation of clients of 4 + 1 popular java apis and the jdk. Empir Softw Eng 23(4):2158–2197
Schlueter I (2013) Unix philosophy and node.js. https://blog.izs.me/2013/04/unix-philosophy-and-nodejs/
Schlueter I (2017) The npm blog – kik, left-pad, and npm. http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
Shapiro M, Horwitz S (1997) Fast and accurate flow-insensitive points-to analysis. In: Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp 1–14
Shivers O (1991) Control-flow analysis of higher-order languages. PhD thesis, Carnegie Mellon University
Steensgaard B (1996) Points-to analysis in almost linear time. In: Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp 32–41
Sulír M, Porubän J (2016) A quantitative study of Java software buildability. In: Proceedings of the 7th International Workshop on Evaluation and Usability of Programming Languages and Tools. ACM, pp 17–25
Sundaresan V, Hendren L, Razafimahefa C, Vallée-Rai R, Lam P, Gagnon E, Godin C (2000) Practical virtual method call resolution for Java, vol 35. ACM
Tip F, Palsberg J (2000) Scalable propagation-based call graph construction algorithms. In: Proceedings of the 15th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pp 281–293
Triantafyllou K (2019) A benchmark for rust call-graph generators. https://users.rust-lang.org/t/a-benchmark-for-rust-call-graph-generators/34494
Tufano M, Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2017) There and back again: Can you compile that snapshot? J Softw Evol Process 29(4)
Valiev M, Vasilescu B, Herbsleb J (2018) Ecosystem-level determinants of sustained activity in open-source projects: a case study of the pypi ecosystem. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, pp 644–655
Xavier L, Brito A, Hora A, Valente M T (2017) Historical and impact analysis of api breaking changes: A large-scale study. In: 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, pp 138–147
Zapata R E, Kula R G, Chinthanet B, Ishio T, Matsumoto K, Ihara A (2018) Towards smoother library migrations: A look at vulnerable dependency migrations at function level for npm javascript packages. In: 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, pp 559–563
Zerouali A, Constantinou E, Mens T, Robles G, González-Barahona J (2018) An empirical analysis of technical lag in npm package dependencies. In: International Conference on Software Reuse. Springer, pp 95–110
Zhang T, Hartmann B, Kim M, Glassman EL (2020a) Enabling data-driven api design with community usage data: A need-finding study. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp 1–13
Zhang T, Hartmann B, Kim M, Glassman EL (2020b) Enabling data-driven api design with community usage data: A need-finding study. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp 1–13
Zhong H, Thummalapenta S, Xie T, Zhang L, Wang Q (2010) Mining api mapping for language migration. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, Volume 1, pp 195–204
Zimmermann M, Staicu C A, Tenny C, Pradel M (2019) Small world with high risks: a study of security threats in the npm ecosystem. In: 28th USENIX Security Symposium (USENIX Security 19), pp 995–1010
Acknowledgements
The work in this paper was partially funded by NWO grant 628.008.001 (CodeFeedr) and H2020 grant 825328 (FASTEN). Georgios Gousios is the main recipient of both funding grants.
Additional information
Communicated by: Romain Robbes
Work largely conducted while the authors, Moritz Beller and Georgios Gousios, were researchers at TU Delft, The Netherlands.
Appendix
A Selecting a time window for dependency resolution
Instead of pinning a single fixed version, version constraints allow developers to depend on a range of versions, so that the resolved version can change with each new compilation. Nearly all dependencies in Crates.io specify such a dynamic version constraint: only 2.92% of all dependency specifications in Crates.io use a single (immutable) version (Dietrich et al. 2019). Before studying the evolution and structure of Crates.io, we first decide on the number of time points and the time window between consecutive time points. Although popular studies such as Kikas et al. (2017) and Decan et al. (2019) use a time window of one year to study structural changes, we instead determine a time window based on the frequency of structural changes in Crates.io.
After resolving the dependency tree of a set of packages in Crates.io at a time t, we re-resolve it at six later time points (i.e., one day, one week, one month, three months, six months, and one year after t) to find a time window in which a large fraction of the packages have a changed dependency tree. We perform this for the set of packages having at least one non-optional dependency at the beginning of 2017 (5,252 package releases), 2018 (9,716 package releases), and 2019 (16,098 package releases).
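The re-resolution procedure can be illustrated with a minimal Python sketch. It is not the tooling used in the study: the registry contents, release dates, and the helper names `caret_matches`, `resolve`, `dependency_tree`, and `changed_fraction` are all hypothetical, and the resolver is a strong simplification of Cargo's (it only picks the newest release that satisfies a caret requirement at a given date, without feature handling or version unification).

```python
from datetime import date, timedelta

# Hypothetical toy registry (not real Crates.io data):
# package name -> list of (version, release date, dependencies),
# where dependencies map a package name to a caret requirement.
REGISTRY = {
    "app":   [((1, 0, 0), date(2017, 1, 1), {"libc": (0, 2, 0)})],
    "tool":  [((0, 1, 0), date(2017, 1, 1), {"serde": (1, 0, 0)})],
    "old":   [((0, 3, 0), date(2017, 1, 1), {"leaf": (1, 0, 0)})],
    "libc":  [((0, 2, 0), date(2016, 12, 1), {}), ((0, 2, 1), date(2017, 1, 5), {})],
    "serde": [((1, 0, 0), date(2016, 12, 1), {}), ((1, 0, 1), date(2017, 3, 1), {})],
    "leaf":  [((1, 0, 0), date(2016, 6, 1), {})],
}

def caret_matches(req, ver):
    """Caret rule for full triples: the leftmost non-zero component stays fixed."""
    if ver < req:
        return False
    if req[0] > 0:
        return ver[0] == req[0]
    if req[1] > 0:
        return ver[:2] == req[:2]
    return ver == req

def resolve(name, req, at, tree):
    """Add the newest matching release available at `at`, then recurse into its deps."""
    candidates = [(v, deps) for v, released, deps in REGISTRY[name]
                  if released <= at and caret_matches(req, v)]
    version, deps = max(candidates, key=lambda c: c[0])
    tree.add((name, version))
    for dep_name, dep_req in deps.items():
        resolve(dep_name, dep_req, at, tree)

def dependency_tree(root, root_version, at):
    """Resolved (name, version) pairs for one fixed package release at time `at`."""
    deps = next(d for v, _, d in REGISTRY[root] if v == root_version)
    tree = set()
    for dep_name, dep_req in deps.items():
        resolve(dep_name, dep_req, at, tree)
    return frozenset(tree)

def changed_fraction(roots, start, window):
    """Share of root releases whose tree differs between `start` and `start + window`."""
    changed = sum(
        dependency_tree(n, v, start) != dependency_tree(n, v, start + window)
        for n, v in roots
    )
    return changed / len(roots)

ROOTS = [("app", (1, 0, 0)), ("tool", (0, 1, 0)), ("old", (0, 3, 0))]
START = date(2017, 1, 1)
# Day counts approximate the six windows (one day up to one year).
WINDOWS = [1, 7, 30, 91, 182, 365]

for days in WINDOWS:
    print(days, changed_fraction(ROOTS, START, timedelta(days=days)))
```

Comparing sets of (name, version) pairs matches the definition of a changed tree as one with at least one different version. A real analysis would delegate resolution to Cargo itself rather than reimplement it.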
Figure 11 shows the fraction of packages with a changed dependency tree (i.e., a tree with at least one different version) over time. We observe a logarithmic trendline for each year group: the fraction of packages with a changed dependency tree increases steeply between the time points before three months and then levels out. After one month, we already find that 40% of all packages have a changed dependency tree due to new releases of 148 packages in 2017, 190 packages in 2018, and 240 packages in 2019. In all year groups, we find that the dependence on libc triggers a new version resolution for most packages, followed by other popular packages such as quote, serde, and syn. A manual inspection of the release logs for libc and serde suggests a frequency of at least two releases per month.
Finally, we also observe that 26% of all packages in 2017 have an identical dependency tree after one year. Among those unchanged packages, nearly all of them (2017: 83%, 2018: 93%, 2019: 90%) are outdated packages. By outdated, we mean that a package has had no new release in more than one year. Although packages may be outdated, they could still use flexible version constraints. In roughly one-third (2017: 31%, 2018: 34%, 2019: 40%) of all dependency constraints, the dependencies are outdated packages (i.e., there are no recent releases). In the remaining cases (i.e., where more recent versions exist), the version constraints cover old releases (e.g., depending on serde 2.x when 4.x exists), and less than 1% are fixed versions. For example, xml-attributes-derive::0.1.0 depends on older versions of syn, quote, and proc-macro2, and trie-root::0.11.0 depends on an old version of hash-db.
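As a small illustration of the "outdated" criterion, the following Python sketch flags a package as outdated when its newest release is more than one year old at the observation date, and computes the share of dependency constraints that point to such packages. The package names, dates, and the helpers `is_outdated` and `outdated_share` are hypothetical, and treating one year as 365 days is an assumption.

```python
from datetime import date, timedelta

ONE_YEAR = timedelta(days=365)  # assumption: "one year" approximated as 365 days

def is_outdated(release_dates, at, max_age=ONE_YEAR):
    """True if the newest release on or before `at` is older than `max_age`."""
    known = [d for d in release_dates if d <= at]
    return bool(known) and (at - max(known)) > max_age

def outdated_share(constraints, releases, at):
    """Fraction of (dependent, dependency) constraints whose target is outdated."""
    hits = sum(is_outdated(releases[dep], at) for _, dep in constraints)
    return hits / len(constraints)

# Hypothetical release histories and dependency constraints.
RELEASES = {
    "fresh": [date(2018, 1, 10), date(2018, 11, 20)],
    "stale": [date(2016, 5, 1)],
}
CONSTRAINTS = [("a", "fresh"), ("b", "stale"), ("c", "fresh"), ("d", "stale")]
AT = date(2019, 1, 1)

print(outdated_share(CONSTRAINTS, RELEASES, AT))
```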
Given these observations, we select a time window of one month and thus perform dependency resolution once per month in each year.
Cite this article
Hejderup, J., Beller, M., Triantafyllou, K. et al. Präzi: from package-based to call-based dependency networks. Empir Software Eng 27, 102 (2022). https://doi.org/10.1007/s10664-021-10071-9
DOI: https://doi.org/10.1007/s10664-021-10071-9