Präzi: from package-based to call-based dependency networks

Empirical Software Engineering

Abstract

Modern programming languages such as Java, JavaScript, and Rust encourage software reuse by hosting diverse and fast-growing repositories of highly interdependent packages (i.e., reusable libraries) for their users. The standard way to study the interdependence between software packages is to infer a package dependency network by parsing manifest data. Such networks help answer questions such as “How many packages have dependencies on packages with known security issues?” or “What are the most used packages?”. However, an overlooked aspect in existing studies is that manifest-inferred relationships do not necessarily reflect the actual usage of these dependencies in source code. To better model dependencies between packages, we developed Präzi, an approach combining manifests and call graphs of packages. Präzi constructs a dependency network at the more fine-grained function level, instead of at the manifest level. This paper discusses a prototypical Präzi implementation for the popular system programming language Rust. We use Präzi to characterize Rust’s package repository, Crates.io, at the function level and perform a comparative study with metadata-based networks. Our results show that metadata-based networks generalize how packages use their dependencies. Using Präzi, we find that packages call only 40% of their resolved dependencies, and a manual analysis of 34 cases reveals that not all packages use a dependency in the same way. We argue that researchers and practitioners interested in understanding how developers or programs use dependencies should account for their context, not the sum of all resolved dependencies.

Notes

  1. https://web.archive.org/web/20201201224020/https://github.com/rust-lang-nursery/ecosystem-wg

  2. https://web.archive.org/web/20201201220347/https://dependabot.com/

  3. https://web.archive.org/web/20201201224403/https://github.com/RustSec/cargo-audit

  4. https://web.archive.org/web/20180416152826/https://blog.rust-lang.org/2015/05/15/Rust-1.0.html

  5. https://web.archive.org/web/20201201224023/http://www.modulecounts.com/

  6. http://web.archive.org/web/20180224105846/https://github.com/rust-lang/crates.io-index

  7. https://web.archive.org/web/20201201224058/https://github.com/rust-lang/docs.rs

  8. https://web.archive.org/web/20180517123938/https://llvm.org/docs/Passes.html

  9. http://web.archive.org/web/20201201224110/https://doc.rust-lang.org/book/ch03-03-how-functions-work.html

  10. http://web.archive.org/web/20201201220221/https://doc.rust-lang.org/stable/reference/macros.html

  11. http://web.archive.org/web/20201201220211/https://doc.rust-lang.org/stable/reference/types.html

  12. http://web.archive.org/web/20201201220224/https://doc.rust-lang.org/book/ch19-05-advanced-functions-and-closures.html

  13. http://web.archive.org/web/20201201220255/https://doc.rust-lang.org/rustc/command-line-arguments.html

  14. http://web.archive.org/web/20201201220252/http://llvm.org/doxygen/CallGraph_8h_source.html

  15. http://web.archive.org/web/20201201224720/https://github.com/rust-lang/rust/issues/59412

  16. https://github.com/ktrianta/rust-callgraphs

  17. https://web.archive.org/web/20210426093903/https://rust-lang.github.io/unsafe-code-guidelines/layout/function-pointers.html

  18. https://web.archive.org/web/20210426093903/http://npm.github.io/npm-like-im-5/npm3/dependency-resolution.html

  19. https://web.archive.org/web/20201112013908/https://alschwalm.com/blog/static/2017/03/07/exploring-dynamic-dispatch-in-rust/

  20. https://docs.rs/crate/epoxy/0.1.0/source/

  21. https://docs.rs/crate/sv-parser-syntaxtree/0.6.0/source/

  22. After normalizing the networks (i.e., taking an inner join of the packages common to all three networks)

  23. See footnote 22

  24. https://crates.io/crates/downwards

  25. http://web.archive.org/web/20201201224352/https://docs.rs/crate/mpris/2.0.0-rc2/source/Cargo.lock

  26. For presentation reasons, we showcase only three years

  27. https://docs.rs/crate/serde_derive/1.0.106/source/Cargo.lock

  28. http://web.archive.org/web/20201201224416/https://github.com/idnow/de.idnow.ios.sdk/issues/20

  29. http://web.archive.org/web/20180416152826/https://doc.rust-lang.org/book/first-edition/trait-objects.html

  30. https://crates.io/crates/libc/versions

  31. https://crates.io/crates/serde/versions

  32. https://docs.rs/crate/xml-attributes-derive/0.1.0/source/Cargo.toml

  33. https://docs.rs/crate/trie-root/0.11.0/source/Cargo.toml

References

  • Abdalkareem R, Nourry O, Wehaibi S, Mujahid S, Shihab E (2017) Why do developers use trivial packages? an empirical case study on npm. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, pp 385–395

  • Abdalkareem R, Oda V, Mujahid S, Shihab E (2019) On the impact of using trivial packages: an empirical case study on npm and pypi. Empir Softw Eng:1–37

  • Albert R, Barabási A L (2002) Statistical mechanics of complex networks. Rev Modern Phys 74(1):47

  • Ali K, Lhoták O (2012) Application-only call graph construction. In: European Conference on Object-Oriented Programming. Springer, pp 688–712

  • Alimadadi S, Mesbah A, Pattabiraman K (2015) Hybrid DOM-sensitive change impact analysis for JavaScript. In: 29th European Conference on Object-Oriented Programming (ECOOP 2015), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik

  • Aparicio J (2019) cargo-call-stack: Static, whole program stack analysis. https://github.com/japaric/cargo-call-stack

  • Baldwin A (2018) Details about the event-stream incident. https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident

  • Beller M, Bholanath R, McIntosh S, Zaidman A (2016) Analyzing the state of static analysis: A large-scale evaluation in open source software. In: Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering. IEEE, pp 470–481

  • Bogart C, Kästner C, Herbsleb J, Thung F (2016) How to break an API: Cost negotiation and community values in three software ecosystems. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, pp 109–120

  • Brian A, David T, Aaron T (2020) The rust libz blitz. https://blog.rust-lang.org/2017/05/05/libz-blitz.html

  • Chen L, Hassan F, Wang X, Zhang L (2020) Taming behavioral backward incompatibilities via cross-project testing and analysis. In: IEEE/ACM International Conference on Software Engineering

  • Chinthanet B, Ponta S E, Plate H, Sabetta A, Kula R G, Ishio T, Matsumoto K (2020) Code-based vulnerability detection in Node.js applications: How far are we? In: 2020 35th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 1199–1203

  • Cogo F R, Oliva GA, Hassan AE (2019) An empirical study of dependency downgrades in the npm ecosystem. IEEE Transactions on Software Engineering

  • Decan A, Mens T, Constantinou E (2018a) On the impact of security vulnerabilities in the npm package dependency network. In: International Conference on Mining Software Repositories

  • Decan A, Mens T, Grosjean P (2018b) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empirical Software Engineering

  • Decan A, Mens T, Grosjean P (2019) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir Softw Eng 24(1):381–416

  • Dietrich J, Pearce D, Stringer J, Tahir A, Blincoe K (2019) Dependency versioning in the wild. In: 2019 IEEE/ACM 16th international conference on mining software repositories (MSR). IEEE, pp 349–359

  • Duan R, Bijlani A, Xu M, Kim T, Lee W (2017) Identifying open-source license violation and 1-day security risk at large scale. In: Proceedings of the 2017 ACM SIGSAC Conference on computer and communications security, pp 2169–2185

  • Dunn J (2017) Pypi python repository hit by typosquatting sneak attack. https://nakedsecurity.sophos.com/2017/09/19/pypi-python-repository-hit-by-typosquatting-sneak-attack/

  • Emami M, Ghiya R, Hendren L J (1994) Context-sensitive interprocedural points-to analysis in the presence of function pointers. ACM SIGPLAN Not 29(6):242–256

  • Hejderup J (2015) In dependencies we trust: How vulnerable are dependencies in software modules? Master’s thesis, Delft University of Technology

  • Hejderup J, van Deursen A, Gousios G (2018) Software ecosystem call graph for dependency management. In: Proceedings of the 40th International Conference on Software Engineering, New Ideas and Emerging Results. ACM, pp 101–104

  • Hejderup J, Beller M, Triantafyllou K, Gousios G (2021) Präzi: From Package-based to Call-based Dependency Networks. https://doi.org/10.5281/zenodo.4478981

  • Hopkins WG (1997) A new view of statistics. Will G. Hopkins

  • Katz Y (2016) Cargo: predictable dependency management. https://blog.rust-lang.org/2016/05/05/cargo-pillars.html

  • Kikas R, Gousios G, Dumas M, Pfahl D (2017) Structure and evolution of package dependency networks. In: Proceedings of the 14th International Conference on Mining Software Repositories, IEEE Press, pp 102–112

  • Kula R G, De Roover C, German D M, Ishio T, Inoue K (2018a) A generalized model for visualizing library popularity, adoption, and diffusion within a software ecosystem. In: 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 288–299

  • Kula R G, Ouni A, German D M, Inoue K (2018b) An empirical study on the impact of refactoring activities on evolving client-used apis. Inf Softw Technol 93:186–199

  • Lehman M M (1980) Programs, life cycles, and laws of software evolution. Proc IEEE 68(9):1060–1076

  • Livshits B, Sridharan M, Smaragdakis Y, Lhoták O, Amaral J N, Chang B Y E, Guyer S Z, Khedker U P, Møller A, Vardoulakis D (2015) In defense of soundiness: a manifesto. Commun ACM 58(2):44–46

  • Martins P, Achar R, Lopes C V (2018) 50K-c: a dataset of compilable, and compiled, java projects. In: 2018 IEEE/ACM 15th international conference on mining software repositories (MSR). IEEE, pp 1–5

  • Matsakis N (2016) Introducing mir. https://blog.rust-lang.org/2016/04/19/MIR.html

  • Mezzetti G, Møller A, Torp MT (2018) Type regression testing to detect breaking changes in Node.js libraries. In: 32nd European Conference on Object-Oriented Programming (ECOOP 2018), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik

  • Mirhosseini S, Parnin C (2017) Can automated pull requests encourage software developers to upgrade out-of-date dependencies? In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, pp 84–94

  • Mones E, Vicsek L, Vicsek T (2012) Hierarchy measure for complex networks. PloS one 7(3):e33799

  • Mujahid S, Abdalkareem R, Shihab E, McIntosh S (2020) Using others’ tests to identify breaking updates. In: Proceedings of the 17th International Conference on Mining Software Repositories, pp 466–476

  • Nguyen H A, Nguyen T N, Dig D, Nguyen S, Tran H, Hilton M (2019) Graph-based mining of in-the-wild, fine-grained, semantic code change patterns. In: 2019 IEEE/ACM 41st international conference on software engineering (ICSE). IEEE, pp 819–830

  • Ponta S E, Plate H, Sabetta A (2018) Beyond Metadata: Code-centric and usage-based analysis of known vulnerabilities in open-source software. In: 2018 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 449–460

  • Preston-Werner T (2013) Semantic versioning. https://semver.org/

  • Raemaekers S, van Deursen A, Visser J (2017) Semantic versioning and impact of breaking changes in the maven repository. J Syst Softw 129:140–158

  • Robbes R, Lungu M, Röthlisberger D (2012) How do developers react to api deprecation? the case of a smalltalk ecosystem. In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, pp 1–11

  • Ryder B G (1979) Constructing the call graph of a program. IEEE Trans Softw Eng (3):216–226

  • Salis V, Sotiropoulos T, Louridas P, Spinellis D, Mitropoulos D (2021) PyCG: Practical call graph generation in Python. In: 2021 IEEE/ACM 43rd international conference on software engineering (ICSE). IEEE, pp 1646–1657

  • Sawant AA, Bacchelli A (2017) Fine-grape: fine-grained api usage extractor-an approach and dataset to investigate api usage. Empir Softw Eng 22(3):1348–1371

  • Sawant A A, Aniche M, van Deursen A, Bacchelli A (2018a) Understanding developers’ needs on deprecation as a language feature. In: 2018 IEEE/ACM 40th international conference on software engineering (ICSE). IEEE, pp 561–571

  • Sawant AA, Aniche M, van Deursen A, Bacchelli A (2018b) Understanding developers’ needs on deprecation as a language feature. In: Proceedings of the 40th International Conference on Software Engineering, ICSE ’18. ACM, New York, pp 561–571

  • Sawant AA, Robbes R, Bacchelli A (2018c) On the reaction to deprecation of clients of 4 + 1 popular java apis and the jdk. Empir Softw Eng 23(4):2158–2197

  • Schlueter I (2013) Unix philosophy and node.js. https://blog.izs.me/2013/04/unix-philosophy-and-nodejs/

  • Schlueter I (2017) The npm blog – kik, left-pad, and npm. http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm

  • Shapiro M, Horwitz S (1997) Fast and accurate flow-insensitive points-to analysis. In: Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp 1–14

  • Shivers O (1991) Control-flow analysis of higher-order languages. PhD thesis, Carnegie Mellon University

  • Steensgaard B (1996) Points-to analysis in almost linear time. In: Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp 32–41

  • Sulír M, Porubän J (2016) A quantitative study of Java software buildability. In: Proceedings of the 7th International Workshop on Evaluation and Usability of Programming Languages and Tools. ACM, pp 17–25

  • Sundaresan V, Hendren L, Razafimahefa C, Vallée-Rai R, Lam P, Gagnon E, Godin C (2000) Practical virtual method call resolution for Java, vol 35. ACM

  • Tip F, Palsberg J (2000) Scalable propagation-based call graph construction algorithms. In: Proceedings of the 15th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pp 281–293

  • Triantafyllou K (2019) A benchmark for rust call-graph generators. https://users.rust-lang.org/t/a-benchmark-for-rust-call-graph-generators/34494

  • Tufano M, Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2017) There and back again: Can you compile that snapshot? J Softw Evol Process 29(4)

  • Valiev M, Vasilescu B, Herbsleb J (2018) Ecosystem-level determinants of sustained activity in open-source projects: a case study of the pypi ecosystem. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, pp 644–655

  • Xavier L, Brito A, Hora A, Valente M T (2017) Historical and impact analysis of api breaking changes: A large-scale study. In: 2017 IEEE 24th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 138–147

  • Zapata R E, Kula R G, Chinthanet B, Ishio T, Matsumoto K, Ihara A (2018) Towards smoother library migrations: A look at vulnerable dependency migrations at function level for npm javascript packages. In: 2018 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 559–563

  • Zerouali A, Constantinou E, Mens T, Robles G, González-Barahona J (2018) An empirical analysis of technical lag in npm package dependencies. In: International Conference on Software Reuse. Springer, pp 95–110

  • Zhang T, Hartmann B, Kim M, Glassman EL (2020a) Enabling data-driven api design with community usage data: A need-finding study. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp 1–13

  • Zhang T, Hartmann B, Kim M, Glassman EL (2020b) Enabling data-driven api design with community usage data: A need-finding study. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp 1–13

  • Zhong H, Thummalapenta S, Xie T, Zhang L, Wang Q (2010) Mining api mapping for language migration. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, vol 1, pp 195–204

  • Zimmermann M, Staicu C A, Tenny C, Pradel M (2019) Small world with high risks: a study of security threats in the npm ecosystem. In: 28th USENIX Security Symposium (USENIX Security 19), pp 995–1010

Acknowledgements

The work in this paper was partially funded by NWO grant 628.008.001 (CodeFeedr) and H2020 grant 825328 (FASTEN). Georgios Gousios is the main recipient of both funding grants.

Author information

Corresponding author

Correspondence to Joseph Hejderup.

Additional information

Communicated by: Romain Robbes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Work largely conducted while the authors, Moritz Beller and Georgios Gousios, were researchers at TU Delft, The Netherlands.

Appendices

Appendix

A Selecting a time window for dependency resolution

Instead of pinning a single fixed version, version constraints let developers depend on a range of acceptable versions, so the version actually used is resolved at build time and can change at each new compilation. Nearly all dependencies in Crates.io specify such a dynamic version constraint: only 2.92% of all dependency specifications in Crates.io use a single (immutable) version (Dietrich et al. 2019). Before studying the evolution and structure of Crates.io, we first decide on the number of time points and the time window between consecutive time points. Although popular studies such as Kikas et al. (2017) and Decan et al. (2019) use a time window of one year to study structural changes, we instead determine the time window from the frequency of structural changes in Crates.io.
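The distinction between fixed and dynamic constraints can be illustrated with a minimal Python sketch. It assumes Cargo's requirement syntax, in which only a requirement of the form `=MAJOR.MINOR.PATCH` pins one version, while bare, caret, tilde, wildcard, and comparison requirements admit a range. The function names are hypothetical, and this is not necessarily how the cited study classified constraints.

```python
import re

# A requirement is treated as fixed only if it is "=" followed by a
# complete major.minor.patch version (optionally with pre-release/build tags).
_EXACT = re.compile(
    r"^=\s*\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def is_fixed_requirement(req: str) -> bool:
    """Return True if the requirement string pins a single version."""
    return bool(_EXACT.match(req.strip()))


def fixed_share(requirements) -> float:
    """Fraction of requirement strings that pin a single version."""
    reqs = list(requirements)
    if not reqs:
        return 0.0
    return sum(is_fixed_requirement(r) for r in reqs) / len(reqs)
```

Applied to every dependency specification in a registry snapshot, `fixed_share` would yield a figure comparable in spirit to the 2.92% quoted above, under the stated assumptions.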

After resolving the dependency tree of a set of packages in Crates.io at a time t, we re-resolve it at six later time points (one day, one week, one month, three months, six months, and one year after t) to find a time window in which a large fraction of the packages has a changed dependency tree. We do this for the set of packages that have at least one non-optional dependency at the beginning of 2017 (5,252 package releases), 2018 (9,716 package releases), and 2019 (16,098 package releases).
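The re-resolution procedure can be sketched in Python as follows. This is a deliberately simplified illustration, not the resolver used in the study: it assumes every constraint is a caret requirement on a full `(major, minor, patch)` tuple, ignores Cargo's version unification, features, and lock files, and approximates months as 30 days. The package names, release dates, and manifests are invented for the example.

```python
from datetime import date, timedelta

# Six re-resolution offsets; months are approximated as 30 days (assumption).
OFFSETS = {
    "one day": timedelta(days=1),
    "one week": timedelta(weeks=1),
    "one month": timedelta(days=30),
    "three months": timedelta(days=90),
    "six months": timedelta(days=180),
    "one year": timedelta(days=365),
}


def caret_matches(req, version):
    """Simplified caret rule: version >= req and the leftmost non-zero part agrees."""
    if version < req:
        return False
    if req[0] > 0:
        return version[0] == req[0]
    if req[1] > 0:
        return version[:2] == req[:2]
    return version == req


def resolve(releases, name, req, at):
    """Pick the newest compatible version of `name` published on or before `at`."""
    candidates = [
        v for published, v in releases.get(name, [])
        if published <= at and caret_matches(req, v)
    ]
    return max(candidates) if candidates else None


def resolve_tree(releases, manifests, name, version, at, seen=None):
    """Collect the (name, version) pairs reachable from one package release at time `at`."""
    seen = set() if seen is None else seen
    for dep, req in manifests.get((name, version), {}).items():
        picked = resolve(releases, dep, req, at)
        if picked is None or (dep, picked) in seen:
            continue
        seen.add((dep, picked))
        resolve_tree(releases, manifests, dep, picked, at, seen)
    return seen


# Hypothetical registry data, invented for illustration only.
RELEASES = {
    "alpha": [(date(2017, 1, 1), (0, 2, 20)), (date(2017, 1, 20), (0, 2, 21))],
    "beta": [(date(2016, 12, 1), (1, 0, 0))],
    "gamma": [(date(2016, 6, 1), (1, 0, 0)), (date(2016, 9, 1), (2, 0, 0))],
}
MANIFESTS = {
    ("app", (0, 1, 0)): {"alpha": (0, 2, 0), "beta": (1, 0, 0)},
    ("alpha", (0, 2, 21)): {"gamma": (1, 0, 0)},
}

t = date(2017, 1, 2)
baseline = resolve_tree(RELEASES, MANIFESTS, "app", (0, 1, 0), t)
changed_after = {
    label: resolve_tree(RELEASES, MANIFESTS, "app", (0, 1, 0), t + delta) != baseline
    for label, delta in OFFSETS.items()
}
```

In this toy data the tree of `app` is unchanged after one day and one week, and changes from the one-month offset onward because a new release of `alpha` falls inside its caret range and brings in `gamma`.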

Figure 11 shows the fraction of packages with a changed dependency tree (i.e., a tree with at least one different version) over time. We observe a logarithmic trendline for each year group: the fraction of packages with a changed dependency tree rises steeply between time points up to three months and then levels out. After one month, we already find that 40% of all packages have a changed dependency tree due to new releases of 148 packages in 2017, 190 packages in 2018, and 240 packages in 2019. In all year groups, we find that the dependence on libc triggers a new version resolution for most packages, followed by other popular packages such as quote, serde, and syn. A manual inspection of the release logs for libc (footnote 30) and serde (footnote 31) suggests a frequency of at least two releases per month.

Fig. 11 Retroactive resolution of dependencies over a time period of one year in 2017, 2018, and 2019
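The quantity plotted in Figure 11 can be computed with a short Python sketch, assuming the dependency trees have already been resolved and are available as sets of (name, version) pairs per package. The function names and the five-package sample are hypothetical and serve only to show the calculation.

```python
def changed_fraction(baseline, later):
    """Fraction of packages whose resolved tree differs from the baseline.

    Both arguments map a package to its tree, a frozenset of (name, version)
    pairs; a tree counts as changed if at least one pair differs.
    """
    if not baseline:
        return 0.0
    changed = sum(1 for pkg, tree in baseline.items() if later[pkg] != tree)
    return changed / len(baseline)


def fractions_by_window(baseline, trees_by_window):
    """Apply changed_fraction to every re-resolution window."""
    return {
        window: changed_fraction(baseline, trees)
        for window, trees in trees_by_window.items()
    }


# Invented sample: five packages that all depend on one library "lib".
old = frozenset({("lib", "0.2.20")})
new = frozenset({("lib", "0.2.21")})
BASELINE = {f"p{i}": old for i in range(1, 6)}
TREES_BY_WINDOW = {
    "one week": {**BASELINE, "p1": new},
    "one month": {**BASELINE, "p1": new, "p2": new},
}
FRACTIONS = fractions_by_window(BASELINE, TREES_BY_WINDOW)
```

Plotting such fractions against the window length, once per year group, would give curves of the kind described above.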

Finally, we also observe that 26% of all packages in 2017 have an identical dependency tree after one year. Among those unchanged packages, nearly all (2017: 83%, 2018: 93%, 2019: 90%) are outdated packages. By outdated, we mean that a package has had no new release for more than one year. Although packages may be outdated, they could still use flexible version constraints. In roughly one-third (2017: 31%, 2018: 34%, 2019: 40%) of all dependency constraints, the dependencies are outdated packages (i.e., there are no recent releases). In the remaining cases (i.e., where more recent versions exist), the version constraints cover old releases (e.g., depending on serde 2.x when 4.x exists), and less than 1% are fixed versions. For example, xml-attributes-derive::0.1.0 (footnote 32) depends on older versions of syn, quote, and proc-macro2, and trie-root::0.11.0 (footnote 33) depends on an old version of hash-db.
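The notion of an outdated package used here can be written down as a small Python check. The sketch assumes "more than one year" means more than 365 days since the latest release at the observation date; the function names and the sample release history are hypothetical.

```python
from datetime import date, timedelta


def is_outdated(release_dates, at, max_age=timedelta(days=365)):
    """True if the newest release on or before `at` is older than `max_age`."""
    past = [d for d in release_dates if d <= at]
    if not past:
        return False  # nothing released yet, so there is nothing to be outdated
    return at - max(past) > max_age


def outdated_share(dependency_targets, history, at):
    """Fraction of dependency constraints whose target package is outdated.

    `dependency_targets` lists the target package of each constraint and
    `history` maps a package name to its release dates.
    """
    targets = list(dependency_targets)
    if not targets:
        return 0.0
    hits = sum(is_outdated(history.get(name, []), at) for name in targets)
    return hits / len(targets)


# Invented release history for illustration.
HISTORY = {
    "old": [date(2017, 1, 1)],
    "fresh": [date(2017, 1, 1), date(2018, 3, 1)],
}
```

With an observation date of 1 June 2018, `old` counts as outdated and `fresh` does not.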

Given these observations, we select a time window of one month and thus perform dependency resolution once a month for each year studied.

About this article

Cite this article

Hejderup, J., Beller, M., Triantafyllou, K. et al. Präzi: from package-based to call-based dependency networks. Empir Software Eng 27, 102 (2022). https://doi.org/10.1007/s10664-021-10071-9

  • DOI: https://doi.org/10.1007/s10664-021-10071-9
