Abstract
Context
A build is reproducible if, given the same source code, build instructions, and build environment (i.e., installed build dependencies), compiling a software project repeatedly generates the same build artifacts. Reproducible builds are essential for identifying the tampering attempts behind supply chain attacks, yet most research on reproducible builds treats build reproducibility as a project-specific issue. In contrast, modern software projects are part of a larger ecosystem and depend on dozens of other projects, which raises the question of to what extent the build reproducibility of a project is the responsibility of that project, or rather something imposed on it.
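To make the definition concrete, the following minimal sketch compares the artifacts of two builds of the same package by content hash. It is an illustration only, not the tooling used by the distributions studied here; the function names `artifact_digests` and `is_reproducible` are hypothetical, and the sketch assumes that each build's artifacts have already been collected into a directory.

```python
import hashlib
from pathlib import Path


def artifact_digests(build_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to build_dir) to its SHA-256 digest."""
    digests = {}
    for path in sorted(build_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(build_dir).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def is_reproducible(first_build: Path, second_build: Path) -> bool:
    """True when both builds produced the same files with identical contents."""
    return artifact_digests(first_build) == artifact_digests(second_build)
```

Under this check, a single differing byte (for example an embedded timestamp) is enough to mark the pair of builds as unreproducible.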
Objective
This empirical study analyzes reproducible and unreproducible builds in Linux distributions to systematically investigate the process of making builds reproducible in open-source distributions. Our study targets builds performed on 11,528 Arch Linux packages and 597,066 Debian packages.
Method
We compute the likelihood of unreproducible packages becoming reproducible (and vice versa) and identify the root causes behind unreproducible builds. Finally, we compute the correlation between the reproducibility status of packages and three ecosystem factors (i.e., factors outside the control of a given package).
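The likelihood of a status change over time can be estimated with survival analysis. The abstract does not name the estimator used, so the sketch below assumes a Kaplan-Meier estimator, one common choice for time-to-event data with censoring. The function name and the input layout are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t where at least one event occurred.

    durations: days each package stayed unreproducible (or until the study ended)
    observed:  True if the package became reproducible, False if censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # packages leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Probability of still being unreproducible just after time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

Here `S(t)` is the estimated probability that a package is still unreproducible after `t` days; the median time to fix is the smallest `t` at which `S(t)` drops to 0.5 or below.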
Results
Arch Linux packages become reproducible a median of 30 days sooner than Debian packages, while Debian packages, once fixed, remain reproducible for a median of 68 days longer. We identified a taxonomy of 16 root causes of unreproducible builds and found that the build reproducibility status of a package differs statistically significantly across hardware architectures (strong effect size). The status also differs between versions of a package in different distributions and depends on the build reproducibility of a package's build dependencies, albeit with weaker effect sizes.
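As an illustration of how an effect size for such an association can be obtained, the sketch below computes Cramér's V from a contingency table of reproducibility status against a categorical factor such as architecture. The choice of Cramér's V is an assumption made here, since the abstract does not state which measure was used, and the counts in the usage note are made up.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Assumes every row and every column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))
```

For example, `cramers_v([[90, 10], [20, 80]])` takes rows as two architectures and columns as reproducible and unreproducible counts. Values near 0 indicate no association and values near 1 a strong one.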
Conclusions
The ecosystem a project belongs to plays an important role in the project's build reproducibility. Since ecosystem factors are outside a developer's control, future work on (fixing) unreproducible builds should take these influences into account.
Data Availability
The datasets generated and analyzed during the study are available from the corresponding author in a GitHub repository: https://github.com/SAILResearch/replication-21-rahul_bajaj-reproducible_builds-code
Notes
A source package such as glibc can, when built, produce multiple binary packages, such as libc6 and libc6-dev.
Ethics declarations
Conflict of Interest
All authors declare that there is no conflict of interest.
Additional information
Communicated by: Philipp Leitner.
About this article
Cite this article
Bajaj, R., Fernandes, E., Adams, B. et al. Unreproducible builds: time to fix, causes, and correlation with external ecosystem factors. Empir Software Eng 29, 11 (2024). https://doi.org/10.1007/s10664-023-10399-4
DOI: https://doi.org/10.1007/s10664-023-10399-4