Empirical Software Engineering, Volume 20, Issue 6, pp 1587–1633

A Large-Scale Empirical Study of the Relationship between Build Technology and Build Maintenance

  • Shane McIntosh
  • Meiyappan Nagappan
  • Bram Adams
  • Audris Mockus
  • Ahmed E. Hassan


Build systems specify how source code is translated into deliverables. They require continual maintenance as the system they build evolves. This build maintenance can become so burdensome that projects switch build technologies, potentially having to rewrite thousands of lines of build code. We aim to understand the prevalence of different build technologies and the relationship between build technology and build maintenance by analyzing version histories in a corpus of 177,039 repositories spread across four software forges, three software ecosystems, and four large-scale projects. We study low-level, abstraction-based, and framework-driven build technologies, as well as tools that automatically manage external dependencies. We find that modern, framework-driven build technologies need to be maintained more often and these build changes are more tightly coupled with the source code than low-level or abstraction-based ones. However, build technology migrations tend to coincide with a shift of build maintenance work to a build-focused team, deferring the cost of build maintenance to them.


Keywords: Build systems, Software maintenance, Large-scale analysis, Open source



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Shane McIntosh (1, corresponding author)
  • Meiyappan Nagappan (1)
  • Bram Adams (2)
  • Audris Mockus (3, 4)
  • Ahmed E. Hassan (1)

  1. Software Analysis and Intelligence Lab (SAIL), Queen’s University, Kingston, Canada
  2. Lab on Maintenance, Construction, and Intelligence of Software (MCIS), Polytechnique Montréal, Montréal, Canada
  3. Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA
  4. Avaya Labs Research, Basking Ridge, USA
