Every build you break: developer-oriented assistance for build failure resolution

  • Carmine Vassallo
  • Sebastian Proksch
  • Timothy Zemp
  • Harald C. Gall


Continuous integration is an agile software development practice. Instead of integrating features right before a release, developers constantly integrate them through an automated build process. This shortens the release cycle, improves software quality, and reduces time to market. However, the whole process comes to a halt when a commit breaks the build, which can happen for several reasons, e.g., compilation errors or test failures, and fixing the build then becomes a top priority. Developers not only have to find the cause of the build break and fix it, but they also have to do so quickly to avoid delaying others. Unfortunately, these steps require deep knowledge and are often time-consuming. To support developers in fixing a build break, we propose Bart, a tool that summarizes the reasons for Maven build failures and suggests possible solutions found on the internet. In a case study with 17 participants, we show that developers find Bart useful for understanding build breaks and that using Bart substantially reduces the time to fix a build break, on average by 37%. We have also conducted a qualitative study to better understand the workflows and information needs when fixing builds. We found that typical workflows differ substantially between error categories, and that several uncommon build errors are very hard both to investigate and to fix. These findings will be useful to inform future research in this area.


Software engineering; Agile software development; Software development tools; Build break; Summarization; Error recovery



We would like to thank all the study participants. C. Vassallo and H. Gall acknowledge the support of the Swiss National Science Foundation for their project SURF-MobileAppsData (SNF Project No. 200021-166275).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Zurich, Zurich, Switzerland