
Continuous build outcome prediction: an experimental evaluation and acceptance modelling


Abstract

Continuous Build Outcome Prediction (CBOP) is a lightweight implementation of Continuous Defect Prediction (CDP). CBOP combines (1) the results of continuous integration (CI) and (2) data mined from the version control system with (3) machine learning (ML) to form a practice that evolved from software defect prediction (SDP), in which a failing build is treated as a defect to fight against. Here, we explain the CBOP idea: historical build results, together with metrics derived from a software repository, are used to create a model that classifies, in a just-in-time manner, the changes a developer introduces to the source code as she works. To evaluate the CBOP idea, we perform a small-n repeated-measures experiment with two conditions, including a replication, in a real-life, business-driven software project. In this preliminary evaluation of CBOP, we study whether the practice reduces the Failed Build Ratio (FBR), that is, the ratio of failing build results to all other build results. We calculate the effect size and p-value of the change in FBR while the CBOP practice is in use, provide an analysis of our model, and report the results of a Technology Acceptance Model (TAM)-inspired survey conducted among the experiment participants and industry specialists to assess the acceptance of CBOP and its supporting tool.
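As a minimal illustration of the idea (not the authors' implementation), the R sketch below trains a build-outcome classifier on hypothetical per-change metrics and computes the FBR as defined above. Every column name, feature, and value is an illustrative assumption, and randomForest stands in for whichever learner a team prefers.

    # Hedged sketch: a just-in-time build-outcome classifier in the spirit of
    # CBOP. All feature names and values are illustrative assumptions.
    library(randomForest)

    # Hypothetical per-change data combining CI build results with metrics
    # mined from the version control system.
    changes <- data.frame(
      files_changed = c(1, 7, 2, 12, 3, 9, 4, 15),
      lines_added   = c(10, 340, 25, 800, 40, 410, 60, 950),
      author_tenure = c(24, 2, 36, 1, 18, 3, 30, 2),   # months; assumed metric
      build_result  = factor(c("SUCCESS", "FAILURE", "SUCCESS", "FAILURE",
                               "SUCCESS", "FAILURE", "SUCCESS", "FAILURE"))
    )

    # Just-in-time prediction: classify a change before its build runs.
    model <- randomForest(build_result ~ ., data = changes, ntree = 100)
    predict(model, data.frame(files_changed = 5, lines_added = 120,
                              author_tenure = 12))

    # Failed Build Ratio, following the abstract's definition: failing build
    # results relative to all other build results.
    fbr <- function(results) sum(results == "FAILURE") / sum(results != "FAILURE")
    fbr(changes$build_result)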



Notes

  1. Jenkins CI Build results: http://javadoc.jenkins-ci.org/hudson/model/Result.html.

  2. The pwr package for statistical power analysis in R: https://www.rdocumentation.org/packages/pwr/versions/1.3-0 (a minimal usage sketch follows these notes).

  3. https://bitbucket.org/

  4. https://www.microsoft.com/en-us/cloud-platform/r-server

  5. https://azure.microsoft.com/en-us/services/machine-learning-studio/

  6. https://mlr3.mlr-org.com/

  7. https://github.com/stilab-ets/DL-CIBuild/
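
The pwr package referenced in note 2 supports prospective power analysis of the kind used when planning an experiment. The following is a minimal sketch assuming a paired design; the effect size, significance level, and target power are illustrative conventions, not the values used in the study.

    # Hedged sketch: solve for the sample size needed to detect a large
    # effect in a paired design. d, sig.level, and power are illustrative
    # assumptions, not the article's parameters.
    library(pwr)

    pwr.t.test(d = 0.8,           # assumed (large) effect size
               sig.level = 0.05,  # conventional alpha
               power = 0.8,       # conventional target power
               type = "paired",
               alternative = "two.sided")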


Acknowledgements

Calculations in R have been carried out using resources provided by Wroclaw Centre for Networking and Supercomputing (http://wcss.pl), grant No. 578.

Author information


Contributions

M. Kawalerowicz: Conceptualization, Methodology, Software, Validation, Investigation, Data Curation, Writing - original draft, Writing - review & editing, Visualization. L. Madeyski: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft, Writing - review & editing, Supervision.

Corresponding author

Correspondence to Marcin Kawalerowicz.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Emerging Topics in Artificial Intelligence, selected from IEA/AIE 2021. Guest Editors: Ali Selamat and Jerry Chun-Wei Lin.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kawalerowicz, M., Madeyski, L. Continuous build outcome prediction: an experimental evaluation and acceptance modelling. Appl Intell 53, 8673–8692 (2023). https://doi.org/10.1007/s10489-023-04523-6

