
Continuous build outcome prediction: an experimental evaluation and acceptance modelling


Abstract

Continuous Build Outcome Prediction (CBOP) is a lightweight implementation of Continuous Defect Prediction (CDP). CBOP combines (1) the results of continuous integration (CI) and (2) data mined from the version control system with (3) machine learning (ML) to form a practice that evolved from software defect prediction (SDP), in which a failing build is treated as a defect to fight against. Here, we explain the CBOP idea: historical build results, together with metrics derived from a software repository, are used to create a model that classifies, in a just-in-time manner, the changes a developer introduces to the source code as she works. To evaluate the CBOP idea, we perform a small-n repeated-measures experiment with two conditions, including a replication, in a real-life, business-driven software project. In this preliminary evaluation of CBOP, we study whether the practice reduces the Failed Build Ratio (FBR), that is, the ratio of failing build results to all other build results. We calculate the effect size and p-value of the change in FBR while the CBOP practice is in use, provide an analysis of our model, and report the results of a Technology Acceptance Model (TAM)-inspired survey conducted among the experiment participants and industry specialists to assess the acceptance of CBOP and its supporting tool.
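As a minimal illustration of the idea (not the authors' implementation), the R sketch below trains a build-outcome classifier on hypothetical per-change metrics and computes the FBR as defined above. Every column name, feature, and value is an illustrative assumption, and randomForest stands in for whichever learner a team prefers.

    # Hedged sketch: a just-in-time build-outcome classifier in the spirit of
    # CBOP. All feature names and values are illustrative assumptions.
    library(randomForest)

    # Hypothetical per-change data combining CI build results with metrics
    # mined from the version control system.
    changes <- data.frame(
      files_changed = c(1, 7, 2, 12, 3, 9, 4, 15),
      lines_added   = c(10, 340, 25, 800, 40, 410, 60, 950),
      author_tenure = c(24, 2, 36, 1, 18, 3, 30, 2),   # months; assumed metric
      build_result  = factor(c("SUCCESS", "FAILURE", "SUCCESS", "FAILURE",
                               "SUCCESS", "FAILURE", "SUCCESS", "FAILURE"))
    )

    # Just-in-time prediction: classify a change before its build runs.
    model <- randomForest(build_result ~ ., data = changes, ntree = 100)
    predict(model, data.frame(files_changed = 5, lines_added = 120,
                              author_tenure = 12))

    # Failed Build Ratio, following the abstract's definition: failing build
    # results relative to all other build results.
    fbr <- function(results) sum(results == "FAILURE") / sum(results != "FAILURE")
    fbr(changes$build_result)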



Notes

  1. Jenkins CI Build results: http://javadoc.jenkins-ci.org/hudson/model/Result.html.

  2. The pwr package for statistical power analysis in R: https://www.rdocumentation.org/packages/pwr/versions/1.3-0 (a minimal usage sketch follows these notes).

  3. https://bitbucket.org/

  4. https://www.microsoft.com/en-us/cloud-platform/r-server

  5. https://azure.microsoft.com/en-us/services/machine-learning-studio/

  6. https://mlr3.mlr-org.com/

  7. https://github.com/stilab-ets/DL-CIBuild/
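
The pwr package referenced in note 2 supports prospective power analysis of the kind used when planning an experiment. The following is a minimal sketch assuming a paired design; the effect size, significance level, and target power are illustrative conventions, not the values used in the study.

    # Hedged sketch: solve for the sample size needed to detect a large
    # effect in a paired design. d, sig.level, and power are illustrative
    # assumptions, not the article's parameters.
    library(pwr)

    pwr.t.test(d = 0.8,           # assumed (large) effect size
               sig.level = 0.05,  # conventional alpha
               power = 0.8,       # conventional target power
               type = "paired",
               alternative = "two.sided")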


Acknowledgements

Calculations in R have been carried out using resources provided by Wroclaw Centre for Networking and Supercomputing (http://wcss.pl), grant No. 578.

Author information


Contributions

M. Kawalerowicz: Conceptualization, Methodology, Software, Validation, Investigation, Data Curation, Writing - original draft, Writing - review & editing, Visualization. L. Madeyski: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft, Writing - review & editing, Supervision.

Corresponding author

Correspondence to Marcin Kawalerowicz.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Emerging Topics in Artificial Intelligence, selected from IEA/AIE 2021. Guest Editors: Ali Selamat and Jerry Chun-Wei Lin.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kawalerowicz, M., Madeyski, L. Continuous build outcome prediction: an experimental evaluation and acceptance modelling. Appl Intell 53, 8673–8692 (2023). https://doi.org/10.1007/s10489-023-04523-6

