Abstract
Continuous Build Outcome Prediction (CBOP) is a lightweight implementation of Continuous Defect Prediction (CDP). CBOP combines 1) the results of continuous integration (CI) and 2) data mined from the version control system with 3) machine learning (ML), forming a practice that evolved from software defect prediction (SDP) in which a failing build is treated as a defect to fight against. Here, we explain the CBOP idea: we use historical build results together with metrics derived from a software repository to create a model that classifies, in a just-in-time manner, the changes a developer introduces to the source code during her work. To evaluate the CBOP idea, we perform a small-n repeated-measures experiment with two conditions and replicate it in a real-life, business-driven software project. In this preliminary evaluation of CBOP, we study whether the practice reduces the Failed Build Ratio (FBR), the ratio of failing build results to all other build results. We calculate the effect size and p-value of the change in FBR while the CBOP practice is in use, provide an analysis of our model, and report the results of a Technology Acceptance Model (TAM)-inspired survey that we conducted among experiment participants and industry specialists to assess the acceptance of CBOP and its supporting tool.
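The abstract does not spell out how FBR is computed or how a p-value for its change could be obtained. The Python sketch below illustrates one possible approach and rests on stated assumptions. FBR is taken as the number of builds whose Jenkins result is FAILURE divided by the number of all recorded builds. This is one reading of the definition above; it does not count UNSTABLE or ABORTED as failing. The p-value comes from a one-sided randomization test for an AB (baseline, then CBOP) design in which the CBOP start point is assumed to have been drawn at random from a set of admissible occasions. Neither choice is confirmed here as the authors' exact analysis. The function names `failed_build_ratio` and `ab_randomization_test` and the sample data are hypothetical.

```python
from collections import Counter

# Result names defined by Jenkins' hudson.model.Result.
JENKINS_RESULTS = {"SUCCESS", "UNSTABLE", "FAILURE", "NOT_BUILT", "ABORTED"}


def failed_build_ratio(results):
    """Return the share of builds whose Jenkins result is FAILURE.

    Assumption: the denominator is every recorded build, and only FAILURE
    counts as failing (UNSTABLE, ABORTED and NOT_BUILT do not).
    """
    counts = Counter(results)
    unknown = set(counts) - JENKINS_RESULTS
    if unknown:
        raise ValueError(f"unexpected build results: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no builds recorded")
    return counts["FAILURE"] / total


def ab_randomization_test(series, actual_start, admissible_starts):
    """One-sided randomization test for an AB (baseline, then CBOP) design.

    series            -- one FBR value per measurement occasion, in time order
    actual_start      -- index of the first occasion with CBOP in use (phase B)
    admissible_starts -- every start index that could have been drawn at random

    The test statistic is mean(A) - mean(B), so larger values mean a larger
    drop in FBR after CBOP was introduced.
    """
    starts = list(admissible_starts)
    if actual_start not in starts:
        raise ValueError("actual_start must be one of the admissible starts")
    if any(s < 1 or s >= len(series) for s in starts):
        raise ValueError("every start must leave both phases non-empty")

    def statistic(start):
        a, b = series[:start], series[start:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = statistic(actual_start)
    # p-value: share of admissible assignments at least as extreme as observed.
    p_value = sum(statistic(s) >= observed for s in starts) / len(starts)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical per-day FBR values; CBOP switched on at day index 4.
    daily_fbr = [0.30, 0.25, 0.35, 0.28, 0.10, 0.12, 0.08, 0.11]
    observed, p = ab_randomization_test(daily_fbr, 4, range(2, 7))
    print(f"FBR reduction = {observed:.4f}, p = {p}")  # FBR reduction = 0.1925, p = 0.2
```

In this toy run the actual start point yields the largest drop among the five admissible start points. The p-value is therefore 1/5. With only five admissible start points, 0.2 is also the smallest attainable p-value. A real small-n design would need many more admissible points, or replications, to reach conventional significance levels.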
Notes
Jenkins CI Build results: http://javadoc.jenkins-ci.org/hudson/model/Result.html.
Acknowledgements
Calculations in R have been carried out using resources provided by Wroclaw Centre for Networking and Supercomputing (http://wcss.pl), grant No. 578.
Author information
Contributions
M. Kawalerowicz: Conceptualization, Methodology, Software, Validation, Investigation, Data Curation, Writing – original draft, Writing – review & editing, Visualization. L. Madeyski: Conceptualization, Methodology, Software, Validation, Investigation, Writing – original draft, Writing – review & editing, Supervision.
Additional information
This article belongs to the Topical Collection: Emerging Topics in Artificial Intelligence Selected from IEA/AIE2021 Guest Editors: Ali Selamat and Jerry Chun-Wei Lin
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kawalerowicz, M., Madeyski, L. Continuous build outcome prediction: an experimental evaluation and acceptance modelling. Appl Intell 53, 8673–8692 (2023). https://doi.org/10.1007/s10489-023-04523-6
DOI: https://doi.org/10.1007/s10489-023-04523-6