Abstract
Modern Code Review (MCR) is an essential practice in software engineering. MCR helps with the early detection of defects and prevents poor implementation practices, while providing other benefits such as knowledge sharing, team awareness, and collaboration. However, reviewing code changes is a difficult and time-consuming task, which requires developers to prioritize code review requests to optimize the time and effort they spend on code review. Previous approaches attempted to prioritize code reviews based on their likelihood of being merged by leveraging machine learning (ML) models to maximize prediction performance. However, these approaches did not consider the review effort dimension, which results in sub-optimal code review prioritization. It is thus important to account for review effort when prioritizing code review requests, so that developers can optimize their review effort while maximizing the number of merged code changes. To address this issue, we propose \({CostAwareCR}\), a multi-objective optimization-based approach to predict and prioritize code review requests based on their likelihood of being merged and their review effort, measured in terms of the size of the reviewed code. \({CostAwareCR}\) uses the RuleFit algorithm to learn relevant features. Then, our approach learns Logistic Regression (LR) model weights using the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to simultaneously maximize (1) the prediction performance and (2) the cost-effectiveness. To evaluate the performance of \({CostAwareCR}\), we performed a large empirical study on 146,612 code reviews across three large organizations, namely LibreOffice, Eclipse, and GerritHub. The obtained results indicate that \({CostAwareCR}\) achieves promising Area Under the Curve (AUC) scores ranging from 0.75 to 0.77. Additionally, \({CostAwareCR}\) outperforms various baseline approaches in terms of effort-aware performance metrics, being able to prioritize the review of 87% of code changes using only 20% of the effort. Furthermore, our approach achieves 0.92 in terms of the normalized area under the lift chart (\(P_{opt}\)), indicating that it provides near-optimal code review prioritization based on the review effort. Our results indicate that our multi-objective formulation is effective for learning models that achieve good cost-effectiveness while maintaining promising prediction performance.
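As a rough illustration of the multi-objective formulation described above (a minimal sketch, not the authors' implementation), the following Python example uses pymoo's NSGA-II to search Logistic Regression weights against two objectives: a negated AUC and a negated, simplified cost-effectiveness measure (recall of merged changes within 20% of the total review effort). The synthetic data, weight bounds, and the exact cost-effectiveness definition are illustrative assumptions.

```python
# Minimal sketch: NSGA-II search over Logistic Regression weights with two
# objectives (prediction performance vs. cost-effectiveness). Assumes pymoo,
# scikit-learn, and numpy are installed; data is synthetic for illustration.
import numpy as np
from sklearn.metrics import roc_auc_score
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize


def recall_at_20_effort(scores, y, effort):
    """Fraction of merged changes covered when reviewing changes by descending
    score until 20% of the total effort (e.g., changed LOC) is consumed."""
    order = np.argsort(-scores)
    cum_effort = np.cumsum(effort[order]) / effort.sum()
    within_budget = cum_effort <= 0.20
    return y[order][within_budget].sum() / max(y.sum(), 1)


class CostAwareProblem(ElementwiseProblem):
    """Decision variables are the LR weights plus a bias term."""

    def __init__(self, X, y, effort):
        self.X, self.y, self.effort = X, y, effort
        super().__init__(n_var=X.shape[1] + 1, n_obj=2, xl=-5.0, xu=5.0)

    def _evaluate(self, w, out, *args, **kwargs):
        # Logistic Regression scoring with the candidate weights and bias.
        z = np.clip(self.X @ w[:-1] + w[-1], -500, 500)
        scores = 1.0 / (1.0 + np.exp(-z))
        auc = roc_auc_score(self.y, scores)
        cost_eff = recall_at_20_effort(scores, self.y, self.effort)
        # NSGA-II minimizes, so both objectives are negated.
        out["F"] = [-auc, -cost_eff]


# Synthetic code-review data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                                   # features
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # merged or not
effort = rng.integers(1, 500, size=500).astype(float)            # review size

res = minimize(CostAwareProblem(X, y, effort), NSGA2(pop_size=50),
               ("n_gen", 100), seed=1, verbose=False)
print("Pareto front (negated AUC, negated recall@20% effort):")
print(res.F[:5])
```

Each solution on the returned Pareto front is one set of LR weights; a practitioner would then pick a trade-off point (e.g., a knee solution) to score and rank incoming review requests.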
Acknowledgements
This research has been funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) RGPIN-2018-05960.
Additional information
Communicated by: Yasutaka Kamei.
About this article
Cite this article
Chouchen, M., Ouni, A. A multi-objective effort-aware approach for early code review prediction and prioritization. Empir Software Eng 29, 29 (2024). https://doi.org/10.1007/s10664-023-10431-7