
A multi-objective effort-aware approach for early code review prediction and prioritization

Empirical Software Engineering

Abstract

Modern Code Review (MCR) is an essential practice in software engineering. MCR helps with the early detection of defects, prevents poor implementation practices, and brings other benefits such as knowledge sharing, team awareness, and collaboration. However, reviewing code changes is a hard and time-consuming task, requiring developers to prioritize code review tasks to optimize the time and effort they spend on code review. Previous approaches attempted to prioritize code reviews based on their likelihood of being merged, leveraging machine learning (ML) models to maximize prediction performance. However, these approaches do not consider the review effort dimension, which leads to sub-optimal solutions for code review prioritization. It is thus important to consider review effort in code review request prioritization to help developers optimize their review effort while maximizing the number of merged code changes. To address this issue, we propose \({CostAwareCR}\), a multi-objective optimization-based approach that predicts and prioritizes code review requests based on their likelihood of being merged and their review effort, measured in terms of the size of the reviewed code. \({CostAwareCR}\) uses the RuleFit algorithm to learn relevant features. Then, our approach learns Logistic Regression (LR) model weights using the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to simultaneously maximize (1) the prediction performance and (2) the cost-effectiveness. To evaluate the performance of \({CostAwareCR}\), we performed a large empirical study on 146,612 code reviews across three large organizations, namely LibreOffice, Eclipse, and GerritHub. The obtained results indicate that \({CostAwareCR}\) achieves promising Area Under the Curve (AUC) scores ranging from 0.75 to 0.77. Additionally, \({CostAwareCR}\) outperforms various baseline approaches in terms of effort-aware performance metrics, being able to prioritize the review of 87% of code changes using only 20% of the effort. Furthermore, our approach achieved 0.92 in terms of the normalized area under the lift chart (\(P_{opt}\)), indicating that it provides near-optimal code review prioritization based on review effort. Our results indicate that our multi-objective formulation is effective for learning models that offer a trade-off between good cost-effectiveness and promising prediction performance.
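To make the bi-objective formulation concrete, the following is a minimal, illustrative sketch, not the authors' released implementation: NSGA-II (here via the pymoo library) searches for Logistic Regression weights that jointly maximize AUC and a simplified \(P_{opt}\) computed against review effort (e.g., churn size). The synthetic data, the `popt` helper, and all parameter settings are illustrative assumptions.

```python
# Hedged sketch of bi-objective LR weight learning with NSGA-II.
# All data and settings below are illustrative assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score
from pymoo.core.problem import Problem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def popt(y_true, y_score, effort):
    """Simplified normalized area under the effort-based lift chart.
    Changes are 'inspected' in descending score/effort order; 1.0 means
    the model ranking matches the optimal (oracle) ordering."""
    def area(order):
        e = np.cumsum(effort[order]) / effort.sum()          # x: cumulative effort
        r = np.cumsum(y_true[order]) / max(y_true.sum(), 1)  # y: recall
        return np.trapz(r, e)
    pred = np.argsort(-(y_score / effort))   # model ranking
    opt = np.argsort(-(y_true / effort))     # optimal ranking
    wst = np.argsort(y_true / effort)        # worst ranking
    denom = area(opt) - area(wst)
    return (area(pred) - area(wst)) / denom if denom else 0.0

class BiObjectiveLR(Problem):
    """Decision variables = LR weights plus bias; objectives are negated
    because pymoo minimizes by default."""
    def __init__(self, X, y, effort):
        super().__init__(n_var=X.shape[1] + 1, n_obj=2, xl=-5.0, xu=5.0)
        self.X, self.y, self.effort = X, y, effort

    def _evaluate(self, W, out, *args, **kwargs):
        f1, f2 = [], []
        for w in W:  # one row per candidate weight vector
            scores = sigmoid(self.X @ w[:-1] + w[-1])
            f1.append(-roc_auc_score(self.y, scores))        # maximize AUC
            f2.append(-popt(self.y, scores, self.effort))    # maximize Popt
        out["F"] = np.column_stack([f1, f2])

# Toy data standing in for review features, merge labels, and review effort.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
effort = rng.integers(1, 500, size=500).astype(float)        # e.g., LOC changed

res = minimize(BiObjectiveLR(X, y, effort), NSGA2(pop_size=50),
               ("n_gen", 30), seed=1, verbose=False)
print("Pareto front (negated AUC, negated Popt):")
print(res.F[:5])
```

Each solution on the resulting Pareto front is one set of LR weights; a practitioner would then pick a trade-off point (e.g., a knee solution) between prediction performance and cost-effectiveness.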


Notes

  1. https://www.gerritcodereview.com/

  2. https://github.com/features/code-review

  3. https://gerrit.libreoffice.org/c/core/+/65990/

  4. https://gerrit.libreoffice.org/c/core/+/57241/

  5. https://git.eclipse.org/r/q/status:open+-is:wip

  6. https://gerrit.libreoffice.org/q/status:open+-is:wip

  7. https://review.gerrithub.io/q/status:open+-is:wip

  8. https://lightgbm.readthedocs.io/en/latest/index.html

  9. https://github.com/klainfo/ScottKnottESD/tree/master


Acknowledgements

This research has been funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) RGPIN-2018-05960.

Author information


Corresponding author

Correspondence to Moataz Chouchen.

Additional information

Communicated by: Yasutaka Kamei.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chouchen, M., Ouni, A. A multi-objective effort-aware approach for early code review prediction and prioritization. Empir Software Eng 29, 29 (2024). https://doi.org/10.1007/s10664-023-10431-7


  • DOI: https://doi.org/10.1007/s10664-023-10431-7
