Abstract
Context
Because code reviews require significant effort, even a minor improvement in the effectiveness of Code Reviews (CR) can yield substantial savings for a software development organization.
Objective
This study aims to develop a finer-grained understanding of what makes a code review comment useful to OSS developers, to what extent a code review comment is considered useful to them, and how various contextual and participant-related factors influence its degree of usefulness.
Method
Toward this goal, we conducted a three-stage mixed-method study. We randomly selected 2,500 CR comments from the OpenDev Nova project and manually categorized them. We then designed a survey of OpenDev developers to better understand their perspectives on useful CRs. Combining our survey-obtained scores with our manually labeled dataset, we trained two regression models: one to identify factors that influence the usefulness of CR comments, and the other to identify factors that improve the odds of 'Functional' defect identification over the others.
Results
The results of our study suggest that a CR comment's usefulness is dictated not only by its technical contributions, such as defect findings or quality improvement tips, but also by its linguistic characteristics, such as comprehensibility and politeness. While a reviewer's coding experience is positively associated with CR usefulness, the number of mutual reviews, comment volume in a file, the total number of lines added/modified, and CR interval have the opposite associations. While authorship and reviewership experience with the files under review have been the most popular attributes for reviewer recommendation systems, we do not find any significant association between those attributes and CR usefulness.
Conclusion
We recommend discouraging frequent code review associations between two individuals as such associations may decrease CR usefulness. We also recommend authoring CR comments in a constructive and empathetic tone. As several of our results deviate from prior studies, we recommend more investigations to identify context-specific attributes to build reviewer recommendation models.
Data Availability
Our analysis scripts and aggregated dataset are publicly available at https://github.com/WSU-SEAL/CR-usefulness-EMSE.
Notes
Kappa (\(\kappa \)) scores are commonly interpreted as: 0.01–0.20 as 'none to slight,' 0.21–0.40 as 'fair,' 0.41–0.60 as 'moderate,' 0.61–0.80 as 'substantial,' and 0.81–1.00 as 'almost perfect agreement' (Landis and Koch 1977).
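As an illustration of this note (a minimal sketch, not part of the study's tooling; the function names are ours), Cohen's kappa for two raters and the Landis–Koch interpretation bands can be computed as follows:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's (1960) kappa for two raters labeling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    pe = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (po - pe) / (1 - pe)

def landis_koch(kappa):
    """Map a kappa score to the Landis and Koch (1977) interpretation bands."""
    bands = [(0.20, "none to slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, two raters agreeing on 3 of 4 nominal labels with balanced marginals yield \(\kappa = 0.5\), i.e., 'moderate' agreement.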
\(p\)-values are adjusted using the Benjamini–Hochberg correction (Benjamini and Hochberg 1995) due to multiple comparisons.
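The step-up procedure behind this correction can be sketched as follows (our own illustrative implementation, not the scripts used in the study):

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR correction)."""
    m = len(pvalues)
    # Sort p-values ascending, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Walk from the largest rank down: adjusted p = min over higher ranks
    # of p * m / rank, which enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pvalues[i] * m / rank)
        adjusted[i] = prev
    return adjusted
```

Rejecting hypotheses whose adjusted \(p\)-value falls below the chosen level controls the false discovery rate across the whole family of comparisons.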
References
Allison P (2014) Prediction vs. causation in regression analysis. Statistical Horizons 703
Asthana S, Kumar R, Bhagwan R, Bird C, Bansal C, Maddila C, Mehta S, Ashok B (2019) Whodo: automating reviewer suggestions at scale. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 937–945
Bacchelli A, Bird C (2013) Expectations, outcomes, and challenges of modern code review. In: 2013 35th international conference on software engineering (ICSE), IEEE pp 712–721
Balachandran V (2013) Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation. In: 2013 35th International conference on software engineering (ICSE), IEEE pp 931–940
Baltes S, Diehl S (2016) Worse than spam: Issues in sampling software developers. In: Proceedings of the 10th ACM/IEEE international symposium on empirical software engineering and measurement, pp 1–6
Barnett M, Bird C, Brunet J, Lahiri SK (2015) Helping developers help themselves: Automatic decomposition of code review changesets. In: Proceedings of the 37th international conference on software engineering- vol 1, IEEE Press, pp 134–144
Baum T, Schneider K (2016) On the need for a new generation of code review tools. In: International conference on product-focused software process improvement, Springer, pp 301–308
Bayaga A (2010) Multinomial logistic regression: Usage and application in risk analysis. J App Quantitative Methods 5(2)
Beller M, Bacchelli A, Zaidman A, Juergens E (2014) Modern code reviews in open-source projects: Which problems do they fix? In: Proceedings of the 11th working conference on mining software repositories, pp 202–211
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc: Ser B (Methodol) 57(1):289–300
Bosu A, Carver JC, Bird C, Orbeck J, Chockley C (2017) Process aspects and social dynamics of contemporary code review: Insights from open source development and industrial practice at microsoft. IEEE Trans Software Eng 43(1):56–75
Bosu A, Carver JC, Hafiz M, Hilley P, Janni D (2014) Identifying the characteristics of vulnerable code changes: An empirical study. 22nd ACM SIGSOFT international symposium on the foundations of software engineering, FSE ’14. China, Hong Kong, pp 257–268
Bosu A, Greiler M, Bird C (2015) Characteristics of useful code reviews: An empirical study at microsoft. In: 2015 IEEE/ACM 12th Working conference on mining software repositories, IEEE pp 146–156
Changyong F, Hongyue W, Naiji L, Tian C, Hua H, Ying L et al (2014) Log-transformation and its implications for data analysis. Shanghai Arch Psychiatry 26(2):105
Chouchen M, Ouni A, Mkaouer MW, Kula RG, Inoue K (2021) Whoreview: A multi-objective search-based approach for code reviewers recommendation in modern code review. Appl Soft Comput 100:106908
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Measur 20(1):37–46
Czerwonka J, Greiler M, Tilford J (2015) Code reviews do not find bugs. how the current code review best practice slows us down. In: 2015 IEEE/ACM 37th IEEE International conference on software engineering, vol 2, IEEE, pp 27–28
Dias M, Bacchelli A, Gousios G, Cassou D, Ducasse S (2015) Untangling fine-grained code changes. In: 2015 IEEE 22nd International conference on software analysis, evolution, and reengineering (SANER), IEEE, pp 341–350
Ebert F, Castor F, Novielli N, Serebrenik A (2021) An exploratory study on confusion in code reviews. Empir Softw Eng 26:1–48
Fagan ME (1976) Design and code inspections to reduce errors in program development. IBM Syst J 15(3):182–211. https://doi.org/10.1147/sj.153.0182
Foley B (2018) What is regression analysis and why should I use it? https://www.surveygizmo.com/resources/blog/regression-analysis
OpenInfra Foundation (2022) 2022 OpenInfra Foundation annual report. https://openinfra.dev/annual-report/2022
Fowler M (2012) Refactoring catalog. Refactoring Home Page, URL: http://www.refactoring.com/catalog/index.html (last accessed: 09.02.2006)
Fowler M, Highsmith J et al (2001) The agile manifesto. Software Development 9(8):28–35
Fukushima T, Kamei Y, McIntosh S, Yamashita K, Ubayashi N (2014) An empirical study of just-in-time defect prediction using cross-project models. In: Proceedings of the 11th working conference on mining software repositories, pp 172–181
Gauthier IX, Lamothe M, Mussbacher G, McIntosh S (2021) Is historical data an appropriate benchmark for reviewer recommendation systems?: A case study of the gerrit community. In: 2021 36th IEEE/ACM International conference on automated software engineering (ASE), IEEE, pp 30–41
Gómez VU, Ducasse S, D’Hondt T (2015) Visually characterizing source code changes. Sci Comput Program 98:376–393
OpenStack Governance (2023) OpenStack project teams. https://governance.openstack.org/tc/reference/projects/
Gunawardena SD, Devine P, Beaumont I, Garden LP, Murphy-Hill E, Blincoe K (2022) Destructive criticism in software code review impacts inclusion. Proceedings of the ACM on Human-Computer Interaction 6(CSCW2):1–29
Han X, Tahir A, Liang P, Counsell S, Luo Y (2021) Understanding code smell detection via code review: A study of the openstack community. In: 2021 IEEE/ACM 29th International conference on program comprehension (ICPC), IEEE, pp 323–334
Harrell FE Jr, Lee KL, Califf RM, Pryor DB, Rosati RA (1984) Regression modelling strategies for improved prognostic prediction. Stat Med 3(2):143–152
Hasan M, Iqbal A, Islam MRU, Rahman A, Bosu A (2021) Using a balanced scorecard to identify opportunities to improve code review effectiveness: an industrial experience report. Empir Softw Eng 26(6):1–34
Hassan AE, Holt RC (2003) Studying the chaos of code development. In: WCRE, vol 3, p 123
Hatton L (2008) Testing the value of checklists in code inspections. IEEE Softw 25(4):82–88
Helland IS (1987) On the interpretation and use of r2 in regression analysis. Biometrics pp 61–69
Henley AZ, Muçlu K, Christakis M, Fleming SD, Bird C (2018) Cfar: A tool to increase communication, productivity, and review quality in collaborative code reviews. In: Proceedings of the 2018 CHI conference on human factors in computing systems, pp 1–13
Hinkle D, Jurs H, Wiersma W (1998) Applied statistics for the behavioral sciences
Hirao T, Ihara A, Ueda Y, Phannachitta P, Matsumoto Ki (2016) The impact of a low level of agreement among reviewers in a code review process. In: IFIP International conference on open source systems, Springer, pp 97–110
Hirao T, McIntosh S, Ihara A, Matsumoto K (2020) Code reviews with divergent review scores: An empirical study of the openstack and qt communities. IEEE Trans Softw Eng
Hong Y, Tantithamthavorn C, Thongtanunam P, Aleti A (2022) Commentfinder: a simpler, faster, more accurate code review comments recommendation. In: Proceedings of the 30th ACM joint european software engineering conference and symposium on the foundations of software engineering, pp 507–519
Huang Y, Jia N, Chen X, Hong K, Zheng Z (2020) Code review knowledge perception: Fusing multi-features for salient-class location. IEEE Trans Softw Eng pp 1–1 . https://doi.org/10.1109/TSE.2020.3021902
Jiarpakdee J, Tantithamthavorn C, Treude C (2018) Autospearman: Automatically mitigating correlated software metrics for interpreting defect models. In: 2018 IEEE International conference on software maintenance and evolution (ICSME), IEEE computer society pp 92–103
Kincaid JP, Fishburne Jr RP, Rogers RL, Chissom BS (1975) Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. Tech rep, naval technical training command millington tn research branch
Kononenko O, Baysal O, Godfrey MW (2016) Code review quality: How developers see it. In: Proceedings of the 38th International conference on software engineering, ICSE ’16, pp 1028–1038. ACM, New York, NY, USA. https://doi.org/10.1145/2884781.2884840
Kononenko O, Baysal O, Guerrouj L, Cao Y, Godfrey MW (2015) Investigating code review quality: Do people and participation matter? In: 2015 IEEE international conference on software maintenance and evolution (ICSME), IEEE, pp 111–120
Kononenko O, Rose T, Baysal O, Godfrey M, Theisen D, De Water B (2018) Studying pull request merges: a case study of shopify’s active merchant. In: Proceedings of the 40th international conference on software engineering: software engineering in practice, pp 124–133
Landis JR, Koch GG (1977) An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics pp 363–374
Mansfield ER, Helms BP (1982) Detecting multicollinearity. Am Stat 36(3a):158–160
Mäntylä MV, Lassenius C (2008) What types of defects are really discovered in code reviews? IEEE Trans Softw Eng 35(3):430–448
McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 4:308–320
McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21:2146–2189
Mirsaeedi E, Rigby PC (2020) Mitigating turnover with code review recommendation: balancing expertise, workload, and knowledge distribution. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, pp 1183–1195
Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. In: Proceedings of the 27th international conference on Software engineering, pp 284–292
Nagappan N, Ball T (2007) Using software dependencies and churn metrics to predict field failures: An empirical case study. In: First international symposium on empirical software engineering and measurement (ESEM 2007), IEEE, pp 364–373
Nagelkerke NJ et al (1991) A note on a general definition of the coefficient of determination. Biometrika 78(3):691–692
Osborne JW, Waters E (2002) Four assumptions of multiple regression that researchers should always test. Pract Assess Res Eval 8(1):2
Pandya P, Tiwari S (2022) Corms: A github and gerrit based hybrid code reviewer recommendation approach for modern code review. In: Proceedings of the 30th ACM joint european software engineering conference and symposium on the foundations of software engineering, ESEC/FSE 2022, association for computing machinery, New York, NY, USA p 546-557
Panichella S, Zaugg N (2020) An empirical investigation of relevant changes and automation needs in modern code review. Empirical Software Engineering 25(TBD), TBD
Paul R, Turzo AK, Bosu A (2021) Why security defects go unnoticed during code reviews? a case-control study of the chromium os project. In: 2021 IEEE/ACM 43rd International conference on software engineering (ICSE), IEEE pp 1373–1385
Pinto G, Steinmacher I, Gerosa MA (2016) More common than you think: An in-depth study of casual contributors. In: 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), vol 1, IEEE, pp 112–123
Qualtrics (2023) Question types. https://www.qualtrics.com/support/survey-platform/survey-module/editing-questions/question-types-guide/question-types-overview/
Rahman MM, Roy CK, Collins JA (2016) Correct: code reviewer recommendation in github based on cross-project and technology experience. In: Proceedings of the 38th International conference on software engineering companion, pp 222–231
Rahman MM, Roy CK, Kula RG (2017) Predicting usefulness of code review comments using textual features and developer experience. In: 2017 IEEE/ACM 14th International conference on mining software repositories (MSR), IEEE, pp 215–226
Rigby PC, Bird C (2013) Convergent contemporary software peer review practices. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering, pp 202–212
Rigby PC, Storey MA (2011) Understanding broadcast based peer review on open source software projects. In: 2011 33rd International conference on software engineering (ICSE), IEEE, pp 541–550
Rong G, Zhang Y, Yang L, Zhang F, Kuang H, Zhang H (2022) Modeling review history for reviewer recommendation: A hypergraph approach. In: Proceedings of the 44th International conference on software engineering, ICSE ’22, association for computing machinery, New York, NY, USA p 1381–1392
Sadowski C, Söderberg E, Church L, Sipko M, Bacchelli A (2018) Modern code review: a case study at google. In: Proceedings of the 40th International conference on software engineering: software engineering in practice, pp 181–190
Sarker J, Turzo AK, Bosu A (2020) A benchmark study of the contemporary toxicity detectors on software engineering interactions. arXiv preprint arXiv:2009.09331
Sarle W (1990) SAS/STAT user's guide: The VARCLUS procedure. SAS Institute Inc., Cary, NC, USA
Snow J, Mann M (2013) Qualtrics survey software: handbook for research professionals. Qualtrics Labs, Inc
Tao Y, Kim S (2015) Partitioning composite code changes to facilitate code review. In: 2015 IEEE/ACM 12th working conference on mining software repositories, IEEE, pp 180–190
Thongtanunam P, Hassan AE (2020) Review dynamics and their impact on software quality. IEEE Trans Software Eng 47(12):2698–2712
Thongtanunam P, McIntosh S, Hassan AE, Iida H (2015) Investigating code review practices in defective files: An empirical study of the qt system. In: 2015 IEEE/ACM 12th Working conference on mining software repositories, IEEE, pp 168–179
Thongtanunam P, McIntosh S, Hassan AE, Iida H (2017) Review participation in modern code review. Empir Softw Eng 22(2):768–817
Thongtanunam P, Pornprasit C, Tantithamthavorn C (2022) Autotransform: automated code transformation to support modern code review process. In: Proceedings of the 44th international conference on software engineering, pp 237–248
Thongtanunam P, Tantithamthavorn C, Kula RG, Yoshida N, Iida H, Matsumoto Ki (2015) Who should review my code? a file location-based code-reviewer recommendation approach for modern code review. In: 2015 IEEE 22nd International conference on software analysis, evolution, and reengineering (SANER), IEEE, pp 141–150
Tufan R, Pascarella L, Tufanoy M, Poshyvanykz D, Bavota G (2021) Towards automating code review activities. In: 2021 IEEE/ACM 43rd International conference on software engineering (ICSE), IEEE, pp 163–174
Tufano R, Masiero S, Mastropaolo A, Pascarella L, Poshyvanyk D, Bavota G (2022) Using pre-trained models to boost code review automation. In: Proceedings of the 44th international conference on software engineering, pp 2291–2302
Yamane T (1973) Statistics: an introductory analysis, 3rd edn
Zaidman A, Van Rompaey B, Van Deursen A, Demeyer S (2011) Studying the co-evolution of production and test code in open source and industrial developer test processes through repository mining. Empir Softw Eng 16:325–364
Zanaty FE, Hirao T, McIntosh S, Ihara A, Matsumoto K (2018) An empirical study of design discussions in code review. In: Proceedings of the 12th ACM/IEEE International symposium on empirical software engineering and measurement, pp 1–10
Zanjani MB, Kagdi H, Bird C (2015) Automatically recommending peer reviewers in modern code review. IEEE Trans Software Eng 42(6):530–543
Acknowledgements
We thank Hemangi Murdande for her assistance during manual data labeling.
Funding
Work conducted for this research is partially supported by the US National Science Foundation under Grant No. 1850475. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Ethics declarations
Competing interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Communicated by: Raula Kula.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Turzo, A.K., Bosu, A. What makes a code review useful to OpenDev developers? An empirical investigation. Empir Software Eng 29, 6 (2024). https://doi.org/10.1007/s10664-023-10411-x