Early prediction of merged code changes to prioritize reviewing tasks

Empirical Software Engineering

Abstract

Modern Code Review (MCR) is widely used by both open source and proprietary software projects. Inspecting code changes consumes much of reviewers' time and effort, since they need to comprehend patches, and each reviewer is often assigned many code changes. Moreover, a code change may eventually be abandoned, which wastes reviewing time and effort. Thus, a tool that predicts early on whether a code change will be merged can help developers prioritize which changes to inspect, accomplish more within a tight schedule, and avoid wasting reviewing effort on low-quality changes. Motivated by these needs, we build a merged code change prediction tool. Our approach first extracts 34 features from code changes, grouped into 5 dimensions: code, file history, owner experience, collaboration network, and text. We then leverage machine learning techniques such as random forest to build a prediction model. To evaluate the performance of our approach, we conduct experiments on three open source projects (i.e., Eclipse, LibreOffice, and OpenStack), containing a total of 166,215 code changes. Across the three datasets, our approach statistically significantly outperforms a random guess classifier and two prediction models proposed by Jeong et al. (2009) and Gousios et al. (2014) in terms of several evaluation metrics. In addition, we study the important features that distinguish merged code changes from abandoned ones.
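
To make the pipeline concrete, below is a minimal sketch (in Python with scikit-learn) of the kind of approach the abstract describes: each code change is represented by a feature vector drawn from the five dimensions, and a random forest classifier is trained to predict whether the change will be merged. This is not the authors' implementation; the feature values, dataset, and metrics here are illustrative placeholders rather than the paper's 34 features.

```python
# Sketch of a merged-change prediction pipeline (illustrative, not the paper's code).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score

# Illustrative feature matrix: each row is one code change, each column a feature
# (e.g. lines added, files changed, owner merge ratio, network centrality, ...).
rng = np.random.default_rng(0)
X = rng.random((1000, 6))
y = rng.integers(0, 2, size=1000)  # 1 = merged, 0 = abandoned

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Random forest classifier, as one of the machine learning techniques mentioned.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

scores = clf.predict_proba(X_test)[:, 1]  # probability that a change is merged
preds = clf.predict(X_test)
print("AUC: %.3f, F1: %.3f" % (roc_auc_score(y_test, scores), f1_score(y_test, preds)))

# Feature importances give a rough view of which features best separate
# merged from abandoned changes.
print(sorted(zip(clf.feature_importances_, range(X.shape[1])), reverse=True))
```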


Notes

  1. https://code.google.com/p/gerrit/

  2. For more details, please refer to Table 1.

  3. Subsystem is defined in Section 3.3.

  4. https://review.openstack.org/#/c/262127/

  5. https://gerrit.wikimedia.org/r/Documentation/rest-api.html

  6. http://networkx.readthedocs.io/en/stable/

  7. Cliff defines a delta of less than 0.147 as a negligible effect size, between 0.147 and 0.33 as small, between 0.33 and 0.474 as medium, and above 0.474 as large (a short computation sketch follows this list).

  8. https://cran.r-project.org/web/packages/Hmisc/Hmisc.pdf

  9. https://cran.r-project.org/web/packages/rms/rms.pdf

  10. https://github.com/aloysius-lim/bigrf

  11. https://cran.r-project.org/web/packages/ScottKnott/ScottKnott.pdf

  12. https://git.eclipse.org/r/#/c/73252/

  13. https://git.eclipse.org/r/#/c/58215/

  14. https://review.openstack.org/#/c/242578/

  15. https://git.eclipse.org/r/#/c/59345/

  16. https://git.eclipse.org/r/#/c/7424/

  17. https://review.openstack.org/#/c/172781/

  18. https://review.openstack.org/#/c/261950/

  19. https://gerrit.libreoffice.org/#/c/15274/

  20. https://review.openstack.org/#/c/172781/

  21. https://review.openstack.org/#/c/261950/
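
As a companion to note 7, the following is a minimal sketch (in Python, not taken from the paper) that computes Cliff's delta for two samples and maps its magnitude onto the negligible/small/medium/large categories listed there; the sample values are made up for illustration.

```python
# Cliff's delta and the effect-size categories from note 7 (illustrative sketch).
def cliffs_delta(xs, ys):
    """Cliff's delta: (#(x > y) - #(x < y)) / (len(xs) * len(ys))."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def effect_size(delta):
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

merged = [5, 7, 9, 12, 4]    # hypothetical metric values for merged changes
abandoned = [3, 4, 6, 2, 5]  # hypothetical metric values for abandoned changes
d = cliffs_delta(merged, abandoned)
print(d, effect_size(d))
```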

References

  • Abdi H (2007) Bonferroni and Šidák corrections for multiple comparisons. Encycl Meas Stat 3:103–107

  • Ackerman A F, Fowler P J, Ebenau RG (1984) Software inspections and the industrial production of software. In: Proceedings of a symposium on software validation: inspection-testing-verification-alternatives. Elsevier North-Holland Inc., pp 13–40

  • Arisholm E, Briand L C, Fuglerud M (2007) Data mining techniques for building fault-proneness models in telecom java software. In: The 18th IEEE international symposium on software reliability, 2007. ISSRE’07. IEEE, pp 215–224

  • Aurum A, Petersson H, Wohlin C (2002) State-of-the-art: software inspections after 25 years. Softw Test Verif Reliab 12(3):133–154

  • Bacchelli A, Bird C (2013) Expectations, outcomes, and challenges of modern code review. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 712–721

  • Bao L, Xing Z, Xia X, Lo D, Li S (2017) Who will leave the company?: a large-scale industry study of developer turnover by mining monthly work report. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR). IEEE, pp 170–181

  • Baysal O, Kononenko O, Holmes R, Godfrey M W (2013) The influence of non-technical factors on code review. In: 2013 20th working conference on reverse engineering (WCRE). IEEE, pp 122–131

  • Bhattacharya P, Neamtiu I (2010) Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging. In: 2010 IEEE international conference on software maintenance (ICSM). IEEE, pp 1–10

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Cliff N (2014) Ordinal methods for behavioral data analysis. Psychology Press, New York

  • Costa C, Figueiredo J, Sarma A, Murta L (2016) TIPMerge: recommending developers for merging branches. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering. ACM, pp 998–1002

  • DeGroot MH, Schervish MJ (2012) Probability and statistics. Pearson Education, Boston

  • Elkan C (2001) The foundations of cost-sensitive learning. In: International joint conference on artificial intelligence. Lawrence Erlbaum Associates Ltd, vol 17, pp 973–978

  • Fagan M E (2001) Design and code inspections to reduce errors in program development. In: Pioneers and their contributions to software engineering. Springer, Berlin, pp 301–334

  • Fenton N, Neil M, Marsh W, Hearty P, Marquez D, Krause P, Mishra R (2007) Predicting software defects in varying development lifecycles using Bayesian nets. Inf Softw Technol 49(1):32–43

  • Gonzalez-Barahona J M, Izquierdo-Cortazar D, Robles G, del Castillo A (2014) Analyzing gerrit code review parameters with bicho. Electron Commun EASST

  • Gousios G, Pinzger M, Deursen AV (2014) An exploratory study of the pull-based software development model. In: Proceedings of the 36th international conference on software engineering. ACM, pp 345–355

  • Gousios G, Zaidman A, Storey M A, Van Deursen A (2015) Work practices and challenges in pull-based development: the integrator’s perspective. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE Press, pp 358–368

  • Graves T L, Karr A F, Marron J S, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661

  • Grbac T G, Mausa G, Basic B D (2013) Stability of software defect prediction in relation to levels of data imbalance. In: SQAMIA, pp 1–10

  • Hall M A, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I H (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1):10–18

  • Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, Amsterdam

  • He H, Garcia E A (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284

  • Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 392–401

  • Huang J, Ling C X (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310

  • Huang Q, Xia X, Lo D (2017) Supervised vs unsupervised models: a holistic look at effort-aware just-in-time defect prediction. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 159–170

  • Jeong G, Kim S, Zimmermann T, Yi K (2009) Improving code review by predicting reviewers and acceptance of patches. In: Research on software analysis for error-free computing center Tech-Memo (ROSAEC MEMO 2009-006), pp 1–18

  • Jiang T, Tan L, Kim S (2013a) Personalized defect prediction. In: 2013 IEEE/ACM 28th international conference on automated software engineering (ASE). IEEE, pp 279–289

  • Jiang Y, Adams B, German D M (2013b) Will my patch make it? and how fast? Case study on the linux kernel. In: 2013 10th IEEE working conference on mining software repositories (MSR). IEEE, pp 101–110

  • Kamei Y, Shihab E, Adams B, Hassan A E, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773

  • Khoshgoftaar T M, Geleyn E, Nguyen L, Bullard L (2002) Cost-sensitive boosting in software quality modeling. In: 7th IEEE international symposium on high assurance systems engineering, 2002. Proceedings. IEEE, pp 51–60

  • Kim S, Whitehead E J, Zhang Y (2008) Classifying software changes: clean or buggy? IEEE Trans Softw Eng 34(2):181–196

  • Kononenko O, Baysal O, Guerrouj L, Cao Y, Godfrey M W (2015) Investigating code review quality: Do people and participation matter?. In: 2015 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 111–120

  • Lamkanfi A, Demeyer S, Giger E, Goethals B (2010) Predicting the severity of a reported bug. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 1–10

  • Leßenich O, Siegmund J, Apel S, Kästner C, Hunsen C (2016) Indicators for merge conflicts in the wild: survey and empirical study. Autom Softw Eng 1–35. https://doi.org/10.1007/s10515-017-0227-0

  • Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496

  • Liu X Y, Zhou Z H (2006) The influence of class imbalance on cost-sensitive learning: an empirical study. In: Sixth international conference on data mining, 2006. ICDM’06. IEEE, pp 970–974

  • Liu M, Miao L, Zhang D (2014) Two-stage cost-sensitive learning for software defect prediction. IEEE Trans Reliab 63(2):676–686

  • Mann H B, Whitney D R (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60

  • Matsumoto S, Kamei Y, Monden A, Matsumoto K, Nakamura M (2010) An analysis of developer metrics for fault prediction. In: Proceedings of the 6th international conference on predictive models in software engineering. ACM, p 18

  • McIntosh S, Kamei Y, Adams B, Hassan A E (2014) The impact of code review coverage and code review participation on software quality: a case study of the qt, vtk, and itk projects. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 192–201

  • Mende T, Koschke R (2009) Revisiting the evaluation of defect prediction models. In: Proceedings of the 5th international conference on predictor models in software engineering. ACM, p 7

  • Mockus A, Weiss D M (2000) Predicting risk of software changes. Bell Labs Tech J 5(2):169–180

  • Mukadam M, Bird C, Rigby P C (2013) Gerrit software code review data from android. In: 2013 10th IEEE working conference on mining software repositories (MSR). IEEE, pp 45–48

  • Rahman F, Posnett D, Devanbu P (2012) Recalling the imprecision of cross-project defect prediction. In: Proceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering. ACM, p 61

  • Rajbahadur G K, Wang S, Kamei Y, Hassan A E (2017) The impact of using regression models to build defect classifiers. In: Proceedings of the 14th international conference on mining software repositories. IEEE Press, pp 135–145

  • Ratzinger J, Pinzger M, Gall H (2007) EQ-Mine: predicting short-term defects for software evolution. In: International conference on fundamental approaches to software engineering. Springer, Berlin, pp 12–26

  • Rigby P C, German D M (2006) A preliminary examination of code review processes in open source projects. Tech. rep., Technical Report DCS-305-IR, University of Victoria

  • Rigby P C, German D M, Storey M A (2008) Open source software peer review practices: a case study of the apache server. In: Proceedings of the 30th international conference on Software engineering. ACM, pp 541–550

  • Romano D, Pinzger M (2011) Using source code metrics to predict change-prone java interfaces. In: 2011 27th IEEE international conference on software maintenance (ICSM). IEEE, pp 303–312

  • Scott A J, Knott M (1974) A cluster analysis method for grouping means in the analysis of variance. Biometrics 30(3):507–512

  • Shimagaki J, Kamei Y, McIntosh S, Hassan A E, Ubayashi N (2016) A study of the quality-impacting practices of modern code review at sony mobile. In: Proceedings of the 38th international conference on software engineering companion. ACM, pp 212–221

  • Shull F, Seaman C (2008) Inspecting the history of inspections: an example of evidence-based technology diffusion. IEEE Softw 25(1):88–90. https://doi.org/10.1109/MS.2008.7

  • Tamrawi A, Nguyen T T, Al-Kofahi J M, Nguyen T N (2011) Fuzzy set and cache-based approach for bug triaging. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. ACM, pp 365–375

  • Thongtanunam P, McIntosh S, Hassan A E, Iida H (2016) Review participation in modern code review. Empir Softw Eng 1–50

  • Tian Y, Nagappan M, Lo D, Hassan A E (2015) What are the characteristics of high-rated apps? A case study on free android applications. In: 2015 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 301–310

  • Tsay J, Dabbish L, Herbsleb J (2014) Influence of social and technical factors for evaluating contribution in GitHub. In: Proceedings of the 36th international conference on Software engineering. ACM, pp 356–366

  • Upton G J (1992) Fisher’s exact test. J R Stat Soc A Stat Soc 155(3):395–402

  • Votta L G (1993) Does every inspection need a meeting? ACM SIGSOFT Softw Eng Notes 18(5):107–114

  • Weiss G M, McCarthy K, Zabar B (2007) Cost-sensitive learning vs. sampling: which is best for handling unbalanced classes with unequal error costs? DMIN 7:35–41

  • Weißgerber P, Neu D, Diehl S (2008) Small patches get in! In: Proceedings of the 2008 international working conference on mining software repositories. ACM, pp 67–76

  • Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83

  • Wolpert D H, Macready W G (1999) An efficient method to estimate bagging’s generalization error. Mach Learn 35(1):41–55

  • Xia X, Lo D, Shihab E, Wang X, Yang X (2015a) Elblocker: predicting blocking bugs with ensemble imbalance learning. Inf Softw Technol 61:93–106

  • Xia X, Lo D, Wang X, Yang X (2015b) Who should review this change?: Putting text and file location analyses together for more accurate recommendations. In: 2015 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 261–270

  • Xia X, Lo D, Pan S J, Nagappan N, Wang X (2016a) Hydra: massively compositional model for cross-project defect prediction. IEEE Trans Softw Eng 42(10):977–998

  • Xia X, Lo D, Wang X, Yang X (2016b) Collective personalized change classification with multiobjective search. IEEE Trans Reliab 65(4):1810–1829

  • Xia X, Lo D, Ding Y, Al-Kofahi J M, Nguyen TN, Wang X (2017) Improving automated bug triaging with specialized topic model. IEEE Trans Softw Eng 43(3):272–297

  • Yang X, Kula R G, Yoshida N, Iida H (2016) Mining the modern code review repositories: a dataset of people, process and product. In: Proceedings of the 13th international conference on mining software repositories. ACM, pp 460–463

  • Zanetti M S, Scholtes I, Tessone C J, Schweitzer F (2013) Categorizing bugs with social networks: a case study on four open source software communities. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 1032–1041

  • Zhang Y, Lo D, Xia X, Xu B, Sun J, Li S (2015) Combining software metrics and text features for vulnerable file prediction. In: 2015 20th international conference on engineering of complex computer systems (ICECCS). IEEE, pp 40–49

  • Zheng J (2010) Cost-sensitive boosting neural networks for software defect prediction. Expert Syst Appl 37(6):4537–4543

Acknowledgments

This work was partially supported by the NSFC Program (Nos. 61602403 and 61572426).

Author information


Corresponding author

Correspondence to Xin Xia.

Additional information

Communicated by: Sven Apel

About this article

Cite this article

Fan, Y., Xia, X., Lo, D. et al. Early prediction of merged code changes to prioritize reviewing tasks. Empir Software Eng 23, 3346–3393 (2018). https://doi.org/10.1007/s10664-018-9602-0
