
Characterizing and identifying reverted commits

Abstract

In practice, a popular but coarse-grained way to recover from a problematic commit is to revert it (i.e., to undo the change). However, reverted commits can cause problems for software development, such as impeding development progress and making maintenance more difficult. To mitigate these issues, we set out to explore a central question: can we characterize and identify which commits will be reverted? In this paper, we characterize commits using 27 commit features and build a model to identify commits that will be reverted. We first identify reverted commits by analyzing commit messages and comparing the changed content, and we extract 27 commit features grouped into three dimensions: change, developer, and message. We then build an identification model (e.g., a random forest classifier) on the extracted features. To evaluate the effectiveness of the proposed model, we perform an empirical study on ten open source projects comprising a total of 125,241 commits. Our experimental results show that the model outperforms two baselines in terms of AUC-ROC and cost-effectiveness (i.e., the percentage of reverted commits detected when inspecting 20% of the total changed LOC). Averaged across the ten studied projects, our model achieves an AUC-ROC of 0.756 and a cost-effectiveness of 0.746, improving over both baselines by statistically significant and substantial margins. In addition, we find that the “developer” dimension is the most discriminative of the three feature dimensions for identifying reverted commits, although using all three dimensions together yields better performance.
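
For readers who want to see the shape of this pipeline, the following Python sketch (assuming scikit-learn; it is an illustration, not the authors' implementation) walks through the three steps the abstract describes: labeling reverted commits from their commit messages, training a random forest on the commit features, and computing AUC-ROC plus cost-effectiveness at a 20% changed-LOC inspection budget. The file commit_features.csv and its column names are hypothetical placeholders.

# Minimal sketch of the pipeline in the abstract; commit_features.csv and the
# column names ("reverted", "churn", plus the 27 feature columns) are
# hypothetical placeholders, not the authors' artifacts.
import re
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Step 1 (labeling): commits whose SHA appears in a later
# "This reverts commit <sha>" message are labeled as reverted.
REVERT_PATTERN = re.compile(r"this reverts commit ([0-9a-f]{7,40})", re.IGNORECASE)

def reverted_shas(messages):
    """Collect the SHAs named in revert messages (messages: iterable of str)."""
    shas = set()
    for msg in messages:
        match = REVERT_PATTERN.search(msg)
        if match:
            shas.add(match.group(1))
    return shas

# Step 3 (effort-aware evaluation): fraction of reverted commits found when
# inspecting commits in descending risk order until 20% of changed LOC is spent.
def cost_effectiveness(y_true, y_score, changed_loc, budget=0.2):
    order = np.argsort(-np.asarray(y_score))
    y_sorted = np.asarray(y_true)[order]
    loc_sorted = np.asarray(changed_loc)[order]
    inspected = np.cumsum(loc_sorted) <= budget * loc_sorted.sum()
    return y_sorted[inspected].sum() / max(y_sorted.sum(), 1)

# Step 2 (identification model): a random forest over the 27 commit features.
data = pd.read_csv("commit_features.csv")          # hypothetical feature table
X = data.drop(columns=["reverted", "churn"])       # 27 change/developer/message features
y = data["reverted"]                               # label produced in Step 1
X_tr, X_te, y_tr, y_te, _, loc_te = train_test_split(
    X, y, data["churn"], test_size=0.3, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
risk = model.predict_proba(X_te)[:, 1]

print("AUC-ROC:", roc_auc_score(y_te, risk))
print("Cost-effectiveness@20% LOC:", cost_effectiveness(y_te, risk, loc_te))

The cost_effectiveness helper mirrors the effort-aware measure in the abstract: commits are ranked by predicted risk, and the score is the fraction of actually reverted commits caught within the first 20% of total changed LOC.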

Acknowledgment

This research was partially supported by the National Key Research and Development Program of China (2018YFB1003904), NSFC Program (No. 61602403) and China Postdoctoral Science Foundation (No. 2017M621931).

Author information

Corresponding author

Correspondence to Xin Xia.

Additional information

Communicated by: Massimiliano Di Penta

About this article

Cite this article

Yan, M., Xia, X., Lo, D. et al. Characterizing and identifying reverted commits. Empir Software Eng 24, 2171–2208 (2019). https://doi.org/10.1007/s10664-019-09688-8

