Effort-Aware Tri-Training for Semi-supervised Just-in-Time Defect Prediction
In recent years, just-in-time (JIT) defect prediction has gained considerable interest because it enables developers to identify risky changes at check-in time. Previous studies have approached the problem from both supervised and unsupervised perspectives. Since change labels are hard to acquire, a prediction model is more desirable in practice if it does not rely heavily on label information. However, the unsupervised models proposed in previous work perform poorly in terms of precision and F1 due to the lack of supervised information. To overcome this weakness, we study JIT defect prediction from the semi-supervised perspective, which requires only a small amount of labeled data for training. In this paper, we propose an Effort-Aware Tri-Training (EATT) semi-supervised model for JIT defect prediction based on sample selection. We compare EATT with state-of-the-art supervised and unsupervised models under different labeled-data rates. The experimental results on six open-source projects demonstrate that EATT outperforms existing supervised and unsupervised models for effort-aware JIT defect prediction.
Keywords: Defect prediction · Just-in-time · Tri-training · Effort-aware
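EATT builds on the classic tri-training algorithm (Zhou and Li), which trains three classifiers on bootstrap samples of the labeled data and then lets each classifier learn from unlabeled examples on which the other two agree. The sketch below is a minimal illustration of standard tri-training only, not the authors' effort-aware variant or their sample-selection strategy; the function names, the simplified labeling rule, and the fixed round count are assumptions for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def tri_train(base, X_l, y_l, X_u, rounds=5, seed=0):
    """Simplified tri-training: three classifiers teach each other
    using unlabeled points on which the other two agree."""
    rng = np.random.default_rng(seed)
    n = len(X_l)
    # Initialize three classifiers on bootstrap samples of the labeled set.
    clfs = []
    for _ in range(3):
        idx = rng.integers(0, n, n)
        clfs.append(clone(base).fit(X_l[idx], y_l[idx]))
    for _ in range(rounds):
        for i in range(3):
            j, k = [t for t in range(3) if t != i]
            pj, pk = clfs[j].predict(X_u), clfs[k].predict(X_u)
            agree = pj == pk  # pseudo-label where the two peers agree
            if not agree.any():
                continue
            X_new = np.vstack([X_l, X_u[agree]])
            y_new = np.concatenate([y_l, pj[agree]])
            clfs[i] = clone(base).fit(X_new, y_new)
    return clfs

def vote(clfs, X):
    # Final prediction is the majority vote of the three classifiers.
    votes = np.stack([c.predict(X) for c in clfs])
    return (votes.sum(axis=0) >= 2).astype(int)

# Demo: 30 labeled changes, 270 unlabeled, on synthetic data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           class_sep=2.0, random_state=0)
X_l, y_l, X_u, y_u = X[:30], y[:30], X[30:], y[30:]
clfs = tri_train(DecisionTreeClassifier(random_state=0), X_l, y_l, X_u)
preds = vote(clfs, X_u)
```

The full algorithm additionally bounds the pseudo-label noise per round via an error-rate condition; the paper's effort-aware extension further ranks predicted-defective changes by inspection effort.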
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61773208, 71671086), the Natural Science Foundation of Jiangsu Province (Grant No. BK20170809), and the China Postdoctoral Science Foundation (Grant No. 2018YFB1003902).