Effort-Aware Tri-Training for Semi-supervised Just-in-Time Defect Prediction

  • Wenzhou Zhang
  • Weiwei Li
  • Xiuyi Jia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11440)


In recent years, just-in-time (JIT) defect prediction has gained considerable interest because it enables developers to identify risky changes at check-in time. Previous studies have approached the problem from both supervised and unsupervised perspectives. Since the labels of changes are hard to acquire, a prediction model that does not rely heavily on label information is more desirable in practice. However, the unsupervised models proposed in previous work perform poorly in terms of precision and F1 because they lack supervised information. To overcome this weakness, we study JIT defect prediction from a semi-supervised perspective, which requires only a small amount of labeled data for training. In this paper, we propose an Effort-Aware Tri-Training (EATT) semi-supervised model for JIT defect prediction based on sample selection. We compare EATT with state-of-the-art supervised and unsupervised models under different labeled rates. Experimental results on six open-source projects demonstrate that EATT outperforms existing supervised and unsupervised models for effort-aware JIT defect prediction.
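The abstract names tri-training and effort-aware prediction but does not describe how EATT selects samples, so the sketch below is not the authors' method. Under stated assumptions, it shows two pieces. The first is the standard tri-training loop of Zhou and Li (2005), which the model's name refers to. The second is one commonly used effort-aware measure: the fraction of defect-inducing changes found when changes are inspected in descending order of predicted risk per unit of effort, until 20% of the total effort is spent. Several choices are hypothetical and made only for illustration: the logistic-regression base learner, the helper names (`train_logreg`, `tri_train`, `recall_at_effort`), churned lines as the effort measure, the 20% budget, and the toy data.

```python
import math
import random

def predict_proba(w, x):
    """Logistic score of one change; w[-1] is the bias term."""
    z = sum(wk * xk for wk, xk in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    return 1 if predict_proba(w, x) >= 0.5 else 0

def train_logreg(X, y, epochs=200, lr=0.5):
    """Assumed base learner: logistic regression fit by batch gradient descent."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for k in range(d):
                grad[k] += err * xi[k]
            grad[d] += err
        w = [wk - lr * gk / len(X) for wk, gk in zip(w, grad)]
    return w

def joint_error(hj, hk, X, y):
    """Labeled-set error of h_j and h_k, measured only where the two agree."""
    agree = wrong = 0
    for x, t in zip(X, y):
        p = predict(hj, x)
        if p == predict(hk, x):
            agree += 1
            wrong += int(p != t)
    return wrong / agree if agree else 0.5

def tri_train(XL, yL, XU, max_rounds=20, seed=0):
    """Standard tri-training (Zhou & Li 2005); not EATT's own sample selection."""
    rng = random.Random(seed)
    n = len(XL)
    hs = []
    for _ in range(3):  # each learner starts from a bootstrap sample of L
        idx = [rng.randrange(n) for _ in range(n)]
        hs.append(train_logreg([XL[i] for i in idx], [yL[i] for i in idx]))
    e_prev, l_prev = [0.5] * 3, [0] * 3
    for _ in range(max_rounds):
        pseudo, errs = [None] * 3, [0.0] * 3
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            e = joint_error(hs[j], hs[k], XL, yL)
            if e >= e_prev[i]:
                continue  # the other two learners did not improve
            # Unlabeled changes on which h_j and h_k agree receive that label.
            Li = [(x, predict(hs[j], x)) for x in XU
                  if predict(hs[j], x) == predict(hs[k], x)]
            if l_prev[i] == 0:
                l_prev[i] = math.floor(e / (e_prev[i] - e) + 1)
            if l_prev[i] < len(Li):
                if e * len(Li) < e_prev[i] * l_prev[i]:
                    pseudo[i], errs[i] = Li, e
                elif l_prev[i] > e / (e_prev[i] - e):
                    # Subsample so that the noise bound e * |L_i| keeps shrinking.
                    size = math.ceil(e_prev[i] * l_prev[i] / e - 1)
                    pseudo[i], errs[i] = rng.sample(Li, size), e
        if all(p is None for p in pseudo):
            break  # no learner changed, so tri-training stops
        for i in range(3):
            if pseudo[i] is not None:  # retrain h_i on L plus its new labels
                hs[i] = train_logreg(XL + [x for x, _ in pseudo[i]],
                                     yL + [t for _, t in pseudo[i]])
                e_prev[i], l_prev[i] = errs[i], len(pseudo[i])
    return hs

def recall_at_effort(scores, efforts, labels, budget=0.2):
    """Share of defective changes caught when changes are inspected in
    descending order of score / effort until `budget` of total effort is used."""
    order = sorted(range(len(scores)), key=lambda i: scores[i] / efforts[i],
                   reverse=True)
    limit, spent, found = budget * sum(efforts), 0.0, 0
    for i in order:
        if spent + efforts[i] > limit:
            break
        spent += efforts[i]
        found += labels[i]
    return found / max(1, sum(labels))

if __name__ == "__main__":
    # Hypothetical toy data: one feature per change, label 1 = defect-inducing.
    XL = [[-2.0], [-1.5], [-1.0], [-0.75], [-0.5], [0.5], [0.75], [1.0], [1.5], [2.0]]
    yL = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    XU = [[-1.8], [-1.2], [-0.9], [-0.6], [0.6], [0.9], [1.2], [1.8]]
    hs = tri_train(XL, yL, XU)
    test = [[-1.2], [0.9], [1.8], [-0.6]]
    scores = [sum(predict_proba(h, x) for h in hs) / 3 for x in test]
    efforts = [5.0, 40.0, 3.0, 2.0]  # hypothetical churned lines per change
    print(recall_at_effort(scores, efforts, [0, 1, 1, 0]))
```

Ranking by predicted risk divided by effort, rather than by risk alone, favors small risky changes that are cheap to inspect; this is what makes the evaluation effort-aware. The abstract does not state how EATT's sample selection differs from the plain agreement rule in `tri_train`, so that step is deliberately left generic here.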


Keywords: Defect prediction, Just-in-time, Tri-training, Effort-aware



This work is supported by the National Natural Science Foundation of China (Grant Nos. 61773208, 71671086), the Natural Science Foundation of Jiangsu Province (Grant No. BK20170809) and the China Postdoctoral Science Foundation (Grant No. 2018YFB1003902).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
  2. College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing, China
  3. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
