Cross-version defect prediction: use historical data, cross-project data, or both?

Abstract

Context

Even after many releases of a long-running project, removing defects from the product remains a challenge. Cross-version defect prediction (CVDP) treats project data from prior releases as a source for predicting fault-prone modules with defect prediction techniques. Recent studies have explored cross-project defect prediction (CPDP), which uses project data from outside a project for defect prediction. Although CPDP techniques and CPDP data can also be applied to CVDP, their effectiveness in that setting has not been investigated.
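
A minimal sketch may make the CVDP setting concrete: train a classifier on per-module metrics from the prior release and score the modules of the current release. The DataFrame layout, the "defective" label column, and the choice of learner are illustrative assumptions, not the study's actual pipeline.

    # CVDP sketch: fit on the prior release, score the current release.
    # Column names are hypothetical; any per-module metric set works.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    def cvdp_predict(prior_release: pd.DataFrame,
                     current_release: pd.DataFrame) -> pd.Series:
        features = [c for c in prior_release.columns if c != "defective"]
        model = RandomForestClassifier(random_state=0)
        model.fit(prior_release[features], prior_release["defective"])
        # Probability that each current-release module is defective.
        scores = model.predict_proba(current_release[features])[:, 1]
        return pd.Series(scores, index=current_release.index)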

Objective

To investigate whether CPDP approaches and CPDP data are useful for CVDP. The investigation also compared different ways of using prior-release data.

Method

We replicated a previous comparative study on CPDP approaches, applying its experimental design to the CVDP setting.

Results

Some CPDP approaches could improve the performance of CVDP. Among the ways of using prior-release data, using the latest prior release was the best choice. When no prior-release data were available, using CPDP data for CVDP was found to be effective.
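
As a rough illustration of the data-source choices compared here (historical releases, cross-project data, or both), a training set could be assembled as follows; the list-of-DataFrames layout and the strategy names are assumptions for illustration, not the study's artifacts.

    # Sketch of the compared training-data strategies. `releases` holds
    # per-release DataFrames of the target project, oldest first;
    # `other_projects` holds DataFrames from other projects.
    import pandas as pd

    def build_training_set(releases, other_projects, strategy):
        if strategy == "latest":          # latest prior release only
            return releases[-1]
        if strategy == "historical":      # all prior releases
            return pd.concat(releases, ignore_index=True)
        if strategy == "cross-project":   # no within-project history
            return pd.concat(other_projects, ignore_index=True)
        if strategy == "both":            # prior releases + other projects
            return pd.concat(releases + other_projects, ignore_index=True)
        raise ValueError(f"unknown strategy: {strategy}")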

Conclusions

1) Some CPDP approaches could improve CVDP; 2) if project data from the latest release are accessible, project data from older releases bring no clear additional benefit; and 3) even without prior-release data, appropriate CPDP approaches can deliver quality predictions using CPDP data.


References

  1. Amasaki S (2018) Cross-version defect prediction using cross-project defect prediction approaches. In: Proc. of PROMISE ’18. ACM, pp 32–41

  2. Amasaki S, Kawata K, Yokogawa T (2015) Improving cross-project defect prediction methods with data simplification. In: Proc. of SEAA ’15. IEEE, pp 96–103

  3. Arisholm E, Briand LC (2006) Predicting fault-prone components in a java legacy system. In: Proc. of ISESE ’06. ACM, pp 1–10

  4. Bennin KE, Toda K, Kamei Y, Keung J, Monden A, Ubayashi N (2016) Empirical evaluation of cross-release effort-aware defect prediction models. In: Proc. of QRS ’16. IEEE, pp 214–221

  5. Bin Y, Zhou K, Lu H, Zhou Y, Xu B (2017) Training data selection for cross-project defection prediction: which approach is better? In: Proc. of ESEM ’17. IEEE, pp 354–363

  6. Boucher A, Badri M (2018) Software metrics thresholds calculation techniques to predict fault-proneness: an empirical comparison. Inf Softw Technol 96:38–67

  7. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  8. Briand LC, Melo WL, Wüst J (2002) Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans Softw Eng 28(7):706–720

  9. Broomhead DS, Lowe D (1988) Multivariate functional interpolation and adaptive networks. Complex Syst 2:321–355

  10. Canfora G, De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S (2013) Multi-objective cross-project defect prediction. In: Proc. of ICST ’13. IEEE, pp 252–261

  11. Chen L, Fang B, Shang Z, Tang Y (2015) Negative samples reduction in cross-company software defects prediction. Inf Softw Technol 62(C):67–77

  12. Cheng M, Wu G, Wan H, You G, Yuan M, Jiang M (2016) Exploiting correlation subspace to predict heterogeneous cross-project defects. Int J Softw Eng Knowl Eng 26(09 & 10):1571–1580

  13. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  14. Cox DR (1958) Two further applications of a model for binary regression. Biometrika 45(3):562–565

  15. D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proc. of MSR ’10. IEEE, pp 31–41

  16. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  17. Domingos P, Pazzani M (1997) On the optimality of the simple bayesian classifier under zero-one loss. Mach Learn 29(2–3):103–130

  18. Erika CCA, Ochimizu K (2009) Towards logistic regression models for predicting fault-prone code across software projects. In: Proc. of ESEM ’09. IEEE, pp 460–463

  19. Harman M, Islam S, Jia Y, Minku LL, Sarro F, Srivisut K (2014) Less is more: temporal fault predictive performance over multiple hadoop releases. In: Proc. of SSBSE’14. Springer, pp 240–246

  20. He Z, Shu F, Yang Y, Li M, Wang Q (2012) An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng 19(2):167–199

  21. He Z, Peters F, Menzies T, Yang Y (2013) Learning from open-source projects: an empirical study on defect prediction. In: Proc. of ESEM ’13. IEEE, pp 45–54

  22. He P, Li B, Ma Y (2014) Towards cross-project defect prediction with imbalanced feature sets. CoRR 1411.4228

  23. He P, Li B, Liu X, Chen J, Ma Y (2015) An empirical study on software defect prediction with a simplified metric set. Inf Softw Technol 59:170–190

  24. Herbold S (2013) Training data selection for cross-project defect prediction. In: Proc. of PROMISE ’13. ACM, pp 6:1–6:10

  25. Herbold S (2015) CrossPare: a tool for benchmarking cross-project defect predictions. In: Proc. of ASEW ’15. IEEE, pp 90–96

  26. Herbold S, Trautsch A, Grabowski J (2017) A comparative study to benchmark cross-project defect prediction approaches. IEEE Trans Softw Eng, 1–25. https://doi.org/10.1109/TSE.2018.2790413

  27. Herbold S, Trautsch A, Grabowski J (2018) Correction of ”a comparative study to benchmark cross-project defect prediction approaches”. IEEE Trans Softw Eng, 1–5. https://doi.org/10.1109/TSE.2018.2790413

  28. Herzig K, Just S, Rau A, Zeller A (2013) Predicting defects using change genealogies. In: Proc. of ISSRE ’13. IEEE, pp 118–127

  29. Holschuh T, Pauser M, Herzig K, Zimmermann T, Premraj R, Zeller A (2009) Predicting defects in sap java code: An experience report. In: Proc. of ICSE ’09 - companion volume. IEEE, pp 172–181

  30. Hosseini S, Turhan B, Mäntylä M (2018) A benchmark study on the effectiveness of search-based data selection and feature selection for cross project defect prediction. Inf Softw Technol 95:296–312

  31. Jing X, Wu F, Dong X, Qi F, Xu B (2015) Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In: Proc. of ESEC/FSE ’15. ACM, pp 496–507

  32. Jing XY, Wu F, Dong X, Xu B (2017) An improved SDA based defect prediction framework for both within-project and cross-project class-imbalance problems. IEEE Trans Softw Eng 43(4):321–339

  33. Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: Proc. of PROMISE ’10. ACM, pp 9:1–9:10

  34. Kawata K, Amasaki S, Yokogawa T (2015) Improving relevancy filter methods for cross-project defect prediction. In: Proc. of ACIT-CSI ’15, pp 2–7

  35. Khoshgoftaar TM, Seliya N (2003) Fault prediction modeling for software quality estimation: comparing commonly used techniques. Empir Softw Eng 8:3

  36. Khoshgoftaar TM, Rebours P, Seliya N (2009) Software quality analysis by combining multiple projects and learners. Softw Qual J 17(1):25–49

  37. Li Z, Jing XY, Zhu X, Zhang H, Xu B, Ying S (2017) On the multiple sources and privacy preservation issues for heterogeneous defect prediction. IEEE Trans Softw Eng, 1–21

  38. Liu Y, Khoshgoftaar TM, Seliya N (2010) Evolutionary optimization of software quality modeling with multiple repositories. IEEE Trans Softw Eng 36(6):852–864

  39. Lu H, Kocaguneli E, Cukic B (2014) Defect prediction between software versions with active learning and dimensionality reduction. In: Proc. of ISSRE ’14. IEEE, pp 312–322

  40. Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256

  41. Madeyski L, Jureczko M (2015) Which process metrics can significantly improve defect prediction models? An empirical study. Softw Qual J 23(3):1–30

  42. Menzies T, Butcher A, Marcus A, Zimmermann T, Cok D (2011) Local versus global models for effort estimation and defect prediction. In: Proc. of ASE ’11. IEEE, pp 343–351

  43. Monden A, Hayashi T, Shinoda S, Shirai K, Yoshida J, Barker M, Matsumoto K (2013) Assessing the cost effectiveness of fault prediction in acceptance testing. IEEE Trans Softw Eng 39(10):1345–1357

  44. Nam J, Kim S (2015) CLAMI: defect prediction on unlabeled datasets. In: Proc. of ASE ’15. IEEE, pp 452–463

  45. Nam J, Pan SJ, Kim S (2013) Transfer defect learning. In: Proc. of ICSE ’13. IEEE, pp 382–391

  46. Nam J, Fu W, Kim S, Menzies T, Tan L (2018) Heterogeneous defect prediction. IEEE Trans Softw Eng 44(9):874–896

  47. Panichella A, Oliveto R, De Lucia A (2014) Cross-project defect prediction models: L’Union fait la force. In: Proc. of CSMR-WCRE ’14. IEEE, pp 164–173

  48. Peters F, Menzies T (2012) Privacy and utility for defect prediction: experiments with MORPH. In: Proc. of ICSE ’12. IEEE, pp 189–199

  49. Peters F, Menzies T, Gong L, Zhang H (2013a) Balancing privacy and utility in cross-company defect prediction. IEEE Trans Softw Eng 39(8):1054–1068

  50. Peters F, Menzies T, Marcus A (2013b) Better cross company defect prediction. In: MSR ’13: 10th IEEE working conference on mining software repositories. IEEE, pp 409–418

  51. Peters F, Menzies T, Layman L (2015) LACE2: better privacy-preserving data sharing for cross project defect prediction. In: Proc. of ICSE ’15. IEEE, pp 801–811

  52. Premraj R, Herzig K (2011) Network versus code metrics to predict defects: a replication study. In: Proc. of ESEM ’11. IEEE, pp 215–224

  53. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc

  54. Rahman F, Posnett D, Devanbu P (2012) Recalling the ”imprecision” of cross-project defect prediction. In: Proc. of ESEC/FSE ’12. ACM, pp 61:1–61:11

  55. Rana R, Staron M, Berger C, Hansson J, Nilsson M, Meding W (2014) The adoption of machine learning techniques for software defect prediction: an initial industrial validation. In: Proc. of joint conference on knowledge-based software engineering. Springer, pp 270–285

  56. Ryu D, Choi O, Baik J (2014) Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empir Softw Eng 21(1):1–29

  57. Ryu D, Jang JI, Baik J (2015) A hybrid instance selection using nearest-neighbor for cross-project defect prediction. J Comput Sci Technol 30(5):969–980

  58. Sarro F, Di Martino S, Ferrucci F, Gravino C (2012) A further analysis on the use of genetic algorithm to configure support vector machines for inter-release fault prediction. In: Proc. of SAC ’12. ACM, pp 1215–1220

  59. Shepperd MJ, Song Q, Sun Z, Mair C (2013) Data quality: some comments on the NASA software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215

  60. Tosun A, Bener A, Turhan B, Menzies T (2010) Practical considerations in deploying statistical methods for defect prediction: a case study within the Turkish telecommunications industry. Inf Softw Technol 52(11):1242–1257

  61. Turhan B (2012) On the dataset shift problem in software engineering prediction models. Empir Softw Eng 17(1–2):62–74

  62. Turhan B, Menzies T, Bener AB, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14 (5):540–578

  63. Turhan B, Tosun AM, Bener AB (2013) Empirical evaluation of the effects of mixed project data on learning defect predictors. Inf Softw Technol 55(6):1101–1118

  64. Uchigaki S, Uchida S, Toda K, Monden A (2012) An ensemble approach of simple regression models to cross-project fault prediction. In: Proc. of SNPD ’12. IEEE, pp 476–481

  65. Watanabe S, Kaiya H, Kaijiri K (2008) Adapting a fault prediction model to allow inter languagereuse. In: Proc. of PROMISE ’08. ACM, pp 19–24

  66. Wu R, Zhang H, Kim S, Cheung SC (2011) ReLink: recovering links between bugs and changes. In: Proc. of ESEC/FSE ’11. ACM, pp 15–25

  67. Xia X, Lo D, Pan SJ, Nagappan N, Wang X (2016) HYDRA: massively compositional model for cross-project defect prediction. IEEE Trans Softw Eng 42 (10):977–998

  68. Xu Z, Li S, Tang Y, Luo X, Zhang T, Liu J, Xu J (2018a) Cross version defect prediction with representative data via sparse subset selection. In: Proc. of ICPC ’18. ACM, pp 1–12

  69. Xu Z, Liu J, Luo X, Zhang T (2018b) Cross-version defect prediction via hybrid active learning with kernel principal component analysis. In: Proc. of SANER ’18. IEEE, pp 209–220

  70. Yu Q, Jiang S, Zhang Y (2017) A feature matching and transfer approach for cross-company defect prediction. J Syst Softw 132:366–378

  71. Yu X, Wu M, Jian Y, Bennin KE, Fu M, Ma C (2018) Cross-company defect prediction via semi-supervised clustering-based data filtering and MSTrA-based transfer learning. Soft Comput 22(10):1–12

  72. Zhang Y, Lo D, Xia X, Sun J (2015) An Empirical Study of Classifier Combination for Cross-Project Defect Prediction. In: Proc. of COMPSAC ’15. IEEE, pp 264–269

  73. Zhang F, Zheng Q, Zou Y, Hassan AE (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proc. of ICSE ’16. ACM, pp 309–320

  74. Zhang Y, Lo D, Xia X, Sun J (2018) Combined classifier for cross-project defect prediction: an extended empirical study. Front Comput Sci 12(2):280–296

  75. Zhao Y, Yang Y, Lu H, Liu J, Leung H, Wu Y, Zhou Y, Xu B (2017) Understanding the value of considering client usage context in package cohesion for fault-proneness prediction. Autom Softw Eng 24(2):393–453

  76. Zhou Y, Yang Y, Lu H, Chen L, Li Y, Zhao Y, Qian J, Xu B (2018) How far we have progressed in the journey? an examination of cross-project defect prediction. ACM Trans Softw Eng Methodol 27(1):1–51

  77. Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proc. of ESEC/FSE ’09. ACM, pp 91–100

Download references

Acknowledgments

This work was partially supported by JSPS KAKENHI under Grant No. 18K11246.

Author information

Correspondence to Sousuke Amasaki.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Predictive Models and Data Analytics in Software Engineering (PROMISE)

Communicated by: Shane McIntosh, Leandro L. Minku, Ayşe Tosun, and Burak Turhan

Cite this article

Amasaki, S. Cross-version defect prediction: use historical data, cross-project data, or both? Empir Software Eng (2020). https://doi.org/10.1007/s10664-019-09777-8


Keywords

  • Cross-version defect prediction
  • Cross-project defect prediction
  • Comparative study