
An extended study on applicability and performance of homogeneous cross-project defect prediction approaches under homogeneous cross-company effort estimation situation

Published in: Empirical Software Engineering

Abstract

Software effort estimation (SEE) models have been studied for decades. A serious but typical issue for data-oriented models is the limited availability of datasets for training. Cross-company software effort estimation (CCSEE) is a research topic on utilizing project data from outside an organization. The same problem was identified in software defect prediction research, where cross-project defect prediction (CPDP) has been studied. CPDP and CCSEE share the same motivation and have developed approaches to the same problem. A question thus arose: are CPDP approaches applicable to CCSEE? This study explored this question with a survey and an empirical study focusing on homogeneous approaches. We first examined 24 homogeneous CPDP approach implementations provided in a CPDP framework and found that more than half of them were applicable. Next, we used the results of two past studies to check whether the applicable CPDP approaches covered the strategies taken in past CPDP studies. An empirical experiment was then conducted to evaluate the estimation performance of the applicable CPDP approaches. The CPDP approaches were configured with different machine learning techniques where applicable, and ten CCSEE dataset configurations were used for evaluation. The results were also compared with those of the two past studies to identify commonalities and differences between CPDP and CCSEE. The answers to the research questions revealed that our selection of CPDP approaches covered the strategies that past CPDP studies had taken. The empirical results supported a simple merge of cross-company data with no instance weighting, feature selection, data transformation, or simple voting. This finding is not necessarily surprising given past CPDP and CCSEE studies, but it had not been explored with CPDP approaches under a CCSEE situation. A practical implication is: combine cross-company data to make effort estimates.
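The practical implication amounts to pooling cross-company project data as-is and training a single effort model on it. Below is a minimal sketch of that "simple merge" strategy, assuming a pandas/scikit-learn setup; the feature columns, effort unit, and choice of RandomForestRegressor are illustrative assumptions, not the datasets or learners used in the study.

```python
# Hedged sketch of the "simple merge" strategy: pool cross-company data
# without instance weighting, feature selection, or data transformation,
# then fit one effort model. Column names and the learner are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["AFP", "TeamSize"]   # hypothetical size/context features
TARGET = "Effort"                # assumed effort column (e.g., person-hours)

def simple_merge_estimate(cross_company_sets, target_projects):
    """Merge cross-company datasets as-is and estimate effort for new projects."""
    pooled = pd.concat(cross_company_sets, ignore_index=True)
    model = RandomForestRegressor(random_state=0)
    model.fit(pooled[FEATURES], pooled[TARGET])
    return pd.Series(model.predict(target_projects[FEATURES]),
                     index=target_projects.index, name="EstimatedEffort")
```

In this sketch nothing distinguishes the external data from the target company's own projects, which mirrors the "combine cross-company data" recommendation.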



Acknowledgments

This work was partially supported by JSPS KAKENHI under Grant Nos. 18K11246, 21K11831, and 21K11833.

Author information

Corresponding author

Correspondence to Sousuke Amasaki.

Additional information

Communicated by: Tim Menzies and Mei Nagappan


This article belongs to the Topical Collection: Inventing the Next Generation of Software Analytics


About this article


Cite this article

Amasaki, S., Aman, H. & Yokogawa, T. An extended study on applicability and performance of homogeneous cross-project defect prediction approaches under homogeneous cross-company effort estimation situation. Empir Software Eng 27, 46 (2022). https://doi.org/10.1007/s10664-021-10103-4

