Abstract
Software effort estimation (SEE) models have been studied for decades. A serious yet typical problem for data-oriented models is the limited availability of datasets for training. Cross-company software effort estimation (CCSEE) is a research topic that aims to utilize project data from outside an organization. The same problem was identified in software defect prediction research, where cross-project defect prediction (CPDP) has been studied. CPDP and CCSEE share the same motivation and have developed approaches to the same problem. A question thus arises: are CPDP approaches applicable to CCSEE? This study explored this question with a survey and an empirical study focusing on homogeneous approaches. We first examined 24 homogeneous CPDP approach implementations provided in a CPDP framework and found that more than half of them were applicable to CCSEE. Next, we used the results of two past studies to check whether the applicable CPDP approaches covered the strategies taken in past CPDP studies. An empirical experiment was then conducted to evaluate the estimation performance of the applicable CPDP approaches. The CPDP approaches were configured with several machine learning techniques where available, and ten CCSEE dataset configurations were supplied for evaluation. The results were also compared with those of the two past studies to identify commonalities and differences between CPDP and CCSEE. The answers to the research questions revealed that our CPDP selection covered the strategies that past CPDP approaches had taken. The empirical results favored a simple merge of cross-company data with no instance weighting, no feature selection, no data transformation, and no simple voting. This was not necessarily surprising in light of prior CPDP and CCSEE studies, but it had not been explored with CPDP approaches in a CCSEE setting. A practical implication is: combine cross-company data to make effort estimates.
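The "simple merge" strategy the abstract favors can be sketched in a few lines. This is an illustrative example only, not the paper's exact pipeline: the feature (project size) and the use of scikit-learn's `LinearRegression` on synthetic data are assumptions for demonstration. The idea is to pool cross-company project data with the scarce within-company data and train one model on the union, with no instance weighting, feature selection, or data transformation.

```python
# Hypothetical sketch of the "simple merge" CCSEE strategy on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def make_projects(n, slope, noise):
    """Generate synthetic projects: one feature (size) and actual effort."""
    size = rng.uniform(10, 100, n)
    effort = slope * size + rng.normal(0, noise, n)
    return size.reshape(-1, 1), effort

X_cc, y_cc = make_projects(200, slope=8.0, noise=20.0)    # cross-company pool
X_wc, y_wc = make_projects(5, slope=8.5, noise=20.0)      # scarce within-company data
X_test, y_test = make_projects(50, slope=8.5, noise=20.0) # held-out within-company projects

# Simple merge: pool all rows as-is, with no weighting or transformation.
X_train = np.vstack([X_cc, X_wc])
y_train = np.concatenate([y_cc, y_wc])
model = LinearRegression().fit(X_train, y_train)
mae_merged = np.mean(np.abs(model.predict(X_test) - y_test))

# Baseline: train on within-company data only.
model_wc = LinearRegression().fit(X_wc, y_wc)
mae_wc = np.mean(np.abs(model_wc.predict(X_test) - y_test))
print(f"MAE merged: {mae_merged:.1f}, MAE within-only: {mae_wc:.1f}")
```

With only a handful of within-company projects, the merged training set typically stabilizes the fitted model; the study's point is that this plain pooling was competitive without the filtering or transformation machinery developed for CPDP.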
Acknowledgments
This work was partially supported by JSPS KAKENHI under Grant Nos. 18K11246, 21K11831, and 21K11833.
Additional information
Communicated by: Tim Menzies and Mei Nagappan
This article belongs to the Topical Collection: Inventing the Next Generation of Software Analytics
Cite this article
Amasaki, S., Aman, H. & Yokogawa, T. An extended study on applicability and performance of homogeneous cross-project defect prediction approaches under homogeneous cross-company effort estimation situation. Empir Software Eng 27, 46 (2022). https://doi.org/10.1007/s10664-021-10103-4