A pragmatic ensemble learning approach for effective software effort estimation

  • S.I. : ACITSEP
Innovations in Systems and Software Engineering

Abstract

The rapid growth of software technology has made software projects increasingly complex. Software effort estimation is fundamental to starting any software project, and inaccurate estimates can cause complications and setbacks for present and future projects. Several techniques have long been used for software effort estimation. As software applications have grown extensively in size and complexity, the traditional methods are no longer adequate to meet the requirements. To estimate software effort accurately, this paper proposes a gradient boosting regressor model as a robust approach. Its performance is compared with regression models such as stochastic gradient descent, K-nearest neighbor, decision tree, bagging regressor, random forest regressor, and Ada-boost regressor on the COCOMO’81 dataset, which contains 63 projects, and the CHINA dataset, which contains 499 projects. The models are evaluated using MAE, MSE, RMSE, and R². The results show that the gradient boosting regressor model performs well, obtaining an accuracy of 98% with COCOMO’81 and 93% with the CHINA dataset. The proposed method performs significantly better than all the regression models used in the comparison on both datasets.
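To make the core idea of the proposed model concrete, the following is a minimal sketch of gradient boosting for regression with squared-error loss, written in plain Python. It is not the authors' implementation (the preview does not show their code or hyperparameters). The one-split "stump" base learner, the single size feature, the function names, and the toy size/effort numbers are all assumptions made for illustration and are not drawn from COCOMO’81 or CHINA.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump on one feature by least squares."""
    values = sorted(set(xs))
    overall = mean(residuals)
    # Fallback if the feature has a single distinct value: a constant stump.
    best = (float("inf"), values[0], overall, overall)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_rounds=200, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(actual, predicted):
    return mean([(a - p) ** 2 for a, p in zip(actual, predicted)])


# Hypothetical toy data: project size vs. effort (invented numbers).
sizes = [2.0, 5.0, 10.0, 20.0, 40.0, 80.0]
efforts = [5.0, 14.0, 30.0, 65.0, 140.0, 300.0]

model = fit_gradient_boosting(sizes, efforts)
baseline_mse = mse(efforts, [mean(efforts)] * len(efforts))
boosted_mse = mse(efforts, [predict(model, x) for x in sizes])
print(f"baseline MSE: {baseline_mse:.2f}, boosted training MSE: {boosted_mse:.2f}")
```

The sketch only reports training error on six invented points, so it says nothing about generalization. A study like the one summarized here would instead fit a library implementation on the real datasets and evaluate on held-out projects.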

Abbreviations

SGD: Stochastic gradient descent
KNN: K-nearest neighbor
DT: Decision trees
BR: Bagging regressor
RFR: Random forest regressor
ABR: Ada-Boost regressor
GBR: Gradient boosting regressor
MRE: Magnitude relative error
MMRE: Mean magnitude of relative error
RMSE: Root mean square error
MAE: Mean absolute error
MdMRE: Median magnitude of relative error
MSE: Mean square error
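The error measures abbreviated above, together with the R² score named in the abstract, can be sketched in a few lines of Python. The formulas below are the conventional textbook definitions (for example, MRE as the absolute error divided by the actual effort). The preview does not show the paper's own equations, so treat these as an assumption. The function names and the three sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def mae(actual, predicted):
    """Mean absolute error."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def mse(actual, predicted):
    """Mean square error."""
    return mean([(a - p) ** 2 for a, p in zip(actual, predicted)])


def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(mse(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - centre) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mre(actual, predicted):
    """Magnitude of relative error per project (actual effort must be nonzero)."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return mean(mre(actual, predicted))


def mdmre(actual, predicted):
    """Median magnitude of relative error."""
    return median(mre(actual, predicted))


# Hypothetical effort values, invented for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

for name, metric in [("MAE", mae), ("MSE", mse), ("RMSE", rmse),
                     ("R2", r2), ("MMRE", mmre), ("MdMRE", mdmre)]:
    print(f"{name}: {metric(actual, predicted):.4f}")
```

MAE, MSE, and RMSE are expressed in the units of effort (or its square), while MRE-based measures and R² are unitless, which is why relative measures are often reported when datasets differ widely in project size.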

Author information

Corresponding author

Correspondence to P. Suresh Kumar.

Ethics declarations

Conflict of interest

The authors declare that this manuscript has no conflict of interest with any other published source and has not been published previously (partly or in full). No data have been fabricated or manipulated to support our conclusions.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Suresh Kumar, P., Behera, H.S., Nayak, J. et al. A pragmatic ensemble learning approach for effective software effort estimation. Innovations Syst Softw Eng 18, 283–299 (2022). https://doi.org/10.1007/s11334-020-00379-y

  • DOI: https://doi.org/10.1007/s11334-020-00379-y
