Negative results for software effort estimation


More than half the literature on software effort estimation (SEE) focuses on comparisons of new estimation methods. Surprisingly, there are no studies comparing state-of-the-art methods with decades-old approaches. Accordingly, this paper takes five steps to check whether new SEE methods generate better estimates than older methods. First, we collect effort estimation methods ranging from “classical” COCOMO (parametric estimation over a pre-determined set of attributes) to “modern” ones (reasoning via analogy using spectral-based clustering plus instance and feature selection, and a recent “baseline method” proposed in ACM Transactions on Software Engineering and Methodology). Second, we catalog the objections that led to the development of post-COCOMO estimation methods. Third, we characterize each of those objections as a comparison between newer and older estimation methods. Fourth, using four COCOMO-style data sets (from 1991, 2000, 2005, and 2010), we run those comparison experiments. Fifth, we compare the performance of the different estimators using a Scott-Knott procedure with (i) the A12 effect size to rule out “small” differences and (ii) a 99 % confidence bootstrap procedure to check for statistically different groupings of treatments. The major negative result of this paper is that, for the COCOMO data sets, nothing we studied did any better than Boehm’s original procedure. Hence, when COCOMO-style attributes are available, we strongly recommend (i) using that data and (ii) using COCOMO to generate predictions. We say this because the experiments of this paper show that, at least for effort estimation, how data is collected is more important than which learner is applied to that data.
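The “classical” COCOMO baseline is a parametric model: effort grows as a power of size, adjusted by a fixed set of attributes. A minimal sketch of the COCOMO II post-architecture form follows. It assumes the published COCOMO II.2000 constants (A = 2.94, B = 0.91) as defaults and uses a simple log-space least-squares refit of A and B on local projects. The function names and the project tuple layout are hypothetical, and this is not necessarily the exact calibration procedure used in the paper.

```python
import math
from typing import Iterable, Sequence, Tuple

# Published COCOMO II.2000 default calibration constants.
DEFAULT_A = 2.94
DEFAULT_B = 0.91

# One historical project: (KSLOC, scale factors, effort multipliers, actual person-months).
Project = Tuple[float, Sequence[float], Sequence[float], float]


def cocomo2_effort(ksloc: float,
                   scale_factors: Sequence[float],
                   effort_multipliers: Sequence[float],
                   a: float = DEFAULT_A,
                   b: float = DEFAULT_B) -> float:
    """Person-months = a * KSLOC**E * prod(EM), with exponent E = b + 0.01 * sum(SF)."""
    exponent = b + 0.01 * sum(scale_factors)
    return a * ksloc ** exponent * math.prod(effort_multipliers)


def local_calibration(projects: Iterable[Project]) -> Tuple[float, float]:
    """Re-fit (a, b) on local data by ordinary least squares in log space.

    Taking logs of the effort equation and moving the known terms left gives
        ln(PM) - sum(ln EM) - 0.01 * sum(SF) * ln(KSLOC) = ln(a) + b * ln(KSLOC),
    a straight line in x = ln(KSLOC) with slope b and intercept ln(a).
    Needs at least two projects with different sizes.
    """
    xs, ys = [], []
    for ksloc, sfs, ems, actual_pm in projects:
        x = math.log(ksloc)
        y = (math.log(actual_pm)
             - sum(math.log(em) for em in ems)
             - 0.01 * sum(sfs) * x)
        xs.append(x)
        ys.append(y)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b
```

On noise-free projects generated with some (a, b), `local_calibration` recovers those constants. On real data the refit absorbs whatever systematic bias the local projects share.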

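The “modern” analogy-based methods mentioned in the abstract build on a simple core: estimate a new project’s effort from the efforts of its most similar past projects. A minimal k-nearest-neighbor sketch follows. It omits the spectral clustering and the instance and feature selection described above, and its min-max normalization, Euclidean distance, default k = 3, and median aggregation are illustrative assumptions.

```python
import math
import statistics
from typing import List, Sequence


def normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def analogy_estimate(train_x: Sequence[Sequence[float]],
                     train_y: Sequence[float],
                     query: Sequence[float],
                     k: int = 3) -> float:
    """Estimate effort for `query` as the median effort of its k nearest past projects."""
    # Scale training rows and the query together so distances use one common range.
    *train_scaled, query_scaled = normalize(list(train_x) + [list(query)])
    neighbors = sorted(
        (math.dist(row, query_scaled), effort)
        for row, effort in zip(train_scaled, train_y)
    )
    return statistics.median([effort for _, effort in neighbors[:k]])
```

The median is used here so that a single unusual analogy cannot drag the estimate far. A mean of the k analogies is an equally common choice.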

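The ranking in the fifth step can be sketched as below. Treatments (for example, the error scores of each estimator) are sorted by median. The sorted list is then cut recursively where the expected squared difference of means is largest, and a cut is kept only if A12 does not call the difference “small” and a bootstrap test at 99 % confidence (`conf=0.01`) also finds the two halves different. This is a minimal sketch: the 0.6 A12 threshold, the test statistic, `b=1000` resamples, and the fixed random seed are assumptions, not necessarily the settings used in the paper.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence


def a12(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Vargha-Delaney A12: chance a value from xs exceeds one from ys (ties count half)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    equal = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * equal) / (len(xs) * len(ys))


def _t_stat(ys: Sequence[float], zs: Sequence[float]) -> float:
    """Absolute difference of means scaled by the combined standard error."""
    diff = abs(statistics.fmean(ys) - statistics.fmean(zs))
    se2 = statistics.pvariance(ys) / len(ys) + statistics.pvariance(zs) / len(zs)
    if se2 == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / se2 ** 0.5


def bootstrap_differs(ys, zs, conf=0.01, b=1000,
                      rng: Optional[random.Random] = None) -> bool:
    """Bootstrap test: True if the means differ at significance level `conf`."""
    if rng is None:
        rng = random.Random(1)  # fixed seed so results are repeatable
    t_obs = _t_stat(ys, zs)
    grand = statistics.fmean(list(ys) + list(zs))
    my, mz = statistics.fmean(ys), statistics.fmean(zs)
    # Shift both samples onto the grand mean so the null hypothesis holds.
    y_hat = [y - my + grand for y in ys]
    z_hat = [z - mz + grand for z in zs]
    bigger = sum(
        1 for _ in range(b)
        if _t_stat(rng.choices(y_hat, k=len(y_hat)),
                   rng.choices(z_hat, k=len(z_hat))) > t_obs
    )
    return bigger / b < conf


def scott_knott(treatments: Dict[str, Sequence[float]],
                small: float = 0.6, conf: float = 0.01) -> Dict[str, int]:
    """Rank treatments by median; treatments that cannot be separated share a rank."""
    names = sorted(treatments, key=lambda n: statistics.median(treatments[n]))

    def pooled(group: List[str]) -> List[float]:
        return [v for n in group for v in treatments[n]]

    def split(group: List[str]) -> List[List[str]]:
        everything = pooled(group)
        mu = statistics.fmean(everything)
        best, cut = 0.0, None
        # Pick the cut that maximizes the expected squared difference of means.
        for i in range(1, len(group)):
            left, right = pooled(group[:i]), pooled(group[i:])
            score = (len(left) * (statistics.fmean(left) - mu) ** 2
                     + len(right) * (statistics.fmean(right) - mu) ** 2) / len(everything)
            if score > best:
                best, cut = score, i
        if cut is not None:
            left, right = pooled(group[:cut]), pooled(group[cut:])
            effect = a12(left, right)
            # Keep the cut only if the effect is not small AND the bootstrap agrees.
            if max(effect, 1 - effect) >= small and bootstrap_differs(left, right, conf):
                return split(group[:cut]) + split(group[cut:])
        return [group]

    return {n: rank for rank, group in enumerate(split(names), start=1) for n in group}
```

Lower ranks mean lower medians, which suits error measures where smaller is better. Requiring both tests to agree means a cut is rejected when a difference is statistically detectable but too small to matter.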
Figs. 1–12


Notes

  1. For full details on these attributes, see Section 4 of this paper.

References


  1. Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: ICSE’11, pp 1–10

  2. Auer M, Trendowicz A, Graser B, Haunschmid E, Biffl S (2006) Optimal project feature weights in analogy-based cost estimation: improvement and limitations. IEEE Trans Softw Eng 32:83–92

  3. Baker D (2007) A hybrid approach to expert and model-based effort estimation. Master’s thesis, Lane Department of Computer Science and Electrical Engineering, West Virginia University

  4. Black R, Curnow R, Katz R, Bray M (1977) BCS software production data, final technical report RADC-TR-77-116. Technical report, Boeing Computer Services, Inc

  5. Boehm B (1981) Software engineering economics. Prentice Hall, Englewood Cliffs

  6. Boehm B (2000) Safe, simple software cost analysis. IEEE Softw:14–17

  7. Boehm B, Horowitz E, Madachy R, Reifer D, Clark BK, Steece B, Brown AW, Chulani S, Abts C (2000) Software Cost Estimation with COCOMO II. Prentice Hall, Englewood Cliffs

  8. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees

  9. Burgess CJ, Lefley M (2001) Can genetic programming improve software effort estimation? A comparative evaluation. Inf Softw Technol 43(14):863–873

  10. Chen Z, Boehm B, Menzies T, Port D (2005) Finding the right data for software cost modeling. IEEE Softw 22:38–46

  11. Chen Z, Menzies T, Port D (2005) Feature subset selection can improve software cost estimation. In: PROMISE’05

  12. Chulani S, Boehm B, Steece B (1999) Bayesian analysis of empirical software engineering cost models. IEEE Trans Softw Eng 25(4)

  13. Cohen PR (1995) Empirical methods for artificial intelligence. MIT Press, Cambridge

  14. Corazza A, Di Martino S, Ferrucci F, Gravino C, Sarro F, Mendes E (2010) How effective is tabu search to configure support vector regression for effort estimation? In: Proceedings of the 6th international conference on predictive models in software engineering, PROMISE ’10, pp 4:1–4:10

  15. Cordero R, Costamagna M, Paschetta E (1997) A genetic algorithm approach for the calibration of COCOMO-like models. In: 12th COCOMO Forum

  16. Dabney JB (2002) Return on investment for IV&V. NASA-funded study

  17. Dejaeger K, Verbeke W, Martens D, Baesens B (2012) Data mining techniques for software effort estimation: a comparative study. IEEE Trans Softw Eng 38:375–397

  18. Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Mono. Stat. Appl. Probab. Chapman and Hall, London

  19. Freiman F, Park R (1979) Price software model - version 3: An overview. In: Proceedings IEEE-PINY workshop on quantitative software models, IEEE catalog number TH 0067-9, pp 32–41

  20. Herd J, Postak J, Russell W, Stewart J (1977) Software cost estimation study: study results, final technical report RADC-TR-77-220. Technical report, Doty Associates

  21. Ingold D, Boehm B, Koolmanojwong S (2013) A model for estimating agile project process and schedule acceleration. In: ICSSP 2013, pp 29–35

  22. Jensen R (1983) An improved macrolevel software development resource estimation model, pp 88–92

  23. Jørgensen M (2015) The world is skewed: ignorance, use, misuse, misunderstandings, and how to improve uncertainty analyses in software development projects. In: 2015 CREST workshop

  24. Jørgensen M, Gruschke TM (2009) The impact of lessons-learned sessions on effort estimation and uncertainty assessments. IEEE Trans Softw Eng 35(3):368–383

  25. Jørgensen M, Shepperd M (2007) A systematic review of software development cost estimation studies

  26. Jørgensen M (2004) A review of studies on expert estimation of software development effort. J Syst Softw 70(1–2):37–60

  27. Li M, Mao K, Yang Y, Harman M (2013) Pricing crowdsourcing-based software development tasks. In: ICSE, new ideas and emerging results, San Francisco, CA, USA, pp 1205–1208

  28. Kadoda G, Cartwright M, Chen L, Shepperd M (2000) Experiences using case-based reasoning to predict software project effort

  29. Kampenes VB, Dybå T, Hannay JE, Sjøberg DIK (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11–12):1073–1086

  30. Keung JW (2008) Empirical evaluation of analogy-x for software cost estimation. In: ESEM ’08: international symposium on empirical software engineering and measurement. ACM, New York, NY, USA, pp 294–296

  31. Keung JW, Kitchenham B (2008) Experiments with analogy-x for software cost estimation. In: ASWEC ’08: proceedings of the 19th Australian conference on software engineering. IEEE Computer Society, Washington, DC, USA, pp 229–238

  32. Keung JW, Kitchenham BA, Jeffery DR (2008) Analogy-x: providing statistical inference to analogy-based software cost estimation. IEEE Trans Softw Eng 34(4):471–484

  33. Kirsopp C, Shepperd M (2002) Making inferences with small numbers of training sets. IEEE Proc:149

  34. Kocaguneli E, Menzies T, Bener A, Keung J (2012) Exploiting the essential assumptions of analogy-based effort estimation. IEEE Trans Softw Eng 38:425–438

  35. Kocaguneli E, Menzies T, Keung JW (2012) On the value of ensemble effort estimation. IEEE Trans Softw Eng 38(6):1403–1416

  36. Kocaguneli E, Menzies T, Keung J, Cok D, Madachy R (2013) Active learning and effort estimation: finding the essential content of software effort estimation data. IEEE Trans Softw Eng 39(8):1040–1053

  37. Kocaguneli E, Menzies T, Mendes E (2014) Transfer learning in effort estimation. Empir Softw Eng:1–31

  38. Kocaguneli E, Zimmermann T, Bird C, Nagappan N, Menzies T (2013) Distributed development considered harmful? In: ICSE, pp 882–890

  39. Li J, Ruhe G (2006) A comparative study of attribute weighting heuristics for effort estimation by analogy. In: International symposium on empirical software engineering, p 74

  40. Li J, Ruhe G (2007) Decision support analysis for software effort estimation by analogy. In: PROMISE ’07: proceedings of the third international workshop on predictor models in software engineering, p 6

  41. Li J, Ruhe G (2008) Analysis of attribute weighting heuristics for analogy-based software effort estimation method aqua+. Empir Softw Eng 13:63–96

  42. Li Y, Xie M, Goh T (2009) A study of the non-linear adjustment for analogy based software cost estimation. Empir Softw Eng:603–643

  43. Lokan C, Mendes E (2006) Cross-company and single-company effort models using the ISBSG database: a further replicated study. In: The ACM-IEEE international symposium on empirical software engineering, November 21–22, Rio de Janeiro

  44. Lokan C, Mendes E (2009) Applying moving windows to software effort estimation. In: 3rd international symposium on empirical software engineering and measurement, 2009. ESEM 2009, pp 111–122

  45. Menzies T, Butcher A, Cok DR, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2013) Local versus global lessons for defect prediction and effort estimation. IEEE Trans Softw Eng 39(6):822–834

  46. Menzies T, Chen Z, Hihn J, Lum K (2006) Selecting best practices for effort estimation. IEEE Trans Softw Eng

  47. Menzies T, Dekhtyar A, Distefano J, Greenwald J (2007) Problems with precision. IEEE Trans Softw Eng.

  48. Menzies T, Kocagüneli E, Minku L, Peters F, Turhan B (2015) Chapter 20 - ensembles of learning machines. In: Sharing data and models in software engineering, pp 239–265

  49. Menzies T, Peters F, Marcus A (2013) Ooops... (errata report for “Better Cross-Company Learning”). In: MSR’13.

  50. Menzies T, Port D, Chen Z, Hihn J, Stukes S (2005) Validation methods for calibrating software effort models. In: Proceedings, ICSE

  51. Menzies T, Shepperd M (2012) Special issue on repeatable results in software engineering prediction. Empir Softw Eng 17(1–2):1–17

  52. Miller A (2002) Subset selection in regression, 2nd edn. Chapman & Hall, London

  53. Minku LL, Yao X (2011) A principled evaluation of ensembles of learning machines for software effort estimation, vol 106

  54. Minku LL, Yao X (2013) Ensembles and locality: insight on improving software effort estimation. Inf Softw Technol 55:1512–1528

  55. Minku LL, Yao X (2014) How to make best use of cross-company data in software effort estimation? In: ICSE’14, pp 446–456

  56. Mittas N, Angelis L (2013) Ranking and clustering software cost estimation models through a multiple comparisons algorithm. IEEE Trans Softw Eng 39(4):537–551

  57. Moløkken-Østvold K, Haugen NC, Benestad HC (2008) Using planning poker for combining expert estimates in software projects. J Syst Softw 81:2106–2117

  58. Murphy-Hill E, Parnin C, Black AP (2012) How we refactor, and how we know it. IEEE Trans Softw Eng 38(1):5–18

  59. Myrtveit I, Stensrud E, Shepperd M (2005) Reliability and validity in comparative studies of software prediction models. IEEE Trans Softw Eng 31(5):380–391

  60. Papakroni V (2013) Data carving: identifying and removing irrelevancies in the data. Master’s thesis, Lane Department of Computer Science and Electrical Engineering, West Virginia University

  61. Park R (1988) The central equations of the PRICE software cost model. In: 4th COCOMO users group meeting

  62. Passos C, Braun AP, Cruzes DS, Mendonca M (2011) Analyzing the impact of beliefs in software project practices. In: ESEM’11

  63. Popper KR (1963) Conjectures and refutations. Routledge and Kegan Paul

  64. Posnett D, Filkov V, Devanbu P (2011) Ecological inference in empirical software engineering. In: Proceedings of ASE’11

  65. Putnam L (1976) A macro-estimating methodology for software development, pp 38–43

  66. Scanniello G, Gravino C, Marcus A, Menzies T (2013) Class level fault prediction using software clustering. In: IEEE/ACM 28th international conference on automated software engineering, (ASE), 2013. IEEE, pp 640–645

  67. Shaw M (2001) The coming-of-age of software architecture research. In: Proceedings of the 23rd international conference on software engineering, ICSE ’01, vol 656. IEEE Computer Society, Washington, DC, USA

  68. Shepperd M, Schofield C (1997) Estimating software project effort using analogies. IEEE Trans Softw Eng 23(12)

  69. Shepperd MJ, Macdonell SG (2012) Evaluating prediction systems in software project estimation. Inf Softw Technol 54(8):820–827

  70. (2002) NASA to shut down checkout & launch control system

  71. Stanley C, Byrne MD (2013) Predicting tags for stackoverflow posts. In: Proceedings of ICCM, vol 2013

  72. Valerdi R (2011) Convergence of expert opinion via the wideband delphi method: an application in cost estimation models. In: INCOSE International Symposium, Denver, USA

  73. Walkerden F, Jeffery R (1999) An empirical study of analogy-based software effort estimation. Empir Softw Eng 4(2):135–158

  74. Walston C, Felix C (1977) A method of programming measurement and estimation. IBM Syst J 16(1):54–77

  75. Whigham PA, Owen CA, Macdonell SG (2015) A baseline model for software effort estimation. ACM Trans Softw Eng Methodol 24(3):20:1–20:11

  76. Wolverton R (1974) The cost of developing large-scale software. IEEE Trans Comput:615–636

The research described in this paper was carried out, in part, at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the US National Aeronautics and Space Administration. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not constitute or imply its endorsement by the US Government.

Author information



Corresponding author

Correspondence to Tim Menzies.

Additional information

Communicated by: Richard Paige, Jordi Cabot and Neil Ernst

About this article

Cite this article

Menzies, T., Yang, Y., Mathew, G. et al. Negative results for software effort estimation. Empir Software Eng 22, 2658–2683 (2017).


  • Effort estimation
  • CART
  • Nearest neighbor
  • Clustering
  • Feature selection
  • Prototype generation
  • Bootstrap sampling
  • Effect size
  • A12