Empirical Software Engineering, Volume 22, Issue 5, pp 2658–2683

Negative results for software effort estimation

  • Tim Menzies
  • Ye Yang
  • George Mathew
  • Barry Boehm
  • Jairus Hihn


More than half the literature on software effort estimation (SEE) focuses on comparisons of new estimation methods. Surprisingly, there are no studies comparing state-of-the-art methods with decades-old approaches. Accordingly, this paper takes five steps to check whether new SEE methods generate better estimates than older methods. Firstly, collect effort estimation methods ranging from “classical” COCOMO (parametric estimation over a pre-determined set of attributes) to “modern” ones (reasoning via analogy using spectral-based clustering plus instance and feature selection, and a recent “baseline method” proposed in ACM Transactions on Software Engineering and Methodology). Secondly, catalog the objections that led to the development of post-COCOMO estimation methods. Thirdly, characterize each of those objections as a comparison between newer and older estimation methods. Fourthly, run those comparison experiments on four COCOMO-style data sets (from 1991, 2000, 2005, and 2010). Fifthly, compare the performance of the different estimators using a Scott-Knott procedure that applies (i) the A12 effect size to rule out “small” differences and (ii) a 99% confidence bootstrap procedure to check for statistically different groupings of treatments. The major negative result of this paper is that, for the COCOMO data sets, nothing we studied did any better than Boehm’s original procedure. Hence, when COCOMO-style attributes are available, we strongly recommend (i) using that data and (ii) using COCOMO to generate predictions. We say this because the experiments in this paper show that, at least for effort estimation, how data is collected matters more than which learner is applied to that data.
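The pairwise check in step five can be sketched as follows. This is a minimal Python sketch, not the authors’ code. It pairs the Vargha-Delaney A12 effect size with a standard two-sample bootstrap hypothesis test and treats two estimators as different only when both checks agree. Several details are assumptions for illustration: the 0.6 “small effect” threshold, the Welch-style t statistic, 1,000 resamples, a fixed random seed, and the hypothetical helper names `a12`, `bootstrap_differs`, and `different`. The recursive Scott-Knott splitting that would call such a test is omitted.

```python
import random
import statistics


def a12(xs, ys):
    """Vargha-Delaney A12: probability that a value drawn from xs is larger
    than a value drawn from ys, counting ties as half."""
    more = same = 0
    for x in xs:
        for y in ys:
            if x > y:
                more += 1
            elif x == y:
                same += 1
    return (more + 0.5 * same) / (len(xs) * len(ys))


def _delta(xs, ys):
    """Welch-style t statistic (an assumed choice of bootstrap test statistic)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    se = (statistics.variance(xs) / len(xs)
          + statistics.variance(ys) / len(ys)) ** 0.5
    if se == 0:
        # Both samples are constant: only unequal means count as a difference.
        return 0.0 if mx == my else float("inf")
    return abs(mx - my) / se


def bootstrap_differs(xs, ys, b=1000, conf=0.99, seed=1):
    """Two-sample bootstrap test: True if equal means are rejected at `conf`."""
    rng = random.Random(seed)
    observed = _delta(xs, ys)
    pooled = statistics.mean(list(xs) + list(ys))
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Shift both samples to the pooled mean so that the null hypothesis holds.
    xs0 = [x - mx + pooled for x in xs]
    ys0 = [y - my + pooled for y in ys]
    hits = 0
    for _ in range(b):
        bx = [rng.choice(xs0) for _ in xs0]
        by = [rng.choice(ys0) for _ in ys0]
        if _delta(bx, by) >= observed:
            hits += 1
    # Fraction of null resamples at least as extreme as the observed statistic.
    return hits / b < 1 - conf


def different(xs, ys, small=0.6):
    """Two treatments differ only if the A12 effect is not small AND the
    bootstrap test rejects equality (the `small` threshold is an assumption)."""
    effect = a12(xs, ys)
    if max(effect, 1 - effect) < small:
        return False
    return bootstrap_differs(xs, ys)
```

For example, given per-project error values for two estimators, `different(errors_a, errors_b)` returns True only when the effect is non-trivial and the bootstrap rejects equality. The effect-size screen runs first because a significance test alone can flag differences too small to matter in practice.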


Keywords: Effort estimation, COCOMO, CART, Nearest neighbor, Clustering, Feature selection, Prototype generation, Bootstrap sampling, Effect size, A12



The research described in this paper was carried out, in part, at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the US National Aeronautics and Space Administration. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not constitute or imply its endorsement by the US Government.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Tim Menzies (1, corresponding author)
  • Ye Yang (2)
  • George Mathew (1)
  • Barry Boehm (3)
  • Jairus Hihn (4)

  1. CS, North Carolina State University, Raleigh, USA
  2. SSE, Stevens Institute, Hoboken, USA
  3. CS, University of Southern California, Los Angeles, USA
  4. Jet Propulsion Laboratory, California Institute of Technology, Pasadena, USA
