
Kernel methods for software effort estimation

Effects of different kernel functions and bandwidths on estimation accuracy


Analogy-based estimation (ABE) generates an effort estimate for a new software project by adapting similar past projects (a.k.a. analogies). The majority of ABE methods use uniform weighting in the adaptation procedure. In this research we investigated non-uniform weighting through kernel density estimation. After extensive experimentation with 19 datasets, 3 evaluation criteria, 5 kernels, 5 bandwidth values and a total of 2090 ABE variants, we found that: (1) non-uniform weighting through kernel methods cannot outperform uniform-weighting ABE and (2) kernel type and bandwidth parameters do not produce a definite effect on estimation performance. In summary, simple ABE approaches are able to perform better than much more complex approaches. Hence, provided that similar experimental settings are adopted, we discourage the use of kernel methods as a weighting strategy in ABE.
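The contrast between uniform and kernel-based weighting can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the function names (`abe_estimate`, `uniform_kernel`, `gaussian_kernel`), the Euclidean distance measure, the choice to apply the kernel to distances in feature space, and the toy data are all assumptions made for illustration.

```python
import math

def uniform_kernel(u):
    """Uniform kernel: constant weight inside the window, zero outside."""
    return 0.5 if abs(u) <= 1.0 else 0.0

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def abe_estimate(new_features, past_projects, k=3, kernel=None, bandwidth=1.0):
    """Estimate effort for a new project from its k nearest analogies.

    past_projects is a list of (features, effort) pairs (hypothetical layout).
    With kernel=None the analogies are weighted uniformly (plain mean);
    otherwise each analogy is weighted by kernel(distance / bandwidth).
    """
    # Rank past projects by Euclidean distance to the new project.
    ranked = sorted(past_projects, key=lambda p: math.dist(new_features, p[0]))
    analogies = ranked[:k]

    if kernel is None:
        return sum(effort for _, effort in analogies) / len(analogies)

    weights = [kernel(math.dist(new_features, f) / bandwidth) for f, _ in analogies]
    total = sum(weights)
    if total == 0.0:
        # Every analogy fell outside the kernel window: fall back to the mean.
        return sum(effort for _, effort in analogies) / len(analogies)
    return sum(w * effort for w, (_, effort) in zip(weights, analogies)) / total

# Toy data: one size-like feature per project, effort in arbitrary units.
past = [((1.0,), 10.0), ((2.0,), 20.0), ((10.0,), 100.0)]
uniform_est = abe_estimate((1.0,), past, k=2)
gaussian_est = abe_estimate((1.0,), past, k=2, kernel=gaussian_kernel, bandwidth=1.0)
```

With the Gaussian kernel the closest analogy pulls the estimate toward its own effort value, whereas uniform weighting simply averages the selected analogies.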




  1. Note that the effort values in software effort datasets are stored in a single column; hence our space is 1-dimensional. In other words, V_n in this formula is also 1-dimensional and reduces to the bandwidth value h, i.e. V_n = h.
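For context, the formula this note refers to is not reproduced here. A standard form of the kernel (Parzen-window) density estimate, written with assumed symbols (K for the kernel function, x_i for the n observed effort values), shows what the substitution V_n = h amounts to in one dimension:

```latex
\hat{p}_n(x)
  = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{V_n}\,K\!\left(\frac{x - x_i}{h}\right)
  \;\overset{V_n = h}{=}\;
  \frac{1}{n h}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
```

In d dimensions the window volume would be h^d; with a single effort column, d = 1 and the volume is just h.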



Author information



Corresponding author

Correspondence to Ekrem Kocaguneli.

Additional information

The work was partially funded by NSF grant CCF:1017330 and the Qatar/West Virginia University research grant NPRP 09-12-5-2-470.

Editor: D.H. Rombach


About this article

Cite this article

Kocaguneli, E., Menzies, T. & Keung, J.W. Kernel methods for software effort estimation. Empir Software Eng 18, 1–24 (2013).

Keywords


  • Effort estimation
  • Data mining
  • Kernel function
  • Bandwidth