Empirical Software Engineering, Volume 18, Issue 1, pp 1–24

Kernel methods for software effort estimation

Effects of different kernel functions and bandwidths on estimation accuracy
  • Ekrem Kocaguneli
  • Tim Menzies
  • Jacky W. Keung


Analogy-based estimation (ABE) generates an effort estimate for a new software project by adapting similar past projects (a.k.a. analogies). The majority of ABE methods use uniform weighting in the adaptation procedure. In this research we investigated non-uniform weighting through kernel density estimation. After extensive experimentation with 19 datasets, 3 evaluation criteria, 5 kernels, 5 bandwidth values and a total of 2090 ABE variants, we found that: (1) non-uniform weighting through kernel methods cannot outperform uniform weighting ABE and (2) kernel type and bandwidth parameters do not produce a definite effect on estimation performance. In summary, simple ABE approaches are able to perform better than much more complex approaches. Hence, provided that similar experimental settings are adopted, we discourage the use of kernel methods as a weighting strategy in ABE.
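The difference between uniform and kernel-weighted adaptation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean distance measure, the choice of k, and the Gaussian kernel below are all assumptions made for illustration, and the abstract does not say which five kernels or bandwidths were evaluated.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel (one illustrative choice, assumed here)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def abe_estimate(new_project, past_projects, k=3, kernel=None, bandwidth=1.0):
    """Estimate effort for new_project from its k nearest past projects.

    past_projects is a list of (features, effort) pairs (hypothetical layout).
    kernel=None gives uniform weighting: the plain mean of the analogies.
    Otherwise each analogy is weighted by kernel(distance / bandwidth).
    """
    if not past_projects:
        raise ValueError("at least one past project is required")

    # Rank past projects by Euclidean distance to the new project (assumption).
    ranked = sorted(
        ((math.dist(new_project, feats), effort) for feats, effort in past_projects),
        key=lambda pair: pair[0],
    )
    analogies = ranked[:k]

    if kernel is None:
        return sum(effort for _, effort in analogies) / len(analogies)

    weights = [kernel(dist / bandwidth) for dist, _ in analogies]
    total = sum(weights)
    if total == 0.0:
        # All weights vanished (e.g. tiny bandwidth): fall back to the plain mean.
        return sum(effort for _, effort in analogies) / len(analogies)
    return sum(w * effort for w, (_, effort) in zip(weights, analogies)) / total


# Toy data, invented for illustration only.
history = [((1.0, 1.0), 10.0), ((2.0, 2.0), 20.0), ((10.0, 10.0), 100.0)]
uniform = abe_estimate((1.0, 1.0), history, k=2)
weighted = abe_estimate((1.0, 1.0), history, k=2, kernel=gaussian_kernel, bandwidth=1.0)
```

With uniform weighting the two nearest analogies contribute equally; with the kernel, the closer analogy dominates, and the bandwidth controls how quickly influence decays with distance.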


Effort estimation, Data mining, Kernel function, Bandwidth


Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Ekrem Kocaguneli (1)
  • Tim Menzies (1)
  • Jacky W. Keung (2)

  1. Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, USA
  2. Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
