Kernel methods for software effort estimation

Effects of different kernel functions and bandwidths on estimation accuracy

Empirical Software Engineering

Abstract

Analogy-based estimation (ABE) generates an effort estimate for a new software project by adapting similar past projects (a.k.a. analogies). The majority of ABE methods weight analogies uniformly in the adaptation procedure. In this research we investigated non-uniform weighting through kernel density estimation. After extensive experimentation with 19 datasets, 3 evaluation criteria, 5 kernels, 5 bandwidth values and a total of 2090 ABE variants, we found that: (1) non-uniform weighting through kernel methods cannot outperform uniform-weighting ABE and (2) kernel type and bandwidth parameters do not produce a definite effect on estimation performance. In summary, simple ABE approaches are able to perform better than much more complex approaches. Hence, provided that similar experimental settings are adopted, we discourage the use of kernel methods as a weighting strategy in ABE.
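To make the contrast between uniform and kernel-based weighting concrete, the following is a minimal sketch and not the authors' implementation. It assumes Euclidean distance over numeric project features, a fixed number of analogies `k`, and a kernel applied to the feature-space distance scaled by a bandwidth. The names `abe_estimate`, `gaussian_kernel` and `uniform_kernel` are hypothetical, and the exact quantity the paper passes to the kernel may differ.

```python
import math

def uniform_kernel(u):
    # Assumed definition: constant weight inside the window, zero outside.
    return 0.5 if abs(u) <= 1.0 else 0.0

def gaussian_kernel(u):
    # Standard Gaussian density, used here only as an example kernel.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def abe_estimate(new_features, past_projects, k=3, kernel=None, bandwidth=1.0):
    """Estimate effort from the k nearest past projects.

    past_projects is a list of (features, effort) pairs (hypothetical layout).
    With kernel=None the k analogies are averaged with equal weights;
    otherwise each analogy is weighted by kernel(distance / bandwidth).
    """
    ranked = sorted(past_projects, key=lambda p: math.dist(new_features, p[0]))
    analogies = ranked[:k]
    efforts = [effort for _, effort in analogies]
    if kernel is None:
        return sum(efforts) / len(efforts)
    weights = [kernel(math.dist(new_features, feats) / bandwidth)
               for feats, _ in analogies]
    total = sum(weights)
    if total == 0.0:
        # Every analogy fell outside the kernel window: fall back to the mean.
        return sum(efforts) / len(efforts)
    return sum(w * e for w, e in zip(weights, efforts)) / total

# Toy data, invented purely for illustration.
projects = [([1.0], 10.0), ([2.0], 20.0), ([10.0], 100.0)]
uniform = abe_estimate([1.0], projects, k=2)
weighted = abe_estimate([1.0], projects, k=2, kernel=gaussian_kernel, bandwidth=1.0)
```

With the toy data, the uniform estimate is the plain mean of the two nearest efforts, while the Gaussian-weighted estimate is pulled toward the closer project. The abstract's finding is that, across the datasets studied, this extra machinery did not improve accuracy over the uniform mean.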

Notes

  1. Note that the effort values in software effort datasets are stored in a single column; hence our space is 1-dimensional. In other words, V_n in this formula is also 1-dimensional and reduces to the bandwidth value h, i.e. V_n = h.
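The formula this note refers to is not reproduced on this page. Assuming it is the standard Parzen-window (kernel) density estimate, the reduction can be written out as follows; the symbols n (number of effort values), x_i (the i-th effort value), K (the kernel function) and d (the dimensionality) are notation assumed here for illustration.

```latex
\hat{p}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\,
               K\!\left(\frac{x - x_i}{h}\right),
\qquad V_n = h^{d}.

% With a single effort column, d = 1, so V_n = h and the estimate becomes
\hat{p}_n(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right).
```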

Author information

Correspondence to Ekrem Kocaguneli.

Additional information

Editor: D.H. Rombach

The work was partially funded by NSF grant CCF:1017330 and the Qatar/West Virginia University research grant NPRP 09-12-5-2-470.

About this article

Cite this article

Kocaguneli, E., Menzies, T. & Keung, J.W. Kernel methods for software effort estimation. Empir Software Eng 18, 1–24 (2013). https://doi.org/10.1007/s10664-011-9189-1

  • DOI: https://doi.org/10.1007/s10664-011-9189-1
