Abstract
Analogy-based estimation (ABE) generates an effort estimate for a new software project by adapting similar past projects (a.k.a. analogies). The majority of ABE methods use uniform weighting in the adaptation procedure. In this research we investigated non-uniform weighting through kernel density estimation. After extensive experimentation with 19 datasets, 3 evaluation criteria, 5 kernels, 5 bandwidth values and a total of 2090 ABE variants, we found that: (1) non-uniform weighting through kernel methods cannot outperform uniform weighting ABE and (2) kernel type and bandwidth parameters do not produce a definite effect on estimation performance. In summary, simple ABE approaches are able to perform better than much more complex approaches. Hence, provided that similar experimental settings are adopted, we discourage the use of kernel methods as a weighting strategy in ABE.
Notes
Note that the effort values in software effort datasets are stored in a single column; hence our space is 1-dimensional. In other words, V_n in this formula is also 1-dimensional and is simply the bandwidth value h, i.e. V_n = h.
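The formula this note refers to is not reproduced in this section. As a sketch only, assuming it is the standard Parzen-window (kernel) density estimate, and with K (the kernel function), x_1, ..., x_n (the n stored effort values) and d (the dimension) introduced here purely for illustration, the general form and its 1-dimensional reduction would read:

```latex
\begin{aligned}
\hat{p}_n(x) &= \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\, K\!\left(\frac{x - x_i}{h}\right), & V_n &= h^{d}, \\
\hat{p}_n(x) &= \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), & d &= 1 \;\Rightarrow\; V_n = h .
\end{aligned}
```

Under that assumption, dividing by V_n = h is what keeps the estimate a proper density (integrating to 1) when K itself integrates to 1.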
Additional information
Editor: D.H. Rombach
The work was partially funded by NSF grant CCF:1017330 and the Qatar/West Virginia University research grant NPRP 09-12-5-2-470.
About this article
Cite this article
Kocaguneli, E., Menzies, T. & Keung, J.W. Kernel methods for software effort estimation. Empir Software Eng 18, 1–24 (2013). https://doi.org/10.1007/s10664-011-9189-1
DOI: https://doi.org/10.1007/s10664-011-9189-1