Abstract
This paper investigates the use of several techniques, including genetic programming (GP), applied to public data sets, to model and hence estimate software project effort. The main research question is whether genetic programs can offer a ‘better’ solution search when using public domain metrics rather than company-specific ones. Unlike most previous research, a realistic approach is taken, whereby predictions are made on the basis of the data available at a given date. Experiments are reported that are designed to assess the accuracy of estimates made using data from within and beyond a specific company. This research also offers insights into genetic programming’s performance, relative to alternative methods, as a problem solver in this domain. The results do not identify a clear winner; for this data, GP performs consistently well but is harder to configure and produces more complex models. The evidence here agrees with other researchers’ findings that companies would do well to base estimates on in-house data rather than incorporating public data sets. The complexity of GP must be weighed against the small increases in accuracy when deciding whether to use it as part of any effort estimation.
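The idea of predicting only from data available at a given date can be illustrated with a minimal Python sketch. It is not the authors' experimental setup: the `Project` fields, the `min_history` threshold, the sample projects, and the simple productivity-rate estimator (used here in place of genetic programming or any other method from the paper) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Project:
    name: str
    start: date
    end: date
    size: float    # hypothetical size measure, e.g. function points
    effort: float  # actual effort, e.g. person-hours


def productivity_estimate(history, size):
    """Stand-in estimator: mean effort-per-size of past projects times size."""
    rate = mean(p.effort / p.size for p in history)
    return rate * size


def predict_chronologically(projects, min_history=2):
    """Estimate each project using only projects finished before it started."""
    results = []
    for target in sorted(projects, key=lambda p: p.start):
        # Only projects completed before the target began are "known".
        history = [p for p in projects if p.end < target.start]
        if len(history) < min_history:
            continue  # too few completed projects available at that date
        estimate = productivity_estimate(history, target.size)
        results.append((target.name, estimate, target.effort))
    return results


# Invented sample data, for illustration only.
projects = [
    Project("A", date(2000, 1, 1), date(2000, 6, 30), 100.0, 1000.0),
    Project("B", date(2000, 3, 1), date(2000, 9, 30), 200.0, 3000.0),
    Project("C", date(2000, 8, 1), date(2001, 2, 28), 150.0, 1800.0),
    Project("D", date(2001, 1, 1), date(2001, 7, 31), 120.0, 1500.0 + 100.0),
]
results = predict_chronologically(projects)
```

With this sample data only project D has two completed predecessors (A and B) at its start date, so it is the only project that receives an estimate. A random train/test split would instead let later projects inform estimates for earlier ones, which is the unrealistic situation the abstract contrasts itself with.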
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lefley, M., Shepperd, M.J. (2003). Using Genetic Programming to Improve Software Effort Estimation Based on General Data Sets. In: Cantú-Paz, E., et al. Genetic and Evolutionary Computation — GECCO 2003. GECCO 2003. Lecture Notes in Computer Science, vol 2724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45110-2_151
DOI: https://doi.org/10.1007/3-540-45110-2_151
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40603-7
Online ISBN: 978-3-540-45110-5
eBook Packages: Springer Book Archive