Abstract
Prediction systems adopting a case-based reasoning (CBR) approach have been widely advocated. However, as with most machine learning techniques, feature and case subset selection can strongly influence the quality of the predictions generated. Unfortunately, both are NP-hard search problems that are intractable for non-trivial data sets. Using all features frequently leads to poor prediction accuracy, and pre-processing methods (filters) have not generally been effective. In this paper we consider two different real-world project effort data sets. We describe how simple search techniques, such as hill climbing and sequential selection, can achieve major improvements in accuracy. We conclude that, for our data sets, forward sequential selection of features followed by backward sequential selection of cases is the most effective approach when exhaustive searching is not possible.
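The abstract names the search strategy but does not spell out its mechanics. As a rough illustration only, the sketch below shows one way forward sequential selection of features followed by backward sequential selection of cases could be wrapped around a nearest-neighbour predictor. The function names, the use of a single nearest neighbour, the squared Euclidean distance and the mean absolute leave-one-out error are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Case = Tuple[Sequence[float], float]  # (feature vector, known effort)


def loo_error(cases: List[Case], features: List[int], base: List[int]) -> float:
    """Mean absolute error when every case is predicted from its nearest
    neighbour in the case base `base` (itself excluded), using only the
    feature indices in `features`. Assumed evaluation scheme."""
    total = 0.0
    for i, (x, y) in enumerate(cases):
        best_dist, best_effort = float("inf"), 0.0
        for j in base:
            if j == i:
                continue  # leave the target case out of its own prediction
            x2, y2 = cases[j]
            d = sum((x[f] - x2[f]) ** 2 for f in features)
            if d < best_dist:
                best_dist, best_effort = d, y2
        total += abs(y - best_effort)
    return total / len(cases)


def forward_select_features(cases: List[Case], n_features: int):
    """Start with no features; repeatedly add the single feature that
    lowers the error most, and stop when no addition improves it."""
    base = list(range(len(cases)))
    selected: List[int] = []
    best = float("inf")
    while len(selected) < n_features:
        scored = [(loo_error(cases, selected + [f], base), f)
                  for f in range(n_features) if f not in selected]
        err, f = min(scored)
        if err >= best:
            break
        selected.append(f)
        best = err
    return selected, best


def backward_select_cases(cases: List[Case], features: List[int]):
    """Start with all cases in the case base; repeatedly remove the single
    case whose removal lowers the error most, and stop when none helps."""
    base = list(range(len(cases)))
    best = loo_error(cases, features, base)
    while len(base) > 2:  # keep at least two cases so a neighbour always exists
        scored = [(loo_error(cases, features, [c for c in base if c != r]), r)
                  for r in base]
        err, r = min(scored)
        if err >= best:
            break
        base.remove(r)
        best = err
    return base, best
```

Both searches are greedy: each step evaluates only the subsets that differ from the current one by a single feature or case, so the number of evaluations grows polynomially rather than exponentially. The price is that the result is a local optimum and may differ from what an exhaustive search would find.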
Copyright information
© 2003 Springer-Verlag London Limited
About this paper
Cite this paper
Kirsopp, C., Shepperd, M. (2003). Case and Feature Subset Selection in Case-Based Software Project Effort Prediction. In: Bramer, M., Preece, A., Coenen, F. (eds) Research and Development in Intelligent Systems XIX. Springer, London. https://doi.org/10.1007/978-1-4471-0651-7_5
DOI: https://doi.org/10.1007/978-1-4471-0651-7_5
Publisher Name: Springer, London
Print ISBN: 978-1-85233-674-5
Online ISBN: 978-1-4471-0651-7
eBook Packages: Springer Book Archive