Case and Feature Subset Selection in Case-Based Software Project Effort Prediction
Prediction systems adopting a case-based reasoning (CBR) approach have been widely advocated. However, as with most machine learning techniques, feature and case subset selection can strongly influence the quality of the predictions generated. Unfortunately, both are NP-hard search problems, so exhaustive search is intractable for non-trivial data sets. Using all features frequently leads to poor prediction accuracy, and pre-processing methods (filters) have not generally been effective. In this paper we consider two different real-world project effort data sets. We describe how simple search techniques, such as hill climbing and sequential selection, can achieve major improvements in accuracy. We conclude that, for our data sets, forward sequential selection for features, followed by backward sequential selection for cases, is the most effective approach when exhaustive searching is not possible.
Keywords: Feature Selection; Feature Subset; Case Selection; Subset Selection; Hill Climbing
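The strategy named in the abstract, forward sequential selection of features followed by backward sequential selection of cases, can be illustrated with a minimal Python sketch. The abstract does not specify the distance measure, the number of analogies, or the fitness function, so everything below is an assumption made for illustration: a 1-nearest-neighbour predictor with Euclidean distance, leave-one-out mean magnitude of relative error (MMRE) as the quantity to minimise, and a tiny hypothetical data set `projects` whose values are invented, not taken from the paper.

```python
from math import sqrt

def predict(pool, target, features, k=1):
    """Estimate effort as the mean effort of the k nearest cases (assumed CBR predictor)."""
    def dist(case):
        return sqrt(sum((case[0][f] - target[f]) ** 2 for f in features))
    nearest = sorted(pool, key=dist)[:k]
    return sum(effort for _, effort in nearest) / len(nearest)

def loo_mmre(data, features, kept):
    """Leave-one-out MMRE: each project is predicted from the kept cases other than itself."""
    total = 0.0
    for i, (x, actual) in enumerate(data):
        pool = [data[j] for j in kept if j != i]
        total += abs(actual - predict(pool, x, features)) / actual
    return total / len(data)

def forward_feature_selection(data, n_features):
    """Start with no features; greedily add the feature that most reduces the error."""
    all_cases = list(range(len(data)))
    selected, best = [], float("inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        err, f = min((loo_mmre(data, selected + [f], all_cases), f) for f in candidates)
        if err >= best:          # no candidate improves the error: stop
            break
        selected.append(f)
        best = err
    return selected, best

def backward_case_selection(data, features):
    """Start with all cases; greedily drop the case whose removal most reduces the error."""
    kept = list(range(len(data)))
    best = loo_mmre(data, features, kept)
    while len(kept) > 2:         # keep at least two cases so every project has a neighbour
        err, c = min((loo_mmre(data, features, [j for j in kept if j != c]), c) for c in kept)
        if err >= best:          # no removal improves the error: stop
            break
        kept.remove(c)
        best = err
    return kept, best

# Hypothetical toy data: ([size, noise], effort). Feature 0 tracks effort; feature 1 does not.
projects = [([1, 9], 10), ([2, 1], 20), ([3, 8], 30),
            ([4, 2], 40), ([5, 7], 50), ([6, 3], 60)]

features, feature_err = forward_feature_selection(projects, n_features=2)
cases, case_err = backward_case_selection(projects, features)
print("selected features:", features, "MMRE:", round(feature_err, 3))
print("retained cases:", cases, "MMRE:", round(case_err, 3))
```

Both searches are greedy, so each evaluates only a polynomial number of subsets rather than all of them; the price is that neither is guaranteed to find the best subset. Features are assumed to be on comparable scales here; real data would normally be standardised before computing distances.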