Case and Feature Subset Selection in Case-Based Software Project Effort Prediction

  • Colin Kirsopp
  • Martin Shepperd


Prediction systems that adopt a case-based reasoning (CBR) approach have been widely advocated. However, as with most machine learning techniques, the choice of feature subset and case subset can strongly influence the quality of the predictions generated. Unfortunately, both are NP-hard search problems that are intractable for non-trivial data sets. Using all features frequently leads to poor prediction accuracy, and pre-processing methods (filters) have not generally been effective. In this paper we consider two different real-world project effort data sets. We describe how simple search techniques, such as hill climbing and sequential selection, can achieve major improvements in accuracy. We conclude that, for our data sets, forward sequential selection for features followed by backward sequential selection for cases is the most effective approach when exhaustive searching is not possible.
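The two-stage search named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the 1-nearest-neighbour analogy predictor, the leave-one-out mean absolute error used as the fitness measure, the function names, and the toy data are all assumptions made for the example.

```python
from math import inf

def nearest_effort(query, case_base, features):
    """Effort of the case closest to `query` (1-NN, squared Euclidean distance)."""
    def dist(case):
        return sum((case[0][f] - query[0][f]) ** 2 for f in features)
    return min(case_base, key=dist)[1]

def loo_mae(cases, features, kept):
    """Leave-one-out mean absolute error: each case is predicted from the
    retained cases (`kept` holds their indices), excluding the case itself."""
    if not features:
        return inf
    total = 0.0
    for i, query in enumerate(cases):
        base = [cases[j] for j in kept if j != i]
        if not base:
            return inf
        total += abs(nearest_effort(query, base, features) - query[1])
    return total / len(cases)

def forward_feature_selection(cases, n_features):
    """Start with no features; greedily add the one that lowers the error most."""
    kept = list(range(len(cases)))
    selected, best = [], inf
    while len(selected) < n_features:
        scored = [(loo_mae(cases, selected + [f], kept), f)
                  for f in range(n_features) if f not in selected]
        err, f = min(scored)
        if err >= best:          # no candidate improves the error: stop
            break
        selected.append(f)
        best = err
    return selected, best

def backward_case_selection(cases, features):
    """Start with all cases; greedily drop the one whose removal lowers the error most."""
    kept = list(range(len(cases)))
    best = loo_mae(cases, features, kept)
    while len(kept) > 1:
        scored = [(loo_mae(cases, features, [k for k in kept if k != r]), r)
                  for r in kept]
        err, r = min(scored)
        if err >= best:          # no removal improves the error: stop
            break
        kept.remove(r)
        best = err
    return kept, best

# Hypothetical toy data: ((feature_0, feature_1), effort). Not from the paper.
cases = [
    ((1.0, 9.0), 10.0),
    ((2.0, 1.0), 20.0),
    ((3.0, 8.0), 30.0),
    ((4.0, 2.0), 40.0),
    ((5.0, 7.0), 50.0),
    ((3.1, 5.0), 90.0),  # an atypical project
]

selected, feat_err = forward_feature_selection(cases, n_features=2)
kept, case_err = backward_case_selection(cases, selected)
print("features:", selected, "error:", feat_err)
print("cases kept:", kept, "error:", case_err)
```

Each step re-evaluates the whole predictor for every candidate subset, which is the wrapper style of search as opposed to a pre-processing filter. Both loops stop at the first step that brings no improvement, so they may settle on a local optimum, as is usual for greedy sequential selection.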


Keywords: Feature Selection; Feature Subset; Case Selection; Subset Selection; Hill Climbing





Copyright information

© Springer-Verlag London Limited 2003

Authors and Affiliations

  • Colin Kirsopp (1)
  • Martin Shepperd (1)
  1. Empirical Software Engineering Research Group, School of Design, Engineering and Computing, Bournemouth University, Bournemouth, UK
