Case and Feature Subset Selection in Case-Based Software Project Effort Prediction

  • Conference paper

Abstract

Prediction systems that adopt a case-based reasoning (CBR) approach have been widely advocated. However, as with most machine learning techniques, feature and case subset selection can strongly influence the quality of the predictions generated. Unfortunately, both are NP-hard search problems that are intractable for non-trivial data sets. Using all features frequently leads to poor prediction accuracy, and pre-processing methods (filters) have generally not been effective. In this paper we consider two different real-world project effort data sets. We describe how simple search techniques, such as hill climbing and sequential selection, can achieve major improvements in accuracy. We conclude that, for our data sets, forward sequential selection of features followed by backward sequential selection of cases is the most effective approach when exhaustive search is not possible.
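The two-stage search the abstract recommends (forward sequential selection over features, then backward sequential selection over cases) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: estimates come from a single nearest analogy using unweighted Euclidean distance over features normalised to [0, 1], and the fitness of a subset is assumed to be the leave-one-out mean absolute residual over every project. The paper's own similarity measure, number of analogies and accuracy statistic may differ. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical toy data (not from the paper): five projects, two features
# normalised to [0, 1]. Feature 0 tracks effort; feature 1 is noise, and
# project 2 is a "noisy" case whose effort does not match its neighbours.
X: List[List[float]] = [[0.0, 0.9], [0.1, 0.0], [0.15, 0.5], [0.9, 0.1], [1.0, 0.6]]
y: List[float] = [10, 12, 80, 50, 52]


def predict(target: int, features: Set[int], cases: Set[int],
            X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Estimate by analogy (assumed k = 1): return the effort of the nearest
    case in the case base, by Euclidean distance over the selected features,
    never using the target project itself."""
    best, best_dist = -1, math.inf
    for c in sorted(cases):  # sorted so that ties break deterministically
        if c == target:
            continue
        d = math.sqrt(sum((X[target][f] - X[c][f]) ** 2 for f in features))
        if d < best_dist:
            best, best_dist = c, d
    if best < 0:
        raise ValueError("case base has no case other than the target")
    return y[best]


def loo_error(features: Set[int], cases: Set[int],
              X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Assumed fitness: mean absolute residual when every project is
    estimated (leave-one-out) from the current case base."""
    residuals = [abs(predict(t, features, cases, X, y) - y[t]) for t in range(len(y))]
    return sum(residuals) / len(residuals)


def forward_feature_selection(X, y) -> Tuple[Set[int], float]:
    """Forward sequential selection: start with no features, greedily add the
    feature that most reduces error, and stop when no addition helps."""
    all_cases = set(range(len(y)))
    selected: Set[int] = set()
    best_err = math.inf
    while True:
        candidates = [(loo_error(selected | {f}, all_cases, X, y), f)
                      for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        err, f = min(candidates)
        if err >= best_err:
            break  # no single added feature improves accuracy
        selected.add(f)
        best_err = err
    return selected, best_err


def backward_case_selection(features: Set[int], X, y) -> Tuple[Set[int], float]:
    """Backward sequential selection: start with every case in the case base,
    greedily drop the case whose removal most reduces error, and stop when no
    removal helps."""
    cases = set(range(len(y)))
    best_err = loo_error(features, cases, X, y)
    while len(cases) > 2:  # keep at least two cases so every project has an analogy
        err, c = min((loo_error(features, cases - {c}, X, y), c) for c in sorted(cases))
        if err >= best_err:
            break
        cases.remove(c)
        best_err = err
    return cases, best_err


if __name__ == "__main__":
    feats, feat_err = forward_feature_selection(X, y)
    print(feats, feat_err)  # {0} 28.4
    cases, case_err = backward_case_selection(feats, X, y)
    print(cases, case_err)  # {0, 1, 3, 4} 15.2
```

On this toy data the feature search keeps only the informative feature, and the case search then drops the noisy project 2 from the case base. Each greedy step costs one leave-one-out evaluation per remaining candidate, which is what makes sequential selection feasible where exhaustive search over all 2^n subsets is not.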

Copyright information

© 2003 Springer-Verlag London Limited

About this paper

Cite this paper

Kirsopp, C., Shepperd, M. (2003). Case and Feature Subset Selection in Case-Based Software Project Effort Prediction. In: Bramer, M., Preece, A., Coenen, F. (eds) Research and Development in Intelligent Systems XIX. Springer, London. https://doi.org/10.1007/978-1-4471-0651-7_5

  • DOI: https://doi.org/10.1007/978-1-4471-0651-7_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-674-5

  • Online ISBN: 978-1-4471-0651-7

  • eBook Packages: Springer Book Archive
