Case and Feature Subset Selection in Case-Based Software Project Effort Prediction

Kirsopp, Colin; Shepperd, Martin

doi:10.1007/978-1-4471-0651-7_5


Conference paper


Abstract

Prediction systems adopting a case-based reasoning (CBR) approach have been widely advocated. However, as with most machine learning techniques, feature and case subset selection can be extremely influential on the quality of the predictions generated. Unfortunately, both are NP-hard search problems that are intractable for non-trivial data sets. Using all features frequently leads to poor prediction accuracy, and pre-processing methods (filters) have not generally been effective. In this paper we consider two different real-world project effort data sets. We describe how simple search techniques, such as hill climbing and sequential selection, can achieve major improvements in accuracy. We conclude that, for our data sets, forward sequential selection for features followed by backward sequential selection for cases is the most effective approach when exhaustive search is not possible.
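The combination recommended in the abstract (forward sequential selection of features, then backward sequential selection of cases) can be sketched as a wrapper around a CBR predictor. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a k-nearest-neighbour analogy predictor (k = 1 by default) using Euclidean distance over min-max normalised features, and it scores every candidate subset by the mean absolute residual under leave-one-out (jackknife) evaluation over all projects. The predictor, the error measure, the toy data and every function name (`normalise`, `loo_error`, `forward_select_features`, `backward_select_cases`) are hypothetical. Hill climbing, also named in the abstract, is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: greedy feature and case subset selection wrapped around
# a simple analogy (k-nearest-neighbour) effort predictor.


def normalise(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale every feature column to [0, 1] (constant columns become 0)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
        for r in rows
    ]


def loo_error(X, y, features, cases, k=1) -> float:
    """Mean absolute residual when each project is predicted, leave-one-out,
    as the mean effort of its k nearest retained cases (assumed error measure)."""
    if not features:
        return math.inf
    total = 0.0
    for i in range(len(X)):
        pool = [j for j in cases if j != i]  # never use a project as its own analogy
        if not pool:
            return math.inf
        ranked = sorted(
            (math.dist([X[i][f] for f in features], [X[j][f] for f in features]), j)
            for j in pool
        )
        nearest = [j for _, j in ranked[:k]]
        total += abs(y[i] - sum(y[j] for j in nearest) / len(nearest))
    return total / len(X)


def forward_select_features(X, y, k=1) -> Tuple[List[int], float]:
    """Forward sequential selection: start with no features, greedily add the
    one that most reduces error, stop as soon as no addition improves it."""
    all_cases = list(range(len(X)))
    chosen: List[int] = []
    remaining = list(range(len(X[0])))
    best = math.inf
    while remaining:
        err, feat = min(
            (loo_error(X, y, chosen + [f], all_cases, k), f) for f in remaining
        )
        if err >= best:
            break
        best = err
        chosen.append(feat)
        remaining.remove(feat)
    return chosen, best


def backward_select_cases(X, y, features, k=1) -> Tuple[List[int], float]:
    """Backward sequential selection: start with every case, greedily drop the
    case whose removal most reduces error, stop when no removal improves it."""
    kept = list(range(len(X)))
    best = loo_error(X, y, features, kept, k)
    while len(kept) > 1:
        err, drop = min(
            (loo_error(X, y, features, [j for j in kept if j != c], k), c)
            for c in kept
        )
        if err >= best:
            break
        best = err
        kept.remove(drop)
    return kept, best


if __name__ == "__main__":
    # Synthetic toy data for illustration only (not the paper's data sets):
    # feature 0 = size, feature 1 = irrelevant noise; effort is 10 x size.
    sizes = [1, 2, 3, 4, 5, 6, 7, 8]
    noise = [5, 1, 7, 3, 8, 2, 6, 4]
    X = normalise([[s, n] for s, n in zip(sizes, noise)])
    y = [10.0 * s for s in sizes]
    feats, f_err = forward_select_features(X, y)
    cases, c_err = backward_select_cases(X, y, feats)
    print("features:", feats, "error:", f_err)
    print("cases kept:", len(cases), "error:", c_err)
```

On this toy set the demo selects only feature 0 with a mean absolute residual of 10.0 and keeps all eight cases, because no single removal lowers the error. Both searches are greedy: each needs at most O(n^2) subset evaluations for n features or cases, compared with 2^n for exhaustive search, but neither is guaranteed to find the optimal subset.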



Author information

Authors and Affiliations

Empirical Software Engineering Research Group, School of Design, Engineering and Computing, Bournemouth University, Royal London House, Bournemouth BH1 3LT, UK
Colin Kirsopp & Martin Shepperd


Editor information

Editors and Affiliations

Faculty of Technology, University of Portsmouth, Portsmouth, UK
Max Bramer BSc, PhD, CEng, FBCS, FIEE, FRSA (Technical Programme Chair)
Department of Computer Science, University of Aberdeen, Aberdeen, UK
Alun Preece (Deputy Technical Programme Chair)
Department of Computer Science, University of Liverpool, Liverpool, UK
Frans Coenen (Conference Chairman)


About this paper

Cite this paper

Kirsopp, C., Shepperd, M. (2003). Case and Feature Subset Selection in Case-Based Software Project Effort Prediction. In: Bramer, M., Preece, A., Coenen, F. (eds) Research and Development in Intelligent Systems XIX. Springer, London. https://doi.org/10.1007/978-1-4471-0651-7_5


DOI: https://doi.org/10.1007/978-1-4471-0651-7_5
Publisher Name: Springer, London
Print ISBN: 978-1-85233-674-5
Online ISBN: 978-1-4471-0651-7
eBook Packages: Springer Book Archive
