A comparative study for estimating software development effort intervals
Software cost/effort estimation is still an open challenge. Many researchers have proposed methods, most of which focus on point estimates. To date, software cost estimation has been treated as a regression problem. However, to guard against overestimates and underestimates, it is more practical to predict an interval for the estimate than an exact value. In this paper, we propose an approach that converts cost estimation into a classification problem: each new software project is classified into one of several effort classes, each of which corresponds to an effort interval. Our approach integrates cluster analysis with classification methods. Cluster analysis is used to determine the effort intervals, while different classification algorithms are used to find the corresponding effort classes. The proposed approach is applied to seven public datasets. Our experimental results show that the hit rates obtained for effort estimation are around 90–100%, which is much higher than those obtained by related studies. Furthermore, in terms of point estimation, our results are comparable to those in the literature, although a simple mean/median is used for estimation. Finally, the dynamic generation of effort intervals is the most distinctive part of our study, and it saves project managers time and effort by removing human intervention.
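The pipeline described in the abstract (cluster effort values into intervals, classify a new project into one of those intervals, then report the interval together with a mean/median point estimate) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 1-D k-means clustering, the 1-nearest-neighbor classifier, the two features (size in KLOC and team size), the toy project data, and all function names.

```python
from statistics import median


def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means; initial centers are evenly spaced order statistics."""
    srt = sorted(values)
    centers = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    # Drop empty clusters and order the rest by their smallest effort value.
    return sorted((g for g in groups if g), key=min)


def effort_class(effort, intervals):
    """Index of the interval that contains a training effort value."""
    for idx, (low, high) in enumerate(intervals):
        if low <= effort <= high:
            return idx
    raise ValueError("effort lies outside every interval")


def nearest_neighbor_class(x, train_x, train_y):
    """1-NN on unscaled features (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_x]
    return train_y[dists.index(min(dists))]


# Hypothetical training projects: ((size_kloc, team_size), effort).
projects = [
    ((5, 2), 10), ((6, 2), 12), ((7, 3), 15),
    ((20, 5), 60), ((22, 5), 66), ((25, 6), 70),
    ((60, 10), 200), ((65, 12), 220), ((70, 12), 240),
]
train_x = [features for features, _ in projects]
efforts = [effort for _, effort in projects]

# Step 1: cluster the effort values; each cluster's range is an effort interval.
clusters = kmeans_1d(efforts, k=3)
intervals = [(min(c), max(c)) for c in clusters]

# Step 2: label each training project with its effort class.
train_y = [effort_class(e, intervals) for e in efforts]


def predict(x):
    """Return (effort interval, median point estimate) for a new project."""
    cls = nearest_neighbor_class(x, train_x, train_y)
    return intervals[cls], median(clusters[cls])


def hit_rate(test_projects):
    """Fraction of projects whose actual effort falls in the predicted interval."""
    hits = 0
    for x, actual in test_projects:
        (low, high), _ = predict(x)
        hits += low <= actual <= high
    return hits / len(test_projects)
```

Calling `predict((21, 5))` returns the interval assigned to the new project and the median effort of that cluster as a point estimate, and `hit_rate` counts a prediction as a hit when the actual effort lies inside the predicted interval. In this sketch the features are left unscaled and the number of clusters is fixed by hand; a real study would need to choose both with care.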
Keywords: Software effort estimation, Interval prediction, Classification, Cluster analysis, Machine learning
This research is supported in part by Tubitak under grant number EEEAG108E014.