Software Quality Journal, Volume 19, Issue 3, pp 537–552

A comparative study for estimating software development effort intervals

  • Ayşe Bakır
  • Burak Turhan
  • Ayşe Bener


Software cost/effort estimation is still an open challenge. Many researchers have proposed methods for it, most of which focus on point estimates. To date, software cost estimation has been treated as a regression problem. However, to avoid overestimates and underestimates, it is more practical to predict an interval for the estimate than an exact value. In this paper, we propose an approach that converts cost estimation into a classification problem: each new software project is assigned to one of several effort classes, each of which corresponds to an effort interval. Our approach integrates cluster analysis with classification methods. Cluster analysis is used to determine the effort intervals, while different classification algorithms are used to find the corresponding effort classes. The proposed approach is applied to seven public datasets. Our experimental results show that the hit rates obtained for effort estimation are around 90–100%, which is much higher than those obtained by related studies. Furthermore, in terms of point estimation, our results are comparable to those in the literature, even though a simple mean/median is used for estimation. Finally, the dynamic generation of effort intervals is the most distinctive part of our study: it saves project managers time and effort by removing the need for human intervention.
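The pipeline described in the abstract (cluster to obtain effort intervals, classify a new project into one of them, then report the interval and a median point estimate) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 1-D k-means on effort values, the 1-nearest-neighbour classifier, the function names, and the toy data are all hypothetical stand-ins for whichever clustering and classification algorithms the study actually uses.

```python
from statistics import median


def kmeans_1d(values, k, iters=100):
    """Tiny deterministic 1-D k-means (assumed clustering step).

    Returns one cluster label per input value.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Seed centroids at evenly spaced positions of the sorted values.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def effort_intervals(efforts, labels):
    """Map each effort class to (min, max, median) of its members' effort."""
    intervals = {}
    for c in set(labels):
        members = [e for e, lab in zip(efforts, labels) if lab == c]
        intervals[c] = (min(members), max(members), median(members))
    return intervals


def classify_1nn(train_x, train_labels, x):
    """Assumed classifier: copy the class of the nearest training project."""
    nearest = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_labels[nearest]


def hit_rate(predicted, actual_efforts):
    """Fraction of projects whose actual effort lies in the predicted interval."""
    hits = sum(lo <= e <= hi for (lo, hi, _), e in zip(predicted, actual_efforts))
    return hits / len(actual_efforts)


# Hypothetical toy data: (size, team size) features and effort per project.
features = [(10, 2), (12, 3), (11, 2), (50, 8), (55, 9), (52, 8)]
efforts = [20, 25, 22, 200, 220, 210]

labels = kmeans_1d(efforts, k=2)
intervals = effort_intervals(efforts, labels)

# Classify a new project and look up its effort interval and point estimate.
new_class = classify_1nn(features, labels, (53, 9))
low, high, point_estimate = intervals[new_class]
```

In this sketch the intervals are derived from the data rather than fixed in advance, which mirrors the "dynamic generation of effort intervals" the abstract emphasises; the median of each class serves as the simple point estimate.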


Software effort estimation, Interval prediction, Classification, Cluster analysis, Machine learning



This research is supported in part by Tubitak under grant number EEEAG108E014.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Engineering, Boğaziçi University, Bebek, Istanbul, Turkey
  2. Department of Information Processing Science, University of Oulu, Oulu, Finland
  3. Ted Rogers School of Information Technology Management, Ryerson University, Toronto, Canada
