Empirical Software Engineering, Volume 12, Issue 1, pp 65–106

A flexible method for software effort estimation by analogy

  • Jingzhou Li
  • Guenther Ruhe
  • Ahmed Al-Emran
  • Michael M. Richter


Effort estimation by analogy uses information from similar past projects to predict the effort for a new project. Existing analogy-based methods are limited by their inability to handle non-quantitative data and missing values, and their prediction accuracy also needs improvement. In this paper, we propose a new flexible method called AQUA that overcomes these limitations. AQUA combines ideas from two known analogy-based estimation techniques: case-based reasoning and collaborative filtering. The method can be applied to predict effort related to any object at the requirement, feature, or project level. Compared to other methods, AQUA makes four main contributions. First, AQUA supports non-quantitative data by defining similarity measures for different data types. Second, it tolerates missing values. Third, the results of an explorative study in this paper show that prediction accuracy is sensitive to both the number N of analogies (similar objects) taken for adaptation and the threshold T for the degree of similarity, especially for larger data sets. A fixed and small number of analogies, as assumed in existing analogy-based methods, may not produce the best prediction accuracy. Fourth, a flexible mechanism based on learning from existing data is proposed for determining the values of N and T likely to offer the best prediction accuracy. New criteria to measure the quality of prediction are also proposed. AQUA was validated against two internal and one public domain data sets with non-quantitative attributes and missing values. The obtained results are encouraging. In addition, a comparative analysis with existing analogy-based estimation methods was conducted using three publicly available data sets that were used by these methods. In two of the three cases, AQUA outperformed all other methods.
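
The abstract does not give AQUA's similarity measures or adaptation rule, so the following is only a minimal sketch of the general idea it describes: per-type similarity, tolerance of missing values, and selection of at most N analogies whose similarity reaches a threshold T. All function names, the range-normalised numeric similarity, the exact-match categorical similarity, and the similarity-weighted mean are illustrative assumptions, not the authors' definitions.

```python
def attribute_similarity(a, b, value_range=None):
    """Local similarity in [0, 1], or None if either value is missing (assumed measure)."""
    if a is None or b is None:
        return None  # missing value: this attribute is skipped, not penalised
    numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
    if numeric and value_range:
        # Numeric attributes: distance normalised by the attribute's value range.
        return max(0.0, 1.0 - abs(a - b) / value_range)
    # Non-quantitative attributes: exact match.
    return 1.0 if a == b else 0.0


def global_similarity(x, y, ranges):
    """Mean local similarity over the attributes known in both objects."""
    sims = [attribute_similarity(a, b, r) for a, b, r in zip(x, y, ranges)]
    known = [s for s in sims if s is not None]
    return sum(known) / len(known) if known else 0.0


def estimate_effort(target, history, ranges, n, t):
    """Similarity-weighted mean effort of up to n analogies with similarity >= t."""
    scored = [(global_similarity(target, attrs, ranges), effort)
              for attrs, effort in history]
    analogies = sorted((p for p in scored if p[0] >= t),
                       key=lambda p: p[0], reverse=True)[:n]
    total = sum(s for s, _ in analogies)
    if total == 0:
        return None  # no sufficiently similar object was found
    return sum(s * e for s, e in analogies) / total


# Hypothetical data: (application type, size, language) -> effort.
history = [
    (("web", 10, "java"), 100.0),
    (("web", 12, None), 120.0),       # language is missing
    (("embedded", 40, "c"), 400.0),
]
ranges = (None, 30, None)             # value range of the numeric attribute only
target = ("web", 11, "java")

estimate = estimate_effort(target, history, ranges, n=2, t=0.5)
```

With these made-up values, the two "web" projects pass the threshold and the third does not, so the estimate falls between their efforts. Changing `n` or `t` changes which analogies are used, which is the sensitivity the abstract refers to.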


Software development effort; Analogy-based effort estimation; Learning; Comparative analysis; Non-quantitative attributes; Missing values





Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  • Jingzhou Li (1)
  • Guenther Ruhe (1)
  • Ahmed Al-Emran (1)
  • Michael M. Richter (2)

  1. Software Engineering Decision Support Laboratory, University of Calgary, Calgary, Canada
  2. TU Kaiserslautern, FB Informatik, Kaiserslautern, Germany
