Journal of Computer Science and Technology, Volume 22, Issue 3, pp 371–378

Software Project Effort Estimation Based on Multiple Parametric Models Generated Through Data Clustering

  • Juan J. Cuadrado Gallego
  • Daniel Rodríguez
  • Miguel Ángel Sicilia
  • Miguel Garre Rubio
  • Angel García Crespo
Regular Paper


Parametric software effort estimation models usually consist of a single mathematical relationship. With the advent of software repositories containing data from heterogeneous projects, such models suffer from poor fit and low predictive accuracy. One way to alleviate this problem is to use a set of mathematical equations obtained by dividing the historical project dataset, according to different parameters, into subdatasets called partitions. In turn, partitions are divided into clusters, each of which serves as the basis for a more accurate model. In this paper, we describe the process, tool and results of such an approach through a case study using a publicly available repository, ISBSG. Results suggest that the technique is adequate as an extension of existing single-expression models, without making the estimation process much more complex than using a single estimation model. A tool to support the process is also presented.
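The partition-then-cluster idea described in the abstract can be illustrated with a minimal Python sketch. It assumes that partitions come from one categorical attribute (a hypothetical key such as development type), that clusters are found with a plain one-dimensional k-means on log(size), and that each cluster gets its own power-law relationship effort = a·size^b fitted by least squares in log space. These choices, the class and function names, and the synthetic data are illustrative assumptions, not necessarily the clustering algorithm or fitting procedure used in the paper. MMRE (mean magnitude of relative error) is included only to compare the segmented model with a single global equation.

```python
import math
from collections import defaultdict


def fit_power_law(sizes, efforts):
    """Fit effort = a * size**b by ordinary least squares on log-log data."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Only one distinct size in the cluster: assume linear scaling (b = 1).
        return math.exp(my - mx), 1.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return math.exp(my - b * mx), b


def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars, seeded with values spread across the sorted data."""
    vs = sorted(values)
    cents = [vs[i * (len(vs) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in cents]
        for v in values:
            groups[nearest(cents, v)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, cents)]
        if new == cents:
            break
        cents = new
    return cents


class SegmentedEffortModel:
    """Hypothetical helper: partition by a categorical key, cluster each
    partition on log(size) (assumption: k-means), fit one power law per cluster."""

    def __init__(self, k=2, min_per_cluster=3):
        self.k = k
        self.min_per_cluster = min_per_cluster
        self.segments = {}  # partition key -> list of (centroid, a, b)

    def fit(self, projects):
        """projects: iterable of (partition_key, size, effort) tuples."""
        by_key = defaultdict(list)
        for key, size, effort in projects:
            by_key[key].append((size, effort))
        for key, rows in by_key.items():
            # Small partitions get fewer clusters so each model has enough data.
            k = max(1, min(self.k, len(rows) // self.min_per_cluster))
            cents = kmeans_1d([math.log(s) for s, _ in rows], k)
            members = [[] for _ in cents]
            for s, e in rows:
                members[nearest(cents, math.log(s))].append((s, e))
            self.segments[key] = [
                (c, *fit_power_law([s for s, _ in m], [e for _, e in m]))
                for c, m in zip(cents, members) if m
            ]
        return self

    def predict(self, key, size):
        """Estimate effort with the model of the nearest cluster in the project's partition."""
        segs = self.segments[key]
        _, a, b = segs[nearest([seg[0] for seg in segs], math.log(size))]
        return a * size ** b


def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Synthetic, illustrative data only: (partition, size, effort).
data = [("new", s, 3.0 * s) for s in (10, 20, 30, 50)]
data += [("new", s, 0.05 * s ** 1.5) for s in (1000, 2000, 3000, 5000)]
data += [("enhancement", s, 4.0 * s ** 0.8) for s in (40, 80, 160, 320)]

model = SegmentedEffortModel(k=2).fit(data)
a, b = fit_power_law([s for _, s, _ in data], [e for _, _, e in data])
actual = [e for _, _, e in data]
seg_pred = [model.predict(key, s) for key, s, _ in data]
one_pred = [a * s ** b for _, s, _ in data]
print(f"MMRE segmented={mmre(actual, seg_pred):.3f} single={mmre(actual, one_pred):.3f}")
```

Clustering on size alone keeps prediction well defined, because a new project's effort is unknown at the moment its cluster must be chosen; clustering on both size and effort would require a separate rule for assigning new projects to clusters.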


Keywords: software engineering; software measurement; effort estimation; clustering


Supplementary material

Chinese abstract (PDF, 33 kB)


Copyright information

© Science Press, Beijing, China and Springer Science + Business Media, LLC, USA 2007

Authors and Affiliations

  • Juan J. Cuadrado Gallego (1)
  • Daniel Rodríguez (1)
  • Miguel Ángel Sicilia (1)
  • Miguel Garre Rubio (1)
  • Angel García Crespo (2)

  1. Department of Computer Science, The University of Alcalá, Alcalá, Spain
  2. Department of Computer Science, Carlos III University, Madrid, Spain
