Preliminary Data Analysis Methods in Software Estimation

Software Quality Journal

Abstract

Software is often expensive to develop and can become a major cost factor in corporate information systems budgets. Given the variability of software characteristics and the continual emergence of new technologies, accurate prediction of software development costs is a critical problem in project management.

To address this issue, a large number of software cost prediction models have been proposed. Each succeeds to some extent, but all encounter the same problem: the historical data sets are inconsistent and inadequate. Often no preliminary data analysis has been performed, so the data may contain non-dominated or confounded variables. Moreover, some project attributes or their values are out of date, for example the type of computer used for project development in the COCOMO 81 (Boehm, 1981) data set.

This paper proposes a framework composed of clearly identified steps that should be performed before a data set is used within a cost estimation model. The framework is based closely on a paradigm proposed by Maxwell (2002). Briefly, it applies a set of statistical approaches to the data set, including correlation coefficient analysis, Analysis of Variance (ANOVA), and the Chi-Square test, in order to remove outliers and identify dominant variables.
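As a rough illustration of the three kinds of test named above, the following sketch computes a Pearson correlation coefficient, a one-way ANOVA F statistic, and a Pearson chi-square statistic using only the Python standard library. It is not the authors' procedure: the function names (`pearson_r`, `anova_f`, `chi_square`) and the toy project data are hypothetical, and only the test statistics are computed, without p-values or any outlier-removal rule.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy data, invented for illustration only.
size_fp = [100, 250, 400, 620, 900]          # numeric: size in function points
effort_h = [800, 2100, 3000, 5200, 7100]     # numeric: effort in hours
effort_by_language = [[800, 2100, 3000], [5200, 7100, 6400]]  # effort grouped by a categorical attribute
language_vs_platform = [[12, 5], [4, 14]]    # counts for two categorical attributes

r = pearson_r(size_fp, effort_h)             # numeric vs. numeric
f_stat = anova_f(effort_by_language)         # categorical vs. numeric
chi2 = chi_square(language_vs_platform)      # categorical vs. categorical
print(r, f_stat, chi2)
```

In a real analysis each statistic would be compared against the appropriate reference distribution, and a statistics package would normally be used in place of these hand-written functions.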

To ground the framework in a practical context, the procedure is used to analyze the ISBSG (International Software Benchmarking Standards Group) Release 8 data set, a frequently used, accessible data collection containing information on 2,008 software projects. As a result of this analysis, six explanatory variables are extracted and evaluated.

References

  • Basili, V.R. 1985. Quantitative evaluation of software methodology, Proceedings of the 1st Pan-Pacific Computer Conference.

  • Basili, V.R. and Rombach, H.D. 1988. The TAME project: Towards improvement-oriented software environments, IEEE Transactions on Software Engineering 14(6): 758–773.

  • Boehm, B.W. 1981. Software Engineering Economics. Englewood Cliffs, NJ, Prentice Hall.

  • Boetticher, G. 2001. Using machine learning to predict project effort: Empirical case studies in data-starved domains, Proceedings of the Model Based Requirements Workshop, pp. 17–24.

  • Briand, L.C., Basili, V.R., and Thomas, W. 1992. A pattern recognition approach for software engineering data analysis, IEEE Transactions on Software Engineering 18(11).

  • Briand, L.C., Emam, K.E., Surmann, D., and Wieczorek, I. 1998. An assessment and comparison of common software cost estimation modelling techniques, Technical Report ISERN-98-27, Fraunhofer Institute for Experimental Software Engineering, Germany.

  • Briand, L.C., Langley, T., and Wieczorek, I. 1999. A replicated assessment and comparison of common software cost modeling techniques, Technical Report, IESE-Report 073.99/E.

  • Burr, A. and Owen, M. 1996. Statistical Methods for Software Quality Using Metrics for Process Improvement. Thomson Computer Press.

  • Chulani, S., Boehm, B.W., and Steece, B. 1999. Bayesian analysis of empirical software engineering cost models, IEEE Transactions on Software Engineering 25(4): 573–583.

  • Conte, S.D., Dunsmore, H.E., and Shen, V.Y. 1986. Software Engineering Metrics and Models. Benjamin/Cummings.

  • Fenton, N.E. and Neil, M. 2000. Software metrics: Roadmap, “The Future of Software Engineering,” Proceedings of the 22nd International Conference on Software Engineering, pp. 357–370. ACM Press.

  • Finnie, G.R., Wittig, G.E., and Desharnais, J.M. 1997. Reassessing function points, Australian Journal of Information Systems 4(2): 39–45.

  • Gravetter, F.J. and Wallnau, L.B. 1996. Statistics for the Behavioral Sciences: A First Course for Students of Psychology and Education, 4th ed. St. Paul, West.

  • IFPUG. 1994. Counting Practices Manual, Release 4.0, International Function Point Users Group, Westerville, OH.

  • Karunanithi, N., Whitley, D., and Malaiya, K.Y. 1992. Using neural networks in reliability prediction, IEEE Software 9(4): 53–59.

  • Kemerer, C.F. 1987. An empirical validation of software cost estimation models, Communications of the ACM 30(5): 416–429.

  • Kitchenham, B.A. 1998. A procedure for analyzing unbalanced datasets, IEEE Transactions on Software Engineering 24(4): 278–301.

  • Kitchenham, B.A., Pfleeger, S., Pickard, L., Jones, P., Hoaglin, D., Emam, K.E., and Rosenberg, J. 2002. Preliminary guidelines for empirical research in software engineering, IEEE Transactions on Software Engineering 28(8): 721–734.

  • Maxwell, K. 2002. Applied Statistics for Software Managers. Upper Saddle River, NJ, Pearson Education.

  • Maxwell, K., Wassenhove, L.V., and Dutta, S. 1996. Software development productivity of European space, military and industrial applications, IEEE Transactions on Software Engineering 22(10): 704–718.

  • Pfleeger, S.L., Jeffery, R., Curtis, B., and Kitchenham, B. 1997. Status report on software measurement, IEEE Software 14(2): 33–43.

  • Porter, A.A. and Selby, R.W. 1988. Learning from examples: Generation and evaluation of decision trees for software resource analysis, IEEE Transactions on Software Engineering 14(12): 1743–1757.

  • Porter, A.A. and Selby, R.W. 1990. Empirically guided software development using metric-based classification trees, IEEE Software 7(2): 46–54.

  • Putnam, L.H. 1978. A general empirical solution to the macro software sizing and estimating problem, IEEE Transactions on Software Engineering 4(4): 345–361.

  • Putnam, L.H. and Myers, W. 1992. Measures for Excellence: Reliable Software on Time, within Budget. Yourdon Press.

  • Samson, B., Ellison, D., and Dugard, P. 1997. Software cost estimation using an Albus perceptron (CMAC), Information and Software Technology 39: 55–60.

  • Srinivasan, K. and Fisher, D. 1995. Machine learning approaches to estimating software development effort, IEEE Transactions on Software Engineering 21(2): 126–137.


Liu, Q., Mintram, R.C. Preliminary Data Analysis Methods in Software Estimation. Software Qual J 13, 91–115 (2005). https://doi.org/10.1007/s11219-004-5262-y

  • DOI: https://doi.org/10.1007/s11219-004-5262-y
