Abstract
Software is often expensive to develop and can become a major cost factor in corporate information systems budgets. Given the variability of software characteristics and the continual emergence of new technologies, accurate prediction of software development costs is a critical problem in project management.
To address this issue, a large number of software cost prediction models have been proposed. Each succeeds to some extent, but all encounter the same problem: the inconsistency and inadequacy of the historical data sets. Often no preliminary data analysis has been performed, and the data may contain non-dominated or confounded variables. Moreover, some project attributes or their values are out of date, for example the type of computer used for project development in the COCOMO 81 (Boehm, 1981) data set.
This paper proposes a framework composed of a set of clearly identified steps that should be performed before a data set is used within a cost estimation model. The framework is based closely on a paradigm proposed by Maxwell (2002). Briefly, it applies a set of statistical techniques, including correlation analysis, analysis of variance (ANOVA), and the chi-square test, to the data set in order to remove outliers and identify dominant variables.
To ground the framework in a practical context, the procedure is used to analyze the ISBSG (International Software Benchmarking Standards Group, Release 8) data set, a widely used and accessible collection containing information on 2,008 software projects. As a consequence of this analysis, six explanatory variables are extracted and evaluated.
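As a rough illustration of the kind of preliminary analysis described above, the sketch below first removes outlying effort values and then checks which candidate variables correlate strongly with effort. The project records, field names, and thresholds are invented for illustration only; they are not drawn from the ISBSG or COCOMO data sets, and the real framework involves further steps (ANOVA, chi-square tests) not shown here.

```python
# Hypothetical sketch of two preliminary-analysis steps: IQR-based
# outlier removal, then Pearson correlation against effort to flag
# candidate dominant variables. All data below is invented.
from statistics import mean, quantiles

projects = [
    {"size": 120, "team": 4,  "effort": 900},
    {"size": 300, "team": 7,  "effort": 2100},
    {"size": 150, "team": 5,  "effort": 1100},
    {"size": 500, "team": 9,  "effort": 3600},
    {"size": 220, "team": 6,  "effort": 1500},
    {"size": 260, "team": 5,  "effort": 1800},
    {"size": 180, "team": 4,  "effort": 1200},
    {"size": 140, "team": 30, "effort": 40000},  # suspect record
]

def iqr_filter(rows, key, k=1.5):
    """Drop rows whose `key` value lies outside Q1 - k*IQR .. Q3 + k*IQR."""
    vals = sorted(r[key] for r in rows)
    q1, _, q3 = quantiles(vals, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if lo <= r[key] <= hi]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

clean = iqr_filter(projects, "effort")
effort = [r["effort"] for r in clean]
for var in ("size", "team"):
    r = pearson([row[var] for row in clean], effort)
    print(f"{var}: r = {r:.3f}")  # retain variables with strong |r|
```

In a real analysis the retained variables would then be examined further, e.g. with ANOVA for categorical factors and chi-square tests for associations between categorical attributes.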
References
Basili, V.R. 1985. Quantitative evaluation of software methodology, Proceedings of the 1st Pan-Pacific Computer Conference.
Basili, V.R. and Rombach, H.D. 1988. The TAME project: Towards improvement-oriented software environments, IEEE Transactions on Software Engineering 14(6): 758–773.
Boehm, B.W. 1981. Software Engineering Economics. Englewood Cliffs, NJ, Prentice Hall.
Boetticher, G. 2001. Using machine learning to predict project effort: Empirical case studies in data-starved domains, Proceedings of the Model Based Requirements Workshop, pp. 17–24.
Briand, L.C., Basili, V.R., and Thomas, W. 1992. A pattern recognition approach for software engineering data analysis, IEEE Transactions on Software Engineering 18(11).
Briand, L.C., Emam, K.E., Surmann, D., and Wieczorek, I. 1998. An assessment and comparison of common software cost estimation modelling techniques, Technical Report ISERN-98-27, Fraunhofer Institute for Experimental Software Engineering, Germany.
Briand, L.C., Langley, T., and Wieczorek, I. 1999. A replicated assessment and comparison of common software cost modeling techniques, Technical Report, IESE-Report 073.99/E.
Burr, A. and Owen, M. 1996. Statistical Methods for Software Quality Using Metrics for Process Improvement. Thomson Computer Press.
Chulani, S., Boehm, B.W., and Steece, B. 1999. Bayesian analysis of empirical software engineering cost models, IEEE Transactions on Software Engineering 25(4): 573–583.
Conte, S.D., Dunsmore, H.E., and Shen, V.Y. 1986. Software Engineering Metrics and Models. Benjamin/Cummings.
Fenton, N.E. and Neil, M. 2000. Software metrics: Roadmap, “The Future of Software Engineering,” Proceedings of the 22nd International Conference on Software Engineering, pp. 357–370. ACM Press.
Finnie, G.R., Wittig, G.E., and Desharnais, J.M. 1997. Reassessing function points, Australian Journal of Information Systems 4(2): 39–45.
Gravetter, F.J. and Wallnau, L.B. 1996. Statistics for the Behavioral Sciences: A First Course for Students of Psychology and Education, 4th ed. St. Paul, West.
IFPUG. 1994. Counting Practices Manual, Release 4.0, International Function Point Users Group, Westerville, OH.
Karunanithi, N., Whitley, D., and Malaiya, K.Y. 1992. Using neural networks in reliability prediction, IEEE Software 9(4): 53–59.
Kemerer, C.F. 1987. An empirical validation of software cost estimation models, Communications of the ACM 30(5): 416–429.
Kitchenham, B.A. 1998. A procedure for analyzing unbalanced datasets, IEEE Transactions on Software Engineering 24(4): 278–301.
Kitchenham, B.A., Pfleeger, S., Pickard, L., Jones, P., Hoaglin, D., Emam, K.E., and Rosenberg, J. 2002. Preliminary guidelines for empirical research in software engineering, IEEE Transactions on Software Engineering 28(8): 721–734.
Maxwell, K. 2002. Applied Statistics for Software Managers. Upper Saddle River, NJ, Pearson Education.
Maxwell, K., Wassenhove, L.V., and Dutta, S. 1996. Software development productivity of European space, military, and industrial applications, IEEE Transactions on Software Engineering 22(10): 704–718.
Pfleeger, S.L., Jeffery, R., Curtis, B., and Kitchenham, B. 1997. Status report on software measurement, IEEE Software 14(2): 33–43.
Porter, A.A. and Selby, R.W. 1988. Learning from examples: Generation and evaluation of decision trees for software resource analysis, IEEE Transactions on Software Engineering 14(12): 1743–1757.
Porter, A.A. and Selby, R.W. 1990. Empirically guided software development using metric-based classification trees, IEEE Software 7(2): 46–54.
Putnam, L.H. 1978. A general empirical solution to the macro software sizing and estimating problem, IEEE Transactions on Software Engineering 4(4): 345–361.
Putnam, L.H. and Myers, W. 1992. Measures for Excellence: Reliable Software on Time, within Budget. Yourdon Press.
Samson, B., Ellison, D., and Dugard, P. 1997. Software cost estimation using an Albus perceptron (CMAC), Information and Software Technology 39: 55–60.
Srinivasan, K. and Fisher, D. 1995. Machine learning approaches to estimating software development effort, IEEE Transactions on Software Engineering 21(2): 126–137.
Liu, Q., Mintram, R.C. Preliminary Data Analysis Methods in Software Estimation. Software Qual J 13, 91–115 (2005). https://doi.org/10.1007/s11219-004-5262-y