A Study on Software Effort Prediction Using Machine Learning Techniques

Zhang, Wen; Yang, Ye; Wang, Qing

doi:10.1007/978-3-642-32341-6_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 275)

Included in the following conference series:

International Conference on Evaluation of Novel Approaches to Software Engineering

Abstract

This paper studies software effort prediction using machine learning techniques. Both supervised and unsupervised learning techniques are employed to predict software effort from historical datasets. For unsupervised learning, k-medoids clustering equipped with different similarity measures is used to cluster the projects in the historical datasets. For supervised learning, the J48 decision tree, back propagation neural network (BPNN) and naïve Bayes are used to classify software projects into different effort classes. We also impute the missing values in the historical datasets before applying the machine learning techniques to predict software effort. Experiments on the ISBSG and CSBSG datasets demonstrate that unsupervised learning with k-medoids clustering performed poorly. Among the similarity measures, the Kulzinsky coefficient performed best in measuring the similarities of projects. Supervised learning techniques outperformed unsupervised learning techniques in software effort prediction, and BPNN performed best among the three supervised learning techniques. Missing data imputation improved the performance of both unsupervised and supervised learning techniques in software effort prediction.
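
As a rough illustration of the unsupervised part described in the abstract, the following Python sketch clusters binary project feature vectors with k-medoids under a Kulczynski-style similarity. It is not the authors' implementation. The abstract does not define the coefficient, so the symmetric form used here (often spelled "Kulczynski") is an assumption, as are the binary encoding of projects, the function names `kulczynski` and `k_medoids`, the deterministic initialisation and the toy data.

```python
from typing import List, Sequence, Tuple


def kulczynski(x: Sequence[int], y: Sequence[int]) -> float:
    """Symmetric Kulczynski-style similarity of two binary vectors (assumed form)."""
    a = sum(1 for p, q in zip(x, y) if p and q)      # features present in both
    b = sum(1 for p, q in zip(x, y) if p and not q)  # present only in x
    c = sum(1 for p, q in zip(x, y) if q and not p)  # present only in y
    if a == 0:
        return 0.0  # no shared features: treat as completely dissimilar
    return 0.5 * (a / (a + b) + a / (a + c))


def k_medoids(points: List[Sequence[int]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids on the dissimilarity 1 - kulczynski.

    Assumption for reproducibility: the first k points are the initial medoids.
    Returns (medoid indices, cluster label per point).
    """
    n = len(points)
    dist = [[0.0 if i == j else 1.0 - kulczynski(points[i], points[j])
             for j in range(n)] for i in range(n)]
    medoids = list(range(k))
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


# Hypothetical toy data: each row is a project, each column a yes/no attribute.
projects = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
medoids, labels = k_medoids(projects, k=2)
print(medoids, labels)
```

In an effort-prediction setting, the effort class of a new project could then be read off the cluster it falls into. How clusters are mapped to effort classes is not stated in the abstract and is left open here.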

Author information

Authors and Affiliations

Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing, 100190, P.R. China
Wen Zhang, Ye Yang & Qing Wang

Editor information

Editors and Affiliations

Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
Leszek A. Maciaszek
Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, 800 W. Campbell Road, Richardson, Texas 75080-3021, USA
Kang Zhang

About this paper

Cite this paper

Zhang, W., Yang, Y., Wang, Q. (2013). A Study on Software Effort Prediction Using Machine Learning Techniques. In: Maciaszek, L.A., Zhang, K. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2011. Communications in Computer and Information Science, vol 275. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32341-6_1

DOI: https://doi.org/10.1007/978-3-642-32341-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32340-9
Online ISBN: 978-3-642-32341-6
eBook Packages: Computer Science, Computer Science (R0)
