Abstract
This paper presents a study of software effort prediction using machine learning techniques. Both supervised and unsupervised learning techniques are employed to predict software effort from historical datasets. For unsupervised learning, k-medoids clustering equipped with different similarity measures is used to cluster the projects in the historical datasets. For supervised learning, the J48 decision tree, the back propagation neural network (BPNN) and naïve Bayes are used to classify software projects into different effort classes. We also impute the missing values in the historical datasets before applying the machine learning techniques to predict software effort. Experiments on the ISBSG and CSBSG datasets demonstrate that k-medoids clustering performs poorly, and that among the similarity measures examined the Kulzinsky coefficient performs best in measuring the similarity of projects. The supervised learning techniques outperform the unsupervised learning technique in software effort prediction, and BPNN performs best among the three supervised learning techniques. Missing data imputation improves the performance of both unsupervised and supervised learning techniques in software effort prediction.
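As a rough illustration of the unsupervised step, the sketch below clusters binary project descriptions with k-medoids under a Kulczynski-style similarity. It is only a minimal sketch under stated assumptions: the abstract does not give the exact form of the coefficient or the feature encoding, so the second Kulczynski measure, the function names (`kulczynski2`, `assign`, `k_medoids`) and the toy binary attributes are all hypothetical choices, not the authors' implementation.

```python
import random


def kulczynski2(x, y):
    """Second Kulczynski similarity for two equal-length binary vectors.

    Assumption: this form is used for illustration only; the paper's exact
    definition of the coefficient is not stated in the abstract.
    """
    a = sum(1 for p, q in zip(x, y) if p and q)      # attribute set in both
    b = sum(1 for p, q in zip(x, y) if p and not q)  # set only in x
    c = sum(1 for p, q in zip(x, y) if q and not p)  # set only in y
    if a == 0:
        return 0.0
    return 0.5 * (a / (a + b) + a / (a + c))


def assign(points, medoids, similarity):
    """Label each point with the index of its most similar medoid."""
    labels = []
    for i, p in enumerate(points):
        if i in medoids:
            labels.append(i)  # a medoid always stays in its own cluster
        else:
            labels.append(max(medoids, key=lambda m: similarity(p, points[m])))
    return labels


def k_medoids(points, k, similarity, max_iter=100, seed=0):
    """Alternate assignment and medoid update until the medoids stop changing."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        labels = assign(points, medoids, similarity)
        new_medoids = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            # New medoid: the member with the highest total similarity
            # to the other members of its cluster.
            new_medoids.append(max(
                members,
                key=lambda c: sum(similarity(points[c], points[j]) for j in members),
            ))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, similarity)


# Hypothetical toy data: each row is a project, each column a binary attribute.
projects = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
medoids, labels = k_medoids(projects, 2, kulczynski2)
print(medoids, labels)
```

Passing the similarity as a function argument makes it straightforward to swap in other measures and compare the resulting clusters, which mirrors the comparison of similarity measures described above.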
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, W., Yang, Y., Wang, Q. (2013). A Study on Software Effort Prediction Using Machine Learning Techniques. In: Maciaszek, L.A., Zhang, K. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2011. Communications in Computer and Information Science, vol 275. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32341-6_1
DOI: https://doi.org/10.1007/978-3-642-32341-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32340-9
Online ISBN: 978-3-642-32341-6
eBook Packages: Computer Science, Computer Science (R0)