Abstract
The software industry has an enormous demand for techniques that predict faulty modules and remove faults. Many researchers have developed fault prediction models to predict faults at an early stage of the software development life cycle (SDLC). However, state-of-the-art models still suffer from limited performance and insufficiently generalized validation. Some researchers report that data mining techniques, machine learning, and artificial intelligence play crucial roles in developing fault prediction models. A recent study stated that metric selection techniques also help to enhance model performance. Hence, to improve the performance and validation of fault prediction models, we use data mining, instance selection, metric selection, and ensemble methods to beat the state-of-the-art results. For validation, we collected 22 software projects from four different software repositories. We implemented three machine learning algorithms and three ensemble methods with two metric selection methods on the 22 datasets. The statistical evaluation of the implemented models was performed using the Wilcoxon signed-rank test and the Friedman test followed by the Nemenyi test to identify the model that differs significantly from the others. As a result, the random forest (RF) algorithm produces the best result, with an average median of 95.43% (accuracy) and 0.96 (f-measure) on the 22 software projects. Based on the Nemenyi test, RF performs better, with mean scores of 4.54 (accuracy) and 4.41 (f-measure), as shown in the critical difference diagram. The experimental study shows that data mining techniques with PCA provide better accuracy and f-measure.
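The Friedman and Nemenyi steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy accuracy table, and the ranking direction (a higher score receives a higher rank, so the best model has the largest mean rank, as the reported RF scores suggest) are assumptions made for illustration. The Friedman statistic is computed without a tie correction, and the critical value `q_alpha` must be supplied from a studentized-range table.

```python
import math


def average_ranks(scores):
    """Mean rank of each model over all datasets.

    scores: one row per dataset, one column per model.
    Assumption: higher score -> higher rank; tied scores share the average rank.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            # Extend j over the run of scores tied with position i.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
            for m in order[i:j + 1]:
                totals[m] += shared
            i = j + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(mean_ranks, n_datasets):
    """Friedman chi-square statistic (no tie correction)."""
    k = len(mean_ranks)
    sum_sq = sum(r * r for r in mean_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


def nemenyi_cd(k, n_datasets, q_alpha):
    """Nemenyi critical difference between two mean ranks."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Hypothetical accuracies: 4 datasets (rows) x 3 models (columns).
toy_scores = [
    [0.90, 0.85, 0.95],
    [0.88, 0.80, 0.93],
    [0.91, 0.91, 0.96],
    [0.84, 0.86, 0.92],
]
ranks = average_ranks(toy_scores)
chi2 = friedman_statistic(ranks, len(toy_scores))
```

Two models are taken to differ significantly when their mean ranks differ by more than the critical difference. If the comparison covers six models on 22 datasets (an assumption based on the three machine learning algorithms and three ensemble methods mentioned above), `nemenyi_cd(6, 22, 2.850)` gives a critical difference of roughly 1.61 at the 0.05 level.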
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kumar, R., Chaturvedi, A. (2022). Software Fault Prediction Using Data Mining Techniques on Software Metrics. In: Misra, R., Shyamasundar, R.K., Chaturvedi, A., Omer, R. (eds) Machine Learning and Big Data Analytics (Proceedings of International Conference on Machine Learning and Big Data Analytics (ICMLBDA) 2021). ICMLBDA 2021. Lecture Notes in Networks and Systems, vol 256. Springer, Cham. https://doi.org/10.1007/978-3-030-82469-3_27
DOI: https://doi.org/10.1007/978-3-030-82469-3_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82468-6
Online ISBN: 978-3-030-82469-3
eBook Packages: Intelligent Technologies and Robotics (R0)