Software Fault Prediction Using Data Mining Techniques on Software Metrics

Kumar, Rakesh; Chaturvedi, Amrita

doi:10.1007/978-3-030-82469-3_27


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 256)

Included in the following conference series:

International Conference on Machine Learning and Big Data Analytics


Abstract

The software industry has a strong demand for techniques that predict faulty modules and remove faults. Many researchers have developed fault prediction models that predict faults at an early stage of the software development life cycle (SDLC). However, state-of-the-art models still suffer from limited performance and weak generalized validation. Some researchers report that data mining, machine learning, and artificial intelligence techniques play crucial roles in developing fault prediction models, and a recent study stated that metric selection techniques also help improve model performance. Hence, to improve the performance and validation of fault prediction models, we use data mining, instance selection, metric selection, and ensemble methods to outperform state-of-the-art results. For validation, we collected 22 software projects from four different software repositories. We implemented three machine learning algorithms and three ensemble methods, combined with two metric selection methods, on the 22 datasets. The implemented models were evaluated statistically using the Wilcoxon signed-rank test and the Friedman test, followed by the Nemenyi post-hoc test, to identify the significantly better model. The Random forest algorithm produces the best results, with an average median accuracy of 95.43% and an f-measure of 0.96 across the 22 software projects. Based on the Nemenyi test, Random forest (RF) performs best, with mean scores of 4.54 (accuracy) and 4.41 (f-measure), as shown in the critical difference diagram. The experimental study shows that data mining techniques combined with PCA provide better accuracy and f-measure.
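The abstract compares models with the Friedman test, followed by the Nemenyi post-hoc test, and reports per-model mean scores in a critical difference diagram. The sketch below illustrates how average ranks, the Friedman statistic, and the Nemenyi critical difference (CD) are commonly computed; it is not the authors' implementation. Several choices are assumptions: the best score on each dataset receives the highest rank, so a higher average rank means a better model (picked to match the abstract reporting RF's higher score as best); only the α = 0.05 Nemenyi critical values are included; and the model names and scores in the demo are hypothetical. The Wilcoxon signed-rank test is not shown.

```python
import math
from typing import Dict, List

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05, keyed by the
# number of compared models k (as tabulated by Demsar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rank_row(scores: List[float]) -> List[float]:
    """Rank one dataset's scores; the highest score gets rank k (assumption).

    Tied scores share the average of the 1-based ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda idx: scores[idx])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def average_ranks(results: Dict[str, List[float]]) -> Dict[str, float]:
    """Average rank per model; results maps model -> one score per dataset."""
    models = list(results)
    n_datasets = len(results[models[0]])
    totals = {m: 0.0 for m in models}
    for d in range(n_datasets):
        row = [results[m][d] for m in models]
        for m, r in zip(models, rank_row(row)):
            totals[m] += r
    return {m: totals[m] / n_datasets for m in models}


def friedman_chi2(avg: Dict[str, float], n_datasets: int) -> float:
    """Friedman statistic: 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)."""
    k = len(avg)
    sum_sq = sum(r * r for r in avg.values())
    return 12 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


def nemenyi_cd(k: int, n_datasets: int, q: Dict[int, float] = Q_ALPHA_005) -> float:
    """Critical difference: two models differ significantly when their
    average ranks differ by more than this amount."""
    return q[k] * math.sqrt(k * (k + 1) / (6 * n_datasets))


if __name__ == "__main__":
    # Hypothetical accuracy scores for three models on four datasets.
    scores = {
        "RF": [0.95, 0.93, 0.97, 0.91],
        "NB": [0.80, 0.85, 0.82, 0.79],
        "KNN": [0.88, 0.86, 0.90, 0.84],
    }
    avg = average_ranks(scores)
    n = len(scores["RF"])
    print("average ranks:", avg)
    print("Friedman chi2:", friedman_chi2(avg, n))
    print("Nemenyi CD:", round(nemenyi_cd(len(avg), n), 3))
```

With these hypothetical scores, RF ranks first on every dataset (average rank 3.0), KNN second (2.0), and NB third (1.0). The Friedman statistic is 8.0, which exceeds the χ² critical value of 5.991 for 2 degrees of freedom at α = 0.05. The CD for k = 3 and N = 4 is about 1.657, so only the RF/NB gap of 2.0 would be flagged as significant. The Friedman statistic does not depend on the ranking direction, but the interpretation of "higher is better" does.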



Author information

Authors and Affiliations

Indian Institute of Technology (IIT-BHU), Varanasi, India
Rakesh Kumar & Amrita Chaturvedi


Corresponding author

Correspondence to Rakesh Kumar.

Editor information

Editors and Affiliations

Indian Institute of Technology Patna, Patna, India
Rajiv Misra
Indian Institute of Technology Bombay, Mumbai, India
Rudrapatna K. Shyamasundar
Indian Institute of Technology (BHU), Varanasi, India
Amrita Chaturvedi
Cardiff University, Cardiff, UK
Rana Omer


About this paper

Cite this paper

Kumar, R., Chaturvedi, A. (2022). Software Fault Prediction Using Data Mining Techniques on Software Metrics. In: Misra, R., Shyamasundar, R.K., Chaturvedi, A., Omer, R. (eds) Machine Learning and Big Data Analytics (Proceedings of International Conference on Machine Learning and Big Data Analytics (ICMLBDA) 2021). ICMLBDA 2021. Lecture Notes in Networks and Systems, vol 256. Springer, Cham. https://doi.org/10.1007/978-3-030-82469-3_27


DOI: https://doi.org/10.1007/978-3-030-82469-3_27
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82468-6
Online ISBN: 978-3-030-82469-3
eBook Packages: Intelligent Technologies and Robotics (R0)
