
Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 256)

Abstract

The software industry has a strong demand for techniques that predict faulty modules and remove faults. Many researchers have developed fault prediction models to predict faults at an early stage of the software development life cycle (SDLC), but state-of-the-art models still suffer from limited performance and weak generalization under validation. Some researchers report that data mining techniques, machine learning, and artificial intelligence play crucial roles in developing fault prediction models, and a recent study stated that metric selection techniques also help to enhance model performance. Hence, to improve both the performance and the validation of fault prediction models, we use data mining, instance selection, metric selection, and ensemble methods to beat the state-of-the-art results. For validation, we collected 22 software projects from four different software repositories. We implemented three machine learning algorithms and three ensemble methods with two metric selection methods on the 22 datasets. The statistical evaluation of the implemented models was performed using the Wilcoxon signed-rank test and the Friedman test followed by the Nemenyi test to find the significantly better model. As a result, the random forest (RF) algorithm produces the best result, with an average median of 95.43% (accuracy) and 0.96 (f-measure) on the 22 software projects. Based on the Nemenyi test, RF performs best, with mean scores of 4.54 (accuracy) and 4.41 (f-measure), as shown in the critical difference diagram. The experimental study shows that data mining techniques with principal component analysis (PCA) provide better accuracy and f-measure.
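The Friedman/Nemenyi comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each model is ranked per dataset with a higher score receiving a higher rank (which seems consistent with the best model having the largest mean score), that ties share the average rank, and that no tie correction is applied to the Friedman statistic. The function names and the toy score table are hypothetical; the critical values are the standard ones for the Nemenyi test at a significance level of 0.05.

```python
import math


def average_ranks(scores):
    """Rank one dataset's scores: higher score -> higher rank, ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all models tied with the one at `pos`.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def friedman(score_table):
    """score_table[d][m] is the score of model m on dataset d.

    Returns (mean_ranks, chi_square) using the uncorrected Friedman statistic.
    """
    n = len(score_table)      # number of datasets
    k = len(score_table[0])   # number of models
    rank_rows = [average_ranks(row) for row in score_table]
    mean_ranks = [sum(row[m] for row in rank_rows) / n for m in range(k)]
    chi_sq = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi_sq


# Nemenyi critical values q_alpha at alpha = 0.05, keyed by the number of models.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def nemenyi_cd(k, n):
    """Critical difference: two models differ if their mean ranks differ by more than this."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n))


# Toy accuracy table (made-up numbers, NOT results from the paper):
# 4 datasets (rows) x 3 models (columns).
toy_scores = [
    [0.90, 0.85, 0.80],
    [0.88, 0.91, 0.79],
    [0.95, 0.90, 0.85],
    [0.92, 0.89, 0.89],
]
mean_ranks, chi_sq = friedman(toy_scores)
print(mean_ranks, chi_sq)
```

As a usage note, if six models were compared on 22 datasets (an assumption about the setup, not a figure taken from the paper), `nemenyi_cd(6, 22)` would give a critical difference of roughly 1.61 mean-rank units.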

Corresponding author

Correspondence to Rakesh Kumar.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Kumar, R., Chaturvedi, A. (2022). Software Fault Prediction Using Data Mining Techniques on Software Metrics. In: Misra, R., Shyamasundar, R.K., Chaturvedi, A., Omer, R. (eds) Machine Learning and Big Data Analytics (Proceedings of International Conference on Machine Learning and Big Data Analytics (ICMLBDA) 2021). ICMLBDA 2021. Lecture Notes in Networks and Systems, vol 256. Springer, Cham. https://doi.org/10.1007/978-3-030-82469-3_27
