
EkmEx - an extended framework for labeling an unlabeled fault dataset

  • 1177: Advances in Deep Learning for Multimodal Fusion and Alignment
Multimedia Tools and Applications

Abstract

Software fault prediction (SFP) is a quality assurance process that identifies whether modules are fault-prone (FP) or not-fault-prone (NFP), thereby reducing the cost and time spent on testing. Supervised machine learning techniques can identify FP modules, but they require fault information from previous versions of the software product. Such information, accumulated over the software life-cycle, may be neither readily available nor reliable. Currently, clustering combined with experts’ opinions is a prudent choice for labeling modules that lack fault information. However, this approach does not fully address important aspects such as the selection of experts, conflicts among expert opinions, and the diverse expertise of domain experts. In this paper, we propose a comprehensive framework, named EkmEx, that extends conventional fault prediction approaches while providing a mathematical foundation for the aspects not addressed so far. EkmEx guides the selection of experts, furnishes an objective procedure for resolving verdict conflicts, and manages the diversity in domain experts’ expertise. We performed expert-assisted module labeling through EkmEx and through conventional clustering on seven public NASA datasets. The empirical results show the significant potential of the proposed framework in identifying FP modules across all seven datasets.
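The conventional baseline the abstract refers to, clustering unlabeled modules and then labeling whole clusters as FP or NFP, can be sketched as follows. This is a minimal illustration only: the synthetic metric data, the k-means initialization, and the median thresholds used for labeling are all hypothetical choices, not the EkmEx framework itself.

```python
# Sketch of the conventional clustering-based labeling baseline:
# cluster unlabeled modules by their static metrics, then label each
# cluster fault-prone (FP) or not-fault-prone (NFP) by comparing its
# metric means against per-metric thresholds (here: overall medians).
import statistics
import random

random.seed(0)

# Synthetic stand-in for module metrics: (LOC, cyclomatic complexity).
modules = [(random.gauss(50, 10), random.gauss(3, 1)) for _ in range(40)]
modules += [(random.gauss(300, 40), random.gauss(15, 3)) for _ in range(10)]

def kmeans(points, iters=20):
    """Plain 2-cluster Lloyd's algorithm with extreme-point initialization."""
    centroids = [min(points), max(points)]
    k = len(centroids)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each module to the nearest centroid (squared distance).
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Recompute centroids as per-dimension means of cluster members.
        centroids = [tuple(statistics.mean(d) for d in zip(*cl)) if cl
                     else centroids[i] for i, cl in enumerate(clusters)]
    return clusters

clusters = kmeans(modules)
thresholds = [statistics.median(d) for d in zip(*modules)]

labels = {}
for cl in clusters:
    means = [statistics.mean(d) for d in zip(*cl)]
    # A cluster is labeled FP only if every metric mean exceeds its threshold.
    tag = "FP" if all(m > t for m, t in zip(means, thresholds)) else "NFP"
    for p in cl:
        labels[p] = tag

print(sorted(set(labels.values())))  # ['FP', 'NFP']
```

EkmEx replaces the purely threshold-based cluster labeling above with an expert-assisted step, which is where the expert-selection and conflict-resolution aspects discussed in the abstract come into play.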



Author information


Corresponding author

Correspondence to Sohail Sarwar.



About this article


Cite this article

Rizwan, M., Nadeem, A., Sarwar, S. et al. EkmEx - an extended framework for labeling an unlabeled fault dataset. Multimed Tools Appl 81, 12141–12156 (2022). https://doi.org/10.1007/s11042-021-11441-7

