Abstract
Software quality is costly to achieve in large systems: developers and testers must investigate a large number of software modules to ensure quality. Because this investigation is time consuming, formal models such as machine learning techniques are used to predict the fault-proneness of software modules, i.e., to identify where faults are most probable. However, many modules are small in implementation or design size, and investigating their quality is a low priority. In this research, machine learners are used to classify the most complex parts of the software. The less complex modules are filtered out using two size measures: the number of public methods (NPM) in a software construct, and lines of code (LOC) as a surrogate for implementation size. Modules in the lowest 10, 20 and 30% of each measure are filtered out of the training and testing data sets. The remaining modules are used to build four classifiers: Naïve Bayes, logistic regression, nearest neighbors and decision trees. The resulting classifiers were evaluated and compared against classifiers built on the unfiltered data. Classifiers based on NPM filtering were more consistent with the unfiltered results than those based on the LOC measure.
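The core preprocessing step of the abstract can be sketched as follows: given a size measure for each module, discard the instances falling in the lowest 10, 20 or 30% before training. This is a minimal illustration with synthetic data; the variable names (`loc`, `labels`) and the lognormal size distribution are assumptions for the example, not the paper's actual dataset, and the filtered set would subsequently be fed to the four learners named above.

```python
import numpy as np

def filter_lowest(values, labels, pct):
    """Drop instances whose size measure falls in the lowest `pct` percent."""
    threshold = np.percentile(values, pct)
    keep = values > threshold
    return values[keep], labels[keep]

# Synthetic module sizes: LOC is typically right-skewed, so a lognormal
# distribution is a reasonable stand-in for illustration.
rng = np.random.default_rng(42)
loc = rng.lognormal(mean=4.0, sigma=1.0, size=1000)
labels = rng.integers(0, 2, size=1000)  # placeholder fault labels

for pct in (10, 20, 30):
    kept_loc, kept_labels = filter_lowest(loc, labels, pct)
    print(f"lowest {pct}% removed -> {len(kept_loc)} modules remain")
```

The same function applies unchanged when `values` holds the NPM measure instead of LOC; the study's comparison amounts to training the four classifiers once per measure and per cutoff.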
Shatnawi, R. Identifying and eliminating less complex instances from software fault data. Int J Syst Assur Eng Manag 8 (Suppl 2), 974–982 (2017). https://doi.org/10.1007/s13198-016-0556-6