Abstract
Many approaches exist for predicting bugs in software systems. A popular one uses the entropy of changes, as proposed by Hassan (2009). This paper uses metrics derived from the entropy of changes to compare five machine learning techniques for predicting bugs: Gene Expression Programming (GEP), General Regression Neural Network, Locally Weighted Regression, Support Vector Regression (SVR) and Least Median Square Regression. Four software subsystems are used for validation: mozilla/layout/generic, mozilla/layout/forms, apache/httpd/modules/ssl and apache/httpd/modules/mappers. Data extraction for validation is automated by an algorithm that employs web scraping and regular expressions. The study suggests that GEP and SVR are stable regression techniques for bug prediction using entropy of changes.
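As a rough illustration of the kind of metric the abstract refers to, the sketch below computes the Shannon entropy of how changes are spread across files within one time period. It assumes the basic form H = -Σ p_k log2 p_k, where p_k is the fraction of the period's changes that touched file k. The function name `change_entropy` and the sample file names are hypothetical, and this is not the paper's extraction or modelling code.

```python
import math
from collections import Counter


def change_entropy(changed_files):
    """Shannon entropy (in bits) of the distribution of changes across files.

    `changed_files` holds one entry per recorded change, naming the file that
    was modified. This input format is an assumption made for illustration.
    """
    counts = Counter(changed_files)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) is the same as -p * log2(p) and avoids returning -0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical period: four changes spread over three files.
period = ["a.cpp", "a.cpp", "b.cpp", "c.cpp"]
print(change_entropy(period))  # 1.5
```

Entropy is 0 when every change in the period touches the same file, and it reaches log2(n) when changes are spread evenly over n files, so higher values indicate more scattered change activity.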
References
Afzal W, Torkar R (2008) A comparative evaluation of using genetic programming for predicting fault count data. In: The third international conference on software engineering advances (ICSEA’08), pp 407–414
Aggarwal KK, Singh Y, Kaur A, Malhotra R (2009) Empirical analysis for investigating the effect of object-oriented metrics on fault proneness: a replicated case study. Softw Process Improv Pract 14:39–62
Atkeson CG, Moore AW, Schaal SA (1997) Locally weighted learning. AI Rev 11:75–113
Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761. doi:10.1109/32.544352
Catal C (2011) Software fault prediction: a literature review and current trends. Expert Syst Appl 38:4626–4636
Catal C, Banerjee S (2010) Application of artificial immune systems paradigm for developing software fault prediction models. In: Evolutionary computation and optimization algorithms in software engineering. IGI Global, Hershey, USA, pp 76–93
D’Ambros M, Lanza M, Robbes R (2012) Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empir Softw Eng 17(4–5):531–577
Dejaeger K, Verbraken T, Baesens B (2013) Toward comprehensible software fault prediction models using Bayesian network classifiers. IEEE Trans Softw Eng 39:237–257
Ekanayake J, Tappolet J, Gall HC, Bernstein A (2012) Time variance and defect prediction in software projects. Empir Softw Eng 17(4–5):348–389. doi:10.1007/s10664-011-9180-x
Fenton N, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814. doi:10.1109/32.879815
Fenton N, Neil M, Marsh W, Hearty P, Radlinski L, Krause P (2007) Project data incorporating qualitative factors for improved software defect prediction. In: Proceedings of the 29th international conference on software engineering workshops, IEEE computer society, Washington, DC, USA (ICSEW’07), pp 69. doi:10.1109/ICSEW.2007.171
Ferreira C (2001) Gene expression programming: a new adaptive algorithm for solving problems. Complex Syst 13(2):87–129
Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661. doi:10.1109/32.859533
Gyimothy T, Ferenc R, Siket I (2005) Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans Softw Eng 31:897–910
Hassan AE (2009) Predicting faults using the complexity of code changes. In: 31st international conference on software engineering, IEEE computer society, pp 78–88
Kanmani S, Uthariaraj VR, Sankaranarayanan V, Thambidurai P (2007) Object-oriented software fault prediction using neural networks. Inf Softw Technol 49(5):483–492
Kaur A, Kaur K (2014) An empirical study of robustness and stability of machine learning classifiers in software defect prediction. Adv Intell Inf 320:383–397
Khoshgoftaar TM, Allen EB, Goel N, Nandi A, McMullan J (1996) Detection of software modules with high debug code churn in a very large legacy system. In: Proceedings of seventh international symposium on software reliability engineering, pp 364–371
Khoshgoftaar TM, Allen EB, Hudepohl JP, Aud SJ (1997) Application of neural networks to software quality modeling of a very large telecommunications system. IEEE Trans Neural Netw 8(4):902–909
Khoshgoftaar TM, Allen EB, Jones WD, Hudepohl JP (1999) Data mining for predictors of software quality. Int J Softw Eng Knowl Eng 9(5):547–563
Leroy AM, Rousseeuw PJ (1987) Robust regression and outlier detection. Wiley, New York
Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34:485–496
Malhotra R (2014) Comparative analysis of statistical and machine learning methods for predicting faulty modules. Appl Soft Comput 21:286–297
Malhotra R (2015) A Systematic literature review of machine learning techniques for software fault prediction. Appl Soft Comput 27:504–518
Mende T, Koschke R (2009) Revisiting the evaluation of defect prediction models. In: Proceedings of the 5th international conference on predictor models in software engineering. doi:10.1145/1540438.1540448
Mende T, Koschke R (2010) Effort-aware defect prediction models. In: 14th European conference on software maintenance and reengineering (CSMR), pp 107–116
Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13
Menzies T, Krishna R, Pryor D (2016) The promise repository of empirical software engineering data. North Carolina State University, Department of Computer Science [Online]. http://openscience.us/repo
Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th international conference on software Engineering, ACM, New York, NY, USA, pp 181–190. doi:10.1145/1368088.1368114
Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. In: Proceedings of 27th international conference on software engineering, pp 284–292
Okutan A, Yildiz OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19(1):154–181
Radjenovic D, Hericko M, Torkar R et al (2013) Software fault prediction metrics: a systematic literature review. Inf Softw Technol 55:1397–1418
Rodriguez D, Ruiz R, Riquelme JC, Harrison R (2013) A study of subgroup discovery approaches for defect prediction. Inf Softw Technol 55(10):1810–1822
Shevade SK, Keerthi SS, Bhattacharyya C, Murthy KRK (2000) Improvements to the SMO algorithm for SVM regression. IEEE Trans Neural Netw 11(5):1188–1193
Singh VB, Chaturvedi KK (2012) Entropy based bug prediction using support vector regression. In: Proceedings of 12th international conference on intelligent systems design and applications, pp 746–751
Singh Y, Kaur A, Malhotra R (2010) Empirical validation of object-oriented metrics for predicting fault proneness models. Softw Qual J 18(1):3–35
Specht DF (1991) A general regression neural network. IEEE Trans Neural Netw 2(6):568–576
Thwin MMT, Quah TS (2005) Application of neural networks for software quality prediction using OO metrics. J Syst Softw 76(2):147–156
Tosun MA, Bener AB, Turhan B (2011) An industrial case study of classifier ensembles for locating software defects. Softw Qual J 19(3):515–536
Witten IH, Frank E, Hall MA, Holmes G (2011) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Burlington
DTREG: predictive modeling software. https://www.dtreg.com/download
Zhou Y, Leung H (2006) Empirical analysis of object-oriented design metrics for predicting high and low severity faults. IEEE Trans Softw Eng 32(10):771–789
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Kaur, A., Kaur, K. & Chopra, D. An empirical study of software entropy based bug prediction using machine learning. Int J Syst Assur Eng Manag 8 (Suppl 2), 599–616 (2017). https://doi.org/10.1007/s13198-016-0479-2
DOI: https://doi.org/10.1007/s13198-016-0479-2