An empirical study of software entropy based bug prediction using machine learning

  • Original Article
  • International Journal of System Assurance Engineering and Management

Abstract

Many approaches exist for predicting bugs in software systems. One popular approach uses the entropy of changes, as proposed by Hassan (2009). This paper uses metrics derived from the entropy of changes to compare five machine learning techniques for predicting bugs: Gene Expression Programming (GEP), General Regression Neural Network, Locally Weighted Regression, Support Vector Regression (SVR) and Least Median Square Regression. Four software subsystems are used for validation: mozilla/layout/generic, mozilla/layout/forms, apache/httpd/modules/ssl and apache/httpd/modules/mappers. Data extraction for the validation is automated by an algorithm that employs web scraping and regular expressions. The study suggests that GEP and SVR are stable regression techniques for bug prediction using entropy of changes.
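As an illustration of the underlying metric, the following minimal sketch computes the Shannon entropy of the file changes recorded in one time period, which is the general idea behind entropy of changes. The function name, its parameters and the sample file names are hypothetical and are not taken from the paper. The optional normalization divides by the logarithm of a file count supplied by the caller; which count to use (all files in the system, or only recently modified ones) is left as an assumption here.

```python
import math
from collections import Counter


def entropy_of_changes(changed_files, n_files=None):
    """Shannon entropy (in bits) of the changes made in one period.

    changed_files: one entry per recorded change, naming the file touched.
    n_files: optional file count used for normalization (an assumption of
             this sketch); if given and greater than 1, the result is
             divided by log2(n_files) so that it lies in [0, 1].
    """
    counts = Counter(changed_files)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no changes in the period, so no uncertainty

    # p_i is the share of the period's changes that touched file i.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    if n_files is not None and n_files > 1:
        return entropy / math.log2(n_files)
    return entropy


# Hypothetical period with four changes spread evenly over two files.
period = ["nsFrame.cpp", "nsBlockFrame.cpp", "nsFrame.cpp", "nsBlockFrame.cpp"]
print(entropy_of_changes(period))
print(entropy_of_changes(period, n_files=4))
```

Entropy is highest when changes are scattered evenly across many files and zero when every change touches the same file. Per-period values of this kind are what a regression technique would take as input in an entropy-based bug prediction setup.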

References

  • Afzal W, Torkar R (2008) A comparative evaluation of using genetic programming for predicting fault count data. In: The third international conference on software engineering advances (ICSEA’08), pp 407–414

  • Aggarwal KK, Singh Y, Kaur A, Malhotra R (2009) Empirical analysis for investigating the effect of object-oriented metrics on fault proneness: a replicated case study. Softw Process Improv Pract 14:39–62

  • Atkeson CG, Moore AW, Schaal SA (1997) Locally weighted learning. AI Rev 11:75–113

  • Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761. doi:10.1109/32.544352

  • Catal C (2011) Software fault prediction: a literature review and current trends. Expert Syst Appl 38:4626–4636

  • Catal C, Banerjee S (2010) Application of artificial immune systems paradigm for developing software fault prediction models. In: Evolutionary computation and optimization algorithms in software engineering. IGI Global, Hershey, USA, pp 76–93

  • D’Ambros M, Lanza M, Robbes R (2012) Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empir Softw Eng 17(4–5):531–577

  • Dejaeger K, Verbraken T, Baesens B (2013) Toward comprehensible software fault prediction models using Bayesian network classifiers. IEEE Trans Softw Eng 39:237–257

  • Ekanayake J, Tappolet J, Gall HC, Bernstein A (2012) Time variance and defect prediction in software projects. Empir Softw Eng 17(4–5):348–389. doi:10.1007/s10664-011-9180-x

  • Fenton N, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814. doi:10.1109/32.879815

  • Fenton N, Neil M, Marsh W, Hearty P, Radlinski L, Krause P (2007) Project data incorporating qualitative factors for improved software defect prediction. In: Proceedings of the 29th international conference on software engineering workshops, IEEE computer society, Washington, DC, USA (ICSEW’07), pp 69. doi:10.1109/ICSEW.2007.171

  • Ferreira C (2001) Gene expression programming: a new adaptive algorithm for solving problems. Complex Syst 13(2):87–129

  • Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661. doi:10.1109/32.859533

  • Gyimothy T, Ferenc R, Siket I (2005) Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans Softw Eng 31:897–910

  • Hassan AE (2009) Predicting faults using the complexity of code changes. In: 31st international conference on software engineering, IEEE computer society, pp 78–88

  • Kanmani S, Uthariaraj VR, Sankaranarayanan V, Thambidurai P (2007) Object-oriented software fault prediction using neural networks. Inf Softw Technol 49(5):483–492

  • Kaur A, Kaur K (2014) An empirical study of robustness and stability of machine learning classifiers in software defect prediction. Adv Intell Inf 320:383–397

  • Khoshgoftaar TM, Allen EB, Goel N, Nandi A, McMullan J (1996) Detection of software modules with high debug code churn in a very large legacy system. In: Proceedings of seventh international symposium on software reliability engineering, pp 364–371

  • Khoshgoftaar TM, Allen EB, Hudepohl JP, Aud SJ (1997) Application of neural networks to software quality modeling of a very large telecommunications system. IEEE Trans Neural Netw 8(4):902–909

  • Khoshgoftaar TM, Allen EB, Jones WD, Hudepohl JP (1999) Data mining for predictors of software quality. Int J Softw Eng Knowl Eng 9(5):547–563

  • Leroy AM, Rousseeuw PJ (1987) Robust regression and outlier detection. Wiley, New York

  • Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34:485–496

  • Malhotra R (2014) Comparative analysis of statistical and machine learning methods for predicting faulty modules. Appl Soft Comput 21:286–297

  • Malhotra R (2015) A systematic literature review of machine learning techniques for software fault prediction. Appl Soft Comput 27:504–518

  • Mende T, Koschke R (2009) Revisiting the evaluation of defect prediction models. In: Proceedings of the 5th international conference on predictor models in software engineering. doi:10.1145/1540438.1540448

  • Mende T, Koschke R (2010) Effort-aware defect prediction models. In: 14th European conference on software maintenance and reengineering (CSMR), pp 107–116

  • Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13

  • Menzies T, Krishna R, Pryor D (2016) The promise repository of empirical software engineering data. North Carolina State University, Department of Computer Science [Online]. http://openscience.us/repo

  • Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th international conference on software engineering, ACM, New York, NY, USA, pp 181–190. doi:10.1145/1368088.1368114

  • Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. In: Proceedings of 27th international conference on software engineering, pp 284–292

  • Okutan A, Yildiz OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19(1):154–181

  • Radjenovic D, Hericko M, Torkar R et al (2013) Software fault prediction metrics: a systematic literature review. Inf Softw Technol 55:1397–1418

  • Rodriguez D, Ruiz R, Riquelme JC, Harrison R (2013) A study of subgroup discovery approaches for defect prediction. Inf Softw Technol 55(10):1810–1822

  • Shevade SK, Keerthi SS, Bhattacharyya C, Murthy KRK (2000) Improvements to the SMO algorithm for SVM regression. IEEE Trans Neural Netw 11(5):1188–1193

  • Singh VB, Chaturvedi KK (2012) Entropy based bug prediction using support vector regression. In: Proceedings of 12th international conference on intelligent systems design and applications, pp 746–751

  • Singh Y, Kaur A, Malhotra R (2010) Empirical validation of object-oriented metrics for predicting fault proneness models. Softw Qual J 18(1):3–35

  • Specht DF (1991) A general regression neural network. IEEE Trans Neural Netw 2(6):568–576

  • Thwin MMT, Quah TS (2005) Application of neural networks for software quality prediction using OO metrics. J Syst Softw 76(2):147–156

  • Tosun MA, Bener AB, Turhan B (2011) An industrial case study of classifier ensembles for locating software defects. Softw Qual J 19(3):515–536

  • Witten IH, Frank E, Hall MA, Holmes G (2011) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Burlington

  • DTREG predictive modeling software. https://www.dtreg.com/download

  • Zhou Y, Leung H (2006) Empirical analysis of object-oriented design metrics for predicting high and low severity faults. IEEE Trans Softw Eng 32(10):771–789

Author information

Corresponding author

Correspondence to Deepti Chopra.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Kaur, A., Kaur, K. & Chopra, D. An empirical study of software entropy based bug prediction using machine learning. Int J Syst Assur Eng Manag 8 (Suppl 2), 599–616 (2017). https://doi.org/10.1007/s13198-016-0479-2

  • DOI: https://doi.org/10.1007/s13198-016-0479-2
