Empirical Assessment of LR- and ANN-Based Fault Prediction Techniques
Abstract
Because of our growing reliance on software systems, there is a need for dynamic dependability assessment to ensure that these systems perform as specified under various conditions. One approach is to assess individual modules for fault proneness. Software fault prediction, alongside inspection and testing, remains a prevalent method of assuring software quality. Metrics-based quality models can predict whether a software module will be fault-prone. Applying these models helps focus quality-improvement effort on the modules most likely to be faulty in operation, so that software testing and enhancement resources are used cost-effectively. In this paper, a statistical model, logistic regression (LR), and a machine learning approach, artificial neural networks (ANN), are investigated for predicting fault proneness. We evaluate the two predictor models on three elements: a single data sample, a common evaluation parameter, and cross-validation. The study shows that the ANN technique performs better than LR, but that LR, being the simpler technique, is also a good quality indicator.
Keywords
Receiver Operating Characteristic Curve · Artificial Neural Network Model · Artificial Neural Network Technique · Bayesian Regularization · Condition Number
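The comparison described in the abstract can be illustrated with a toy sketch: fit an LR model and a small ANN on the same data sample and compare them on a common evaluation parameter (area under the ROC curve). Everything here is an illustrative assumption, not the paper's actual setup: the synthetic "metrics" data, the hand-rolled gradient-descent LR, the single-hidden-layer network, and the single hold-out split standing in for one fold of cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for module-metrics data: 2 features, binary fault label,
# with a mildly nonlinear true relationship (hypothetical, not the paper's data).
n = 400
X = rng.normal(size=(n, 2))
logits = 1.5 * X[:, 0] + 0.8 * X[:, 1] ** 2 - 1.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def auc(y_true, scores):
    # Area under the ROC curve via the rank-sum (Mann-Whitney) statistic.
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def fit_lr(X, y, lr=0.1, epochs=500):
    # Plain logistic regression trained by gradient descent.
    Xb = np.c_[np.ones(len(X)), X]
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return lambda Xn: sigmoid(np.c_[np.ones(len(Xn)), Xn] @ w)

def fit_ann(X, y, hidden=8, lr=0.1, epochs=2000):
    # One-hidden-layer network with tanh units, trained by backpropagation
    # on the cross-entropy loss.
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        p = sigmoid(H @ W2 + b2)
        d2 = (p - y) / len(y)          # gradient at the output
        dH = np.outer(d2, W2) * (1 - H ** 2)  # backprop through tanh
        W2 -= lr * H.T @ d2
        b2 -= lr * d2.sum()
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return lambda Xn: sigmoid(np.tanh(Xn @ W1 + b1) @ W2 + b2)

# One hold-out split standing in for a single fold of cross-validation.
split = n // 2
Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]
auc_lr = auc(yte, fit_lr(Xtr, ytr)(Xte))
auc_ann = auc(yte, fit_ann(Xtr, ytr)(Xte))
print(f"LR AUC:  {auc_lr:.3f}")
print(f"ANN AUC: {auc_ann:.3f}")
```

On data with a nonlinear signal like this, the ANN can exploit the quadratic term while LR can only fit the linear component, mirroring the paper's finding that the ANN performs better while LR remains a usable, simpler indicator.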