Applying the Mahalanobis-Taguchi strategy for software defect diagnosis
The Mahalanobis-Taguchi (MT) strategy combines mathematical and statistical concepts such as the Mahalanobis distance, Gram-Schmidt orthogonalization, and experimental design to support diagnosis and decision-making based on multivariate data. Its primary purpose is to develop a scale that measures the degree of abnormality of cases relative to “normal” or “healthy” cases, i.e. to derive a continuous scale from a set of binary-classified cases. An optimal subset of variables for measuring abnormality is then selected, and rules for future diagnosis are defined based on that subset and the measurement scale. This maps well to software defect prediction based on a multivariate set of software metrics and attributes. In this paper, the MT strategy, combined with a cluster analysis technique for determining the most appropriate training set, is described and applied to well-known datasets in order to evaluate the fault-proneness of software modules. The resulting measurement scale is evaluated using ROC curves, and the results indicate that the MT strategy is a promising technique for software defect diagnosis. It compares favorably to previously evaluated methods on a number of publicly available datasets. Because the MT strategy quantifies the level of abnormality, it can also stimulate and inform discussions with engineers and managers in different defect prediction situations.
Keywords: Software defect prediction; Fault-proneness; Software testing; Mahalanobis-Taguchi strategy; Mahalanobis-Taguchi Gram-Schmidt process; Receiver Operating Characteristic; Area under the curve
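The core idea of the measurement scale can be illustrated with a minimal sketch. It is not the paper's implementation: it covers only the scaled Mahalanobis distance computed from a "normal" reference group and an ROC area computed by pairwise comparison, and it omits the Gram-Schmidt process, the variable selection step, and the cluster analysis used to choose the training set. The function names (`fit_mahalanobis_space`, `scaled_md`, `roc_auc`) and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def fit_mahalanobis_space(normal):
    """Estimate the reference space from the "normal" (healthy) cases only."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0)  # population standard deviation (ddof=0)
    corr = np.corrcoef(normal, rowvar=False)
    inv_corr = np.linalg.pinv(corr)  # pseudo-inverse tolerates near-collinear metrics
    return mean, std, inv_corr


def scaled_md(cases, mean, std, inv_corr):
    """Scaled Mahalanobis distance: z^T C^{-1} z divided by the number of variables."""
    z = (cases - mean) / std
    k = cases.shape[1]
    return np.einsum("ij,jk,ik->i", z, inv_corr, z) / k


def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


# Synthetic stand-in for module metrics (hypothetical data, not the paper's datasets).
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.5, 0.0],
                   [0.0, 0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 1.0]])  # induces correlation between metrics
normal_modules = rng.normal(size=(200, 4)) @ mixing
faulty_modules = rng.normal(size=(50, 4)) @ mixing + 3.0  # shifted "abnormal" group

mean, std, inv_corr = fit_mahalanobis_space(normal_modules)
md_normal = scaled_md(normal_modules, mean, std, inv_corr)
md_faulty = scaled_md(faulty_modules, mean, std, inv_corr)
auc_value = roc_auc(md_faulty, md_normal)

print("mean scaled MD, normal group:", md_normal.mean())
print("mean scaled MD, faulty group:", md_faulty.mean())
print("AUC:", auc_value)
```

Dividing by the number of variables centers the reference group at an average distance of one, so larger values read directly as a degree of abnormality. A diagnosis rule would then be a threshold on this scale, which is the choice an ROC curve helps to examine.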