Hyper-Quadtree-Based K-Means Algorithm for Software Fault Prediction
Software faults are recoverable errors in a program that are caused by programming mistakes. Software fault prediction is often hampered by the unavailability of fault data, which makes supervised techniques difficult to apply. In such cases, unsupervised techniques are helpful. In this paper, a hyper-quadtree-based K-means algorithm is applied to predict faults in program modules. The approach has two parts. First, a hyper-quadtree is applied to the software fault prediction dataset to initialize the K-means clustering algorithm; an input parameter Δ governs the initial number of clusters and the initial cluster centers. Second, the cluster centers and the number of clusters obtained from this initialization are used as input to the K-means clustering algorithm, which predicts the faults in the software modules. The overall error rate of this prediction approach is compared with that of other existing algorithms.
Keywords: Hyper-quadtree, K-means clustering, Software fault prediction
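The two-part procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not define how Δ is used, so the code below makes an explicit assumption: Δ (`delta`) is treated as the maximum number of points a hyper-quadtree bucket may hold before it is split, and the centroid of every non-empty leaf bucket becomes an initial cluster center. The function names, the `max_depth` guard, and the toy metric values are all hypothetical and are not taken from the paper.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hyper_quadtree_centers(points, delta, max_depth=8):
    """Part 1 (assumed reading of Δ): split the bounding box at its midpoint
    along every dimension until each bucket holds at most `delta` points,
    then return the centroid of each non-empty leaf bucket."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    centers = []
    stack = [(points, lo, hi, 0)]
    while stack:
        bucket, lo, hi, depth = stack.pop()
        if len(bucket) <= delta or depth >= max_depth:
            centers.append(centroid(bucket))
            continue
        mid = [(l + h) / 2 for l, h in zip(lo, hi)]
        # Only non-empty children are created, so the 2^dim fan-out of a
        # hyper-quadtree is never enumerated explicitly.
        children = {}
        for p in bucket:
            key = tuple(p[d] >= mid[d] for d in range(dim))
            children.setdefault(key, []).append(p)
        for key, child in children.items():
            c_lo = [mid[d] if key[d] else lo[d] for d in range(dim)]
            c_hi = [hi[d] if key[d] else mid[d] for d in range(dim)]
            stack.append((child, c_lo, c_hi, depth + 1))
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]


def kmeans(points, centers, max_iter=100):
    """Part 2: standard K-means (Lloyd iterations) seeded with `centers`."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == k]
            # An empty cluster keeps its previous center.
            new_centers.append(centroid(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Toy data: two hypothetical software metrics per module.
modules = [[1, 1], [1, 2], [2, 1], [2, 2],
           [8, 8], [8, 9], [9, 8], [9, 9]]
init = hyper_quadtree_centers(modules, delta=4)
centers, labels = kmeans(modules, init)
print(len(init), sorted(centers), labels)
```

In this sketch the number of clusters is never supplied by the user; it falls out of the tree construction, which mirrors the abstract's statement that Δ governs both the number of clusters and their centers. Deciding which of the resulting clusters is fault-prone requires a further labeling rule that the abstract does not describe, so it is left out here.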