An Adaptive Iterative PCA-SVM Based Technique for Dimensionality Reduction to Support Fast Mining of Leukemia Data
The primary goal of a data mining technique is to detect and classify data in a large data set without compromising processing speed. Data mining is the process of extracting patterns from a large data set, so pattern discovery and mining are often time consuming. Each data record is described by several columns, called dimensions, but the identity of a record does not depend equally on each of them. Scanning and processing the entire data set for every query therefore reduces both the efficiency of the algorithm and the speed of processing. This problem can be mitigated by identifying the intrinsic dimensionality of the data and applying classification only to the data restricted to those intrinsic dimensions. Several algorithms have been proposed for identifying and reducing the intrinsic dimensions of data. However, reducing the dimension can lower the classification rate, because fewer data values are available for each decision. In this work we propose a technique for classifying leukemia data that reduces the dimension of the training (knowledge) data set through an iterative process of intrinsic dimensionality discovery and reduction based on Principal Component Analysis (PCA). The optimized data set is then used to classify the given data with a Support Vector Machine (SVM) classifier. Results show that the proposed technique performs much better in terms of the optimized data set obtained and the classification accuracy.
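The abstract does not spell out the iteration itself, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that "iterative" means discarding the weakest principal component one at a time for as long as the remaining components still explain a chosen fraction of the total variance. The function name `iterative_pca_reduce`, the `min_variance` threshold, and the synthetic data are all hypothetical; the SVM stage is omitted, and the reduced matrix would simply be passed to any SVM classifier.

```python
import numpy as np

def iterative_pca_reduce(X, min_variance=0.99):
    """Drop the weakest principal component repeatedly while the
    remaining ones still explain at least `min_variance` of the
    total variance (an assumed stopping rule, not from the paper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: rows of Vt are the principal directions,
    # and the squared singular values are proportional to the variances.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2
    total = variances.sum()
    if total == 0:
        raise ValueError("data has zero variance")
    k = len(variances)
    # Iteratively shrink the dimension while the threshold still holds.
    while k > 1 and variances[:k - 1].sum() / total >= min_variance:
        k -= 1
    components = Vt[:k]
    return Xc @ components.T, components, mean, k

# Synthetic stand-in for a training set: 200 samples in 10 dimensions
# whose variance is concentrated in a 2-dimensional subspace.
rng = np.random.default_rng(0)
latent = np.zeros((200, 10))
latent[:, :2] = rng.normal(size=(200, 2))
noisy = latent + 0.01 * rng.normal(size=(200, 10))
rotation, _ = np.linalg.qr(rng.normal(size=(10, 10)))  # random orthogonal mix
X = noisy @ rotation

reduced, components, mean, k = iterative_pca_reduce(X, min_variance=0.99)
print(reduced.shape, k)
```

A new sample `x` would be mapped into the same reduced space with `(x - mean) @ components.T` before classification, so the training and query data share one projection.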
Keywords: Principal component analysis, Support vector machine, Reduction, Eigenvalue, Local PCA