C-KPCA: Custom Kernel PCA for Cancer Classification

  • Van-Sang Ha
  • Ha-Nam Nguyen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9729)


Principal component analysis (PCA) is an effective and well-known method for reducing the dimensionality of high-dimensional data sets. Recently, kernel PCA (KPCA), a nonlinear form of PCA, has been introduced into many fields. In this paper, we propose a new gene selection method, namely Custom Kernel principal component analysis (C-KPCA). The new kernel function for KPCA is created by combining a set of kernel functions. First, Singular Value Decomposition (SVD) is used to reduce the dimension of the microarray data. The input space is then mapped to a higher-dimensional feature space using the proposed custom kernel function. The main objective of our method is to extract nonlinear features for the classification process. To test the accuracy of our method, a number of experiments are carried out on four binary gene datasets: Colon Tumor, Leukemia, Lymphoma, and Prostate. The experimental results show that the proposed method achieves a higher prediction rate than several recently published algorithms.
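
The pipeline described above (SVD reduction, a combined kernel, then KPCA) can be illustrated with a minimal NumPy sketch. The abstract does not state which kernels are combined or how, so the weighted sum of an RBF and a polynomial kernel used here, the function names, and all parameter values are assumptions made for illustration only, not the authors' actual kernel.

```python
import numpy as np


def svd_reduce(X, k):
    """Project mean-centered samples (rows of X) onto the top-k right singular vectors."""
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def rbf_kernel(A, B, gamma):
    # Squared Euclidean distances between all rows of A and all rows of B.
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def poly_kernel(A, B, degree, coef0):
    return (A @ B.T + coef0) ** degree


def custom_kernel(A, B, weights=(0.5, 0.5), gamma=0.1, degree=2, coef0=1.0):
    """Hypothetical combination: non-negative weighted sum of two valid kernels.

    A non-negative weighted sum of positive semi-definite kernels is itself a
    valid kernel; the weights and base kernels here are illustrative assumptions.
    """
    w_rbf, w_poly = weights
    return w_rbf * rbf_kernel(A, B, gamma) + w_poly * poly_kernel(A, B, degree, coef0)


def kpca_fit_transform(K, n_components):
    """Kernel PCA on a precomputed training kernel matrix K (n x n)."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Center the kernel matrix in feature space.
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[idx], eigvecs[:, idx]
    # Projection of training point j on component m is sqrt(lam_m) * alpha[j, m].
    return alpha * np.sqrt(np.maximum(lam, 0.0))


# Synthetic stand-in for a microarray matrix: 20 samples, 200 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
X_reduced = svd_reduce(X, k=10)
K = custom_kernel(X_reduced, X_reduced)
Z = kpca_fit_transform(K, n_components=3)
print(Z.shape)  # (20, 3)
```

The rows of `Z` are the extracted nonlinear features, which would then be passed to a classifier of choice; the data here is random and serves only to show the shapes involved.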


Feature extraction, KPCA, SVD, Cancer classification, Dimension reduction





Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Economic Information System, Academy of Finance, Hanoi, Viet Nam
  2. Department of Information Technology, VNU-University of Engineering and Technology, Hanoi, Viet Nam
