Cancer Classification by Kernel Principal Component Self-regression

  • Bai-ling Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4304)


The classification of cancer based on gene expression data is one of the most important tasks in bioinformatics, and is essential for future clinical implementations of microarray-based cancer diagnosis. In this paper, a novel procedure for classifying cancer from gene expression data is proposed, based on a Kernel Principal Component Self-regression (KPCSR) model. Developed from Kernel Principal Component Analysis (KPCA), the KPCSR model selects a subset of the principal components in the kernel space on which the input variables are regressed, in order to accurately characterize each type of cancer. A modular scheme with a class-specific KPCSR structure proves very efficient: each cancer class is assigned an independent KPCSR model that encodes the corresponding gene expression information. Performance was measured on several public gene expression datasets involving human tumor samples, using both 5-fold cross-validation and leave-one-out cross-validation (LOOCV). Experimental results show that the classification accuracies are better than or comparable to the maximum accuracies previously reported in the literature for Support Vector Machine and k-Nearest Neighbor classifiers combined with various gene selection schemes. These results suggest that the proposed method is useful for microarray-based cancer classification.
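The modular, class-specific idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the kernel, the number of components, or the decision rule, so the RBF kernel, the ridge term, and the "assign to the class whose model reconstructs the sample with the smallest squared error" rule are all assumptions. The names `ClassKPCSR`, `fit_kpcsr`, `predict_kpcsr`, `gamma`, and `ridge` are hypothetical and chosen only for this example, which relies on NumPy alone.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


class ClassKPCSR:
    """Hypothetical per-class model: kernel PCA scores regressed back onto the inputs."""

    def __init__(self, n_components=5, gamma=0.05, ridge=1e-8):
        self.n_components = n_components
        self.gamma = gamma
        self.ridge = ridge  # assumed small regulariser, not taken from the paper

    def fit(self, X):
        n = X.shape[0]
        self.X_train = X
        K = rbf_kernel(X, X, self.gamma)
        # Statistics needed later to centre kernel rows of unseen samples.
        self.col_mean = K.mean(axis=0)
        self.all_mean = K.mean()
        ones = np.full((n, n), 1.0 / n)
        Kc = K - ones @ K - K @ ones + ones @ K @ ones  # centring in feature space
        vals, vecs = np.linalg.eigh(Kc)
        order = np.argsort(vals)[::-1][: self.n_components]
        vals, vecs = vals[order], vecs[:, order]
        keep = vals > 1e-10  # drop numerically null directions
        self.alphas = vecs[:, keep] / np.sqrt(vals[keep])
        Z = Kc @ self.alphas  # training scores on the selected components
        # Self-regression step: least-squares map from the scores back to the inputs.
        self.x_mean = X.mean(axis=0)
        gram = Z.T @ Z + self.ridge * np.eye(Z.shape[1])
        self.W = np.linalg.solve(gram, Z.T @ (X - self.x_mean))
        return self

    def transform(self, X_new):
        K = rbf_kernel(X_new, self.X_train, self.gamma)
        Kc = K - K.mean(axis=1, keepdims=True) - self.col_mean[None, :] + self.all_mean
        return Kc @ self.alphas

    def residual(self, X_new):
        """Squared error between each sample and its reconstruction by this class model."""
        X_hat = self.transform(X_new) @ self.W + self.x_mean
        return ((X_new - X_hat) ** 2).sum(axis=1)


def fit_kpcsr(X, y, **kwargs):
    """Fit one independent model per class label (the modular scheme)."""
    return {c: ClassKPCSR(**kwargs).fit(X[y == c]) for c in np.unique(y)}


def predict_kpcsr(models, X):
    """Assumed decision rule: pick the class with the smallest reconstruction error."""
    classes = sorted(models)
    errors = np.column_stack([models[c].residual(X) for c in classes])
    return np.asarray(classes)[errors.argmin(axis=1)]
```

With an expression matrix `X` (samples by genes) and integer labels `y`, a call such as `predict_kpcsr(fit_kpcsr(X_train, y_train, n_components=5), X_test)` returns one predicted label per test sample. In practice `n_components` and `gamma` would need to be tuned, for example inside the cross-validation loops the abstract mentions.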


Keywords: Support Vector Machine, Gene Expression Data, Principal Component Regression, Kernel Principal Component Analysis, Kernel Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Bai-ling Zhang
    School of Computer Science and Mathematics, Victoria University, Australia
