Gene Feature Extraction Using T-Test Statistics and Kernel Partial Least Squares

  • Shutao Li
  • Chen Liao
  • James T. Kwok
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4234)


In this paper, we propose a gene feature extraction method that uses two standard feature extraction methods in tandem, namely the T-test method and kernel partial least squares (KPLS). First, a preprocessing step based on the T-test method filters out irrelevant and noisy genes. KPLS is then used to extract features with high information content. Finally, the extracted features are fed into a classifier. Experiments are performed on three benchmark datasets: breast cancer, ALL/AML leukemia and colon cancer. Using either the T-test method or KPLS alone does not yield satisfactory results, but the experiments demonstrate that using the two together can significantly boost classification accuracy, and that this simple combination obtains state-of-the-art performance on all three datasets.
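The filtering step can be illustrated with a minimal sketch. It assumes a two-class problem and a Welch-style t-statistic computed per gene; the abstract does not state which variant of the T-test the authors use, so the formula, the function names `t_scores` and `select_genes`, and the toy data are all illustrative assumptions. The KPLS and classifier stages are not shown.

```python
import math
from statistics import mean, variance


def t_scores(samples, labels):
    """Welch-style t-statistic per gene for a two-class expression matrix.

    samples: list of rows (one per tissue sample), each a list of gene values.
    labels:  list of 0/1 class labels, one per sample.
    Assumption: this is one common form of the t-score, not necessarily
    the exact statistic used in the paper.
    """
    pos = [row for row, y in zip(samples, labels) if y == 1]
    neg = [row for row, y in zip(samples, labels) if y == 0]
    scores = []
    for g in range(len(samples[0])):
        a = [row[g] for row in pos]
        b = [row[g] for row in neg]
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        # Guard against division by zero when both classes have zero
        # variance; scoring such genes as 0 is a simplification.
        scores.append((mean(a) - mean(b)) / se if se > 0 else 0.0)
    return scores


def select_genes(samples, labels, k):
    """Indices of the k genes with the largest |t|, best first."""
    scores = t_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:k]


# Toy data (made up): 6 samples x 4 genes; only gene 2 separates the classes.
X = [
    [5.0, 1.0, 9.0, 3.0],
    [5.2, 0.8, 9.5, 2.9],
    [4.9, 1.1, 9.2, 3.2],
    [5.1, 0.9, 2.0, 3.1],
    [5.0, 1.2, 2.3, 2.8],
    [4.8, 1.0, 1.9, 3.0],
]
y = [1, 1, 1, 0, 0, 0]
top = select_genes(X, y, k=1)
```

In a pipeline of the kind the abstract describes, the columns returned by `select_genes` would be the reduced gene set passed on to KPLS. The number of genes to keep (`k`) is a tuning choice that this sketch does not address.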


Keywords (added by machine and not by the authors): Support Vector Machine; Kernel Matrix; Gene Selection Method; Gene Extraction; Leukemia Dataset





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Shutao Li (1)
  • Chen Liao (1)
  • James T. Kwok (2)
  1. College of Electrical and Information Engineering, Hunan University, Changsha, China
  2. Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong
