Robust Signature Discovery for Affymetrix GeneChip® Cancer Classification

  • Hung-Ming Lai
  • Andreas Albrecht
  • Kathleen Steinhöfel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8946)


Phenotype prediction is one of the central issues in genetics and medical research. With the advent of high-throughput screening technologies, microarray-based cancer classification has become a standard procedure for identifying cancer-related gene signatures. Because gene expression profiles of the transcriptome are high-dimensional, discovering a biologically functional signature across different cell lines is a challenging task. In this article, we present an innovative framework that uses information theory to find a small set of discriminative genes for classifying a specific disease phenotype. The framework is data-driven and considers feature relevance, redundancy, and interdependence in the context of feature pairs. Its effectiveness has been validated on a brain cancer benchmark whose gene expression matrix is derived from the Affymetrix Human Genome U95Av2 GeneChip®. Three multivariate filters based on information theory were used for comparison. To show the strengths of the framework, our experiments used three performance measures, two sets of enrichment analyses, and a stability index. The results show that the framework is robust and discovers a gene signature that achieves a high level of classification performance and is more statistically significantly enriched.
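To make the notions of feature relevance and redundancy concrete, the following is a minimal sketch of a generic information-theoretic filter. It is not the framework proposed in the paper: it is a plain relevance-minus-redundancy greedy selection on discretised features, and it does not model interdependence between feature pairs. The probe names, the toy expression values, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def greedy_select(features, labels, k):
    """Pick k features, each time maximising relevance minus mean redundancy.

    Relevance is I(feature; labels); redundancy is the mean mutual
    information between the candidate and the features already selected.
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:

        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, binary phenotype, binarised expression.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "probe_a": [0, 0, 0, 0, 1, 1, 1, 0],  # informative
    "probe_b": [0, 0, 0, 0, 1, 1, 1, 0],  # exact duplicate of probe_a
    "probe_c": [0, 0, 1, 1, 1, 1, 1, 1],  # weaker but partly complementary
    "probe_d": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the phenotype
}

signature = greedy_select(features, labels, k=2)
print(signature)
```

On this toy data the duplicate probe is penalised for redundancy once its twin has been chosen, so the second pick goes to a less relevant but less redundant probe. Real microarray data would first need discretisation of the continuous expression values, a step this sketch assumes has already been done.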


Affymetrix, Cancer classification, Feature interdependence, Feature selection, Gene expression profiles, Gene signature discovery



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Hung-Ming Lai (1)
  • Andreas Albrecht (2)
  • Kathleen Steinhöfel (1)
  1. Algorithms and Bioinformatics Research Group, Department of Informatics, King’s College London, London, UK
  2. School of Science and Technology, Middlesex University, London, UK
