International Conference on Agents and Artificial Intelligence

Agents and Artificial Intelligence, pp. 329–345

Robust Signature Discovery for Affymetrix GeneChip® Cancer Classification

  • Hung-Ming Lai
  • Andreas Albrecht
  • Kathleen Steinhöfel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8946)

Abstract

Phenotype prediction is one of the central issues in genetics and medical science research. With the advent of high-throughput screening technologies, microarray-based cancer classification has become a standard procedure for identifying cancer-related gene signatures. Because transcriptome-wide gene expression profiles are high-dimensional, discovering a biologically functional signature that holds across different cell lines is a challenging task. In this article, we present an innovative information-theoretic framework for finding a small subset of discriminative genes to classify a specific disease phenotype. The framework is data-driven and considers feature relevance, redundancy, and interdependence in the context of feature pairs. Its effectiveness has been validated on a brain cancer benchmark whose gene expression matrix is derived from the Affymetrix Human Genome U95Av2 GeneChip®. Three multivariate filters based on information theory were also used for comparison. To demonstrate the strengths of the framework, our experiments used three performance measures, two sets of enrichment analyses, and a stability index. The results show that the framework is robust and able to discover a gene signature that achieves a high level of classification performance and is more significantly enriched in statistical terms.
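The abstract names three pairwise notions (relevance, redundancy, and interdependence) without giving formulas. As a minimal sketch, assuming the expression values have already been discretised into a few levels and using simple plug-in entropy estimates, the standard information-theoretic quantities for one gene pair can be computed as follows. The function names (`entropy`, `mutual_information`, `pair_scores`) are hypothetical, and measuring interdependence by interaction gain, I(X_i, X_j; C) − I(X_i; C) − I(X_j; C), is an illustrative choice, not necessarily the scoring criterion the authors use.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in Shannon entropy H(V), in bits, of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def pair_scores(xi, xj, c):
    """Hypothetical pairwise scores for two discretised genes xi, xj and class labels c."""
    relevance_i = mutual_information(xi, c)           # I(X_i; C)
    relevance_j = mutual_information(xj, c)           # I(X_j; C)
    redundancy = mutual_information(xi, xj)           # I(X_i; X_j)
    joint = mutual_information(list(zip(xi, xj)), c)  # I(X_i, X_j; C)
    # Interaction gain: > 0 means the pair carries more class information jointly
    # than the two genes separately (interdependence); < 0 means overlapping information.
    interaction = joint - relevance_i - relevance_j
    return {
        "relevance_i": relevance_i,
        "relevance_j": relevance_j,
        "redundancy": redundancy,
        "joint": joint,
        "interaction": interaction,
    }


if __name__ == "__main__":
    # XOR-like toy data: neither gene alone predicts the class, but the pair does.
    print(pair_scores([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]))
```

On the XOR-style toy input, each gene's relevance and the pair's redundancy are 0 bits, while the joint information and the interaction gain are both 1 bit. This is the kind of interdependent gene pair that a purely univariate relevance filter would discard, which is why scoring features in pairs can matter.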

Keywords

Affymetrix; Cancer classification; Feature interdependence; Feature selection; Gene expression profiles; Gene signature discovery


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Hung-Ming Lai (1)
  • Andreas Albrecht (2)
  • Kathleen Steinhöfel (1)
  1. Algorithms and Bioinformatics Research Group, Department of Informatics, King’s College London, London, UK
  2. School of Science and Technology, Middlesex University, London, UK
