Conquering the Curse of Dimensionality in Gene Expression Cancer Diagnosis: Tough Problem, Simple Models

  • Conference paper
Artificial Intelligence in Medicine (AIME 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3581)

Abstract

In this paper we study the properties of cancer gene expression data sets from the perspective of classification and tumor diagnosis. Our findings and case studies are based on several recently published data sets. We find that these data sets typically include a subset of about 100 highly discriminating features whose predictive power can be further enhanced by exploring their interactions. This finding speaks against the often-used univariate feature selection methods and may explain the superior performance of support vector machines recently reported in related work. We argue that a much simpler technique, one that directly finds visualizations with clear separation of diagnostic classes, may be used instead. Furthermore, it may perform better at inferring an understandable classifier that includes only a few relevant features.
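The contrast drawn in the abstract, between scoring features one at a time and searching for low-dimensional views in which the classes separate, can be illustrated with a small sketch. It is not the authors' implementation: the synthetic XOR-style data, the signal-to-noise score, the leave-one-out k-NN projection score, and all function names below are assumptions chosen only to show why a univariate ranking can miss interacting features.

```python
import itertools
import numpy as np

def univariate_scores(X, y):
    """Absolute signal-to-noise score per feature: |mean0 - mean1| / (sd0 + sd1)."""
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0))

def loo_knn_accuracy(P, y, k=5):
    """Leave-one-out k-NN accuracy for points P (n x d) with binary labels y."""
    d = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)          # a point may not vote for itself
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    votes = y[nn].mean(axis=1)           # fraction of neighbours in class 1
    return float(((votes > 0.5).astype(int) == y).mean())

def rank_pairs(X, y, k=5):
    """Score every two-feature projection by class separation, best first."""
    scored = [(loo_knn_accuracy(X[:, [i, j]], y, k), (i, j))
              for i, j in itertools.combinations(range(X.shape[1]), 2)]
    return sorted(scored, reverse=True)

# Hypothetical toy data: 200 samples, 6 "genes"; the class depends only on
# the interaction (XOR of signs) of features 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

uni = univariate_scores(X, y)            # per-feature scores, all close to zero here
pairs = rank_pairs(X, y)                 # all 15 two-feature projections, ranked
best_acc, best_pair = pairs[0]
print("univariate scores:", np.round(uni, 3))
print("best projection:", best_pair, "LOO k-NN accuracy:", round(best_acc, 3))
```

On data of this kind each feature looks uninformative in isolation, while the projection onto the interacting pair separates the classes well. Real expression data would have thousands of features, so an exhaustive pair search like this one would need a heuristic to limit the candidates.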

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mramor, M., Leban, G., Demšar, J., Zupan, B. (2005). Conquering the Curse of Dimensionality in Gene Expression Cancer Diagnosis: Tough Problem, Simple Models. In: Miksch, S., Hunter, J., Keravnou, E.T. (eds) Artificial Intelligence in Medicine. AIME 2005. Lecture Notes in Computer Science, vol 3581. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527770_68

  • DOI: https://doi.org/10.1007/11527770_68

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27831-3

  • Online ISBN: 978-3-540-31884-2

  • eBook Packages: Computer Science (R0)
