
An ensemble method for gene discovery based on DNA microarray data

Science in China Series C: Life Sciences

Abstract

The advent of DNA microarray technology promises new insights into the secrets of life by monitoring the activities of thousands of genes simultaneously. Current analyses of microarray data focus on the precise classification of biological types, for example, tumor versus normal tissue. A more challenging scientific task is to extract disease-relevant genes from the bewildering amount of raw data. This task is one of the most critical themes of the post-genomic era, yet it is generally neglected for lack of an efficient approach. In this paper, we present a novel ensemble method for gene extraction that can be tailored to multiple biological tasks: (i) precise classification of biological types; (ii) disease gene mining; and (iii) target-driven gene networking. We also give a numerical application of (i) and (ii) using a public microarray data set, and we defer (iii) to a separate paper.
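The abstract does not spell out the ensemble procedure. The sketch below therefore shows one generic way an ensemble can be used for gene extraction, and it should not be read as the authors' method. It assumes a two-class data set (for example, tumor = 1, normal = 0) and uses stratified bootstrap resampling (bagging-style). On each resample, every gene is scored with an absolute Welch t-statistic, and the genes in the top `top_k` are recorded. Genes are finally ranked by how often they were selected. The function names `abs_welch_t` and `ensemble_gene_ranking` and all parameter values are hypothetical.

```python
import random
from statistics import mean, variance


def abs_welch_t(a, b):
    """Absolute Welch t-statistic between two groups of expression values."""
    diff = abs(mean(a) - mean(b))
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        # Both groups are constant: perfect separation if the means differ.
        return float("inf") if diff > 0 else 0.0
    return diff / se


def ensemble_gene_ranking(X, y, n_rounds=100, top_k=10, seed=0):
    """Hypothetical ensemble ranking: X is a list of samples (each a list of
    gene expression values), y is a list of 0/1 class labels. Each class needs
    at least two samples. Returns (genes sorted by selection frequency, freq)."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    idx0 = [i for i, lab in enumerate(y) if lab == 0]
    idx1 = [i for i, lab in enumerate(y) if lab == 1]
    counts = [0] * n_genes
    for _ in range(n_rounds):
        # Stratified bootstrap: resample each class with replacement,
        # keeping the original class sizes.
        b0 = [rng.choice(idx0) for _ in idx0]
        b1 = [rng.choice(idx1) for _ in idx1]
        scores = [
            abs_welch_t([X[i][g] for i in b0], [X[i][g] for i in b1])
            for g in range(n_genes)
        ]
        # Each ensemble member "votes" for its top_k genes.
        top = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]
        for g in top:
            counts[g] += 1
    freq = [c / n_rounds for c in counts]
    ranking = sorted(range(n_genes), key=lambda g: freq[g], reverse=True)
    return ranking, freq


# Usage on synthetic data: 10 normal and 10 tumor samples, 50 genes,
# where genes 3 and 7 are shifted upward in the tumor class.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
X = [
    [data_rng.gauss(0, 1) + (4.0 if lab == 1 and g in (3, 7) else 0.0) for g in range(50)]
    for lab in labels
]
ranking, freq = ensemble_gene_ranking(X, labels, n_rounds=50, top_k=2)
print(ranking[:2], freq[3], freq[7])
```

Selection frequency across resamples serves as a simple stability measure. A gene that ranks highly only by chance in one split of the data tends to drop out in other bootstrap samples, whereas a consistently differential gene is selected in nearly every round.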



Author information

Corresponding authors

Correspondence to Xia Li or Shaoqi Rao.

About this article

Cite this article

Li, X., Rao, S., Zhang, T. et al. An ensemble method for gene discovery based on DNA microarray data. Sci. China Ser. C.-Life Sci. 47, 396–405 (2004). https://doi.org/10.1007/BF03187097

  • DOI: https://doi.org/10.1007/BF03187097
