Abstract
The advent of DNA microarray technology promises new insight into the secrets of life by monitoring the activity of thousands of genes simultaneously. Current analyses of microarray data focus on precise classification of biological types, for example, tumor versus normal tissues. A further scientific challenge is to extract disease-relevant genes from the bewildering amounts of raw data. This is one of the most critical themes in the post-genomic era, but it is generally ignored for lack of an efficient approach. In this paper, we present a novel ensemble method for gene extraction that can be tailored to fulfill multiple biological tasks, including (i) precise classification of biological types; (ii) disease gene mining; and (iii) target-driven gene networking. We also give a numerical application for (i) and (ii) using a public microarray data set, and set aside a separate paper to address (iii).
Cite this article
Li, X., Rao, S., Zhang, T. et al. An ensemble method for gene discovery based on DNA microarray data. Sci. China Ser. C.-Life Sci. 47, 396–405 (2004). https://doi.org/10.1007/BF03187097