Complex Function Sets Improve Symbolic Discriminant Analysis of Microarray Data
Our ability to simultaneously measure the expression levels of thousands of genes in biological samples is providing important new opportunities for improving the diagnosis, prevention, and treatment of common diseases. However, new technologies such as DNA microarrays also create new challenges for variable selection and statistical modeling. In response to these challenges, a genetic programming-based strategy called symbolic discriminant analysis (SDA) has been developed for the automatic selection of gene expression variables and mathematical functions for statistical modeling of clinical endpoints. The initial development and evaluation of SDA focused on a function set consisting of only the four basic arithmetic operators. The goal of the present study is to evaluate whether adding more complex operators, such as square root, to the function set improves SDA modeling of microarray data. The results presented in this paper demonstrate that adding complex functions to the function set significantly improves SDA modeling by reducing model size and, in some cases, reducing classification error and runtime. We anticipate that SDA will be an important new evolutionary computation tool in the repertoire of methods for the analysis of microarray data.
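To make the distinction between the basic and the extended function set concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that an SDA model can be written as an expression tree whose leaves are gene expression variables and whose internal nodes are drawn from a function set. The names `protected_div`, `protected_sqrt`, `evaluate`, `model_size`, and `classification_error`, the gene names `g1` to `g3`, the protected forms of division and square root, and the fixed `threshold` for turning a discriminant score into a class are all illustrative assumptions; the abstract does not specify any of them.

```python
import math
import operator


def protected_div(a, b):
    # Assumption: division by (near) zero returns 1.0, a common GP convention.
    return a / b if abs(b) > 1e-12 else 1.0


def protected_sqrt(a):
    # Assumption: take the root of the absolute value to avoid domain errors.
    return math.sqrt(abs(a))


# Function set with only the four basic arithmetic operators: name -> (callable, arity).
BASIC_FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
    "/": (protected_div, 2),
}

# Extended function set: the basic operators plus square root.
COMPLEX_FUNCTIONS = dict(BASIC_FUNCTIONS, sqrt=(protected_sqrt, 1))


def evaluate(tree, sample, functions):
    """Compute the discriminant score of one sample for an expression tree.

    A tree is a variable name (str), a numeric constant, or a tuple
    (function_name, child, ...). A function missing from `functions`
    raises KeyError, which is how the function set restricts the models.
    """
    if isinstance(tree, str):
        return sample[tree]
    if isinstance(tree, (int, float)):
        return float(tree)
    name, *children = tree
    func, arity = functions[name]
    if len(children) != arity:
        raise ValueError(f"{name} expects {arity} argument(s)")
    return func(*(evaluate(child, sample, functions) for child in children))


def model_size(tree):
    """Count the nodes (functions plus terminals) in an expression tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(model_size(child) for child in tree[1:])


def classification_error(tree, samples, labels, functions, threshold=0.0):
    """Fraction of samples misclassified when score > threshold means class 1."""
    wrong = 0
    for sample, label in zip(samples, labels):
        predicted = 1 if evaluate(tree, sample, functions) > threshold else 0
        wrong += predicted != label
    return wrong / len(labels)


# Hypothetical model: sqrt(g1) - g2 / g3
tree = ("-", ("sqrt", "g1"), ("/", "g2", "g3"))
samples = [
    {"g1": 9.0, "g2": 4.0, "g3": 2.0},
    {"g1": 1.0, "g2": 4.0, "g3": 1.0},
]
labels = [1, 0]

print(model_size(tree))
print(classification_error(tree, samples, labels, COMPLEX_FUNCTIONS))
```

In this sketch the same tree cannot be evaluated with `BASIC_FUNCTIONS`, because `sqrt` is not in that set. A genetic programming search restricted to the basic operators would have to approximate such a term with a larger arithmetic expression, which is one plausible reading of why a richer function set could yield smaller models.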
Keywords: Systemic Lupus Erythematosus, Linear Discriminant Analysis, Complex Function, Classification Error, Multifactor Dimensionality Reduction