Genome-Wide Genetic Analysis Using Genetic Programming: The Critical Need for Expert Knowledge

Genetic Programming Theory and Practice IV

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Human genetics is undergoing an information explosion. The availability of chip-based technology facilitates the measurement of thousands of DNA sequence variations from across the human genome. The challenge is to sift through these high-dimensional datasets to identify combinations of interacting DNA sequence variations that are predictive of common diseases. The goal of this study is to develop and evaluate a genetic programming (GP) approach to attribute selection and classification in this domain. We simulated genetic datasets of varying size in which the disease model consists of two interacting DNA sequence variations that exhibit no independent effects on class (i.e., epistasis). We show that GP is no better than a simple random search when classification accuracy alone is used as the fitness function. We then show that including pre-processed estimates of attribute quality, computed with Tuned ReliefF (TuRF), in a multi-objective fitness function that also includes accuracy significantly improves the performance of GP over that of random search. This study demonstrates that GP may be a useful computational discovery tool in this domain. It also raises important questions about the general utility of GP for these types of problems, the importance of data preprocessing, the ideal functional form of the fitness function, and the importance of expert knowledge. We anticipate that this study will provide an important baseline for future studies investigating the usefulness of GP as a general computational discovery tool for large-scale genetic studies.
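
The two technical ideas in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The penetrance table, the genotype frequencies, the weighted-sum form of the fitness function, the default weight, and the names `marginal_penetrance` and `combined_fitness` are all assumptions chosen for illustration; the abstract does not specify the actual disease model or the functional form of the fitness function. The first function shows how a two-locus model can have no independent effects: each locus, taken alone, gives the same disease probability for every genotype. The second shows one possible way to combine classification accuracy with pre-computed attribute quality scores, such as those a filter like TuRF would supply.

```python
# Hypothetical two-locus penetrance table: P(disease | genotype at A, genotype at B).
# Rows index locus A, columns index locus B; genotypes are coded 0, 1, 2.
PENETRANCE = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]

# Assumed genotype frequencies (Hardy-Weinberg proportions, allele frequency 0.5).
GENOTYPE_FREQ = [0.25, 0.5, 0.25]


def marginal_penetrance(table, freqs, locus):
    """Disease probability per genotype at one locus, averaged over the other."""
    marginals = []
    for g in range(3):
        total = 0.0
        for other in range(3):
            cell = table[g][other] if locus == 0 else table[other][g]
            total += freqs[other] * cell
        marginals.append(total)
    return marginals


def combined_fitness(accuracy, quality_scores, used_attributes, weight=0.5):
    """Weighted sum of accuracy and mean normalized quality of the attributes used.

    The weighted-sum form and the default weight are illustrative assumptions.
    """
    lo, hi = min(quality_scores.values()), max(quality_scores.values())
    span = hi - lo
    # Rescale each used attribute's score to [0, 1] across all scored attributes.
    normalized = [
        (quality_scores[a] - lo) / span if span > 0 else 0.0
        for a in used_attributes
    ]
    mean_quality = sum(normalized) / len(normalized) if normalized else 0.0
    return weight * accuracy + (1.0 - weight) * mean_quality


if __name__ == "__main__":
    # Every genotype at each single locus carries the same marginal risk.
    print(marginal_penetrance(PENETRANCE, GENOTYPE_FREQ, 0))
    print(marginal_penetrance(PENETRANCE, GENOTYPE_FREQ, 1))

    # Made-up quality scores standing in for pre-processed attribute estimates.
    scores = {"snp1": 0.8, "snp2": 0.6, "snp3": 0.0, "snp4": 0.2}
    print(combined_fitness(0.5, scores, ["snp1", "snp2"]))
    print(combined_fitness(0.5, scores, ["snp3", "snp4"]))
```

In this sketch, two candidate models with the same chance-level accuracy receive different fitness values when one of them uses attributes with higher quality scores. That kind of gradient is absent when accuracy is the only objective, which is consistent with the abstract's observation that accuracy alone left GP no better than random search.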

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Moore, J.H., White, B.C. (2007). Genome-Wide Genetic Analysis Using Genetic Programming: The Critical Need for Expert Knowledge. In: Riolo, R., Soule, T., Worzel, B. (eds) Genetic Programming Theory and Practice IV. Genetic and Evolutionary Computation. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-49650-4_2

  • DOI: https://doi.org/10.1007/978-0-387-49650-4_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-33375-5

  • Online ISBN: 978-0-387-49650-4

  • eBook Packages: Computer Science, Computer Science (R0)
