An Expert Knowledge-Guided Mutation Operator for Genome-Wide Genetic Analysis Using Genetic Programming

  • Casey S. Greene
  • Bill C. White
  • Jason H. Moore
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4774)


Human genetics is undergoing a data explosion: methods are now available to measure DNA sequence variation throughout the human genome. Given current knowledge, it seems likely that common human diseases are best predicted by interactions between biological components, which can be examined as interacting DNA sequence variations. The challenge is thus to search these high-dimensional datasets for combinations of variations likely to predict common diseases. The goal of this paper was to develop and evaluate a genetic programming (GP) mutation operator suited to this task, one that exploits expert knowledge in the form of Tuned ReliefF (TuRF) scores during mutation. We show that expert-knowledge-guided mutation performs similarly to expert-knowledge-guided selection. This study demonstrates that, in the context of an expert-knowledge-aware GP, mutation may be an appropriate component of the GP used to search for interacting predictors in this domain.
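
To make the idea of score-guided mutation concrete, the following is a minimal sketch, not the operator evaluated in the paper, whose details the abstract does not give. It assumes an individual is a flat list of attribute (SNP) names and that precomputed TuRF-style scores are supplied as a dictionary; the function name `weighted_attribute_mutation`, the `rate` parameter, and the score-shifting rule are all hypothetical choices made for illustration.

```python
import random


def weighted_attribute_mutation(individual, scores, rate=0.1, rng=None):
    """Return a mutated copy of `individual` (hypothetical representation:
    a list of attribute names). Each position is replaced, with probability
    `rate`, by an attribute drawn with probability proportional to its score.
    """
    rng = rng or random.Random()
    attributes = list(scores)

    # Assumption: ReliefF-style scores can be negative, so shift them so
    # that every sampling weight is strictly positive.
    lowest = min(scores.values())
    weights = [scores[a] - lowest + 1e-9 for a in attributes]

    mutated = list(individual)  # leave the parent untouched
    for i in range(len(mutated)):
        if rng.random() < rate:
            # Higher-scoring attributes are more likely to be inserted.
            mutated[i] = rng.choices(attributes, weights=weights, k=1)[0]
    return mutated


if __name__ == "__main__":
    # Illustrative, made-up scores; real values would come from TuRF.
    example_scores = {"snp_1": -0.02, "snp_2": 0.15, "snp_3": 0.40}
    parent = ["snp_1", "snp_1", "snp_2"]
    child = weighted_attribute_mutation(parent, example_scores, rate=0.5,
                                        rng=random.Random(0))
    print(parent, "->", child)
```

With this shifting rule the lowest-scoring attribute is almost never chosen; a different transformation (for example, rank-based weights) would keep more exploration, and which choice suits a genome-wide search is an open design question in this sketch.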


Keywords: Genetic Programming; Expert Knowledge; Mutation Operator; Multifactor Dimensionality Reduction; Common Human Disease
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Casey S. Greene (1)
  • Bill C. White (1)
  • Jason H. Moore (1)
  1. Dartmouth College, Hanover, NH 03755, USA
