Genetic Analysis of Prostate Cancer Using Computational Evolution, Pareto-Optimization and Post-processing

  • Jason H. Moore
  • Douglas P. Hill
  • Arvis Sulovari
  • La Creis Kidd
Part of the Genetic and Evolutionary Computation book series (GEVO)


Given infinite time, humans would progress through modeling complex data in a manner that is dependent on prior expert knowledge. The goal of the present study is to extend and enhance a computational evolution system (CES) that has the ultimate objective of tinkering with data as a human would. This is accomplished by providing flexibility in the model-building process and a meta-layer that learns how to generate better models. The key to CES is the ability to identify and exploit expert knowledge from biological databases or prior analytical results. Our prior results have demonstrated that CES is capable of efficiently navigating large and rugged fitness landscapes toward the discovery of biologically meaningful genetic models of disease. Further, we have shown that the efficacy of CES is improved dramatically when the system is provided with statistical or biological expert knowledge. Here we apply CES to the genetic analysis of prostate cancer aggressiveness in a large sample of European Americans. We introduce the use of Pareto-optimization to help address overfitting in the learning system. We further introduce a post-processing step that uses hierarchical cluster analysis to generate expert knowledge from the landscape of best models and their predictions across patients. We find that the combination of Pareto-optimization and post-processing of results greatly improves the genetic analysis of prostate cancer.

Key words

Computational evolution; Genetic epidemiology; Epistasis; Gene-gene interactions



This work was supported by NIH grants LM009012, LM010098 and AI596-94. We would like to thank the participants of present and past Genetic Programming Theory and Practice Workshops (GPTP) for their stimulating feedback and discussion that helped formulate some of the ideas in this paper.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Jason H. Moore (1)
  • Douglas P. Hill (1)
  • Arvis Sulovari (1)
  • La Creis Kidd (2)

  1. The Geisel School of Medicine at Dartmouth, One Medical Center Drive, Lebanon, USA
  2. University of Louisville, Louisville, USA
