Optimal Use of Biological Expert Knowledge from Literature Mining in Ant Colony Optimization for Analysis of Epistasis in Human Disease

  • Arvis Sulovari
  • Jeff Kiralis
  • Jason H. Moore
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7833)

Abstract

Current technology makes it possible to rapidly measure millions of sequence variations across the genome. As a result, a difficult challenge arises in bioinformatics: identifying combinations of interacting DNA sequence variations that are predictive of common disease [2]. The Multifactor Dimensionality Reduction (MDR) method can analyse such interactions, but an exhaustive MDR search would require exponential time. We therefore use Ant Colony Optimization (ACO) as a stochastic wrapper. Greene et al. have shown that this approach is effective for analysing large amounts of genetic variation when expert knowledge is incorporated [1]. In the ACO method integrated in the MDR package, either a linear or an exponential probability distribution function can be used to weigh the expert knowledge. We generate our biological expert knowledge from a network of gene-gene interactions produced by a literature mining platform, Pathway Studio. We show that the linear distribution function is the most appropriate for weighing our scores when the expert knowledge comes from literature mining. We also find that the ACO parameters significantly affect the power of the method, and we suggest parameter values that can be used to optimize MDR in Genome Wide Association Studies that use biological expert knowledge.
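To make the two weighting schemes concrete, the following is a minimal Python sketch, not the implementation in the MDR package. The function names, the `theta` steepness parameter, the score-shifting step, and the `alpha`/`beta` exponents of the textbook ACO selection rule are all assumptions introduced for illustration; the abstract does not specify the exact functional forms.

```python
import math
import random


def n_combinations(n_snps, order):
    """Number of SNP combinations an exhaustive search of this order would visit."""
    return math.comb(n_snps, order)


def linear_weights(scores):
    """Assumed linear scheme: probability proportional to the shifted score."""
    lo = min(scores)
    # Shift so every weight is positive; the small epsilon avoids an all-zero vector.
    shifted = [s - lo + 1e-9 for s in scores]
    total = sum(shifted)
    return [w / total for w in shifted]


def exponential_weights(scores, theta=1.0):
    """Assumed exponential scheme: probability proportional to exp(theta * score)."""
    hi = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    raw = [math.exp(theta * (s - hi)) for s in scores]
    total = sum(raw)
    return [w / total for w in raw]


def selection_probabilities(pheromone, heuristic, alpha=1.0, beta=1.0):
    """Textbook ACO rule: p_i proportional to pheromone_i^alpha * heuristic_i^beta."""
    raw = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    total = sum(raw)
    return [r / total for r in raw]


def sample_snps(probabilities, k, rng):
    """Draw k distinct SNP indices, one ant's candidate combination."""
    remaining = list(range(len(probabilities)))
    weights = list(probabilities)
    chosen = []
    for _ in range(k):
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return chosen


if __name__ == "__main__":
    # Hypothetical expert-knowledge scores for five SNPs (illustrative values only).
    scores = [0.1, 0.2, 0.5, 0.7, 0.9]
    pheromone = [1.0] * len(scores)  # uniform pheromone at the start of a run
    probs = selection_probabilities(pheromone, linear_weights(scores))
    print(sample_snps(probs, k=2, rng=random.Random(0)))
```

Under these assumptions, the exponential scheme concentrates probability on the highest-scoring SNPs more sharply than the linear one as `theta` grows, which is the kind of trade-off the choice between the two functions controls.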

References

  1. Greene, C.S., Gilmore, J.M., Kiralis, J., Andrews, P.C., Moore, J.H.: Optimal Use of Expert Knowledge in Ant Colony Optimization for the Analysis of Epistasis in Human Disease. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds.) EvoBIO 2009. LNCS, vol. 5483, pp. 92–103. Springer, Heidelberg (2009)
  2. Moore, J.H.: The ubiquitous nature of epistasis in determining susceptibility to common human disease. Human Heredity 56, 73–82 (2003)
  3. The International HapMap Consortium: A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007)
  4. Nikitin, A., Egorov, S., Mazo, I.: Pathway Studio - the analysis and navigation of molecular networks. Bioinformatics 19(16), 2155–2157 (2003)
  5. Moore, J.H., Gilbert, J.C., Tsai, C.T., Chiang, F.T., Holden, T., Barney, N., White, B.C.: A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. Journal of Theoretical Biology 241(2), 252–261 (2006)
  6. Cordell, H.J.: Detecting gene-gene interactions that underlie human diseases. Nature Reviews Genetics 10, 392–404 (2009)
  7. Ritchie, M.D., et al.: Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. American Journal of Human Genetics 69, 138–147 (2001)
  8. Julia, A., et al.: Identification of a two-loci epistatic interaction associated with susceptibility to rheumatoid arthritis through reverse engineering and multifactor dimensionality reduction. Genomics 90, 6–13 (2007)
  9. Cho, Y.M., et al.: Multifactor-dimensionality reduction shows a two-locus interaction associated with type 2 diabetes mellitus. Diabetologia 47, 549–554 (2004)
  10. Tsai, C.T., et al.: Renin-angiotensin system gene polymorphisms and coronary artery disease in a large angiographic cohort: detection of high order gene-gene interaction. Atherosclerosis 195, 172–180 (2007)
  11. Andrew, A.S., et al.: Bladder Cancer SNP panel predicts susceptibility and survival. Human Genetics 125(5-6), 527–539 (2009)
  12. Urbanowicz, R.J., Kiralis, J., Sinnott-Armstrong, N.A., Heberling, T., Fisher, J.M., Moore, J.H.: GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures. BioData Mining 5(16) (2012)
  13.
  14.
  15.
  16. Sokal, R.R., Rohlf, F.J.: Biometry: the principles and practice of statistics in biological research, 3rd edn. W.H. Freeman and Co., New York (1995)
  17. Dorigo, M., Maniezzo, V., Colorni, A.: Positive Feedback as a search strategy. Dipartimento di Elettronica e Informatica, Politecnico di Milano, Technical Reports, 91–116 (1991)
  18. Dorigo, M., Stützle, T.: Ant Colony Optimization (2004)
  19. Martens, D., et al.: Editorial Survey: Swarm Intelligence for Data Mining. Machine Learning 82(1), 1–42 (2011)
  20. Moore, J.H., White, B.C.: Genome-wide genetic analysis using genetic programming: The critical need for expert knowledge. In: Genetic Programming Theory and Practice IV. Springer (2007)
  21. Pattin, K., Moore, J.H.: Exploiting the proteome to improve the genome-wide genetic analysis of epistasis in common human diseases. Human Genetics 124(1), 19–29 (2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Arvis Sulovari (1)
  • Jeff Kiralis (1)
  • Jason H. Moore (1)
  1. Dartmouth-Hitchcock Medical Center, Lebanon, United States
