Tuning ReliefF for Genome-Wide Genetic Analysis

  • Jason H. Moore
  • Bill C. White
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4447)

Abstract

An important goal of human genetics is the identification of DNA sequence variations that are predictive of who is at risk for various common diseases. The focus of the present study is on the challenge of detecting and characterizing nonlinear attribute interactions or dependencies in the context of a genome-wide genetic study. The first question we address is whether the ReliefF algorithm is suitable for attribute selection in this domain. The second question we address is whether we can improve ReliefF for selecting important genetic attributes. Using simulated genetic datasets, we show that ReliefF is significantly better than a naïve chi-square test of independence for selecting two interacting attributes out of 10^3 candidates. In addition, we show that ReliefF can be improved in this domain by systematically removing the worst attributes and re-estimating ReliefF weights. Our simulation studies demonstrate that this new Tuned ReliefF (TuRF) algorithm is significantly better than ReliefF.
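
The idea of removing the worst attributes and re-estimating ReliefF weights can be illustrated with a small sketch. This is not the authors' implementation: it assumes discrete attributes, a binary class, Hamming distance between instances, and every instance used once as the reference point. The names `relieff_weights`, `turf`, `make_xor_data`, and the `drop_fraction` parameter are hypothetical and chosen only for illustration.

```python
import random


def make_xor_data(n_samples, n_attributes, seed=0):
    """Toy data (an assumption, not the paper's simulation): the class is the
    XOR of attributes 0 and 1; all other attributes are pure noise."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_attributes)] for _ in range(n_samples)]
    y = [row[0] ^ row[1] for row in X]
    return X, y


def relieff_weights(X, y, n_neighbors=10):
    """Simplified ReliefF: reward attributes that differ on nearest misses
    and penalize attributes that differ on nearest hits."""
    n, m = len(X), len(X[0])
    weights = [0.0] * m
    for i in range(n):
        # Hamming distance from instance i to every other instance, nearest first.
        dists = sorted(
            (sum(a != b for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        hits = [j for _, j in dists if y[j] == y[i]][:n_neighbors]
        misses = [j for _, j in dists if y[j] != y[i]][:n_neighbors]
        for a in range(m):
            diff_hit = sum(X[i][a] != X[j][a] for j in hits) / max(len(hits), 1)
            diff_miss = sum(X[i][a] != X[j][a] for j in misses) / max(len(misses), 1)
            weights[a] += (diff_miss - diff_hit) / n
    return weights


def turf(X, y, keep, drop_fraction=0.1, n_neighbors=10):
    """TuRF-style loop: drop the lowest-weighted attributes, then re-estimate
    the weights on the remaining ones, until `keep` attributes are left."""
    active = list(range(len(X[0])))
    while len(active) > keep:
        sub = [[row[a] for a in active] for row in X]
        w = relieff_weights(sub, y, n_neighbors)
        # Remove at least one attribute per round, never going below `keep`.
        n_drop = min(max(1, int(drop_fraction * len(active))), len(active) - keep)
        ranked = sorted(range(len(active)), key=lambda k: w[k])  # worst first
        dropped = set(ranked[:n_drop])
        active = [a for k, a in enumerate(active) if k not in dropped]
    return active


if __name__ == "__main__":
    X, y = make_xor_data(n_samples=150, n_attributes=15, seed=0)
    print("attributes kept:", sorted(turf(X, y, keep=2, drop_fraction=0.2)))
```

In the toy data neither interacting attribute has a marginal association with the class, which is the situation in which a single-attribute chi-square test is expected to struggle. Because the distance is computed over the currently active attributes only, each round of removal reduces the noise in the neighbor search before the weights are re-estimated.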

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Jason H. Moore (1)
  • Bill C. White (1)
  1. Dartmouth College, One Medical Center Drive, Lebanon, NH 03756