Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4447)

Abstract

An important goal of human genetics is to identify DNA sequence variations that predict who is at risk for various common diseases. The present study focuses on the challenge of detecting and characterizing nonlinear attribute interactions or dependencies in the context of a genome-wide genetic study. The first question we address is whether the ReliefF algorithm is suitable for attribute selection in this domain. The second is whether ReliefF can be improved for selecting important genetic attributes. Using simulated genetic datasets, we show that ReliefF is significantly better than a naïve chi-square test of independence for selecting two interacting attributes out of 10^3 candidates. In addition, we show that ReliefF can be improved in this domain by systematically removing the worst attributes and re-estimating ReliefF weights. Our simulation studies demonstrate that this new Tuned ReliefF (TuRF) algorithm is significantly better than ReliefF.
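
The idea summarized in the abstract (score attributes with ReliefF, remove the worst ones, then re-estimate the weights on what remains) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes binary attributes rather than real genotypes, two classes, Hamming distance, and a toy dataset in which the class is the exclusive-or of two attributes. The names `relieff_weights`, `chi_square`, `turf`, `k`, `drop_fraction` and `n_keep` are hypothetical and chosen only for this example.

```python
from itertools import product
from math import ceil


def relieff_weights(X, y, attrs, k):
    """Simplified ReliefF weights for discrete attributes and two classes."""
    m = len(X)
    w = {a: 0.0 for a in attrs}
    for i in range(m):
        # Hamming distance from instance i to every other instance,
        # measured only over the attributes still under consideration.
        dist = sorted(
            (sum(X[i][a] != X[j][a] for a in attrs), j)
            for j in range(m) if j != i
        )
        hits = [j for _, j in dist if y[j] == y[i]][:k]
        misses = [j for _, j in dist if y[j] != y[i]][:k]
        for a in attrs:
            for j in hits:    # differing from a same-class neighbour lowers the weight
                w[a] -= (X[i][a] != X[j][a]) / (m * k)
            for j in misses:  # differing from an other-class neighbour raises it
                w[a] += (X[i][a] != X[j][a]) / (m * k)
    return w


def chi_square(X, y, a):
    """Pearson chi-square statistic for a single attribute versus the class."""
    m = len(X)
    stat = 0.0
    for v in sorted({row[a] for row in X}):
        for c in sorted(set(y)):
            observed = sum(1 for row, label in zip(X, y)
                           if row[a] == v and label == c)
            expected = sum(1 for row in X if row[a] == v) * y.count(c) / m
            stat += (observed - expected) ** 2 / expected
    return stat


def turf(X, y, k, drop_fraction, n_keep):
    """Repeatedly drop the lowest-weighted attributes and re-run ReliefF.

    The stopping rule (n_keep) and the per-round drop_fraction are
    assumptions made for this sketch.
    """
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        w = relieff_weights(X, y, remaining, k)
        n_drop = min(max(1, ceil(drop_fraction * len(remaining))),
                     len(remaining) - n_keep)
        worst = sorted(remaining, key=lambda a: w[a])[:n_drop]
        remaining = [a for a in remaining if a not in worst]
    return remaining


# Toy data: every combination of four binary attributes; the class depends
# only on the interaction (exclusive-or) of attributes 0 and 1.
X = [list(bits) for bits in product([0, 1], repeat=4)]
y = [row[0] ^ row[1] for row in X]

print(relieff_weights(X, y, [0, 1, 2, 3], k=2))
print(chi_square(X, y, 0))
print(turf(X, y, k=2, drop_fraction=0.25, n_keep=2))
```

On this toy dataset the single-attribute chi-square statistic for attribute 0 is zero, because the attribute has no marginal association with the class, whereas the neighbour-based weights separate the two interacting attributes from the two irrelevant ones. Real genome-wide data would have three genotype values per attribute and far more noise, so the sketch only conveys the mechanism.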

Editor information

Elena Marchiori, Jason H. Moore, Jagath C. Rajapakse

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Moore, J.H., White, B.C. (2007). Tuning ReliefF for Genome-Wide Genetic Analysis. In: Marchiori, E., Moore, J.H., Rajapakse, J.C. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2007. Lecture Notes in Computer Science, vol 4447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71783-6_16

  • DOI: https://doi.org/10.1007/978-3-540-71783-6_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71782-9

  • Online ISBN: 978-3-540-71783-6

  • eBook Packages: Computer Science, Computer Science (R0)
