A distance-based cluster algorithm for genomic analysis in genetic disease

Interdisciplinary Sciences: Computational Life Sciences

Abstract

Both environmental and genetic factors play roles in the development of some diseases. Complex diseases, such as Crohn’s disease or Type II diabetes, are caused by a combination of environmental factors and mutations in multiple genes. Once such a disease has been diagnosed, it is difficult to treat. However, many diseases can be avoided if people at high risk change their lifestyle, for example their diet. How, then, can we assess susceptibility to disease before symptoms appear and help people make informed decisions about their health? Susceptibility to complex diseases can be predicted by analyzing genetic data. With the development of DNA microarray technology, human genetic information related to specific diseases has become accessible. This paper uses a combinatorial method to analyze genetic case-control data for Crohn’s disease. A distance-based cluster method is applied to publicly available genotype data on Crohn’s disease for epidemiological study and achieves a highly accurate result.
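
The abstract does not spell out the algorithm, so the following is only a generic sketch of what distance-based susceptibility prediction on case-control genotype data can look like, not the authors' method. It assumes each genotype is encoded as one integer per SNP (0, 1, or 2 copies of the minor allele), summarizes cases and controls by their mean genotype vectors, and assigns a new individual to the nearer group under Manhattan distance. The function names, the encoding, and the toy data are all hypothetical.

```python
from typing import List, Sequence

# Assumed encoding: one value per SNP, counting minor-allele copies (0, 1, or 2).
Genotype = Sequence[int]


def centroid(samples: List[Genotype]) -> List[float]:
    """Return the per-SNP mean genotype of a group of samples."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def distance(genotype: Genotype, center: Sequence[float]) -> float:
    """Manhattan distance between one genotype and a group centroid."""
    return sum(abs(a - b) for a, b in zip(genotype, center))


def predict(genotype: Genotype,
            case_center: Sequence[float],
            control_center: Sequence[float]) -> str:
    """Label a genotype by whichever group centroid is closer."""
    if distance(genotype, case_center) < distance(genotype, control_center):
        return "case"
    return "control"


# Toy data invented for illustration only (4 SNPs, 3 individuals per group).
cases = [[2, 1, 0, 2], [2, 2, 0, 1], [1, 2, 0, 2]]
controls = [[0, 0, 2, 1], [0, 1, 2, 0], [1, 0, 1, 0]]

case_center = centroid(cases)
control_center = centroid(controls)

print(predict([2, 2, 0, 2], case_center, control_center))
print(predict([0, 0, 2, 0], case_center, control_center))
```

In a real study the prediction accuracy would be estimated on held-out individuals (for example by cross-validation) rather than on the samples used to form the centroids.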

Author information

Corresponding author

Correspondence to Weidong Mao.

About this article

Cite this article

Tu, Y., Mao, W. A distance-based cluster algorithm for genomic analysis in genetic disease. Interdiscip Sci Comput Life Sci 4, 90–96 (2012). https://doi.org/10.1007/s12539-012-0124-y

  • DOI: https://doi.org/10.1007/s12539-012-0124-y
