A distance-based cluster algorithm for genomic analysis in genetic disease

Tu, Yi; Mao, Weidong

doi:10.1007/s12539-012-0124-y

Published: 29 July 2012

Volume 4, pages 90–96, (2012)
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Both environmental and genetic factors play a role in the development of some diseases. Complex diseases, such as Crohn’s disease and Type II diabetes, are caused by a combination of environmental factors and mutations in multiple genes. Patients who have been diagnosed with such diseases are not easily treated. However, many of these diseases can be avoided if people at high risk change their lifestyle, for example their diet. But how can we determine a person’s susceptibility before symptoms appear, and help them make informed decisions about their health? Susceptibility to complex diseases can be predicted through the analysis of genetic data. With the development of DNA microarray technology, it has become possible to access human genetic information related to specific diseases. This paper uses a combinatorial method to analyze genetic case-control data for Crohn’s disease. A distance-based cluster method was applied to publicly available Crohn’s disease genotype data for an epidemiological study and achieved highly accurate results.
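The abstract does not specify the distance measure or the clustering rule, so the following is only a minimal Python sketch of the general idea of distance-based case/control prediction, under stated assumptions: each individual is a vector of SNP genotypes coded 0/1/2 (a common convention, assumed here), distance is the Hamming distance (the number of SNPs whose genotypes differ), and an individual is assigned to whichever group (cases or controls) is closer on average. The function names `genotype_distance`, `mean_distance`, `predict`, and `leave_one_out_accuracy`, and the toy data, are hypothetical illustrations, not the authors' implementation or the Crohn’s disease data.

```python
from typing import Sequence

# Assumed coding (not from the paper): each genotype is a sequence of SNP values,
# 0 = homozygous major, 1 = heterozygous, 2 = homozygous minor.
Genotype = Sequence[int]


def genotype_distance(a: Genotype, b: Genotype) -> int:
    """Hamming distance: number of SNPs at which two genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must cover the same SNPs")
    return sum(x != y for x, y in zip(a, b))


def mean_distance(g: Genotype, group: list[Genotype]) -> float:
    """Average distance from g to every member of a non-empty group."""
    return sum(genotype_distance(g, h) for h in group) / len(group)


def predict(g: Genotype, cases: list[Genotype], controls: list[Genotype]) -> str:
    """Assign g to the group it is closer to on average (ties go to 'control')."""
    if mean_distance(g, cases) < mean_distance(g, controls):
        return "case"
    return "control"


def leave_one_out_accuracy(cases: list[Genotype], controls: list[Genotype]) -> float:
    """Hold out each individual in turn and predict it from everyone else.

    Each group needs at least two members so that no group is empty
    after the held-out individual is removed.
    """
    correct = 0
    for i, g in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]
        correct += predict(g, rest, controls) == "case"
    for i, g in enumerate(controls):
        rest = controls[:i] + controls[i + 1:]
        correct += predict(g, cases, rest) == "control"
    return correct / (len(cases) + len(controls))


# Hypothetical toy data (4 SNPs per person), purely illustrative.
CASES = [[2, 1, 0, 1], [2, 2, 0, 1], [1, 2, 0, 1]]
CONTROLS = [[0, 0, 1, 0], [0, 1, 2, 0], [0, 0, 2, 1]]

if __name__ == "__main__":
    print(predict([2, 2, 0, 0], CASES, CONTROLS))   # prints: case
    print(leave_one_out_accuracy(CASES, CONTROLS))  # prints: 1.0
```

Leave-one-out evaluation is used in this sketch because it lets every labeled individual serve as a test case without a separate hold-out set. The averaging rule is one simple choice among many distance-based assignments; the paper's own combinatorial and clustering steps may differ.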

Author information

Authors and Affiliations

Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, Wuhan, 430060, China
Yi Tu
Department of Mathematics & Computer Science, Virginia State University, Petersburg, VA, 23806, USA
Weidong Mao

Corresponding author

Correspondence to Weidong Mao.

About this article

Cite this article

Tu, Y., Mao, W. A distance-based cluster algorithm for genomic analysis in genetic disease. Interdiscip Sci Comput Life Sci 4, 90–96 (2012). https://doi.org/10.1007/s12539-012-0124-y

Received: 14 June 2011
Revised: 17 November 2011
Accepted: 21 February 2012
Issue Date: June 2012

Similar content being viewed by others

Clusters Identification in Binary Genomic Data: The Alternative Offered by Scan Statistics Approach

Preliminary Studies on Biclustering of GWA: A Multiobjective Approach

A clustering linear combination method for multiple phenotype association studies based on GWAS summary statistics