Abstract
The accessibility of high-throughput genotyping technology makes genome-wide association studies of common complex diseases possible. For common diseases, it is necessary to search for and analyze multiple independent causes, each resulting from interactions of multiple genes scattered over the entire genome. This is computationally challenging, since even the interactions of pairs of gene variations require checking more than 10^12 possibilities genome-wide. This paper first explores the problem of searching for the most disease-associated and the most disease-resistant multi-gene interactions in a given population sample of diseased and non-diseased individuals. A proposed fast complimentary greedy search finds multi-SNP combinations with non-trivially high association on real data. Exploiting the methods developed for searching for associated risk and resistance factors, the paper then addresses the disease susceptibility prediction problem. We first propose a relevant optimum clustering formulation and a model-fitting algorithm that transforms clustering algorithms into susceptibility prediction algorithms. For three available real data sets (Crohn's disease (Daly et al., 2001), autoimmune disorder (Ueda et al., 2003), and tick-borne encephalitis (Barkhash et al., 2006)), the accuracies of the prediction based on the combinatorial search (84%, 83%, and 89%, respectively) are 15% higher than the accuracies of the best previously known methods. The prediction based on the complimentary greedy search almost matches the best accuracy but is much more scalable.
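The scale of the pairwise search mentioned above can be illustrated with a short calculation. This is a minimal sketch and is not taken from the paper: it assumes that each SNP genotype takes one of three values (two homozygous, one heterozygous), and the marker counts are hypothetical, chosen only to show the growth rate. The name `pair_search_space` is introduced here for illustration.

```python
from math import comb


def pair_search_space(num_snps: int, values_per_snp: int = 3) -> int:
    """Count the (SNP pair, genotype-value pair) combinations to be tested.

    Assumption: every SNP takes `values_per_snp` distinct genotype values,
    so each unordered pair of SNPs yields values_per_snp**2 combinations.
    """
    return comb(num_snps, 2) * values_per_snp ** 2


# Hypothetical genome-wide marker counts (not figures from the paper).
for n in (100_000, 500_000, 1_000_000):
    print(f"{n:>9,} SNPs -> {pair_search_space(n):.2e} pair combinations")
```

Under these assumptions, half a million markers already push the pairwise count past 10^12, which is why an exhaustive scan becomes expensive and a greedy heuristic is attractive.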
References
Affymetrix (2005), http://www.affymetrix.com/products/arrays/
International HapMap Consortium: The International HapMap Project. Nature 426, 789–796 (2003), http://www.hapmap.org
Daly, M., Rioux, J., Schaffner, S., Hudson, T., Lander, E.: High resolution haplotype structure in the human genome. Nature Genetics 29, 229–232 (2001)
Barkhash, A., Perelygin, A., Brinza, D., Pilipenko, P., Bogdanova, Y.U., Romaschenko, A., Voevoda, M., Brinton, M.: Genetic Resistance to Flaviviruses. In: 5th Conf. on Bioinformatics of Genome Regulation and Structure (BGRS 2006) (to appear, 2006)
Brinza, D., Zelikovsky, A.: 2SNP: Scalable Phasing Based on 2-SNP Haplotypes. Bioinformatics 22(3), 371–373 (2006)
Brinza, D., He, J., Zelikovsky, A.: Combinatorial Search Methods for Multi-SNP Disease Association. In: Proc. IEEE Conf. on Engineering in Medicine and Biology (EMBC 2006) (September 2006) (to appear)
Clark, A.G.: Finding Genes Underlying Risk of Complex Disease by Linkage Disequilibrium Mapping. Curr. Opin. Genet. Dev. 13(3), 296–302 (2003)
Clark, A.G., et al.: Determinants of the success of whole-genome association testing. Genome Res. 15, 1463–1467 (2005)
Stephens, M., Smith, N.J., Donnelly, P.: A New Statistical Method for Haplotype Reconstruction from Population Data. The American J. of Human Genetics 68, 978–998 (2001)
Ueda, H., Howson, J.M.M., Esposito, L., et al.: Association of the T Cell Regulatory Gene CTLA4 with Susceptibility to Autoimmune Disease. Nature 423, 506–511 (2003)
He, J., Zelikovsky, A.: Tag SNP Selection Based on Multivariate Linear Regression. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2006. LNCS, vol. 3992, pp. 750–757. Springer, Heidelberg (2006)
Marchini, J., Donnelly, P., Cardon, L.R.: Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genetics 37, 413–417 (2005)
Joachims, T.: http://svmlight.joachims.org/
Breiman, L., Cutler, A.: http://www.stat.berkeley.edu/users/breiman/RF
Mao, W., He, J., Brinza, D., Zelikovsky, A.: A Combinatorial Method for Predicting Genetic Susceptibility to Complex Diseases. In: Proc. IEEE Conf. on Engineering In Medicine and Biology (EMBC 2005), pp. 224–227 (2005)
Mao, W., Brinza, D., Hundewale, N., Gremalschi, S., Zelikovsky, A.: Genotype Susceptibility and Integrated Risk Factors for Complex Diseases. In: Proc. IEEE Conf. on Granular Computing (GRC 2006), pp. 754–757 (2006)
Kimmel, G., Shamir, R.: A Block-Free Hidden Markov Model for Genotypes and Its Application to Disease Association. J. of Computational Biology 12(10), 1243–1260 (2005)
Listgarten, J., Damaraju, S., Poulin, B., Cook, L., Dufour, J., Driga, A., Mackey, J., Wishart, D., Greiner, R., Zanke, B.: Predictive Models for Breast Cancer Susceptibility from Multiple Single Nucleotide Polymorphisms. Clinical Cancer Research 10, 2725–2737 (2004)
Nelson, M.R., Kardia, S.L., Ferrell, R.E., Sing, C.F.: A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation. Genome Res. 11, 458–470 (2001)
Tahri-Daizadeh, N., Tregouet, D.A., Nicaud, V., Manuel, N., Cambien, F., Tiret, L.: Automated detection of informative combined effects in genetic association studies of complex traits. Genome Res. 13, 1952–1960 (2003)
Tomita, Y., Yokota, M., Honda, H.: Classification method for prediction of multifactorial disease development using interaction between genetic and environmental factors. In: IEEE Comput. Systems Bioinformatics Conf. CSB 2005, poster (2005)
Waddell, M., Page, D., Zhan, F., Barlogie, B., Shaughnessy, J.: Predicting Cancer Susceptibility from Single Nucleotide Polymorphism Data: A Case Study in Multiple Myeloma. In: Proc. BIOKDD 2005 (2005)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brinza, D., Zelikovsky, A. (2006). Combinatorial Methods for Disease Association Search and Susceptibility Prediction. In: Bücher, P., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2006. Lecture Notes in Computer Science, vol 4175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11851561_27
DOI: https://doi.org/10.1007/11851561_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39583-6
Online ISBN: 978-3-540-39584-3
eBook Packages: Computer Science, Computer Science (R0)