Combinatorial Methods for Disease Association Search and Susceptibility Prediction
Accessibility of high-throughput genotyping technology makes genome-wide association studies possible for common complex diseases. For common diseases, it is necessary to search for and analyze multiple independent causes resulting from interactions of multiple genes scattered over the entire genome. This is computationally challenging, since even interactions between pairs of gene variations require checking more than 10^12 possibilities genome-wide. This paper first explores the problem of searching for the most disease-associated and the most disease-resistant multi-gene interactions in a given population sample of diseased and non-diseased individuals. A proposed fast complementary greedy search finds multi-SNP combinations with non-trivially high association on real data. Exploiting the methods developed for searching for associated risk and resistance factors, the paper then addresses the disease susceptibility prediction problem. We first propose a relevant optimum clustering formulation and a model-fitting algorithm that transforms clustering algorithms into susceptibility prediction algorithms. For three available real data sets (Crohn’s disease (Daly et al., 2001), autoimmune disorder (Ueda et al., 2003), and tick-borne encephalitis (Barkhash et al., 2006)), the accuracies of the prediction based on the combinatorial search (84%, 83%, and 89%, respectively) are 15% higher than the accuracies of the best previously known methods. The prediction based on the complementary greedy search almost matches the best accuracy but is much more scalable.
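To give a feel for the scale of the pairwise search mentioned above, the following is a minimal back-of-the-envelope sketch, not the paper's own calculation. The abstract does not say how its figure was derived; the sketch assumes a hypothetical panel of 500,000 SNPs and three genotype values per SNP, and the function name `count_pairwise_tests` is introduced here purely for illustration.

```python
from math import comb


def count_pairwise_tests(n_snps: int, genotypes_per_snp: int = 3) -> int:
    """Count (SNP pair, genotype pair) combinations for an exhaustive search.

    Assumption: every unordered pair of SNPs is checked, and each SNP can
    take `genotypes_per_snp` values (3 for a biallelic SNP).
    """
    snp_pairs = comb(n_snps, 2)                 # unordered pairs of SNPs
    return snp_pairs * genotypes_per_snp ** 2   # genotype combinations per pair


if __name__ == "__main__":
    n = 500_000  # hypothetical panel size, not taken from the paper
    print(f"{count_pairwise_tests(n):.3e} pairwise genotype combinations")
```

Under these assumptions the count exceeds 10^12, and it grows rapidly for interactions of three or more SNPs, which is the motivation for a greedy rather than exhaustive search.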
Keywords: Positive Predictive Value, Random Forest, Receiver Operating Characteristic, Prediction Algorithm, Disease Association
- 1. Affymetrix (2005), http://www.affymetrix.com/products/arrays/
- 4. Barkhash, A., Perelygin, A., Brinza, D., Pilipenko, P., Bogdanova, Y.U., Romaschenko, A., Voevoda, M., Brinton, M.: Genetic Resistance to Flaviviruses. In: 5th Conf. on Bioinformatics of Genome Regulation and Structure (BGRS 2006) (to appear, 2006)
- 6. Brinza, D., He, J., Zelikovsky, A.: Combinatorial Search Methods for Multi-SNP Disease Association. In: Brinza, D., He, J., Zelikovsky, A. (eds.) Proc. IEEE Conf. on Engineering in Medicine and Biology (EMBC 2006) (September 2006) (to appear)
- 13. Joachims, T.: http://svmlight.joachims.org/
- 14. Breiman, L., Cutler, A.: http://www.stat.berkeley.edu/users/breiman/RF
- 15. Mao, W., He, J., Brinza, D., Zelikovsky, A.: A Combinatorial Method for Predicting Genetic Susceptibility to Complex Diseases. In: Proc. IEEE Conf. on Engineering in Medicine and Biology (EMBC 2005), pp. 224–227 (2005)
- 16. Mao, W., Brinza, D., Hundewale, N., Gremalschi, S., Zelikovsky, A.: Genotype Susceptibility and Integrated Risk Factors for Complex Diseases. In: Proc. IEEE Conf. on Granular Computing (GRC 2006), pp. 754–757 (2006)
- 20. Tahri-Daizadeh, N., Tregouet, D.A., Nicaud, V., Manuel, N., Cambien, F., Tiret, L.: Automated detection of informative combined effects in genetic association studies of complex traits. Genome Res. 13, 1952–1960 (2003)
- 21. Tomita, Y., Yokota, M., Honda, H.: Classification method for prediction of multifactorial disease development using interaction between genetic and environmental factors. In: IEEE Comput. Systems Bioinformatics Conf. CSB 2005, poster (2005)
- 22. Waddell, M., Page, D., Zhan, F., Barlogie, B., Shaughnessy, J.: Predicting Cancer Susceptibility from Single-Nucleotide Polymorphism Data: A Case Study in Multiple Myeloma. In: Proc. BIOKDD 2005 (2005)