Combinatorial Methods for Disease Association Search and Susceptibility Prediction

  • Dumitru Brinza
  • Alexander Zelikovsky
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4175)


Accessibility of high-throughput genotyping technology makes genome-wide association studies for common complex diseases possible. When dealing with common diseases, it is necessary to search for and analyze multiple independent causes resulting from interactions of multiple genes scattered over the entire genome. This is computationally challenging, since checking even pairwise interactions of gene variations genome-wide requires testing more than 10¹² possibilities. This paper first explores the problem of searching for the most disease-associated and the most disease-resistant multi-gene interactions in a given population sample of diseased and non-diseased individuals. The proposed fast complementary greedy search finds multi-SNP combinations with non-trivially high association on real data. Exploiting the developed methods for searching for associated risk and resistance factors, the paper then addresses the disease susceptibility prediction problem. We propose a relevant optimum clustering formulation and a model-fitting algorithm that transforms clustering algorithms into susceptibility prediction algorithms. For three available real data sets (Crohn’s disease (Daly et al., 2001), autoimmune disorder (Ueda et al., 2003), and tick-borne encephalitis (Barkhash et al., 2006)), the prediction accuracies based on the combinatorial search (84%, 83%, and 89%, respectively) are 15% higher than those of the best previously known methods. The prediction based on the complementary greedy search almost matches the best accuracy but is much more scalable.
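The figure of more than 10¹² possibilities follows from counting unordered pairs of markers. A minimal sketch, assuming a hypothetical genome-wide panel of about 1.5 million SNPs (a number chosen only for illustration, not taken from the paper):

```python
from math import comb


def snp_pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs an exhaustive two-locus scan must examine."""
    return comb(n_snps, 2)


# Assumed panel size, for illustration only.
print(f"{snp_pair_count(1_500_000):.2e}")  # prints 1.12e+12
```

Because the count grows quadratically, doubling the panel size roughly quadruples the number of pairs to test.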
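The abstract does not spell out how the complementary greedy search works. What follows is only a hypothetical sketch of a greedy search for a disease-associated multi-SNP combination, assuming genotypes coded 0/1/2 and a case/control label per individual; the function name, the ratio criterion, and the `max_size` cap are illustrative choices, not the authors' method. Here a combination is a set of (SNP, value) restrictions, its cluster is the set of individuals matching all of them, and each step adds the restriction that loses the fewest cases per control removed.

```python
from typing import List, Optional, Sequence, Tuple

Restriction = Tuple[int, int]  # (snp_index, genotype_value); values assumed coded 0/1/2


def greedy_risk_combination(genotypes: List[Sequence[int]], is_case: List[bool],
                            max_size: int = 3) -> Tuple[List[Restriction], List[int]]:
    """Hypothetical greedy search for a case-enriched multi-SNP combination.

    Starts from the cluster of all individuals and repeatedly adds the
    (SNP, value) restriction that loses the fewest cases per control removed.
    Returns the chosen restrictions and the indices of the matching individuals.
    """
    combo: List[Restriction] = []
    members = list(range(len(genotypes)))  # initial cluster: everyone
    n_snps = len(genotypes[0])
    while len(combo) < max_size:
        cases_in = sum(is_case[i] for i in members)
        controls_in = len(members) - cases_in
        used = {s for s, _ in combo}
        best: Optional[Restriction] = None
        best_ratio = float("inf")
        best_kept: List[int] = members
        for s in range(n_snps):
            if s in used:
                continue  # each SNP appears at most once in a combination
            for v in (0, 1, 2):
                kept = [i for i in members if genotypes[i][s] == v]
                kept_cases = sum(is_case[i] for i in kept)
                removed_controls = controls_in - (len(kept) - kept_cases)
                if kept_cases == 0 or removed_controls == 0:
                    continue  # would drop every case, or would not shed any control
                ratio = (cases_in - kept_cases) / removed_controls
                if ratio < best_ratio:
                    best, best_ratio, best_kept = (s, v), ratio, kept
        if best is None:
            break  # no remaining restriction sheds further controls
        combo.append(best)
        members = best_kept
    return combo, members


genotypes = [[2, 0, 1], [2, 1, 1], [2, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
is_case = [True, True, True, False, False, False]
print(greedy_risk_combination(genotypes, is_case))  # ([(0, 2)], [0, 1, 2])
```

On this toy sample, the three cases carry value 2 at SNP 0 and no control does, so the search stops after one step: the resulting cluster has no controls left to remove. Swapping the case and control labels would make the same routine look for a resistance (control-enriched) combination instead.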
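The abstract mentions a model-fitting algorithm that turns a clustering algorithm into a susceptibility predictor. One way to read this, offered here as an assumption rather than the paper's exact procedure, is: add the unlabeled individual to the training sample once as diseased and once as non-diseased, score how well each labeled sample fits the clustering model, and predict the better-fitting label. The scoring function `toy_fit_score` below is a hypothetical stand-in that sums (cases − controls)² over single-SNP clusters; any clustering- or search-based score with the same signature could be passed in its place.

```python
from itertools import product
from typing import Callable, List, Sequence

Genotype = Sequence[int]  # one genotype value per SNP, assumed coded 0/1/2
FitScore = Callable[[List[Genotype], List[bool]], float]


def toy_fit_score(genotypes: List[Genotype], is_case: List[bool]) -> float:
    """Hypothetical fit score: sum over single-SNP clusters (individuals sharing
    value v at SNP s) of (cases - controls)^2. Higher means the clusters
    separate cases from controls more sharply."""
    n_snps = len(genotypes[0])
    total = 0
    for s, v in product(range(n_snps), (0, 1, 2)):
        diff = sum(1 if case else -1
                   for g, case in zip(genotypes, is_case) if g[s] == v)
        total += diff * diff
    return total


def model_fitting_predict(train: List[Genotype], labels: List[bool],
                          test: Genotype,
                          fit_score: FitScore = toy_fit_score) -> bool:
    """Predict True (diseased) if the sample fits the model at least as well
    with `test` labeled as a case as with it labeled as a control."""
    samples = list(train) + [test]
    as_case = fit_score(samples, list(labels) + [True])
    as_control = fit_score(samples, list(labels) + [False])
    return as_case >= as_control  # ties are resolved toward "diseased"


train = [[2, 0, 1], [2, 1, 1], [2, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
labels = [True, True, True, False, False, False]
print(model_fitting_predict(train, labels, [2, 1, 0]))  # True
```

With this particular score, labeling the test individual as a case rather than a control raises the total by 4d for every cluster it joins, where d is that cluster's case-minus-control difference in the training sample, so the prediction reduces to whether its clusters are case-rich on balance.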






  1.
  2. International HapMap Consortium: The International HapMap Project. Nature 426, 789–796 (2003)
  3. Daly, M., Rioux, J., Schaffner, S., Hudson, T., Lander, E.: High resolution haplotype structure in the human genome. Nature Genetics 29, 229–232 (2001)
  4. Barkhash, A., Perelygin, A., Brinza, D., Pilipenko, P., Bogdanova, Y.U., Romaschenko, A., Voevoda, M., Brinton, M.: Genetic Resistance to Flaviviruses. In: 5th Conf. on Bioinformatics of Genome Regulation and Structure (BGRS 2006) (to appear, 2006)
  5. Brinza, D., Zelikovsky, A.: 2SNP: Scalable Phasing Based on 2-SNP Haplotypes. Bioinformatics 22(3), 371–373 (2006)
  6. Brinza, D., He, J., Zelikovsky, A.: Combinatorial Search Methods for Multi-SNP Disease Association. In: Proc. IEEE Conf. on Engineering in Medicine and Biology (EMBC 2006) (September 2006) (to appear)
  7. Clark, A.G.: Finding Genes Underlying Risk of Complex Disease by Linkage Disequilibrium Mapping. Curr. Opin. Genet. Dev. 13(3), 296–302 (2003)
  8. Clark, A.G., et al.: Determinants of the success of whole-genome association testing. Genome Res. 15, 1463–1467 (2005)
  9. Stephens, M., Smith, N.J., Donnelly, P.: A New Statistical Method for Haplotype Reconstruction from Population Data. The American J. of Human Genetics 68, 978–989 (2001)
  10. Ueda, H., Howson, J.M.M., Esposito, L., et al.: Association of the T Cell Regulatory Gene CTLA4 with Susceptibility to Autoimmune Disease. Nature 423, 506–511 (2003)
  11. He, J., Zelikovsky, A.: Tag SNP Selection Based on Multivariate Linear Regression. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2006. LNCS, vol. 3992, pp. 750–757. Springer, Heidelberg (2006)
  12. Marchini, J., Donnelly, P., Cardon, L.R.: Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genetics 37, 413–417 (2005)
  13.
  14.
  15. Mao, W., He, J., Brinza, D., Zelikovsky, A.: A Combinatorial Method for Predicting Genetic Susceptibility to Complex Diseases. In: Proc. IEEE Conf. on Engineering in Medicine and Biology (EMBC 2005), pp. 224–227 (2005)
  16. Mao, W., Brinza, D., Hundewale, N., Gremalschi, S., Zelikovsky, A.: Genotype Susceptibility and Integrated Risk Factors for Complex Diseases. In: Proc. IEEE Conf. on Granular Computing (GRC 2006), pp. 754–757 (2006)
  17. Kimmel, G., Shamir, R.: A Block-Free Hidden Markov Model for Genotypes and Its Application to Disease Association. J. of Computational Biology 12(10), 1243–1260 (2005)
  18. Listgarten, J., Damaraju, S., Poulin, B., Cook, L., Dufour, J., Driga, A., Mackey, J., Wishart, D., Greiner, R., Zanke, B.: Predictive Models for Breast Cancer Susceptibility from Multiple Single Nucleotide Polymorphisms. Clinical Cancer Research 10, 2725–2737 (2004)
  19. Nelson, M.R., Kardia, S.L., Ferrell, R.E., Sing, C.F.: A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation. Genome Res. 11, 458–470 (2001)
  20. Tahri-Daizadeh, N., Tregouet, D.A., Nicaud, V., Manuel, N., Cambien, F., Tiret, L.: Automated detection of informative combined effects in genetic association studies of complex traits. Genome Res. 13, 1952–1960 (2003)
  21. Tomita, Y., Yokota, M., Honda, H.: Classification method for prediction of multifactorial disease development using interaction between genetic and environmental factors. In: IEEE Comput. Systems Bioinformatics Conf. CSB 2005, poster (2005)
  22. Waddell, M., Page, D., Zhan, F., Barlogie, B., Shaughnessy, J.: Predicting Cancer Susceptibility from Single Nucleotide Polymorphism Data: A Case Study in Multiple Myeloma. In: Proc. BIOKDD 2005 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dumitru Brinza (1)
  • Alexander Zelikovsky (1)
  1. Department of Computer Science, Georgia State University, Atlanta, USA
