Two Birds, One Stone: Selecting Functionally Informative Tag SNPs for Disease Association Studies

  • Phil Hyoun Lee
  • Hagit Shatkay
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4645)


Selecting an informative subset of SNPs, generally referred to as tag SNPs, to genotype and analyze is considered to be an essential step toward effective disease association studies. However, while the selected informative tag SNPs may characterize the allele information of a target genomic region, they are not necessarily the ones directly associated with disease or with functional impairment. To address this limitation, we present a first integrative SNP selection system that simultaneously identifies SNPs that are both informative and carry a deleterious functional effect – which in turn means that they are likely to be directly associated with disease. We formulate the problem of selecting functionally informative tag SNPs as a multi-objective optimization problem and present a heuristic algorithm for addressing it. We also present the system we developed for assessing the functional significance of SNPs. To evaluate our system, we compare it to other state-of-the-art SNP selection systems, which conduct both information-based tag SNP selection and function-based SNP selection, but do so in two separate consecutive steps. Using 14 datasets, based on disease-related genes curated by the OMIM database, we show that our system consistently improves upon current systems.
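The multi-objective formulation mentioned above can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a simple weighted-sum greedy heuristic written under stated assumptions. Informativeness is assumed to be the mean, over all SNPs, of the best pairwise r² with any selected SNP, and the functional objective is assumed to be the mean of per-SNP deleteriousness scores in [0, 1]. The names `r_squared`, `objective`, `select_tag_snps`, and the weight `alpha` are hypothetical.

```python
def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNP columns coded 0/1."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # a monomorphic SNP carries no linkage information
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def objective(selected, columns, func_scores, alpha):
    """Weighted sum of the two objectives (assumed scalarization)."""
    if not selected:
        return 0.0
    # Informativeness: how well each SNP is tagged by its best selected proxy.
    info = sum(
        max(r_squared(columns[t], columns[s]) for s in selected)
        for t in range(len(columns))
    ) / len(columns)
    # Functional significance: mean assumed deleteriousness of the selected SNPs.
    func = sum(func_scores[s] for s in selected) / len(selected)
    return alpha * info + (1 - alpha) * func


def select_tag_snps(haplotypes, func_scores, k, alpha=0.5):
    """Greedily pick up to k SNP indices that maximize the weighted objective."""
    columns = list(zip(*haplotypes))  # one tuple of alleles per SNP
    selected = []
    candidates = set(range(len(columns)))
    while len(selected) < k and candidates:
        best = max(
            sorted(candidates),  # sorted so that ties resolve to the lowest index
            key=lambda c: objective(selected + [c], columns, func_scores, alpha),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: SNPs 0 and 1 are in perfect LD, as are SNPs 2 and 3.
haplotypes = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
func_scores = [0.1, 0.9, 0.8, 0.2]  # made-up functional scores
print(select_tag_snps(haplotypes, func_scores, k=2))
```

On this toy input, either SNP in each perfectly correlated pair is equally informative, so the functional score acts as the tie-breaker. A two-step pipeline that picks tag SNPs first and scores function afterwards could just as easily settle on the less deleterious member of each pair, which is the limitation the abstract describes.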


Keywords: Single Nucleotide Polymorphism; Nucleic Acid Research; Exonic Splice Enhancer; Single Nucleotide Polymorphism Effect; Single Nucleotide Polymorphism Selection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Phil Hyoun Lee (1)
  • Hagit Shatkay (1)
  1. Computational Biology and Machine Learning Lab, School of Computing, Queen’s University, Kingston, ON, Canada
