Tag SNP selection based on clustering according to dominant sets found using replicator dynamics

Florian Frommlet
Regular Article


Tag SNP selection is an important problem in genetic association studies. A class of algorithms for this task, among them a popular tool called Tagger, can be described as searching for a minimal vertex cover of a graph. In this article, this approach is contrasted with a recently introduced clustering algorithm based on the graph-theoretical concept of dominant sets. To compare the performance of both procedures, comprehensive simulation studies were performed using SNP data from the ten ENCODE regions included in the HapMap project. Quantitative traits were simulated from additive models with a single causative SNP. Simulation results suggest that clustering always performs at least as well as Tagger, and in more than a third of the considered instances a substantial improvement can be observed. Additionally, an extension of the clustering algorithm is described that can be used for larger genomic data sets.
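The dominant-set idea can be illustrated with a small sketch, which is not the article's implementation. It rests on several assumptions. The similarity between two SNPs is taken to be their pairwise LD measure r², stored in a symmetric matrix with zero diagonal. Clusters are extracted one at a time ("peeling"), following the general dominant-set clustering scheme of Pavan and Pelillo: discrete-time replicator dynamics are run from the barycenter of the simplex, and the SNPs whose converged weight exceeds a small threshold are kept. The function names, the thresholds `tol` and `support_tol`, and the tag-selection rule in `pick_tags` are hypothetical choices made for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def replicator_dynamics(A: Matrix, idx: List[int],
                        max_iter: int = 10_000, tol: float = 1e-10) -> List[float]:
    """Discrete-time replicator dynamics on the submatrix of A indexed by idx.

    Update rule: x_i <- x_i * (A x)_i / (x^T A x), started at the barycenter
    of the simplex. For symmetric, nonnegative A with zero diagonal, x^T A x
    does not decrease, and the iterates typically converge to a local
    maximizer on the simplex, whose support is a dominant set. The caller
    must ensure the submatrix has at least one positive off-diagonal entry.
    """
    n = len(idx)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        ax = [sum(A[idx[i]][idx[j]] * x[j] for j in range(n)) for i in range(n)]
        xax = sum(xi * axi for xi, axi in zip(x, ax))
        new_x = [xi * axi / xax for xi, axi in zip(x, ax)]
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x


def dominant_set_clusters(A: Matrix, support_tol: float = 1e-6) -> List[List[int]]:
    """Peel off dominant sets one at a time until every SNP is assigned."""
    remaining = list(range(len(A)))
    clusters: List[List[int]] = []
    while remaining:
        if not any(A[i][j] > 0 for i in remaining for j in remaining if i != j):
            # No positive similarity left: each remaining SNP is its own cluster.
            clusters.extend([v] for v in remaining)
            break
        x = replicator_dynamics(A, remaining)
        # The support of the converged vector (weights above support_tol) is the cluster.
        members = [v for v, w in zip(remaining, x) if w > support_tol]
        clusters.append(members)
        taken = set(members)
        remaining = [v for v in remaining if v not in taken]
    return clusters


def pick_tags(A: Matrix, clusters: List[List[int]]) -> List[int]:
    """Hypothetical rule: tag each cluster by the member with the largest
    summed r^2 to the rest of its cluster (ties go to the lowest index)."""
    return [max(c, key=lambda v: sum(A[v][u] for u in c if u != v)) for c in clusters]


if __name__ == "__main__":
    # Made-up r^2 values: SNPs 0-2 in strong LD, SNPs 3-4 in strong LD,
    # and weak LD between the two groups.
    r2 = [
        [0.00, 0.90, 0.90, 0.05, 0.05],
        [0.90, 0.00, 0.90, 0.05, 0.05],
        [0.90, 0.90, 0.00, 0.05, 0.05],
        [0.05, 0.05, 0.05, 0.00, 0.80],
        [0.05, 0.05, 0.05, 0.80, 0.00],
    ]
    clusters = dominant_set_clusters(r2)
    print(clusters, pick_tags(r2, clusters))  # prints [[0, 1, 2], [3, 4]] [0, 3]
```

Because clusters are peeled off one at a time, the number of clusters (and hence of tag SNPs in this sketch) is not fixed in advance. The abstract does not state how the article turns clusters into tags, so `pick_tags` should be read only as a placeholder rule.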


Keywords

Clustering; Dominant set; Replicator dynamics; Tag SNP selection

Mathematics Subject Classification (2000)

62P10; 62H30; 90C20





Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. Department of Statistics, University of Vienna, Vienna, Austria
