TagSNP selection, which aims to choose a small subset of informative single nucleotide polymorphisms (SNPs) to represent a much larger SNP set, plays an important role in current genomic research. It reduces genotyping cost by filtering out a large number of redundant SNPs, and it can also accelerate genome-wide disease association studies. In this paper, we propose a new hybrid method called CMDStagger, which combines ideas from clustering and graph algorithms to find a minimum set of tagSNPs. The proposed algorithm uses linkage disequilibrium (LD) association and haplotype diversity to reduce information loss in tagSNP selection, and it is not restricted by a block partition. The approach is tested on eight benchmark datasets from HapMap and chromosome 5q31. Experimental results show that the algorithm reduces selection time and obtains fewer tagSNPs while maintaining high prediction accuracy, which indicates that it performs better than previous methods.
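To make the idea of LD-based tagging concrete, the following is a minimal sketch of a generic greedy heuristic. It is not the CMDStagger algorithm, whose clustering and maximum density subgraph steps are not specified in this abstract. It assumes phased haplotypes coded as 0/1 alleles, uses the standard pairwise r² statistic as the LD measure, and treats two SNPs as mutually representable when r² meets a threshold. The function names `r_squared` and `greedy_tags` and the default threshold of 0.8 are illustrative assumptions.

```python
from itertools import combinations


def r_squared(a, b):
    """Pairwise LD (r^2) between two biallelic SNPs.

    a and b are equal-length lists of 0/1 alleles over the same haplotypes.
    """
    n = len(a)
    p_a = sum(a) / n                                  # frequency of allele 1 at SNP a
    p_b = sum(b) / n                                  # frequency of allele 1 at SNP b
    p_ab = sum(x & y for x, y in zip(a, b)) / n       # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                                    # a monomorphic SNP carries no LD signal
        return 0.0
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Greedy tag selection (illustrative, not CMDStagger).

    haplotypes: list of rows, one per haplotype; columns are SNPs (0/1).
    Returns SNP indices such that every SNP is a tag or has r^2 >= threshold
    with some tag.
    """
    n_snps = len(haplotypes[0])
    cols = [[row[j] for row in haplotypes] for j in range(n_snps)]

    # neighbors[j] holds the SNPs that j can represent, including itself.
    neighbors = {j: {j} for j in range(n_snps)}
    for i, j in combinations(range(n_snps), 2):
        if r_squared(cols[i], cols[j]) >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)

    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP covering the most still-untagged SNPs (lowest index on ties).
        best = max(sorted(untagged), key=lambda j: len(neighbors[j] & untagged))
        tags.append(best)
        untagged -= neighbors[best]
    return tags


# Toy data: SNPs 0, 1 and 2 are in perfect LD; SNP 3 is independent of them.
toy = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
print(greedy_tags(toy))
```

On the toy data, one tag suffices for the three perfectly correlated SNPs, and the independent SNP must be kept as its own tag. A greedy cover like this gives no guarantee of a minimum tag set; it only illustrates how LD information lets redundant SNPs be dropped.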
This work was supported by the Natural Science Foundation of China under Grant Nos. 60871092, 60741001 and 60671011, the China National 863 High Tech Program under Grant No. 2007AA01Z171, the Science Fund for Distinguished Young Scholars of Heilongjiang Province in China under Grant No. JC200611, and the Natural Science Foundation of Heilongjiang Province in China under Grant No. ZJG0705.
Cite this article
Guo, M., Wang, J., Wang, C. et al. A hybrid clustering and graph based algorithm for tagSNP selection. Soft Comput 13, 1143–1151 (2009). https://doi.org/10.1007/s00500-009-0419-z
- TagSNP selection
- Clustering algorithm
- Maximum density subgraph (MDS)
- Linkage disequilibrium (LD)
- Haplotypes diversity