TagSNP selection, which aims to choose a small subset of informative single nucleotide polymorphisms (SNPs) to represent a much larger SNP set, plays an important role in current genomic research. It cuts the cost of genotyping by filtering out a large number of redundant SNPs, and it also accelerates genome-wide disease association studies. In this paper, we propose a new hybrid method, CMDStagger, that combines ideas from clustering and graph algorithms to find a minimum set of tagSNPs. The proposed algorithm uses linkage disequilibrium (LD) association and haplotype diversity to reduce the information loss in tagSNP selection, and it is not restricted by block partitioning. The approach is tested on eight benchmark datasets from HapMap and chromosome 5q31. Experimental results show that the algorithm reduces selection time and selects fewer tagSNPs while maintaining high prediction accuracy, indicating that it performs better than previous methods.
TagSNP selection; Clustering algorithm; Maximum density subgraph (MDS); Linkage disequilibrium (LD); Haplotype diversity
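To make the idea of LD-based tagSNP selection concrete, the following is a minimal sketch of a generic greedy approach: compute the pairwise LD measure r^2 between SNPs from phased haplotypes, then repeatedly pick the SNP that covers the most still-untagged SNPs. This is not the CMDStagger algorithm (it uses neither clustering, nor a maximum density subgraph, nor haplotype diversity); the function names, the 0/1 allele encoding, the toy haplotypes, and the threshold of 0.8 are all assumptions chosen for illustration.

```python
from itertools import combinations


def r_squared(a, b):
    """LD r^2 between two biallelic SNPs, each a list of 0/1 alleles
    observed on the same set of haplotypes."""
    n = len(a)
    p_a = sum(a) / n                      # frequency of allele 1 at SNP a
    p_b = sum(b) / n                      # frequency of allele 1 at SNP b
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                        # monomorphic SNP: treated as 0 here
    d = p_ab - p_a * p_b                  # disequilibrium coefficient D
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy selection: each chosen tag covers every SNP whose r^2 with it
    is at least `threshold` (an illustrative default)."""
    n_snps = len(haplotypes[0])
    columns = [[row[j] for row in haplotypes] for j in range(n_snps)]
    covers = {j: {j} for j in range(n_snps)}
    for i, j in combinations(range(n_snps), 2):
        if r_squared(columns[i], columns[j]) >= threshold:
            covers[i].add(j)
            covers[j].add(i)
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the untagged SNP covering the most untagged SNPs;
        # ties go to the lowest index.
        best = max(sorted(untagged), key=lambda j: len(covers[j] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy data: rows are haplotypes, columns are SNPs.
# SNPs 0, 1 and 2 are in perfect LD; SNP 3 is independent of them.
haps = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
print(greedy_tag_snps(haps))  # prints [0, 3]
```

In this toy example two tags stand in for four SNPs, which is the kind of reduction the abstract refers to; a block-free method would apply such a measure across the whole region rather than within predefined haplotype blocks.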
This work was supported by the Natural Science Foundation of China under Grant Nos. 60871092, 60741001 and 60671011, the China National 863 High Tech Program under Grant No. 2007AA01Z171, the Science Fund for Distinguished Young Scholars of Heilongjiang Province, China, under Grant No. JC200611, and the Natural Science Foundation of Heilongjiang Province, China, under Grant No. ZJG0705.