Abstract
Despite advances in genotyping technologies that have greatly reduced genotyping cost, tag SNP selection remains an important problem for computational biologists and geneticists. Selecting the smallest subset of tag SNPs that can predict the remaining SNPs would considerably reduce the complexity of genome-wide or block-based SNP-disease association studies. These studies would lead to better diagnosis and treatment of diseases. In this work, we propose three variations of a genetic algorithm, based respectively on two-marker linkage disequilibrium, multi-marker linkage disequilibrium, and a third measure that we denote by prediction power. The performance of the three algorithms is compared with that of a recognized tag SNP selection algorithm using three real data sets from the HapMap project. The results indicate that the genetic algorithm based on multi-marker linkage disequilibrium yields better prediction accuracy.
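To make the two-marker variant concrete, the following is a minimal sketch of how a candidate tag set could be scored with the standard pairwise r^2 measure of linkage disequilibrium. It is not the fitness function used in this work: the function names, the phased 0/1 haplotype encoding, the r^2 threshold of 0.8, and the size penalty weight are all illustrative assumptions.

```python
from typing import List, Sequence, Set


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise LD (r^2) between two biallelic SNPs coded 0/1 over the same haplotypes."""
    n = len(a)
    p_a = sum(a) / n                                # frequency of allele 1 at SNP a
    p_b = sum(b) / n                                # frequency of allele 1 at SNP b
    p_ab = sum(x & y for x, y in zip(a, b)) / n     # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP has undefined r^2; treated as 0 here (assumption)
    d = p_ab - p_a * p_b                            # disequilibrium coefficient D
    return d * d / denom


def coverage(haplotypes: List[List[int]], tags: Set[int], threshold: float = 0.8) -> float:
    """Fraction of SNPs that are tags or reach r^2 >= threshold with at least one tag."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    covered = 0
    for j in range(n_snps):
        if j in tags or any(r_squared(columns[j], columns[t]) >= threshold for t in tags):
            covered += 1
    return covered / n_snps


def fitness(haplotypes: List[List[int]], chromosome: Sequence[int],
            threshold: float = 0.8, penalty: float = 0.1) -> float:
    """Hypothetical GA fitness: reward coverage, penalise the number of selected tags.

    chromosome[j] == 1 means SNP j is selected as a tag.
    """
    tags = {j for j, bit in enumerate(chromosome) if bit}
    return coverage(haplotypes, tags, threshold) - penalty * len(tags) / len(chromosome)
```

A genetic algorithm would evolve bit-string chromosomes of this kind, so that a smaller tag set with the same coverage receives a higher score. The multi-marker and prediction-power variants would replace the pairwise test inside `coverage` with their own measures.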
Cite this article
Mouawad, A.E., Mansour, N. Multi-marker-LD based genetic algorithm for tag SNP selection. Interdiscip Sci Comput Life Sci 6, 303–311 (2014). https://doi.org/10.1007/s12539-012-0060-x