Entropy-based SNP selection for genetic association studies

  • Original Investigation
Human Genetics

Abstract

Because of their abundance, density, and ease of practical use, single-nucleotide polymorphisms (SNPs) have become the major source of information for association gene mapping in humans. Sensible strategies for selecting practically useful SNPs are therefore required. Three factors, among others, influence the mapping utility of a given set of SNPs: (1) their individual diversity, (2) their haplotype structure in the population of interest, and (3) their physical distribution. We propose a strategy that integrates these aspects into a single mapping utility measure based on Shannon entropy, which maximizes the amount of information extracted from a genomic region under a Malécot model of linkage disequilibrium (LD) decay. The same utility measure is also used to define a criterion that guides SNP discovery and supports rational decisions about continuing or terminating a mapping study. The proposed strategy performs consistently well in a data set comprising 549 German control individuals genotyped for 136 SNPs from four genomic regions of different LD structure. Adopting the method in practice is estimated to save up to 30% of the genotyping load compared with equidistant SNP placement or pair-wise LD minimization alone.
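The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the article's utility measure, which the abstract does not spell out. It only computes the Shannon entropy of an empirical haplotype distribution and evaluates the commonly used Malécot form of LD decay, ρ(d) = (1 − L)·M·exp(−εd) + L. The function names, parameter names, and toy haplotypes are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(observations):
    """Shannon entropy (in bits) of the empirical distribution of `observations`."""
    counts = Counter(observations)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def malecot_rho(distance, epsilon, m_start=1.0, asymptote=0.0):
    """Expected association at `distance` under the Malecot model (assumed form).

    rho(d) = (1 - L) * M * exp(-epsilon * d) + L, where `m_start` plays the
    role of M and `asymptote` the role of L. Units of `distance` and
    `epsilon` must match (for example kb and 1/kb).
    """
    return (1.0 - asymptote) * m_start * math.exp(-epsilon * distance) + asymptote


# Toy data (hypothetical): one SNP alone, then two SNPs in complete LD,
# then two SNPs in linkage equilibrium.
single_snp = ["B", "B", "b", "b"]
complete_ld = ["AB", "AB", "ab", "ab"]
equilibrium = ["AB", "Ab", "aB", "ab"]

entropy_single = shannon_entropy(single_snp)
entropy_ld = shannon_entropy(complete_ld)
entropy_eq = shannon_entropy(equilibrium)
```

In this toy setting, the pair in complete LD carries no more entropy than a single SNP, whereas the pair in equilibrium carries twice as much. This is the intuition behind preferring SNPs that add information rather than repeat it, with the Malécot curve describing how quickly redundancy is expected to fade with distance.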

Acknowledgements

This study was supported by the German National Genome Research Network (NGFN), the German Human Genome Project (DHGP), a GEM ("Center of Expertise in Genetic Epidemiology") grant from the German Federal Ministry of Education and Research, and a "DFG Forschergruppe" on complex disorders. The authors wish to thank Annette Stenzel and Peter Croucher for providing chromosome 6 and 16 SNP genotype data, and Timothy Lu for preparing the haplotype heat map of region 6p21.31.

Author information

Corresponding author

Correspondence to Jochen Hampe.

About this article

Cite this article

Hampe, J., Schreiber, S. & Krawczak, M. Entropy-based SNP selection for genetic association studies. Hum Genet 114, 36–43 (2003). https://doi.org/10.1007/s00439-003-1017-2

  • DOI: https://doi.org/10.1007/s00439-003-1017-2
