Simulating Linkage Disequilibrium Structures in a Human Population for SNP Association Studies

Biochemical Genetics

Abstract

Existing simulation methods usually generate linkage disequilibrium (LD) structures from an initial population that is randomly generated according to specified allele frequencies. These random-initialization methods can be unstable because the LD level of the initial population is generally extremely low. This study presents a new algorithm, SIMLD, for simulating genome populations with realistic LD structures. SIMLD begins from an initial population with the highest possible LD level; the LD then decays to the desired level through mating and recombination over generations. SIMLD can also produce case–control samples according to various disease models. Using empirical SNP marker information from three HapMap populations, we implement the proposed algorithm and present a set of experimental results.
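
The idea of starting from maximal LD and letting it decay can be illustrated with a minimal sketch. This is not the SIMLD implementation, which the abstract does not specify. It assumes biallelic SNPs coded 0/1, a founder pool made only of all-0 and all-1 haplotypes (so every SNP pair starts at D′ = 1), a uniform per-interval recombination probability `r`, and random pairing of haplotypes in a constant-size pool. All function names, parameters, and default values are hypothetical.

```python
import random

def d_prime(haplotypes, i, j):
    """Return |D'| (Lewontin's normalized LD) between biallelic loci i and j."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n
    p_b = sum(h[j] for h in haplotypes) / n
    p_ab = sum(h[i] & h[j] for h in haplotypes) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # If either locus is fixed, D' is undefined; report 0.0 by convention here.
    return abs(d) / d_max if d_max > 0 else 0.0

def initial_population(n_haplotypes, n_snps, freq, rng):
    """Assumed max-LD start: each haplotype is either all 1s or all 0s."""
    return [[1] * n_snps if rng.random() < freq else [0] * n_snps
            for _ in range(n_haplotypes)]

def recombine(h1, h2, r, rng):
    """Form one gamete, switching parental strand between adjacent SNPs with probability r."""
    src, other = (h1, h2) if rng.random() < 0.5 else (h2, h1)
    gamete = []
    for k in range(len(h1)):
        if k > 0 and rng.random() < r:
            src, other = other, src
        gamete.append(src[k])
    return gamete

def next_generation(pop, r, rng):
    """Random mating (simplified): pair two random haplotypes per offspring gamete."""
    return [recombine(rng.choice(pop), rng.choice(pop), r, rng) for _ in pop]

def simulate(n_haplotypes=400, n_snps=5, r=0.05, target=0.5, max_gen=500, seed=1):
    """Evolve the pool until |D'| between the first and last SNP falls to `target`."""
    rng = random.Random(seed)
    pop = initial_population(n_haplotypes, n_snps, 0.5, rng)
    history = [d_prime(pop, 0, n_snps - 1)]
    while history[-1] > target and len(history) <= max_gen:
        pop = next_generation(pop, r, rng)
        history.append(d_prime(pop, 0, n_snps - 1))
    return pop, history

if __name__ == "__main__":
    pop, history = simulate()
    print(f"generations: {len(history) - 1}, final |D'|: {history[-1]:.3f}")
```

In this sketch the stopping rule tracks a single SNP pair. Fitting a whole D′-by-distance profile, as in the comparisons with HapMap populations, would need a criterion over many pairs and position-dependent recombination rates, neither of which is modeled here.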

Acknowledgments

This work was supported in part by the National Science Fund of China under Grant nos. 61070137, 60933009, and 60371044, and by the U.S. National Institutes of Health under Grants GM085665, HL090567, and NS029525.

Author information

Corresponding author

Correspondence to Xiguo Yuan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

10528_2011_9416_MOESM1_ESM.doc

Simulated LD for sim500SNPs_data. D′ values by distance are given in the comparisons of simulated data and real data for populations (a) JPT/CHB, (b) CEU, and (c) YRI (DOC 41 kb)

10528_2011_9416_MOESM2_ESM.doc

Simulated LD for sim1000SNPs_data. D′ values by distance are given in the comparisons of simulated data and real data for populations (a) JPT/CHB, (b) CEU, and (c) YRI (DOC 42 kb)

10528_2011_9416_MOESM3_ESM.doc

Simulated LD for sim2000SNPs_data. D′ values by distance are given in the comparisons of simulated data and real data for populations (a) JPT/CHB, (b) CEU, and (c) YRI (DOC 41 kb)

10528_2011_9416_MOESM4_ESM.doc

Simulated LD for sim5000SNPs_data. D′ values by distance are given in the comparisons of simulated data and real data for populations (a) JPT/CHB, (b) CEU, and (c) YRI (DOC 41 kb)

About this article

Cite this article

Yuan, X., Zhang, J. & Wang, Y. Simulating Linkage Disequilibrium Structures in a Human Population for SNP Association Studies. Biochem Genet 49, 395–409 (2011). https://doi.org/10.1007/s10528-011-9416-x

  • DOI: https://doi.org/10.1007/s10528-011-9416-x
