Genotyping and Statistical Analysis

  • Artem Lysenko
  • Keith A. Boroevich
  • Tatsuhiko Tsunoda


The development of technologies for high-throughput profiling of DNA variation has led to the rapid discovery of causal genetic mutations underlying complex phenotypic traits and diseases. These advances were originally enabled by the results of the Human Genome Project (1990–2003), which allowed the completion of the first genome-wide association study in 2002 and led to the development of haplotype maps of the human genome. Technological advances in microarray genotyping and next-generation sequencing have since made the widespread, cost-effective application of this approach possible and, in combination, have powered a new age of biomedical discovery. This chapter introduces the history and fundamental principles of genetic association analysis and explains key concepts and current statistical methods for processing these data. Topics discussed include the experimental design of association studies, quality control procedures, approaches for dealing with population stratification, statistical testing for genetic associations, and more recent developments in detecting the effects of rare variants and genetic interactions.


Genome-wide association study; High-throughput genotyping technologies; Genetic association testing; Genotype imputation; Haplotype mapping



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Artem Lysenko (1)
  • Keith A. Boroevich (1)
  • Tatsuhiko Tsunoda (1, 2, 3), corresponding author

  1. Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  2. Tsunoda Laboratory (Medical Science Mathematics), Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
  3. Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
