The Use of Imputed Sibling Genotypes in Sibship-Based Association Analysis: On Modeling Alternatives, Power and Model Misspecification

Minică, Camelia C.; Dolan, Conor V.; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M.; Boomsma, Dorret I.

doi:10.1007/s10519-013-9590-1

Original Research
Published: 22 March 2013

Volume 43, pages 254–266 (2013)
Behavior Genetics

Camelia C. Minică¹,
Conor V. Dolan¹˒²,
Jouke-Jan Hottenga¹,
Gonneke Willemsen¹,
Jacqueline M. Vink¹ &
Dorret I. Boomsma¹

Abstract

When phenotypic but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that including family-based imputed genotypes can boost statistical power. Here, using simulations, we compared the performance of two statistical approaches suitable for modeling imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes, and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, the size of the phenotypic correlations among siblings, imputation accuracy, and the minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets, one with a continuous phenotype (height) and one with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion of imputed sibling genotypes in association analysis does not always result in a larger test statistic. The actual test statistic may drop in value due to small effect sizes. That is, if the power benefit is small, so that the change in the distribution of the test statistic under the alternative is relatively small, the probability of obtaining a smaller test statistic is greater. As the genetic effects are typically hypothesized to be small, in practice, the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by the consideration of the background correlation.
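
The distinction between the dosage and the mixture approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it treats a single individual with a normally distributed phenotype and ignores the familial covariance structure that the paper models. The function names, the genotype probabilities and the parameter values (`mu`, `beta`, `sd`) are all hypothetical.

```python
import math

def normal_pdf(y, mean, sd):
    """Density at y of a normal distribution with the given mean and sd."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def dosage(probs):
    """Expected minor-allele count given (P(g=0), P(g=1), P(g=2))."""
    return probs[1] + 2.0 * probs[2]

def dosage_loglik(y, probs, mu, beta, sd):
    """Dosage approach: the conditional mean stands in for the genotype."""
    return math.log(normal_pdf(y, mu + beta * dosage(probs), sd))

def mixture_loglik(y, probs, mu, beta, sd):
    """Mixture approach: weight each genotype's density by its probability."""
    lik = sum(p * normal_pdf(y, mu + beta * g, sd) for g, p in enumerate(probs))
    return math.log(lik)

# Hypothetical conditional genotype probabilities for one ungenotyped sibling.
probs = (0.25, 0.50, 0.25)
y, mu, beta, sd = 1.2, 0.0, 0.5, 1.0  # assumed phenotype and parameter values

print("dosage:", dosage(probs))
print("dosage log-likelihood: ", dosage_loglik(y, probs, mu, beta, sd))
print("mixture log-likelihood:", mixture_loglik(y, probs, mu, beta, sd))
```

In this simplified setting the two log-likelihoods coincide when the genotype is known with certainty or when `beta` is zero, and they differ only to the extent that imputation is uncertain and the effect is non-zero.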

Notes

Of the 41 SNPs, 20 SNPs were available in the current sample.
Fitting the constrained model in Mx and nlme produced identical results.
For convenience we have chosen the Bonferroni method to correct for multiple testing, although this procedure can be conservative (Laird and Lange 2011). However, in Fig. 4 we plot the values of the noncentrality parameter of the likelihood ratio test, as these values do not depend on the chosen alpha, or the correction for multiple testing. They are illustrative of the variation in power—before and following imputation—given various effect sizes.
As an additional check, the analysis of height data was repeated in Merlin (Abecasis et al. 2002), and this analysis produced similar results (results not shown).
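
The relation between the noncentrality parameter mentioned in the notes above and power can be sketched as follows. This is only an illustration under stated assumptions: the test is taken to have one degree of freedom (a single additive SNP effect), the number of tests used for the Bonferroni correction is assumed to be 20, and the noncentrality values are arbitrary examples rather than results from the article. The function name `power_from_ncp` is hypothetical.

```python
from statistics import NormalDist

def power_from_ncp(ncp, alpha):
    """Power of a 1-df chi-square test with noncentrality parameter ncp."""
    std_normal = NormalDist()
    # Square root of the central chi-square(1) critical value at level alpha.
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # If Z ~ N(shift, 1), then Z^2 is noncentral chi-square(1, ncp);
    # the test rejects when |Z| exceeds z_crit.
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

n_tests = 20                          # assumed number of SNPs tested
alpha_bonferroni = 0.05 / n_tests     # Bonferroni-corrected significance level

for ncp in (2.0, 5.0, 10.0):          # arbitrary example values
    print(ncp, round(power_from_ncp(ncp, alpha_bonferroni), 3))
```

The noncentrality parameter itself does not change with `alpha`; only the mapping from noncentrality to power does, which is why it is a convenient quantity for comparing designs before and after imputation.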

References

Abecasis GR, Cardon LR, Cookson WO (2000) A general test of association for quantitative traits in nuclear families. Am J Hum Genet 66(1):279–292
Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002) Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30(1):97–101
Boker S, Neale MC, Maes H, Wilde M, Spiegel M, Brick T, Spies J, Estabrook R, Kenny S, Bates T, Mehta P, Fox J (2011) OpenMx: an open source extended structural equation modeling framework. Psychometrika 76(2):306–317
Boomsma DI, de Geus EJK, Vink JM, Stubbe JH, Distel MA, Hottenga JJ, Posthuma D, van Beijsterveldt TCEM, Hudziak JJ, Bartels M, Willemsen G (2006) Netherlands Twin Register: from twins to twin families. Twin Res Hum Genet 9(6):849–857
Burdick JT, Chen WM, Abecasis GR, Cheung VG (2006) In silico method for inferring genotypes in pedigrees. Nat Genet 38(9):1002–1004
Chen WM, Abecasis GR (2007) Family-based association tests for genomewide association scans. Am J Hum Genet 81(5):913–926
Falconer DS, Mackay TFC (1996) Introduction to quantitative genetics, 4th edn. Prentice Hall, Harlow
Fulker D, Cherny S, Sham P, Hewitt J (1999) Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet 64(1):259–267
Gorjanc G, Henderson DA, with code contributions by Kinghorn B and Percy A (2007) GeneticsPed: Pedigree and genetic relationship functions. R package version 1.20.0. http://rgenetics.org
Kinghorn BP (1997) An index of information content for genotype probabilities derived from segregation analysis. Genetics 145(2):479–483
Kinghorn BP (1999) Use of segregation analysis to reduce genotyping costs. J Anim Breed Genet 116(3):175–180
Laird NM, Lange C (2011) The fundamentals of modern statistical genetics. Springer Verlag, New York
Lango Allen H, Estrada K, Lettre G, Berndt S, Weedon MN, Rivadeneira F, Willer CJ, Jackson AU, Vedantam S, Ferreira T, Wood AR et al (2010) Hundreds of variants influence human height and cluster within genomic loci and biological pathways. Nature 467(7317):832–838
Li Y, Willer C, Sanna S, Abecasis GR (2009) Genotype imputation. Annu Rev Genomics Hum Genet 10:387–406
Mather K, Jinks JL (1977) Introduction to biometrical genetics. Cambridge University Press, Cambridge
Percy A, Kinghorn BP (2005) A genotype probability index for multiple alleles and haplotypes. J Anim Breed Genet 122(6):387–392
Pinheiro J, Bates D, DebRoy S, Sarkar D, the R Development Core Team (2012) nlme: linear and nonlinear mixed effects models. R package version 3.1–104
R Development Core Team (2005) R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org
Scheet P, Stephens M (2006) A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 78(4):629–644
Silventoinen K, Sammalisto S, Perola M, Boomsma DI, Cornes BK, Davis C, Dunkel L, De Lange M, Harris JR, Hjelmborg JV, Luciano M, Martin NG, Mortensen J, Nisticò L, Pedersen NL, Skytthe A, Spector TD, Stazi MA, Willemsen G, Kaprio J (2003) Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res 6(5):399–408
Van der Sluis S, Dolan CV, Neale MC, Posthuma D (2008) Power calculations using exact data simulation: a useful tool for genetic study designs. Behav Genet 38(2):202–211
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York
Vink JM et al (2009) Genome-wide association study of smoking initiation and current smoking. Am J Hum Genet 84(3):367–379
Visscher PM, Duffy DL (2006) The value of relatives with phenotypes but missing genotypes in association studies for quantitative traits. Genet Epidemiol 30(1):30–36
Visscher PM, Benyamin B, White J (2004) The use of linear mixed models to estimate variance components from data on twin pairs by maximum likelihood. Twin Res 7(6):670–674
Visscher PM, Macgregor S, Benyamin B, Zhu G, Gordon S, Medland S, Hill WG, Hottenga JJ, Willemsen G, Boomsma DI, Liu YZ, Deng HW, Montgomery GW, Martin NG (2007) Genome partitioning of genetic variation for height from 11,214 sibling pairs. Am J Hum Genet 81(5):1104–1110
Visscher PM, Andrew TA, Nyholt DR (2008) Genome-wide association studies of quantitative traits with related individuals: little (power) lost but much to be gained. Eur J Hum Genet 16(3):387–390
Willemsen G, de Geus EJC, Bartels M, van Beijsterveldt TCEM, Brooks AI, van Burk GFE, Fugman DA, Hoekstra C, Hottenga JJ, Kluft K, Meijer P, Montgomery GW, Rizzu P, Sondervan D, Smit AB, Spijker S, Suchiman HED, Tischfield JA, Lehner T, Slagboom PE, Boomsma DI (2010) The Netherlands Twin Register biobank: a resource for genetic epidemiological studies. Twin Res Hum Genet 13(3):231–245
Zheng J, Li Y, Abecasis GR, Scheet P (2011) A comparison of approaches to account for uncertainty in analysis of imputed genotypes. Genet Epidemiol 35(2):102–111

Acknowledgments

Camelia C. Minică and Jacqueline M. Vink are supported by the ERC Starting Grant 284167. The statistical analyses were carried out on the Genetic Cluster Computer (http://www.geneticcluster.org) which is financially supported by the Netherlands Scientific Organization (NWO 480-05-003), the Dutch Brain Foundation and the Department of Psychology and Education of the VU University Amsterdam. Data collection and genotyping were funded by the Netherlands Organization for Scientific Research (NWO: MagW/ZonMW grants 904-61-090, 985-10-002, 904-61-193, 480-04-004, 400-05-717, Addiction-31160008 Middelgroot-911-09-032, Spinozapremie 56-464-14192), Center for Medical Systems Biology (CSMB, NWO Genomics), NBIC/BioAssist/RK(2008.024), Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-NL, 184.021.007), the VU University’s Institute for Health and Care Research (EMGO+) and Neuroscience Campus Amsterdam (NCA), the European Science Foundation (ESF, EU/QLRT-2001-01254), the European Community’s Seventh Framework Program (FP7/2007–2013), ENGAGE (HEALTH-F4-2007-201413); the European Science Council (ERC Advanced, 230374), Rutgers University Cell and DNA Repository (NIMH U24 MH068457-06), the Avera Institute, Sioux Falls, South Dakota (USA), the National Institutes of Health (NIH, R01D0042157-01A), the Genetic Association Information Network (GAIN) of the Foundation for the US National Institutes of Health, the NIMH (MH081802), and by the Grand Opportunity Grants 1RC2MH089951-01 and 1RC2 MH089995-01 from the NIMH. The authors have no conflict of interest to declare.

Author information

Authors and Affiliations

Department of Biological Psychology, VU University Amsterdam, Van der Boechorststraat 1, 1081 BT, Room 2B03, Amsterdam, The Netherlands
Camelia C. Minică, Conor V. Dolan, Jouke-Jan Hottenga, Gonneke Willemsen, Jacqueline M. Vink & Dorret I. Boomsma
Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
Conor V. Dolan

Corresponding author

Correspondence to Camelia C. Minică.

Additional information

Edited by Sarah Medland.

About this article

Cite this article

Minică, C.C., Dolan, C.V., Hottenga, JJ. et al. The Use of Imputed Sibling Genotypes in Sibship-Based Association Analysis: On Modeling Alternatives, Power and Model Misspecification. Behav Genet 43, 254–266 (2013). https://doi.org/10.1007/s10519-013-9590-1

Received: 12 September 2012
Accepted: 18 February 2013
Published: 22 March 2013
Issue Date: May 2013
DOI: https://doi.org/10.1007/s10519-013-9590-1
