Abstract
Well-powered genome-wide association studies, now made possible through advances in technology and large-scale collaborative projects, promise to characterize the contribution of rare variants to complex traits and disease. However, while population structure is a known confounder of association studies, it remains unknown whether methods developed to control stratification are equally effective for rare variants. Here, we demonstrate that rare variants can show a stratification that is systematically different from, and typically stronger than, common variants, and this is not necessarily corrected by existing methods. We show that the same process leads to inflation for load-based tests and can obscure signals at truly associated variants. Furthermore, we show that populations can display spatial structure in rare variants, even when Wright's fixation index FST is low, but that allele frequency–dependent metrics of allele sharing can reveal localized stratification. These results underscore the importance of collecting and integrating spatial information in the genetic analysis of complex traits.
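The claim that a rare variant can be sharply localized while FST stays low can be illustrated with a small calculation. The sketch below is not the authors' simulation; it is a minimal example that assumes ten equally sized demes and a single biallelic variant present at 2% frequency in one deme and absent elsewhere. The function name `fst_biallelic` and all numerical values are hypothetical and chosen only for illustration. It uses the standard heterozygosity-based definition FST = (H_T - H_S) / H_T.

```python
def fst_biallelic(freqs):
    """Wright's F_ST at one biallelic site, given the allele frequency
    in each deme. Assumes all demes are the same size (an illustrative
    simplification, not something stated in the abstract)."""
    n = len(freqs)
    p_bar = sum(freqs) / n                  # pooled allele frequency
    h_total = 2 * p_bar * (1 - p_bar)       # expected heterozygosity, pooled
    if h_total == 0:
        raise ValueError("site is monomorphic; F_ST is undefined")
    # mean expected heterozygosity within demes
    h_within = sum(2 * p * (1 - p) for p in freqs) / n
    return (h_total - h_within) / h_total


# Hypothetical rare variant: 2% frequency in one deme, absent in the other nine.
rare_localized = [0.02] + [0.0] * 9
print(f"F_ST = {fst_biallelic(rare_localized):.4f}")  # about 0.018
```

Every copy of this variant sits in a single deme, yet the site-level FST is below 0.02, which is one way to see why a frequency-dependent measure of allele sharing can expose structure that FST does not.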
Acknowledgements
The authors thank M. Pirinen, C. Spencer, Z. Iqbal and C. Lindgren for discussion and M. Pirinen for providing software to fit the linear mixed models used in this analysis. This work was supported by grants from the Wellcome Trust (089250/Z/09/Z to I.M., 086084/Z/08/Z to G.M. and 090532/Z/09/Z to the Wellcome Trust Centre for Human Genetics).
Author information
Contributions
G.M. conceived and designed the study. I.M. ran simulations and collected results. G.M. and I.M. jointly wrote the simulation code and manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7 (PDF 813 kb)
About this article
Cite this article
Mathieson, I., McVean, G. Differential confounding of rare and common variants in spatially structured populations. Nat Genet 44, 243–246 (2012). https://doi.org/10.1038/ng.1074
DOI: https://doi.org/10.1038/ng.1074