Abstract
Common strategy of genome-wide association studies (GWAS) relying on large samples faces difficulties, which raise concerns that GWAS have exhausted their potential, particularly for complex traits. Here, we examine the efficiency of the traditional sample-size-centered strategy in GWAS of these traits, and its potential for improvement. The paper focuses on the results of the four largest GWAS meta-analyses of body mass index (BMI) and lipids. We show that just increasing sample size may not make p-values of genetic effects in large (N > 100,000) samples smaller but can make them larger. The efficiency of these GWAS, defined as ratio of the log-transformed p-value to the sample size, in larger samples was larger than in smaller samples for a small fraction of loci. These results emphasize the important role of heterogeneity in genetic associations with complex traits such as BMI and lipids. They highlight the substantial potential for improving GWAS by explicating this role (affecting 11–79% of loci in the selected GWAS), especially the effects of biodemographic processes, which are heavily underexplored in current GWAS and which are important sources of heterogeneity in the various study populations. Further progress in this direction is crucial for efficient use of genetic discoveries in health care.
Similar content being viewed by others
Introduction
New insights into genetic predisposition to diseases, related traits, and survival could substantially contribute to improvement of healthspan, particularly in aging populations in developed countries1,2,3. Genome-wide association studies (GWAS) have been thought to accelerate the progress in this endeavor.
Despite the apparent GWAS successes4, it is recognized that the common GWAS strategy faces serious difficulties, especially in the case of complex traits5,6,7. For example, one difficulty is the problem of the effect sizes: GWAS typically report associations with modest effects and even smaller effects are expected to be detected according to the infinitesimal hypothesis5. This difficulty is related to the problem of missing heritability8. Another difficulty is the problem of non-replication of genetic associations with complex traits9,10,11. These difficulties raise concerns that GWAS have exhausted their potential for complex traits and new strategies are needed6.
Various strategies for overcoming GWAS difficulties have been discussed. One broad category of strategies emphasizes the role of genetic factors such as rare variants, epigenetic modifications, miRNA, etc.6. Another category highlights the critical role of inherent heterogeneity of complex traits, heterogeneity which encompasses the complexity of endogenous and exogenous mechanisms predisposing to these traits5,7,12,13,14. The stochastic component in genetic susceptibility to complex traits may be another important factor15.
These strategies may or may not fit those which are widespread in current GWAS. For example, GWAS rely heavily on “the benefits of the large sample sizes achievable through collaboration”16 for detecting risk alleles of complex traits. This strategy makes perfect sense if one assumes that genetic susceptibility to a trait of interest is homogeneous, i.e., the same genetic variant, or, more broadly, biological process, predispose to this trait in different people in a given population. Evolutionary biology provides, however, little support to this hypothesis; rather, it argues that genetic predisposition to complex traits is inherently heterogeneous17,18 (see more details in the Discussion section “Why and how to improve the traditional GWAS strategy in case of complex traits?”). Then, the analyses of genetic predisposition to heterogeneous traits (as opposite to homogeneous traits defined above) relying on increasing sample size become problematic because “increasing the size of human disease cohorts is likely only to scale the heterogeneity in parallel”7.
In this paper, we examine the potential for improving the traditional GWAS strategy, which relies on increasing the sample size, in the case of inherently heterogeneous traits. The paper focuses on the results of the largest GWAS meta-analyses, so far. They include studies of lipids comprising nearly 100,000 individuals19 and nearly 188,000 individuals16, and studies of body mass index (BMI) comprising nearly 250,000 individuals20 and nearly 320,000 individuals21 of European descent.
Results
Estimates in GWAS with larger and smaller samples
To examine the efficiency of the traditional GWAS strategy, we first compared p-values between larger and smaller samples (see “Methods”). Figure 1 shows that BMI p-values in the larger sample21 (N~320 K) were larger than in the smaller sample20 (N~240 K) for 21.1%, i.e., 8 of 38 loci. For 11 additional loci, we observed minor change in p-values with their decrease by just two orders of magnitude despite increasing the sample size by about 33%. Therefore, the BMI p-values were either larger or had minor decrease (within two orders of magnitude) for 50% of loci.
For 13 of 38 SNPs (34.2%) the BMI p-values were either larger (5 loci) or had minor decrease (8 loci) in the larger sample compared to the smaller sample in the same study20 despite nearly two-fold difference in the sample sizes (Supplementary Figure 1). In the other study21, the BMI p-values in the larger sample were smaller than in the smaller sample by at least four orders of magnitude (Supplementary Figure 2) due to a nearly four-fold sample size difference between them (Supplementary Table 3).
For the lipid meta-analyses, the p-values were either larger (5 loci) or had minor decrease (4 loci within two orders of magnitude) for nine of 76 loci (11.8%) in the larger study16 (N~188 K) than in the smaller study19 (N~100 K) (Supplementary Figure 3).
Relative efficiency of GWAS
At first glance, the results in Supplementary Figure 2 appear to support the benefits of large samples in GWAS. The analysis presented in Fig. 1 and Supplementary Figures 1 and 3 provide less support, however. To gain further insights on the efficiency of GWAS, we evaluated the relative efficiency ρ (see Methods, “Efficiency measure”) of GWAS in the same pairs of larger and smaller samples as above.
Figure 2 shows that the relative efficiency of the BMI GWAS in refs 21,20 varied between 0.3 and 1.5. Higher efficiency in the larger sample than in the smaller sample (i.e., ρ > 1) was found for 3 of 38 loci (7.9%). However, the estimates of ρ for these three loci did not attain statistical significance (Fig. 2 and Supplementary Note 1). The relative efficiency ρ was significantly smaller than 1 for 19 of 38 loci (50%). Figure 2 shows lower efficiency not only for loci with larger p-values in the larger sample (Fig. 1) but also for those with modestly smaller p-values.
Figure 3 shows the relative efficiency of the BMI GWAS in the selected samples in each study. For GWAS from ref. 21, the efficiency in the larger sample (the entire sample, N~320 K) was higher than that in the smaller sample (Metabochip, N~88 K) for 7 of 35 loci (20.0%). For all 7 loci the estimates of ρ did not attain significance (Fig. 3 and Supplementary Note 1). The relative efficiency ρ was significantly smaller than 1 for 4 of 35 loci (11.4%). For GWAS in ref. 20, the efficiency in the larger sample (N~250 K) was higher than the efficiency in the smaller sample (Stage 1, N~123 K) for 12 of 38 loci (31.6%). For two of them (TMEM18 and FTO) ρ was significantly larger than 1. For 8 of 38 loci (21.1%) ρ was significantly smaller than 1 (Fig. 3 and Supplementary Note 1).
Figure 3 shows that the relative efficiency of the analyses became consistently higher with increasing the sample size in refs 20,21 only for one BMI locus (MC4R).
Figure 4 shows that the relative efficiency of GWAS of lipids in the larger sample16 compared to the smaller sample19 was higher for 18 of 76 loci (23.7%). For three of them (ABO, LDLR, and APOE) ρ was significantly larger than 1. For 13 of 76 loci (17.1%) ρ was significantly smaller than 1.
Potential for improving the efficiency of GWAS
Improving the efficiency of the analyses implies that fewer people are needed to achieve the same result as in the case of conventional (unimproved) efficiency. One potential source is improving the efficiency of the analyses in larger samples (characterized by ξ1) for loci with ρ = ξ1/ξ2 < 1 because ξ1 < ξ2 in this case.
It can be argued, however, that the situation with ρ < 1 can be common in follow up studies with larger samples because of the winner’s curse effect22. The winner’s curse hypothesis was adapted from the auction theory implying that in an auction the winner tends to overpay. In genetic association studies, it may characterize an ascertainment bias due to focusing on upwardly biased effect sizes capable to yield significant associations in the discovery studies. This hypothesis was introduced in pre-GWAS era to explain the lack of replication of genetic effects in the follow up studies which were of substantially smaller sample sizes compared to GWAS with N > 100 K individuals considered in the current paper. These large samples substantially weaken arguments on a pivotal role of the winner’s course effect in the situation with ρ < 1.
More importantly, however, is that in biology any inferences should be considered from the viewpoint of evolution (see the Discussion section “Why and how to improve the traditional GWAS strategy in case of complex traits?”). The role of evolution in the winner’s course effect is, however, unclear, particularly in the selected BMI and lipid GWAS.
The evolutionary theory suggests that genetic predisposition to complex traits should be inherently heterogeneous. Deviation from ρ = 1 is consistent with heterogeneity in genetic effects between two samples (see the Analysis section “Efficiency measure”). Accordingly, the situation with ρ < 1 can be not only due to the winner’s course effect but it can naturally be due to heterogeneity.
To quantify the potential for improving the efficiency in this case one needs to determine cut off for ρ. It is not entirely clear, however, how to do that. One approach is to select the cut off based on significance of ρ < 1 for specific loci (Figs 2). This approach implicitly assumes that the non-significant estimates for loci with ρ < 1 are likely the result of stochastic realization. Given insights from the evolutionary theory, non-significant estimates for loci with ρ < 1 may not necessarily be due to stochasticity but they can also be due to heterogeneity. Then, the less conservative approach would be to select the cut off based on cost/benefit reasoning.
For example, assuming a 90% relative efficiency of GWAS (i.e., ρ = 0.9) implies that the efficiency ξ1 (i.e., the ratio of the log-transformed p-value to the sample size) was larger by 10% in a larger sample than in a smaller one. This means that the efficiency of the analysis in the larger sample is 10% smaller than in the smaller sample. This is equivalent to underusing 10% of the available sample that in large samples with N > 100 K leads to underusing information on more than 10 K people. Therefore, by improving the efficiency ξ1, one can use 10% smaller sample in this case to achieve the same result as in the case of conventional (unimproved) efficiency. Table 1 shows that this improvement would also affect a large proportion of loci which, for 90% efficiency, ranges from 52.6% to 73.7%. Thus, given cost (in terms of investment in 10% increase of the sample size in this case) and benefits (in terms of using non-increased sample but conducting more rigorous analyses) one could decide which strategy would be beneficial in a specific situation.
The situation with ρ = ξ1/ξ2 > 1 indicates that there is also potential for improving the efficiency but in smaller samples characterized by ξ2. The same approaches to determine cut off for ρ as in the case of ρ < 1 (see above) hold here.
Because deviation from ρ = 1 (i.e., either ρ < 1 or ρ > 1) is consistent with heterogeneity, the potential for improvement can likely be in exploring more rigorous approaches to explicate an inherent heterogeneity in genetic associations with complex traits which remains after handling cross-sample heterogeneity using genomic methods used in the referenced GWAS meta-analyses (see the Discussion section “Why and how to improve the traditional GWAS strategy in case of complex traits?”).
Based on statistical tests, deviation from ρ = 1 was significant for: (i) 19 of 38 loci (50%) for the BMI GWAS in refs 21,20 (Fig. 2), (ii) 4 of 35 loci (11.4%) for the BMI GWAS in ref. 21 (Fig. 3, blue color), (iii) 10 of 38 loci (26.3%) for the BMI GWAS in ref. 20 (Fig. 3, green color), and (iv) 16 of 76 loci (21.1%) for GWAS of lipids in refs 16,19 (Fig. 4). According to this approach, significant deviation is observed for 11% to 50% of loci. Based on the cost/benefit approach, deviation from ρ = 1 by 10% (i.e., 1.1 < ρ < 0.9) is observed for 62% to 79% of loci (Table 1). Thus, these results indicate a substantial potential for improvement which may affect 11% (the most conservative estimate) to 79% (the less conservative estimate) of loci in the selected four GWAS.
Discussion
GWAS often relies on “the benefits of the large sample sizes”16 for detecting risk alleles of complex traits. Conversely, it is also argued that the traditional GWAS strategy merely relying on increasing the sample size is problematic because of the inherent heterogeneity of complex traits7. As a result, common GWAS may substantially underuse the available resources. In this paper, we examined the efficiency of GWAS of lipids16,19 and BMI20,21, which are the largest GWAS meta-analyses so far, and highlighted the potential for improving GWAS efficiency.
Next, we emphasize three important results, which support the strong potential for improving the efficiency of GWAS of complex, inherently heterogeneous traits.
First, our analyses show that the estimates of the significance of genetic effects may decrease with increasing sample size, i.e., p-values become larger in larger samples compared to smaller samples. For example, the significance of the BMI estimates was larger for 21% of loci in the larger sample (N~320 K) than in the smaller sample (N~240 K) (Fig. 1). Importantly, the large sample sizes of these “larger” and “smaller” samples offset the problem of stochastic variation in p-values, which is more likely in small samples.
Second, the analyses of samples of larger and smaller sizes showed that the efficiency of GWAS was larger in the larger samples for a small fraction of loci ranging from 7.9% (3 of 38 loci) for the BMI GWAS in refs 21,20 (Fig. 2) to 31.6% of loci (12 of 38) for the BMI GWAS in two samples in ref. 20 (Fig. 3, green color). The benefit of larger samples was supported by statistical significance for 2 loci for the BMI GWAS (Fig. 3, green color) and for 3 loci in GWAS of lipids (Fig. 4).
Third, consistent increase of the relative efficiency was found for only one BMI locus.
These results lead to three important conclusions. First, increasing the sample size of the study population in genetic analyses of complex traits does not necessarily decrease the estimates of the significance of genetic effects (p-values) but can actually increase p-values. Second, our results support the substantial role of heterogeneity in genetic predisposition to complex traits such as BMI and lipids. Importantly, this is an inherent (trait-specific) heterogeneity due to the elusive role of evolution in these traits (see below), which remains after handling cross-study heterogeneity using genomics methods in the referenced GWAS. Third, the results highlight the substantial potential for improving GWAS by explicating this inherent heterogeneity that may affect 11% (the most conservative estimate) to 79% (the less conservative estimate) of loci in the selected four GWAS.
Why and how to improve the traditional GWAS strategy in case of complex traits ? A key argument for increasing sample size in GWAS of traits with moderate and small effect sizes is to have sufficient statistical power to detect genetic effects. This is because low power decreases the likelihood that a significant association actually reflects “a true effect”23. Accordingly, a key hypothesis behind this argument is that “a true” genetic effect on a trait exists. The discipline of biology argues that “nothing in biology makes sense except in the light of evolution”24. Therefore, understanding the role of evolution in complex traits is critical in studies of genetic (i.e., biological) predisposition to these traits.
Evolutionary biology, epidemiology, and aging research argue that: (i) environmental exposures in modern societies are dramatically different than those in the past and (ii) complex traits may not be subject to direct evolutionary selection17,18,25. These factors imply that genetic variants may not have a wide norm of reaction for complex traits24. Furthermore, if the strategic goal is to improve human well-being, healthspan, and lifespan3, GWAS necessarily face a need to deal with a special class of traits, called age-related disease traits, i.e., traits, which are characteristic of the elderly people in modern societies. These are complex polygenic traits. Unlike the other complex traits, age-related disease traits have three important aspects. First, from the evolutionary point of view they are a relatively new massive phenomenon. This is because, for example, in 1840 the world record of mean lifespan for women was about 45 years26 implying that about half of the population did not survive to older ages where incidence of the age-related disease traits sharply increases. Second, they are characteristic of the post-reproductive period where selection pressure is not as strong as at the reproductive period. Third, these traits appear in late life whereas genes are transmitted from parents at conception, i.e., these events are separated by a large portion of the individuals’ life. It can be argued that refocusing from the genetics of age-related diseases to their precursors (often called endophenotypes), which are characteristic for reproductive age, could benefit the analyses. However, the role of evolution in endophenotypes is also elusive because genes regulating endophenotypes have not been directly selected against or in favor of their pathological dysregulation causing age-related diseases.
Sensitivity of genetic effects to the environment due to narrow norm of reaction to complex traits and specific properties of age-related traits weaken the conceptual basis of the hypothesis of “true” genetic effects on complex traits5,17 and strengthen the hypothesis of complex roles of the same genetic variants in the same trait27. The latter implies genetic heterogeneity. As a result, the common GWAS strategy relying mostly on increasing the sample size becomes problematic in case of complex traits.
A key to improve the efficiency of the analyses of genetic predisposition to complex traits without increasing sample size is to better handle heterogeneity. However, heterogeneity is the result of various processes that requires better understanding diversity of its sources. Common sources include: (i) processes associated with evolutionarily selected genetic patterns in populations, (ii) complex etiologies of human phenotypes, (iii) environmental influences, and (iv) age-related heterogeneity attributed to the elusive role of evolution in the development of age-related phenotypes. Clearly, different sources of heterogeneity require different strategies to work with. Common practice in GWAS is to handle heterogeneity attributed to population structures, using, for example, methods of principal component analysis28. These methods may efficiently address the first source of heterogeneity especially in young populations with no substantial survival selection. The second source can be addressed by refining the architecture of complex traits and by examining more homogeneous sub-phenotypes6,7. The third source requires addressing gene by environment interactions. The fourth source requires analyses of age-related changes in an organism over its life course and in populations (which takes into account age, cohort, and survival effects), as well as the analyses of mechanistic pathways from genes to downstream phenotypes through endophenotypes17,29,30.
Evidences of the importance of the role of age-related heterogeneity in genetic associations are accumulating in the field. For example, the analyses highlighted the role of age in genetic regulation of BMI31, sensitivity of the effects of longevity alleles to birth cohorts12,32, sensitivity of genetic associations with lipids to chronological age33,34, and changes in the allele frequencies with age35,36.
Thus, refocusing the analyses to a realistic concept of complex, inherently heterogeneous traits reflecting elusive role of evolution in these traits, has substantial potential for improving the efficiency of GWAS. Parameter of the relative efficiency ρ could help in prioritizing SNPs for more comprehensive analyses. This is crucial not only for a better understanding of the genetic influences on these traits, but also for efficient use of genetic discoveries in health care.
Methods
Selection of SNPs representing loci associated with BMI and lipids
We identified SNPs for 38 loci associated with BMI which were reported in refs 20,21 (Supplementary Table 1). These loci were selected for the analyses.
We also selected SNPs for 76 loci associated with lipids which were reported in refs 19,16 (Supplementary Table 4), as described below. We considered SNPs with the best associations reported in ref. 16 with one of the lipid traits, which include total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and triglycerides.
Analysis
To examine the efficiency of GWAS and its potential for improvement, we used two strategies. First, we compared the significance of the estimates (p-values) in larger and smaller samples. Second, we used a measure of the efficiency as detailed in the subsection below.
We compared the results from the BMI meta-analyses20,21 and the lipid meta-analyses16,19 separately. The focus was on comparative analyses of the results in larger and smaller samples of individuals of European descent.
The results for the selected 38 BMI loci were presented in ref. 20 for the entire sample (N~250 K individuals) and for two subsamples, Stage 1 (N~123 K) and Stage 2 (N~125 K). The results for these loci were presented in ref. 21 for the entire sample (N~320 K) as well as for two subsamples, Stages 1+2 (N~233 K) and Metabochip (N~88 K). We compared the results between larger and smaller samples. For all loci except one (RASA2) the sample size was larger in ref. 21 than in ref. 20. For RASA2, we used the results from ref. 20 to represent the larger sample. Comparative analyses within ref. 20 were focused on the entire sample and Stage 1. All 38 selected loci were used in these analyses (Supplementary Table 2). Comparative analyses within ref. 21 were focused on the entire sample and Metabochip. We used 35 of 38 loci for these analyses because of the lack of estimates for three of them on Metabochip (RASA2, ADCY9, and MTIF3, see Supplementary Table 3).
The results for the lipid loci were presented for the entire samples in each study, i.e., ref. 16 (N~188 K) and ref. 19 (N~100 K). Because for some SNPs the sample size in the analyses in ref. 16 was actually about the same as in ref. 19, we selected only those 76 loci for which sample size in ref. 19 was larger by at least 10,000 individuals than in ref. 16.
Efficiency measure
To better characterize the potential for improvement of GWAS, we derived a measure of the efficiency of GWAS meta-analyses (Supplementary Note 1 and Refs 37, 38, 39). This measure represents the ratio of the log-transformed probability of the effect b to the sample of size N, i.e., ξ = −log10(p)/N. The efficiency measure can be interpreted as the log-transformed p-value per unit observation, i.e., per person in this case. The estimate of the relative efficiency in two samples was defined as ρ = ξ1/ξ2. We used ξ1 for the larger sample and ξ2 for the smaller one. For a given trait-loci association, the relative efficiency ρ shows which sample had larger log-transformed p-value per unit (person). Then, if the efficiency ξ is larger in smaller sample than in larger sample then the analyses in larger sample are less efficient than in smaller sample.
The relative efficiency ρ is also a convenient characteristic of homogeneity/heterogeneity in genetic susceptibility to a given trait (see Supplementary Note 1). The assumption of a homogeneous genetic effect for a given trait implies that ρ = ξ1/ξ2 = 1. Deviation from ρ = 1 is consistent with heterogeneity in genetic effects between two samples.
Additional Information
How to cite this article: Kulminski, A. M. et al. Explicating heterogeneity of complex traits has strong potential for improving GWAS efficiency. Sci. Rep. 6, 35390; doi: 10.1038/srep35390 (2016).
References
Sierra, F., Hadley, E., Suzman, R. & Hodes, R. Prospects for life span extension. Annu Rev Med 60, 457–469, doi: 10.1146/annurev.med.60.061607.220533 (2009).
Olshansky, S. J., Perry, D., Miller, R. A. & Butler, R. N. Pursuing the longevity dividend: scientific goals for an aging world. Ann N Y Acad Sci 1114, 11–13, doi: 10.1196/annals.1396.050 (2007).
Aging, N. I. o. Living Long & Well in the 21st Century: Strategic Directions for Research on Aginghttp://www.nia.nih.gov/sites/default/files/strategic_plan108.pdf (2010) (Date of access: May, 15).
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic acids research 42, D1001–D1006, doi: 10.1093/nar/gkt1229 (2014).
Gibson, G. Rare and common variants: twenty arguments. Nat Rev Genet 13, 135–145, doi: 10.1038/nrg3118 (2011).
Eichler, E. E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11, 446–450, doi: 10.1038/nrg2809 (2010).
MacRae, C. A. & Vasan, R. S. Next-generation genome-wide association studies: time to focus on phenotype? Circ Cardiovasc Genet 4, 334–336, doi: 10.1161/CIRCGENETICS.111.960765 (2011).
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753, doi: 10.1038/nature08494 (2009).
Yashin, A. I. et al. Genetics of aging, health, and survival: dynamic regulation of human longevity related traits. Front Genet 6, 122, doi: 10.3389/fgene.2015.00122 (2015).
Day-Williams, A. G. & Zeggini, E. The effect of next-generation sequencing technology on complex trait research. Eur J Clin Invest 41, 561–567, doi: 10.1111/j.1365-2362.2010.02437.x (2011).
Kidambi, S. et al. Non-replication study of a genome-wide association study for hypertension and blood pressure in African Americans. BMC Med Genet 13, 27, doi: 10.1186/1471-2350-13-27 (2012).
Kulminski, A. M. et al. Age, gender, and cancer but not neurodegenerative and cardiovascular diseases strongly modulate systemic effect of the apolipoprotein e4 allele on lifespan. PLoS Genet 10, e1004141, doi: 10.1371/journal.pgen.1004141 (2014).
Yashin, A. I. et al. How the quality of GWAS of human lifespan and health span can be improved. Front Genet 4, 125, doi: 10.3389/fgene.2013.00125 (2013).
Yashin, A. I. et al. How the effects of aging and stresses of life are integrated in mortality rates: insights for genetic studies of human health and longevity. Biogerontology doi: 10.1007/s10522-015-9594-8 (2015).
Martin, G. M. Epigenetic gambling & epigenetic drift as potential mechanisms underlying the quasi-stochastic distributions of late life neurodegenerative disorders. Molecular Neurodegeneration 7, L20, doi: 10.1186/1750-1326-7-s1-l20 (2012).
Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat Genet 45, 1274–1283, doi: 10.1038/ng.2797 (2013).
Kulminski, A. M. Unraveling genetic origin of aging-related traits: evolving concepts. Rejuvenation Res 16, 304–312, doi: 10.1089/rej.2013.1441 (2013).
Nesse, R. M. & Williams, G. C. Why we get sick: the new science of Darwinian medicine 1st edn (Times Books, 1994).
Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713, doi: 10.1038/nature09270 (2010).
Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42, 937–948, doi: 10.1038/ng.686 (2010).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206, doi: 10.1038/nature14177 (2015).
Lohmueller, K. E., Pearce, C. L., Pike, M., Lander, E. S. & Hirschhorn, J. N. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet 33, 177–182, doi: 10.1038/ng1071 (2003).
Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365–376, doi: 10.1038/nrn3475 (2013).
Dobzhansky, T. Nothing in biology makes sense except in the light of evolution. The American Biology Teacher 35, 125–129, doi: 10.2307/4444260 (1973).
Vijg, J. & Suh, Y. Genetics of longevity and aging. Annu Rev Med 56, 193–212, doi: 10.1146/annurev.med.56.082103.104617 (2005).
Oeppen, J. & Vaupel, J. W. Demography. Broken limits to life expectancy. Science 296, 1029–1031, doi: 10.1126/science.1069675 (2002).
De Benedictis, G. & Franceschi, C. The unusual genetics of human longevity. Science of aging knowledge environment: SAGE KE 2006, pe20, doi: 10.1126/sageke.2006.10.pe20 (2006).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904–909, doi: 10.1038/ng1847 (2006).
Yashin, A. I. et al. Genetic Structures of Population Cohorts Change with Increasing Age: Implications for Genetic Analyses of Human aging and Life Span. Ann Gerontol Geriatr Res 1 (2014).
Schork, N. J. Personalized medicine: Time for one-person trials. Nature 520, 609–611, doi: 10.1038/520609a (2015).
Graff, M. et al. Genome-wide analysis of BMI in adolescents and young adults reveals additional insight into the effects of genetic loci over the life course. Hum Mol Genet 22, 3597–3607, doi: 10.1093/hmg/ddt205 (2013).
Nygaard, M. et al. Birth cohort differences in the prevalence of longevity-associated variants in APOE and FOXO3A in Danish long-lived individuals. Exp Gerontol 57, 41–46, doi: 10.1016/j.exger.2014.04.018 (2014).
Kulminski, A. M. et al. The role of lipid-related genes, aging-related processes, and environment in healthspan. Aging Cell 12, 237–246, doi: 10.1111/acel.12046 (2013).
Jarvik, G. P. et al. Genetic influences on age-related change in total cholesterol, low density lipoprotein-cholesterol, and triglyceride levels: longitudinal apolipoprotein E genotype effects. Genet Epidemiol 11, 375–384, doi: 10.1002/gepi.1370110407 (1994).
Atzmon, G. et al. Lipoprotein genotype and conserved pathway for exceptional longevity in humans. PLoS Biol 4, e113 (2006).
Yashin, A. I. et al. Genes, demography, and life span: the contribution of demographic data in genetic studies on aging and longevity. Am J Hum Genet 65, 1178–1193 (1999).
Rao, C. R. Linear statistical inference and its applications (John Wiley & Sons, Inc., 1965).
Chiani, M., Dardari, D. & Simon, M. K. New exponential bounds and approximations for the computation of error probability in fading channels. Ieee T Wirel Commun 2, 840–845, doi: 10.1109/Twc.2003.814350 (2003).
Kendall, M. G. & Stuart, A. The advanced theory of statistics: 3 vol. (Charles Griffin, 1968).
Acknowledgements
The research reported in this paper was supported by Grants No 1P01 AG043352-01, 1R01 AG047310 and 1R01 AG046860 from the National Institute on Aging. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank the Editorial Board Member for the very constructive comments, which resulted in substantial improvement of this paper.
Author information
Authors and Affiliations
Contributions
A.M.K. conceived and designed the experiment and wrote the paper, Y.L. and I.C. conducted the experiment, I.C. and K.G.A. checked the data integrity, A.M.K. and E.S. contributed into the development of methodology, S.V.U., E.S. and A.I.Y. discussed the results and drafted the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Electronic supplementary material
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Kulminski, A., Loika, Y., Culminskaya, I. et al. Explicating heterogeneity of complex traits has strong potential for improving GWAS efficiency. Sci Rep 6, 35390 (2016). https://doi.org/10.1038/srep35390
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/srep35390
- Springer Nature Limited
This article is cited by
-
Inclusion of endophenotypes in a standard GWAS facilitate a detailed mechanistic understanding of genetic elements that control blood lipid levels
Scientific Reports (2020)
-
The impact of disregarding family structure on genome-wide association analysis of complex diseases in cohorts with simple pedigrees
Journal of Applied Genetics (2020)
-
Genome-wide analysis of genetic predisposition to Alzheimer’s disease and related sex disparities
Alzheimer's Research & Therapy (2019)
-
Genomics of disease risk in globally diverse populations
Nature Reviews Genetics (2019)