Abstract
Polygenic scores (PGS) are now commonly available in longitudinal cohort studies, leading to their integration into epidemiological research. In this work, our aim is to explore how polygenic scores can be used as exposures in causal inference-based methods, specifically mediation analyses. We propose to estimate the extent to which the association of a polygenic score indexing genetic liability to an outcome could be mitigated by a potential intervention on a mediator. To do this, we use the interventional disparity measure approach, which allows us to compare the adjusted total effect of an exposure on an outcome with the association that would remain had we intervened on a potentially modifiable mediator. As an example, we analyse data from two UK cohorts, the Millennium Cohort Study (MCS, N = 2575) and the Avon Longitudinal Study of Parents and Children (ALSPAC, N = 3347). In both, the exposure is genetic liability for obesity (indicated by a PGS for BMI), the outcome is late childhood/early adolescent BMI, and the mediator and potential intervention target is physical activity, measured between exposure and outcome. Our results suggest that a potential intervention on child physical activity can mitigate some of the genetic liability for childhood obesity. We propose that including PGSs in a health disparity measure approach, and causal inference-based methods more broadly, is a valuable addition to the study of gene-environment interplay in complex health outcomes.
Introduction
Leveraging observational data to estimate causal links between exposures and health outcomes, when randomised controlled trials are not possible, has been one of the main aims in epidemiology. To that end, methods have been developed to derive causal inferences from observational data, aiming to handle bias introduced by confounding and reverse causation by mimicking features of randomised controlled trials. This is done using tools such as instrumental variables, propensity score matching or reweighting [1]. These methods assume that the cause of interest can be intervened on and results in a specific and tightly defined causal effect on the outcome [1]. However, there are many instances where it is not possible to define a potential intervention, as exposures of interest are non-manipulable and have non-specific, wide-reaching consequences (e.g. ethnicity, sex or socio-economic position). In this situation, it has been proposed to shift attention away from the causal effect of the exposure and instead focus on how much disparity due to a risk factor would remain if a potential intervention targeted a downstream mechanism [2, 3]. This health interventional disparity approach, the idea of estimating changes in health outcomes due to an intervention on a manipulable and specific mediator, has been applied to socially determined characteristics in epidemiology [4, 5]. In recent years, genetic data of research participants have become increasingly available in large epidemiological studies. In this paper, our aim is to apply the health disparity interventional approach to differences in health outcomes attributed to differences in genetic risk. We hope that this framework might offer new opportunities for researchers in genetic epidemiology and causal inference, grounding research questions within public health, by asking researchers to hypothesise potential interventions to mitigate disparity associated with genetic risk.
Genetically informed designs to study causation
Researchers have exploited genetically informed research designs to study causation, for example by analysing datasets with related individuals such as twins [6]. These research designs, which include prior knowledge of the participants’ genetic relatedness, allow researchers to capture some of the genetic and environmental unmeasured confounding, which can strengthen causal inferences. Using twin and sibling designs, researchers have studied the causal relationship between exposures and health outcomes such as socio-economic position and depression [7], bullying victimisation and self-harm [8], as well as smoking and lung cancer [9]. Building on family-based studies, technological advances have allowed the mass collection of DNA from participants, and genotypes are now commonly available in large-scale population cohorts. This enables researchers to construct genetic liability indicators for common health outcomes and has led to the integration of genetic data into causal inference within epidemiological research [10]. Well-known examples are the numerous applications of Mendelian Randomisation, which uses genetic variants as instruments to estimate causal effects [11, 12]. Further, genetic data have been integrated into longitudinal studies of families and their children, allowing a more robust study of intergenerational genetic and environmental effects [13, 14].
Genes, polygenic scores, and counterfactuals
Fundamental to this research is the assumption that genomic variants cause variations in phenotypes. This central dogma of biology describes a one-way stream of information, whereby variations in DNA result in differences in RNA, which in turn are responsible for the synthesis of proteins [15, 16], essential to biological functioning and development. Historically, this central dogma developed at the same time as the emergence of computer science. Their synergies popularised commonly used terms such as the “genetic code” and DNA as the “blueprint” of life, “transcribing” and “translating information” [17, 18]. In the light of the current scientific consensus, different meanings of genes, or genetic effects, need to be considered when aiming to conceptualise genetic information as exposures in causal inference. As described in detail by Lynch (2021), genetic research can be crudely categorised into questions regarding monogenic or polygenic traits. The former considers that changes in a single gene cause changes in the product of the coded protein. In contrast, the latter refers to the fact that most common phenotypes are influenced by many genetic markers (polygenicity), which in turn influence many different outcomes (pleiotropy) [19]. These two theoretical settings need to be carefully considered when using genetics to study causal mechanisms over the life course. Within the stricter definitions of causality, an exposure of interest needs to be intervenable and result in a specific causal effect [1]. In the context of monogenic traits, and with the advent of gene editing technology such as CRISPR, interventions on monogenic traits could be deemed acceptable [20]. However, for polygenic traits this might be harder to argue, as these often relate to common health outcomes, which are influenced by thousands of genomic variants with pleiotropic effects [21].
This pleiotropy obscures the causal chain between a polygenic score (PGS), which summarises the additive effects of multiple variants into a single score, and the outcome, leading to a broad rather than specific effect. Lastly, PGS might not be considered feasible potential intervention targets. Embryo selection based on polygenic scores for common illnesses is currently not plausible and poses difficult ethical questions, as outcomes of such interventions will have unknown, potentially adverse and wide-ranging effects [22, 23]. Regardless, PGS have been described as promising tools in precision medicine and are commonly used in epidemiological research [24], but at the same time have been criticised for their lack of theoretical and empirical evaluation [25, 26].
Health disparity measure approach
In summary, monogenic traits may be conceptualised as causes within the stricter definitions of causal inference; however, this might not apply to polygenic traits. Therefore, in this paper, our goal is to borrow the idea of health disparity interventional effects and apply this framework to PGS, building on the previous work of our group [27]. Health disparity measures can be best understood in the context of mediation analysis, where the focus is to identify whether some of the total effect between exposure (X) and outcome (Y) works via an intermediate mediator (M) (see Fig. 1). The total causal effect is defined by imagining the potential outcome of a hypothetical intervention on the exposure (X), comparing the outcome if the participant was exposed (e.g. X = 1) versus the potential outcome if the participant was not exposed (X = 0). This principle is then extended to causal inference mediation models, where the hypothetical intervention targets both the exposure and the mediator [28, 29]. When hypothetical interventions on the exposure are not justifiable, interventional disparity measures can be useful to express how much of the association between exposure and outcome would remain if we intervened on the distribution of the mediator. Here, we describe how this approach can be applied to PGS and then follow with examples using data from the Avon Longitudinal Study of Parents and Children and the Millennium Cohort Study, both from the United Kingdom.
Methods
The health disparity measure approach focuses on two estimands: the Interventional Disparity Measure—Direct Effect (IDM-DE) and the Adjusted Total Association (Adj-TA). As outlined by Micali et al. (2018), the Interventional Disparity Measure—Direct Effect (IDM-DE) is the disparity in the outcome (Y) associated with the exposure that remains if we were to intervene on the intermediate mediator (M), by shifting the distribution of M to the distribution that M would have had under no exposure (X = 0) [29].
In the situation of a binary exposure, and continuous mediator and outcome, we can specify \({M}_{C}^{0}\) as a random draw from the distribution of M conditional on the confounder C when X is set to take the reference value 0. Let Y(m) be the potential outcome when the mediator M is set to take the value m, in this case taking the randomly drawn value \({M}_{C}^{0}\). The IDM-DE is defined as:
where C is here assumed to be categorical. For general definitions see Daniel and De Stavola (2019) [30]. In addition to the IDM-DE, we also aim to estimate the association between exposure and outcome, without intervening on the mediator M. This Adj-TA is defined as:
The difference between Adj-TA and IDM-DE gives some indication of the potential change in disparity due to the hypothetical intervention.
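Written out explicitly, the two estimands for a binary exposure could take the following form. This is a sketch in the usual interventional disparity notation, assuming a categorical confounder C and standardisation over its marginal distribution P(C = c); the weighting and the exact notation are assumptions and are not taken from the original display equations.

\[
\text{IDM-DE} = \sum_{c} \left\{ E\left[ Y\left({M}_{C}^{0}\right) \mid X = 1, C = c \right] - E\left[ Y\left({M}_{C}^{0}\right) \mid X = 0, C = c \right] \right\} P(C = c)
\]

\[
\text{Adj-TA} = \sum_{c} \left\{ E\left[ Y \mid X = 1, C = c \right] - E\left[ Y \mid X = 0, C = c \right] \right\} P(C = c)
\]

Under this reading, and provided that drawing the mediator from its own X = 0 distribution leaves the mean outcome among the unexposed unchanged, the two expressions differ only in the first term, so that Adj-TA minus IDM-DE compares the mean outcome of the exposed under their observed mediator distribution with their mean outcome under the reference mediator distribution.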
These general definitions apply to situations with a binary exposure. However, genetic liability is commonly expressed as a continuous PGS. In this scenario, one option would be splitting the distribution of the exposure into multiple equal sized groups, representing the participants ranging from low to high genetic liability, for example, sample size permitting, quintiles (1 = lowest risk, 2 = lower risk, 3 = average risk, 4 = high risk, 5 = highest risk; indexed by j). In this case, the definitions of the IDM-DE and Adj-TA need to be adapted, depending on the choice of reference category. Let \({M}_{C}^{1}\) be a random draw from the distribution of M conditional on the confounder C when X is set to take the reference value 1. The disparity measures of interest are then defined as, for j = 2,3,4,5,
The same applies for the Adj-TA, at each level of the exposure in reference to the lowest liability reference (j = 1). This is defined as, for j = 2,3,4,5
Estimation of the interventional disparity measures calls upon three assumptions: no interference, consistency, and no unmeasured confounding of mediator-outcome associations [25].
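For the quintile-based exposure, the estimands described above could be written as follows for j = 2, 3, 4, 5, with the lowest liability group (X = 1) as reference. As before, this is a sketch that assumes a categorical confounder C and standardisation over P(C = c); the subscript j on the estimand names is introduced here for illustration only.

\[
\text{IDM-DE}_{j} = \sum_{c} \left\{ E\left[ Y\left({M}_{C}^{1}\right) \mid X = j, C = c \right] - E\left[ Y\left({M}_{C}^{1}\right) \mid X = 1, C = c \right] \right\} P(C = c)
\]

\[
\text{Adj-TA}_{j} = \sum_{c} \left\{ E\left[ Y \mid X = j, C = c \right] - E\left[ Y \mid X = 1, C = c \right] \right\} P(C = c)
\]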
Motivating example
We aim to study the extent to which a potential intervention on physical activity could mitigate genetic liability for obesity in childhood. Childhood obesity remains one of the main health concerns globally [31]. Children with overweight or obesity have been found to show higher risk of overweight and obesity in adulthood, which is associated with other health outcomes, such as cancer, depression, and asthma [32,33,34]. Further, individuals with overweight and obesity face bullying and stigmatisation from their peers and health professionals, contributing to the health burden [35, 36]. Individual differences in body size have been studied extensively, and twin [37] and genome-wide association studies have provided evidence for a substantial genomic contribution, indicating that hundreds if not thousands of genetic variants are associated with BMI [38]. In addition, rapid changes in the food environment (larger portion sizes, availability of high fat foods) as well as life-style changes (sedentary work and leisure) have been identified as risk factors [39]. Here we imagine an intervention on a downstream behavioural factor, physical activity, which might mitigate some of the genetic liability. Physical activity has been targeted in randomised intervention trials for childhood obesity [40, 41], and genomic studies have suggested that a higher genetic liability for obesity is associated with lower physical activity [42].
In the following, we will estimate the IDM-DE and the Adj-TA to understand the extent to which an intervention on physical activity in childhood can mitigate the association between genetic liability measured by a PGS and later BMI. Data are from the Avon Longitudinal Study of Parents and Children (ALSPAC) [43] and the Millennium Cohort Study (MCS) [44]. Full details of the samples and measurements can be found in Supplement Text and Supplement Table 1. In short, in both cohorts PGS for BMI were calculated using the conditional shrinkage method developed by Ge et al. [45], from summary statistics of the Genetic Investigation of Anthropometric Traits (GIANT) consortium [38], using the automated analysis pipeline GenoPredPipe [46]. The PGS was then categorised using cohort-specific quintiles. Physical activity was measured using accelerometers when the children were 8 (MCS) or 11 years (ALSPAC), indicating the average minutes of moderate to vigorous physical activity (MVPA) over the course of one week. BMI measures were obtained during research clinic visits at age 11 years (MCS) and 14 years (ALSPAC). Included confounders were maternal education, maternal BMI prior to pregnancy, and child sex using parental report. The hypothesised associations are outlined in Fig. 2a (MCS) and Fig. 2b (ALSPAC). The analysis sample sizes were 2575 and 3347 for MCS and ALSPAC respectively and included complete cases only, followed by sensitivity analyses using imputation.
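The categorisation of a continuous PGS into cohort-specific quintiles can be illustrated with a minimal Python sketch. The function name `pgs_quintiles` and the simulated scores are hypothetical and only serve the illustration; the original analyses were run in Stata and their exact cut-point rules are not reproduced here.

```python
import numpy as np

def pgs_quintiles(pgs):
    """Assign each score to a quintile 1..5 (1 = lowest liability),
    using cut points taken from the sample itself (cohort-specific)."""
    pgs = np.asarray(pgs, dtype=float)
    cuts = np.quantile(pgs, [0.2, 0.4, 0.6, 0.8])  # sample-specific cut points
    # digitize returns 0..4; shift to 1..5 so that 1 is the reference group
    return np.digitize(pgs, cuts, right=True) + 1

# Hypothetical standardised scores for 1000 simulated participants
rng = np.random.default_rng(1)
simulated_pgs = rng.normal(size=1000)
quintile = pgs_quintiles(simulated_pgs)
group_sizes = np.bincount(quintile, minlength=6)[1:]  # counts for groups 1..5
```

Because the cut points are computed within each sample, a given raw score may fall into different quintiles in different cohorts, which is the sense in which the grouping is cohort-specific.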
Estimation
Analyses, consisting of a series of regressions for the mediators and outcomes, were conducted in Stata version 16, with estimation of the IDM-DE and the Adj-TA carried out by plug-in parametric estimation and Monte Carlo simulation on a 1000-fold expanded dataset, with 1000 bootstrap samples to derive confidence intervals. Regression models included non-linear terms and interactions between confounders and mediators to allow for general parametric specifications and thus avoid unnecessarily restrictive assumptions (e.g. with respect to linearity of associations).
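The logic of plug-in parametric estimation with Monte Carlo simulation can be sketched in Python on simulated data. This is not the Stata code used for the analyses: the helper names `fit_ols` and `idm_estimates`, the binary exposure and confounder, the linear models with an exposure-mediator interaction, and the 200-fold expansion are all simplifying assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients and residual SD for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma = np.sqrt(resid @ resid / (len(y) - X.shape[1]))
    return beta, sigma

def idm_estimates(x, c, m, y, n_draws=200, seed=0):
    """Plug-in Monte Carlo estimates of (Adj-TA, IDM-DE) for a binary
    exposure x, binary confounder c, continuous mediator m and outcome y."""
    rng = np.random.default_rng(seed)
    ones = np.ones_like(y)
    # Mediator model: M ~ X + C (assumed linear with normal errors)
    bm, sm = fit_ols(np.column_stack([ones, x, c]), m)
    # Outcome model: Y ~ X + M + X*M + C (assumed linear)
    by, _ = fit_ols(np.column_stack([ones, x, m, x * m, c]), y)
    # Expand the observed confounder values n_draws-fold
    c_rep = np.tile(c, n_draws)

    def draw_m(x_val):
        # Random draws of M given C with the exposure set to x_val
        mean = bm[0] + bm[1] * x_val + bm[2] * c_rep
        return mean + rng.normal(0.0, sm, size=c_rep.size)

    def mean_y(x_val, m_vals):
        # Mean predicted outcome, standardised over the distribution of C
        return np.mean(by[0] + by[1] * x_val + by[2] * m_vals
                       + by[3] * x_val * m_vals + by[4] * c_rep)

    m0, m1 = draw_m(0.0), draw_m(1.0)
    adj_ta = mean_y(1.0, m1) - mean_y(0.0, m0)  # mediator left as observed
    idm_de = mean_y(1.0, m0) - mean_y(0.0, m0)  # mediator shifted to X = 0
    return adj_ta, idm_de

# Simulated data with a known structure (all values hypothetical)
rng = np.random.default_rng(42)
n = 5000
c = rng.binomial(1, 0.5, n).astype(float)
x = rng.binomial(1, 0.3 + 0.2 * c).astype(float)
m = 60 - 5 * x + 3 * c + rng.normal(0, 10, n)
y = 18 + 2.0 * x - 0.05 * m + 0.5 * c + rng.normal(0, 2, n)
adj_ta, idm_de = idm_estimates(x, c, m, y)
```

In this simulation the data-generating values imply an Adj-TA of about 2.25 and an IDM-DE of about 2.0, so the estimated difference should be close to 0.25. Confidence intervals would be obtained by repeating the whole procedure on bootstrap samples.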
Sensitivity analyses
In both studies, data on exposure, confounders, and mediators were affected by missingness. For this reason, the Monte Carlo estimation procedure described above was repeated after implementation of a single stochastic imputation of the missing values, using chained equations (with 10 burn-in iterations) assuming missingness was at random (given the observed data). The imputation models included all variables that contributed to the analytical models allowing for non-linearities and interactions. Standard errors were again estimated via bootstrap (with the imputation step redone on each bootstrap sample), avoiding the need for multiple imputations.
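The idea of redoing a single stochastic imputation within each bootstrap sample can be sketched as follows. The example is deliberately reduced to one incomplete variable imputed from one complete variable, which is a simplification of imputation by chained equations; the function names `impute_once` and `bootstrap_ci`, the percentile interval, the 200 replicates, and the simulated data are assumptions for illustration.

```python
import numpy as np

def impute_once(x, m, rng):
    """Single stochastic regression imputation of missing m from x."""
    obs = ~np.isnan(m)
    X = np.column_stack([np.ones(obs.sum()), x[obs]])
    beta, *_ = np.linalg.lstsq(X, m[obs], rcond=None)
    sigma = (m[obs] - X @ beta).std(ddof=2)
    m_imp = m.copy()
    miss = ~obs
    # Predicted value plus a random residual draw for each missing entry
    m_imp[miss] = beta[0] + beta[1] * x[miss] + rng.normal(0.0, sigma, miss.sum())
    return m_imp

def bootstrap_ci(x, m, estimator, n_boot=200, seed=0):
    """Percentile 95% interval, with imputation redone in every resample."""
    rng = np.random.default_rng(seed)
    n = len(x)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        xb, mb = x[idx], m[idx]
        stats[b] = estimator(xb, impute_once(xb, mb, rng))
    return np.percentile(stats, [2.5, 97.5])

def slope(x, m):
    """Slope of the regression of m on x."""
    return np.polyfit(x, m, 1)[0]

# Simulated data: true slope of -5, 30% of mediator values set to missing
rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
m = 60 - 5 * x + rng.normal(0, 10, n)
m[rng.random(n) < 0.3] = np.nan
lo, hi = bootstrap_ci(x, m, slope)
```

Because the imputation step sits inside the bootstrap loop, the spread of the bootstrap estimates reflects both sampling variability and the uncertainty added by imputing.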
To examine the impact of unmeasured mediator-outcome confounders, we used an approach first suggested by Imai et al. [47] and then expanded in De Stavola et al. [48], which consists of estimating the minimal size of the correlation induced by a confounder that, if controlled for, would remove the impact of the mediator. This correlation is reported for each study, with bootstrapped 95% confidence intervals.
Analyses were pre-registered and the code for all the analyses is available, see https://osf.io/9hbmu/.
Results
Figure 3 shows scatter plots of childhood BMI (the outcome) and physical activity (the mediator) against PGS-BMI (the exposure), with points colour-coded to reflect PGS quintiles. The lines show the fitted regression lines. These figures give a visualisation of the unadjusted associations between exposure, mediator, and outcome. The PGS-BMI is positively associated with BMI, and negatively associated with physical activity. BMI and physical activity are negatively associated. For more detailed information, means of MVPA and BMI in each PGS-BMI quintile and their correlations are listed in Supplement Tables 2 and 3.
The hypothetical physical activity intervention envisaged here shifts the distribution of physical activity to that experienced by those in the lowest genetic quintile (conditional on confounders; coloured in light green in Fig. 3). The resulting IDM-DEs and Adj-TAs are shown in Fig. 4. Estimates and 95% confidence intervals are listed in Supplementary Table 4a and b. Overall, the hypothetical interventions to shift the four top strata defined by categorical PGS-BMI to mirror the distribution in the lowest PGS-BMI category have a small impact on the total association between PGS and later BMI. This is indicated by the small or absent differences between the IDM-DEs and the Adj-TAs (see Fig. 4). The biggest differences are found for the highest PGS quintile (dark blue colour in Fig. 3), whereby the change in physical activity to what it would have been under the lowest PGS quintile was estimated to remove 0.33 kg/m2 (95%CI 0.21, 0.44) in BMI at 11 years in MCS (Adj-TA = 2.69, 95%CI 2.40, 2.98; IDM-DE = 2.36, 95%CI 2.08, 2.64). In ALSPAC, the results followed a similar pattern, whereby the potential intervention was associated with a difference of 0.44 kg/m2 (95%CI 0.32, 0.56) in BMI at 14 years (Adj-TA = 3.34, 95%CI 3.09, 3.59; IDM-DE = 2.90, 95%CI 2.66, 3.14). For the other quintiles of PGS, the impact of their respective interventions was smaller in both MCS and ALSPAC. Note that the interventions applied to each PGS-BMI quintile are of different magnitude because the shift in physical activity decreases from the highest to the first category (see Supplementary Table 3).
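As a quick arithmetic check, the reported differences for the highest PGS quintile follow directly from the point estimates quoted above. The short Python sketch below recomputes them and also expresses each difference as a share of the Adj-TA; this share is a derived quantity added here for illustration and is not a result reported in the text.

```python
# Point estimates for the highest PGS quintile: (Adj-TA, IDM-DE) in kg/m2
estimates = {"MCS": (2.69, 2.36), "ALSPAC": (3.34, 2.90)}

summary = {}
for cohort, (adj_ta, idm_de) in estimates.items():
    diff = adj_ta - idm_de            # disparity removed by the intervention
    share = 100 * diff / adj_ta       # as a percentage of the Adj-TA
    summary[cohort] = (round(diff, 2), round(share, 1))
    print(f"{cohort}: difference = {diff:.2f} kg/m2, {share:.1f}% of Adj-TA")
```

The differences come out as 0.33 kg/m2 in MCS and 0.44 kg/m2 in ALSPAC, matching the text, which corresponds to roughly 12% and 13% of the respective Adj-TA.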
Sensitivity analyses
Estimates of the differences between the Adj-TA and the IDM-DE obtained from the imputed data (based on N = 6172 in MCS and N = 6035 in ALSPAC, respectively) were slightly larger than those obtained from the complete records only (MCS, N = 2575 and ALSPAC, N = 3347). For comparison, the estimated differences in MCS were 0.06, 0.10, 0.21 and 0.33 when using the complete records and 0.08, 0.10, 0.21 and 0.34 when using the imputed data. In ALSPAC, the estimated differences were 0.02, 0.18, 0.30 and 0.44 from the complete records and 0.11, 0.28, 0.35 and 0.65 from the imputed data.
The separate estimates of the Adj-TA and the IDM-DE were also slightly larger when using the imputed data. This might reflect the selection in participation by socio-economic position which is known to affect ALSPAC in particular [49]. A full list of the estimates from the imputed data can be found in Supplementary Table 4c and d.
Examining the potential consequences of unmeasured confounding between mediator and outcome in both studies, we found that the impact of intervening on the mediator would be null if there were an additional confounder to those included in the analyses that induced a correlation between physical activity and BMI (above that induced by the measured confounders) of −0.11 (95%CI −0.13, −0.07) in ALSPAC and −0.03 (95%CI −0.07, 0.01) in MCS. These estimates reflect the weak and nearly null results found in the two studies.
Discussion
In this paper, we aim to demonstrate how polygenic scores within the interventional disparity approach could be used in the context of causal inference analysis. This approach is proposed as a novel additional tool for genetic epidemiological researchers with an interest in public health. Building on some previous work [27], these results suggest that a hypothetical intervention increasing physical activity has the potential to buffer a small proportion of the disparity in BMI associated with the BMI PGS. This small impact needs to be considered in the context of randomised controlled trials (RCT) of interventions for childhood obesity. A meta-analysis of 14 RCTs indicated that physical activity interventions result on average in a BMI reduction of 0.10 kg/m2, and that the most effective interventions combined physical activity and dietary components [41]. Further, it is important to emphasise that our hypothetical intervention does not estimate the effect of changing physical activity on later BMI, but the extent to which intervening on physical activity can mitigate the association between genetic liability and BMI. Our analyses do not aim to find the intervention that is associated with the biggest decrease in BMI, or to increase physical activity to its most beneficial level. Instead, we aim to investigate the extent to which potential interventions that improve the distribution of physical activity of individuals with high genetic liability would remove part of their liability. We hope to have demonstrated how the health disparity measure interventional approach can be applied to estimate the potential impact of hypothetical interventions to reduce the disparity associated with genetic risk.
Results from analyses of our two datasets produced only small differences between the adjusted total effect and the interventional direct effect, but it should be noted that the shift of distribution considered in our calculations is driven by the strength of association between the PGS and the mediator. Other settings involving different PGS, mediators, and outcomes might lead to greater shifts and hence greater disparity reductions. One additional benefit of the health disparity approach is that it asks researchers to specify a clear potential intervention target, grounding research in real life and pushing us to think through the implications and feasibility of the hypothetical intervention.
Health inequalities and genetics
Health inequalities are commonly understood as differences in health outcomes due to determinants that are outside of the individual’s control, which could be remedied by policy intervention. For example, there have been established observations that socio-economic position at birth can result in longstanding negative health outcomes [50]. Childhood obesity rates are highest in families with lowest income and education levels [51], and policy interventions have aimed to close this gap, targeting individual behaviours (e.g. healthy foods in schools programme) as well as structural (e.g. taxation on high energy dense foods) components with limited success [52]. More recently there has been a call to broaden health inequality exposures and to consider the question of whether genetic propensity for a health outcome can be considered as a cause for health inequalities. This might be seen as intuitive, as genetic factors are associated with later health outcomes and are also outside of an individual’s control [53]. However, a debate is still ongoing if these genetic differences in the population should be included in the study of social determinants of health [54]. Targeting individuals based on their genetic liabilities has been argued to be a direct continuation of the horrific eugenic practices of the last century [55], and there is a need for an ethical and theoretical framework on how to regulate the already existing embryo screening technology which uses PGS [23].
Assumptions and considerations
Most causal inference methods lean on the three major assumptions of no interference, consistency, and no unmeasured confounding. Interference would be present if the intervention target for one participant would impact the outcome in another participant. For example, behaviours of one participant might influence another, if the two participants are in the same class in school, or maybe members of the same extended family. In our analysis, this situation might be considered as highly unlikely, as the participants of cohort studies are commonly recruited from a large region. Further, it might be recommended to only include one participant per family, excluding siblings and cousins, which is what we have done here. It should be noted that checks for genetic relatedness based on observed genomic data are common practice in the quality control when analysing genetic data. The consistency assumption implies that the distributional intervention is “non-invasive” meaning that the outcomes for the participants would not have differed had they been observed or intervened to have that mediator distribution [56, 57].
One additional limitation is the weak association between the exposure and the mediator, found in both studies. The size of these associations clearly binds the scale of change due to the hypothetical interventions and demonstrates the challenges of studying intervenable pathways from genomic liability to a later outcome. Further applications of this framework should select mediators with stronger associations with the polygenic score of interest.
Lastly, the no unmeasured confounding assumption of the mediator-outcome association cannot be formally tested. However, we included baseline covariates, and they might capture at least some of the confounding. As illustrated in Fig. 2a and b, there is potential for unmeasured intermediate confounding of the mediator to outcome associations affecting our analyses, for example via dietary factors or environmental factors such as urban environment, for which we do not have reliable information. This may have led to negative confounding and hence overestimation of the interventional effects, in the presence for example of a negative association of urban environment with physical activity and a positive association with BMI. Our sensitivity analyses showed that in ALSPAC such a correlation would need to be at least −0.1. Including participants with complete data on exposure, mediator, outcome, and all covariates resulted in reduced sample sizes, as well as the potential introduction of selection bias, as participants with complete records are those who contributed data to all relevant collection waves and these individuals may differ from the rest of the original cohort members. However, imputation, under the missing at random (MAR) assumption, did not lead to substantially different results.
Additionally, researchers aiming to apply this framework need to be aware of the pitfalls that underlie the construction and interpretation of PGS. PGS aggregate effect sizes associated with single nucleotide polymorphisms, but do not (yet) include other types of genomic variation, such as rare variants, deletions, and copy number variations. Hence, PGS only capture a proportion of the variance in the outcome. This incomplete measure of genetic liability has been shown to have consequences for mediation analyses, specifically leading to an exaggeration of the indirect effect, from genetic liability to outcome, via a mediator [58]. A further limitation in this area is that, even though sample sizes have grown rapidly, the majority of available summary statistics are based on participants of white European descent, and these cannot be readily transported to global and diverse population cohorts [59].
As highlighted above, effective public health interventions for multifactorial diseases such as obesity are most likely to target a complex combination of social and behavioural changes, e.g. diet, physical activity and parenting behaviours. However, our aim is to demonstrate how to quantify how much of the genetic effect of the BMI PGS would remain if its effect on physical activity were "equalised". Here we consider a single possible area of intervention, physical activity, to examine interventional effects, but methods for multiple mediators are available [28] and future work should aim to explore these in the context of genetically informed studies.
Conclusions
We have provided an example of how the health disparity framework might be implemented in longitudinal cohorts with genetic, behavioural, and anthropometric data. This approach lends itself to many non-communicable health outcomes that have some genetic aetiology but whose interventions often require changing behavioural or environmental targets. We see this approach as an addition to methods applied to gene-environment interplay, one that is grounded by the formal requirement of specifying a plausible intervention target, linking research directly to questions of public health relevance.
References
Hernan MA. A definition of causal effect for epidemiological research. J Epidemiol Community Health. 2004;58(4):265–71.
VanderWeele TJ, Robinson WR. On the causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology. 2014;25(4):473–84.
Naimi AI, et al. Mediation analysis for health disparities research. Am J Epidemiol. 2016;184(4):315–24.
Jackson JW, VanderWeele TJ. Intersectional decomposition analysis with differential exposure, effects, and construct. Soc Sci Med. 2019;226:254–9.
Diderichsen F, Hallqvist J, Whitehead M. Differential vulnerability and susceptibility: how to make use of recent development in our understanding of mediation and interaction to tackle health inequalities. Int J Epidemiol. 2019;48(1):268–74.
McAdams TA, et al. Twins and causal inference: leveraging nature’s experiment. Cold Spring Harb Perspect Med. 2021;11(6):a039552.
Mezuk B, Myers JM, Kendler KS. Integrating social science and behavioral genetics: testing the origin of socioeconomic disparities in depression using a genetically informed design. Am J Public Health. 2013;103(Suppl 1):S145–51.
O’Reilly LM, et al. A co-twin control study of the association between bullying victimization and self-harm and suicide attempt in adolescence. J Adolesc Health. 2021;69(2):272–9.
Hjelmborg J, et al. Lung cancer, genetic predisposition and smoking: the Nordic Twin Study of Cancer. Thorax. 2017;72(11):1021–7.
Pingault JB, et al. Using genetic data to strengthen causal inference in observational research. Nat Rev Genet. 2018;19(9):566–80.
Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23(R1):R89–98.
Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 2020;12(1):44.
Torvik FA, et al. Mechanisms linking parental educational attainment with child ADHD, depression, and academic problems: a study of extended families in the Norwegian mother, father and child cohort study. J Child Psychol Psychiatry. 2020;61(9):1009–18.
Kong A, et al. The nature of nurture: effects of parental genotypes. Science. 2018;359(6374):424–8.
Cobb M. 60 Years ago, Francis Crick changed the logic of biology. PLoS Biol. 2017;15(9):e2003243.
Crick F. Central dogma of molecular biology. Nature. 1970;227(5258):561–3.
Neumann-Held EM. Genes-causes-codes deciphering the DNA’s ontological privilege. In: Neumann-Held EM, Rehmann-Sutter C, editors. Genes in development: re-reading the molecular paradigm. Durham: Duke University Press; 2006.
Sarkar S. Decoding “coding”—information and DNA. Bioscience. 1996;46(11):857–64.
Lynch KE. The Meaning of “Cause” in Genetics. Cold Spring Harb Perspect Med. 2021;11(9):a040519.
Chakrabarti AM, et al. Target-specific precision of CRISPR-mediated genome editing. Mol Cell. 2019;73(4):699–713.
Wray NR, et al. Common disease is more complex than implied by the core gene omnigenic model. Cell. 2018;173(7):1573–80.
Turley P, et al. Problems with using polygenic scores to select embryos. N Engl J Med. 2021;385(1):78–86.
Munday S, Savulescu J. Three models for the regulation of polygenic scores in reproduction. J Med Ethics. 2021;47:91.
Wray NR, et al. From basic science to clinical application of polygenic risk scores: a primer. JAMA Psychiat. 2021;78(1):101–9.
Janssens A. Validity of polygenic risk scores: are we measuring what we think we are? Hum Mol Genet. 2019;28(R2):R143–50.
Wald NJ, Old R. The illusion of polygenic disease risk prediction. Genet Med. 2019;21(8):1705–7.
Herle M, et al. Parental feeding and childhood genetic risk for obesity: exploring hypothetical interventions with causal inference methods. Int J Obes (Lond). 2022.
Vansteelandt S, Daniel RM. Interventional effects for mediation analysis with multiple mediators. Epidemiology. 2017;28(2):258–65.
Micali N, et al. Maternal prepregnancy weight status and adolescent eating disorder behaviors: a longitudinal study of risk pathways. Epidemiology. 2018;29(4):579–89.
Daniel RM, De Stavola BL. Mediation analysis for life course studies. Pathways to health. Dordrecht: Springer; 2019.
Di Cesare M, et al. The epidemiological burden of obesity in childhood: a worldwide epidemic requiring urgent action. BMC Med. 2019;17(1):212.
Quek YH, et al. Exploring the association between childhood and adolescent obesity and depression: a meta-analysis. Obes Rev. 2017;18(7):742–54.
Reilly JJ, Kelly J. Long-term impact of overweight and obesity in childhood and adolescence on morbidity and premature mortality in adulthood: systematic review. Int J Obes (Lond). 2011;35(7):891–8.
Friedemann C, et al. Cardiovascular disease risk in healthy children and its association with body mass index: systematic review and meta-analysis. BMJ. 2012;345:e4759.
Spahlholz J, et al. Obesity and discrimination—a systematic review and meta-analysis of observational studies. Obes Rev. 2016;17(1):43–55.
Schoeler T, et al. Multi-polygenic score approach to identifying individual vulnerabilities associated with the risk of exposure to bullying. JAMA Psychiat. 2019;76(7):730–8.
Silventoinen K, et al. The genetic and environmental influences on childhood obesity: a systematic review of twin and adoption studies. Int J Obes (Lond). 2010;34(1):29–40.
Yengo L, et al. Meta-analysis of genome-wide association studies for height and body mass index in approximately 700000 individuals of European ancestry. Hum Mol Genet. 2018;27(20):3641–9.
Swinburn BA, et al. The global obesity pandemic: shaped by global drivers and local environments. Lancet. 2011;378(9793):804–14.
Liu Z, et al. A systematic review and meta-analysis of the overall effects of school-based obesity prevention interventions and effect differences by intervention components. Int J Behav Nutr Phys Act. 2019;16(1):95.
Brown T, et al. Interventions for preventing obesity in children. Cochrane Database Syst Rev. 2019;7:CD001871.
Hubel C, et al. Genomics of body fat percentage may contribute to sex bias in anorexia nervosa. Am J Med Genet B Neuropsychiatr Genet. 2019;180(6):428–38.
Fraser A, et al. Cohort profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int J Epidemiol. 2013;42(1):97–110.
Connelly R, Platt L. Cohort profile: UK Millennium Cohort Study (MCS). Int J Epidemiol. 2014;43(6):1719–25.
Ge T, et al. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun. 2019;10(1):1776.
Pain O, et al. Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genet. 2021;17(5):e1009021.
Imai K, Keele L, Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Stat Sci. 2010;25(1):51–71.
De Stavola BL, et al. Mediation analysis with intermediate confounding: structural equation modeling viewed through the causal inference lens. Am J Epidemiol. 2015;181(1):64–80.
Munafo MR, et al. Collider scope: when selection bias can substantially influence observed associations. Int J Epidemiol. 2018;47(1):226–35.
Marmot M, et al. WHO European review of social determinants of health and the health divide. Lancet. 2012;380(9846):1011–29.
Ahrens W, et al. Prevalence of overweight and obesity in European children below the age of 10. Int J Obes (Lond). 2014;38(Suppl 2):S99–107.
Olstad DL, et al. Can policy ameliorate socioeconomic inequities in obesity and obesity-related behaviours? A systematic review of the impact of universal policies on adults and children. Obes Rev. 2016;17(12):1198–217.
Harden KP, Koellinger PD. Using genetics for social science. Nat Hum Behav. 2020;4(6):567–76.
Bann D. The scope of health injustice. Eur J Public Health. 2021;31(3):458–9.
Sear R. Demography and the rise, apparent fall, and resurgence of eugenics. Popul Stud (Camb). 2021;75(sup1):201–20.
Hernan MA, VanderWeele TJ. Compound treatments and transportability of causal inference. Epidemiology. 2011;22(3):368–77.
VanderWeele TJ, Hernan MA. Causal inference under multiple versions of treatment. J Causal Inference. 2013;1(1):1–20.
Pingault JB, et al. Research review: how to interpret associations between polygenic scores, environmental risks, and phenotypes. J Child Psychol Psychiatry. 2022;63:1125–39.
Mills MC, Rahal C. A scientometric review of genome-wide association studies. Commun Biol. 2019;2:9.
Evenson KR, et al. Calibration of two objective measures of physical activity for children. J Sports Sci. 2008;26(14):1557–65.
Acknowledgements
We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC and MCS teams, which include interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses. We are grateful to the Centre for Longitudinal Studies (CLS), UCL Social Research Institute, for the use of these data and to the UK Data Service for making them available. However, neither CLS nor the UK Data Service bears any responsibility for the analysis or interpretation of these data.
Funding
This research was supported by a fellowship from the Medical Research Council UK (MR/T027843/1) awarded to M.H. JBP is supported by the Medical Research Foundation 2018 Emerging Leaders 1st Prize in Adolescent Mental Health (MRF‐160‐0002‐ELP‐PINGA). The UK Medical Research Council and Wellcome (Grant Ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. A comprehensive list of grant funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). GWAS data were generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. A.P. is partially supported by the National Institute for Health Research (NF-SI-0617-10120) and the Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the UK NHS, NIHR, or the Department of Health and Social Care. The authors acknowledge use of the research computing facility at King’s College London, Rosalind (https://rosalind.kcl.ac.uk), which is delivered in partnership with the National Institute for Health Research (NIHR) Maudsley Biomedical Research Centres at South London & Maudsley and Guy’s & St. Thomas’ NHS Foundation Trusts, and part-funded by capital equipment grants from the Maudsley Charity (award 980) and Guy’s & St. Thomas’ Charity (TR130505). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, King’s College London, or the Department of Health and Social Care.
Author information
Contributions
MH, AP and BDS designed the study; MH, OP and BDS analysed the data; MH wrote the first draft of the manuscript. All authors read and revised the final manuscript.
Ethics declarations
Conflict of interest
The authors have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Herle, M., Pickles, A., Pain, O. et al. Could interventions on physical activity mitigate genomic liability for obesity? Applying the health disparity framework in genetically informed studies. Eur J Epidemiol 38, 403–412 (2023). https://doi.org/10.1007/s10654-023-00980-y