Background

In countries with a high sociodemographic index, the major contributors to the burden of disease during childhood and adolescence are non-communicable diseases such as obesity, asthma or allergies, and psychiatric disorders. These have a large cumulative impact on individuals, families and society [1]. Moreover, many early-life traits track throughout childhood and adolescence into adulthood. Childhood obesity, for example, is associated with adult obesity and cardiovascular disease [2]. Several childhood psychiatric disorders persist into adolescence and adulthood or precede severe mental illness such as schizophrenia, which usually starts in late adolescence or early adulthood [3, 4]. Low birth weight, as a proxy for a suboptimal intrauterine environment, has been shown to be robustly associated with many later-life non-communicable traits, including cardiovascular, respiratory and psychiatric disorders (see e.g., [5,6,7]). This prompted researchers, including those within the Developmental Origins of Health and Disease (DOHaD) field, to investigate the basis for the early origins of later-life differences in health and disease.

Insight into the etiology of childhood and adolescent traits and disorders may provide new perspectives, not only on how to improve wellbeing during childhood, but also on how to prevent later adverse outcomes. Individual differences in developmental phenotypes, such as body weight and composition, behavioral problems and language skills, and their stability across ages, are partly influenced by genetic factors [8,9,10,11,12,13]. Identifying the specific genetic variants that influence these traits, and the biological pathways through which they operate, can therefore help to unravel etiological mechanisms. Genetic studies can also help to determine whether the relationships between childhood and adult traits, for example, birth weight and cardiovascular disease, are causally mediated by early life exposures. In addition, genetic data can help to clarify how specific environmental factors contribute to variation in these traits, for example whether there is gene-environment interaction, in which the increase in risk associated with an exposure depends on an individual’s genetic risk.

It is increasingly recognized that large sample sizes are essential in genetic research [14] and studies performed in large international consortia have become the norm. Two such consortia with a particular focus on the genetics of early life phenotypes are the Early Growth Genetics (EGG) consortium (http://egg-consortium.org/) and the EArly Genetics and Lifecourse Epidemiology (EAGLE) consortium (http://www.wikigenes.org/e/art/e/348.html) (Fig. 1). This paper describes these two consortia as they have shared objectives and the participating cohorts partly overlap. We also highlight the results so far and outline the directions of future research.

Fig. 1 Logos of the EGG and EAGLE consortia

Description and aims of the EGG and EAGLE consortia

Both consortia arose in 2009 out of the EU-funded European Network for Genetic And Genomic Epidemiology (ENGAGE). The EGG consortium focuses on the genetic basis of growth-related phenotypes spanning from fetal life into adolescence, including birth weight, childhood obesity and pubertal development. EAGLE was established to investigate the genetic basis of the wide range of further phenotypes collected by these cohorts from fetal life into adolescence, such as those relevant to asthma and eczema, childhood psychopathology, cognition, and neurodevelopment. The collective objectives of EGG and EAGLE are:

  1. to characterize the genetic background of traits and diseases in fetal life, childhood and adolescence by facilitating collaboration between pregnancy, birth, childhood and adolescent cohort studies, as well as adult biobanks (such as UK Biobank) with relevant information;

  2. to define the causal relationships between early life exposures and related early life phenotypes and major sources of morbidity and mortality in later life;

  3. to develop and improve statistical methods for analyzing complex, high-dimensional and longitudinal phenotypic data;

  4. to provide training opportunities for junior researchers to develop in the field of genetic epidemiology.

The EGG and EAGLE consortia started as collaborations of population-based pregnancy and birth cohort studies, each of which has collected longitudinal data across a wide range of developmental phenotypes. As the collaboration developed, cohorts that started data collection during childhood and adolescence were also included. Almost all participating studies have genome-wide genotype data available. In addition, early life data collected through self-report and/or record linkage in adult biobanks, such as UK Biobank or the population-based cohorts listed in Table 1 that have an adult counterpart, have been brought into the genome-wide association (GWA) meta-analyses for phenotypes such as birth weight. Both consortia welcome new collaborations, and they are keen to add data from longitudinal cohorts that are currently in the process of obtaining genotype data.

Table 1 Participating cohorts

Tables 1 and 2 provide a summary of the participating studies and their design, as of April 2018. Table 3 gives further details on the extensive data available, indicating, per cohort, whether data collection has taken place at least once at preschool, school, adolescent and adult age. However, many cohorts have had multiple follow-up rounds within any given period, or follow-up data collection is ongoing, through research clinic assessments, questionnaires or record linkage. The majority of the cohorts include approximately equal numbers of males and females.

Table 2 Study designs
Table 3 Data collected

Most cohorts were established with the aim of investigating risk and protective factors for a broad range of developmental phenotypes. They have collected data on physical traits, cognition, emotional and behavioral problems, as well as on lifestyle and environmental factors, such as smoking during pregnancy and physical exercise. Other cohorts were set up with a specific focus, such as asthma research, but many of these have collected ancillary information on a wider range of phenotypes. Table 2 gives an indication as to whether data collection was focused on a specific phenotype. Additional details on many of these studies will be available from cohort websites and publications (see Table 1).

Participating cohorts have obtained DNA from blood samples, saliva or buccal swabs. A variety of different genotyping arrays have been used over the years, but meta-analysis has been facilitated by imputation of directly genotyped data using reference panels such as those generated by 1000 Genomes or the Haplotype Reference Consortium [15, 16]. Moreover, an increasing number of cohorts have, or plan to get, additional ‘omics data including parental genotypes, DNA methylation profiles, RNA expression levels, metabolomics and/or microbiome data.

Results of the genetic studies performed in the EGG and EAGLE consortia

The implementation of GWA meta-analyses for each of the phenotypes of interest to EGG or EAGLE has usually been championed and organized at the level of a working group, formed by a subset of motivated investigators and analysts, who have assumed responsibility for assembling, combining and interpreting the genetic data. The wide range of phenotypes available to study across these consortia has provided fertile ground for many such working groups and has resulted in a large number of peer-reviewed papers [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45]. These are typically GWA meta-analyses, focusing on the effects of individual genetic variants, but they increasingly extend to multivariate, polygenic analyses that evaluate the joint effects of multiple associated genetic variants and apply this information to address questions of causality.
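To illustrate how per-cohort results for a single variant can be combined, the following is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis. It is not the pipeline used by the EGG or EAGLE working groups; the function name `ivw_meta` and the example effect sizes are hypothetical, and a fixed-effect model (no between-cohort heterogeneity) is assumed.

```python
import math
from statistics import NormalDist


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-cohort effect estimates (same allele and same units in every cohort)
    ses:   per-cohort standard errors of those estimates
    Returns the pooled effect, its standard error, the z-score and a two-sided p value.
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p value from the standard normal distribution.
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p


# Hypothetical per-cohort estimates for one variant (illustrative values only).
pooled_beta, pooled_se, z_score, p_value = ivw_meta([0.10, 0.20], [0.1, 0.1])
```

Because the pooled standard error shrinks as cohorts are added, the sketch also shows why aggregating cohorts increases power for variants with small effects.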

Amongst the many GWA analyses led by EGG and EAGLE, the traits for which the largest numbers of genetic loci reached genome-wide statistical significance (p < 10⁻⁸) have been birth weight (65 loci), atopic dermatitis (31), childhood BMI (15), allergic sensitization (10), and pubertal growth (10) [17, 19, 23, 26, 28, 36]. For other phenotypes with a large number of genome-wide hits, such as age at menarche (108 loci) or ADHD (16 loci), the association analysis has involved collaborations with other consortia [25, 37]. The summary statistics for many of the genome-wide association studies undertaken by EGG and EAGLE investigators can be found on consortium websites (http://egg-consortium.org/; http://www.wikigenes.org/e/art/e/348.html) or are available from corresponding authors.

As with adult phenotype GWA studies, the number of association signals recovered by these studies is influenced heavily by sample size (N = 182,416 for age at menarche, N = 153,781 for birth weight) and, to a lesser extent, by phenotype characteristics (somatic or behavioral traits, continuous or binary outcomes).

In addition to cross-sectional GWA analyses, there have been many examples of projects that have investigated genetic relationships within childhood traits or between childhood traits and related adult phenotypes, often revealing shared genetic factors. For example, genetic overlap was found among related atopic conditions during childhood, and between atopic conditions and auto-immune disorders [19, 36]; among puberty-related phenotypes, and between puberty-related phenotypes and BMI [23, 24, 37]; between childhood and adult blood pressure [41]; between preschool internalizing symptoms and adult psychiatric disorders [18]; and between childhood and adult anthropometric traits [21, 26, 40, 44]. The development of statistical methods that support the calculation of genetic correlations from summary GWAS results [46] and the easy availability of such data from a growing number of GWA meta-analyses for adult traits have enabled these analyses to be undertaken with adequate statistical power.

Figure 2 shows genetic correlations, calculated exclusively from GWAS data, between birth weight and a range of continuous and disease phenotypes [28], generated using the linkage disequilibrium score regression approach [46] as implemented in the LDHub web utility [47]. For many cardiometabolic and anthropometric traits measured in late adult life, there is evidence of substantial sharing of genetic variation with birth weight. In line with the wider epidemiological data, the genetic correlations between birth weight and adult cardiometabolic traits (including type 2 diabetes, blood pressure, and coronary artery disease) tend to be negative. These data indicate that a substantial proportion of the observed covariance between birth weight and cardiometabolic disease predisposition is likely to be driven by genetic rather than environmental factors. However, the potential for more complex causal relationships (such as those that connect fetal genotype to adult disease via the correlation with maternal genotype and altered maternal environment) also needs to be considered. Full characterization of these complex relationships requires the application of statistical methods that enable partitioning of genetic effects into maternal and fetal components both at the level of individual SNPs [48] and genome-wide [49]. Using the M-GCTA method [49], for example, it has been reported that maternal genotypes contribute more to gestational weight gain in the mother, while offspring genotypes contribute more to birth weight [45].
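For readers unfamiliar with the approach, the relationship that cross-trait LD score regression exploits can be written as follows. This is a summary of the published method as we understand it, not a description of the specific LDHub implementation; the symbols are introduced here for illustration. For SNP j, z_{1j} and z_{2j} are the association z-scores for the two traits, N_1 and N_2 the sample sizes, N_s the number of overlapping samples, ρ the phenotypic correlation among overlapping samples, M the number of SNPs, ℓ_j the LD score, ρ_g the genetic covariance, and h_1^2 and h_2^2 the SNP heritabilities.

```latex
\mathbb{E}\left[ z_{1j}\, z_{2j} \right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j
  + \frac{\rho\, N_s}{\sqrt{N_1 N_2}},
\qquad
r_g = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```

The slope of the regression of z-score products on LD scores estimates the genetic covariance, while sample overlap affects only the intercept. A negative r_g, as reported for birth weight and type 2 diabetes, means that alleles associated with higher birth weight tend to be associated with lower disease risk.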

Fig. 2 Genome-wide genetic correlations between birth weight and a range of traits and diseases in later life. The figure (adapted from Horikoshi et al. 2016 [28] with permission of the authors) displays the genetic correlations between birth weight and a range of traits and diseases in later life as estimated using LD Score regression. Traits selected were those for which genome-wide association summary statistics were available in suitably large sample sizes, and the analyses were typically performed on the largest meta-analyses available as of early 2016. The genetic correlation estimates (rg) are colour coded according to phenotypic area. Allelic direction of effect is aligned to increased birth weight. Size (diameter) of the circle denotes the significance level (p value) of the correlation (per the key). Correlations with a lower significance level are not depicted. Further detail on the methods and studies involved is available in Horikoshi et al. 2016 [28].

Another critical advantage of genetic studies is the potential to characterize causal relationships using Mendelian randomization approaches [50]. Tyrrell et al. [42] found evidence of a positive causal effect of maternal BMI and fasting glucose levels on offspring birth weight, but an inverse effect of maternal systolic blood pressure on offspring birth weight. Despite bringing together the largest number of studies with relevant data at the time, there was insufficient power to dissect how the opposing effects of maternal glucose and systolic blood pressure are reflected in the maternal BMI effect (one reason why we are keen to extend the collaboration to any new cohorts). Crucially, however, appropriate application and interpretation of studies that seek to elucidate the mechanisms underlying associations between maternal and offspring phenotypes require investigators to consider diverse complicating factors, including the correlation between maternal and fetal genetic instruments, and to account for these sources of potential bias in the Mendelian randomization analyses wherever possible [51].
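The basic logic of a Mendelian randomization estimate can be sketched with the Wald ratio, which divides the instrument-outcome association by the instrument-exposure association. This is a generic illustration under simplifying assumptions, not the analysis of Tyrrell et al. [42]: the function name `wald_ratio` and the numbers are hypothetical, the two association estimates are assumed to be independent, and no adjustment is made for the correlation between maternal and fetal genotypes discussed above.

```python
import math


def wald_ratio(beta_gx, se_gx, beta_gy, se_gy):
    """Wald ratio estimate of the causal effect of exposure X on outcome Y.

    beta_gx, se_gx: association of the genetic instrument with the exposure
    beta_gy, se_gy: association of the same instrument with the outcome
    The standard error uses a first-order delta-method approximation that
    ignores any covariance between the two association estimates.
    """
    estimate = beta_gy / beta_gx
    variance = (se_gy ** 2) / beta_gx ** 2 + (beta_gy ** 2 * se_gx ** 2) / beta_gx ** 4
    return estimate, math.sqrt(variance)


# Hypothetical summary statistics (illustrative values only):
# instrument -> maternal exposure, and instrument -> offspring outcome.
effect, effect_se = wald_ratio(beta_gx=0.5, se_gx=0.05, beta_gy=0.1, se_gy=0.02)
```

In a maternal-offspring setting, the instrument-outcome association would ideally be estimated conditional on fetal genotype, since an unadjusted estimate mixes maternal and fetal genetic effects.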

The longitudinal data collected in EGG and EAGLE cohorts provide the means to investigate whether the influence of genetic variants changes over time. This has only recently been explored, given the need for large numbers of studies with repeated measures. We have found that genetic variation in FTO, one of the first BMI-increasing genetic variants to be identified in GWAS and one of the variants most strongly associated with mean BMI in adults, is inversely associated with BMI in infancy, with the association only becoming positive in later childhood and adulthood [38]. This indicates the value of research that explores gene-by-age interactions. On a genome-wide scale, using meta-regression methods, polygenic risk scores generated from adult schizophrenia data yielded associations with variation in childhood and adolescent psychiatric symptom scores, which strengthened in magnitude with increasing age [52].
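One simple way to test whether a genetic effect changes with age is a fixed-effect meta-regression of age-specific effect estimates on age, weighting each estimate by its inverse variance. The sketch below is illustrative only and is not the method used in the cited studies; the function name `meta_regression_on_age` and the example values are hypothetical, and a linear trend with no residual heterogeneity is assumed.

```python
import math


def meta_regression_on_age(ages, betas, ses):
    """Inverse-variance-weighted linear regression of effect estimates on age.

    ages:  mean age at which each effect estimate was obtained
    betas: effect estimates of the variant (or polygenic score) at each age
    ses:   standard errors of those estimates
    Returns (intercept, slope, se_slope); a slope different from zero
    suggests a gene-by-age interaction under the linearity assumption.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    mean_age = sum(w * a for w, a in zip(weights, ages)) / total_weight
    mean_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    sxx = sum(w * (a - mean_age) ** 2 for w, a in zip(weights, ages))
    sxy = sum(w * (a - mean_age) * (b - mean_beta)
              for w, a, b in zip(weights, ages, betas))
    slope = sxy / sxx
    intercept = mean_beta - slope * mean_age
    # With known sampling variances, the slope variance is 1 / sxx.
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope


# Hypothetical effect estimates that change sign with age (illustrative only).
intercept, slope, se_slope = meta_regression_on_age(
    ages=[1.0, 2.0, 3.0], betas=[-0.1, 0.0, 0.1], ses=[0.1, 0.1, 0.1]
)
```

A pattern like the hypothetical one here, with a negative effect at the youngest age and a positive effect later, is the kind of sign change described for FTO, although real analyses would typically allow for non-linear trajectories.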

Strengths and weaknesses

The aggregation of data in consortia such as EGG and EAGLE provides vastly improved sample sizes and a powerful way to overcome the major weakness of many of the early GWAS, which were, in hindsight, underpowered to detect the generally small effects of associated variants at genome-wide significance. This has brought multiple robust association signals across many traits, and provided a valuable basis for dissecting the often complex causal relationships between epidemiologically correlated traits.

A clear strength of the EGG and EAGLE consortia is the wealth of data available. This encompasses not only repeated measures for physical and behavioral traits, but also copious information on lifestyle and environmental circumstances. Moreover, some of the cohorts have collected data for several decades, and now provide repeated measures well into adulthood. This enables developmental research as well as analyses of the interplay between genes and environment.

To date, one of the limitations has been that the majority of participating cohorts have data based on European-ancestry populations (see Table 2 for exceptions). There is a clear need for equivalent data to be generated in samples from other ethnic groups, so that the genetic contribution to reproducible ethnic differences in the distribution of early life phenotypes can be explored and the implications for adult disease risk quantified.

Since the cohorts are population-based and lack a particular disease focus, the consortia are less well suited to investigating conditions with a low prevalence. They are better placed to analyze common traits, particularly those that can be measured on continuous scales and analyzed as quantitative measures, such as blood pressure instead of hypertension and ADHD symptom score instead of ADHD diagnosis [32, 34]. Power analyses demonstrate that identification of a genetic variant is, in most circumstances, more powerful for continuous traits than for dichotomous variables based on clinical cut-offs [53].
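The loss of power from dichotomizing a continuous trait can be approximated analytically. The sketch below is not the power analysis of [53]; it assumes a normally distributed trait in an unselected population sample, a variant with a small effect explaining a fraction `r2` of trait variance, and a 1-df test whose non-centrality parameter is approximately N × r2. The function names, the default `alpha` of 5e-8 (a commonly used genome-wide threshold) and the example values are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp, alpha=5e-8):
    """Approximate power of a two-sided 1-df test with non-centrality ncp."""
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)
    root = math.sqrt(ncp)
    return _STD_NORMAL.cdf(root - z_crit) + _STD_NORMAL.cdf(-root - z_crit)


def dichotomization_factor(prevalence):
    """Fraction of the variance explained that is retained when a normal trait
    is split at the threshold giving the stated upper-tail prevalence.

    Uses the small-effect approximation phi(t)^2 / (K * (1 - K)), where t is
    the threshold and K the prevalence.
    """
    threshold = _STD_NORMAL.inv_cdf(1.0 - prevalence)
    density = _STD_NORMAL.pdf(threshold)
    return density ** 2 / (prevalence * (1.0 - prevalence))


# Hypothetical scenario: 20,000 children, variant explaining 0.1% of variance.
n, r2 = 20_000, 0.001
power_continuous = power_1df(n * r2)
power_binary = power_1df(n * r2 * dichotomization_factor(0.05))
```

Under these assumptions even a median split retains only 2/π (about 64%) of the information, and clinical cut-offs that define a small affected group retain less, so the test on the continuous trait has the higher power.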

Future

Considerable progress is to be expected from ongoing increases in sample sizes, especially for traits such as childhood aggression, ADHD-related traits and internalizing symptoms, where the number of identified genetic variants has been limited so far. Access to new data sets can motivate efforts to tackle phenotypes that have not hitherto been subject to detailed genetic analysis.

The results emerging from many of these studies provide a timely reminder that analysis of early life phenotypes often requires researchers to consider the joint impacts of multiple genomes (e.g., those of the fetus and the mother) together with the web of environmental influences as potential contributors to individual variation. They also highlight the need to take into account the changes happening throughout development. This is now possible because of large, rich and complex datasets that support the use of novel statistical methods for the analysis of causality or gene-by-age interaction [48, 49, 51, 54]. Several papers have already performed such analyses, and their number will only increase as more genetic variants are identified. In addition, existing gender differences in the associations between early life and adult factors (such as cardiometabolic risk) suggest a need for more thorough analysis of the effects of gender on these early acting mechanisms. The focus to date on maternal and offspring GWAS information reflects a failure to properly consider the contribution of genetic variation in the father, which will be remedied as more data from complete trios and pedigrees become available.

We are also planning to expand these consortia to accommodate the increasing amount of ‘omics data now becoming available. Combining the results from EGG and EAGLE GWA analyses with those from DNA methylation analyses performed by the Pregnancy And Childhood Epigenetics (PACE) consortium [55] and with the pregnancy/child cohorts in the COnsortium of METabolomic Studies (COMETS; https://epi.grants.cancer.gov/comtets/) will shed further light on the biological mechanisms underlying associations between early-life risk factors and childhood, adolescent and adult health outcomes.

Translating this knowledge to clinical and public health settings represents a major motivation. Insight into genetic factors underlying stability in traits such as obesity and psychiatric disorders may aid in providing targeted interventions to the groups in greatest need. A more complete understanding of the contributions of genetic and non-genetic factors to the relationships between early life and later life traits may focus attention on the most effective strategies for behavioural or environmental modification.