The LifeCycle Project-EU Child Cohort Network: a federated analysis infrastructure and harmonized data of more than 250,000 children and parents

Early life is an important window of opportunity to improve health across the full lifecycle. An accumulating body of evidence suggests that exposure to adverse stressors during early life leads to developmental adaptations, which subsequently affect disease risk in later life. Also, geographical, socio-economic, and ethnic differences are related to health inequalities from early life onwards. To address these important public health challenges, many European pregnancy and childhood cohorts have been established over the last 30 years. The enormous wealth of data of these cohorts has led to important new biological insights and has had an important impact on health from early life onwards. The impact of these cohorts and their data could be further increased by combining data from different cohorts. Combining data makes it possible to identify smaller effect estimates and to better identify risk groups and risk factors leading to disease across the lifecycle and across countries. It also enables research on better causal understanding and modelling of life course health trajectories. The EU Child Cohort Network, established by the Horizon 2020-funded LifeCycle Project, brings together nineteen pregnancy and childhood cohorts, together including more than 250,000 children and their parents. A large set of variables has been harmonized and standardized across these cohorts. The harmonized data are kept within each institution and can be accessed by external researchers through a shared federated data analysis platform using the R-based platform DataSHIELD, which takes relevant national and international data regulations into account. The EU Child Cohort Network has an open character. All protocols for data harmonization and setting up the data analysis platform are available online. The EU Child Cohort Network creates great opportunities for researchers to use data from different cohorts, during and beyond the LifeCycle Project duration. It also provides a novel model for collaborative research in large research infrastructures with individual-level data. The LifeCycle Project will translate results from research using the EU Child Cohort Network into recommendations for targeted prevention strategies to improve health trajectories for current and future generations by optimizing their earliest phases of life.

Electronic supplementary material: The online version of this article (10.1007/s10654-020-00662-z) contains supplementary material, which is available to authorized users.

Geographical, socio-economic, and ethnic differences are related to health inequalities from early life onwards [1]. These research findings suggest that optimizing early-life conditions has the as yet unfulfilled potential to improve life course health trajectories for individuals themselves and also for their offspring through transgenerational effects [2]. A better understanding of the causality, pathways and life course health trajectories explaining associations of early-life stressors with later life disease is urgently needed to translate results from observational studies into population-health prevention strategies.
Many European pregnancy and childhood cohorts have been established over the last 30 years to assess the associations of early life with health across the lifecycle [3]. These cohorts are invaluable resources for gaining insight into societal, environmental, lifestyle and nutrition-related determinants that may influence the onset and evolution of risk factors and diseases in later life. Cohort studies that started during pregnancy or early childhood provide a unique opportunity to study the potential for early-life interventions on factors that cannot be easily studied in experimental settings, such as socio-economic, migration, urban environment and lifestyle-related determinants. Data from cohort studies can also be used for advanced analytical approaches such as sibling analyses and Mendelian randomization to assess causality of observed associations [4].
The impact of these cohorts and their data could be strongly increased by combining data from different cohorts. Combining data will lead to larger numbers and the opportunity to better identify risk groups and risk factors leading to disease across the lifecycle [3]. It also enables research for a better causal understanding and modelling of life course health trajectories. The enormous wealth of high-quality prospective cohort studies enables collaboration at the level of individual participant data. Meta-analyzing individual participant data has the advantage that it can identify smaller effect estimates, specific subgroups, and mediator effects and, perhaps most importantly, capitalizes on existing published and unpublished data. Results from well-performed individual participant data meta-analyses suffer less from publication bias than meta-analyses based on published data. Multiple individual participant data meta-analyses on environmental exposures, lifestyle-related and (epi)genetic associations have already been published as part of birth cohort collaborations [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22].
The LifeCycle Project is a Horizon 2020-funded (2017-2022) international project. The general objective of the LifeCycle Project is to bring together pregnancy and childhood cohort studies into a new, open and sustainable EU Child Cohort Network, to use this network for identification of novel markers of early-life stressors affecting health trajectories throughout the life course, and to translate findings into policy recommendations for targeted prevention strategies. The overall concepts, design and future perspectives are described in this paper. The logos of the LifeCycle Project are given in Fig. 1.

The EU Child Cohort Network
The EU Child Cohort Network, the main deliverable of the LifeCycle Project, brings together nineteen pregnancy and childhood cohorts. Together, they include more than 250,000 children and their parents (Fig. 2; Table 1). Recruitment to the cohorts of the EU Child Cohort Network began prior to and during pregnancy, as well as in childhood; together, the follow-up of these cohorts spans the full life course, and the cohorts contain detailed phenotypic information and biological samples. The research potential of the EU Child Cohort Network is summarized in Table 2. The EU Child Cohort Network should be operational by mid-2020. This network is open to other partners with population-based cohorts that started in early life and will be sustainable after the duration of the Horizon 2020-funded LifeCycle Project. The EU Child Cohort Network could contribute to future collaborations between different cohorts.

Data harmonisation
The LifeCycle Project has developed a harmonized set of variables in each cohort necessary to perform multi-cohort analyses on different research questions. The harmonization work is performed by a data-harmonization group with representatives from each partner or cohort. Based on the primary research focus in the LifeCycle Project, a priority list of variables has been developed for harmonisation. The cohort studies participating in the EU Child Cohort Network will be further enriched with novel harmonized integrated data on early-life stressors related to socio-economic, migration, urban environment and lifestyle determinants, based on data availability within the cohorts and external data from registries [36]. Integrated data will also be used to construct a novel holistic 'dynamic early-life exposome' model, which will encompass many human environmental exposures during various stages of early life [37][38][39][40]. The harmonized variables relate to the main research hypotheses (Fig. 3), and include:
• Main exposures: Socioeconomic, migration, urban environment, lifestyle and nutrition related factors, genome-wide association screen;
• Main mediators: Epigenetics, metabolomics, allergy, brain development;
• Main outcomes: Cardio-metabolic (body mass index (BMI), body composition, blood pressure, cardiac structure and function, lipids, insulin, glucose); respiratory (allergy, wheezing, infections, lung function, asthma); mental (behaviour, cognition, education, ASD, ADHD, anxiety, depression).
The availability of these data in different cohorts is given in Table 1.

Federated data analysis approach
Analyses in the EU Child Cohort Network will predominantly use DataSHIELD, developed as part of the EU-FP7 BioSHaRe Project [23,25]. This is a safe and robust data analysis platform for performing joint multisite individual participant data meta-analyses without physically transferring data (Fig. 4). DataSHIELD enables connections between local servers to analyze harmonized data located at different institutes. The major advantage of this approach is that the data from the different institutes, which together form the EU Child Cohort Network, are accessible to researchers from various sites whilst remaining at the local sites.
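The idea of analysing data that never leave the local sites can be illustrated with a minimal sketch. The Python code below is not DataSHIELD (which is R-based) and does not reproduce its API; it only shows the general principle under simple assumptions: each site returns aggregate sums for a simple linear regression, and a central analyst combines those aggregates into the same slope and intercept that a pooled analysis would give. All names (`SiteSummary`, `summarise_site`, `pooled_slope_intercept`) and the minimum record count are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple

# Hypothetical disclosure threshold: a site refuses to answer for tiny samples.
MIN_RECORDS = 5


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates a site returns instead of individual-level records."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarise_site(x: Sequence[float], y: Sequence[float]) -> SiteSummary:
    """Runs locally at a site; only the returned sums leave the site."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < MIN_RECORDS:
        raise ValueError("too few records to return a non-disclosive summary")
    return SiteSummary(
        n=len(x),
        sum_x=sum(x),
        sum_y=sum(y),
        sum_xx=sum(v * v for v in x),
        sum_xy=sum(a * b for a, b in zip(x, y)),
    )


def pooled_slope_intercept(summaries: Iterable[SiteSummary]) -> Tuple[float, float]:
    """Runs centrally; combines site aggregates into least-squares estimates."""
    parts = list(summaries)
    n = sum(s.n for s in parts)
    sx = sum(s.sum_x for s in parts)
    sy = sum(s.sum_y for s in parts)
    sxx = sum(s.sum_xx for s in parts)
    sxy = sum(s.sum_xy for s in parts)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("exposure has no variation across the pooled data")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    # Made-up data for two sites, both following y = 2x + 1.
    site_a = summarise_site([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
    site_b = summarise_site([6, 7, 8, 9, 10], [13, 15, 17, 19, 21])
    print(pooled_slope_intercept([site_a, site_b]))  # prints (2.0, 1.0)
```

Because the least-squares estimates depend on the data only through these sums, the combined result equals what would be obtained if the records were physically pooled, which is the property that makes this kind of federated analysis attractive.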

FAIR principles
The EU Child Cohort Network data management and access are based on the following key principles:
• Full compliance with best practice in data privacy and security;
• Use of coded data with appropriate institutional and participant consent;
• Use of privacy enhancing technologies such as filters;
• Use of policies that enable greater use of data in research;
• Approval of all procedures, policies and methods by the relevant local authorities.
Management of and access to all data is primarily the responsibility of each institution. The FAIR (findable, accessible, interoperable, reusable) principles are taken into account for the general data management approach.

Findable
The LifeCycle Project has revitalized the existing www.birthcohorts.net website. This website gives an overview of pregnancy and birth cohorts and the data available in these cohorts. Specific details of variables included in the EU Child Cohort Network and their availability in the cohorts are presented in the open access EU Child Cohort Network Variable Catalogue. The catalogue was built using the MOLGENIS software platform for scientific data extending on BBMRI-ERIC directory of biobanks [41,42]. It also documents how each cohort has harmonized these variables, including information about the source variables used by the cohorts. No actual data are given in the online catalogue. All relevant websites and their contents are presented in Table 3.

Collaboration between prospective pregnancy/child cohort studies offers the opportunities to:
• Perform analyses in over 250,000 children and their parents
• Harmonize methods for data collection, biobanks, management, and analyses
• Perform analyses on published and unpublished data which limits publication bias
• Perform individual participant data meta-analyses with better statistical precision
• Stratify groups by geographical area or sex
• Compare determinants and outcomes between European populations
• Examine consequences of small variations in determinants from early life onwards
• Identify variations in geography and time periods for specific associations
• Infer causality from observed associations by advanced analytical approaches
• Enable analyses on life course trajectories on risk factors of non-communicable diseases
• Explore different life course models

Accessible
A harmonized set of data for the EU Child Cohort Network is available on a server controlled by or located at each specific institute. Harmonized data from each cohort are held on secure Opal servers (http://opaldoc.obiba.org/en/latest/) at their institution. Protocols for setting up this data infrastructure are available, together with YouTube instruction videos. Data are accessed via a central analysis server using the R-based platform DataSHIELD. Access to data is conditional on approval by the cohort. Partners and their cohorts can always decide to share research data without using DataSHIELD, conditional on relevant local ethical and legal approvals. This approach is used for analyses that are not yet possible in DataSHIELD [25]. The field of data sharing and cross-study analyses is rapidly advancing. Although we start by using DataSHIELD, we recognise that this may change over time.

Interoperable
Existing data have been harmonized and integrated into exposure variables to make them interoperable. Protocols for harmonization are available online. All harmonized data from different cohorts have been renamed into standardized variable names. A full list of the available variables per cohort is available in the EU Child Cohort Network Variable Catalogue.
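As an illustration of what renaming cohort-specific source variables into standardized variable names can involve, the following minimal Python sketch applies per-cohort mapping rules to a single record. It is not the LifeCycle harmonization protocol: the cohort labels, source variable names, target names (`birth_weight_g`, `sex`) and coding schemes are all hypothetical and chosen only to show the pattern of one rule per cohort and target variable.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# A rule names the cohort's source variable and how to convert its value.
Rule = Tuple[str, Callable[[Any], Any]]

# Hypothetical rules: each cohort stores the same concepts differently.
HARMONIZATION_RULES: Dict[str, Dict[str, Rule]] = {
    "cohort_a": {
        # Birth weight already in grams, stored as text.
        "birth_weight_g": ("bw_grams", lambda v: float(v)),
        # Sex coded as "M" / "F".
        "sex": ("sex_mf", lambda v: {"M": "male", "F": "female"}[str(v).upper()]),
    },
    "cohort_b": {
        # Birth weight in kilograms, converted to grams.
        "birth_weight_g": ("birthweight_kg", lambda v: float(v) * 1000.0),
        # Sex coded as 1 / 2.
        "sex": ("gender", lambda v: {1: "male", 2: "female"}[int(v)]),
    },
}


def harmonize_record(cohort: str, record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Map one source record to standardized names; missing sources become None."""
    rules = HARMONIZATION_RULES[cohort]
    harmonized: Dict[str, Optional[Any]] = {}
    for target_name, (source_name, convert) in rules.items():
        raw_value = record.get(source_name)
        harmonized[target_name] = None if raw_value is None else convert(raw_value)
    return harmonized


if __name__ == "__main__":
    print(harmonize_record("cohort_a", {"bw_grams": "3400", "sex_mf": "F"}))
    print(harmonize_record("cohort_b", {"birthweight_kg": 3.5, "gender": 1}))
```

Keeping the source variable name inside each rule mirrors the kind of documentation of source variables that a variable catalogue can provide, so that every harmonized value can be traced back to its origin.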

Reusable
The EU Child Cohort Network reuses data that are already available within cohorts. The EU Child Cohort Network, with the harmonized set of variables and infrastructure, should be sustainable beyond the duration of the LifeCycle Project. During the last two years, four other European consortia have been funded, which are planning to build upon the harmonized data and federated analysis infrastructure in the EU Child Cohort Network. These consortia include the EUCAN-Connect, NutriPROGRAM, ATHLETE and LongITools Projects. Future collaborations may include not only European, but also global initiatives such as the NIH-Environmental influences on Child Health Outcomes (ECHO) Programme in the United States, which aims to build a virtual paediatric cohort based on new and existing birth cohorts, recognizing the enormous

Data governance
Neither the LifeCycle Project nor the EU Child Cohort Network owns data; they bring data from the participating cohorts together via a federated data analysis platform. Ethical and legal responsibility for data management and security remains with the source studies or home institutions. The principal investigators or home institutions should always administer permission for external access to specific data on their server for addressing research questions. The EU Child Cohort Network itself cannot provide open access to researchers. The data sharing protocols and agreements will be updated regularly, according to new legal practices, such as the European General Data Protection Regulation 2016/679 (GDPR). All governance protocols will take into account not only the short-term, but also the long-term EU Child Cohort Network, beyond the LifeCycle Project duration.

EU Child Cohort Network research proposals
Proposals for research using the EU Child Cohort Network can be put forward by both LifeCycle Project partners and other researchers. External researchers can send a request for EU Child Cohort Network data use to the participating cohorts or to lifecycle@erasmusmc.nl. Each LifeCycle Project proposal is discussed in the relevant coordinating work package (https://lifecycle-project.eu/for-scientists/workpackages/) and subsequently distributed among all cohorts participating in the LifeCycle Project and EU Child Cohort Network. Cohorts can opt in to or opt out of each analysis, depending on data availability, research interests or involvement in other projects. In the first phase, research projects focus on the LifeCycle Project research aims (see below). An efficient governance structure was organized and agreed upon by researchers and ethical and legal representatives. The EU Child Cohort Network governance structure will be updated regularly where needed and will be made sustainable after the LifeCycle Project duration. Because no physical transfer of data is needed, we are currently exploring the possibility of working with a short Data Access Agreement that replaces the commonly used Data Transfer Agreements. When the EU Child Cohort Network is fully operational, we aim to have regular EU Child Cohort Network meetings or telephone conferences to discuss:
• Research projects (novel proposals, progress of ongoing projects);
• Harmonization (novel proposals, progress of ongoing efforts);
• DataSHIELD analysis approaches (priorities for further development);
• Any relevant ethical or legal issues concerning federated analysis approaches.
Participants in these meetings or telephone conferences are not only LifeCycle Project partners, but also representatives of all institutes that have harmonized their data and set up the IT infrastructure needed for the federated analysis of data via DataSHIELD.

LifeCycle Project primary research areas
The LifeCycle Project uses the integrated and harmonized set of variables from the EU Child Cohort Network for identification of early-life stressors influencing cardio-metabolic, respiratory and mental developmental adaptations and health trajectories during the full life course (Fig. 3).

Integrated early-life stressors approach and the exposome
Early-life stressors, including socio-economic, migration, urban environmental, and lifestyle-related factors, have been associated with cardio-metabolic, respiratory, and mental health and disease, which together contribute greatly to the global burden of non-communicable diseases [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22]. An accumulating body of evidence suggests that exposure to these factors during fetal life and childhood affects later life health trajectories [38]. Thus far, studies focused on the effects of early-life environmental exposures on later life health outcomes have largely used a 'one-exposure at one-time point' approach. Research from LifeCycle Project partners suggests that, rather than exposure to single stressors that individually may have weak effects, exposure to a cluster or pattern of adverse early-life stressors in specific age windows is more likely to influence health during the lifecycle [39]. We will apply a holistic 'early-life exposome' model to encompass many human environmental exposures, which is dynamic from conception onwards and complements the genome. To develop this early-life exposome, we will specifically take into account measurements in the external environment (socio-economic, migration, urban environment, and lifestyle factors), biological markers reflecting the internal environment (DNA methylation, RNA expression, and metabolomics), and the dynamic life course nature of the exposome. We will use available methods developed as part of the EU-FP7 HELIX Project for further development of the early-life exposome model [29].

Cardio-metabolic, respiratory and mental health outcomes
Embryonic life, fetal life and early childhood are characterized by high developmental rates and seem to be critical periods for developmental adaptations with long-term consequences. Research from LifeCycle Project partners has shown that specific maternal lifestyle factors and fetal growth variation in early pregnancy are related to non-communicable diseases and their risk factors [45][46][47][48][49]. We will use repeatedly measured exposure, mediator and outcome data from the EU Child Cohort Network to compare different potential life course models, including those assuming specific critical periods and those assuming interactive and cumulative effects throughout the life course. We will relate early-life stressors measured in different early-life periods (preconception, fetal life, early childhood) to life course health trajectories. We specifically hypothesize that early-life stressors lead to developmental adaptations of:
• The cardiovascular system, assessed in detail by advanced cardiac and great vessel ultrasound or Magnetic Resonance Imaging (MRI), and systemic metabolism, detected by measuring hundreds of metabolites using high-throughput approaches, which precede the development of cardio-metabolic diseases [50][51][52][53][54][55][56][57][58][59][60].
• Lung volumes and airway patency, assessed by lung function measurements and clinical assessments, and immunological or allergy-related assessments, which precede the development of respiratory disease [61][62][63].
• Structural and functional brain development, assessed by ultrasound in fetal life or early infancy, or brain MRI in later life, which precede the development of mental health outcomes [64][65][66][67].

Epigenetic pathways
An accumulating body of evidence suggests that epigenetic changes play a key role in the associations of early-life stressors with lifecycle health and disease trajectories [68]. DNA methylation, the most frequently studied epigenetic phenomenon in large populations, is a dynamic process, which may be influenced by environmental stressors such as urban environment, dietary factors and smoking [68]. DNA methylation changes are more common in early life. LifeCycle Project partners have identified DNA methylation markers related to specific early-life stressors including maternal BMI, smoking, dietary factors and birth weight [12,17]. The EU Child Cohort Network brings together many pregnancy and childhood cohorts with information about epigenome-wide DNA methylation. Availability of repeatedly measured DNA methylation and of RNA expression data enables studies on persistence and functionality of DNA methylation markers potentially involved in early-life programming of non-communicable diseases.

Population impact
The concept that early life is critical for health and disease throughout the life course is well-acknowledged. However, there is still not much evidence for effective prevention or intervention strategies using early life as a window of opportunity to maximize the human developmental potential during the full life course. We will use different approaches to translate findings into population health recommendations. These include causal inference, aggregation of evidence for interventions based on reviews, dynamic microsimulation, and development of prediction models. Causality cannot be directly concluded from observational studies. Advanced analytical approaches that can help to infer causality include sibling comparison studies, propensity score matching and Mendelian randomization studies, in which genetic variants are used as unconfounded proxies for adverse exposures [69]. The EU Child Cohort Network facilitates integration of different causal inference methods and comparison of their findings, which will strengthen causal inference needed for translation of findings from observational studies to public health recommendations.
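To make the logic of using a genetic variant as a proxy for an exposure concrete, the sketch below computes the standard single-variant Wald ratio: the variant-outcome association divided by the variant-exposure association, with a first-order approximation of the standard error. The text does not state which Mendelian randomization estimator will be used, so this is an assumption made for illustration; the function name and the example numbers are hypothetical, and the estimate is only interpretable as causal if the variant is a valid instrument.

```python
from typing import Tuple


def wald_ratio(
    beta_variant_exposure: float,
    beta_variant_outcome: float,
    se_variant_outcome: float,
) -> Tuple[float, float]:
    """Return the Wald ratio estimate and its first-order standard error.

    The standard error ignores uncertainty in the variant-exposure
    association, which is a common simplification.
    """
    if beta_variant_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_variant_outcome / beta_variant_exposure
    standard_error = se_variant_outcome / abs(beta_variant_exposure)
    return estimate, standard_error


if __name__ == "__main__":
    # Made-up summary statistics, for illustration only.
    estimate, standard_error = wald_ratio(
        beta_variant_exposure=0.5,
        beta_variant_outcome=0.1,
        se_variant_outcome=0.02,
    )
    print(estimate, standard_error)
```

Comparing such an estimate with results from sibling comparisons or propensity score matching on the same harmonized data is one way in which findings from different causal inference methods could be set side by side.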
We will review and summarize evidence based on findings both from observational studies in the EU Child Cohort Network and from published intervention studies to develop recommendations for population and subgroup-specific interventions focused on the earliest phases of life. Dynamic microsimulation modelling using data from cohort studies enables policy evaluations and scenario analyses focused on early-life interventions when experimental studies are not possible [70,71]. The EU Child Cohort Network provides a unique infrastructure for these analyses, because of the available data and variation in exposures and outcomes, life course trajectories of non-communicable diseases and various subpopulations with different baseline risks.
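A scenario analysis with dynamic microsimulation can be sketched in a few lines. The toy model below is an assumption-laden illustration, not the models cited above: simulated individuals are assigned an early-life exposure, are followed year by year, and may develop an outcome with an annual risk that depends on exposure. Running the same simulated population under a lower exposure prevalence mimics a hypothetical early-life intervention. The function name and all parameter values are invented and are not estimates from any cohort.

```python
import random


def simulate_outcome_prevalence(
    n_people: int,
    years: int,
    exposure_prevalence: float,
    annual_risk_unexposed: float,
    annual_risk_exposed: float,
    seed: int = 0,
) -> float:
    """Return the share of simulated people who develop the outcome."""
    for p in (exposure_prevalence, annual_risk_unexposed, annual_risk_exposed):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    rng = random.Random(seed)  # fixed seed so scenarios share random numbers
    cases = 0
    for _ in range(n_people):
        exposed = rng.random() < exposure_prevalence
        risk = annual_risk_exposed if exposed else annual_risk_unexposed
        affected = False
        for _ in range(years):
            # Draw once per year even after onset, so every person consumes
            # the same number of random values in every scenario.
            draw = rng.random()
            if not affected and draw < risk:
                affected = True
        if affected:
            cases += 1
    return cases / n_people


if __name__ == "__main__":
    common = dict(n_people=5000, years=10,
                  annual_risk_unexposed=0.01, annual_risk_exposed=0.03, seed=1)
    baseline = simulate_outcome_prevalence(exposure_prevalence=0.4, **common)
    intervention = simulate_outcome_prevalence(exposure_prevalence=0.2, **common)
    print(baseline, intervention)
```

Using the same seed in both scenarios means each simulated person receives identical random draws, so the difference between the two runs reflects the change in exposure prevalence rather than simulation noise.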
Data from observational studies can help to develop models to predict risk factors for non-communicable diseases. Previous studies suggested that pregnancy, birth and infancy characteristics have the potential to identify groups at risk for obesity [72,73]. The EU Child Cohort Network is the ideal platform to develop models to predict from early-life stressor data the onset of risk factors for cardio-metabolic, respiratory and mental disease across the lifecycle. Models can include various background characteristics, which enable baseline risk estimation from socio-economic, migration, environment and lifestyle stressors, which may be difficult to modify in the short-term but help to predict the outcomes of interest.
Finally, we will develop E-learning modules and eHealth applications that will be made widely available to make the knowledge and research findings available for educational and health care purposes.

Conclusion
The LifeCycle Project and its EU Child Cohort Network create great opportunities for researchers to combine harmonized data from different cohorts through a federated analysis platform. The network also provides a novel model for collaborative research in large research infrastructures with individual-level data. The LifeCycle Project will translate results from research using the EU Child Cohort Network into recommendations for targeted prevention strategies to improve health trajectories for current and future generations by optimizing their earliest phases of life.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.