An overview of the 17 cohorts that established the EU Child Cohort Network is provided in Table 1. Further details of each cohort can be found in Jaddoe et al., the EU Child Cohort Network Variable Catalogue (http://catalogue.lifecycle-project.eu) and each cohort’s profile paper [31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49]. The network is open for other cohorts to join, provided they: (1) commenced before or during pregnancy or in infancy; (2) plan to follow up, or have already followed up, the cohort throughout childhood; and (3) are willing to harmonise data and make them available to researchers using the network. Cohorts can join the network by contacting the coordinating centre (firstname.lastname@example.org). Similarly, both LifeCycle partners and external researchers can put forward proposals for research based on EU Child Cohort Network data by contacting the coordinating centre (email@example.com). Proposals may be based on all EU Child Cohort Network cohorts or on a subset of cohorts with available data; they may also include requests for further data harmonisation, which can likewise be restricted to a subset of cohorts with data.
The EU Child Cohort Network’s core variables are a set of basic, predominantly “lowest common denominator” variables, derivable by the majority of participating cohorts and frequently needed as covariates or exposures in lifecourse research. The process adopted in LifeCycle to establish and harmonise these core variables for the EU Child Cohort Network can be broken down into eight steps; an overview of these steps is displayed in Fig. 1. A glossary of the key elements and concepts described in this paper is also provided in Box 4.
Step 1: establishing a preliminary list of target core variables
LifeCycle partners with expertise in a wide range of fields, including lifecourse epidemiology, public health, environmental epidemiology, biology, statistics, paediatrics, obstetrics, economics, demography, epigenomics and data science, met in a dedicated workshop (June 2017) to identify, by consensus, a preliminary list of core early-life stressors and exposures related to cardio-metabolic, respiratory and mental health outcomes. This initial list was then modified by drawing on experience from previous collaborative efforts such as MOBAND and CHICOS (www.chicosproject.eu), and by consulting the literature and experts in the field, before being circulated amongst LifeCycle partners for further comment.
Steps 2, 3 & 4: collating codebooks, evaluating the harmonisation potential of each variable and finalising a list of target core variables
All cohorts participating in LifeCycle were requested to provide the coordinating team with cohort metadata (codebooks, questionnaires, instrument documentation, etc.). From these, the potential for each cohort to derive each target variable was established. The core variable list was then adapted iteratively to balance precision against inclusivity, so that as many cohorts as possible could contribute data for as many variables as possible while data validity was maintained. Where possible, international standards and classification schemes were applied. For example, the International Standard Classification of Occupations 1988 1-digit codes were used to categorise parental occupation; the International Standard Classification of Education 97/2011 schemes [52, 53] were used to classify parental education; the WHO fetal growth charts were used to establish size-for-gestational-age; and the EUROCAT guide was used to classify congenital anomalies. For some key exposures, such as maternal smoking, breastfeeding, childcare attendance and gestational age, several variables were included: the more detailed variables capture more information, but fewer cohorts are able to derive them. Repeated measures were also included, to capture the dynamic, longitudinal nature of many variables.
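The evaluation of harmonisation potential can be pictured as a cohort-by-variable coverage table. The following is a minimal sketch only: the cohort names, variable names and the coverage threshold are hypothetical, and in practice the list was settled by iterative discussion rather than by a fixed cut-off.

```python
from typing import Dict, List, Set

# Hypothetical example: cohorts whose metadata suggest that each candidate
# variable can be derived. All names below are illustrative assumptions.
derivable_by: Dict[str, Set[str]] = {
    "maternal_smoking_any": {"cohort_a", "cohort_b", "cohort_c", "cohort_d"},
    "maternal_smoking_cigs_per_day": {"cohort_a", "cohort_c"},
    "breastfed_ever": {"cohort_a", "cohort_b", "cohort_d"},
    "breastfed_exclusive_months": {"cohort_b"},
}
all_cohorts: Set[str] = {"cohort_a", "cohort_b", "cohort_c", "cohort_d"}


def coverage(variable: str) -> float:
    """Fraction of participating cohorts able to derive the variable."""
    return len(derivable_by[variable] & all_cohorts) / len(all_cohorts)


def shortlist(min_coverage: float) -> List[str]:
    """Variables derivable by at least min_coverage of cohorts, sorted by name."""
    return sorted(v for v in derivable_by if coverage(v) >= min_coverage)
```

Lowering `min_coverage` admits more detailed variables at the cost of fewer contributing cohorts, which mirrors the trade-off between precision and inclusivity described above.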
Step 5: pilot harmonisation
Data harmonisation was staggered across cohorts. First, an initial pilot harmonisation was conducted among four cohorts covering the majority of target core variables (the Danish National Birth Cohort, the EDEN mother-child cohort, the Generation R study and the Southampton Women’s Survey). This enabled any potential issues in the core variable list to be identified and rectified. During the pilot harmonisation, the core variable list was revised iteratively through electronic communication, a workshop and a final teleconference.
Step 6: data harmonisation and local quality control
Harmonisation for the EU Child Cohort Network was carried out locally by each participating cohort. This avoided any transfer of data but carried the risk of harmonisation protocols being interpreted differently by different cohorts. To limit this possibility, a detailed harmonisation manual was drawn up by the coordinating team, and supervision and feedback were maintained between the coordinating centre and each of the cohorts. The harmonisation manual is available to download from the LifeCycle website (https://lifecycle-project.eu). It includes: (1) a final, annotated list of core variables giving, for each variable, a name, a precise definition, a label, units, data type, permissible values and guidelines for what constitutes partial versus complete harmonisation (see Box 4 for definitions of partial vs. complete harmonisation); (2) relevant scale conversions; (3) relevant reference tables (e.g. WHO fetal growth charts, the EUROCAT guide for classifying congenital anomalies, etc.). The harmonisation manual was circulated to cohorts in May 2018 and harmonisation of core variables by all cohorts was completed by May 2020. The time a cohort took to harmonise all core variables ranged from three to eight months.
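An entry in the annotated core variable list can be thought of as a small structured record. The sketch below is illustrative only: the class, field names and the example variable are assumptions made for this example and do not reproduce the format of the harmonisation manual.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class CoreVariableSpec:
    """Hypothetical record mirroring the attributes listed for each core variable."""
    name: str
    label: str
    definition: str
    unit: Optional[str]
    data_type: type
    permissible_values: Optional[Tuple[Any, ...]] = None  # None means unrestricted

    def accepts(self, value: Any) -> bool:
        """True if value is missing (None) or matches the type and permitted values."""
        if value is None:
            return True
        # bool is a subclass of int in Python, so reject it unless asked for.
        if isinstance(value, bool) and self.data_type is not bool:
            return False
        if not isinstance(value, self.data_type):
            return False
        return self.permissible_values is None or value in self.permissible_values


# Illustrative entry; the name and coding are not taken from the manual.
smoking_spec = CoreVariableSpec(
    name="example_preg_smoking",
    label="Any maternal smoking during pregnancy",
    definition="Mother smoked at any point during pregnancy (0 = no, 1 = yes)",
    unit=None,
    data_type=int,
    permissible_values=(0, 1),
)
```

Sharing a single explicit specification of this kind across sites is one way to reduce the risk, noted above, of a protocol being interpreted differently by different cohorts.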
Once data were harmonised, each cohort was provided with detailed quality control instructions and scripts to check: (1) that variables matched the descriptions provided in the core variable list (name, data type, values); (2) for outliers or improbable values; (3) for inconsistencies between non-repeated measures (e.g. confirming that all mothers coded as not smoking during pregnancy were also coded as smoking zero cigarettes during pregnancy); (4) for inconsistencies between repeated measures (e.g. a child's recorded height decreasing over time). Any inconsistencies identified were investigated on a case-by-case basis, in light of the other data available, to establish which values were legitimate and which were errors.
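Checks (2) to (4) can be sketched as simple functions that flag records for manual review. This is a minimal illustration rather than the scripts distributed to cohorts: the field names, plausibility bounds and example records are all hypothetical, and no allowance is made for measurement error.

```python
from typing import Dict, List, Optional, Tuple


def implausible(values: Dict[str, Optional[float]], low: float, high: float) -> List[str]:
    """IDs whose non-missing value lies outside the range [low, high]."""
    return sorted(i for i, v in values.items() if v is not None and not (low <= v <= high))


def smoking_conflicts(records: List[dict]) -> List[str]:
    """IDs coded as non-smokers but with a positive cigarette count."""
    return sorted(
        r["id"] for r in records
        if r["smoked"] == 0 and (r["cigs_per_day"] or 0) > 0
    )


def height_decreases(heights: Dict[str, List[Tuple[int, float]]]) -> List[str]:
    """IDs where a later (age_months, height_cm) measure is lower than an earlier one."""
    flagged = []
    for child_id, measures in heights.items():
        ordered = sorted(measures)  # order by age in months
        if any(later[1] < earlier[1] for earlier, later in zip(ordered, ordered[1:])):
            flagged.append(child_id)
    return sorted(flagged)


# Hypothetical example records.
birth_weight_g = {"c1": 3400.0, "c2": 34000.0, "c3": None}
mothers = [
    {"id": "m1", "smoked": 0, "cigs_per_day": 0},
    {"id": "m2", "smoked": 0, "cigs_per_day": 5},
    {"id": "m3", "smoked": 1, "cigs_per_day": None},
]
child_heights = {
    "c1": [(12, 75.0), (24, 86.5)],
    "c2": [(24, 88.0), (12, 76.0), (36, 85.0)],
}
```

Each function returns identifiers only, so that flagged values are inspected individually rather than corrected automatically.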
Step 7a: uploading harmonisation descriptions to the EU Child Cohort Network variable catalogue
To facilitate the use of EU Child Cohort Network data for research, and to ensure the complete and accurate documentation of harmonisation, an online catalogue of EU Child Cohort Network variables was developed using the Molgenis platform (http://catalogue.lifecycle-project.eu). This open-source, searchable catalogue includes detailed descriptions of each variable included in the EU Child Cohort Network (variable name, data type, values, unit and description), as well as details of which cohorts have harmonised each variable, whether that harmonisation was complete or partial, an explanation of how the variable was harmonised, plus the syntax and descriptions of the source variables used by each cohort to derive the variable (Fig. 2). For the core variables, documentation of harmonisation was conducted by each cohort and uploaded to the catalogue after harmonisation was complete.
The catalogue has been built using a logical tree structure, but variables can also be located using a search function (Fig. 3). There are plans to also incorporate descriptive summary statistics for each harmonised variable. Thus, the EU Child Cohort Network Variable Catalogue provides a comprehensive overview of the EU Child Cohort Network’s data, ensuring they are both findable and reusable, as well as contributing to the longer-term sustainability of the network.
Step 7b: uploading data to a data management platform for the federated analysis of data
To help ensure the sustainability and accessibility of the EU Child Cohort Network, an IT infrastructure has been implemented enabling the federated analysis of data. Full details of this infrastructure are given elsewhere [29, 56, 57]. Briefly, this infrastructure consists of secure Opal servers located either at each host institution or on outsourced IT infrastructures. Once harmonisation is complete, each cohort uploads its harmonised data to its Opal server, where the data remain stored behind secure firewalls. Individual-level data are analysed via an RStudio Open Source central analysis server (https://rstudio.com/products/rstudio/#rstudio-server) using the R-based platform DataSHIELD, which sends blocks of code to each Opal server and then combines the summary statistics that each Opal server sends back. No individual participant data are transferred to the researcher, and a number of disclosure control filters ensure that analyses are non-disclosive; the many ethical, legal and societal implications of transferring data from one site to another are thus avoided.
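The principle of federated analysis, in which only summary statistics leave each site, can be illustrated in a few lines. The sketch below is written in plain Python for illustration and does not use or reproduce the DataSHIELD or Opal APIs; the function names and the minimum cell count are assumptions.

```python
from math import sqrt
from typing import List, NamedTuple, Sequence, Tuple

MIN_CELL_COUNT = 5  # hypothetical disclosure threshold, not a DataSHIELD default


class SiteSummary(NamedTuple):
    n: int            # number of observations
    total: float      # sum of values
    total_sq: float   # sum of squared values


def summarise_locally(values: Sequence[float]) -> SiteSummary:
    """Runs at each site: returns aggregates only, never individual values."""
    if len(values) < MIN_CELL_COUNT:
        raise ValueError("too few observations to release a summary")
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pool(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Runs centrally: combines site summaries into a pooled mean and sample SD."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, sqrt(max(variance, 0.0))  # guard against tiny negative rounding


# Hypothetical values held at two sites.
site_a = summarise_locally([1.0, 2.0, 3.0, 4.0, 5.0])
site_b = summarise_locally([6.0, 7.0, 8.0, 9.0, 10.0])
pooled_mean, pooled_sd = pool([site_a, site_b])
```

The pooled mean and standard deviation are identical to those that would be obtained from the combined individual-level data, yet the central server only ever receives three numbers per site.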
Step 8: central quality-control
The quality of harmonised data was assessed centrally by creating summary statistics for each core variable in R/DataSHIELD. The aim was to identify outliers, improbable values and inconsistencies in the data, as outlined above, and also to identify large inconsistencies between cohorts. Where large inconsistencies were found, sampling and recruitment methods, differences in the instruments used to collect data and the harmonisation process itself were investigated, in order to establish to what extent these differences were real rather than an artefact of differing methodology.
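A between-cohort comparison of this kind can be sketched as a screen on cohort-level summaries. The rule used here (relative deviation from the median of cohort means), the tolerance and the example figures are assumptions chosen for illustration; they are not the criteria applied in LifeCycle.

```python
from statistics import median
from typing import Dict, List


def flag_discrepant(cohort_means: Dict[str, float], rel_tol: float) -> List[str]:
    """Cohorts whose mean differs from the median cohort mean by more than rel_tol."""
    centre = median(cohort_means.values())
    return sorted(
        cohort for cohort, mean in cohort_means.items()
        if abs(mean - centre) > rel_tol * abs(centre)
    )


# Hypothetical mean birth weights; cohort_c appears to be in kg rather than g.
mean_birth_weight = {
    "cohort_a": 3450.0,
    "cohort_b": 3500.0,
    "cohort_c": 3.48,
    "cohort_d": 3520.0,
}
to_review = flag_discrepant(mean_birth_weight, rel_tol=0.10)
```

A flagged cohort is a prompt for investigation, not evidence of an error: the difference may reflect the population sampled as much as a harmonisation artefact such as a unit mismatch.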