Study population and recruitment
The GNC will recruit a total of 200,000 residents in the age range of 20–69 years at baseline. Study participants will be recruited through a network of 18 study centres, covering mainly urban and industrialised areas and some rural regions throughout Germany. These centres are grouped into eight clusters that work closely together (Fig. 1). Each centre will recruit a minimum of 10,000 cohort participants, drawn randomly from compulsory registries of residents in the study areas. The anticipated response rate is between 40 and 50 %. Currently, no financial compensation for participation is planned apart from reimbursement of travel costs. Participants will be invited to the local study centre, where they will take part in a standardised, computer-assisted personal face-to-face interview (CAPI), complete self-administered questionnaires (partially on touch screen computers) and undergo a number of standardised physical and medical examinations. In addition, they will provide blood and urine samples and other biomaterials. All participants will receive a generalised letter of results in the weeks following their study visit. This letter will include some of the basic test results (e.g., blood test results, blood pressure data, anthropometric data, accelerometry data). For the MRI examinations, we have defined an exhaustive catalogue of chance findings that specifies which findings should, and which should not, be communicated to the study participants, so that participants are informed about potential health problems that would require further medical attention.
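As a rough illustration of what the anticipated response rate implies for the number of invitations, the following Python sketch divides a recruitment target by the response rate. The function name `invitations_needed` is hypothetical, and the calculation assumes that the 40–50 % response rate applies uniformly to all invited residents; it is not a sampling plan taken from the study protocol.

```python
def invitations_needed(target_participants: int, response_rate_percent: int) -> int:
    """Smallest number of invitations that yields the target at the given response rate.

    Integer arithmetic (ceiling division) is used so that the result is not
    affected by floating-point rounding.
    """
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response_rate_percent must be in (0, 100]")
    return -(-target_participants * 100 // response_rate_percent)


# Whole cohort (200,000 participants) at the two ends of the anticipated range.
print(invitations_needed(200_000, 40))  # 500000
print(invitations_needed(200_000, 50))  # 400000

# Minimum per-centre target (10,000 participants).
print(invitations_needed(10_000, 40))   # 25000
print(invitations_needed(10_000, 50))   # 20000
```

Under these assumptions, roughly 400,000–500,000 residents would have to be invited overall, or about 20,000–25,000 per centre for the minimum centre target.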
Baseline examination and data collection
Data collection comprises two levels of intensity. All cohort participants will follow a 2.5 h recruitment protocol including the CAPI and questionnaire assessments and basic physical and medical examinations (Level 1), and a random sub-sample of 40,000 participants (20 % per study centre) will participate in an extended protocol that includes more in-depth physical and medical examinations (Level 2). In addition, at five MRI centres (Augsburg, Berlin, Essen, Mannheim and Neubrandenburg), a total of 30,000 participants (drawn from the GNC participants of the respective study centres) will undergo a high-resolution (3 T) MRI protocol for the acquisition of whole body, cardiac, and brain images, generating a comprehensive morphological and functional database (MRI programme). The MRI participants will either belong to the random 20 % Level 2 sample or be offered the Level 2 examinations in addition to Level 1. The intensified sub-cohorts (Level 2 and MRI programme) will be used for more detailed studies on risk factors and early clinical precursor stages of diabetes, heart disease, and neurodegenerative disorders, in particular.
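The selection of a random 20 % Level 2 sub-sample within each study centre can be sketched as a stratified random draw. The Python example below is only an illustration under stated assumptions: the function name `draw_level2_subsample`, the participant identifiers and the seeding scheme are hypothetical, and the actual randomisation procedure used by the study centres is not described here.

```python
import random
from collections import defaultdict


def draw_level2_subsample(participants, fraction=0.20, seed=0):
    """Draw a simple random sample of `fraction` of the participants in each centre.

    participants: iterable of (participant_id, centre) pairs (hypothetical layout).
    Returns a dict mapping each centre to the sorted list of selected IDs.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_centre = defaultdict(list)
    for participant_id, centre in participants:
        by_centre[centre].append(participant_id)

    selected = {}
    for centre, ids in sorted(by_centre.items()):
        k = round(fraction * len(ids))  # sub-sample size for this centre
        selected[centre] = sorted(rng.sample(ids, k))
    return selected


# Toy cohort: three centres with 100 participants each.
cohort = [(f"{centre}-{i:03d}", centre) for centre in ("A", "B", "C") for i in range(100)]
level2 = draw_level2_subsample(cohort)
print({centre: len(ids) for centre, ids in level2.items()})  # {'A': 20, 'B': 20, 'C': 20}
```

Sampling within each centre, rather than from the pooled cohort, keeps the Level 2 fraction at exactly 20 % per centre, which matches the design described above.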
The baseline interview and self-administered questionnaires will cover socio-economic and socio-demographic factors, medical history, use of medications and health care, lifestyle factors, and questions related to environmental and occupational factors (Table 2). In addition, physical and medical examinations will be performed as specified in Table 3. Finally, all study participants will be asked to provide samples of blood, urine, saliva and stool, as well as nasal swabs (Table 3).
Table 2 Questionnaire data to be collected within the German National Cohort
Table 3 Physical and medical examinations to be conducted within the German National Cohort at baseline recruitment, by study level (Level 2 exams in addition to Level 1 exams)
Repeat data collections and medical examinations over time
All participants of the GNC will be re-invited for a second examination (re-assessment) 4–5 years after their baseline recruitment. The re-assessment will include largely the same examinations as at baseline at Levels 1 and 2 (except long-term ECG and oral glucose tolerance test), but with a reduction in the number of aliquots collected for the bio-specimens. For the MRI programme, funding still has to be sought for the repeat examination, as the current funding only covers the baseline MRI examination. At re-assessment, intra-individual, medium-term changes in risk factors and prospective changes in quantifiable preclinical morbidity characteristics and incident clinical disease can be investigated. For the re-assessment, we anticipate a re-participation rate of at least 75 % of all cohort participants. This estimate is based on experiences from a number of ongoing studies in Germany, including KORA (Augsburg), HNR (Essen), SHIP (Greifswald), CARLA (Halle) and the German National Health Interview and Examination Survey 1998 (GNHIES)/DEGS survey for adults [5–7, 9, 20, 21].
In addition to the re-assessment after 4–5 years, short-term (0.5-year) replication studies will be embedded in the cohort, using a 6,000-subject sub-sample of participants of the first (baseline) visit and a 4,000-subject sub-sample of participants of the second visit, spread proportionally over all 18 study centres. These reliability studies will serve to estimate within-subject “random” variations in risk factor measurements, so as to allow correction for the attenuation of associations that results from random measurement error and misclassification.
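To illustrate how repeat measurements can be used to correct for attenuation, the following Python sketch estimates the reliability coefficient of a single measurement from paired (baseline, repeat) values and divides an observed regression coefficient by it. This is the textbook regression-dilution correction for one continuous exposure with purely random error; it is an assumption made for illustration, not necessarily the method that will be applied in the GNC, and the function names are hypothetical.

```python
from statistics import mean, variance


def reliability_coefficient(pairs):
    """Reliability (intraclass correlation) of a single measurement.

    pairs: list of (first, repeat) measurements, one pair per subject.
    Uses one-way ANOVA variance components for two replicates per subject.
    """
    # Within-subject mean square: for two replicates this is (a - b)^2 / 2 per subject.
    within_var = mean((a - b) ** 2 / 2 for a, b in pairs)
    # Between-subject mean square: 2 x sample variance of the subject means.
    between_ms = 2 * variance([(a + b) / 2 for a, b in pairs])
    # E[between_ms] = within_var + 2 * between_var, so solve for between_var.
    between_var = max((between_ms - within_var) / 2, 0.0)
    return between_var / (between_var + within_var)


def correct_attenuation(beta_observed, reliability):
    """De-attenuate a regression coefficient estimated from error-prone measurements."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return beta_observed / reliability


# Toy data: two subjects, each measured twice.
toy_pairs = [(0.0, 2.0), (4.0, 6.0)]
lam = reliability_coefficient(toy_pairs)  # 7/9 for this toy example
print(lam, correct_attenuation(0.10, lam))
```

The lower the reliability, the more the observed association underestimates the underlying one, which is why the reliability sub-studies are needed to quantify within-subject variation.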
Prospective follow up for vital status and case ascertainment of incident diseases
From their first enrolment into the GNC, all cohort participants will be re-contacted every 2–3 years and asked to fill in short questionnaires about changes in lifestyle and other characteristics (e.g., use of medications, smoking, menopausal status, selected disease symptoms) and about the occurrence of physician-diagnosed diseases or of triggering events for selected major diseases, such as visiting a cardiologist or having been hospitalised (“active” follow-up). Additional telephone contacts and interviews will be used for participants who do not respond to written invitations. Self-reported cases of new-onset major diseases will be verified systematically by contacting the participant’s treating physician(s) and/or the hospitals at which they were treated, and coded according to the International Classification of Diseases (ICD) [22]. The list of diseases for which this form of follow-up will be the principal method of prospective case ascertainment includes coronary heart disease/myocardial infarction, heart failure, atrial fibrillation, type-2 diabetes mellitus, cerebrovascular disease (stroke), depression, cancer, and chronic infectious diseases. With regard to cancer occurrences, follow-up will also be based on systematic record linkages with existing epidemiological cancer registries (“passive” follow-up), which are now mandatory and will soon cover all regions of Germany. For other morbidities, it is planned to complement self-reports with data obtained through linkage to health insurance records plus systematic queries at large clinics and hospitals.
With regard to vital status and mortality, follow-up will be performed through queries to registries of residents and authorities (“Einwohnermeldeämter”, “statistische Landesämter”, “Gesundheitsämter”). Although such queries are common practice in many epidemiological studies, they are relatively labour-intensive. A new German National Death Index (NDI), which is presently being planned (independently of the GNC), will make the follow-up for mortality more efficient in future years.
Linkage to secondary data to obtain additional information on the individual’s health and employment history
Data from statutory health insurances
About 85 % of the population in Germany is insured by the statutory health insurance system, and the GNC is setting up cooperation structures with data-holding institutions to allow individual data linkage for participants. The health insurance data contain information about disease diagnoses and treatment, ambulatory and hospital care and prescribed medications. The major objective of the linkage of health insurance data is to obtain information on the use of health services and medications, for health services research and for research on medications and interventions as possible determinants of disease outcomes and prognosis. The data may also serve as an additional source of information on disease incidence.
Data on occupational history
More than 95 % of the German workforce are members of the compulsory statutory pension insurance system. These insurances hold socio-demographic data on the employees, such as occupational code and job position, provided by the employer at least annually. The legal entity that holds all these data and is allowed (within strict limits) to use them for research purposes is the Institute for Employment Research (IAB). The GNC will cooperate with the IAB to augment the data of the participants with respect to occupational history.
Geocoded environmental data
Data on environmental exposures will be generated by combining occupational and home addresses with available geocoded exposure data, e.g., for air pollution, background radiation, noise and heat.
Time line
The projected time frame for the National Cohort covers a period of 25–30 years (Fig. 2). The recruitment will start with a pilot phase in the 2nd half of 2013. Start of full-scale recruitment is planned for the beginning of 2014. Shortly after completion of recruitment, the study centres will start with the re-assessment of the entire cohort.
Expected case numbers for major chronic diseases, and statistical power considerations
For a project that has the size and scope of the GNC, developing a notion of necessary study size is a complex exercise. Besides the anticipated statistical evaluation for a great variety of possible study questions, the total number of study participants planned was also carefully balanced against the depth of phenotyping and detail of data collected from the study participants. One basic guideline was that, with regard to a number of chronic endpoints such as major forms of cancer and cardiovascular diseases, the cohort study should be large enough to allow future, stand-alone evaluations within the GNC for basic risk factor associations. For the most frequent diseases, such as diabetes or myocardial infarction, the GNC should also allow the development of detailed, multivariable risk models including large numbers of potential risk determinants. For rarer forms of disease or pre-clinical phenotypes the GNC should be large enough to make a meaningful contribution to international cohort consortia.
Table 4 presents the expected case numbers for the most frequent forms of cancer and for selected non-cancer chronic endpoints. Standard calculations indicate that studies which include at least 300–500 cases of a specific disease, with control-to-case ratios varying between 2:1 and 4:1, will have a statistical power of 0.80 (at a significance level of 0.05) to detect an odds ratio of about 1.4–1.6 for a binary exposure with 20 % population prevalence, or an odds ratio of about 1.5–1.7 for top versus bottom quartile categories of an exposure or risk factor. The number of 300–500 incident cases of disease corresponds, for example, to the rarer forms of cancer listed in Table 4 or to incident cases of type 2 diabetes among adults <40 years of age in the “Level 2” sub-cohort with intensified phenotyping. With 1,000–2,500 cases, as obtained for more frequent diseases at both study levels after 10 years, the statistical power (at significance level 0.05) will be 0.80 or higher to detect exposure-disease associations corresponding to an expected odds ratio of around 1.15–1.3 between the extreme quartiles. Between 1,000 and 2,500 incident cases are expected to be observed for the more frequent cancer types, and in the intensified sub-cohort (N = 40,000) similar case numbers are also expected for diabetes and common CVD endpoints. At still higher levels of incidence, with 5,000 events and more (e.g., diabetes, frequent forms of cardiovascular disease and overall mortality), the minimally detectable odds ratios are even lower, and studies will have high statistical power also for complex forms of statistical modelling, e.g., involving interaction terms with rare exposures (e.g., genetic variants).
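The order of magnitude of these power figures can be checked with a simple normal approximation for comparing the exposure prevalence between cases and controls. The Python sketch below is a generic textbook approximation for an unmatched comparison; it is not necessarily the calculation used for the GNC, and it assumes that the exposure prevalence among controls equals the 20 % population prevalence. The function name `approx_power` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_cases, controls_per_case, odds_ratio, exposure_prev=0.20, alpha=0.05):
    """Approximate power of a two-sided test comparing exposure prevalence
    between cases and controls (normal approximation, unpooled variance)."""
    std_normal = NormalDist()
    n_controls = n_cases * controls_per_case

    p0 = exposure_prev                         # exposure prevalence in controls
    odds_cases = odds_ratio * p0 / (1 - p0)    # exposure odds among cases
    p1 = odds_cases / (1 + odds_cases)         # exposure prevalence in cases

    se = sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in the direction of the true effect; the
    # negligible contribution of the opposite tail is ignored.
    return std_normal.cdf(abs(p1 - p0) / se - z_crit)


# Scenarios at the two ends of the ranges quoted in the text.
print(round(approx_power(300, 2, 1.6), 2))   # 300 cases, 2:1 controls, OR 1.6
print(round(approx_power(500, 4, 1.4), 2))   # 500 cases, 4:1 controls, OR 1.4
```

Under these assumptions, both scenarios yield a power close to 0.80, in line with the statement that 300–500 cases suffice to detect odds ratios of about 1.4–1.6 for a binary exposure with 20 % prevalence.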
Table 4 Expected counts of incident cancer cases and incident non-cancer cases after 5, 10, 15 or 20 years of average follow-up, for the overall cohort (N = 200,000) or for the intensified sub-cohort (N ~ 60,000)
For studies of rarer disease outcomes, as well as studies that require extremely large numbers of endpoints (e.g., for comprehensive modelling of gene–environment interactions involving genetic variants with low allele frequency) the GNC consortium will collaborate with other large-scale cohorts in Europe, such as UK-Biobank, Constances, or the LifeGene study [10–12].