Rationale and Design of the Hamburg City Health Study

The Hamburg City Health Study (HCHS) is a large, prospective, long-term, population-based cohort study and a unique research platform and network to obtain substantial knowledge about several important risk and prognostic factors in major chronic diseases. A random sample of 45,000 participants between 45 and 74 years of age from the general population of Hamburg, Germany, is taking part in an extensive baseline assessment at one dedicated study center. Participants undergo 13 validated and 5 novel examinations primarily targeting major organ system function and structures, including extensive imaging examinations. The protocol includes validated self-reports via questionnaires regarding lifestyle and environmental conditions, dietary habits, physical condition and activity, sexual dysfunction, professional life, psychosocial context and burden, quality of life, digital media use, occupational, medical and family history as well as healthcare utilization. The assessment is completed by genomic and proteomic characterization. Beyond the identification of classical risk factors for major chronic diseases and survivorship, the core intention is to gather valid prevalence and incidence data, and to develop complex models predicting health outcomes based on a multitude of examination data, imaging, biomarker, psychosocial and behavioral assessments. Participants at risk for coronary artery disease, atrial fibrillation, heart failure, stroke and dementia are invited for an additional visit for an MRI examination of either heart or brain. Endpoint assessment of the overall sample will be completed through repeated follow-up examinations and surveys as well as related individual routine data from involved health and pension insurances.
The study targets the complex relationship between biologic and psychosocial risk and resilience factors, chronic disease, health care use, survivorship and health, as well as favorable and unfavorable prognosis, within a unique, large-scale long-term assessment with the perspective of further examinations after 6 years in a representative European metropolitan population.
Electronic supplementary material: The online version of this article (10.1007/s10654-019-00577-4) contains supplementary material, which is available to authorized users.


Introduction
Within the last decades, a change in the disease pattern has been observed. Today, the majority of the ageing populations in industrialized parts of the world will survive acute events and move into survivorship, living with one or more chronic conditions such as cardiovascular and neurovascular disease, cancer, respiratory diseases and diabetes [1]. This change is mostly explained by better diagnostics leading to identification of disease at an earlier stage and by the development of more effective treatments of these diseases.
In addition to the change in disease patterns a change in diagnostic abilities and personalized treatment opportunities has occurred. The integration of imaging, molecular biology and clinical information holds promise to better detect at risk individuals and to personalize treatment.
Living for years at risk of, or in treatment for, a chronic health condition may also influence physical, mental and sexual health, quality of life, general disability and society's health expenses. In Germany, as in other affluent industrialized countries, the above-mentioned chronic disease epidemic accounts for more than a third of the annual health expenditures and almost half of the total hospital payments [2]. It was estimated in 2013 that 500,000 potentially productive life years were lost due to premature death from these conditions in the working-age population (29-59 years of age) [3].
Observational studies have changed practice of medicine and lifestyle for millions of people [4], e.g., the Framingham Heart Study, which for the first time identified risk factors for cardiovascular events [4]. The conditions for those who actually survived these diseases were only a minor part of the agenda. Thus, we established a prospective observational cohort study that can address the impact of information from biological samples, medical examinations and imaging, classic self-reported questionnaire data and their interplay on our understanding of common disease development. Issues of disease development, pathophysiological understanding, artificial intelligence and survivorship constitute the cornerstones of the Hamburg City Health Study.
Moreover, we are establishing a biobank including a wide variety of biomaterials enabling molecular analyses that, in total, will lead to a better understanding of health, disease and survivorship.

Objectives
The Hamburg City Health Study has established a unique research platform with multiple risk assessment, numerous outcomes and imaging examinations in all participants, a sophisticated biobank and interdisciplinary network to address a wide range of questions about more than 30 major chronic diseases (see Box 1 and Fig. 1) and survivorship.
Therefore, the primary aims of the HCHS are to investigate in detail:
• the causes for the development of functional health impairments and major chronic diseases,
• the prognostic factors for surviving chronic diseases and
• the factors supporting life in survivorship of major chronic diseases.

Study design and methods
Hamburg is the second largest city in Germany with 1,830,584 inhabitants (31.12.2017) [5] from all social classes living in 7 districts and 104 urban quarters. The city is mostly urban, but also has some rural areas and a large harbor contributing to the environmental exposures of the population. A total of 839,389 persons are older than 45 years, and in 2017 approximately 90,000 persons moved away.
On an annual basis, a sample census is carried out mapping social conditions of the German population, including a random sample of 1% of the whole population. As shown in Box 2, the Hamburg population, in comparison to the German population, is characterized by 4% more other nationalities; people are 7% more often single, and 14% have a higher education and a higher income. A total of 45,000 inhabitants, aged 45-74 years, are to be included, identified by a random sample from the official inhabitant data file divided into six age and gender strata. The HCHS is a joint interdisciplinary endeavor of physicians and scientists from the University Medical Center Hamburg-Eppendorf. Over 30 of its departments and institutes work together in a unique cooperation at a single study center.
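The stratified draw described above can be illustrated with a minimal sketch. The text states only that the sample is divided into six age and gender strata; the three 10-year age bands, the record fields and the function names below are assumptions made for illustration and are not taken from the study protocol.

```python
import random
from collections import defaultdict

# Assumed strata: three 10-year age bands x two genders = six strata.
AGE_BANDS = [(45, 54), (55, 64), (65, 74)]


def stratum_of(age, gender):
    """Return the (age band, gender) stratum, or None if the age is not eligible."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return (low, high), gender
    return None


def stratified_sample(registry, per_stratum, seed=0):
    """Draw up to `per_stratum` residents at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for person in registry:
        key = stratum_of(person["age"], person["gender"])
        if key is not None:  # residents outside 45-74 are skipped
            strata[key].append(person)
    sample = {}
    for key, members in sorted(strata.items()):
        k = min(per_stratum, len(members))
        sample[key] = rng.sample(members, k)  # sampling without replacement
    return sample


# Toy registry of 600 hypothetical residents aged 40-79.
registry = [
    {"id": i, "age": 40 + i % 40, "gender": "f" if i % 2 else "m"}
    for i in range(600)
]
sample = stratified_sample(registry, per_stratum=10)
```

Sampling within strata rather than from the pooled registry keeps every age and gender group represented at a size fixed in advance.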

Pilot study, timeline and examinations
From May 08, 2015 until January 31, 2016, 1800 volunteers in the age group 18-85 were recruited by a commercial campaign in Hamburg's leading newspaper and took part in order to validate the invitation process and train the study nurses in the examination procedures. Moreover, the manageability of the questionnaires was tested. This pilot study led to minor changes in these aspects. The first participant was enrolled in the main study on February 08, 2016. The participants are contacted by a letter to their home address containing the invitation and an information leaflet providing basic study information. Participants organize their own appointment at the epidemiological study center at the University Medical Center Hamburg-Eppendorf. The appointment begins with a study nurse explaining the study rationale, and participants are asked to sign informed consent covering study participation, the taking of a skin punch biopsy to create induced pluripotent stem cells and either none, one or all of the following options: external, virtual or internal autopsy in the event of death. Finally, participants also sign a consent accepting that both double de-identified and pseudo-anonymized data may be transferred to cooperation partners. Participants are also asked for consent to match their health insurance and pension insurance data with the HCHS dataset. During a 7-h examination, participants undergo validated examinations of different organ systems such as anthropometric measures, resting blood pressure measurements, ECG tracings as well as validated physical examinations. Novel parts include detailed cardiovascular, cognitive and oral health phenotyping, skin screening, pulmonary function test, muscle tests and optical coherence tomography (see Box 3 for an overview of all examinations). At the end of the visit, a letter is handed out containing results of all examinations and standardized recommendations to be followed by the participants.
Following the discussion with the local ethics committee, only clinically relevant results are provided to the participants. Before, during and after the baseline visit, validated self-report questionnaires are filled out asking about lifestyle and environmental conditions, dietary habits, quality of life, physical and sexual dysfunction, professional life, psychosocial context and burden, digital media use, medical and family history, occupational history as well as health care utilization (see Box 4 for an overview of all questionnaires). Validated risk scores are used to identify individuals at risk for coronary artery disease, atrial fibrillation, heart failure, stroke and dementia. After the baseline investigation, these score-positive participants are invited to the imaging examination, which includes an MRI examination of the heart and the thoracic aorta and/or brain depending on the target disease [6]. In order to establish a general control group, 1500 random participants are invited to an MRI as well. Participants at risk for osteoporosis or with suspected bicuspid aortic valve disease, prostate cancer, HPV infection or dementia are advised to seek further medical evaluation.

Laboratory parameters and biobanking
A panel of basic laboratory analyses is performed on the day of the visit in the study center. The assessed markers include sodium, potassium, HbA1c, prostate specific antigen (PSA), creatinine, high-sensitivity C-reactive protein (CRP), glucose, thyroid stimulating hormone (TSH), triglycerides, total cholesterol, HDL-cholesterol, LDL/HDL ratio, and N-terminal pro B-type natriuretic peptide (NTproBNP). Furthermore, a complete blood count is performed. Biomaterials used for biobanking include serum, plasma (EDTA, citrate), genomic deoxyribonucleic acid (DNA), ribonucleic acid (RNA) from whole blood and peripheral blood mononuclear cells (PBMCs), blood cells (erythrocytes, PBMCs), urine, saliva as well as tooth fluid and tonsil swabs. Additionally, from a random subset of study participants, skin punch biopsies are collected, from which fibroblasts are isolated. These fibroblasts will be used for the generation of human induced pluripotent stem cells (hiPSCs) (for an overview of all biomaterials collected and planned measurements see Box 5). Subsequently, the biomaterial will be examined by state-of-the-art and innovative high-throughput approaches including analyses on the different OMICS levels such as genomics, transcriptomics, proteomics and lipidomics profiling. One part of the biobank is used for research projects within the first 6 years; the second part is stored for projects performed during the study's follow-up and in the future.

Box 3 Overview of examinations (excerpt)
• Performance of a broad/comprehensive screening by bodyplethysmography.
• Ophthalmological examination: Assessment of the objective refraction and subjective visual acuity, imaging of the macular and papillary retinal layers including intravasal flow visualization using swept source optical coherence tomography.
• Oral examination: Scaling of periodontitis severity according to the CDC-AAP criteria and whole mouth examination with assessment of caries (DMFT index) and tooth status.
• Ultrasound of the abdominal aorta: Evaluation of the infrarenal abdominal aorta using B-mode and continuous-wave Doppler mode ultrasound. The maximum diameter is measured with the outer-to-outer method. Maximum flow velocity is measured in the infrarenal aorta.
• Ultrasound of the peripheral arteries: B-mode and continuous-wave Doppler mode are used to evaluate flow velocity and plaque burden of the common femoral, superficial femoral and popliteal arteries.
Established clinical practice
• 2D and 3D transthoracic echocardiography: Volumes and function of all four heart chambers by 2D and 3D echocardiography; left ventricular diastolic function; left ventricular mass.
• Ankle-brachial index (ABI) [7,8]: Manual measurement (Doppler ultrasound) of systolic blood pressures at the posterior tibial, the anterior tibial, and the brachial arteries.
• Assessment of tooth status (DMFT index), pocket probing depths (periodontitis), bleeding on probing (gingivitis), dental plaque, condition of hard and soft tissues, assessment of craniomandibular dysfunctions, saliva and sulcus fluid sampling, oral hygiene behavior and oral health literacy.
• Physical activity [9][10][11]: Objective measure of physical activity with Actigraph.
• Standardized neurological examination: Assessment of the National Institutes of Health Stroke Scale.
• Ultrasound of the carotid artery: B-mode and continuous-wave Doppler mode are used to evaluate flow velocity and plaque burden of the carotid artery.
• Ultrasound of the peripheral venous system: B-mode is used for venous compression ultrasound at the femoral and popliteal veins. [20,21]
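As an illustration of how the ankle-brachial index can be derived from the systolic pressures listed among the examinations, the following minimal sketch divides the higher ankle pressure by the higher arm pressure. This convention is a common one but is an assumption here; the exact calculation used in the HCHS follows the cited references and is not specified in this text. The function and parameter names are hypothetical.

```python
def ankle_brachial_index(posterior_tibial, anterior_tibial,
                         brachial_left, brachial_right):
    """Return the ABI for one leg from systolic pressures in mmHg.

    Assumed convention: higher of the two ankle pressures divided by
    the higher of the two brachial pressures.
    """
    ankle = max(posterior_tibial, anterior_tibial)
    arm = max(brachial_left, brachial_right)
    if arm <= 0:
        raise ValueError("brachial pressure must be positive")
    return ankle / arm


# Hypothetical readings for one leg of one participant.
abi = ankle_brachial_index(110, 120, 130, 125)
```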

Follow-up
Following the date of baseline examination, all participants will be contacted by mail with a questionnaire that specifically asks participants to report any major medical event, medication, nutrition and lifestyle changes, physical and mental health, sexual dysfunction and overall quality of life as well as health care use. Participants are also asked to provide discharge letters or any kind of further information on their health such as diagnostic findings or images. This contact takes place on an annual basis for 5 years. An endpoint committee will review all collected information for special endpoints. After 6 years, all participants are invited to undergo the same examinations and procedures as in the baseline visit. On a continuous basis, the study center is in contact with public authorities and the cancer register about vital status, cause of death and cancer incidence, and with involved health and pension insurances to match with related individual routine data.

Statistical analysis plan
The integrity of the collected data in the databases is controlled by detailed, predefined quality control algorithms according to standardized operating procedures (SOP) concerning detection of outliers, logical implausibility, or mistaken identity. Only quality-controlled data will be used for statistical analyses.
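A minimal sketch of two such checks is given below. The actual HCHS quality control algorithms are not described in this text, so the interquartile-range rule, the plausibility rules and the field names are assumptions chosen for illustration only.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < low or v > high]


def implausible(record):
    """Return the logical implausibilities found in one record."""
    problems = []
    if record["diastolic"] >= record["systolic"]:
        problems.append("diastolic >= systolic")
    if not 45 <= record["age"] <= 74:
        problems.append("age outside 45-74")
    return problems


# Hypothetical systolic blood pressure readings (mmHg) with one entry error.
systolic = [118, 122, 125, 130, 127, 121, 119, 133, 124, 310]
flagged = iqr_outliers(systolic)

# Hypothetical record with systolic and diastolic values swapped.
issues = implausible({"age": 52, "systolic": 80, "diastolic": 120})
```

Flagged values would be reviewed rather than deleted automatically, since an extreme measurement can be genuine.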
In the analyses of baseline data, methods for cross-sectional analyses will be applied. Univariate statistics for categorical variables will be presented as counts and proportions, and numeric variables will be presented as means, percentiles and standard deviations. Associations between baseline characteristics will be estimated by means of generalized linear regression models.
For the full cohort as well as for sub-cohorts, time-to-event methods constitute the major approach for identifying and assessing risk factors for mortality and incidence or progression of diseases. Thus, when studying rates of a single event type, e.g., overall mortality, regression models for censored data will be used to simultaneously investigate the effects of risk factors of interest while adjusting for potential confounders.
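To illustrate how censored follow-up enters a time-to-event analysis, the following minimal sketch computes a Kaplan-Meier survival curve. This is the simpler non-parametric estimator, not the regression models for censored data named in the analysis plan, and the data are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per participant
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all participants whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored participants leave the risk set without changing S(t).
        at_risk -= leaving
    return curve


# Hypothetical follow-up in years for six participants.
curve = kaplan_meier(times=[1, 2, 2, 3, 4, 5], events=[1, 1, 0, 1, 0, 1])
```

Censored participants contribute to the risk set up to their last contact, which is why they cannot simply be dropped from the analysis.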
In all analyses, effect estimates will be presented both in relative terms as prevalence or risk ratios, and in absolute terms as means, rates or risks, the latter to assess public health implications. All effect estimates and statistical summaries will be presented with 95% confidence intervals and, when appropriate, adjusted for multiple testing. For disease-specific mortality or endpoints other than death, competing risks will be accounted for. For repeated measurements in the second examination, regression models for longitudinal data will be employed. In all analyses, observations may be correlated either in time (repeated measures) or between subjects (e.g. relatives). Appropriate statistical methods will be used to account for correlated observations, e.g. by means of mixed effect models, or in marginal models by means of generalized estimating equations (GEE). Special attention will be paid to potentially informative drop-out or failure to obtain further measurements due to the occurrence of a specific event. These issues will be addressed by joint models for longitudinal and time-to-event data. In cases of a substantial amount of missingness in dependent or independent variables, multiple imputation will be incorporated in the analysis, as a supplement to the complete case analysis. To ensure transparency and reproducibility of results, all statistical methods and code will be described and stored centrally as a supplement to study protocols. This material will be available to other researchers upon request.
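The difference between relative and absolute effect measures can be shown with a minimal sketch for an unadjusted two-group comparison. It uses the standard Wald interval on the log scale for the risk ratio; the counts are invented, the function name is hypothetical, and the adjusted models described above are not reproduced here.

```python
import math


def risk_ratio_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Return risk difference, risk ratio and its Wald 95% CI (log scale)."""
    risk1 = events_exposed / n_exposed
    risk0 = events_unexposed / n_unexposed
    rr = risk1 / risk0
    # Standard error of log(RR) for two independent binomial samples.
    se = math.sqrt(1 / events_exposed - 1 / n_exposed
                   + 1 / events_unexposed - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return {
        "risk_difference": risk1 - risk0,  # absolute effect
        "risk_ratio": rr,                  # relative effect
        "ci": (lower, upper),
    }


# Hypothetical counts: 30 of 200 exposed and 15 of 200 unexposed with the outcome.
result = risk_ratio_ci(30, 200, 15, 200)
```

In this invented example the risk is doubled in relative terms, while the absolute difference is 7.5 percentage points; the latter is the quantity that reflects public health impact.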

Ethical concerns
During the study conception, the local ethics committee of the Landesärztekammer Hamburg (State of Hamburg Chamber of Medical Practitioners, PV5131) was consulted and its approval for the study protocol as well as the approval by the Data Protection Commissioner of the University Medical Center Hamburg-Eppendorf and the Data Protection Commissioner of the Free and Hanseatic City of Hamburg were obtained. The study has been registered at ClinicalTrials.gov (NCT03934957). The procedures set out in this study, pertaining to the conduct, evaluation, and documentation, are designed to ensure that all persons involved in the study abide by Good Clinical Practice (GCP), Good Epidemiological Practice (GEP) and the ethical principles described in the current revision of the Declaration of Helsinki. The study will be carried out in keeping with local legal and regulatory requirements. The requirements of the GCP and GEP regulations will be adhered to. In order to be admitted to the HCHS, all participants are to consent to participate only after the nature and scope of the study have been explained to and understood by them. Written informed consent is obtained from all participants. The examinations were chosen because of the non-invasive nature of acquisition and standardized testing to assess intermediate phenotypes of the different diseases. Well-characterized intermediate phenotypes are of great value for screening, patient monitoring and in the context of clinical trials. At the end of the baseline and MRI examinations, all participants receive a standardized report of all relevant clinical results with a possible recommendation of referral to the general practitioner. Recommendations for all findings were defined before study start and standardized. Findings that need immediate clinical referral are defined as well and reported directly to the participants.
If required, participants are also accompanied to the emergency department of the University Medical Center Hamburg-Eppendorf by a member of the study staff.

Data protection concerns
During the study conception, the Data Protection Commissioner of the University Medical Center Hamburg-Eppendorf and the Data Protection Commissioner of the Free and Hanseatic City of Hamburg were consulted and their approval for the study protocol was obtained (D4/17.06-22). The requirements of the Federal Data Protection Law (Bundesdatenschutzgesetz), the European General Data Protection Regulation and the law of Hamburg Hospitals (Hamburger Krankenhausgesetz) will be adhered to. Subjects will be identified solely by means of an individual identification code, and data are only collected, stored and analysed in pseudo-anonymized form. To enhance data confidentiality, data, lab samples and genetic results have different identification numbers. The identification numbers will change between data collection, storage and disclosure. To avoid mistakes in assigning the right identification number, the pseudo-anonymization is conducted electronically. All data will be stored at the study centre and all analytical activity will take place at dedicated workstations situated at the study centre. Extensive surveillance and logging of all activity will be carried out.
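One way to give data, lab samples and genetic results different identification numbers is to derive a separate code per domain from a secret key. The sketch below uses a keyed hash (HMAC-SHA256) for this purpose. The mechanism actually used in the HCHS is not described in this text; the function name, the domain labels, the key and the code length are all assumptions for illustration.

```python
import hashlib
import hmac


def pseudonym(participant_id, domain, secret_key):
    """Derive a domain-specific code that cannot be reversed without the key."""
    message = f"{domain}:{participant_id}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; an assumed code length


# Hypothetical key; in practice it would be held only by a trusted party.
key = b"example-key-held-by-a-trust-centre"

codes = {
    domain: pseudonym("participant-000123", domain, key)
    for domain in ("questionnaire", "lab", "genetics")
}
```

Because each domain yields a different code, records from two domains cannot be linked without access to the key, and generating the codes electronically avoids manual assignment errors.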

Strengths and limitations
Most importantly, the HCHS is a prospective population-based cohort study, which enables cause and effect analyses investigating major risk factors for a number of symptoms and chronic diseases. In addition, the extensive annual follow-up also contributes to the advantages of the HCHS. Further, the high number of examinations, including a range of novel approaches in different disciplines, means that data from the HCHS may unravel hitherto unknown causal associations. In line with this consideration, the extensive analyses planned with the use of biological samples giving basic genomic information in combination with phenotypic data will provide a rich data source for advanced analyses. The high number of participants ensures that even relatively rare outcomes may be investigated. Furthermore, it is the ambition to link the data from the HCHS to socioeconomic information and individual records indicating information about diseases, prescribed medications, in- and outpatient health care use as well as sick leave etc. The cohort recruitment means that prevalent cases of most chronic diseases will be enrolled, which opens up early cross-sectional analyses investigating associations leading to hypothesis-driven studies when the first 10,000 participants are included and when the cohort is complete. In line with this consideration, one may also point to the possibility of describing the distribution of diseases associated with patterns of several factors, e.g. socioeconomic status in diabetes patients, nutrition and lifestyle factors in dementia or the association between skin disorder and dental status. The design also offers an opportunity to conduct more complicated disease trajectory analyses using patterns of symptoms and diseases as both causes and effects. It is also of vital importance that this study offers an opportunity to investigate determinants of survival and survivorship in accordance with the aims of the study.
Even prospective cohort studies will have limitations due to selection bias in participation and recall bias. This study mostly relies on the recall of participants as the method of obtaining information about central outcomes during the 6 years of follow-up. This limitation will be addressed by obtaining data from health insurance and pension funds. Compared to cohort studies investigating risk factors of diseases characterized by high incidence rates, the waiting time for incident outcomes in diseases having low incidence rates is longer. Conducting a multidimensional study also reduces the number of participants with complete data, and as funding is limited, this problem represents a true limitation of the data set, which in the end will be less than 100% complete. From a more technical point of view, some data may not be collected due to participants' wishes, lack of staff or technical malfunction of equipment. To address this limitation, standardized quality controls are performed weekly to detect missing data and to ensure that no systematic processes are at work. Birthday cards, Christmas cards and annual newsletters will remind the participants to inform the study centre in the event of a medical endpoint. A number of standardized operating procedures for recruitment, collection and storage of data, quality control and analyses will be established. To avoid endpoint misclassification, an interdisciplinary endpoint committee will render an expert opinion on the basis of discharge letters or further information. The integrity of the collected data in the databases is controlled in batches by detailed, predefined quality control algorithms according to standardized operating procedures concerning detection of outliers, logical implausibility, or mistaken identity. Only quality-controlled data are used for statistical analyses.

Conclusion
In the future, data from the HCHS will strongly contribute to our knowledge about risk factors for and prognostic factors in major chronic diseases, survival and survivorship. It will be a unique source due to the combination of self-reported data, detailed imaging data, a vast amount of biological information and unbiased administrative data established independently of the hypotheses of the study. It is the aspiration that the inclusion of novel aspects in all exposure assessment methods, concurrently with well-established, traditional epidemiological tools across all exposures and outcomes, will help in obtaining information of such quality that it can feed directly into public health policy with regard to prevention and survivorship-related aspects. In line with this aspiration, the use of the data in real-life clinical practice is also one of the main intentions of this study.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.