The BELLA study is the module on mental health and HRQoL within the German Health Interview and Examination Survey for Children and Adolescents (KiGGS). Both studies have been conducted in close cooperation nationwide since 2003 and provide representative cross-sectional health- and mental health-related data on German children and adolescents as well as longitudinal data following participants into adulthood. The BELLA study uses a subsample of KiGGS. Participants were randomly drawn from the KiGGS sample and assigned to the BELLA study. The BELLA baseline assessment took place between 2003 and 2006 (n = 2863 children and adolescents aged 7–17 years) and was followed up at four measurement points, i.e., the 1-year (2004–2007), 2-year (2005–2008), 6-year (2009–2012), and most recent 11-year (2014–2017) follow-ups. New participants were included at the last two follow-ups to re-establish representative cross-sectional samples of children and adolescents and to compensate for loss due to dropout. Detailed information on the design of the BELLA study is presented in Fig. 1 (including a small preschool sample at BELLA baseline). Detailed descriptions of the KiGGS study [25, 26] and of the baseline assessment and first three follow-ups of the BELLA study have been published [19, 27].
Only participants of KiGGS wave 2 who had agreed in KiGGS to be contacted for the BELLA study were invited to the 11-year follow-up. A letter was sent out including study information and a form to gather written informed consent. All participants and/or their parents were informed about the study procedures, told about the measures taken to protect their data and informed that participation was voluntary. Informed consent was gathered from the parents of children and adolescents younger than 18 years and from adolescents and young adults aged at least 14 years (adults could give their informed consent online as well). Data assessment was conducted online for the first time in the BELLA study; a paper version of the questionnaire was provided only if participants had no access to the internet or were not willing to participate online (previous data assessments had been conducted using paper-and-pencil questionnaires and computer-assisted telephone interviews). Parent reports were gathered on children aged 7–13 years, and self-reports were gathered from children, adolescents and young adults aged 11–31 years. The 11-year follow-up of the BELLA study was approved by the Federal Commissioner for Data Protection and received a positive vote from the Ethics Committee of Hamburg’s Chamber of Psychotherapists (on 24 September 2014).
Participation in the most recent 11-year follow-up of the BELLA study required participation in KiGGS wave 2. The sampling for KiGGS wave 2 was conducted in two steps. First, the cross-sectional sampling comprised children and adolescents who were randomly selected from the official residency registries of 167 cities and municipalities in Germany. Second, for the longitudinal sampling in the KiGGS study, only participants who took part in the baseline assessment were followed up at KiGGS wave 2; KiGGS baseline participants could be included in KiGGS wave 2 if they agreed to participate. Participants were excluded from the sample as quality-neutral losses when they did not belong to the target population (e.g., invalid address, moved to a foreign country, deceased) or if communication with parents was not possible due to language barriers. The numbers of invited and participating children and adolescents across all measurement points of the BELLA study are presented in Fig. 2. For the 11-year follow-up of the BELLA study, participants of the BELLA baseline assessment were re-invited for the baseline cohort sample of the BELLA study. In addition, new participants were drawn from a random subsample of the KiGGS wave 2 sample and included in the cross-sectional sample to allow representative cross-sectional analyses for children and adolescents aged 7–17 years. Please note that participants who took part in the BELLA study for the first time at the 6-year follow-up were not systematically re-invited for the 11-year follow-up, but the sampling procedure conducted in the KiGGS study resulted in a corresponding subsample in the BELLA study (see Fig. 2 and “Participants”). Out of the KiGGS wave 2 participants with an assignment to the BELLA study (n = 6370), 65.1% (n = 4148) agreed to be contacted by the BELLA study (baseline cohort sample: 73.7%; cross-sectional sample: 53.2%).
Finally, n = 3492 children, adolescents and young adults participated in the 11-year follow-up of the BELLA study.
Response and cooperation rates
Response rates (RRs) and cooperation rates (COORs) were calculated according to the formulas RR2 and COOR2 provided by the American Association for Public Opinion Research (AAPOR). Both rates were calculated twice: once referring to the KiGGS participants with an assignment to the BELLA study (n = 6370) and once referring to those KiGGS participants who agreed to be contacted by the BELLA study (n = 4148). Focusing on the latter sample, we calculated the response rate as the number of cases with valid survey data (n = 3492) divided by all cases we tried to contact (i.e., those who participated, those who refused to participate, those who did not react to our invitation and could not be reached by phone, and those whose contact information proved invalid according to information returned to us); to calculate the corresponding cooperation rate, we divided the number of cases with valid survey data by the number of cases we actually reached (i.e., those who participated and those who refused to participate). The rates for the sample of KiGGS participants with an assignment to the BELLA study were calculated accordingly (using the corresponding numbers provided to us by the KiGGS study team). Of all KiGGS wave 2 participants assigned to the BELLA study (n = 6370), the (minimum) response rate was 56.5% and the cooperation rate was 68.7%. Of the families who had participated in KiGGS wave 2 and agreed to be contacted again by the BELLA study (n = 4148), n = 3492 finally participated in the 11-year follow-up with a (minimum) response rate of 84.5% and a cooperation rate of 94.5%.
Response analyses for participants and non-participants
Of those who agreed to be contacted by the BELLA study (n = 4148), n = 656 did not participate (15.8%). The main reasons were unavailability (66.2%, n = 398), active refusal (33.8%, n = 203), exclusion due to data quality issues (6.1%, n = 40) and quality-neutral losses (i.e., letter undeliverable; 2.3%, n = 15); the percentages for unavailability and active refusal refer to the n = 601 cases in these two categories, whereas the remaining two percentages refer to all n = 656 non-participants. Among those young people who actively refused study participation, the main reasons were no interest (36.5%), no time (19.2%), and other reasons, e.g., negative experiences with studies or privacy issues (14.4%); approximately one-quarter of those who actively refused participation stated no reasons for refusal (23.7%), and a few people immediately hung up the phone when called to remind them of the study (7.4%).
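The reported counts can be related to the response and cooperation rates for the n = 4148 sample with a short calculation. The following is only a sketch: the variable names are illustrative, and the choice of denominators is an assumption made because it reproduces the reported 84.5% and 94.5%. In particular, the description of the response rate lists cases with invalid contact information in the denominator, but the reported value is reproduced only when the 15 quality-neutral losses are left out.

```python
def percentage(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Counts taken from the text.
participated = 3492       # cases with valid survey data
refused = 203             # active refusals
unavailable = 398         # no reaction to the invitation, not reached by phone
excluded_quality = 40     # excluded due to data quality issues
quality_neutral = 15      # e.g., letter undeliverable

non_participants = refused + unavailable + excluded_quality + quality_neutral
agreed_to_contact = participated + non_participants

# Assumption: quality-neutral losses are removed from the response-rate
# denominator; this reading is not stated explicitly in the text.
response_rate = percentage(participated, agreed_to_contact - quality_neutral)

# Cooperation rate: participants relative to all cases actually reached
# (participants plus active refusals).
cooperation_rate = percentage(participated, participated + refused)

print(non_participants, agreed_to_contact, response_rate, cooperation_rate)
```

The same function also reproduces the share of non-participants (656 of 4148, i.e., 15.8%).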
Differences between the population of children and adolescents in Germany and KiGGS wave 2 participants are described elsewhere [26, 28]. We compared responders and non-responders of the KiGGS wave 2 participants with an assignment to the BELLA study (n = 6370). For this purpose, we predicted participation in the BELLA study by means of logistic regression analyses using sociodemographic (i.e., gender, age, urbanization, region, migration background, and SES) and health- and mental health-related variables (i.e., self- and parent-reported mental health problems, general health, physical health, impairments due to mental and physical health problems, and mental health care use in the last 12 months). To interpret our results, we followed recommendations suggesting that OR = 1.68, 3.47, and 6.71 are equivalent to Cohen’s d = 0.2 (small), 0.5 (medium), and 0.8 (large), respectively. Only for age did we find a small effect, indicating that participation in the BELLA study was more likely in those aged 18–31 years than in those aged 14–17 years (OR = 1.73, 95% CI 1.51–1.97). For the remaining sociodemographic, health- and mental health-related variables, any effects were negligible.
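The interpretation rule for odds ratios can be written as a small helper. This is a minimal sketch using the thresholds quoted above; the function name, the label "negligible" for values below 1.68, and the symmetric handling of odds ratios below 1 (by inverting them) are assumptions for illustration.

```python
def classify_odds_ratio(odds_ratio: float) -> str:
    """Map an odds ratio to an effect-size label using the quoted thresholds."""
    if odds_ratio <= 0:
        raise ValueError("An odds ratio must be positive.")
    # Assumption: effects below 1 are treated symmetrically by inverting them.
    magnitude = odds_ratio if odds_ratio >= 1 else 1 / odds_ratio
    if magnitude >= 6.71:
        return "large"
    if magnitude >= 3.47:
        return "medium"
    if magnitude >= 1.68:
        return "small"
    return "negligible"


# Age effect on participation reported in the text.
print(classify_odds_ratio(1.73))
```

Under this rule, the reported age effect (OR = 1.73) falls into the small category, in line with the interpretation given in the text.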
For the cross-sectional sample (at the 11-year follow-up), a weighting procedure was applied to ensure adaptation to the KiGGS wave 2 population. The KiGGS wave 2 cross-sectional sample was itself weighted to be representative of the population in Germany, taking into account the survey design (selection of a particular sample point and selection of participants within the sample point) and population distributions regarding age, gender, federal state (as of 31 December 2015) and foreigner status (German nationality yes/no; as of 31 December 2014). For the BELLA cross-sectional sample, a weighting variable was calculated in two steps: (1) the inverse participation probability multiplied by the KiGGS wave 2 weight was calculated based on the best participation probability model for participation in the BELLA study considering age, gender, citizenship of the mother, SES, current smoking of the mother, community size, highest education status of the parents, and apartment size; (2) an adaptation weight was calculated to ensure comparability with the abovementioned population distributions covering four levels, namely, (i) age × gender, (ii) region (West, Berlin, East) × age group × education status of the parents, (iii) federal state × gender × age group, and (iv) region (West incl. Berlin vs. East) × foreigner status.
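The two weighting steps can be illustrated with a small sketch. The data, the two margins, and all names below are hypothetical, and the use of iterative proportional fitting (raking) for step 2 is an assumption; the text does not specify the algorithm used to compute the adaptation weight, and the four adjustment levels of the study are not reproduced here.

```python
from collections import defaultdict


def step1_weights(design_weights, participation_probs):
    """Step 1: design weight multiplied by the inverse participation probability."""
    return [w / p for w, p in zip(design_weights, participation_probs)]


def weighted_totals(records, weights, variable):
    """Sum of weights per category of one variable."""
    totals = defaultdict(float)
    for record, weight in zip(records, weights):
        totals[record[variable]] += weight
    return dict(totals)


def rake(records, weights, targets, iterations=50):
    """Step 2 (assumed): adjust weights until weighted margins match the targets."""
    weights = list(weights)
    for _ in range(iterations):
        for variable, target in targets.items():
            totals = weighted_totals(records, weights, variable)
            weights = [
                w * target[rec[variable]] / totals[rec[variable]]
                for rec, w in zip(records, weights)
            ]
    return weights


# Hypothetical toy data: four respondents, two adjustment margins.
records = [
    {"gender": "f", "region": "West"},
    {"gender": "f", "region": "East"},
    {"gender": "m", "region": "West"},
    {"gender": "m", "region": "East"},
]
design_weights = [1.0, 1.2, 0.9, 1.1]        # stand-in for KiGGS wave 2 weights
participation_probs = [0.8, 0.5, 0.6, 0.4]   # stand-in for model predictions
targets = {
    "gender": {"f": 49.0, "m": 51.0},
    "region": {"West": 80.0, "East": 20.0},
}

final_weights = rake(records, step1_weights(design_weights, participation_probs), targets)
print(weighted_totals(records, final_weights, "gender"))
print(weighted_totals(records, final_weights, "region"))
```

Respondents with a low modelled participation probability receive a larger weight in step 1, and step 2 rescales all weights so that the weighted sample matches the target margins.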
Dropout analyses for the 11-year follow-up
Regression analyses were conducted to examine systematic dropout at the 11-year follow-up for participants of the BELLA baseline (n = 2863) using sociodemographic and health- and mental health-related variables. Small effects found in the baseline sample indicated that dropout at the 11-year follow-up was more likely among those with a lower SES compared with those with a moderate SES (OR = 2.66, 95% CI 1.99–3.56) and among those with non-German citizenship (OR = 2.35, 95% CI 1.54–3.59). For the remaining sociodemographic, health- and mental health-related variables, effects were negligible, if found at all.
Based on the 11-year follow-up, the BELLA sample can be differentiated into three main samples (see Fig. 2): first, a cross-sectional sample (n = 1580) including children aged 7–17 years, who were randomly selected for each age category and represent the German population for this age group; second, the baseline cohort sample of n = 973 participants of BELLA baseline (34.0% out of n = 2863 baseline participants); third, a total sample of all participants at the 11-year follow-up of the BELLA study (n = 3492), which also includes those who had already participated at the 6-year follow-up, but not at previous measurement points of the BELLA study (n = 1050; 43.6% out of n = 2411 new 6-year follow-up participants). The sampling procedure conducted by the KiGGS study, in combination with the fact that some villages used as sample points in the KiGGS study had only very small numbers of inhabitants, resulted in some overlap between these samples: one individual from the baseline cohort sample and n = 110 individuals who had participated for the first time at the 6-year follow-up of the BELLA study were included in the cross-sectional sample as well. The total sample of the 11-year follow-up of the BELLA study thus includes n = 3492 individuals (3603 cases summed over all three samples minus 111 cases counted twice).
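The sample sizes above can be checked with a few lines of arithmetic. This is merely a sketch with illustrative variable names; all counts are taken from the text.

```python
cross_sectional = 1580      # cross-sectional sample, aged 7-17 years
baseline_cohort = 973       # BELLA baseline participants retained
six_year_entrants = 1050    # first participated at the 6-year follow-up

# Individuals who belong to the cross-sectional sample and one other sample.
overlap = 1 + 110

cases_summed = cross_sectional + baseline_cohort + six_year_entrants
total_sample = cases_summed - overlap

# Retention relative to the original group sizes, in percent.
baseline_retention = round(100 * baseline_cohort / 2863, 1)
six_year_retention = round(100 * six_year_entrants / 2411, 1)

print(cases_summed, total_sample, baseline_retention, six_year_retention)
```

Subtracting the 111 doubly counted individuals from the 3603 summed cases yields the total of 3492 participants.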
Details on the sociodemographic characteristics of the BELLA cross-sectional sample (weighted and unweighted data), the BELLA baseline cohort sample (unweighted) and the BELLA total sample (unweighted) at the 11-year follow-up are presented in Table 1. The sociodemographic characteristics, region, migration background, and SES were almost equally distributed across all unweighted samples at the 11-year follow-up (please note, SES was measured in children and adolescents younger than 18 years based on information on income, profession and education of the parents). Participants in the BELLA baseline cohort sample were older (M = 23.17, SD = 3.32) at the 11-year follow-up compared to those in the BELLA total sample (M = 17.33, SD = 5.83) and those in the BELLA cross-sectional sample (M = 13.02, SD = 2.94). The age of participants at each measurement point of the BELLA study can be found in the Supplementary Material (File 1, Table S1).
For the 11-year follow-up of the BELLA study, data assessment was conducted mainly online; a paper version of the questionnaire was provided only if participants refused to fill out the online questionnaire or had no access to the internet. Self-reported data were collected from children and adolescents aged 11 years and older, and parent-reported data were additionally gathered for children younger than 14 years. The BELLA study used standardised instruments where available (complemented by self-developed measures) to assess different aspects of health, HRQoL, mental health problems and mental health care utilisation. An overview of the instruments used across all measurement points of the BELLA study is provided in the Supplementary Material (File 1, Table S2). In addition, a large number of variables collected by the KiGGS study as indicators of somatic health (e.g., body mass index, blood pressure, laboratory parameters), health behaviour (e.g., nutrition, sports activities), and sociodemographic determinants (e.g., SES, migration background) are available and can be linked to mental health indicators from the BELLA study [31, 32]. We describe key measures administered at the 11-year follow-up in the following sections. Instruments used for analyses in the present article are mentioned again in the data analysis and results section (including information on their internal consistency in the corresponding samples under analysis).
Health and health-related quality of life
General health was assessed using the general health item (GHI) in self- and parent reports (“In general, how would you rate your/your child’s health?”) with a five-point response scale (1 = “excellent”, 2 = “very good”, 3 = “good”, 4 = “fair”, 5 = “poor”). The GHI is well-established and recommended by the WHO for use in health surveys. To measure self-reported HRQoL, the Kids-CAT was administered for the first time in a large population-based epidemiological sample. The Kids-CAT tool, developed and validated by the authors of this article, measures HRQoL in healthy and ill children and adolescents based on five item banks on physical well-being, psychological well-being, parent relations, social support and peers, and school well-being [34, 35]. Acceptable to good internal consistency was found for the Kids-CAT dimensions in its validation study (mean standard errors of measurement ranged from 0.38 to 0.49, corresponding to Cronbach’s alphas from 0.76 to 0.86). The IRT-based measurement selects and administers the most informative items for each participant based on his or her location on the underlying latent trait. Therefore, the Kids-CAT administers fewer items while being as precise as traditional paper–pencil questionnaires. It has a child-friendly design and was easily accessible via the BELLA online questionnaire. For the first time, we also integrated a static proxy version of the most powerful Kids-CAT items to survey the parents’ perspective at the 11-year follow-up. Moreover, the well-established self- and parent-reported KIDSCREEN-27, including the KIDSCREEN-10 index with a five-point response scale (0 = “not at all” to 4 = “extremely” or 0 = “never” to 4 = “always”), the SF-12 questionnaire, and the SF-36 questionnaire were administered to measure HRQoL.
Furthermore, validated short questionnaires of the item banks developed by the Patient-Reported Outcome Measurement Information System (PROMIS®) initiative [40, 41] were used to assess subjective well-being, family relations, physical activity, relations with peers, and global health. Good to mainly excellent internal consistency was reported for the original PROMIS scales [42–49]. Within the scope of the BELLA study, the PROMIS questionnaires were translated into German (further publications on the translations are to follow).
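The correspondence between the standard errors of measurement and the Cronbach’s alphas reported for the Kids-CAT can be reproduced with a simple conversion. The following sketch assumes that the latent trait is scaled to unit variance, so that reliability equals one minus the squared standard error; this assumption and the function name are not taken from the validation study.

```python
def reliability_from_sem(sem: float, trait_variance: float = 1.0) -> float:
    """Reliability implied by a standard error of measurement.

    Assumption: reliability = 1 - SEM^2 / variance of the latent trait,
    with the trait variance fixed at 1 by default.
    """
    return 1 - sem ** 2 / trait_variance


for sem in (0.38, 0.49):
    print(sem, round(reliability_from_sem(sem), 2))
```

Under this assumption, standard errors of 0.38 and 0.49 correspond to reliabilities of about 0.86 and 0.76, matching the range reported above.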
Mental health problems
At all measurement points, parent- and self-reports on mental health problems were collected with the Strengths and Difficulties Questionnaire (SDQ), accompanied by the 5-item SDQ Impact supplement, which asks about difficulties that upset or distress the child and about interference with home life, friendships, classroom learning, and leisure activities on a four-point response scale (0 = “not at all”, 1 = “only a little”, 2 = “quite a lot”, 3 = “a great deal”) [51, 52]. For respondents aged 18 years and older, the Composite International Diagnostic-Screener (CID-S) and the Symptom-Check List 9-item Short version (SCL-S-9) were used at the 11-year follow-up. To survey symptoms of depression, the Center for Epidemiological Studies Depression Scale for Children and Adolescents (CES-DC) and the Patient Health Questionnaire-9 for Young Adults (PHQ; [56, 57]) were used. Furthermore, depressive symptoms were assessed using the German translations of the PROMIS Depression Short Forms across all age groups [58, 59]. The SCL-S-9, the CES-DC and the PHQ showed good to excellent internal consistency in former studies (Cronbach’s alphas ≥ 0.80 and 0.90, respectively; [54, 60, 61]).
Mental health care utilisation
Mental health service utilisation was assessed by surveying the psychiatric/sociopsychiatric/psychotherapeutic, psychological, or sociopedagogic care that respondents had used and how satisfied they had been with their treatment. Additionally, we assessed possible treatment needs and barriers that prevented people from accessing treatment.
Age- and gender-specific effects on general health and health-related quality of life over time
We investigated age- and gender-specific effects on self- and parent-reports of general health measured with the GHI and on HRQoL assessed with the KIDSCREEN-10 index using all available data across the measurement points of the KiGGS and BELLA studies. For the analyses, we recoded the response options of the GHI so that higher values indicated better general health. We calculated T values (M = 50; SD = 10) for the KIDSCREEN-10 index based on Rasch person parameters of the European norm sample, with higher values indicating better HRQoL. Individual growth modelling was used for the data analyses: we calculated linear mixed models, which allow for repeated measurements, using full-information maximum likelihood (FIML). Each model included age (at baseline), gender, the age by gender interaction, a linear time variable (the interval in years between baseline and the measurement point in question), and a squared and a cubic time variable as fixed effects; on the level of random effects, a subject identification variable was included as random intercept and linear time as random slope. For each model, age was centred using the group mean at baseline (across all participants with valid baseline scores; M_age,t0 valid); for parent-reported HRQoL, the mean age from the 1-year follow-up was used (M_age,t1 valid) since no corresponding baseline data were gathered. We created graphs to illustrate gender-specific trajectories across age based on data from all measurement points using estimated marginal means from the corresponding models. In preliminary analyses, we investigated potential cohort effects for each outcome. Random intercept models served to investigate whether the year of birth moderated the relationship between age (at each measurement point) and the outcome in question.
Since information criteria and the χ² difference test depend on sample size, we used McFadden’s R² to evaluate the strength of potential cohort effects, comparing models with and without the interaction term of interest.
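The data preparation steps and the effect measure described in this section can be sketched as follows. The analyses themselves were run in SPSS; the Python helpers below are only an illustration. All function and variable names, the 0/1 coding of gender, and the norm values passed to the T-value transformation are assumptions, and McFadden’s R² is shown in its common form of one minus the ratio of two log-likelihoods, with the reference model taken to be the model without the interaction term.

```python
def recode_ghi(response: int) -> int:
    """Reverse the GHI (1 = excellent ... 5 = poor) so that higher is better."""
    if response not in (1, 2, 3, 4, 5):
        raise ValueError("GHI responses range from 1 to 5.")
    return 6 - response


def t_value(person_parameter: float, norm_mean: float, norm_sd: float) -> float:
    """Transform a person parameter to a T value (M = 50, SD = 10)."""
    return 50 + 10 * (person_parameter - norm_mean) / norm_sd


def fixed_effect_predictors(age_at_baseline: float, mean_baseline_age: float,
                            gender: int, years_since_baseline: float) -> dict:
    """Build the fixed-effect terms of one observation in the growth model."""
    age_centred = age_at_baseline - mean_baseline_age
    time = years_since_baseline
    return {
        "age_c": age_centred,
        "gender": gender,                      # assumed 0/1 coding
        "age_c_x_gender": age_centred * gender,
        "time": time,
        "time_sq": time ** 2,
        "time_cu": time ** 3,
    }


def mcfadden_r2(loglik_model: float, loglik_reference: float) -> float:
    """McFadden's R^2 of a model relative to a reference model."""
    return 1 - loglik_model / loglik_reference


# Hypothetical observation: baseline age 12, sample mean 10, measured 2 years later.
print(fixed_effect_predictors(12.0, 10.0, 1, 2.0))
print(recode_ghi(1), t_value(1.0, 0.5, 1.0), mcfadden_r2(-90.0, -100.0))
```

Centring age and entering time as linear, squared and cubic terms keeps the intercept interpretable as the expected outcome at baseline for a participant of average baseline age.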
Mental health problems at baseline and related outcomes at 6-year and 11-year follow-ups
To examine the association between self- and parent-reported mental health problems (measured with the SDQ and SDQ Impact) reported at baseline and health-related outcomes at the 6-year and 11-year follow-ups, we developed univariate general linear models for each perspective (self- and parent-reports at baseline), outcome (self-reported general health, mental health, physical health) and measurement point (6-year and 11-year follow-ups). We included only predictors measured at baseline, i.e., mental health problems, impairment due to mental health problems (none, moderate, high), gender, age, SES, and the interaction of gender by age. Regarding health-related outcomes measured at the 6-year and/or 11-year follow-up, we used the first item of the SF-36 to assess general health and transformed the item to a scale from 0 to 100, with higher scores indicating better general health; furthermore, the mental and physical health components of the SF-36 were used and standardised to a mean of 50, with a score above 50 representing better than average function and a score below 50 representing poorer than average function. Effect sizes were calculated using partial eta squared (η² = 0.01 indicates a small, η² = 0.06 a medium, and η² = 0.14 a large effect).
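The outcome transformation and the effect size measure can be illustrated as follows. This is a sketch under assumptions: the linear rescaling of the five-point item to 0–100 is only one possible transformation (the official SF-36 scoring may treat this item differently), partial eta squared is computed from sums of squares in its usual form, and all names and example values are hypothetical.

```python
def rescale_item(response: int, n_options: int = 5) -> float:
    """Linearly rescale an item coded 1 (best) ... n (worst) to 0-100.

    Assumption: simple linear rescaling with higher scores = better health.
    """
    if not 1 <= response <= n_options:
        raise ValueError("Response outside the item's range.")
    return (n_options - response) / (n_options - 1) * 100


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def classify_eta_squared(eta_squared: float) -> str:
    """Label an effect using the thresholds quoted in the text."""
    if eta_squared >= 0.14:
        return "large"
    if eta_squared >= 0.06:
        return "medium"
    if eta_squared >= 0.01:
        return "small"
    return "negligible"


# Hypothetical sums of squares for one predictor.
eta = partial_eta_squared(ss_effect=7.0, ss_error=93.0)
print(rescale_item(2), eta, classify_eta_squared(eta))
```

Because partial eta squared relates the effect only to the error variance, values for different predictors in the same model do not necessarily sum to the total explained variance.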
Mental health care utilisation
Descriptive analyses were conducted on mental health care use and barriers to mental health care use.
All analyses were conducted with IBM SPSS 26.