Introduction

The onset of mental disorders mostly occurs in adolescence and young adulthood. An estimated 62.5% of disorders begin before the age of 25, with a peak age at 14.5 years [1]. These disorders are associated with negative outcomes on educational, occupational, and social domains [2,3,4,5,6,7,8]. Compared to those without psychiatric problems, individuals with a psychiatric disorder in their youth are nine times more likely to face negative outcomes on these domains in the transition to adulthood. For youth with subthreshold problems this is five times [9]. Early onset of psychiatric problems is associated with high persistence and negative prognosis [8, 10,11,12].

The etiology of psychiatric disorders is complex. Studies have shown that psychiatric disorders have a multifactorial etiology and that risk factors are pleiotropic [13,14,15]. Particularly in adolescence, the symptoms in the early stages of disorders tend to be non-specific [16, 17]. Subsequently, there are high comorbidity rates between psychiatric disorders as well as heterotypic continuity over time [11, 18,19,20]. This underlines the importance of a transdiagnostic approach in research [21].

To advance a more preventive and transdiagnostic approach in psychiatry, more epidemiological knowledge, especially on individual and environmental exposures, is necessary [7]. This knowledge could be used in individual prediction models for targeted prevention strategies [22, 23]. Considerable effort has been made to study the etiology of psychopathology, but this has been complicated by selective drop-out bias in general population studies, referral bias in patient-based samples, and a focus on a specific diagnosis or inheritance pattern in familial loading studies [24]. More accurate, large-sample deep-phenotype data in a high-risk population is likely to overcome these difficulties [16, 21].

The design of the iBerry (Investigating Behavioral and Emotional Risk in Rotterdam Youth) Study follows a cross-diagnostic approach that cuts across traditional diagnostic boundaries to examine the etiology and course of psychopathology instead of maintaining nosological boundaries with a focus on a single diagnostic category. The main aim of the iBerry Study is to examine the developmental course of psychiatric disorders and associated risk factors to contribute to the development of preventive interventions. The current paper discusses the design and protocol of the first follow-up measurement and gives a cohort profile update, details on the (non) response, and the prevalence of adolescent and parental psychopathology. Furthermore, the long-term effectiveness of using a screening questionnaire to select a cohort oversampled on their self-reported emotional and behavioral problems is discussed.

Study design

The iBerry Study is a cohort study of adolescents from the general population who were oversampled on their self-reported emotional and behavioral problems. The study is conducted in the greater Rotterdam area in the Netherlands, this region contains a combination of the highly urbanized city of Rotterdam, the surrounding suburban cities, and more rural villages [25]. The current study discusses the details from the first follow-up measurement (T1), the screening procedure at age 13 and baseline measurement at age 15 were described in detail elsewhere [24]. A concise graphical overview of the iBerry Study is presented in Fig. 1.

Fig. 1
figure 1

Overview of the design and summary of the different phases of the iBerry Study

Eligibility

As described previously by Grootendorst-van Mil and Bouter et al. [24], adolescents were selected for participation in the iBerry Study based on a questionnaire administered in the first year of secondary school as part of standard preventive healthcare performed by community Child and Family Centers in the Netherlands. All adolescents (mean age 13.1 years) filled out the Strengths and Difficulties Questionnaire–Youth (SDQ-Y), to assess their emotional and behavioral problems [26]. Unless the adolescent or their parent(s)/guardian(s) objected, all questionnaires from the school years 2014–2015 and 2015–2016 were screened. From these 16,736 screened questionnaires, adolescents with the highest 15% problem scores were selected, together with a random selection of adolescents with the lower 85% problem scores, resulting in the inclusion of 1,022 adolescents at baseline (September 2015–September 2019, response rate at enrollment 54%).

Enrollment at the first follow-up

Participants from the baseline measurement were contacted for the first follow-up measurement and 807 adolescents (79.0%) participated at T1. Data were collected between March 2019 and June 2022. Because the COVID-19 pandemic occurred during this measurement, we added two additional online measurements to collect data on emotional and behavioral problems during the lockdowns [27]. The median interval between the SDQ-Y screening and T1 was 4.7 years (IQR 4.5–5.4). The time between baseline and T1 had a median interval of 3.1 years (IQR 3.0–3.5).

Response rate

215 adolescents (21.0%) included at baseline did not participate at T1. A small number of participants objected to being contacted for follow-up measurements (n = 13, 1.3%). 71 adolescents (6.9%) declined participation in the first follow-up. The most common reasons for declining were a lack of interest (n = 48, 4.7%) or time (n = 10, 1.0%). The remaining 131 adolescents (12.8%) could not be reached.

Response rates were comparable for the high-risk adolescents (78.3%) and low-risk adolescents (80.6%), and the distributions of high-risk and low-risk adolescents were approximately equal in the responders and the non-responders (χ2 = 0.676, p = 0.411). A detailed overview of the baseline characteristics of responders and non-responders is provided in Supplementary table S1. Non-responders more often were male, had a higher age at baseline, had a non-Dutch ethnic background, had a lower educational level, and belonged to a lower income household. Non-responders were not more likely to score above the borderline cut-off for emotional and behavioral problems (measured with the Youth Self-Report) at baseline. Significant differences showed small effect sizes. Importantly, those non-responders at T1 were more frequently associated with incomplete baseline measurements, indicating that obtaining comprehensive data from this group was already challenging during the cohort's initial assessment.

Objectives

The iBerry Study aims to investigate the transition of subclinical symptoms to full-blown psychiatric disorders in adolescents who enter young adulthood. We use a cross-diagnostic approach to study all psychiatric disorders. The aim is to identify risk and protective factors and examine the mechanisms underlying the development of psychopathology.

Measurements

Assessment procedure

The adolescents were invited to the research center with one of their parents for a three-hour visit. All participants signed informed consent forms. All researchers were blinded from the SDQ-Y scores during the baseline assessment and follow-up measurements.

Details on data quality, control and management, and privacy protection have been described previously [24]. All protocols have been updated to ensure adherence to all applicable rules and regulations.

Main outcomes

The main aim is to study the long-term prognosis of adolescent subclinical psychiatric symptoms. We want to examine which determinants can predict the transition of subclinical symptoms to psychiatric disorders. To assess psychopathology we used various methods, including the Achenbach System of Empirically Based Assessment (ASEBA) questionnaires and the MINI Neuropsychiatric interview for DSM-5 diagnoses [28,29,30]. Specifically, we conducted detailed assessments of psychotic experiences [31], suicidality and nonsuicidal self-injury (NSSI) [32,33,34], and autism [35]. Subsequent studies will be conducted to examine the various determinants and the in depth measurements of psychopathology.

Main determinants

A broad range of biological, psychological, and social markers is assessed. At T1 these included socio-demographic characteristics, general functioning, sensory processing, aggressive and delinquent behavior, lifestyle and addiction, health care use and costs, personality, coping, family functioning, parenting, peers, relationships, sexuality, (adverse) life events and trauma, neuropsychological functioning, somatic complaints, anthropometry, and puberty development. We also collected detailed information on sleep and movement which was assessed using nine-day actigraphy measurements and a daily diary. Parents also provided information on parental psychopathology, personality, substance use, and anthropometry. A complete overview of all measurements is provided in Supplementary table S2.

Biological samples

Biological measurements at baseline were repeated at T1; we again obtained a blood and a hair sample from both the adolescent and from the accompanying parent.

Characteristics of the study cohort

Socio-demographic characteristics

Socio-demographic characteristics of the adolescents and their parents are presented in Table 1. At the first follow-up the adolescents in the cohort had a mean age of 18.1 years, 53.5% were female, 23.6% had a non-Dutch ethnic background, 61% lived in an urban area, and vocational education was the most reported education level. For 85.1% of the adolescents, one of their parents (83.4% mothers) also participated in the study. High-risk and low-risk adolescents differed on educational level, whether they participated with a parent, parental education level, and household income level, albeit with small effect sizes. The oversampled ratio of high-risk to low-risk adolescents remained consistent between baseline (2.5:1) and T1 (2.4:1), 570 high-risk adolescents (70.6%) and 237 low-risk adolescents (29.4%) participated at T1.

Table 1 Socio-demographic characteristics of the participating adolescents and parents at first follow-up

Emotional and behavioral problems

Emotional and behavioral problems, as measured with the Youth Self-Report (YSR) and Child Behavior Checklist (CBCL), are described in Table 2. In the high-risk group 37.1% of the adolescents reported problems in the borderline/clinical range (ASEBA norm scores, > 93rd percentile [28]), compared to 16.0% of the adolescents in the low-risk group. Combining the multi-informant questionnaires, 40.0% of the adolescents in the cohort (47.0% in the high-risk group, 23.1% in the low-risk group) scored above the borderline cut-off on the Total Problems scale according to one or both informants. For all scales, more high-risk adolescents scored above the cut-off compared to the low-risk adolescents.

Table 2 Emotional and behavioral problems of the adolescents participating at T1

Adolescent psychopathology

In 729 adolescents a complete semi-structured interview for the DSM-5 diagnoses was conducted (Table 3). The most common diagnoses in the high-risk group were substance-related disorders (40.0%), followed by mood disorders (35.1%), and ADHD (29.5%). In the low-risk group, substance-related disorders were common (42.1%), followed by mood disorders (16.8%). Notably, the high-risk adolescents more often met the criteria for multiple diagnoses (46.0%) compared to the low-risk adolescents (24.8%).

Table 3 Current psychopathology (last 2 years) in adolescents and one of their parents, assessed using structured clinical DSM interviews at T1

Parental psychopathology

In line with the prevalences presented in our design paper [24], parents in the cohort showed higher rates of psychopathology in the past 2 years compared to the general population (Table 3). In the parents of high-risk adolescents 27.7% met the criteria for one DSM-IV diagnosis, 13.9% met the criteria for multiple DSM-IV diagnoses. In the parents of low-risk adolescents, these prevalences were 24.1% and 8.2%, respectively. Because a validated Dutch translation of the DSM-5 interview was not available at the start of T1, the same DSM-IV interview from the baseline measurement was used for consistency.

Predictiveness of SDQ-Y score

To investigate the longitudinal predictive value of the SDQ-Y score at age 13 on the development of emotional and behavioral problems, we used multilevel random intercept regression models. We used sex, age, time, and an interaction term between time and risk status to model self-reported internalizing, externalizing, and YSR Total Problems scores. The estimated mean scores from linear mixed models for the three problem scales, stratified by SDQ-Y risk status, are visualized in Fig. 2. The coefficient estimates are summarized in Supplementary table S3. Next, we examined adolescents who reported internalizing, externalizing, and total problems above the borderline cut-off score. The odds ratios from the mixed effect logistic regression are presented in Table 4. Both in the linear and logistic models, we did not find an interaction effect between time and risk status. Between baseline and T1, the trajectory of emotional and behavioral problems did not differ between low and high-risk adolescents. Overall, both groups either remained equal or increased on emotional and behavioral problems. Notably, the difference between low and high-risk adolescents remained stable between baseline and T1, indicating that the SDQ-Y score is a good predictor of later psychopathology both in the short-term (baseline, after ~ 1–3 years) and in the medium-long term (T1, after ~ 4–6 years). Adolescents identified as high-risk had a four to sevenfold higher odds of scoring in the borderline range on internalizing, externalizing, and total problems compared to low-risk adolescents.

Fig. 2
figure 2

Estimated emotional and behavioral problem mean scores for the internalizing and externalizing subscales and the total problem scale at baseline and first follow-up measurement (T1) stratified by risk status

Table 4 Odds ratios from mixed effect logistic regression models including a random intercept, time, risk status, age, and sex to model adolescents reporting internalizing, externalizing, and total problems in the borderline/clinical range

Statistical power

In future studies that make use of the collected data, not all analyses will be performed in the complete cohort because of loss to follow-up and missing values. Depending on the association under study, we will consider the information that is available for the main predictor and main outcome variable. Based on sample sizes ranging from 1000 to 500 (with an alpha value of 0.05 and 80% power), the study can detect a difference in standard deviation ranging from 0.18–0.25 (50% prevalence) to 0.41–0.58 (5% prevalence). These are detectable effect sizes using a dichotomous measure of exposure, which are considered conservative. Within the iBerry Study, we will often study the effect of continuous determinants and prognostic factors, assessed at multiple time points, which will further increase power.

Follow-up and retention strategies

All adolescents received gift certificates for their participation. All travel expenses were reimbursed. To keep in contact we send newsletters, birthday cards, and holiday cards. It is also possible to follow the study on social media channels. To minimize loss to follow-up we ensured the correctness of contact information at each contact. If an adolescent was unable to participate it was possible to make an appointment during the evening, online, as a home visit, or to postpone the visit for a couple of months. Lastly, if these possibilities were not an option a short questionnaire was sent to the adolescent to collect data on sociodemographic characteristics, emotional and behavioral problems, psychotic experiences, (adverse) life events, substance use, and self-harm. In a separate questionnaire parents provided additional information on emotional and behavioral problems, adverse life events, healthcare use, personality, executive functioning, and family functioning.

Data linkage and collaboration

The available information of the participants makes it possible to integrate the cohort data with other data sources. Environmental characteristics can be studied based on the home address to be linked with data from Statistics Netherlands or in collaboration with the Geoscience and health consortium [27, 36]. Furthermore, adolescents provided informed consent to obtain additional information on their health and development from their healthcare providers. We warmly welcome other researchers to collaborate by combining the iBerry cohort data with other studies.

Strengths and limitations

Our results show that the screening procedure was successful in selecting a cohort of adolescents at risk of psychopathology. Adolescents with a high score on emotional and behavioral problems at age 13 were four to seven times more likely to report significant emotional and behavioral problems at age 18. New normative SDQ-Y norms for Dutch adolescents showed that the 15% SDQ-Y cut-off used to select the high-risk population aligns with adolescents scoring above the borderline/clinical cut-off score [37]. This relative cut-off score where adolescents were compared to their peers, is an efficient way to select adolescents at risk for psychopathology as evidenced by the high levels of psychopathology in our cohort at follow-up.

The specific design of our cohort enables us to address a limitation of general population studies that suffer from drop-out dependent on less prevalent risk factors and outcomes. With the oversampled selection and the retention of adolescents at higher risk of psychopathology, we will continue to have the advantage of increased power to examine associations of interest. Despite oversampling individuals with a high risk for psychopathology, the risk factors that our study identifies are relevant to the general population as well. This enriched population might enable us to pick up risk factors that are more rare, but it is not likely that the oversampling will lead to spurious associations [38].

Although our retention rate is high, we still have adolescents who were lost to follow-up. As indicated by our non-response analyses, these adolescents do differ on socio-demographic characteristics from the adolescents who did participate. This could introduce bias in our results, which we will reduce as much as possible. Attrition will be addressed in a targeted manner for each association under study, for example by controlling analyses for potential confounders and by using multiple imputation methods [39,40,41,42,43,44]. Furthermore, the scientific inference of our findings will not necessarily be compromised. While a completely representative sample of the source population is valuable, broader scientific generalizations can sometimes hold more significance than strict sample representativeness, especially when examining associations between variables of interest [45]. Selective drop-out will likely result in biased estimations of prevalences, and by design the prevalences within the iBerry cohort will be higher than in the general population, but this will not likely affect the direction of the associations under study [46,47,48].

Future perspectives

Now that the cohort has been established and we have found successful retention strategies, we aim to follow up this cohort until young adulthood. Adolescents are invited for follow-up visits every 2–3 years. Currently, the third visit to the research center is being conducted. This visit includes both repeated as well as new age-appropriate measurements, such as an extensive assessment for personality disorders.

To ensure valid conclusions we will also request microdata, anonymized data at the level of private individuals from Statistics Netherlands (CBS), which enables us to study the outcome in both individuals participating in subsequent follow-up phases and in those who might be lost to follow-up.

Conclusion

The iBerry Study closely examines adolescents and their parents to determine a broad range of biological, psychological, and social markers for the transition of subclinical symptoms to psychiatric disorders. Adolescents were successfully selected based on their self-reported emotional and behavioral problems at age 13. The current data underscore an evident vulnerability in high-risk adolescents, manifesting in significant psychopathological problems by age 18. The research's in-depth measurements in combination with its design, anchored by its careful screening and retention strategies, create a foundation for numerous detailed future investigations into the complexities of adolescent psychopathology.