The longitudinal relationship of alcohol problems and depressive symptoms and the impact of externalising symptoms: findings from the Belfast Youth Developmental Study

The aim of the present study was to contribute to the inconsistent literature on the comorbid relationship of alcohol problems and depressive symptoms from late adolescence to emerging adulthood by accounting for their trajectories and their conjoint relationship while controlling for the influence of externalising symptoms. We utilised data from a longitudinal school cohort from Northern Ireland (Belfast Youth Developmental Study) over three time points, when the participants were 16, 17 and 21 years of age. A total of 3118 participants were included, 1713 females and 1405 males. Second-order latent growth models were applied to examine growth trajectories. Parallel process growth models were used to assess whether growth trajectories of the symptoms were associated. Externalising symptoms were subsequently added as a covariate. Alcohol problems among males significantly increased over time but decreased in females. Depressive symptoms initially increased then decreased in both genders. Results indicated associations between alcohol problems and depression, both initially and with time. Accounting for externalising symptoms only somewhat diminished this effect in males but not in females. An increase of initial levels of depression was associated with a decrease in alcohol problems over time. This association was only true among females. After controlling for externalising symptoms, the relationship was no longer observed. The present study provides further evidence of a significant relationship between alcohol problems and depression in adolescents and further supports a small literature indicating that depression may have protective effects against alcohol problems. Finally, the study shows the importance of accounting for externalising symptoms.


Introduction
Depressive symptoms (DS) and alcohol use problems (AP) (alcohol abuse and dependence as defined by DSM-IV) [1] are common in youth and often tend to co-occur [11,39,47,50]. AP are not necessarily alcohol use disorders (AUD), as an individual's relationship to alcohol can be problematic without the individual being diagnosed with dependence. The literature on the relationship between AP and DS is large but inconclusive on the direction of the association. The goal of the present study was to investigate the relationship of AP and DS while considering that these problems change with age, and therefore to investigate their individual and conjoint trajectories.

Epidemiology
Depression is the most common affective disorder in youth [35]. Depressive disorders have a prevalence rate of 15-20% [17] and depressed symptoms or mood have a prevalence rate of 9-30% [54]. Alcohol dependence has been reported to occur in 6% of 13-18 year olds and in 14% of 18 year olds [45,63]. Problematic alcohol use is highest among 18-24 year olds [23] and high in the UK compared to the rest of the world [64]. Within the UK, alcohol use and mental health issues are more common among adolescents in Northern Ireland (NI) than in the rest of the UK [8,59,62]. A contributing factor to the increased levels in NI has been argued to be the post-conflict status of NI [14].
DS are the second most common comorbid condition with AP in adolescents [3]. In late adolescence, the presence of either increases the odds of the other by 2-4 times [9,36]. The comorbidity is associated with increased severity of symptoms, increased risk of suicide, and less effective treatment [55,59,62], making it clinically important.

Aetiology
There are four possible mechanisms accounting for the relationship between AP and DS in adolescents. (1) Single causal: AP acts as a trigger or risk factor for DS [16,40] or DS act as a trigger/risk factor for AP [12,60]. (2) A reciprocal or bidirectional causal relationship where both directions of causality are present: the presence of each disorder predicts an increased risk of the presence of the other [44]. (3) Shared genetic and/or environmental factors, where these factors increase the risk of both AP and DS [30]. (4) A third factor/s mediates/moderates the relationship of AP and DS [19].
There are strong arguments for either of these directions of effect. AP are hypothesised to directly cause DS due to the bio-psycho-social consequences of acute and chronic alcohol abuse and dependence [18,55]. An extensive review by Boden and Fergusson [6] on AUD and MD concludes that the main direction of effect is in fact from AUD to MD. However, this does not exclude the opposite direction of effect, where DS may directly predict AP due to the role of emotion regulation or the self-medication hypothesis: alcohol is consumed to relieve the negative effects of depression [33]. The presence of one mechanism does not exclude the other. In fact, the presence of one may trigger or heighten the effect of the other, and they may both be present.
Several factors may influence the relationship of AP and DS, such as age and gender. The previous literature demonstrates gender difference in both prevalence of AP (generally higher in males) and DS (generally higher in females) as well as effects in the rates and correlations between DS and AP [9,54,65]. A factor commonly found to influence the relationship of AP and DS is externalising disorders/symptoms (expressed outward such as conduct, disruptive and impulse control disorders). Externalising disorders are associated with both AP and DS [9] and have been found to moderate their relationship [13,19,38,52], often indicating a diminished relationship or a non-existing relationship between AP and DS [34,46]. It is possible that these disorders are different expressions of the same underlying psychopathology, or that they are distinct disorders which happen to overlap. For example, in a large adult twin study, Kendler et al. [31] found that genetic factors were largely responsible for comorbidity of internalising (expressed inwards such as anxiety and depression) and externalising disorder [31]. Accounting for externalising symptoms may reveal important information in understanding the relationship between internalising symptoms and alcohol problems. However, many studies fail to take externalising symptoms into account.

Trajectories
AP and DS change in prevalence over time; they increase throughout adolescence and decline in early adulthood [9,24]. The majority of the literature on comorbidity of AP and DS does not account for change or trajectories, thereby assuming a falsely heterogeneous sample regarding change. Marmorstein [39] and Needham [50] did address the longitudinal relationship of AP and DS in adolescents but found conflicting results. Both studies found a bidirectional relationship but in opposite directions; Marmorstein [39] found a positive relationship between AP and depression, while Needham [50] reported a negative relationship between binge drinking and depression. In other words, an increase of AP was associated with an increase in depression, while an increase in binge drinking, perhaps counterintuitively, had a protective effect against depression [39,50]. The differences may be accounted for by the different measures of alcohol (AP and binge drinking), but the results should still be interpreted with caution due to the wide age range of the sample. Students were aged 11-21, 11-23 and 18-28 for each wave, respectively, making interpretation of the findings difficult. As comorbidity changes significantly with age in adolescence [38,39], it is important that the range of age in each wave is narrow.

Present study
The present study aims to investigate the development and relative growth of AP and DS and the possible association of their trajectories. There is a large literature on the association between AP and DS, but only a few studies account for relative growth, an important factor due to the changing prevalence of the problems across adolescence. Those which do show an inconsistent pattern and fail to account for the effect of age. The design of this study addresses these issues while accounting for externalising symptoms and gender.

Procedure
The study utilised data from the Northern Irish longitudinal Belfast Youth Developmental Study (BYDS), a school cohort followed from the age of 11 to 21 across seven waves of data collection. Informed consent was taken for participation in the study. A detailed description of data collection of BYDS can be found elsewhere [53]. The study was approved by appropriate ethical committees and is therefore in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.

Participants
AP and DS were measured in the last three waves of the study, when participants were aged 16, 17 and 21. Assessments were performed annually until students completed school at age 17/18. A follow-up was completed at the age of 21. Externalising symptoms were measured in wave 4, when participants were aged 15. Participants with missing data in all waves for each phenotype were excluded from the study (n = 801). This resulted in a total sample of 3118, 1713 females and 1405 males; 3118 at the age of 16, 2535 at the age of 17 and 2287 at the age of 21.

Depressive symptoms
The Short Mood and Feeling Questionnaire (SMFQ) was used at age 16 and 17 [2] and the Revised Patient Health Questionnaire (PHQ-9) [61] was used at age 21. The creators of BYDS designed the study in this manner due to the age change of the participants, as there were plans for further waves of data collection (SMFQ is suitable for < 17 years, PHQ-9 is suitable for adults). The SMFQ consists of 13 items on a 3-point scale (0 = not true, 1 = sometimes true, and 2 = true) and the PHQ-9 consists of 9 items scored on a 4-point Likert scale, ranging from 0 (not at all) to 3 (nearly every day).
To perform a longitudinal analysis, a minimum of three measuring points is necessary to measure change. It was therefore necessary to generate a time-consistent measure of DS. Five items were selected across the two measures which addressed the same topics (Table 1). The individual items were dichotomised and a factor analysis was performed across measurement occasions to establish whether the same latent measure was assessed across the waves. Initially, confirmatory factor analysis (CFA) was performed for each wave and for all waves together. All models indicated excellent fit to the data, indicating that our generated measure did measure the same entity across time. Additionally, three models freeing and/or constraining parameters were tested to assess whether the structure was the same or similar across parameters. The models tested were no invariance (least restrictive, model A), full invariance (most restrictive, model B) and partial invariance (model C) [42]. Based on model fit and Chi square difference, model C was accepted as the best fitting model. The factor scores of the partial model were extracted and used for descriptive statistics only, to observe mean patterns of change. See supplementary material A for more detailed information.

Alcohol problems
The Alcohol Use Disorder Identification Test (AUDIT) was administered at the three time points of data collection (age 16-21). AUDIT is a brief screening instrument used to detect alcohol-related problems [5], consisting of ten items. Each question was scored on a Likert scale ranging from 0 to 4, where 0 indicates no harm and 4 indicates a high level of harm. The scores are added up in the three subsets (consisting of 4, 3, and 3 items, respectively) and subsets are summed (range 0-40). The sum score was used for descriptive statistics only.
A relatively large proportion of participants responded to all the items at the age of 16 (76.48%), 17 (93.88%) and 21 (89.84%). To deal with missing data, if one item or less was missing in a subscale for a participant, it was replaced with the individualised mean of its relevant subcategory rather than the total mean. Thus, participants with two or more items missing from a subcategory were scored as missing. Due to human error, item 2 was not included at time points 1 and 2 and was dropped from all waves to avoid bias. Internal reliability (excluding item 2) was good according to Cronbach's alpha (standardized for time points 1, 2 and 3, respectively: α = 0.84, α = 0.84, α = 0.83).
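The item-level replacement rule described above can be illustrated with a minimal sketch. The function names and the example responses are hypothetical, and the decision to treat the total score as missing whenever any subscale is missing is an assumption made for illustration rather than a documented detail of the analysis.

```python
from statistics import mean
from typing import List, Optional, Sequence


def score_subscale(items: Sequence[Optional[float]],
                   max_missing: int = 1) -> Optional[float]:
    """Sum one subscale, filling missing items with the person's own subscale mean.

    Returns None (missing) when more than `max_missing` items are absent.
    """
    observed = [x for x in items if x is not None]
    n_missing = len(items) - len(observed)
    if n_missing > max_missing or not observed:
        return None
    # Each missing item is replaced by the mean of this person's observed
    # items in the same subscale, not by the sample-wide mean.
    return sum(observed) + n_missing * mean(observed)


def score_total(subscales: Sequence[Sequence[Optional[float]]]) -> Optional[float]:
    """Sum the subscale scores; assumed missing if any subscale is missing."""
    scores: List[Optional[float]] = [score_subscale(s) for s in subscales]
    if any(s is None for s in scores):
        return None
    return sum(scores)


# Hypothetical responses for one participant (None marks a skipped item).
example = [[1, 1, None], [0, 0, 0], [2, 2, 2]]
total = score_total(example)
```

Using the individual's own subscale mean keeps the filled-in value on that person's response level, which is the stated reason for preferring it over the total mean.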

Externalising symptoms
Externalising symptoms were measured using the Strength and Difficulties Questionnaire (SDQ). The SDQ is a brief mental health measure designed for children aged 3-16 years [20,22]. The externalising score covers conduct problems (aggressive, antisocial and oppositional disorders) and hyperactivity (impulsive behaviours such as agitation and distraction). The SDQ items are scored on a 3-point scale of 0 = not true, 1 = somewhat true and 2 = certainly true. Missing item data per individual were dealt with according to instructions [21]; if three items or less were missing, these were replaced with the individual mean, resulting in 164 individuals' scores being estimated. Internal reliability was good (α = 0.92).

Statistical analysis
To test our hypothesis of whether the rate of change over time (slope) and initial scores (intercept) of AP and DS were associated, growth models are necessary. To investigate a participant's rate of change over time, latent growth models (LGM) were employed (see, for example, [4,7,48,58]).
LGM estimates a growth curve model of the nature of change in a variable over time and describes a smooth process of change [29,51]. The change of observed variables is estimated as a function of slopes and intercepts as well as random error averaged over individuals [41,43].
LGM accounts for the uneven times between assessment points.
An LGM generally applies a first-order model, i.e., it utilises the sum of scores for multiple items. To measure true change and incorporate measurement error of multiple indicators into the model, a second-order growth model was applied. This also includes the benefit of being able to assess measurement invariance across time [25]. Multiple indicators measure each repeated latent variable at each measurement occasion.
Factor loadings for the intercept (starting point) for all constructs were set to 1 as default. Factor loadings for the slope were set to define the rate of change as linear and followed in accordance with the time of data collection. Factor loadings were spread out with yearly lapses based on the time of data collection and the mean age at each time point (16.1, 17.2 and 21.2 years); factor loadings were set at 0 (baseline), 1, and 5. Invariance was imposed over time for residuals.
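To make the time coding concrete, the following minimal sketch derives slope loadings from the mean ages and computes the mean trajectory implied by a linear growth model. Rounding the elapsed time to whole years is our reading of "yearly lapses" and is an assumption, and the intercept and slope means used in the example are hypothetical values, not estimates from this study.

```python
from typing import List, Sequence


def slope_loadings(mean_ages: Sequence[float]) -> List[float]:
    """Elapsed time since baseline, rounded to whole years (assumed rule)."""
    baseline = mean_ages[0]
    return [float(round(age - baseline)) for age in mean_ages]


def implied_means(intercept_mean: float, slope_mean: float,
                  loadings: Sequence[float]) -> List[float]:
    """Model-implied mean at each wave: 1 * intercept + loading * slope."""
    return [1.0 * intercept_mean + lam * slope_mean for lam in loadings]


loadings = slope_loadings([16.1, 17.2, 21.2])   # [0.0, 1.0, 5.0]
# Hypothetical growth factor means, for illustration only.
trajectory = implied_means(intercept_mean=2.0, slope_mean=-0.1, loadings=loadings)
```

Coding the third wave as 5 rather than 2 is what lets the linear slope reflect the four-year gap before the final assessment.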
A second-order LGM model is presented in Fig. 1. Covariates can be added to the model through regression of growth parameters on covariates.
LGM models are estimated stepwise to test whether each step results in an improved model fit of the data.
To investigate whether growth of AP and DS was associated, second-order parallel growth models (PGM) were utilised, as they enable the user to model multiple parallel processes. PGM brings together two second-order LGM models and investigates how their growth parameters interrelate. The model allowed the intercepts of the two processes, the slopes of the two processes, and the intercept and slope of each disorder to correlate (Fig. 2). The variance for each growth factor describes individual variation around the mean. The MLR estimator was applied to all the analyses due to the large samples and non-normal distribution of AP and DS in both samples. The best model was selected by lowest root mean square error of approximation (RMSEA), highest comparative fit index (CFI), lowest standardized root mean square residual (SRMR) and lowest Bayesian information criterion (BIC). A model fit with values of CFI > 0.95, RMSEA < 0.06, SRMR < 0.08, and the lowest possible BIC value suggests a good fit of the data [26]. Data were prepared and descriptive results obtained in Stata 13 and the growth models were fit in Mplus 7.11 [49].
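The selection criteria above can be expressed as a small check. This is a sketch only: the function names are hypothetical, the cut-offs are those quoted in the text, and the fit values in the example are invented for illustration.

```python
from typing import Dict


def meets_fit_thresholds(cfi: float, rmsea: float, srmr: float) -> bool:
    """Apply the cut-offs quoted in the text: CFI > 0.95, RMSEA < 0.06, SRMR < 0.08."""
    return cfi > 0.95 and rmsea < 0.06 and srmr < 0.08


def lowest_bic(bic_by_model: Dict[str, float]) -> str:
    """Return the name of the candidate model with the smallest BIC."""
    return min(bic_by_model, key=bic_by_model.get)


# Invented fit statistics for two candidate models.
acceptable = meets_fit_thresholds(cfi=0.97, rmsea=0.04, srmr=0.05)
preferred = lowest_bic({"variance_fixed": 1010.0, "variance_estimated": 1002.5})
```

BIC has no absolute cut-off, so it is only used here to rank candidate models against each other.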
Post hoc analyses indicated high dropout rates after the age of 16. While individuals who dropped out did not differ on initial levels of DS compared to those who remained in the study, they did report significantly higher levels of AP [mean (SE) 9.08 (0.77)] compared to those who remained in the study [mean (SE) 6.68 (6.36)] (but did not differ on any other baseline measures). This is a common problem in longitudinal studies; to deal with it, multiple imputation was performed on the AP and DS variables in Stata 13 and the imputed data were used in Mplus for further analyses. Multiple imputation allowed for clustering of repeated measures at the individual level using multivariate normal regression models allowing for interdependencies within individuals. To allow for clustering within school, dummy variables for school affiliations were included as covariates. Imputation was also run separately for males and females to allow for gender differences. Multiple imputation was used to create 50 datasets with imputed values. Missing values in the covariates were imputed using chained equations in Stata 13. Covariates were included to predict missingness and make MAR assumptions plausible (SES, living arrangements, parental monitoring). To increase accuracy of the imputation, auxiliary variables (AP and DS) were used in the imputation process. The 50 datasets with imputed values were then exported and the Mplus imputation function was used, whereby parameter estimates are averaged across the imputed datasets [49].
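The averaging of parameter estimates across imputed datasets can be sketched as follows. The text states only that estimates are averaged; combining the standard errors with Rubin's rules is the standard approach and is assumed here, and the function name and input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_estimates(estimates: Sequence[float],
                   standard_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool one parameter across m imputed datasets (Rubin's rules, assumed)."""
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need matching estimates and SEs from at least 2 datasets")
    pooled = mean(estimates)                           # averaged point estimate
    within = mean(se ** 2 for se in standard_errors)   # mean sampling variance
    between = variance(estimates)                      # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical slope estimates from three imputed datasets (the study used 50).
pooled_estimate, pooled_se = pool_estimates([0.5, 0.6, 0.7], [0.1, 0.1, 0.1])
```

The between-imputation term is what carries the extra uncertainty due to the missing data into the pooled standard error.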
All parameters are presented as standardized.
Multigroup analysis was performed by gender for all models to test for equality of growth parameters across genders. Means and variances of growth parameters were estimated and constrained between genders and compared using Chi square statistics. Scaling correction factors were taken into account [57] in a modified Chi square test, as MLR estimators were used.
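One widely used form of such a modified test is the scaled Chi square difference test, sketched below under the assumption that this is the procedure meant by [57]. The function name is hypothetical and the input values are invented; the nested model is the one with parameters constrained across genders.

```python
from typing import Tuple


def scaled_chi_square_difference(t_nested: float, c_nested: float, df_nested: int,
                                 t_full: float, c_full: float, df_full: int
                                 ) -> Tuple[float, int]:
    """Scaled difference test for nested models fitted with a robust estimator.

    t_* are the reported (scaled) Chi square values, c_* the scaling
    correction factors, df_* the degrees of freedom.
    """
    df_diff = df_nested - df_full
    if df_diff <= 0:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (df_nested * c_nested - df_full * c_full) / df_diff
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    trd = (t_nested * c_nested - t_full * c_full) / cd
    return trd, df_diff


# Invented values: constrained (nested) model versus freely estimated model.
statistic, df = scaled_chi_square_difference(100.0, 1.2, 10, 80.0, 1.1, 8)
```

The resulting statistic is referred to a Chi square distribution with `df` degrees of freedom; a simple subtraction of the two robust Chi square values would not follow that distribution.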

Descriptive statistics
AP decreased throughout the study for females but peaked at the age of 17 for males (Table 2). DS initially increased, followed by a decrease. Males and females differed significantly at all measurement points in the mean score of AP and DS (p < 0.05). Significant correlations were indicated between AP and DS in all waves for all participants and between externalising symptoms and DS and AP.

Second-order latent growth model
Model fit indices of second-order LGM for AP and DS are presented in Table 3.

Alcohol problems
Models estimating and constraining (to zero) the variance of slope and intercept (growth parameters) were compared. A model estimating the variance of the intercept and slope indicated the best fit. Including externalising symptoms as a covariate indicated an improved model fit according to SRMR and CFI but not RMSEA. AP decreased significantly across time. The process is presented in Fig. 3. The residuals of the growth parameters were significantly and negatively correlated (−0.37, p < 0.001), for females (−0.47, p < 0.001) and males (−0.19, p < 0.05). An increase of externalising symptoms was significantly associated with an increase in the intercept of AP (0.52, p < 0.001) (higher levels of externalising symptoms were associated with higher levels of AP) and a more negative slope (−0.23, p < 0.001) (higher levels of externalising symptoms were associated with a steeper decrease of AP).

Figure note: The figure shows only three observed items for DS and AP for clarity; the actual model contains five items for DS and nine items for AP. All growth parameters (i and s) of AP and DS are allowed to correlate, as indicated by double arrows. Residual variance for each variable at each time point is estimated. Intercept is predicted by wave 1; the slope is predicted by all waves. Numbers indicate fixed parameters. w wave, i intercept, s slope

Depressive symptoms
Models estimating and constraining (to zero) the variance of slope and intercept (growth parameters) were compared. A model estimating the variance of the intercept and slope and including externalising symptoms as a covariate indicated the best fit. The process is presented in Fig. 4. DS decreased significantly over time. The residuals of the growth parameters were significantly and positively correlated (0.07, p < 0.001), for both females (0.05, p < 0.001) and males (0.04, p < 0.001). An increase of externalising symptoms was significantly associated with an increase of the intercept (0.25, p < 0.001) and slope (0.11, p < 0.001).
A multigroup analysis indicated that genders differed overall on growth parameters (χ2 difference comparing all parameters constrained vs estimated = 24.42, df = 4, p < 0.001). Females reported higher initial mean levels of DS and greater variability in slopes over time [0.05*** (0.01)], compared to males [0.01 (0.01)].

The PGM estimates are presented in Table 4. An unconditional PGM model indicated that the intercepts within each phenotype were significantly correlated, as were their slopes, in both males and females. The same was true for a conditional model, apart from in males, where no significant relationship was observed between the intercept and the slope of AP. A significant relationship was observed between phenotypes of intercept and slope. However, in males, no association between initial levels of AP and DS was observed after accounting for externalising symptoms. Additionally, an increase in initial levels of depression was associated with a steeper decrease of alcohol problems in females only. This finding was no longer observable when externalising symptoms were accounted for.

Discussion
In this study, we examined the trajectories of AP and DS and their relationship in a longitudinal sample of adolescents in Northern Ireland. The results indicated that while AP remained relatively stable throughout the study, DS decreased overall. Analyses of possible associations of the trajectories indicated a comorbid relationship of AP and DS through association of both initial levels of AP and DS as well as their change over time. Moreover, in females, increased initial levels of DS were associated with a decrease of AP over time. However, this finding was accounted for by externalising symptoms.
Our results support previous studies suggesting that AP and DS peak in late adolescence and decrease somewhat in early adulthood [36]. Prevalence of AP and DS differed between genders. In line with the literature, DS were more severe among females, while AP were more severe among males in early adulthood (but not in middle adolescence, where no gender differences were observed) [9,32,37]. It is interesting to note that individuals reporting initially higher levels of AP also reported a steeper decrease as they entered early adulthood. Individuals may simply decrease their level of drinking as they get older, possibly due to other responsibilities such as further education or family. Alternatively, this observation may be partly explained by the high dropout rates after the initial wave, suggesting AP to be a risk factor of dropping out of longitudinal studies/school. However, the individuals who did drop out did not report any difference in levels of DS. It is plausible that dropouts did not show comorbid symptoms but only AP; if so, our observed comorbid levels are less than the actual ones in early adulthood. Another possibility is that individuals with high levels of AP did develop DS, which increased the risk of not participating/dropping out of school; if so, our results are not representative of actual levels of comorbidity and our interpretation of them will be biased. Our results indicated a comorbid relationship of AP and DS, both at one particular time point and over time, in line with previous studies.

Fig. 4 Plot of DS for the full sample. Circle represents sample mean and triangle represents estimated mean

Table 4 Estimates of growth parameters (95% CI) for PGM for alcohol problems and depressive symptoms. AP alcohol problems, DS depressive symptoms, i intercept, s slope, x ↔ y covariance between x and y. *p < 0.05; **p < 0.01; ***p < 0.001
Furthermore, the present results indicate that higher initial levels of DS are associated with a steeper decrease of AP; higher levels of DS acted as a protective factor against AP over time. The majority of the literature does not support this finding [10,56], though some studies do [14]. While the present study does not investigate the underlying mechanisms behind this, it is possible that the protective effect observed of DS on AP is due to characteristics of DS such as social withdrawal, which results in less interaction with peers and, therefore, less exposure to alcohol use by peers [27] (alcohol use in early adolescence is strongly predicted by peer use [28]). This is further supported by genetic evidence suggesting that internalising traits decrease the risk of heavy alcohol use [15]. Alternatively, the association could be explained by the strong correlation between AP intercept and slope in females, where higher initial levels of DS are associated with higher initial levels of AP, which in turn are associated with a faster decrease of AP over time. Post hoc analyses investigated direct effects of DS intercept on AP slope and indicated no significant relationship (p = 0.42). In light of these results, we can exclude the possibility that initial levels of DS directly impact change in AP.
The protective effects of DS on AP were, however, accounted for by externalising symptoms, indicating that the relationship of DS and AP was conditional on externalising symptoms. The result may simply be due to the fact that the association between DS and AP is small and, therefore, goes unnoticed when externalising disorders are considered, as their association with DS and with AP is larger. Furthermore, as internalising and externalising symptoms are commonly comorbid and externalising symptoms are frequently associated with an increased risk of alcohol use and problems, it is possible that externalising symptoms are more prominent even in the presence of internalising symptoms, which would explain why the protective effect of internalising is non-observable when accounting for externalising. Another possibility, outside the scope of this paper but nonetheless important, is that AP, DS/internalising symptoms and externalising symptoms may be different expressions of the same underlying problem. What is expressed may be a function of environment (such as parents, peers, schooling, and traumatic events), learnt coping mechanisms and genes.

Strengths, limitations and directions for future research
The present study has both limitations and strengths. Three main strengths should be noted. First, the sample is taken from Northern Ireland which is of interest in several aspects, such as heightened mental health issues and post-conflict issues. There are few longitudinal studies of Northern Irish adolescents. We had a relatively large sample and provided a fair representation of participants from deprived backgrounds (participants that were likely to be affected by the discussed issues). Second, the present study accounted for change over time, a factor rarely accounted for in the present context, as well as externalising symptoms and gender. Third, adapting questionnaires to the appropriate age is an important practice in longitudinal studies of youth, but does cause issues regarding analyses, an issue presently addressed.
However, several limitations still need to be addressed. Multiple measures are needed to allow for changes in the way problems present at different stages of development. For example, irritability is a common symptom of depression in adolescence but less so in adulthood. The fact that different measures should be used at different ages was considered when BYDS was designed regarding DS. However, this creates an issue when the intention is to measure change over time: because different measures were used at different time points in the present study, an alternative measure had to be created, which brings further limitations and a lack of external validity. This is a common problem in longitudinal studies, but we have in the present paper proposed a method of dealing with this issue.
Moreover, while our findings may not be generalisable to other geographical areas because of the observably higher levels of AP and DS in NI compared with both the UK and the world, this does not necessarily imply that the relationships observed are not. Further research is encouraged in other countries to confirm or reject our findings.
Missing data were a major limitation. Participation rates dropped significantly after the initial wave of the study. This is a common issue with longitudinal data and does cause biased results; as observed in our analyses, dropouts do report higher levels of AP. In the present study, we cannot therefore with certainty say that AP does decrease in early adulthood.
The study only includes three time points of data. The means of AP and DS suggest a curvilinear change; unfortunately, this cannot be modelled with only three time points. Additionally, it would be advantageous to have yearly assessments to better understand the change of the problems and their relationship throughout late adolescence. It would therefore, for both statistical (with more than three time points a curvilinear model can be accounted for) and developmental reasons (investigate change from early to late adolescence), be valuable to include more time points of data collection.

Conclusions
These findings provide additional support for a positive association between AP and DS in youth. The present results also indicate protective effects of DS on AP in females, though this is accounted for by externalising symptoms. Our study indicates the importance of accounting for trajectories of variables, accounting for externalising symptoms, as well as only looking at a narrow age range at each time point, as AP and DS vary with age. Ideally, understanding the development and relationship of two or more phenomena would involve a birth cohort in which details from gestation, school, family, environment, and genes are considered. However, this is costly and time consuming, and it is difficult to account for all the possible variables which may have an impact. In the current study, we are aware of these shortcomings and wish the reader to be so as well when interpreting the results. While our results shed some light on the relationship between these problems in this particular population, they are far from the full picture.