Background

Axial spondyloarthritis (axSpA) is a chronic inflammatory disease that predominantly affects the axial skeleton. It can be divided into ‘radiographic’ (ankylosing spondylitis, AS) and ‘non-radiographic’ (nr-axSpA) forms, depending on whether definitive structural changes are evident on plain radiographs of the sacroiliac joints [1, 2]. Both forms are characterised by inflammatory pain and functional impairment [3].

Like other conditions in which chronic pain is a feature, axSpA is known to be associated with depression [4, 5]. However, quantifying the prevalence of depression is challenging. Formal diagnosis of depression requires expertise and time-consuming assessments, so screening questionnaires are often used instead in both clinical and research settings. There are many different screening tools, each with its own score thresholds for defining depression.

Estimating the prevalence of depression in axSpA is further complicated by the distinction between AS and nr-axSpA. Although they can be considered entities along a spectrum, there are key differences: the proportion of male patients is higher in AS than in nr-axSpA cohorts, functional impairment is greater in AS cohorts, and AS is more commonly associated with elevated inflammatory markers [6]. Gender, functional impairment and inflammation can all potentially influence the likelihood of developing depression [7,8,9].

Despite these challenges, quantifying the prevalence of depression is an important first step towards improving awareness and management of this comorbidity. Depression is of direct importance to rheumatologists because current assessments of disease severity, on which important treatment decisions are based, rely on subjective patient-reported measures such as the Bath AS Disease Activity Index (BASDAI) and the spinal pain visual analogue scale (spVAS). The presence of depression is known to influence the reporting of pain and impairment [5]. Conversely, more severe disease is likely to increase the risk of developing depression. Self-reported anxiety and depression in axSpA have been shown to be more strongly associated with disease activity and functional impairment than smoking or deprivation [10].

To date, depression in axSpA has not been systematically reviewed. Our aims were therefore to (i) report pooled prevalence of depression in axSpA, (ii) compare the prevalence of depression between axSpA, AS and nr-axSpA groups, and (iii) compare disease activity and functional impairment between those with and without comorbid depression.

Patients and methods

A systematic review was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [11]. The protocol for this review was pre-registered (PROSPERO: CRD42017082359). Two reviewers (SZ, DT) searched Medline, PubMed, Web of Science, PsycINFO, CINAHL Plus and the Cochrane Library for relevant literature in February 2018. In addition, abstract archives of the European League Against Rheumatism (EULAR), British Society for Rheumatology (BSR) and American College of Rheumatology (ACR) were searched up to and including 2017. The following search term was used: depress* AND (ankylosing OR spondyloarthr*). In preliminary searches, adding alternative terms for depression (affective disorder, mood disorder, adjustment disorder, dysthymia) did not change the results.

Studies were included if they: (i) were cross-sectional in design or were longitudinal studies reporting baseline data, (ii) used validated diagnostic or screening criteria for depression with a defined cut-off threshold, and (iii) recruited adult patients fulfilling the modified New York criteria for AS or the ASAS criteria for axSpA. Studies were excluded if they used non-representative sampling (highly selective recruitment criteria, for example, recruiting only women) or had a sample size of less than 30 (to avoid unreliable prevalence estimates). Publications in abstract form only were also considered, as some prevalence studies may not be published as full articles and an abstract may still describe the methodology sufficiently. Reviews, comments and editorials were excluded.

Two reviewers (DT, NM) independently assessed study eligibility and extracted data from qualifying studies. Any discrepancies were resolved through discussion moderated by a third reviewer. Information from included studies was extracted into predefined tabulated summaries containing data on: depression criteria and threshold used, study design, sample size, country setting, mean age, percentage of males, and depression prevalence.

Studies were assessed for risk of bias (SZ, NM) using the Health States Quality Index (Additional file 1: Table S1). This is an 11-point scoring system that assesses studies under the following headings: target population and observation period, diagnostic criteria, case ascertainment, measurement administration, catchment area and prevalence measure [12]. For the purposes of the meta-analysis, the score was divided by 10 to derive a quality parameter with an upper limit of 1.

Statistical analysis

Prevalence estimates were reported as percentages with 95% confidence intervals (CIs) and the I² statistic. In meta-analysis of proportions (variables bounded by 0 and 1), traditional inverse-variance weighting is problematic when proportions are close to these bounds: because the variance of a proportion, p(1 − p)/n, shrinks towards zero as p approaches 0 or 1, such studies receive disproportionately large weights [13]. The double arcsine transformation was therefore used. Results were pooled according to depression criteria and threshold using quality-effects models. This method redistributes weights by accounting for the quality parameter described above and can be considered an extension of the random-effects model. Sub-group meta-analyses by diagnostic category were also performed. All estimates were also presented using random-effects models (DerSimonian-Laird). Heterogeneity of meta-analysis estimates was quantified using the I² statistic. Funnel plots and the Doi plot with its LFK index [13] were used to assess the risk of publication bias.
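The transformation and random-effects pooling steps can be sketched in Python. This is a minimal illustration, not MetaXL's implementation: it assumes the Freeman-Tukey form of the double arcsine transformation (variance 1/(n + 0.5)), Miller's back-transformation using the harmonic mean sample size, and the DerSimonian-Laird random-effects model. The quality-effects redistribution of weights is not reproduced, and the function names and study counts are hypothetical.

```python
import math


def double_arcsine(events, n):
    """Freeman-Tukey double arcsine transform; returns (t, sampling variance)."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (n + 0.5)


def inverse_double_arcsine(t, n):
    """Miller (1978) back-transformation of t to a proportion.

    n is the harmonic mean of the study sample sizes when back-transforming a
    pooled value. Values outside (0, pi) are clipped to the bounds 0 and 1.
    """
    if t <= 0:
        return 0.0
    if t >= math.pi:
        return 1.0
    s = math.sin(t)
    inner = max(0.0, 1.0 - (s + (s - 1.0 / s) / n) ** 2)
    return 0.5 * (1.0 - math.copysign(math.sqrt(inner), math.cos(t)))


def dersimonian_laird(effects, variances):
    """Random-effects pooling; returns (pooled effect, standard error, tau^2, I^2 in %)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, tau2, i2


def pooled_prevalence(studies):
    """studies: list of (events, sample size). Returns (prevalence, lower, upper, I^2)."""
    ts, vs = zip(*(double_arcsine(x, n) for x, n in studies))
    t, se, _, i2 = dersimonian_laird(ts, vs)
    n_harmonic = len(studies) / sum(1.0 / n for _, n in studies)
    lower, upper = t - 1.96 * se, t + 1.96 * se  # CI on the transformed scale
    return (inverse_double_arcsine(t, n_harmonic),
            inverse_double_arcsine(lower, n_harmonic),
            inverse_double_arcsine(upper, n_harmonic),
            i2)


# Hypothetical (events, sample size) pairs, NOT data from the included studies.
example = [(30, 100), (45, 120), (60, 150), (20, 80)]
est, lo, hi, i2 = pooled_prevalence(example)
print(f"{est:.1%} (95% CI {lo:.1%} to {hi:.1%}), I² = {i2:.0f}%")
```

The confidence interval is built on the transformed scale and only then back-transformed, which keeps both limits inside 0 to 1.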

Commonly used thresholds for each screening tool were used to categorise the severity of depression. For the depression subscale of the Hospital Anxiety and Depression Scale (HADS), scores ≤ 7 were interpreted as no depression, 8 to 10 as mild, 11 to 14 as moderate and ≥ 15 as severe depression [14]. For the Zung Self-Rating Depression Scale (SDS), scores ≤ 49 were interpreted as no depression, 50 to 69 as depression and ≥ 70 as severe depression [15]. The suggested threshold for depression on the Patient Health Questionnaire (PHQ-9) lies between 8 and 11, depending on context [16]. Studies that used similar thresholds were grouped together. If a study reported depression prevalence using two different thresholds, it could contribute to more than one pooled estimate. In sensitivity analyses, studies reporting outlying prevalence estimates were critically reviewed and, if appropriate, excluded to evaluate the resulting changes in prevalence and heterogeneity. The association between study characteristics and reported depression prevalence was assessed using Spearman’s rank correlation (rs).
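The severity bands above translate directly into simple classification rules. The sketch below uses hypothetical function names; it assumes the SDS cut-offs refer to the SDS index score (which need not be an integer), and uses 10 only as an illustrative default PHQ-9 cut-off within the suggested 8 to 11 range.

```python
def hads_depression_category(score):
    """Severity band for the HADS depression subscale (0 to 21), using the cut-offs in the text."""
    if not 0 <= score <= 21:
        raise ValueError("HADS depression subscale scores range from 0 to 21")
    if score <= 7:
        return "none"
    if score <= 10:
        return "mild"
    if score <= 14:
        return "moderate"
    return "severe"


def sds_category(index_score):
    """Band for the Zung SDS; assumes the thresholds apply to the SDS index score."""
    if index_score < 50:
        return "none"
    if index_score < 70:
        return "depression"
    return "severe depression"


def phq9_positive(score, cutoff=10):
    """PHQ-9 screen; the appropriate cut-off depends on context, 10 is only a default here."""
    return score >= cutoff


print([hads_depression_category(s) for s in (5, 9, 12, 17)])
# ['none', 'mild', 'moderate', 'severe']
```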

Markers of disease activity and functional impairment were compared between groups with and without depression. A weighted mean difference (WMD) was calculated for each marker, with pooled estimates shown in forest plots. Analyses were performed using MetaXL version 5.3 (Sunrise Beach, Australia; http://www.epigear.com/).
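A sketch of how a WMD and its 95% CI might be pooled across studies, assuming each study reports the mean, SD and size of both groups. The text does not state which weighting model was applied to the WMDs, so the hypothetical function below offers inverse-variance fixed-effect pooling and a DerSimonian-Laird random-effects option; the study values are hypothetical.

```python
import math


def mean_difference(m_dep, sd_dep, n_dep, m_non, sd_non, n_non):
    """Mean difference (depressed minus non-depressed) and its sampling variance."""
    return m_dep - m_non, sd_dep ** 2 / n_dep + sd_non ** 2 / n_non


def weighted_mean_difference(studies, random_effects=True):
    """Pool mean differences by inverse-variance weighting.

    studies: iterable of (m_dep, sd_dep, n_dep, m_non, sd_non, n_non).
    With random_effects=True, a DerSimonian-Laird between-study variance is added.
    Returns (pooled WMD, lower 95% limit, upper 95% limit).
    """
    diffs, variances = zip(*(mean_difference(*s) for s in studies))
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, diffs)) / sw
    tau2 = 0.0
    if random_effects and len(diffs) > 1:
        q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, diffs))
        c = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - (len(diffs) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_star, diffs)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


# Hypothetical BASDAI summaries: (mean, SD, n) for depressed, then non-depressed groups.
hypothetical_basdai = [
    (5.6, 2.0, 40, 4.1, 2.2, 90),
    (5.0, 1.8, 60, 3.7, 2.0, 150),
    (6.1, 2.1, 35, 4.5, 2.3, 70),
]
wmd, lo, hi = weighted_mean_difference(hypothetical_basdai)
print(f"WMD {wmd:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because tau² is never negative, the random-effects interval is always at least as wide as the fixed-effect one.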

Results

The literature search identified 769 published articles and 40 conference abstracts, whose titles and abstracts were screened for eligibility. After excluding duplicates and irrelevant studies, 70 studies were assessed for full-text eligibility, of which 15 full-text articles [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31] and one abstract [32] were included in the analysis. Among the excluded studies, two [33, 34] used the same cohort as the study by Healey et al. [20] and two [35, 36] reported on the same cohort as Hakkou et al. [26]. A summary of the selection process is shown in the flowchart (Additional file 1: Figure S1).

Of the 16 included studies, 14 recruited participants with AS (including three axSpA studies which reported on AS participants separately) and two reported on nr-axSpA cohorts (including one axSpA study which reported on nr-axSpA participants separately). Sample size ranged from 60 to 1504. A total of 4753 axSpA patients were included across all studies, including separately described groups of 2857 AS and 334 nr-axSpA patients. Twelve studies were cross-sectional in design, three were longitudinal, and one was a randomised controlled trial (RCT). Four studies were from Turkey, five from China (including one from Hong Kong), three from the UK, and one each from Greece, Morocco and Spain (Canary Islands). The RCT recruited from Europe, Asia and South America.

Three screening tools and one set of diagnostic criteria were used to identify depression. Nine studies used HADS with three different thresholds, five used SDS with three different thresholds, and one used PHQ-9. For the purposes of meta-analysis, HADS thresholds of ≥ 7 and ≥ 8 were grouped together (HADS ≥ 7/8), as were SDS thresholds of ≥ 50, ≥ 51 and ≥ 53 (SDS ≥ 50/51/53). Only one study used diagnostic criteria for depression, the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders (SCID).
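The grouping rule can be written as a lookup, which also makes explicit how a study reporting two thresholds contributes to two pooled estimates. The lookup is built from the groups described in the text; the function names and study labels are hypothetical, and a threshold outside the listed groups raises an error rather than being assigned silently.

```python
from collections import defaultdict

# (tool, threshold) -> pooled group, following the grouping described in the text.
POOLING_GROUPS = {
    ("HADS", 7): "HADS >= 7/8",
    ("HADS", 8): "HADS >= 7/8",
    ("HADS", 11): "HADS >= 11",
    ("SDS", 50): "SDS >= 50/51/53",
    ("SDS", 51): "SDS >= 50/51/53",
    ("SDS", 53): "SDS >= 50/51/53",
    ("PHQ-9", 10): "PHQ-9 >= 10",
    ("SCID", None): "SCID",
}


def pooling_group(tool, threshold=None):
    """Return the pooled group for one depression definition, or raise if it has none."""
    try:
        return POOLING_GROUPS[(tool.upper(), threshold)]
    except KeyError:
        raise ValueError(f"no pooled group for {tool} threshold {threshold}") from None


def group_studies(definitions):
    """definitions: iterable of (study label, tool, list of thresholds).

    A study reporting several thresholds is added to every matching group.
    """
    groups = defaultdict(list)
    for label, tool, thresholds in definitions:
        for threshold in thresholds:
            groups[pooling_group(tool, threshold)].append(label)
    return dict(groups)


# Hypothetical study labels, not the included studies.
example = [("Study A", "HADS", [8, 11]), ("Study B", "SDS", [53]), ("Study C", "SCID", [None])]
print(group_studies(example))
```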

Given the strict inclusion criteria employed for this meta-analysis, most studies had high quality scores on the Health States Quality Index: one scored 6, three scored 8, nine scored 9 and three scored ≥ 10 (Additional file 1: Table S2).

Prevalence of depression

The prevalence of depression ranged from 11 to 64%, depending on the criteria and threshold used to identify depression. Table 1 summarises the study characteristics, depression prevalence and quality scores. Funnel and Doi plots and the LFK index suggested no evidence of publication bias (Additional file 1: Figure S2).

Table 1 Summary of study characteristics, prevalence of depression and quality of studies included in this meta-analysis

Figure 1 shows a forest plot of prevalence estimates from quality-effects models, grouped by criteria and threshold. The pooled prevalence of at least mild depression (HADS ≥ 7/8) was 38% (95% CI 30 to 45%, I² = 85%). The pooled prevalence of at least moderate depression was 15% (95% CI 6 to 25%, I² = 89%) using HADS ≥ 11 and 52% (95% CI 29 to 75%, I² = 96%) using SDS. The study by Hyphantis et al. [19] reported a prevalence of 15% using PHQ-9 ≥ 10. The study by Chan et al. [24] reported a prevalence of 11% for major depressive disorder using the SCID. Pooled prevalence estimates from both quality- and random-effects meta-analyses are shown in Table 2.

Fig. 1 Pooled prevalence of depression in axSpA cohorts, grouped by criteria and threshold

Table 2 Summaries of pooled depression prevalence grouped by screening criteria and thresholds used

Two studies reported a disproportionately high prevalence of depression. Hakkou et al. [26] attributed this to the low socioeconomic status of their cohort. Excluding this study reduced heterogeneity in the HADS subgroup without substantially altering the pooled estimates (Table 2). The Chinese study by Jiang et al. [31] reported the highest prevalence of depression (64%). This cohort had the lowest mean age (27 years) and a low participation rate, with only 25% (683/2772) of the total cohort completing the required assessments. Excluding this study reduced both the heterogeneity and the pooled prevalence for the SDS group (from 52 to 36%).
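The effect of excluding an outlying study can be checked by re-pooling without it and comparing the pooled prevalence and I². This self-contained sketch assumes Freeman-Tukey double arcsine transformation with DerSimonian-Laird random-effects pooling (not the quality-effects weights used in the review); all labels and counts are hypothetical and are not the Hakkou et al. or Jiang et al. data.

```python
import math


def ft_transform(x, n):
    """Freeman-Tukey double arcsine and its sampling variance."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1.0 / (n + 0.5)


def ft_inverse(t, n):
    """Miller (1978) back-transformation; n is the harmonic mean sample size."""
    if t <= 0:
        return 0.0
    if t >= math.pi:
        return 1.0
    s = math.sin(t)
    inner = max(0.0, 1.0 - (s + (s - 1.0 / s) / n) ** 2)
    return 0.5 * (1.0 - math.copysign(math.sqrt(inner), math.cos(t)))


def random_effects_prevalence(studies):
    """studies: dict label -> (events, n). Returns (pooled prevalence, I^2 in %)."""
    ts, vs = zip(*(ft_transform(x, n) for x, n in studies.values()))
    w = [1.0 / v for v in vs]
    fixed = sum(a * b for a, b in zip(w, ts)) / sum(w)
    q = sum(a * (b - fixed) ** 2 for a, b in zip(w, ts))
    df = len(ts) - 1
    c = sum(w) - sum(a * a for a in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]
    t = sum(a * b for a, b in zip(w_re, ts)) / sum(w_re)
    n_harmonic = len(ts) / sum(1.0 / n for _, n in studies.values())
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return ft_inverse(t, n_harmonic), i2


def exclusion_sensitivity(studies, exclude):
    """Pool all studies, then pool again without the labels in `exclude`."""
    reduced = {k: v for k, v in studies.items() if k not in exclude}
    return random_effects_prevalence(studies), random_effects_prevalence(reduced)


# Hypothetical counts, NOT the data of any included study.
studies = {"A": (30, 100), "B": (35, 110), "C": (28, 90), "Outlier": (70, 110)}
full, reduced = exclusion_sensitivity(studies, exclude={"Outlier"})
print(f"all studies: {full[0]:.1%}, I² = {full[1]:.0f}%")
print(f"outlier excluded: {reduced[0]:.1%}, I² = {reduced[1]:.0f}%")
```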

Prevalence of depression was inversely associated with mean cohort age (rs = −0.71, P = 0.003) but not with study size, BASDAI, year of publication or proportion of males (data not shown).
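Spearman's rs is Pearson's correlation computed on ranks, with tied values given their average rank. The dependency-free sketch below uses hypothetical function names and hypothetical study-level values (mean age and prevalence), not the data behind the coefficient reported above, and returns only the coefficient.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical study-level values, NOT the data behind the reported rs.
mean_age = [27, 35, 38, 41, 44, 47]
prevalence = [0.64, 0.45, 0.40, 0.42, 0.30, 0.25]
print(round(spearman_rs(mean_age, prevalence), 2))
```

In practice, `scipy.stats.spearmanr` returns the same coefficient together with a p-value.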

Comparing axSpA, AS and nr-axSpA cohorts

Fourteen of 16 included studies reported the prevalence of depression for AS cohorts. The pooled prevalences for AS cohorts are shown in Table 2. Again, the studies by Hakkou et al. and Jiang et al. reported high prevalence of depression (Additional file 1: Figure S3) and were excluded in sensitivity analyses.

The studies by Chan et al. [24], Zou et al. [28] and Kilic et al. [18] reported similar prevalence of depression between axSpA and their AS subgroups (Table 1). Kilic et al. also reported similar prevalence between AS and nr-axSpA subgroups (45.4 vs 42.3%, P = 0.58).

Two studies reported depression prevalence for nr-axSpA cohorts. The pooled prevalence of at least mild depression (HADS ≥ 7/8) in nr-axSpA was similar to that in AS cohorts (Table 2).

Comparing markers of disease severity between groups with and without depression

Eight studies compared markers of disease severity between groups with and without depression (Additional file 1: Table S3). For BASDAI, spVAS and the Bath AS Functional Index (BASFI), most studies reported significantly higher scores in the group with depression than in the group without, regardless of the criteria or threshold used to define depression (Fig. 2). Scores in the depressed groups were also generally worse for the Bath AS Metrology Index (BASMI), the AS Disease Activity Score (ASDAS), CRP and ESR, but few individual comparisons were statistically significant.

Fig. 2 Measures of disease activity and functional impairment are worse in axial spondyloarthritis patients with comorbid depression. Effect sizes shown as weighted mean difference (WMD)

All eight studies reported significantly worse BASDAI in the group with depression. Despite the variety of criteria and thresholds used, the weighted mean differences (WMDs) were similar. Pooling WMDs, BASDAI was 1.4 units (95% CI 1.0 to 1.9) higher in the depressed group. In the six studies that reported spVAS, the groups with depression scored 1.2 units (95% CI 0.7 to 1.7) higher. Only two studies reported ASDAS, with a pooled WMD of 0.5 units (95% CI 0.3 to 0.7) between the two groups. ESR was significantly higher in groups with depression (3.5 mm/h, 95% CI 0.6 to 6.4 mm/h), but CRP was not (1.3 mg/dl, 95% CI −0.9 to 3.4).

All studies reported significantly worse BASFI in the group with depression. The pooled difference in BASFI was 1.2 units (95% CI 0.6 to 1.8), although estimates varied more between studies. The group with depression in the study by Hakkou et al. had much poorer function (BASFI, BASMI) than the depressed groups of other studies, despite the use of a threshold for ‘mild’ depression. In contrast to the other Bath indices, not all studies reported a difference in BASMI between groups with and without depression. Nevertheless, the pooled estimate showed that axSpA patients with comorbid depression had significantly higher BASMI than those without (0.6 units, 95% CI 0.3 to 0.8).

Discussion

This systematic review and meta-analysis of nearly five thousand axSpA patients has shown that depression is highly prevalent and associated with greater disease activity and functional impairment. At least moderate depression (HADS ≥ 11) was found in 15% of patients, although estimates varied with the criteria and thresholds selected: pooled prevalence estimates were 38 to 52% using lower HADS thresholds and the SDS.

The strength of this meta-analysis lies in its strict inclusion criteria, as prevalence estimates can vary significantly with sampling methods and definitions of depression. The included studies were of high quality overall and at low risk of publication bias. However, overall heterogeneity of prevalence estimates remained high, largely because of the wide range of screening and diagnostic tools used to define depression. Heterogeneity was reduced when estimates were pooled by depression definition, and further reduced in sensitivity analyses excluding the studies by Hakkou et al. and Jiang et al. These two studies highlight the importance of age, which was inversely associated with depression prevalence, and of socioeconomic status, which future studies could approximate by grouping by continent or gross domestic product.

The main limitation of the meta-analysis of prevalence was that almost all studies used screening criteria to detect depression, which may lead to over-estimation of the prevalence of true depression. Using diagnostic criteria, Chan et al. found one of the lowest prevalences of depression. However, with SCID as the gold standard, they reported a sensitivity and specificity of 82 and 79%, respectively, for HADS ≥ 8 in their axSpA cohort [24]. Almost all studies described hospital cohorts, which are likely to have more severe disease than a random or primary care sample. However, disease activity was not found to be associated with prevalence estimates: although the RCT cohort had much higher disease activity than the observational cohorts, it reported a similar prevalence of depression. The higher disease activity and functional impairment found in patients with depression were not adjusted for confounders such as smoking and deprivation. Adjusted effect sizes could be smaller, although in our own axSpA cohort [10] such adjustment made little difference to the effect size of depression. Lastly, the direction of the causal relationship between depression and axSpA disease severity cannot be determined from these cross-sectional associations.
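The concern that screening tools over-estimate prevalence can be made concrete with the standard relation between true prevalence p, sensitivity Se, specificity Sp and the apparent (screen-positive) prevalence AP = Se·p + (1 − Sp)(1 − p), and its inversion, the Rogan-Gladen estimator p = (AP + Sp − 1)/(Se + Sp − 1). This correction was not applied in the review. The sketch below uses hypothetical function names and plugs in the sensitivity and specificity reported by Chan et al. for HADS ≥ 8 purely for illustration, assuming they transfer to other cohorts and thresholds, which is a strong assumption.

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Expected proportion screening positive: true positives plus false positives."""
    return sensitivity * true_prevalence + (1 - specificity) * (1 - true_prevalence)


def rogan_gladen(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence, clipped to [0, 1]."""
    youden = sensitivity + specificity - 1
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return min(1.0, max(0.0, (apparent + specificity - 1) / youden))


# Sensitivity/specificity reported by Chan et al. for HADS >= 8 against SCID;
# the true prevalence used here is hypothetical.
SE, SP = 0.82, 0.79
ap = apparent_prevalence(0.15, SE, SP)
print(f"apparent prevalence {ap:.1%}, corrected back to {rogan_gladen(ap, SE, SP):.1%}")
```

With these values, a hypothetical true prevalence of 15% would yield an apparent prevalence of about 30%. More generally, a screen with these properties overstates prevalence whenever the true prevalence is below (1 − Sp)/(2 − Se − Sp), about 54% here.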

The prevalence of depression in axSpA was higher than in the general population [37] but very similar to that reported in rheumatoid arthritis (RA) cohorts. A meta-analysis of RA patients reported depression in 15% using HADS ≥ 11 and 34% using HADS ≥ 8 [38]. This similarity was unexpected, since the RA cohorts were mostly female and depression is more prevalent in females in the general population [8]. It is possible that a higher proportion of male axSpA patients than of male RA patients report depression. However, our finding that the prevalence of depression was similar between AS and nr-axSpA groups suggests that gender may not be as significant a risk factor for depression in axSpA.

Patients with depression had significantly worse disease activity and functional impairment across most indices. For example, the pooled difference in BASDAI was 1.4 units, which is clinically relevant because an improvement of ≥ 2 units is considered a response to TNF inhibitors (TNFi) [39]. The direction of this potential causal relationship cannot be inferred from the studies included in this meta-analysis. Patients with more severe disease may be at higher risk of developing comorbid depression. Conversely, depression has been shown to exacerbate the perception of pain [40] and may lead axSpA patients with depression to report greater disease severity. This may explain why the differences in subjective indices (BASDAI, spVAS, BASFI) were larger than those in objective measures (BASMI, ASDAS). Interestingly, the difference in BASDAI between groups with and without depression was similar regardless of the severity of depression, suggesting that the presence of any level of depression may adversely influence responses to the BASDAI questionnaire.

In healthcare systems where TNFi can only be continued on the basis of demonstrable response, such as in the UK, efforts should be made to identify and manage comorbid depression to avoid withdrawing efficacious therapy inappropriately. More objective measures of disease activity, such as ASDAS, may be more robust to the influence of depression. Longitudinal axSpA studies are needed to evaluate the impact of depression on treatment outcomes; we have previously demonstrated, in a longitudinal RA cohort, that depression at baseline adversely affects treatment outcomes [41]. Equally, inflammation has been implicated in the pathophysiology of depression and may contribute to non-responsiveness to antidepressant therapies [9]. Longitudinal studies could also shed light on whether TNFi improves severity of depressive symptoms.

Conclusions

Depression is common among axSpA patients and is associated with worse disease activity and functional impairment. Clinicians should be mindful of comorbid depression when managing axSpA patients, especially for younger patients and those with severe disease activity and functional impairment. Patients with depression should be appropriately referred and managed. This is especially pertinent if depressive symptoms are thought to adversely influence assessments of treatment response.