Background

Anxiety and depression symptoms are common among cardiac patients, with prevalence rates of up to 30% and 20%, respectively, at hospital discharge and up to three months after hospitalization. This may reflect the impact of severe physical illness on other aspects of health [1, 2]. Previous studies have shown that anxiety and depression symptoms can predict future morbidity and mortality among cardiac patients [3, 4], underlining the importance of identifying these symptoms so that interventions to reduce them can be initiated. A prerequisite for this is a valid instrument for identifying the symptoms.

The Hospital Anxiety and Depression Scale (HADS) was developed for patients with somatic illness admitted to hospital [5] and is often used as a self-rating scale to screen for anxiety and depression symptoms across a wide range of patient and general populations. The scale includes two subscales, HADS-A and HADS-D, measuring anxiety and depression symptoms, respectively. The scale focuses on the psychic symptoms of mood disorders, leaving out physical symptoms that can be confused with physical illness [5]. This is an advantage in cardiac populations, where symptoms such as palpitations or dizziness may be related to the underlying cardiac disease rather than to a mood disorder.

HADS has been extensively tested for validity and reliability in English and other language versions, with satisfactory results across different patient populations, e.g. cardiac disease, cancer, psychological illness and in general populations [6,7,8]. Looking at previous validation studies of HADS in cardiac populations, however, there are differing results regarding the factor structure of the scale (Table 1). The originally proposed two-factor structure is confirmed in six studies [9,10,11,12,13,14], whereas eight studies find that different versions of a three-factor structure have the best fit, depending on the analytic method used [12, 13, 15,16,17,18,19,20]. By contrast, one study finds a one-factor structure to have the best fit [21].

Table 1 Previous validations of HADS in patients with cardiac disease

Differential item functioning (DIF) is a form of measurement error at item level by which patients from different groups with the same level of a construct being measured do not have the same scores. The presence of DIF by gender has been examined for HADS, but the results are not consistent [22,23,24].

HADS has been translated into Danish and is frequently used in clinical research, but the psychometric properties of the Danish version have not been evaluated. Even though the scale has been found to be valid and reliable in previous studies, this does not guarantee equivalent validity when it is used in a different language, culture or context. Therefore, the aim of the current study was to evaluate the psychometric properties of the Danish HADS in a large population of patients with the most common cardiac diagnoses: ischemic heart disease, arrhythmias, heart failure and heart valve diseases.

Methods

Data collection and sample

Data were collected as part of the DenHeart study, the design and methods of which have been described in a previously published protocol [25]. The DenHeart study was designed as a national cross-sectional survey combined with data from national registers at baseline and at one-year follow-up. Over a period of one year (April 2013–April 2014), all patients discharged or transferred from one of five national heart centers were asked to fill out a questionnaire at hospital discharge. Patients were excluded if they were under the age of 18, did not have a Danish civil registration number, did not understand Danish or were unconscious when transferred from a heart center.

Based on their discharge diagnosis from the Danish National Patient Register [26], patients were divided into diagnostic sub-groups [2]. Included in the current analyses are patients with ischemic heart disease, arrhythmias, heart failure and heart valve diseases.

Furthermore, co-morbidity data were obtained from the Danish National Patient Register [26]. The Tu co-morbidity index was calculated from congestive heart failure, cardiogenic shock, arrhythmia, pulmonary oedema, malignancy, diabetes, cerebrovascular disease, acute/chronic renal failure and chronic obstructive pulmonary disease, all assessed over the preceding ten years [27].

Information on demographic characteristics was collected from the Civil Registration System [28] and the Danish Education Register [29].

The HADS questionnaire

The HADS is a 14-item questionnaire originally developed to measure anxiety and depression symptoms in patients with somatic disease [5]. The instrument offers two subscales, HADS-A and HADS-D, each consisting of seven items and measuring anxiety and depression symptoms, respectively. HADS-A focuses on symptoms relating to generalized anxiety and HADS-D on symptoms relating to anhedonia, a central aspect of depression [30]. Each item is scored from 0 to 3, giving each subscale a score range of 0 to 21. Eight items are reverse scored, with higher raw scores indicating a better response; these are reversed before the items are summed into subscale scores. The recommended cut-off values are 8–10 for possible presence of a mood disorder and ≥ 11 for probable presence of a mood disorder [5]. Among cardiac patients, the minimal clinically important difference on the HADS has previously been found to be 1.7 points [31].
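The scoring rules above can be sketched in Python. This is a minimal illustration, not an official scoring routine: the function names are hypothetical, the odd/even split of items into HADS-A and HADS-D follows the item allocation given later in the Discussion, and the set of eight reverse-scored items is deliberately left as a parameter, because which items are reversed must be taken from the official HADS scoring key.

```python
# Minimal HADS scoring sketch (hypothetical helper names).
# ASSUMPTION: the reverse-scored items are supplied by the caller from the
# official scoring key; they are not hard-coded here.

HADS_A_ITEMS = (1, 3, 5, 7, 9, 11, 13)   # anxiety subscale items
HADS_D_ITEMS = (2, 4, 6, 8, 10, 12, 14)  # depression subscale items


def score_hads(responses, reversed_items):
    """Return (HADS-A, HADS-D) sums from raw responses coded 0-3.

    `responses` maps item number (1-14) to a raw response. Items listed in
    `reversed_items` are recoded as 3 - response so that a higher value
    always means more symptoms before summing.
    """
    if set(responses) != set(range(1, 15)):
        raise ValueError("complete responses to all 14 items are required")
    scored = {}
    for item, value in responses.items():
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: response {value} outside 0-3")
        scored[item] = 3 - value if item in reversed_items else value
    anxiety = sum(scored[i] for i in HADS_A_ITEMS)
    depression = sum(scored[i] for i in HADS_D_ITEMS)
    return anxiety, depression


def classify(subscale_score):
    """Apply the cut-offs: 8-10 possible, >= 11 probable mood disorder."""
    if not 0 <= subscale_score <= 21:
        raise ValueError("subscale scores range from 0 to 21")
    if subscale_score >= 11:
        return "probable"
    if subscale_score >= 8:
        return "possible"
    return "below cut-off"
```

For example, `classify(score_hads(responses, reversed_items)[0])` would place a patient's HADS-A score in one of the three bands. Requiring all 14 items mirrors the study's restriction to complete responders.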

The Danish version of HADS has been frequently used for research purposes, both in observational studies and randomized controlled trials, as well as for screening purposes in clinical practice [2, 3, 32,33,34,35,36].

The translation of the HADS from English into Danish was evaluated by five independent assessors who were fluent in both English and Danish. For each item, the equivalence of the translation was rated on a scale from 1 to 4, with higher numbers indicating stronger equivalence. The Translation Validity Index (TVI) was calculated as the proportion of ratings of 3 or 4 [37].
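The TVI is a simple proportion, sketched below with a hypothetical helper. The example ratings are invented for illustration; assuming the scale-level TVI pools all item ratings, the pattern reported in the Results (12 items rated positively by all five assessors, two items by three of five) gives 66/70 ≈ 94%.

```python
def translation_validity_index(ratings):
    """Proportion of equivalence ratings scored 3 or 4 on the 1-4 scale.

    `ratings` holds the assessors' ratings for one item, or the pooled
    ratings of all items when computing the TVI for the whole scale.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 4")
    positive = sum(1 for r in ratings if r >= 3)
    return positive / len(ratings)


# Hypothetical ratings from five assessors:
strong_item = [4, 4, 3, 4, 3]   # all positive -> 1.0
weak_item = [4, 3, 3, 2, 1]     # three of five positive -> 0.6
pooled = strong_item * 12 + weak_item * 2
print(round(translation_validity_index(pooled) * 100))
```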

Other instruments

The Short-Form 12 health survey (SF-12) is a brief, generic measure of health-related quality of life that generates both a physical (PCS) and a mental component score (MCS). Higher scores indicate better health status [16]. The SF-12 has been validated in a population of patients with coronary heart disease from 22 European countries, with satisfactory results for construct validity and a Cronbach’s alpha of 0.87 for PCS and 0.84 for MCS, indicating high internal consistency reliability [10].

HeartQoL is a disease-specific questionnaire that measures quality of life in cardiac patients. It produces a global score and two subscales (physical and emotional), with scores ranging from 0 to 3 and higher scores indicating better quality of life [18,19,20]. The instrument has been validated in a large sample of coronary patients, with results confirming both discriminative and convergent validity and high reliability, with a Cronbach’s alpha of 0.87 for the emotional subscale and 0.91 for the physical subscale [38].

Furthermore, two single items allowed patients to rate their anxiety and depression, respectively, on a 10-point Likert scale.

Psychometric properties of HADS

The following psychometric properties of the HADS were evaluated.

Floor and ceiling effects are considered present if more than 15% of the patients select the lowest or highest possible score on an item. Such effects can indicate that items representing the extremes are missing at either end of the scale, which may limit its validity [39, 40].
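The 15% criterion can be checked per item with a few lines of Python. This is a minimal sketch: the function name is hypothetical and the example data are invented.

```python
def floor_ceiling(item_scores, lowest=0, highest=3, threshold=0.15):
    """Proportions at the lowest/highest response and whether each exceeds 15%."""
    n = len(item_scores)
    if n == 0:
        raise ValueError("no responses")
    floor_prop = sum(1 for s in item_scores if s == lowest) / n
    ceiling_prop = sum(1 for s in item_scores if s == highest) / n
    return {
        "floor": floor_prop,
        "ceiling": ceiling_prop,
        "floor_effect": floor_prop > threshold,      # strictly more than 15%
        "ceiling_effect": ceiling_prop > threshold,
    }


# Hypothetical item: 20% of respondents at 0, 10% at 3
example = floor_ceiling([0] * 20 + [1] * 50 + [2] * 20 + [3] * 10)
```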

Construct validity is defined as the degree to which an instrument measures what it is intended to measure. It is evaluated by testing hypotheses about an instrument, for example, relationships between parts of an instrument, relationships with scores of other instruments or differences between relevant groups [41]. An aspect of construct validity is structural validity, which is the degree to which the subscale scores of an instrument are an adequate reflection of the dimensions of the construct to be measured [41]. Structural validity was evaluated using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). CFA was conducted for the original two-factor structure suggested by Zigmond and Snaith [5], as well as for four three-factor models [15, 42,43,44] and one one-factor model [21] found in previous studies including cardiac patients.

Construct validity was also examined through hypothesis testing, relating HADS scores to the MCS of the SF-12, the emotional subscale of HeartQoL, a single item on anxiety and a single item on depression (convergent construct validity), and to the PCS and the physical subscale of HeartQoL (divergent construct validity).

We hypothesized high correlations (r > 0.60) between both HADS-A and HADS-D and the MCS and HeartQoL emotional scores, as well as high correlations between HADS-A and the single item measuring anxiety and between HADS-D and the single item measuring depression. Furthermore, we hypothesized low correlations (r < 0.30) between HADS-A and HADS-D and the PCS and HeartQoL physical scores, as these measures were not expected to be related to the HADS subscales.

Internal consistency reliability indicates the extent to which the items of an instrument are intercorrelated and therefore measure the same construct. It can be evaluated by calculating Cronbach’s alpha; an alpha between 0.70 and 0.95 indicates good internal consistency [40].
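For reference, the usual definition of Cronbach’s alpha for a subscale of k items is given below, where σ_i² is the variance of item i and σ_X² is the variance of the summed subscale score. For each HADS subscale, k = 7.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sigma_X^{2}}\right)
```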

DIF is a violation of measurement invariance at the item level: there are items for which patients from different groups with the same level of the construct being measured do not have the same scores. This can indicate that the item measures different things in the different groups. DIF can be uniform or non-uniform, depending on whether the differences are present across all values of the scale or only for some values [45].
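The nested models behind this kind of logistic-regression DIF analysis can be written as follows. This is a simplified proportional-odds form given for illustration only; the study itself used a partial proportional odds model because the proportional odds assumption was not met, and sign conventions for the thresholds differ between software packages. Here Y is the item response (0–3), j = 1, 2, 3 indexes the response thresholds τ_j, X is the subscale total score and G is a gender indicator.

```latex
\begin{aligned}
\text{Model 1:}\quad & \operatorname{logit} P(Y \ge j) = \tau_j + \beta_1 X \\
\text{Model 2:}\quad & \operatorname{logit} P(Y \ge j) = \tau_j + \beta_1 X + \beta_2 G \\
\text{Model 3:}\quad & \operatorname{logit} P(Y \ge j) = \tau_j + \beta_1 X + \beta_2 G + \beta_3 X G
\end{aligned}
```

Uniform DIF corresponds to β₂ ≠ 0 (an odds ratio exp(β₂) different from 1), non-uniform DIF to β₃ ≠ 0, and the change in Nagelkerke’s R² between models serves as an effect size.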

Data analyses

Demographic and clinical characteristics are presented as frequencies or means with standard deviations (SD). Item score distributions are presented as means with SD, frequencies for each response category and missing data. Histograms and the Kolmogorov-Smirnov test were used to determine whether item scores deviated from the normal distribution.

Exploratory factor analysis was conducted using principal axis factoring, retaining factors with eigenvalues greater than 1. Oblimin rotation was applied, and a loading of 0.30 or higher was taken to indicate that an item loaded on a factor.

Confirmatory factor analyses were conducted with the weighted least squares means and variance adjusted (WLSMV) estimator. A Root Mean Square Error of Approximation (RMSEA) below 0.06, together with Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) values above 0.95, indicated good model fit [46].
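The cut-offs above amount to a simple decision rule. The sketch below also shows the standard textbook formulas for the three indices from model and baseline chi-square values. This is an illustrative assumption only: with WLSMV, Mplus computes the indices from mean- and variance-adjusted chi-square statistics, and programs differ in details such as using N or N − 1 in the RMSEA denominator, so in practice the values would be read from the software output. All function names and input numbers are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index relative to the baseline (independence) model."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis Index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


def good_fit(rmsea_value, cfi_value, tli_value):
    """Criteria used here: RMSEA < 0.06 and both CFI and TLI > 0.95."""
    return rmsea_value < 0.06 and cfi_value > 0.95 and tli_value > 0.95


# Hypothetical model (chi2 = 100, df = 50) and baseline (chi2 = 2000, df = 66), N = 1001
fit = (rmsea(100, 50, 1001), cfi(100, 50, 2000, 66), tli(100, 50, 2000, 66))
```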

Both the EFA and the CFA were conducted on the total population. The models tested in the CFA were drawn from the extensive previous literature.

Spearman’s rank-order correlations were used to assess convergent and divergent validity, as the data were not normally distributed. Convergent and divergent validity were further examined by comparing mean MCS, PCS, HeartQoL emotional and HeartQoL physical scores between patients with HADS-A and HADS-D scores above and below 8.
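A minimal, dependency-free sketch of Spearman’s rank-order correlation (the Pearson correlation of average ranks, so tied scores are handled) and of the hypothesis thresholds stated earlier. The helper names are hypothetical. Because higher HADS scores indicate more symptoms while higher MCS and HeartQoL scores indicate better health, these correlations are expected to be negative; the sketch assumes that the 0.60 and 0.30 thresholds refer to the absolute strength of the correlation.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


def hypothesis_met(rho, kind):
    """Convergent: |rho| > 0.60; divergent: |rho| < 0.30 (absolute values assumed)."""
    if kind == "convergent":
        return abs(rho) > 0.60
    if kind == "divergent":
        return abs(rho) < 0.30
    raise ValueError("kind must be 'convergent' or 'divergent'")
```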

Internal consistency was evaluated by calculating Cronbach’s alpha for each subscale and by corrected item-total correlations.
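The internal consistency statistics reported in the Results (Cronbach’s alpha, corrected item-total correlations and alpha if an item is deleted) can be sketched with the Python standard library. The function names are hypothetical and the three-item data are invented; a corrected item-total correlation correlates each item with the sum of the other items, so the item does not inflate its own correlation.

```python
import math
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one score list per item (same respondents, same order)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(person) for person in zip(*items)]  # summed score per respondent
    sum_item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def pearson(x, y):
    """Product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(items):
    """Correlate each item with the sum of the remaining items (item excluded)."""
    correlations = []
    for idx, scores in enumerate(items):
        rest = [sum(person) - person[idx] for person in zip(*items)]
        correlations.append(pearson(scores, rest))
    return correlations


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical three-item scale answered by four respondents
items = [[0, 1, 2, 3], [1, 1, 2, 2], [0, 2, 1, 3]]
```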

DIF was examined using multiple ordinal logistic regression with the item as the dependent variable and gender and total score (HADS-A or HADS-D, depending on the item) as the independent variables. Because the proportional odds assumption was not fulfilled, a partial proportional odds model was used. DIF was evaluated by several criteria. Uniform DIF was considered present if the odds ratio (OR) for gender was statistically significantly different from 1 [45]. Interactions between gender and total score were included to evaluate possible non-uniform DIF; a statistically significant interaction can indicate non-uniform DIF [45]. Because of the large sample size and the risk of finding statistically significant results with little or no clinical meaning, DIF was also evaluated by Nagelkerke’s R². A difference in R² of more than 0.03 between models indicated noticeable DIF (both uniform and non-uniform) [45].
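The effect-size criterion can be sketched as follows, assuming the log-likelihoods of the intercept-only model and of each nested model (total score only; adding gender; adding the gender by total score interaction) have already been obtained from the fitted regressions. The functions are hypothetical helpers, not part of SAS, SPSS or Mplus, and fitting the partial proportional odds models themselves is outside the scope of the sketch.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R^2 from log-likelihoods of the intercept-only and fitted models."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox-Snell R^2
    return cox_snell / max_cox_snell


def noticeable_dif(r2_without_gender, r2_with_gender, threshold=0.03):
    """Flag noticeable DIF when adding gender terms raises R^2 by more than the threshold."""
    return (r2_with_gender - r2_without_gender) > threshold
```

In this scheme, comparing the total-score-only model with the model that adds both gender and the interaction gives the overall DIF effect size, while comparing it with the gender-only extension isolates the uniform part.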

Only patients with complete responses to the HADS were included in the analyses.

Analyses were conducted using SAS version 9.4, IBM SPSS version 25 and Mplus version 7.4.

Results

Demographic and clinical profile

Out of 25,241 eligible patients, 12,806 had complete responses to the HADS questionnaire, giving a response rate of 51%. Demographic and clinical characteristics are presented in Table 2.

Table 2 Demographic and clinical characteristics

Item score statistics and translation validity index

The item score statistics are presented in Table 3. Item 8 showed markedly different scores from the rest of the items, with more patients using high response categories. There were floor effects on all items and a ceiling effect on item 8 (Table 3).

Table 3 Item and score statistics

Of the 14 items, 12 had a TVI of 100%, and two (items 3 and 11, both part of HADS-A) had a TVI of 60%. The TVI for the total scale was 94% (Additional file 1: Table S1).

Factor structure

The results from the EFA indicated that the original two-factor structure of the HADS fitted this cardiac population. However, item 7 loaded almost equally on both factors (Table 4). The correlation between HADS-A and HADS-D was 0.66.

Table 4 Exploratory factor analysis - rotated factor matrix

The CFA indicated that the three-factor structure suggested by Friedman et al. [44] showed the best fit of the models tested (Table 5). The diagram from the CFA of this model is presented in Fig. 1.

Table 5 Fit indices for confirmatory factor analyses of factor structures proposed in previous studies
Fig. 1 Diagram from the confirmatory factor analysis presenting the model with the best fit. Standardized loadings (SE). PAn = psychic anxiety; Dep = depression; PAg = psychomotor agitation

Convergent and divergent validity

Looking at MCS, PCS, HeartQoL emotional and HeartQoL physical scores in relation to HADS scores, patients with scores below 8 on HADS-A or HADS-D had high MCS and HeartQoL emotional scores. Conversely, patients with HADS-A or HADS-D scores above 8 had the lowest scores. The same pattern was found for PCS and HeartQoL physical scores (Table 6).

Table 6 HADS scores in relation to SF-12 and HeartQoL scores

Correlations between HADS-A and MCS and HeartQoL emotional were 0.67 and 0.75, respectively. Correlations between HADS-D and MCS and HeartQoL emotional were 0.66 and 0.63, respectively. The correlation between HADS-A and the single item on anxiety was 0.68, and between HADS-D and the single item on depression it was 0.59. Apart from the latter, which fell just below the 0.60 threshold, these results confirmed the stated hypotheses about convergent validity. However, the two single items were highly correlated with each other (0.76).

Correlations between HADS-A and PCS and HeartQoL physical were 0.25 and 0.35, respectively. Correlations between HADS-D and PCS and HeartQoL physical were 0.50 and 0.55, respectively. This did not confirm the hypotheses on divergent validity for HADS-D; the correlation between HADS-A and HeartQoL physical (0.35) also exceeded the 0.30 threshold.

Internal consistency

For HADS-A, the mean inter-item correlation was 0.50 (range 0.35–0.61) and Cronbach’s alpha was 0.87. The corrected item-total correlations ranged from 0.52 to 0.71, and Cronbach’s alpha would not be improved by deleting any item.

For HADS-D, the mean inter-item correlation was 0.41 (range 0.24–0.58) and Cronbach’s alpha was 0.82. The corrected item-total correlations ranged from 0.44 to 0.67, and Cronbach’s alpha would not be improved by deleting any item.

For all HADS items the mean inter-item correlation was 0.40 (range 0.24–0.61).

For the three-factor structure, Cronbach’s alpha was 0.74 for the psychomotor agitation subscale and 0.83 for the psychic anxiety subscale. The HADS-D subscale was unchanged, with a Cronbach’s alpha of 0.82. Cronbach’s alpha would not be improved by deleting any item.

Differential item functioning

There were indications of DIF for items 3, 4 and 13, where women were more likely than men to have high item scores, and for items 11 and 14, where men were more likely than women to have high item scores. There were significant interactions between gender and subscale score for items 1, 2, 5, 7, 8, 9 and 12, indicating possible non-uniform DIF. However, in the analysis using Nagelkerke’s R², there was no noticeable DIF for any item (Table 7).

Table 7 Differential item functioning tested for gender

Discussion

In the present study, the psychometric properties of the HADS were evaluated in a large sample of Danish cardiac patients. Floor effects were found on all items and a ceiling effect on item 8. The original two-factor structure of the scale was confirmed in EFA, but CFA indicated a three-factor structure. The hypotheses proposed for convergent validity were supported for both subscales. However, the hypotheses proposed for divergent validity were not supported for HADS-D, so divergent validity was not demonstrated for this subscale. Internal consistency was good for both HADS-A and HADS-D.

The factor analyses indicate that the factor structure of the HADS is not completely clear. The EFA confirmed the original two-factor structure suggested by Zigmond and Snaith [5], but the CFA showed that the three-factor structure found by Friedman et al. [44] in a French sample of patients suffering from major depression had the best model fit. Barth and Martin found the same result in a German coronary heart disease population [13]. Several other studies have found variations of a three-factor structure to have the best model fit for the HADS, as indicated in Table 5. The differences in factor structure across studies might be explained by methodological differences, such as the factor extraction method, model fit criteria, translation or type of patients included.

When considering the content of the three factors suggested by Friedman et al. [44], psychomotor agitation (items 1, 7 and 11), psychic anxiety (items 3, 5, 9 and 13) and depression (items 2, 4, 6, 8, 10, 12 and 14), the division of the original HADS-A items into two factors is plausible, as they relate to two different dimensions of anxiety. The items in the psychomotor agitation subscale relate to physical feelings of restlessness and agitation, while the items in the psychic anxiety subscale relate to the emotional representation of anxiety, such as worrying and nervous thoughts. Agitation is, however, also a common symptom among patients with depressive disorders and can occur as a side effect of antidepressant medication [47].

The interrelatedness of anxiety and depression symptoms is further evident in the high correlation between HADS-A and HADS-D, which remained when the three-factor structure was considered instead. It has previously been argued that a high correlation between anxiety and depression is to be expected, not because of shared symptoms but because anxiety may lead to depression and depression may lead to anxiety. It is also possible that the two disorders result from a common cause. The causality of this relationship cannot, however, be determined from cross-sectional data [48].

In the EFA, item 7 loaded almost equally on both factors, which has also been found in previous studies [13]. Item 7 reads ‘I can sit at ease and feel relaxed’, which may reflect aspects of both anxiety and depression.

Eight items in the HADS are reverse scored. This is a recommended method to avoid acquiescence bias, the tendency of survey respondents to agree with statements regardless of their content. However, research suggests that individual differences in response styles can systematically affect the factor structure [49]. The uncertainty about the factor structure of the HADS is not necessarily a reason to discard the instrument, but rather a reason to be clear about the purpose of using the scale. The two-factor structure may prove useful as a simple indication of either anxiety or depression. The possible presence of a third factor suggests that the scale may provide more refined information on different aspects of anxiety, rather than just an indication of generalized anxiety. Because the results regarding factor structure were inconclusive, the originally proposed two-factor structure was used in the remaining analyses.

There were floor effects on all items, which may indicate that the number of extreme response categories is insufficient. However, as the HADS was developed to detect indications of a mood disorder, which is absent in the majority of patients, even among the severely ill, floor effects are not surprising. Item 8 also showed a ceiling effect. The item reads ‘I feel as if I am slowed down’, a feeling that is likely to be prevalent in a population of elderly, severely ill patients just discharged from hospital. This item may therefore be influenced by age or disease, which limits its validity as an indicator of mood.

The analyses of DIF indicated potential DIF for several items. However, because of the risk of finding statistically significant results of minimal clinical importance in this large population, changes in Nagelkerke’s R² between models were given priority, and these indicated no noticeable DIF for any item. DIF by gender has been explored in previous studies [22,23,24, 50], but only one study found substantial DIF, for item 14, with men being more likely to endorse this item [22].

When considering the usefulness of the HADS in clinical practice, it should also be noted that the HADS has been shown to predict morbidity and mortality in this and similar patient populations [3, 4, 51].

Limitations of the study

The questionnaire owner provides no description of how the HADS was translated into Danish, so it is not clear whether the translation followed the recommended steps to ensure cross-cultural validity [45]. The current analyses are, in fact, the first specific investigation of the psychometric properties of the Danish version of the HADS. For the current study, we evaluated the TVI for each item and for the total scale, with satisfactory results. Items 3 and 11 (both in HADS-A) received the lowest rating (60%).

Newer methods for assessing internal consistency exist, e.g. McDonald’s omega. However, for consistency with the methods used throughout this paper and for comparison with other HADS validation studies, we chose to report Cronbach’s alpha.

The large sample size in this study is an advantage because of statistical power and because it allows a heterogeneous sample. There is, however, a risk of finding statistically significant results of minimal clinical importance. Therefore, we did not rely on p-values alone to determine validity, but also considered the strength of correlations, internal consistency and Nagelkerke’s R² in the analyses of DIF.

The response rate was 51%, which is to be expected in a population of severely ill patients on the day of hospital discharge. This may raise concerns about representativeness; however, the proportions of patients in the diagnostic sub-groups were similar to those of the entire eligible population, and responders and non-responders were comparable in terms of their demographic and clinical profiles, suggesting a representative sample [2]. We did, however, find a higher mortality rate among non-responders than among responders [4].

In the present study, we used single questions on anxiety and on depression to assess convergent validity. However, these two questions were highly correlated with each other. Including more comprehensive instruments for anxiety and depression would have been optimal for examining convergent validity, but these were not available in the data.

Conclusions

The findings of this study supported the validity and reliability of the HADS in a sample of Danish patients with cardiac disease. EFA supported the original two-factor structure of the scale, while CFA supported a three-factor structure consisting of the original depression subscale and two anxiety subscales: psychomotor agitation and psychic anxiety. The hypotheses regarding convergent validity were confirmed, but those regarding divergent validity were not confirmed for HADS-D. Internal consistency was good, with a Cronbach’s alpha of 0.87 for HADS-A and 0.82 for HADS-D. There were no indications of noticeable DIF by gender for any item.