Introduction

Major depressive disorder (MDD) is common in primary care patients [1], with a lifetime prevalence rate in the French population of 10–25% in women and 5–12% in men [2]. Depression is associated with marked decreases in functioning, well-being and health-related quality of life (HRQL) [3, 4], and with increases in disability days [5], use of health services and overall societal costs [6]. Antidepressant treatments are effective in reducing depression severity [7, 8] and in increasing patient functioning and HRQL [9, 10].

The Washington Panel on Cost-Effectiveness in Health and Medicine recommended the use of HRQL in the evaluation of health care interventions [11]. For this purpose, HRQL measurement needs to express patient health status on a scale where perfect health and death are valued 1 and 0 respectively. When such quality of life data are combined with corresponding data on the quantity of life, then the consequences of treatment are measured in units of Quality-Adjusted Life Years (QALYs) [12]. Where QALYs are calculated for social decision making purposes then the HRQL measures used to make the quality adjustment should be based on the preferences of the population as a whole. Such social preferences are only available for a limited number of HRQL measures and for a limited number of countries. EQ-5D is one such measure that has been calibrated in this way.
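The quality adjustment described above can be sketched in a few lines of Python. This is only an illustration of the general QALY arithmetic (utility multiplied by time spent in each health state); the function name `qalys` and the example health profile are hypothetical and are not taken from the study.

```python
def qalys(utilities, durations_years):
    """Return quality-adjusted life years for a sequence of health states.

    utilities: preference weights on the scale where 1 = perfect health
               and 0 = death (negative values denote states worse than death).
    durations_years: time spent in each corresponding state, in years.
    """
    if len(utilities) != len(durations_years):
        raise ValueError("utilities and durations_years must have the same length")
    return sum(u * t for u, t in zip(utilities, durations_years))


# Hypothetical profile: 2 years in a state valued 0.5, then 3 years in perfect health.
total = qalys([0.5, 1.0], [2.0, 3.0])
print(total)  # 4.0
```

In this hypothetical profile, five calendar years count as four QALYs because the first two years are weighted by 0.5.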

Studies of physical illnesses have suggested that patients' values for their own health state affect decisions concerning treatment and its outcomes [13–15]. Several studies have focused on establishing utility scores for a variety of health states in various mental illnesses, including schizophrenia [16], depression in primary care [17], temporary states of depression [18, 19] and treatment-related side effects [20, 21]. Health states in depression have been characterised by the presence or absence of symptoms, and depressed patients are usually categorised as responders and remitters using classical rating scales [22]. As responders sometimes present residual depressive symptoms, we classify patients as "Responder remitters", "Responders non-remitter" and "Non-responders".

The objectives of this paper are to describe the impact on HRQL of patients with MDD treated in a primary care setting, and to examine variations in terms of patients' demographic and clinical characteristics.

Material and methods

Design and patient sample

This national, multicentre, prospective, non-comparative cohort study was designed so as to reproduce the guidelines for management of depression in primary care. The scheduled follow-up period was two months, with assessments at baseline (D0), four weeks later (D28) and eight weeks later (D56).

The patients included in this study were recruited from an outpatient population, aged 18 and older, who consulted general practitioners for a new episode of MDD (according to the DSM-IV [23]), and who were not treated with any antidepressant before inclusion. Patients whose symptomatology suggested schizophrenia or other psychotic symptoms, according to DSM-IV, were not included in this study. According to their experience and daily practice, general practitioners initiated an antidepressant treatment at baseline.

Data collection

Patients' characteristics

Patient profiles were created at baseline by recording age, gender, lifestyle, place of residence, socio-professional category and current professional status.

Clinical measures

Physicians assessed the severity of depressive symptoms using the Montgomery-Asberg Depression Rating Scale (MADRS) [24] and the Clinical Global Impression of Severity (CGI-S) scale. The CGI was rated by physicians on a seven-point Likert scale ranging from 1 = "Normal, not ill at all" to 7 = "Among the most ill patients".

Qualitative outcomes derived from rating scales, like response to treatment or remission, are usually used in both clinical trials and economic evaluations of new antidepressant agents [22]. Using MADRS scores at D56, patients were classified into two groups: those with scores lower than or equal to 12 were considered "Remitters", and the others were considered "Non-remitters". Patients whose score decreased by at least 50% relative to the baseline score were considered "Responders", whereas the others were "Non-responders". These two patient groupings led to the creation of three mutually exclusive groups: "Responder remitters", "Responders non-remitter" and "Non-responders".
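The classification rule above can be written as a short Python sketch. The function name `classify` is hypothetical. The text does not say how a patient who meets the remission threshold without a 50% decrease is handled; the sketch assumes such a patient is placed in "Non-responders", which is one way of obtaining three mutually exclusive groups.

```python
def classify(baseline_madrs, d56_madrs):
    """Assign a patient to one of the three clinical response groups.

    Remission: MADRS score at D56 lower than or equal to 12.
    Response:  decrease of at least 50% relative to the baseline score.
    Assumption (not stated in the text): a patient who is a remitter but
    not a responder is counted among the "Non-responders".
    """
    if baseline_madrs <= 0:
        raise ValueError("baseline MADRS score must be positive")
    remitter = d56_madrs <= 12
    responder = (baseline_madrs - d56_madrs) >= 0.5 * baseline_madrs
    if responder and remitter:
        return "Responder remitters"
    if responder:
        return "Responders non-remitter"
    return "Non-responders"


# Hypothetical patients, given as (baseline score, D56 score).
for scores in [(32, 8), (40, 18), (30, 20), (14, 12)]:
    print(scores, classify(*scores))
```

The last hypothetical patient (14 at baseline, 12 at D56) shows the edge case covered by the assumption: the remission threshold is met, but the decrease is below 50%.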

Patient Reported Outcomes

The outcome measures used in this study were the 36-item Short-Form Health Survey (SF-36), the Quality of Life in Depression Scale (QLDS) and the EQ-5D.

The SF-36 is a generic HRQL measure consisting of eight dimensions assessing physical functioning (PF), role limitations due to physical problems (RP), bodily pain (BP), general health (GH), vitality (VT), mental health (MH), role limitations due to emotional problems (RE) and social functioning (SF) [25]. Two summary scores also assess both physical (PCS) and mental (MCS) facets [26]. All scale scores range from 0 (the worst HRQL) to 100 (the best HRQL).

The QLDS is a 34-item depression-specific HRQL instrument that assesses the ability and capacity of individuals to satisfy their daily needs [27, 28]. Each item is answered by Yes or No. An overall HRQL score is obtained by summing the 34 items. The results range from 0 (the highest HRQL) to 34 (the lowest HRQL).

EQ-5D is a generic measure of HRQL in which health status is defined in terms of 5 dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression [29]. Each dimension has three qualifying levels of response roughly corresponding to 'no problems', 'some difficulties/problems', and 'extreme difficulties'. EQ-5D defines a total of 243 unique health states. The importance of each of these states can be determined in a number of different ways. For the purpose of cost-utility analysis and other situations where the consequences of treatment are measured in terms of QALYs, these weights are typically established using utility measurement techniques such as Standard Gamble or Time Trade-Off (TTO) [12]. For the purposes of this present study, TTO weights elicited from a large national survey of the UK population were used [30]. Information collected using EQ-5D can be reported in terms of its individual dimensions and as a single index score (EQ-5DST).
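As an illustration of how a single index score can be derived from the five dimensions, the following Python sketch enumerates the 3^5 = 243 health states and applies an additive tariff. The coefficients are those commonly cited for the UK TTO value set; it is an assumption here that they correspond to the weights of reference [30], and they should be checked against the published value set before any real use. The names `all_states` and `index_score` are hypothetical.

```python
from itertools import product

DIMENSIONS = ("mobility", "self_care", "usual_activities",
              "pain_discomfort", "anxiety_depression")

# Decrements per dimension and level, as commonly cited for the UK TTO
# tariff (assumed to match reference [30]; verify before use).
DECREMENTS = {
    "mobility":           {2: 0.069, 3: 0.314},
    "self_care":          {2: 0.104, 3: 0.214},
    "usual_activities":   {2: 0.036, 3: 0.094},
    "pain_discomfort":    {2: 0.123, 3: 0.386},
    "anxiety_depression": {2: 0.071, 3: 0.236},
}
ANY_PROBLEM = 0.081   # constant applied when any dimension is above level 1
ANY_LEVEL_3 = 0.269   # extra term applied when any dimension is at level 3


def all_states():
    """Return every EQ-5D state as a 5-digit string, e.g. '11111'."""
    return ["".join(map(str, levels)) for levels in product((1, 2, 3), repeat=5)]


def index_score(state):
    """Return the tariff-weighted index for a state such as '21232'."""
    levels = [int(c) for c in state]
    if len(levels) != 5 or any(lvl not in (1, 2, 3) for lvl in levels):
        raise ValueError("state must be five digits, each 1, 2 or 3")
    if all(lvl == 1 for lvl in levels):
        return 1.0  # full health
    score = 1.0 - ANY_PROBLEM
    for dim, lvl in zip(DIMENSIONS, levels):
        if lvl > 1:
            score -= DECREMENTS[dim][lvl]
    if 3 in levels:
        score -= ANY_LEVEL_3
    return round(score, 3)


print(len(all_states()))     # 243
print(index_score("11111"))  # 1.0
print(index_score("33333"))  # -0.594
```

Under these assumed coefficients the worst state, 33333, scores -0.594, which is consistent with the minimum EQ-5DST of -0.59 reported at baseline in the Results.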

Data analysis

Continuous variables were expressed by means and standard deviations, whereas categorical data were presented using frequency and percentage. The scales were scored using scoring algorithms described by the scale designers. Student's t-tests, ANOVA, Mann-Whitney, or Kruskal-Wallis tests were performed when appropriate to compare mean scores across subgroups. Regression analyses were used to examine the relationships between differences in the utility-weighted EQ-5DST and demographics, clinical response and HRQL measures. Several selection procedures (backward, stepwise) were tested in order to check the robustness of the model. The impact of each predictor was assessed with estimates and their 95% confidence interval. The data were analysed using the SAS software version 8.2. For all tests, the type I error was set to 0.05.

Results

Sample characteristics

Ninety-five physicians enrolled 250 patients between May and November 2002. Patient age ranged from 18 to 92 years, with a mean of 44.2 ± 14.1 years (mean ± standard deviation). The sex ratio (males/females) was 0.4. The mean MADRS score was 32.7 ± 7.7, ranging from 13 to 53. This high level of severity was also revealed by the CGI: about 85% of patients were rated "markedly ill" or more severely. The demographic and clinical characteristics of the sample are reported in Table 1.

Table 1 Patient sociodemographics and clinical characteristics

Among the 250 included patients, 24 were lost to follow-up (9.6%). Their sociodemographics and clinical characteristics were not significantly different from those of the 226 completers, so that all subsequent analyses were performed on the completers sub-sample.

Impact of MDD on HRQL

At baseline, the mean QLDS score was 20.8 ± 5.8, ranging from 5 to 31. The mean SF-36 dimension scores for the total sample were: PF 69.0 ± 24.5, RP 22.4 ± 30.7, BP 52.0 ± 23.5, GH 38.3 ± 17.3, VT 22.2 ± 13.1, MH 24.5 ± 12.1, RE 9.1 ± 21.3 and SF 30.2 ± 17.1. The mean SF-36 summary scores were PCS 43.6 ± 9.2 and MCS 21.1 ± 6.6. The mean EQ-5DST score was 0.33 ± 0.25, ranging between -0.59 and 0.85. Notably, 8% of the study population had an EQ-5DST score below zero, i.e. a health state valued as worse than death.

During follow-up, all the dimensions rated improved (Table 2). The mean EQ-5DST scores were 0.68 ± 0.24 (range: [-0.11; 1.00]) and 0.78 ± 0.21 (range: [-0.08; 1.00]) at weeks 4 and 8, respectively.

Table 2 EQ-5D dimension scores at baseline, D28 and D56

Comparison of EQ-5DST by demographic and clinical features

No significant differences were found in EQ-5DST by demographic characteristics (Table 3): men and women reported the same preference-based score at baseline (0.32 ± 0.22 vs. 0.32 ± 0.26, respectively) and their scores increased in a similar manner during follow-up. Younger patients reported higher utility scores than older patients at baseline, day 28 and day 56, although this pattern was not statistically significant.

Table 3 Differences on utility score by demographic and clinical characteristics

Significant differences in EQ-5DST were found by disease severity level assessed by CGI-S, with more severely ill patients having lower weighted index scores. At baseline, a mean difference of 0.12 was observed between "slightly/moderately ill" and "markedly ill" patients (p < 0.05), and 0.18 between "markedly ill" and "seriously ill" patients (p < 0.001). At the end of the follow-up, a mean difference of 0.12 was observed between patients with "first signs of illness" and "slightly/moderately ill" patients (p < 0.001). "Slightly/moderately ill" and "markedly ill" patients had EQ-5DST scores that differed by 0.30 on average (p < 0.001). A mean difference of 0.14 between "markedly ill" patients and "seriously ill" patients was found (p < 0.05).

Comparison of EQ-5DST by clinical response

Clinical response, defined by MADRS scores, revealed statistically significant differences between mean EQ-5DST scores at baseline (p < 0.01), D28 (p < 0.001) and D56 (p < 0.001) (Table 4).

Table 4 Utility scores and clinical response during the study period

At baseline, an overall significant difference was found when comparing the three groups, with a mean difference of 0.14 observed between "Responder remitters" and "Responders non-remitter" (p < 0.01). During the study period, EQ-5DST scores increased in all groups of clinical response. At the end of the follow-up, a statistically significant mean difference of 0.14 was observed between "Responder remitters" and "Responders non-remitter" (p < 0.001). "Responders non-remitter" and "Non-responders" had EQ-5DST scores that significantly differed by 0.14 on average (p < 0.05).

Comparison of Patient Reported Outcomes

At each visit, EQ-5DST scores were compared with SF-36 dimension, SF-36 summary and QLDS scores (Table 5) by computing correlation coefficients. The correlation between the EQ-5DST score and the Mental Health dimension of the SF-36 was the highest observed at every assessment (D0: r = 0.49; D28: r = 0.56; D56: r = 0.63). At baseline, Pearson correlation coefficients were always greater than 0.30, except for the role-physical and role-emotional dimensions.

Table 5 Association between utility score and HRQL

The QLDS was significantly correlated with the EQ-5DST scores, ranging from -0.43 at baseline to -0.68 at the end of the follow-up period.

Multivariate analysis

An ordinary least-squares regression analysis to predict EQ-5DST using demographic features and clinical and HRQL evolution explained only 40% of the variance in the weighted index scores. The statistically significant predictors in the regression model were differences in Physical Functioning, Bodily Pain, General Health and Mental Health (Table 6).

Table 6 Contributors of the difference in EQ-5DST during the study period

Discussion

This study evaluated the usefulness of EQ-5D in assessing health status of primary care patients with major depressive disorder.

The sample in our study is representative of the primary care depressed population in France [2]. Eight percent of the patients rated their health state as worse than death. This result is not surprising given the relationship between depression and suicide [31, 32]. Despite different approaches to measuring health state utilities using standard gamble, time trade-off or rating scales, the findings of our study agree with those previously reported: the baseline mean utility of an untreated depression was 0.33, compared to 0.30 for Revicki [21] and 0.32 for Bennett [33]. The mean patient-rated EQ-5DST score after the eight-week follow-up period was 0.78, which is comparable to utilities reported in other studies (0.79 [33]; 0.74 [21]; 0.76 [34]; 0.70 [35]). The main interest is that EQ-5DST values are easy to collect in large sample surveys due to the brevity of the EQ-5D classification system with its 5 dimensions and 3 levels.

No differences in EQ-5DST utilities were observed by demographic characteristics, which is comparable to previous results in depressed patients [21, 34, 35]. More severely depressed patients reported utilities that were 0.30 points lower than less severely depressed patients at baseline. Several researchers have suggested that differences in utility greater than 0.05 are clinically important [12, 36]. These findings may reflect clinically important differences.

As demonstrated in previous studies [21, 37], we found that the EQ-5DST score and other HRQL measures shared only about 40% of variance. Utilities measure a patient's preference for their health state, while HRQL scales assess the patient's report of their functioning and well-being. Although these two concepts are related, they are not identical [38], and measuring both may lead to a better understanding of reasons for non-compliance with treatment regimens.

There are several limitations that need to be considered when interpreting the results of this study. First, the study does not take into account the antidepressants prescribed or their side effects, which may influence patients' ratings [21, 39]. Second, the concomitant impact of depression and chronic medical conditions could not be examined in this sample. It is likely that the health state utilities of patients with depression in addition to a chronic medical disease would be significantly reduced [17]. Lastly, a limitation of the analysis presented in this study relates to the source of the utility weights used to compute the EQ-5DST. Given that this was a national study conducted in France, it may have been better to use social preference values based on the French population. Unfortunately, at the time of writing these values were not available for EQ-5DST. Weights were therefore adopted from a major UK study that provided the most robust technical estimates, which are widely used in the evaluation of EQ-5DST in countries that lack their own national reference data.

Utility scores are needed for calculating QALYs, which are used as indicators of effectiveness or outcome in economic evaluations [35, 36, 40]. It is debatable whether patient or general population utilities should be used in cost-effectiveness studies [40]. Nevertheless, patients with experience of the disease may be the best providers of health state preference data. Cost-effectiveness studies are required to help clinicians and health care decision-makers in determining the impact of new antidepressants on both patient outcomes and medical or overall societal costs. Understanding patient preferences for depression outcomes is important for economic evaluations of new antidepressants, as well as for understanding patient behaviour and compliance with antidepressant regimens. Such a measure can be applied to cost-utility analyses either within clinical decision modelling studies or within prospective, randomised clinical trials, and offers additional scope for the analysis and reporting of data derived from clinical trials of new compounds.