Introduction

The concept of health-related quality of life (HrQoL) includes the aspects of self-perceived well-being in relation to disease or treatment [1]. Currently, in economic evaluations of mental health care for patients with affective disorders in Germany, HrQoL is regularly assessed using generic patient-reported outcome measures (PROMs; e.g., [2,3,4]), with the EQ-5D [5, 6] and the Short Form health survey (SF-12) being most popular [7]. Generic PROMs have the advantage that they can be administered for a broad range of diseases and treatments, as they cover many aspects of self-perceived well-being [1].

Generic HrQoL measures, however, have the disadvantage that they are not sufficiently suitable to represent the recovery process of mental health conditions, such as affective disorders [8]. In cost-effectiveness analyses in Germany, it was regularly recognized that generic HrQoL measures may not be responsive to changes in symptom severity or self-efficacy (e.g., [2, 9, 10]). In the treatment of persons with mental health conditions, it is also increasingly recognized that these persons want to live a full life even when symptoms are present, and that this can contribute to the mental recovery process [11, 12]. Thus, to measure the recovery process, an instrument is needed that measures the impact on quality of life of persons with mental health conditions rather than symptoms [11, 13, 14]. For that reason, the Recovering Quality of Life (ReQoL) measures – ReQoL-10 and ReQoL-20 – have been developed to assess HrQoL for persons with mental health conditions and to capture the recovery process with respect to leading a meaningful life, even if symptoms of the mental health condition are present [11].

The ReQoL measures were developed on a broad theoretical basis and with the involvement of academics/psychometricians, policymakers, clinicians and patients [15]. The final 10/20 items of the measures were selected after factor analysis of data from more than 6500 patients, with item response theory models employed to inform item selection. Recently, the ReQoL measures originally developed in English have been translated and linguistically validated in German using established methodology [16]. ReQoL measures are also available in other languages, such as Dutch [17], seven common Indian languages [18,19,20,21,22,23,24] and traditional Chinese [25].

The final ReQoL measures represent the components of recovery through seven domains: activity; belonging and relationships; choice, control and autonomy; hope; self-perception; well-being; and physical health [15, 26]. Furthermore, the items of the ReQoL measures are worded both negatively and positively, which is important for measuring both improved and worsened HrQoL as a result of mental health problems [15, 26, 27]. Positively and negatively worded items are scored from zero to four and four to zero, respectively [15]. By summing up the scores of the items, an overall score for the ReQoL-10/ReQoL-20 can be calculated with zero representing the poorest HrQoL and 40/80 representing the highest HrQoL.

Furthermore, it is possible to calculate quality-adjusted life-years from the ReQoL measures for use in cost-utility analyses. For this purpose, preference weights from the general population in the United Kingdom were estimated and the ReQoL-Utility Index (UI) has been developed [28]. Unfortunately, preference weights from the general population in Germany are not available at this time.

So far, the psychometric properties of the ReQoL measures have been assessed in patients with anxiety and depression in the United Kingdom [29], in patients with psychosis in the Netherlands [17] and Singapore [30], in the general population in the United Kingdom [15] and Hong Kong [25] as well as in a convenience sample in the Netherlands [17]. For the patient populations, the ReQoL measures showed good internal consistency and better responsiveness and construct validity compared with the EQ-5D-5L in patients with depression, but not in patients with anxiety [29, 30]. Furthermore, the ReQoL-10 was reliable in a sample with patients with psychosis and the ReQoL measures showed good convergent and known-group validity [17]. In a first-episode psychosis population in Singapore, ReQoL-10 was found to have good internal consistency and adequate construct validity [30].

To the best of our knowledge, the psychometric properties of the ReQoL measures have not yet been assessed for patient populations in Germany. Therefore, the primary aim of this study was to assess the psychometric properties of the ReQoL-10 and ReQoL-20 for the assessment of patients with bipolar affective disorder, major depression and dysthymia in Germany. The secondary aim of this study was to assess the validity of the ReQoL-10 and ReQoL-20 by comparison with clinical measures and measures of HrQoL.

Materials and methods

Sample

Data used for this study were collected within a randomized controlled trial evaluating an evidence-based, stepped and coordinated care service model for mental disorders (RECOVER) [31]. Patients were recruited in the regular psychiatric care of the University Medical Center Hamburg-Eppendorf, Germany, from the beginning of 2018 until the end of 2019. Patients were eligible for participation if they were at least 16 years of age and if they were diagnosed with at least one relevant mental disorder (among others, schizophrenic spectrum disorders, bipolar affective disorder, major depression, anxiety disorder or post-traumatic stress disorder) according to the International Statistical Classification of Diseases and Related Health Problems, 10th Revision, German Modification (ICD-10) [31, 32].

Patients were excluded from the study if they fulfilled the criteria for organic mental disorders, addiction disorders as main diagnosis, and/or moderate to severe mental retardation. Furthermore, patients were excluded if they had a hearing and/or vision impairment that could not be corrected and/or insufficient knowledge of German [31].

For the current study, the patient sample of the RECOVER trial was reduced to persons with bipolar affective disorder (ICD-10: F31), persons with major depression (ICD-10: F32.2) and persons with dysthymia (ICD-10: F34.1). Data were collected at baseline (T0), 6 months (T1) and 12 months after baseline (T2). The sample was further restricted to persons without missing information in the ReQoL measure at T0.

The trial was registered prospectively (NCT03459664), ethics approval was obtained from the ethics committee of the Hamburg Medical Association (PV5672) and all participants of the trial provided written informed consent. A detailed description of the RECOVER trial can be found elsewhere [31].

Measures

The German 20-item version of the ReQoL measures (ReQoL-20) was used to measure recovery-focused HrQoL [16]. The first 10 items of the ReQoL-20 constitute the ReQoL-10. The 20 items are scored on a scale with five levels ranging from ‘none of the time’ to ‘most or all of the time’. Taking the positive and negative wording of the items into account, the item scores were summed up to a total score ranging from 0 to 80, with higher scores indicating a higher recovery-focused HrQoL [15]. The ReQoL-10 was calculated from the first 10 items of the ReQoL-20 with a total score ranging from 0 to 40. While the ReQoL contains a physical health item, this is not included in the score [28].
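The scoring rule described above can be illustrated with a minimal Python sketch. It is not the official scoring key: the split into positively and negatively worded items is an assumption inferred from the item wording listed in the caption of Fig. 1, raw responses are assumed to be coded 1 to 5 as in that caption, and the function names are hypothetical.

```python
# Item polarity is inferred from the item wording in the Fig. 1 caption
# (an assumption of this sketch, not an official scoring key).
NEGATIVE_ITEMS = {1, 3, 6, 9, 12, 13, 14, 16, 17, 18, 20}
POSITIVE_ITEMS = set(range(1, 21)) - NEGATIVE_ITEMS


def score_item(item_number, response):
    """Score one item from a raw response coded 1..5 (1 = none of the time)."""
    if response not in (1, 2, 3, 4, 5):
        raise ValueError(f"response for item {item_number} must be in 1..5")
    if item_number in POSITIVE_ITEMS:
        return response - 1  # positively worded: scored 0..4
    return 5 - response      # negatively worded: scored 4..0


def reqol_scores(responses):
    """Return (ReQoL-10 total, ReQoL-20 total) from 20 responses in item order."""
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    item_scores = [score_item(i, r) for i, r in enumerate(responses, start=1)]
    # The ReQoL-10 is the sum of the first 10 items of the ReQoL-20.
    return sum(item_scores[:10]), sum(item_scores)


# Best possible answer pattern: 5 on positive items, 1 on negative items.
best = [5 if i in POSITIVE_ITEMS else 1 for i in range(1, 21)]
print(reqol_scores(best))
```

The sketch expects complete responses, in line with the restriction of the sample to persons without missing ReQoL information at T0.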

Symptom severity was assessed by the German versions of the 9-question depression scale of the Patient Health Questionnaire (PHQ-9) for the measurement of depressive symptoms [33, 34] and the Altman Self-Rating Mania Scale (ASRM) for the assessment of the presence and severity of manic or hypomanic symptoms [35]. The social, occupational, and psychological functioning was assessed by the Global Assessment of Functioning (GAF) scale [36], and the severity of illness was assessed using the Clinical Global Impression – Severity scale (CGI-S) [37]. HrQoL was measured using the indexes of the German versions of the EQ-5D-5L [5, 6] based on German preference weights [38] and the SF-12 (SF-6D) [7, 39] based on preference weights from the United Kingdom [40] as well as the visual analogue scale of the EQ-5D-5L (EQ-VAS) [6]. Furthermore, the mental component summary score (MCS) and physical component summary score (PCS) were calculated from the respective mental and physical dimensions of the SF-12 [39]. Additional information on the constructs and scores of the measures used for the assessment of symptom severity and HrQoL are given in the Online Resource.

The sociodemographic variables self-reported age, sex, marital status, migration background, school-leaving qualification, and education were used for description of the sample. Furthermore, the number of comorbid DSM-IV diagnoses was collected and categorized into comorbid clinical disorders (axis I diagnoses) and personality disorders (axis II diagnoses) [41].

Statistical analysis

Observations with missing information were removed from the analyses by casewise deletion. Sociodemographic characteristics of the sample were analyzed using descriptive statistics, and the distribution of the ReQoL-10/ReQoL-20 was assessed by histograms of the individual items for the total study sample. Normality of the distribution of the individual items of the ReQoL measures was analyzed using the Shapiro–Wilk test for normal data [42]. Reliability, validity, and responsiveness of the ReQoL measures were assessed. T0, T1 and T2 data were used to assess the test–retest reliability and the sensitivity to change of the ReQoL-10/ReQoL-20. All other analyses were based on T0 data.

The structural validity of the ReQoL measures was assessed using confirmatory factor analysis (CFA) to confirm a correlated traits model structure comprising two distinct elements of positively and negatively worded items [26]. Goodness of fit was indicated by the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the Tucker-Lewis index (TLI) with cut-off values for the RMSEA and the CFI/TLI of ≤ 0.08 and > 0.95, respectively [26, 43, 44]. Internal consistency, the extent to which scores of one measure are the same for repeated measurement using different sets of items, of the ReQoL measures was assessed between all halves of the questionnaires. Furthermore, test–retest reliability, the extent to which scores of one measure are the same for repeated measurement over time, of the ReQoL measures was assessed for the same measures applied over measurement time points. For the assessment of the internal consistency, the questionnaires were split in all possible halves and the average correlations of all halves were calculated using Cronbach’s alpha [45]. Cronbach’s alpha coefficients of α ≥ 0.7 were defined as acceptable, of α ≥ 0.8 as good and of α ≥ 0.9 as excellent [46]. For the assessment of the test–retest reliability, only those persons without any improvement or worsening of symptoms between measurement points were selected from the sample. Unchanged symptoms were defined as an absolute difference in the PHQ-9 (ASRM) score of less than 5 (4) points between measurement points [35, 47]. Test–retest reliability was calculated using the intra-class correlation coefficient (ICC) [48]. ICC below r = 0.50 were defined as poor, between r = 0.50 and r = 0.75 as moderate, between r = 0.75 and r = 0.90 as good, and above r = 0.90 as excellent [49].
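As an illustration of the internal consistency assessment, the following Python sketch computes Cronbach's alpha with the common variance-based formula and maps it to the interpretation thresholds given above. The analyses in this study were run in Stata, so this is only a sketch of the calculation. The function names and the label "below acceptable" are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the total score across respondents.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_label(alpha):
    """Interpretation thresholds as stated in the text."""
    if alpha >= 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.7:
        return "acceptable"
    return "below acceptable"  # label assumed for this sketch


# Toy data: three respondents, two items (already scored 0..4).
toy = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy), 3), alpha_label(cronbach_alpha(toy)))
```

The function raises a ZeroDivisionError if all respondents have the same total score, because the total variance is then zero.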

The concurrent validity, the extent to which one measure compares to other measures, of the ReQoL measures was assessed by comparison with clinical measures (PHQ-9 and ASRM) and measures of HrQoL (EQ-5D-5L, EQ-VAS, SF-6D, MCS and PCS). Furthermore, the known-group validity, the extent to which one measure can demonstrate different scores for different groups, was assessed by comparison of scores for groups with less or more severe symptoms, and with good or poor global functioning and lower or higher severity of illness. The concurrent validity was assessed using Pearson’s Correlation Coefficient (PCC), and by scatterplots and LOWESS curves of the respective scores. PCC with \(\left|r\right|\) below 0.30 were defined as negligible, with \(\left|r\right|\) between 0.30 and 0.50 as low, between 0.50 and 0.70 as moderate, between 0.70 and 0.90 as high and above 0.90 as very high [50]. For the assessment of the known-group validity, the sample was split into groups with good and poor health using generic measures (CGI-S > 4, GAF ≤ 50 [36, 37]) and with less or more severe symptoms using clinical measures (PHQ-9 > 4 and ASRM ≤ 50 [33,34,35]). The known-group validity was assessed, in accordance with the psychometric evaluation of the original ReQoL measures [15], using the effect size (ES) Cohen’s d. ES of d = 0.20 were defined as small, of d = 0.50 as medium and of d = 0.80 as large [51, 52].
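The two statistics used for validity can be sketched in Python as follows. The sketch rests on stated assumptions: the lower bound of each correlation category is treated as inclusive, and Cohen's d is computed with the pooled standard deviation, which is a common choice but is not specified in the text. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pcc_label(r):
    """Map |r| to the categories used in the text (lower bounds inclusive)."""
    size = abs(r)
    if size < 0.30:
        return "negligible"
    if size < 0.50:
        return "low"
    if size < 0.70:
        return "moderate"
    if size < 0.90:
        return "high"
    return "very high"


def cohens_d(group_a, group_b):
    """Cohen's d with the pooled SD (assumed variant for this sketch)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy scores for two groups; the sign of d depends on the group order.
print(cohens_d([2, 4, 6], [1, 3, 5]), pcc_label(-0.76))
```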

Responsiveness, the ability of one measure to detect a significant change assessed using a gold standard, of the ReQoL measures was assessed by sensitivity to treatment response (reduction of the PHQ-9 by ≥ 5 points) and remission of symptoms (PHQ-9 < 5) between time points. Sensitivity to change was assessed using the ES Glass' Δ and the standardized response mean (SRM). Furthermore, receiver operating characteristics (ROC) curves were constructed and discriminative abilities were assessed using the area under the curve (AUC), with an AUC of 1.0 defined as perfect discriminative abilities and an AUC of 0.5 defined as random chance. Optimal cut-off values were determined by the distance to the top-left corner from points on the ROC curve. Distance was defined as \(d^{2} = (1 - \text{sensitivity})^{2} + (1 - \text{specificity})^{2}\). Thereby, the point with the lowest distance was defined as cut-off point [53].
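The responsiveness statistics can be illustrated with a short Python sketch. It assumes that Glass' Δ divides the mean change by the standard deviation of the baseline scores and that the SRM divides it by the standard deviation of the change scores. These are common definitions, but the exact variants are not spelled out in the text. The cut-off search treats a value at or above the cut-off as a positive result and minimizes the squared distance to the top-left corner defined above. Function and parameter names are hypothetical.

```python
from statistics import mean, stdev


def glass_delta(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores (assumed variant)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def optimal_cutoff(values, is_case):
    """Return (cutoff, sensitivity, specificity) closest to the top-left corner.

    A value >= cutoff counts as a positive result; is_case marks the
    persons who meet the external criterion (e.g. treatment response).
    """
    n_pos = sum(1 for c in is_case if c)
    n_neg = len(is_case) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, c in zip(values, is_case) if c and v >= cutoff)
        tn = sum(1 for v, c in zip(values, is_case) if not c and v < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        d2 = (1 - sens) ** 2 + (1 - spec) ** 2
        # Keep the first cut-off with the smallest squared distance.
        if best is None or d2 < best[0]:
            best = (d2, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Toy change scores: four non-responders followed by five responders.
values = [1, 2, 3, 10, 4, 11, 12, 13, 14]
is_case = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(optimal_cutoff(values, is_case))
```

With ties in the distance, the sketch returns the lowest qualifying cut-off. Other tie-breaking rules are equally defensible.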

Data analyses were performed using Stata/MP 17.0 (StataCorp, TX, USA). All statistics were two-sided with a significance level of p < 0.05.

Results

Sample characteristics

The mean age of the total sample of persons with mood disorders (n = 393) was 39 years (Table 1). More than half of the sample was female (56%). The majority of the sample was single (51%), had an upper secondary school certificate (56%) and had a vocational training degree (45%). The mean number of comorbid DSM-IV axis I diagnoses and axis II diagnoses of the sample was 0.74 and 0.11, respectively. The complete sociodemographic characteristics of the total sample and the sub-samples are shown in Table 1.

Table 1 Sociodemographic characteristics of persons with mood disorders (F30-F39; n = 393)

Distribution of scores

The mean scores of the ReQoL-20 and ReQoL-10 in the total sample were 33.07 (SD 13.31) and 16.61 (SD 6.98) at T0, with a range from 8 to 76 and 3 to 39, respectively. The scores of both the ReQoL-20 and ReQoL-10 were slightly right skewed and leptokurtic with no apparent floor and ceiling effects (data not shown). At T1, the mean scores were 44.17 (SD 16.75) and 22.35 (SD 8.82) and at T2, the mean scores were 45.52 (SD 16.36) and 23.01 (SD 8.71). In the sub-samples of persons with bipolar affective disorder and of persons with major depression and dysthymia, the mean scores of the ReQoL-20 were 42.08 (SD 16.64) and 32.19 (SD 12.61) at T0, 48.00 (SD 15.77) and 43.71 (SD 16.86) at T1, and 48.93 (SD 16.12) and 45.13 (SD 16.37) at T2, respectively. The respective mean scores of the ReQoL-10 were 21.44 (SD 8.63) and 16.13 (SD 6.61) at T0, 24.40 (SD 8.17) and 22.11 (SD 8.88) at T1, and 25.20 (SD 8.23) and 22.77 (SD 8.74) at T2.

The majority of the individual items of the ReQoL measures at T0 were not normally distributed (all with p < 0.05), with the exception of the items 2 (‘I felt able to trust others’), 3 (‘I felt unable to cope’), 6 (‘I thought my life was not worth living’), 10 (‘I felt confident in myself’) and 13 (‘I felt irritated’; Fig. 1). The three items of the ReQoL measures with the most positively skewed distribution were the items 5 (‘I felt happy’), 15 (‘I felt in control of my life’) and 18 (‘I had problems with my sleep’). The fit of the correlated traits model structure of the ReQoL-20 (ReQoL-10) was acceptable in terms of the RMSEA of 0.08 (0.09), yet the CFI of 0.87 (0.92) and the TLI of 0.85 (0.89) were below the cut-off value of 0.95.

Fig. 1

Distribution of the individual items of the ReQoL measures at T0 (n = 393). 1 None of the time, 2 only occasionally, 3 sometimes, 4 often, 5 most or all of the time, item 1 I found it difficult to get started with everyday tasks, item 2 I felt able to trust others, item 3 I felt unable to cope, item 4 I could do the things I wanted to do, item 5 I felt happy, item 6 I thought my life was not worth living, item 7 I enjoyed what I did, item 8 I felt hopeful about my future, item 9 I felt lonely, item 10 I felt confident in myself, item 11 I did things I found rewarding, item 12 I avoided things I needed to do, item 13 I felt irritated, item 14 I felt like a failure, item 15 I felt in control of my life, item 16 I felt terrified, item 17 I felt anxious, item 18 I had problems with my sleep, item 19 I felt calm, item 20 I found it hard to concentrate

Reliability

Cronbach’s alpha for the ReQoL-20 (ReQoL-10) in the total sample with mood disorders was α = 0.91 (α = 0.83), indicating excellent (good) reliability among the items. Cronbach’s alpha for the ReQoL-20 (ReQoL-10) in the sub-samples of persons with bipolar affective disorder, and of persons with major depression and dysthymia was α = 0.94 (α = 0.87) and α = 0.89 (α = 0.81), respectively, indicating excellent (good) and good reliability among the items.

The ICC of the ReQoL-20 (ReQoL-10) in the total sample with mood disorders without improvement or worsening of symptoms from T0 to T1 and T1 to T2 measured by the PHQ-9 was r = 0.70 (r = 0.68) and r = 0.76 (r = 0.75), indicating moderate and good test–retest reliability, respectively (Table 2).

Table 2 Test–Retest Reliability of the ReQoL-10 and ReQoL-20 of persons without improvement or worsening of symptoms between measurement points for persons with mood disorders

Validity

The correlation coefficient between the ReQoL-20 and the ReQoL-10 was r = 0.94. In the sub-samples of persons with bipolar affective disorder and of persons with major depression and dysthymia, the correlation coefficient was r = 0.96 and r = 0.93, respectively.

The concurrent validity of the ReQoL-20 and ReQoL-10 with the clinical measure PHQ-9 was high and moderate, indicated by a correlation coefficient of r = −0.76 and r = −0.69, respectively (Table 3, Figures S1 and S2 in the Online Resource). In the sub-samples of persons with bipolar affective disorder and of persons with major depression and dysthymia, the correlation coefficients between the ReQoL-20 (ReQoL-10) and the PHQ-9 were r = −0.80 (r = −0.70) and r = −0.75 (r = −0.67), indicating an overall strong negative linear relationship.

Table 3 Concurrent validity between ReQoL-10/ReQoL-20, clinical measures (PHQ-9, ASRM) and measures of health-related quality of life (EQ-5D-5L, EQ-VAS, SF-6D, MCS, PCS) for persons with mood disorders

The concurrent validity of the ReQoL-20 and ReQoL-10 with measures of HrQoL was overall moderate, with correlation coefficients ranging from r = 0.55 to r = 0.63 (Table 3, Figures S3 and S4 in the Online Resource). The only exception was the concurrent validity of the ReQoL-20 and ReQoL-10 with the PCS, which was negligible with a correlation coefficient of r = 0.29. All observed correlations were in the expected directions. The correlation coefficients between further measures of HrQoL and clinical measures are given in Table S4 in the Online Resource.

The mean scores of the ReQoL-20 (ReQoL-10) in the sample with minimal to moderate depression and moderately severe depression to severe depression measured by the PHQ-9 were 42.67 (21.16) and 25.70 (13.13), respectively (Table S2 in the Online Resource). The corresponding known-group validity of the ReQoL-20 and ReQoL-10 using PHQ-9 cut-off points was large with ES of d = 1.63 and d = 1.39, respectively (Table 4).

Table 4 Known-group validity of ReQoL-10/ReQoL-20 using cut-off points of clinical measures (PHQ-9, ASRM) as well as generic measures (CGI-S, GAF)

The mean scores of the ReQoL-20 (ReQoL-10) in the sample with good or poor global functioning dichotomized by the GAF were 35.10 (17.68) and 30.35 (15.16), respectively (Table S2 in the Online Resource). When the severity of illness was dichotomized into lower or higher by the CGI-S, the mean scores of the ReQoL-20 (ReQoL-10) were 34.78 (17.57) and 30.07 (14.92). The corresponding known-group validity of the ReQoL-20 and ReQoL-10 using CGI-S and GAF cut-off points was small with ES ranging from d = 0.36 to d = 0.38.

Responsiveness

The ReQoL measures were sensitive to treatment response measured by the PHQ-9. The ES/SRM of the ReQoL-20 and ReQoL-10 ranged between 1.20/1.40 and 2.02/1.73 between all measurement points, indicating high responsiveness (Table 5, Table S3 in the Online Resource). The mean difference of the ReQoL-20 (ReQoL-10) in the sample with treatment response was 22.11 (10.70) between T0 and T1, and 16.58 (8.35) between T1 and T2. Mean differences and ES/SRM of measures of HrQoL based on treatment response measured by the PHQ-9 are given in Table S4 in the Online Resource.

Table 5 Sensitivity to change of ReQoL-10/ReQoL-20 based on the clinical measure PHQ-9 for persons with mood disorders: treatment response

ROC analyses showed AUC for ReQoL-20 and ReQoL-10 differences and treatment response measured by the PHQ-9 ranging from 0.87 to 0.89 and from 0.84 to 0.85, respectively (Fig. 2, Figure S3 and Table S5 in the Online Resource). The optimal cut-off value for treatment response was a ReQoL-20 (ReQoL-10) difference ≥ 12 (≥ 6) with a sensitivity of 78.03% (75.76%) and a specificity of 85.26% (80.77%).

Fig. 2

ROC curves of the ReQoL-20 difference between T0 and T1 and treatment response based on the clinical measure PHQ-9 for persons with mood disorders (n = 282)

For remission of symptoms measured by the PHQ-9, the ReQoL measures were also sensitive with large ES/SRM of the ReQoL-20 and ReQoL-10 between T0 and T1 (1.64/1.49 and 1.64/1.51) and with moderate ES/SRM between T1 and T2 (0.45/0.56 and 0.47/0.53; Table 6, Table S3 in the Online Resource). The mean difference of the ReQoL-20 (ReQoL-10) in the sample with remission of symptoms was 24.65 (12.13) between T0 and T1, and 5.80 (3.12) between T1 and T2. Mean differences and ES/SRM of measures of HrQoL based on remission of symptoms measured by the PHQ-9 are given in Table S6 in the Online Resource.

Table 6 Sensitivity to change of ReQoL-10/ReQoL-20 based on the clinical measure PHQ-9 for persons with mood disorders: remission of symptoms

ROC analyses showed AUC for ReQoL-20 (ReQoL-10) differences and remission of symptoms measured by the PHQ-9 of 0.96 (0.94) at T1 and 0.83 (0.81) at T2 (Fig. 3, Figure S4 and Table S7 in the Online Resource). The optimal cut-off value for remission of symptoms was a ReQoL-20 (ReQoL-10) score of 57 (30) with a sensitivity of 86.67% (85.00%) and a specificity of 89.47% (89.04%).

Fig. 3

ROC curves of the ReQoL-20 at T1 and remission of symptoms based on the clinical measure PHQ-9 for persons with mood disorders (n = 282)

Discussion

This study aimed to assess the psychometric properties of the ReQoL measures in patients with affective disorders in Germany. The reliability of the ReQoL measures in this sample was overall good. The internal consistency was good to excellent and the test–retest reliability was moderate to good. The concurrent validity of the ReQoL measures with the clinical measure PHQ-9 and measures of HrQoL was high and moderate, respectively. The ReQoL measures were able to distinguish between samples with minimal to moderate depression and moderately severe to severe depression measured by the PHQ-9 with large ES. The ReQoL measures, however, had a small known-group validity when the global functioning and the severity of illness were dichotomized into good/poor and lower/higher by the GAF and the CGI-S, respectively. Also, the ReQoL measures were sensitive to treatment response and remission of symptoms measured by the PHQ-9 with large ES.

The assessment of the psychometric properties of the ReQoL measures in a UK sample of patients with anxiety and depression showed similar results [29]. The concurrent validity of the ReQoL-10 with the clinical measure PHQ-9 was only moderate and the concurrent validity of the ReQoL-10 with the HrQoL measure EQ-5D-5L was low. The known-group validity of the ReQoL-10, however, was large, just as in the current sample. Furthermore, for remission of symptoms measured by the PHQ-9, the ReQoL-10 was responsive also with a large ES [29]. For patients with psychosis in the Netherlands, the psychometric properties of the ReQoL measures were also in line with the results of the current study [17]. The reliability of the ReQoL-10 was good and the concurrent validity of the ReQoL-10 with the HrQoL measure EQ-5D-5L was moderate. Furthermore, the known-group validity of the ReQoL measures was large for groups with lower and higher depression severity [17]. The structural validity of the ReQoL measures in the current sample using a correlated traits model was acceptable, yet comparably worse than the structural validity of the bi-factor model of the ReQoL measures in samples of the UK and Singapore, which had lower RMSEA (0.05 and 0.07 vs. 0.08) [26, 30].

The mean total ReQoL-10 score in the current study was lower than the mean scores of the patient samples reported in the other studies (16.6 vs. 18.6 to 27.8) [17, 29, 30]. One reason for this comparatively low mean total ReQoL-10 score could be the hospital setting from which recruitment for the sample of the current study took place [31]. By contrast, the settings of the other studies were outpatient care [29] or ongoing care [17]. Other reasons might be differences in recovery-focused HrQoL between patients with affective disorders and patients with anxiety [29, 30] or psychosis [17], or between patients of different countries.

Both ReQoL measures were found to be valid and reliable for the assessment of recovery-focused HrQoL for persons with affective disorders. However, the ReQoL-10 was less reliable with only good reliability among its items compared to the ReQoL-20 (α = 0.83 vs. α = 0.91). Furthermore, the concurrent validity of the ReQoL-10 with the PHQ-9 was only moderate, although the known-group validity and the responsiveness were comparably large for both ReQoL measures. In conjunction with the very high correlation between the ReQoL-10 and the ReQoL-20 (r = 0.94), it may be suggested that the ReQoL-10 is sufficient for the assessment of recovery-focused HrQoL for persons with affective disorders. Yet, the ReQoL-20 might be of use to provide a complete picture of the recovery process in research and regular psychiatric care [15]. Furthermore, compared with the further measures of HrQoL, the ReQoL-20 had a high concurrent validity with the PHQ-9 (PCC = −0.76), whereas the concurrent validity of the EQ-5D-5L (PCC = −0.58) and the SF-6D (PCC = −0.55) was only moderate.

Large ES of the ReQoL measures for both treatment response and remission of symptoms show that the ReQoL measures are sensitive and responsive measures for use in the area of affective disorders. With regard to the measures of HrQoL, it can be stated that the EQ-5D-5L was overall sensitive and responsive, yet with only moderate ES for both treatment response and remission of symptoms. Furthermore, the EQ-5D-5L index scores were extremely skewed compared with the distribution of the ReQoL-measure scores, with 3.39% of the sample indicating no problems in any dimension. Contrary to the EQ-5D-5L, the ES of the SF-6D were overall large and the distribution of its scores was only slightly skewed. Hence, the SF-6D can be considered a sensitive and responsive measure for use in the area of affective disorders.

As the EQ-5D has previously been shown to be potentially unsuitable for use in affective disorders and other mental health areas due to low responsiveness [8], the ReQoL measures, and especially the ReQoL-UI derived from them, may be promising for concurrent use in mental health-related cost-utility analyses in Germany [28, 29]. The ReQoL measures may be of great importance for resource allocation decisions related to mental health services [15], as they are able to measure not only recovery-focused HrQoL but also change in depressive symptom severity, and possibly change in symptom severity of other mental health problems.

Limitations

There are several limitations to this study. First, for the psychometric evaluation of the ReQoL measures, a patient sample of a randomized controlled trial was used that had been recruited in a university clinical setting from the catchment area of the University Medical Center Hamburg-Eppendorf, Hamburg. For this reason, the generalizability of the psychometric evaluation may be limited to inpatients with mood disorders. Second, for the analysis, only complete cases were used, resulting in a loss of information and potentially introducing bias to the results of this study. Yet, the sample size for the psychometric evaluation was large and a variety of clinical measures, generic measures and measures of HrQoL was available for comparison. Third, both ReQoL measures were assessed simultaneously within one person. Ideally, study participants would have been randomly assigned to one of the two measures. However, a larger sample would have been necessary for this purpose. Fourth, a correlated traits model structure comprising two distinct elements of positively and negatively worded items was used for the CFA. However, a bi-factor model structure of the ReQoL measures was confirmed elsewhere [26, 30]. Unfortunately, it was not possible to fit a bi-factor CFA model to the current sample, as convergence was not achieved. Last, test–retest reliability was assessed over a period of 6 months. A shorter period of time between test and retest would have been more suitable, as this would have made a change in symptom severity less likely [54, 55]. Yet, for the assessment of the test–retest reliability, only those persons without any improvement or worsening of symptoms between measurement points were selected from the sample to minimize the problem of a long assessment period. Notwithstanding, the test–retest reliability of the ReQoL measures for patients with affective disorders should be interpreted with caution.

Conclusion

The German version of the ReQoL measures is valid and reliable for the assessment of recovery-focused HrQoL for persons with affective disorders. As they have proven to also measure change in depressive symptom severity, the ReQoL measures are promising for use in mental HrQoL research as well as health economic research. Further research is needed in order to estimate preference weights from the general population for the development of a German ReQoL-UI.