Measuring broader wellbeing in mental health services: validity of the German language OxCAP-MH capability instrument

Purpose The OxCAP-MH capabilities questionnaire was developed and validated in the UK for outcome measurement in mental health clinical studies. Its broader wellbeing construct or validity in routine mental health services so far has not been assessed. The objectives were to investigate the extent the OxCAP-MH measures broader wellbeing compared to the EQ-5D-5L and to test psychometric properties of the German language OxCAP-MH in routine mental health services in Austria. Methods Study sample consisted of patients in socio-psychiatric services (n = 159) assessed at baseline and 6-month follow-up. Underlying factors associated with quality-of-life/wellbeing concepts measured by the OxCAP-MH and EQ-5D-5L were identified in exploratory factor analysis (EFA). Responsiveness was assessed using anchor questionnaires and standardised response mean (SRM). For discriminant validity, subgroups of respondents were compared using t test and one-way ANOVA. Test–retest analysis was assessed for a period of maximum 30 days from the baseline assessment with intra-class correlation coefficient (ICC). Results EFA identified a two-factor structure. All EQ-5D-5L items and seven OxCAP-MH items loaded on one factor and nine remaining OxCAP-MH items loaded on a separate factor. Responsiveness was found for patients who improved in anchor questionnaire scores with large or moderate SRM statistics. OxCAP-MH discriminated between various groups in univariable and multivariable analyses. Reliability of the German language OxCAP-MH was confirmed by ICC of 0.80. Conclusions Besides providing evidence that the OxCAP-MH measures broader wellbeing constructs beyond traditional health-related quality of life, the study also confirms the validity of the instrument for implementation in routine evaluation of mental health services. Electronic supplementary material The online version of this article (10.1007/s11136-019-02187-9) contains supplementary material, which is available to authorized users.


Introduction
There has been a general debate whether the currently used generic health-related quality-of-life (HRQoL) measures, for example, EQ-5D commonly used for quality-adjusted life years (QALYs) calculations, sufficiently reflect the most important dimensions of patients' wellbeing for evaluative purposes [1][2][3]. Some authors argue that an alternative way of assessment should be implemented to capture both health and non-health effects of evaluated interventions especially in evaluations conducted from a societal perspective [4][5][6][7][8]. Moving beyond the QALYs framework to account for non-health outcomes in economic evaluations requires the use of measures that capture a broader set of patient outcomes [4,9]. This is particularly relevant for mental health interventions which are expected not only to improve health outcomes, but also impact areas of recovery including social relationships, hope and optimism about the future, identity, meaning of life and empowerment of personal autonomy [10].
The capabilities framework has gained attention as an alternative to the traditional welfare economics approach used in evaluating quality of life. The term 'capability' in the context of social equality or inequality was defined by Amartya Sen in the field of the development economics research [11]. It meant to indicate a space of each individual to do things that he or she chooses and values in life [12].
A recent literature review by Mittchel et al. demonstrated increasing interest in the application of the capability approach in the field of health [13]. The review pointed out that several new instruments have been developed in the last decade which applied the capability approach in the assessment of health and social care interventions. These instruments demonstrated good psychometric properties in different disease areas, population groups and settings [14][15][16][17][18][19]. Moreover, studies showed that the capability-based instruments seem to measure broader aspects of human's wellbeing compared to, for example, EQ-5D [8,20,21]. There is a growing body of literature about the use of the capability-based instruments in economic evaluations to support decisions on allocation of resources [22][23][24]. Recently, the National Institute for Health and Care Excellence (NICE) in the UK recommended the use of ASCOT and ICECAP-O capability questionnaires in economic evaluations of social care interventions [25].
Sen's capabilities theory was further developed by Martha Nussbaum who proposed a list of 10 central human capabilities [11] which served as a basis for the OxCAP-MH (Oxford CAPabilities questionnaire-Mental Health) questionnaire (Online Supplement Table 1). The OxCAP-MH was originally designed to measure capabilities in patients with mental disorders in the UK [7] and has been successfully deployed in several trial-based mental health economic evaluations since [26][27][28]. Recently, OxCAP-MH was translated to German to facilitate its use in German-speaking countries [29].
The original English version of the OxCAP-MH was tested in a randomised controlled trial on a sample of patients with a primary diagnosis of psychotic disorders [7,14]. Findings from Vergunst et al. demonstrated good feasibility of the questionnaire in this patient group as well as its good psychometric properties in terms of reliability and validity [14,30]. However, the dimensionality of the OxCAP-MH attributes using exploratory factor analysis (EFA) and more elaborate comparison to the EQ-5D in terms of the measured quality-of-life and broader wellbeing constructs have not yet been investigated. Furthermore, the questionnaire has not been tested in terms of its responsiveness and discriminant validity. Also the use of this instrument has not been formally tested in a routine health services evaluation setting.
Therefore, the objectives of this study were (1) to examine the dimensionality of the OxCAP-MH and to analyse to what extent OxCAP-MH measures broader concept of wellbeing compared to the EQ-5D-5L, (2) to test the psychometric validity of the German language instrument in routine mental health services and (3) to compare the results to those of the original UK psychometric validation.

Study participants and procedure
Study participants were approached by their formal carers in the routine setting of socio-psychiatric services, organised by pro mente kärnten, in the Austrian state of Carinthia. Out of nine states in Austria, Carinthia is sixth in terms of the size of the population which constitutes around 6.4% of Austrian general population [31]. It has GDP per capita of 34,300 EUR and the second highest unemployment rate in Austria (10.2%) [31,32]. Pro mente kärnten is the biggest non-profit community-based organisation located in Carinthia that offers comprehensive support and treatment for mentally ill adults.
The eligibility criteria included the following: patients with diagnosed mental disorder receiving socio-psychiatric services, ability and willingness to give a written consent, age above 18, fluent in German and not in active phase of the disease.
The study was aligned with the clinical routine assessment system implemented in the socio-psychiatric services which is used to collect patient information every 6 months. At baseline, patients completed self-reported questionnaires during their regularly scheduled meetings with their carers. As routine follow-up, the same self-reported questionnaires were filled out at 6-month intervals. Other routinely collected data included socio-demographic characteristics, diagnostic codes according to the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) [33] and observerrated questionnaires filled out by the carers (Global Assessment of Functioning (GAF) and Mini-ICF-APP Social Functioning Scale).
In addition to the routinely collected information, study participants were asked to complete the OxCAP-MH and EQ-5D-5L at baseline and at 6-month follow-up. For the test-retest analysis, the OxCAP-MH was completed once at the next scheduled care meeting within 30 days of the baseline assessment.
The study had been approved by the Ethics Committee of the Medical University of Vienna (EK-Nr: 2191/2015, Votum: 08.03.2016). Organization Quality of Life BREF (WHOQOL-BREF), the  Short Brief Symptom Inventory-18 (BSI-18), Global Assessment of Functioning (GAF), Mini-ICF-APP, EQ-5D-5L and  OxCAP-MH. WHOQOL-BREF is a generic self-reported tool that covers four dimensions (physical, psychological, social relationships and environment) plus overall value for quality of life and satisfaction with individual's health. It consists of a total of 26 items with five levels of answers. Higher scores ranging from 0 to 100 denote higher quality of life. The instrument is widely used in international research projects [34]. The Cronbach's alpha for WHOQOL-BREF in a German sample of respondents from the general population and somatic and psychiatric patients was 0.88, 0.83, 0.76 and 0.78 for domains physical health, psychological, social relationships and environment, respectively [34].

Instruments included German versions of the World Health
BSI-18 is a measure of psychological distress widely used in mental health research. This 18-item self-reported questionnaire consists of three domains (depression, anxiety and somatisation). Each domain scores between 0 and 24, while the global scale Global Severity Index (GSI) ranges between 0 and 72 [35]. Higher score represents more psychological distress. Validation of the German version of the instrument has shown good psychometric properties of the GSI with a Cronbach's alpha of 0.93 and strong correlations with the Patient Health Questionnaire (PHQ) Depression (0.71) and Anxiety (0.73) [36].
GAF is an observer-rated instrument measuring overall level of psychosocial functioning on a scale from 1 to 100 with the higher score indicating better functioning. The instrument is widely used internationally in mental health research and is an integral part of the DSM classification system (Axis V) [37,38]. Estimates for the test-retest reliability of GAF vary between studies ranging from 0.54 to 0.9 [39]. Moderate correlations with SCL-90-R and weak correlations with the Social Adjustment Scale were reported for GAF [40].
Mini-ICF-APP is an observer-rated instrument of social functioning designed for use by carers of adults with mental disorders [41]. It consists of thirteen domains which are rated on a 0-4 scale where 0 represents 'no impairment' and 4 'total disability'. Consequently, higher score represents more severe impairment [42]. Test-retest reliability of the instrument in a UK sample was 0.82 and Cronbach's alpha was 0.86. Strong correlations with the Brief Psychiatric Rating Scale (BPRS) and Social and Occupational Functioning Assessment Scale indicated good validity of the Mini-ICF-APP [41].
The EuroQoL EQ-5D-5L is a generic, self-reported, preference-based questionnaire used to measure HRQoL alongside five domains (mobility, self-care, usual activities, pain/discomfort, depression/anxiety) [43]. The EQ-5D Visual Analogue Scale (VAS) is a measure of current health status where 100 represents full health and 0 represents death. EQ-5D-5L has been validated in the German setting [44]. In a sample of patients with depression, the EQ-5D-5L's Cronbach's alpha was 0.77 and correlation with Beck Depression Inventory was − 0.58 showing satisfactory validity [45].
Finally, the OxCAP-MH is a self-reported, 16-item questionnaire of capabilities. Each question is scored on a 1-5 Likert scale, where 1 refers to a very low level of the given capability domain and 5 refers to no limitations in the given capability domain. For easier interpretation, the total OxCAP-MH score which ranges from 16 to 80, is standardised to a 0 to 100 scale, using the formula 100 × (OxCAP-MH total score − minimum score)/range, where 0 represents no capabilities and 100 represents full capabilities [30]. Reliability of the original English OxCAP-MH instrument was assessed based on a sample of 172 patients (Cronbach's alpha 0.79). Test-retest reliability analysis drawn on subsample of 57 patients resulted in an intra-class correlation coefficient (ICC) of 0.86. Assessment of construct validity showed statistically significant positive correlations with the EQ-5D-3L, EQ-5D-VAS and the BPRS.

Analyses
Data were collected in paper form, entered into the data collection system by the carers and double checked by the data analyst. All data were pseudo-anonymised at source in pro mente kärnten. Questionnaire scores were calculated according to the guidelines provided for the respective measures. Due to the lack of Austrian tariffs for the EQ-5D, German [46] value set was used to calculate utility index (EQ-5D-index). According to the EQ-5D manual, in the absence of a country-specific value set, values of a neighbouring country or population similar to the population under investigation should be used [47]. Therefore, German values were used in the primary analysis. In addition, we run a sensitivity analysis using the UK value set as common practice in countries that do not have their own value sets available [48]. Since the sensitivity analysis results were very similar to those obtained in the primary analysis and the current NICE position statement does not recommend using the current UK 5L valuation set in general [49], we only report results for the German value set. Changes in questionnaire scores between baseline and 6-month follow-up were assessed using paired t test.

Floor and ceiling effects and item-total correlation
Floor/ceiling effects were considered present when lowest/ highest values in the total score were reported by more than 1 3 15% of respondents or when more than 40% of respondents scored lowest/highest values on questionnaire items [14,50]. Item-total correlation is used to confirm the homogeneity of the scale items and it was assessed as the correlation of an individual item with the total score calculated by omitting that item. According to the rule of thumb provided by Streiner et al. correlation between the item and the scale higher than 0.3 indicates satisfactory item homogeneity [51].

Exploratory factor analysis (EFA)
EFA is used to study dimensionality of a measurement instrument when a priori assumptions about the dimensional structure of an instrument are lacking [52]. As no prior EFA was performed for the OxCAP-MH questionnaire and no prior hypotheses about the dimensionality of the OxCAP-MH existed, the current study used EFA for two purposes. Firstly, to examine the dimensional structure underlying the OxCAP-MH questionnaire; secondly, to examine whether the OxCAP-MH and EQ-5D-5L items share the same set of the underlying factors or measure separate constructs. Prior to the factor analysis, the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy and Bartlett's Test of Sphericity were performed to test for suitability of the data for factor analysis. Good suitability for factor analysis is indicated by a larger KMO (score below 0.5 is considered unacceptable for factor analysis) and statistically significant Bartlett's Test [53]. Because both of the instruments (OxCAP-MH and EQ-5D-5L) are scored on categorical (polytomous) scales, EFA was performed using polychoric correlations adequate for this type of data [54]. For easier interpretation, we recoded EQ-5D-5L response levels to 1-5 scale, where 1 represents extreme problems and 5 represents no problems (opposite to the traditional scoring of the EQ-5D-5L descriptive system). The number of factors retained was chosen based on criteria which included the Kaiser Criterion, the scree plot and parallel analysis (PA). As a factor extraction method, we used Minimum Rank Factor Analysis (MRFA) recommended by Timmerman and Lorenzo-Seva [54]. In addition, we run EFA with commonly used Unweighted Least Squares as a factor extraction method which yielded similar factor solution as the MRFA. Therefore, we report findings for MRFA only. Factor loadings were rotated using promin rotation [55]. We used the freely available software FACTOR to conduct the EFA [56], for the other analyses we used Stata 15.1 [57].

Responsiveness
The anchor-based approach was applied to assess responsiveness. Self-reported quality-of-life measures that correlated with the OxCAP-MH at the minimum level of 0.5 were selected as anchor questionnaires. Correlations were assessed using Spearman's rank correlation coefficients applying Bonferroni correction to adjust for multiple testing. Correlations of < 0.3, 0.3-0.5 and > 0.5 were considered weak, moderate and strong, respectively [58]. Because of the lack of available data on the value of minimally important difference for the similar study population for the instruments used as anchors, the assumption of ½ standard deviation (SD) of mean baseline score was used as an indicator of a change that is meaningful to patients [59]. If a change in the instrument scores between baseline and 6-month follow-up assessments was estimated beyond ½ SD positively or negatively, patients were classified as "Improved" or "Worsened", respectively. Otherwise, patients were classified as "No change". Small, moderate and large magnitude of change was indicated by < 0.5, 0.5-0.79 and ≥ 0.8 values of standardised response mean (SRM), respectively [60]. SRM was calculated as the ratio of the mean change, between baseline and follow-up scores in a single group, to the SD of the change scores [51]. Changes at the individual patient level were also explored reporting percentage of the study respondents who improved, worsened or did not change according to both the OxCAP-MH and the anchor questionnaires.

Discriminant validity
Discriminant validity was defined as the ability of the questionnaire to discriminate between subgroups and was assessed by comparing differences in OxCAP-MH scores between defined patient groups based on age, sex, marital and employment status, mental health comorbidity, overall quality-of-life levels and satisfaction with health. Questions about overall QoL and satisfaction with health were derived from the two items of the WHOQOL-BREF instrument which included: "How would you rate your quality of life?" and "How satisfied are you with your health?" [61]. Associations between the OxCAP-MH scores and selected variables were tested using t tests for two group comparisons and oneway ANOVA for comparisons between multiple groups. In addition, a multivariable linear regression was used including all patient-related variables as explanatory variables in the model.

Test-retest reliability and Cronbach's alpha
Reliability was defined as the ability of the questionnaire to give consistent results after being administered on multiple occasions to the same person when no changes in health state/wellbeing were reported [62]. Carers were instructed to ask study participants to complete the OxCAP-MH questionnaire at the next scheduled meeting after baseline assessment that was not later than 30 days after the first completion of the questionnaire. Prior to completing the questionnaire, patients were asked: "Since you last completed this questionnaire, has anything in relation to your health or social situation changed?" Only patients who gave a negative answer to that question were included in test-retest analysis which was assessed using ICC. ICC estimates of < 0.5, 0.5-0.74, 0.75-0.89 and ≥ 0.9 indicated poor, moderate, good and excellent reliability, respectively [63]. ICC for test-retest analysis was calculated using a two-way random model with absolute agreement [64]. For Cronbach's alpha, values greater than 0.70 were considered an indicator of good internal consistency [50,51]. For all analyses, the level of significance was p < 0.05. Analyses were conducted on complete cases.

Patient characteristics
Out of the typical patient flow of 400 patients who were approached between June 2016-March 2017, 166 patients agreed to participate in the study. Two patients withdrew before completing the baseline assessment. Baseline data were collected for 164 patients, out of whom 159 participants had fully completed OxCAP-MH questionnaires and were included in the baseline analyses. Thirty-one study participants withdrew from the study or were lost to followup. One-hundred and twenty seven patients (n = 127, 80%) completed fully both the baseline and the 6-month follow-up assessments. No differences in baseline demographic and socio-economic characteristics and questionnaire scores were observed between the patients who completed the study and those who withdrew or were lost to follow-up.
Patient characteristics are presented in Table 1. Mean age was 45 (min. 18, max. 87) and 64% (n = 102) of the study sample were women. Participants had a variety of mental health diagnoses with depression (37%) and anxiety disorders (24%) being the most common. A total of 103 (65%) patients had single diagnoses and 45 (28%) patients had multiple mental health diagnoses.

Floor and ceiling effects and item-total correlation
The baseline OxCAP-MH scores were normally distributed ( Fig. 1) showing no floor/ceiling effects in the overall OxCAP-MH score. Ceiling effects were observed in the items 'appreciating nature' (64%), 'probability of assault' (48%) and 'respecting people around' (43%), indicating no limitations in these capability domains for the majority of the patients. Item-total correlations of the individual items were satisfactory and ranged between 0.29 and 0.61 (Online Supplement Table 2).

Exploratory factor analysis (EFA)
Kaiser's measure (KMO) was 0.835 for the OxCAP-MH items alone and 0.857 for pooled OxCAP-MH and EQ-5D items and the p-value for Bartlett's Test was < 0.001, both representing adequacy of the data for factor analysis. Investigation of the Kaiser Criterion, scree plot and PA suggested that a two-factor solution was most suitable for the OxCAP-MH alone as well as for the pooled OxCAP-MH and EQ-5D-5L questionnaire items (Online Supplement Figs. 1 and  2). Nine OxCAP-MH domains loaded on Factor 1 ( Table 2). These domains refer to contextual aspects of one's life that include personal and environmental dimensions ('suitable flat situation', 'local decisions', 'appreciation of nature', 'respect for people around', 'enjoy love and support', 'freedom of deciding for yourself', 'creativity', 'freedom of expression', 'access to interesting activities'). We refer to this factor as the 'contextual factor'. The remaining seven OxCAP-MH domains loaded on Factor 2. These items are related to limitations in one's health ('limit daily activities', 'less sleep over worrying', 'free time activities') and social sphere ('neighbourhood safety', probability of discrimination', 'probability of assault', 'meet socially with friends or family'). Hence, we refer to this factor as the 'sociophysical factor'.
Two-factor solution was also optimal for the EFA for pooled OxCAP-MH and EQ-5D-5L questionnaire items ( Table 3). All five EQ-5D domains strongly loaded on Factor 2 together with the seven OxCAP-MH domains covered by defined earlier 'sociophysical factor'. The remaining OxCAP-MH domains loaded on Factor 1 earlier defined as 'contextual factor'.

Responsiveness
Overall, study participants significantly improved in all questionnaire scores between baseline and 6-month followup ( Table 1). Distribution of change in OxCAP-MH scores shows that majority of participants improved (n = 75, 60%) or did not change (n = 16, 13%), and around one-quarter of the participants worsened (n = 33, 27%) (Fig. 2).
Correlation coefficients of the baseline and followup scores were strong between the OxCAP-MH and EQ-5D-index, EQ-5D-VAS, BSI-18, WHOQOL-BREF Physical, WHOQOL-BREF Psychological, WHOQOL-BREF Environment (Table 4). Moderate correlations of the baseline and follow-up scores were observed for WHOQOL-BREF Social relationships, Mini-ICF-APP and GAF. Correlations of change scores between OxCAP-MH and other measures were weak to moderate and ranged from − 0.10 (Mini-ICF-APP) to 0.51 (WHOQOL-BREF Psychological) Based on correlation analyses, EQ-5D-index, EQ-5D-VAS, BSI-18 and WHOQOL-BREF Physical, Psychological and Environment were chosen as anchor questionnaires. For the study participants who reported improvement in BSI-18 and WHOQOL-BREF Physical and Psychological scores, the improvement in the OxCAP-MH score was statistically significant at the 1% level with large SRM statistics. Statistically significant improvements in the OxCAP-MH scores at the 1% level with moderate SRM statistics were observed for those who improved in EQ-5D-index, EQ-5D-VAS and WHOQOL-BREF Environment scores (Table 5).
For the individual patient-level changes in the questionnaire scores, 41% and 42% of patients who improved in WHOQOL-BREF Environment and EQ-5D-index, respectively, improved also in the OxCAP-MH score, while 54% of those who improved according to the WHOQOL-BREF Physical and Psychological health, also improved in the OxCAP-MH score ( Table 6). Of those who did not change according to the BSI-18, EQ-5D-index, WHOQOL-BREF Environment, EQ-5D-VAS or WHOQOL-BREF Physical and Psychological health scores, 56, 59, 61, 62, 64 and 70% also did not change based on the OxCAP-MH score, respectively.

Discriminant validity
In the univariable analysis, OxCAP-MH significantly discriminated between employed and unemployed participants, between single-morbid and multi-morbid respondents and between different levels of subjective overall quality of life and satisfaction with health. In the multivariable regression framework, statistically significant associations were maintained for multi-morbidity, overall quality of life and satisfaction with health (Online Supplement Table 3). No    Access to interesting activities/employment 0.58 EQ-5D-5L 1 Mobility 0.88 EQ-5D-5L 2 Self-care 0.90 EQ-5D-5L 3 Daily activities 0.77 EQ-5D-5L 4 Pain/discomfort 0.69 EQ-5D-5L 5 Depression/anxiety 0.68 Cronbach's alpha 0.87 0.80 statistically significant associations with the variables sex, age and marital status were confirmed.

Test-retest reliability and Cronbach's alpha
For the test-retest analysis, a total of 146 patients (92%) completed the questionnaire a second time before the follow-up assessment. Forty-three cases were excluded due to indication of changed health state during the observation period. A further 34 questionnaires were excluded from the analysis due to the a priori set maximum 30 day timeframe limitation of this test. Overall, 69 patients could be included in the '0-30 days' test-retest analysis. The estimated singlemeasure intra-class correlation was 0.80 (95%CI 0.69-0.87) indicating good similarity between two assessments within individual respondents. Sensitivity analyses were performed for 21 (n = 39) and 14 (n = 28) days' time spans. Details are presented in Table 7. Cronbach's alpha for the OxCAP-MH questionnaire was 0.85.

Discussion
This study examined the German language version of the OxCAP-MH, a multi-dimensional capability instrument for outcome measurement in mental health research. The original English version of this instrument was developed in the UK and validated for severe mental disorders in the English setting showing robust psychometric properties [7,14]. After careful linguistic translation and cultural validation process [29], the German language version of the OxCAP-MH was tested and fully validated in Austria. In addition, the broader concept of capabilities in comparison to HRQoL was formally investigated through comparison with EQ-5D-5L in an EFA. The German language version of the OxCAP-MH proved to be a measure of quality of life and broader wellbeing beyond traditional HRQoL assessed by the EQ-5D-5L. In the EFA, all EQ-5D-5L domains loaded strongly on the socalled sociophysical factor while the remaining OxCAP-MH items loaded on the separate second factor that represents capability dimensions beyond health. These results indicate that in comparison to the EQ-5D-5L, the OxCAP-MH may be seen as enhanced rather than complementary in its concept. This finding is different from those of the two studies that looked into the factor structure of the ICECAP-A and ICECAP-O in comparison to the EQ-5D-3L. These studies showed that these questionnaires measure two distinct concepts and should be considered complementary measures since they provide different information about the patient's wellbeing [21,65].
Moderate to strong level of convergence of the baseline and follow-up scores between OxCAP-MH and other measures, including EQ-5D-index, EQ-5D-VAS, four dimensions of WHOQOL-BREF, BSI-18 and Mini-ICF-APP, confirmed that different domains of quality of life such as social functioning, health status and psychological distress are well reflected in the OxCAP-MH score. Similar to the UK validation results, the weakest correlation was reported between the OxCAP-MH and the GAF. This was expected also in the current context, as the GAF score reflects a clinician's view on patient's symptoms, impairment and functioning, and not the patient's own perceived wellbeing.
The current validation study also investigated new aspects of the OxCAP-MH instrument including its responsiveness and discriminant validity. These analyses showed that the OxCAP-MH is responsive to the improvements in a patient's health state over time measured by anchor questionnaires   and it discriminates among defined patient groups with high sensitivity. Only few studies examined capability instruments' responsiveness over time [20,66,67]. Furthermore, there are relatively few studies that looked at the use of capability measures among patients suffering from mental disorders [20,68,69], and these emphasised good construct validity and feasibility of the instruments. Given the existing literature, this study is unique in providing a full psychometric validation of a capability measure for a cohort with mixed mental health problems and its comparison to generic HRQoL.
As opposed to the validation of the English version, in this study, data collection took place in the real-world care setting and confirmed both the feasibility and validity of using the OxCAP-MH in the routine evaluation of mental health services. Our findings that the OxCAP-MH is versatile across different mental health diagnoses and can be used both in clinical and health economic research in diverse settings are in-line with preliminary evidence from other ongoing studies that utilise the instrument [26][27][28]70].
Similar to the UK validation study of the English version with patients suffering from psychosis, the German language OxCAP-MH had good internal consistency with a Cronbach's alpha of 0.85 and showed good reliability in test-retest analysis with ICC estimates of 0.80. These findings are similar to other studies on capability measures. For example, Rand et al. reported ICC of 0.78 for the ASCOT-SCT4 measure [69] and Al-Janabi et al. reported ICC of 0.72 for the ICECAP-A index score [71].
Despite limitations by its pragmatic design, including limited sample size, some degree of missing data on the diagnoses and socio-economic status and limited possibilities of conducting test-retest analysis in strictly defined time period, this study confirms the favourable psychometric properties of the instrument also in diverse cultural and disease settings. The study provides also new evidence on the responsiveness of the OxCAP-MH and its conceptual ability to capture broader aspects of wellbeing beyond the traditional HRQoL framework. The findings further support the use of the OxCAP-MH as an outcome measure both in experimental and in routine evaluation of health and social care interventions also the in German-speaking context. Acknowledgements Open access funding provided by Medical University of Vienna. No external funding was provided for this study. We would like to express our gratitude to all carers working in pro mente kärnten for their commitment to the study and conducting interviews with the patients. We also would like to thank all patients who participated in the study.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The study had been approved by the Ethics Committee of the Medical University of Vienna (EK-Nr: 2191/2015, Votum: 08.03.2016).

Informed consent Informed consent was obtained from all individual participants included in the study.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat iveco mmons .org/licen ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.