Comparing the German Translation of the ICECAP-A Capability Wellbeing Measure to the Original English Version: Psychometric Properties across Healthy Samples and Seven Health Condition Groups

As the cross-cultural use of outcome measures grows, it is important to determine whether these instruments are appropriate for use in other settings, are translated accurately, and perform in a similar manner to the original tools. This research aimed to compare the validity of the German translation of the ICECAP-A to the original English version of the instrument, across healthy adults and seven health condition groups (arthritis, asthma, cancer, depression, diabetes, hearing loss and heart disease). Data were analysed from a cross-cultural study, which recruited participants through online panels in 2012. Data were analysed on capability wellbeing (ICECAP-A), health-related quality of life (EQ-5D-5L and SF-6D), satisfaction with life (SWLS), and a series of other condition-specific outcome measures. The ICECAP-A was assessed for internal consistency, convergent validity and construct validity. A total of 2501 individuals were included in the analysis. The ICECAP-A demonstrated good internal consistency within the German and UK samples, and across all seven health condition sub-groups (α = .74–.86). In both countries, ICECAP-A scores were significantly correlated with SWLS, SF-6D and EQ-5D-5L scores for healthy participants and health condition groups (r = .35–.77). Finally, experiencing one of the seven health conditions (compared to being healthy) was significantly associated with lower levels of capability wellbeing in the German and UK samples (construct validity). The German translation of the ICECAP-A yielded valid and reliable data, in both healthy respondents and the seven health condition groups. Further work could be undertaken to develop a German-specific value set for the ICECAP-A.


Background
There is increasing pressure for economic evaluations to acknowledge the impact of health and social care interventions on the wider aspects of wellbeing, above and beyond health-related quality of life (Brazier and Tsuchiya 2015). Several western governments have also sought to incorporate comparatively 'subjective' forms of wellbeing alongside more 'traditional' economic indicators in their assessments of societal progress (Hicks et al. 2013; Stiglitz et al. 2010). As the measurement of wellbeing continues to grow (Linton et al. 2016), it becomes increasingly important to assess how well these tools are able to perform across countries, populations and settings.
The ICECAP tools represent one approach to the measurement of wellbeing for use in economic evaluations. These tools were influenced by the work of the Nobel Prize winning economist Amartya Sen (Sen 1993), and conceptualise wellbeing in terms of how capable a person is of achieving a personally valuable life. The primary rationale for focussing on capability in the health field is a growing concern that measures may be missing important aspects of quality of life; the measures currently used in health economic analysis tend to focus primarily on health (Coast et al. 2008b). The ICECAP-O was the first generic measure of capability wellbeing for use across patient populations in health economics, and was specifically designed for use in older adults (i.e. 65 years old and over) (Coast et al. 2008a; Grewal et al. 2006). Following this, the ICECAP-A was developed for use in the general adult population (i.e. 18 years old and over) (Al-Janabi et al. 2012). ICECAP measures are now being recommended for use in economic evaluations assessing interventions in social care and long-term conditions, both in the UK, where the ICECAP measures were originally developed (National Institute for Health and Care Excellence 2014), and internationally (Versteegh et al. 2016). Given the impact of economic evaluations on policy and practice, it remains vital to assess the quality of the instruments involved in this process (Mokkink et al. 2010). Psychometric evidence provides a source of insight into the validity and reliability of the measures in use, but these investigations are infrequently conducted across different cultural settings. Nonetheless, experiences of health and wellbeing may differ between countries and there is a need for internationally used instruments to be comparable and applicable across cultures (Taggart et al. 2013).
A growing literature on the psychometric performance of the ICECAP tools exists. In the UK, studies have demonstrated that the ICECAP-A is responsive to deteriorations in clinical symptoms among women with irritative lower urinary tract symptoms (Goranitis et al. 2016) and to changes in health-related quality of life following knee pain (Keeley et al. 2015). Additionally, support has been found for the convergent validity of the attributes within the ICECAP-A with the attributes of the EQ-5D, in the general population (Al-Janabi et al. 2013). Convergent validity describes the extent to which scores produced by a new instrument are correlated with scores on comparable (or conceptually identical) existing instruments (DeVon et al. 2007). Cross-culturally, the ICECAP-O has demonstrated good convergent validity with existing measures in a sample of post-hospitalised older adults in the Netherlands (Makai et al. 2013) and among nursing home residents with dementia in Germany (Makai et al. 2014). Similar studies have demonstrated the reliability and validity of the ICECAP-O in Spain (Sarabia-Cobo et al. 2017) and Sweden (Hörder et al. 2016) and for the ICECAP-A in China (Tang et al. 2018). However, to date, the validity of a German translation of the ICECAP-A has not been investigated. In addition, no translated ICECAP-A or ICECAP-O measure has been directly compared to the original version using similar population groups across different countries.
The challenge of translating instruments into different languages is critical to the use of outcome measures across countries. One of the central challenges here concerns the extent to which questionnaire items will be interpreted in the same way across cultures, and in alternative languages (Reeve et al. 2013). Specific guidance on the translation of outcome measures highlights how poor translation may result in new measures that misinterpret the concepts underpinning the original tools or violate the regular speech patterns of the target language (Wild et al. 2005). In summary, the translation of outcome measures requires careful consideration, in addition to an exploration of psychometric properties.
The aim of this work was to quantitatively assess the psychometric comparability of the newly translated German version of the ICECAP-A to the original English language version. To further investigate the validity of the ICECAP-A across contexts, this research investigates the psychometric performance of the tool in healthy and health condition populations. Although ICECAP measures have been translated into numerous languages, this is the first psychometric study to simultaneously compare a translated ICECAP instrument in its country of use (Germany) with the original English language version of the instrument in the UK. This study will contribute to the growing literature of articles evaluating the quality and applicability of capability wellbeing measures.

Dataset
This study used data collected as part of the Multi Instrument Comparison (MIC) dataset, a large study of health and wellbeing measures collected across different population groups and countries in 2012 (Richardson et al. 2015b). The survey was conducted by a global panel company, CINT Pty Ltd., using online panels to recruit relevant individuals. Participants were classified into health condition groups if they self-reported one of seven primary health conditions (asthma, arthritis, cancer, depression, diabetes, hearing loss or heart disease). 'Healthy' participants were defined as those reporting 70 or higher on a 0-100 visual analogue scale measuring overall health, and without any other illnesses lasting longer than three months in the past year. This study utilised data collected in Germany and the UK. Quotas were employed to obtain a representative sample in terms of age, sex and education in the healthy population, while target quotas of 150 individuals per health condition group per country were employed to reach similar numbers of health condition groups within and across countries. The survey was approved by the Monash University Human Research Ethics Committee (reference number: CF11/3192-2011001748). In addition to questions related to health and wellbeing, standard sociodemographic data were collected (age, sex, gender, educational attainment).
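The grouping rule described above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the MIC survey's actual screening logic: the `Respondent` fields and the `classify` function are hypothetical names, and the treatment of respondents who fit neither definition (returned as `None`) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# The seven primary health conditions named in the text.
CONDITIONS = {"asthma", "arthritis", "cancer", "depression",
              "diabetes", "hearing loss", "heart disease"}
HEALTHY_VAS_THRESHOLD = 70  # "70 or higher" on the 0-100 visual analogue scale


@dataclass
class Respondent:
    # Hypothetical field names, chosen for this sketch only.
    vas_health: int                   # self-rated overall health, 0-100
    primary_condition: Optional[str]  # self-reported primary condition, if any
    other_chronic_illness: bool       # illness lasting > 3 months in past year


def classify(r: Respondent) -> Optional[str]:
    """Return the condition group, 'healthy', or None if neither applies."""
    if r.primary_condition in CONDITIONS:
        return r.primary_condition
    if (r.primary_condition is None
            and r.vas_health >= HEALTHY_VAS_THRESHOLD
            and not r.other_chronic_illness):
        return "healthy"
    return None  # assumption: such respondents fall outside both groups


print(classify(Respondent(85, None, False)))
print(classify(Respondent(90, "asthma", False)))
```

Note that a respondent with a reported condition is placed in that condition group regardless of their visual analogue score; only respondents without a primary condition are tested against the 'healthy' definition.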

Capability Wellbeing
The ICECAP-A comprises five questionnaire items with four response levels each, and is designed to capture people's capability to live a life that they value (Al-Janabi et al. 2012). These 'capabilities' are stability (feeling settled and secure), attachment (able to achieve love, friendship and support), autonomy (able to be independent), achievement (able to achieve and progress in life) and enjoyment (able to experience enjoyment and pleasure) (Al-Janabi et al. 2012). The methodology used to convert raw scores into capability values is described elsewhere (Flynn et al. 2015); in brief, scores range from 0, which represents 'no capability', to 1, which represents 'full capability'. In the absence of a German scoring tariff, the UK tariff was applied to both the German and UK samples.
In accordance with available guidance on the topic, preparation for the translation involved establishing a multi-lingual study team and undergoing a process of familiarisation with the concepts of the ICECAP-A (Wild et al. 2005). Next, a first-language German speaker was identified by the Australian MIC study team to forward translate the instrument. The plausibility and accuracy of this translation were checked in collaboration with two members of the German MIC data collaborators (including MS) prior to data collection. Subsequent to data collection, a back-translation was undertaken by one German native with prolonged experience living in English-speaking countries and who did not have any knowledge of the ICECAP-A. The back-translated English version was then compared with the German translation by an American native with prolonged experience living in Germany, who also did not know the ICECAP-A, to check for any literal and semantic mistakes. Semi-structured interviews were conducted by MS and JU, who are both familiar with the ICECAP-A, with the aim of understanding whether there were conceptual differences between the original ICECAP-A and the back-translated version. It was concluded that the semantic translation was conducted well, with little if any potential for improvement on the translated version (further information available on request). Both the original and the translated version of the ICECAP-A are available through the ICECAP project website (https://www.birmingham.ac.uk/ICECAP).

Health Status
The EQ-5D is the most common measure used in health economics to generate quality-adjusted life years (QALYs) (Wisløff et al. 2014). The tool consists of five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), with the most recent version (EQ-5D-5L) consisting of five response levels per dimension (no problems, slight problems, moderate problems, severe problems and extreme problems). English population values have been developed for the EQ-5D-5L (Devlin et al. 2017). Values are used to estimate the relative preferences for different health states as measured by the concepts captured in the measure. Values on the EQ-5D-5L are anchored on a scale from 0 (dead) to 1 (full health), with values below zero possible (the minimum value in the England EQ-5D-5L value set is −0.285). The EQ-5D-5L was translated into German and validated in a previous study (Hinz et al. 2014). The English population value set for the EQ-5D-5L was applied to the UK and German samples in the absence of a German value set.
The Short Form 36 (SF-36) is a widely used instrument for measuring generic health status worldwide. It consists of 36 items that are compiled into eight sub-scales: vitality, physical functioning, bodily pain, general health perceptions, physical role functioning, emotional role functioning, social role functioning and mental health (Ware Jr and Sherbourne 1992). For each of the eight sub-scales, scores are summed onto a scale from 0 (worst health state) to 100 (best health state). The SF-36 was also previously translated into a German language version and validated within an existing study (Bullinger 1995). SF-6D scores were derived from responses to the SF-36 using econometric modelling methods (Brazier et al. 2002) in line with previous studies using the Multi Instrument Comparison (MIC) dataset (Richardson et al. 2015a).

Satisfaction with Life
The Satisfaction with Life Scale (SWLS) was developed as a measure of 'life satisfaction', a key component of subjective wellbeing (Diener et al. 1985). The scale consists of five items and is scored on a five-point Likert scale (ranging from strongly disagree to strongly agree). An overall scale score is calculated by summing the item scores; higher scores indicate greater levels of life satisfaction. The SWLS was chosen because previous research in English-speaking samples suggested that subjective wellbeing measures have comparable correlations with the ICECAP-A and with the physical and mental health subscales of the SF-36 (Richardson et al. 2016).

Condition Severity Measures
The severity of disease burden experienced by participants in each of the health condition sub-samples was measured using the following condition-specific tools: Arthritis Impact Measurement Scales (Guillemin et al. 1997), Asthma Quality of Life Questionnaire (Marks et al. 1992), European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Cancer-30 (Aaronson et al. 1993), Depression Anxiety and Stress Scale (Lovibond and Lovibond 1995), Diabetes-39 Questionnaire (Boyer and Earp 1997), Abbreviated Profile of Hearing Aid Benefit (Cox and Alexander 1995) and MacNew Heart Disease HRQL questionnaire (Höfer et al. 2004). Scores on these instruments were transformed into total scale scores ranging from '0' (low) to '1' (severe) health condition severity. A full outline of the method used to develop these scores has been detailed previously.

Statistical Analysis
All analyses were conducted using STATA 14. Descriptive statistics are used to report the socio-demographic characteristics of the sample, split by country context and health condition sub-sample. The psychometric analyses were split by country to test whether the German translation of the ICECAP-A performs similarly to the original UK version of the tool. Analyses were further split by health condition group, to test whether the tool performed as well as expected across populations with a range of health conditions.
The internal consistency of the ICECAP-A, which refers to the extent to which the items of an instrument are conceptually inter-related (Tavakol and Dennick 2011), was tested using Cronbach's alpha. Although the ICECAP-A is expected to assess five distinct components (stability, attachment, autonomy, achievement and enjoyment), it is useful to examine the extent to which the tool measures some underlying and unifying concept of 'capabilities'. Alpha scores range from '0' to '1'; low scores indicate poor internal consistency, while scores of '.7' or higher indicate satisfactory internal consistency (Bland and Altman 1997).
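For readers unfamiliar with the statistic, the sketch below shows the standard formula for Cronbach's alpha, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), in plain Python. The analyses in this paper were run in STATA 14, so this is an illustration rather than the code that was used; the function names are hypothetical and the item responses are invented purely for demonstration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_var_sum = sum(variance(col) for col in zip(*rows))
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


def is_satisfactory(alpha, threshold=0.7):
    """Apply the '.7 or higher' rule of thumb cited in the text."""
    return alpha >= threshold


# Invented responses to five four-level items (1 = no capability, 4 = full).
made_up_responses = [
    [4, 4, 3, 4, 4],
    [3, 3, 3, 2, 3],
    [2, 1, 2, 2, 1],
    [4, 3, 4, 4, 3],
    [1, 2, 1, 1, 2],
    [3, 4, 3, 3, 4],
]
alpha = cronbach_alpha(made_up_responses)
print(round(alpha, 2), is_satisfactory(alpha))
```

The function works on raw item levels rather than tariff-weighted scores, which is the usual input for alpha; whether the original analysis did the same is not stated in the text.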
To assess convergent validity, Pearson's correlation coefficients between scores for the ICECAP-A and related measures of health-related quality of life and wellbeing were calculated (EQ-5D-5L, SF-6D and SWLS). Correlations between the ICECAP-A and other measures are considered strong if over '0.5', moderate if between '0.3' and '0.5' and weak if less than '0.3' (Cohen 1988).
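The correlation step and the interpretation bands quoted above can be illustrated as follows. This is a minimal sketch in plain Python, not the STATA code used for the study; `pearson_r` and `cohen_band` are hypothetical helper names, and the choice to treat values of exactly 0.3 and 0.5 as 'moderate' is an assumption based on the wording of the thresholds.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, n >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def cohen_band(r):
    """Label a correlation using the bands quoted in the text (Cohen 1988)."""
    size = abs(r)
    if size > 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"


# The range of coefficients reported in the abstract (r = .35 to .77).
print(cohen_band(0.35), cohen_band(0.77))
```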
Construct validity describes a broad range of measurement concerns; however, overall it concerns investigations into the relationships between scores on a measure and other variables with which it should theoretically be associated (Westen and Rosenthal 2003). A central component of construct validation is ongoing hypothesis testing concerning other questionnaire instruments or between sub-groupings of respondents (De Vet et al. 2011). In line with guidance, we investigated the ability of the ICECAP-A to differentiate between groups hypothesised to differ in their levels of the construct capability wellbeing (the healthy sub-sample, compared to respondents reporting one of the seven primary health conditions).
Construct validity was assessed using a two-stage process. Firstly, ordinary least squares (OLS) regression methods were used to test whether type of health condition (reporting the presence of one of seven primary health conditions compared to being healthy) had a significant association with capability scores, controlling for age, gender and educational level. Although previous research has employed exploratory factor analysis to validate ICECAP measures compared to other health measures (Davis et al. 2013; Keeley et al. 2016; Engel et al. 2017), our research question required a method to compare the performance of two versions of the ICECAP simultaneously. Previous research has demonstrated that OLS models perform well in health-related quality of life outcome studies (Al-Janabi et al. 2017), and in studies with capability wellbeing measures (Mitchell et al. 2013; Mitchell et al. 2017b; Franklin et al. 2018). Secondly, the sample was split into the seven health condition groups, and follow-up OLS analyses were conducted to examine the associations between health condition severity and capability wellbeing for each of the health conditions, controlling for demographic differences (age, gender and education). This analysis was undertaken to explore the association between severity and capability wellbeing within each of the health condition groups, rather than to provide a directly comparable indicator of severity between conditions.
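The first-stage model can be pictured as a regression of ICECAP-A scores on indicator (dummy) variables for each health condition, with healthy respondents as the reference group, plus the demographic controls. The sketch below illustrates that structure in plain Python by solving the normal equations. It is not the STATA code used in the study: the dummy coding, the binary coding of gender and education, and all function names are assumptions made for illustration, and the data are generated from invented coefficients rather than taken from the MIC dataset.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def design_row(condition, age, female, university, conditions):
    """Intercept, one dummy per condition ('healthy' is the reference), controls."""
    dummies = [1.0 if condition == c else 0.0 for c in conditions]
    return [1.0] + dummies + [float(age), float(female), float(university)]


# Invented respondents: (group, age, female, university).
conditions = ["asthma", "depression"]
people = [("healthy", 30, 0, 0), ("healthy", 50, 1, 1), ("asthma", 40, 1, 0),
          ("asthma", 60, 0, 1), ("depression", 25, 1, 1),
          ("depression", 45, 0, 0), ("healthy", 35, 1, 0),
          ("depression", 55, 1, 0)]
X = [design_row(g, a, f, u, conditions) for g, a, f, u in people]
# Invented 'true' coefficients used only to generate illustrative scores.
invented = [0.9, -0.05, -0.25, 0.001, 0.01, 0.02]
y = [sum(b * v for b, v in zip(invented, row)) for row in X]
print([round(b, 3) for b in ols(X, y)])
```

Under this coding, each condition coefficient is the adjusted difference in mean capability score between that condition group and healthy respondents, which is how the coefficients in the construct validity results are read. The second-stage models would replace the condition dummies with a single 0 to 1 severity score, fitted separately within each condition group.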

Sample Characteristics
Missing data were found for a very small proportion of the respondents (0.08%, 2/2503); therefore, data were only analysed for complete cases (N = 2501). The descriptive characteristics of this sample are presented in Table 1. The German sample (N = 1212) and UK sample (N = 1289) are further split into healthy and health condition group sub-samples. The youngest respondents were found in the depression sub-sample and the oldest respondents were found in the heart disease sub-sample across the German and UK samples. Similarly, the depression sub-sample contained the most female respondents while the heart disease sub-sample contained the most males in both the German and UK samples. The cancer sub-sample had the highest level of university attendance in the German sample, while people with asthma had the highest educational level in the UK data. Finally, across Germany and the UK, ICECAP-A scores were highest amongst healthy respondents and lowest amongst respondents with depression.
The characteristics of respondents who scored at different levels of the five ICECAP-A attributes are described in Table 2 (attributes refer to each of the domains within the ICECAP-A; the four levels within each attribute range from having full capability (4) to having no capability (1)). In the German sample, there were comparatively higher scores for achievement, while in the UK sample, over 50% of the sample indicated that they experienced the highest level of autonomy. Levels of life satisfaction (SWLS) and health-related quality of life (EQ-5D-5L) were also examined for participants scoring at different levels of the ICECAP-A attributes. In Germany and the UK, the highest mean SWLS scores were found for respondents with the highest levels of 'Stability'. Similarly, in both samples, the highest mean EQ-5D-5L scores were found in respondents with the highest levels of 'Achievement'.

Internal Consistency Reliability
Cronbach's alpha was calculated separately for the German and UK samples and is presented in Table 3. The German translation of the ICECAP-A demonstrated good internal consistency (α = .83) and was comparable with the internal consistency of the scale in the UK data (α = .85). When internal consistency was assessed separately for each health condition group, the scale was found to be similarly reliable (Germany: α = .74-.86, UK: α = .78-.86).

Convergent Validity
Correlation coefficients highlighting the relationships between the ICECAP-A and three related tools are presented in Table 4 (sample sizes are stated in Table 1). All of the correlation coefficients were statistically significant (p < .001) and positive in direction, meaning that higher capability scores accompanied higher levels of health-related quality of life (EQ-5D-5L and SF-6D) and higher levels of life satisfaction (SWLS). In the German sample, all of the correlation coefficients were moderate to large in size (r ≥ .3). These results demonstrate that scores on the German translation of the ICECAP-A converge with scores on measures of health-related quality of life and wellbeing, and that this convergence is generalizable across sub-populations. These results were comparable to the results yielded with the English language version of the ICECAP-A in the UK sample. One notable difference in results across study populations was that capabilities (ICECAP-A) were more related to life satisfaction (SWLS) than to health-related quality of life (EQ-5D-5L and SF-6D) among healthy respondents in the German and UK study samples.
Correlation coefficients between the ICECAP-A attributes and the dimensions within these three outcome measures for the overall samples in Germany and the UK are presented in Table 5. Correlations with the SWLS were not calculated at item level, as the five questions in this scale measure a single dimension (life satisfaction). In both countries, SWLS scores had good correlation with most ICECAP-A attributes, except for the autonomy attribute. The anxiety/depression dimension of the EQ-5D-5L had good correlation with all ICECAP-A attributes; only the autonomy attribute correlated more highly with another EQ-5D-5L dimension (usual activities), and this was the case in both countries. For the SF-36, the dimensions of vitality, social functioning and mental health also had good correlations across all ICECAP-A attributes in both countries.

Construct Validity
The effects of experiencing one of the seven health conditions (compared to being in the healthy sub-sample) on capability wellbeing are presented in Table 6. The coefficients from the OLS regression indicate that in the German sample, the presence of any of the seven health conditions had a significantly negative impact on capability scores, controlling for socio-demographic characteristics. Depression (β = −.267, p < .001), followed by cancer (β = −.119, p < .001) and heart disease (β = −.107, p < .001), had the strongest effects on capabilities, while the smallest effect was for the presence of asthma (β = −.041, p < .05). The seven health conditions were also associated with significantly lower levels of capability wellbeing in the UK sample, in which depression also had the strongest detrimental effect (β = −.274, p < .001). The R² statistics indicate that the demographics and health conditions accounted for approximately a fifth of the variance in ICECAP-A scores (Germany = 21% and UK = 18%). Finally, the sample was split by health condition group to determine the extent to which condition-specific severity had a negative association with ICECAP-A scores within the seven health condition groups. The OLS regression coefficients for health condition severity in each of the separate health condition groups are presented in Table 7 (full descriptions of the individual regression analyses are given in Appendices 1 and 2). In each of the German health condition groups, health condition severity had a significant negative impact on capability wellbeing, most notably for heart disease (β = −.673, p < .001), arthritis (β = −.556, p < .001) and depression (β = −.499, p < .001). A similar pattern of results was observed for the UK health condition groups.

Summary of Main Findings
This study set out to determine whether the German translation of the ICECAP-A is psychometrically valid, and capable of yielding results comparable to the previously validated original UK version of the tool. The German ICECAP-A demonstrated good internal consistency and good convergent validity when compared to existing measures of health-related quality of life and satisfaction with life. Further, scores on the newly translated version of the tool were significantly lower amongst respondents reporting one of the seven studied health conditions (compared to being healthy). Finally, the results yielded by the German translation of the ICECAP-A used in the German sample were largely comparable to the results yielded by the original English language version of the tool used in the UK.

Interpretation of Findings
The findings from this study contribute to the growing literature concerning the psychometric validity of ICECAP measures across cultural contexts. The ICECAP-A also demonstrated convergent validity with the SWLS and SF-6D, suggesting that it is able to capture elements of a person's health-related quality of life, but also wider aspects of subjective wellbeing. This work also extends existing cross-cultural (Australia, Canada, UK and US) insight on the detrimental impact of depression on capabilities to a German context. Meeting the challenges caused by mental health difficulties in Europe has become a widely recognised policy priority (World Health Organization 2008).

Strengths and Limitations
The key strength of this study is its utilisation of data collected from participants reporting a variety of health conditions, from multiple country contexts. It represents the first study to validate a translated ICECAP measure across countries compared to the original version. Further, evidence is presented for multiple psychometric properties, indicating a comprehensive description of both validity and reliability. There are, however, also some limitations. In particular, the pragmatic nature of the study meant that formal back-translation of the measure was not undertaken until after the data collection was conducted. A further caveat is that, due to the use of cross-sectional data, further research is needed to establish the causality of the significant relationships identified. It should also be noted that there were some sociodemographic differences between the participants in the German and UK samples. Reliance on a UK value set to demonstrate the validity of the German translation is also not ideal, but necessary given that no value set for the ICECAP-A currently exists outside the country of origin (Flynn et al. 2015). In addition, there is no "gold-standard" for the measurement of capability wellbeing, with the ICECAP-A representing one of the first attempts to directly measure self-reported capability (Mitchell et al. 2017a). Validity assessment in this study relies on common measures of health status currently used in health economics and a measure of subjective wellbeing. Finally, condition severity is measured using condition-specific measures in each health condition group; therefore, the follow-up OLS results highlight the importance of condition severity within conditions, rather than differences in the importance of severity between conditions.

Implications
This study demonstrates that the German translation of the ICECAP-A performs similarly to the original English version of the tool and is able to collect reliable and valid data when applied in a German setting. This study therefore provides initial evidence that the ICECAP-A capability measure may be appropriate to be used in countries outside where the measure was developed. Specifically, this paper provides evidence for German decision makers to recommend the use of the ICECAP-A in economic evaluations, as has been previously done for the ICECAP-O in the Netherlands for evaluations of long-term conditions (Versteegh et al. 2016).

Future Research
Although this study presents evidence of the validity of the German translated ICECAP-A, there are a number of questions that remain unaddressed in the use of translated capability measures. Two main areas that need to be addressed concern the appropriateness of the ICECAP-A descriptive and valuation system outside of the UK. This question is not only aimed at countries where English is not the first language, as both the specific capabilities that are important and how they are valued could vary in both English and non-English speaking settings.
In terms of further validation of the German translated ICECAP-A, ongoing research is needed to establish whether the German translation of the ICECAP-A performs well when tested for additional psychometric properties. For example, a longitudinal study design would enable researchers to explore whether scores are stable across reasonable durations of time (test re-test reliability) and whether scores are sensitive to changes in life circumstances and health status (responsiveness).

Conclusion
This is the first paper to provide psychometric evidence that simultaneously compares a translated ICECAP instrument in its country of use (Germany) with the original English language version of the instrument in the UK. Although the current study is based on cross-sectional data, the results indicate that the German translation of the ICECAP-A yields valid and reliable data and is comparable to the original English language version of the tool. This study adds to the growing body of literature on the psychometric properties of the ICECAP-A, whilst also extending our understanding of the relationship between condition severity and capability wellbeing.

Data Availability
The data used in this study were part of a larger Multi Instrument Comparison survey undertaken by the Centre for Health Economics at Monash University. They are available without charge and universally accessible online via the AQoL website (www.aqol.com.au). From the homepage there are instructions for accessing the data.

Compliance with Ethical Standards
Ethical Approval and Consent to Participate Ethics approval was obtained from Monash University Human Research Ethics Committee (MUHREC Approval CF 11/1758). At the start of the survey, a Participant Information and Consent form was provided. Proceeding with the survey was deemed as consent.
Conflict of Interest HA and JC developed the ICECAP-A measure. JR and AI created the MIC database. ML, PM, MS and JU have no competing interests to declare.

Appendix 1
OLS regression analyses investigating the impact of disease severity on capabilities (using the German translation of the ICECAP-A in Germany) in six patient populations

Appendix 2
OLS regression analyses investigating the impact of disease severity on capabilities (using the ICECAP-A in the UK) in six patient populations

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.