Background

In the UK, the government faces an increasing challenge in meeting growing demands on the healthcare system. Despite increased public expectations of treatment availability, an ageing population and higher levels of chronic disease, the government is aiming to achieve efficiency savings of £20 billion in the National Health Service’s (NHS) budget by 2014[1]. Savings are to be made by focusing on quality, innovation, productivity and prevention. Treatments offered on the NHS must be both clinically effective and cost-effective, as assessed by the National Institute for Health and Clinical Excellence (NICE). The NICE guide to technology appraisal[2] states that cost-effectiveness should be reported in terms of Quality Adjusted Life Years (QALYs), a measure combining length of life with quality of life (QoL). The choice of instrument used to measure QoL is therefore important, as the resulting QALY calculations determine whether a treatment is considered cost-effective and hence potentially funded. While NICE cost-effectiveness funding thresholds apply only to the UK, the methodological issue of measuring and valuing carer benefits has international relevance.
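As a minimal illustration of how a QALY combines length of life with quality of life, the sketch below sums utility multiplied by the time spent in each period. The utility values and durations are hypothetical, and discounting, which a full economic evaluation would normally apply, is omitted for simplicity.

```python
def qalys(periods):
    """Return QALYs for a list of (utility, years) periods.

    Utility is on the QALY scale (1 = full health, 0 = dead) and years is the
    time spent at that utility. No discounting is applied in this sketch.
    """
    return sum(utility * years for utility, years in periods)


# Hypothetical two-year carer utility profiles (illustrative values only)
usual_care = [(0.70, 1.0), (0.65, 1.0)]
intervention = [(0.75, 1.0), (0.72, 1.0)]

gain = qalys(intervention) - qalys(usual_care)
print(f"Usual care: {qalys(usual_care):.2f} QALYs")
print(f"Intervention: {qalys(intervention):.2f} QALYs")
print(f"Incremental QALY gain: {gain:.2f}")
```

The incremental QALY gain computed this way is the effect term that enters a cost-utility comparison.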

Dementia places a large burden on the economy, with costs incurred by the health care sector, the social care sector and informal carers[3]. The largest proportion of the cost (55%) is borne by informal carers looking after a friend or relative, which is indicative of the burden they face. Carer burden can predict institutionalisation of the person with dementia[4, 5]; evidence on effective methods of supporting carers in their role is therefore needed to delay institutionalisation and its associated costs. Burden can affect carers’ QoL through reduced mental well-being caused by stress and worry, and through the opportunity cost of reduced time for leisure activities and self-care[6].

The need to use appropriate outcome measures in health economics research has been recognised[7–9]. Interventions involving people with dementia and their carers may be complex, with multiple objectives; it is therefore necessary to consider multiple outcome measures, since focusing on one attribute, such as QoL, may lead to other benefits being overlooked. Moniz-Cook et al.[10] argued that a more cohesive approach to outcome measurement in dementia care research would lead to a more robust evidence base. Health economists require preference-based utility measures to calculate QALYs. However, restricting benefit measurement to health-related outcomes in carer research places a patient identity on the carer, which may not be appropriate[11]. This article aims to address the question ‘which outcome measures are used most frequently in interventions involving carers of people with dementia, and how useful are these measures for economic evaluation?’

Methods

A systematic literature search of electronic databases was conducted on 1 March 2012, using the PRISMA reporting principles as guidance[12]. PubMed (1946–2012), Medline (1950–2012), the Cumulative Index to Nursing and Allied Health Literature (CINAHL) (1981–2012), PsycINFO (1806–2012), and the NHS Centre for Reviews and Dissemination (containing the National Health Service Economic Evaluation Database (NHS EED), the Database of Abstracts of Reviews of Effects (DARE) and the Health Technology Assessment database) (1960–2012) were searched. Titles, keywords and abstracts were searched for the terms caregiver, randomized controlled trials, and dementia or Alzheimer’s disease, using MeSH terms where possible. The search strategy for each database is presented in Additional file 1: Appendix 1. No restriction on publication year was set. Study eligibility was based on initial screening of titles and abstracts, and articles passing initial screening were retrieved for further review.

Studies were considered if they reported an intervention with outcome measures for carers of people with dementia. Carers could be paid workers or informal carers, such as friends and family members. We included outcomes for paid carers to get a broader indication of which aspects of health and social care provision are typically measured. No gender, age or nationality restrictions were applied. The person being cared for could be living in residential care, a medical facility or the community.

Carer outcome measures were extracted and categorised. The categories used by Moniz-Cook et al.[10] served as a starting point: burden; mood; quality of life; and staff competency and morale. Two further categories were developed after reviewing the data: mastery; and social support and relationships.

Results

In total, 2262 records were retrieved, and 2093 articles remained after duplicates were removed (Figure 1). After screening of titles and abstracts, 1638 articles were excluded. Reasons for exclusion were: no carer outcome measure (764 articles), a population not consisting of dementia carers (352 articles), commentary articles or clinical practice guidelines (267 articles), and systematic review articles (255 articles). The remaining 455 articles reported on 361 studies, from which 228 outcome measures were extracted. A full list of the extracted outcome measures, the number of studies in which each appeared, and the earliest and most recent year of use is given in Additional file 2: Appendix 2. Table 1 presents key properties of outcome measures appearing in four or more studies (1% of included studies). Table 2 shows how the composition of carer outcome measures has changed over the years.
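The screening counts reported above can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only figures stated in the text; the number of duplicates (169) is derived from them rather than reported.

```python
records_retrieved = 2262
after_deduplication = 2093
included_studies = 361

# Exclusions at title/abstract screening, as reported in the text
exclusions = {
    "no carer outcome measure": 764,
    "population not dementia carers": 352,
    "commentary or clinical practice guideline": 267,
    "systematic review": 255,
}

duplicates = records_retrieved - after_deduplication
excluded = sum(exclusions.values())
remaining_articles = after_deduplication - excluded
table1_share = 4 / included_studies  # Table 1 cut-off: four or more studies

print(f"Duplicates removed: {duplicates}")
print(f"Excluded at screening: {excluded}")
print(f"Articles remaining: {remaining_articles}")
print(f"Four studies as a share of included studies: {table1_share:.1%}")
```

The four exclusion reasons sum exactly to the 1638 excluded articles, so each excluded article was assigned a single reason.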

Figure 1 Flow of articles retrieved through electronic searches.

Table 1 Properties of the most frequently used outcome measures
Table 2 Composition of outcome measures across the years

Burden measures

The 44 measures in this category covered burden, stress and strain. Burden was the second most frequently used category of measure in dementia carer research. The Zarit Burden Interview (ZBI)[13] was the most popular, appearing in 76 studies (21%). The ZBI is dementia specific; it was originally a 29-item instrument but is also available as a shorter 12-item version[14]. Its domains cover physical health, psychological well-being, finances, social life, and the relationship with the person with dementia. The earliest retrieved paper that included the ZBI was published in 1994, and the ZBI remains in use. The Revised Memory and Behavior Problem Checklist (RMBPC)[15] was the second most popular measure, appearing in 44 studies (12%). It is also dementia specific and contains 24 items adapted from the Memory and Behavior Problem Checklist (MBPC)[16], which assesses the frequency and severity of problems exhibited by a person with dementia and the carer’s reaction to these problems. Like the ZBI, the RMBPC has been in use since 1994 and is still used today.
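The percentages quoted here are shares of the 361 included studies. A minimal helper, assuming simple rounding to whole percentages, reproduces them:

```python
INCLUDED_STUDIES = 361


def study_share(n_studies: int, total: int = INCLUDED_STUDIES) -> float:
    """Percentage of included studies that used a given measure."""
    if not 0 <= n_studies <= total:
        raise ValueError("n_studies must lie between 0 and total")
    return 100 * n_studies / total


# Study counts taken from the text
for measure, count in [("ZBI", 76), ("RMBPC", 44)]:
    print(f"{measure}: {count} studies ({study_share(count):.0f}%)")
```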

Mastery

Forty-three measures covering the family carer’s coping, self-efficacy and competence were extracted. As can be seen in Table 2, mastery measures were used infrequently in earlier studies, but they currently account for 17% of the outcome measures included in dementia carer research. The Sense of Competence Questionnaire (SCQ)[17] was the most popular, appearing in 12 studies (3%) since 2000. The SCQ was developed to measure the ability of carers to cope with looking after people with dementia living at home, and covers three domains: satisfaction with the person receiving care, satisfaction with one’s own performance as a carer, and the impact of caring on the personal life of the carer.

Mood

Mood measures were used most frequently and currently account for almost one third of the dementia carer measures included. Sixty-one mood measures covering anxiety, depression, sleep and general mental well-being were extracted. The Center for Epidemiologic Studies Depression Scale (CES-D)[18] was the most frequently used, appearing from 1989 onwards, followed by the General Health Questionnaire (GHQ)[19] and the Neuropsychiatric Inventory-Distress (NPI-D)[20]. The NPI-D primarily assesses the frequency and severity of behavioural disturbances in people with dementia, but also asks carers to rate their own reaction to these behaviours. It is one of the more recently developed mood measures, first appearing in 2000. The next most popular measures were the Geriatric Depression Scale[21], developed for use in an elderly population, the Beck Depression Inventory (BDI)[22] and the Neuropsychiatric Inventory Questionnaire (NPI-Q)[23], a version of the NPI-D suitable for use in clinical settings, which has appeared in publications from 2006 onwards.

Quality of life measures

Thirty-two QoL measures were identified. Although use of QoL measures has increased over the years, only 16% of included outcome measures currently assess QoL. Four outcome measures were used most frequently: the Short Form-36 (SF-36)[24], the EuroQol (EQ-5D)[25], the World Health Organization Quality of Life-brief (WHOQOL-BREF)[26] and the Health Utilities Index (HUI)[27]. The SF-36 and EQ-5D have appeared in publications from 2001 onwards, while the WHOQOL-BREF is more recent, appearing from 2007 onwards.

The SF-36 evolved from the RAND Health Insurance Experiment, a 15-year study of American health policy, and the Medical Outcomes Study of patients with chronic illnesses[24]. The SF-6D was subsequently developed, enabling preference-based utility scores and QALYs to be calculated from the SF-36 or SF-12[28, 29]. Although it is possible to administer the SF-6D directly in a study, its developers recommend using the SF-36 or SF-12 and then translating the results into the SF-6D. The six domains of the SF-6D are physical functioning, role limitation, social functioning, pain, mental health and vitality.

The EQ-5D was developed in Europe and consists of a questionnaire (the EQ-5D descriptive system) and a visual analogue scale (EQ-VAS). The questionnaire comprises five domains: mobility; self-care; usual activities; pain and discomfort; and anxiety and depression. A scoring algorithm converts responses into an index score, which can be used to calculate QALYs. On the EQ-VAS, respondents are presented with a thermometer-like scale whose end points represent the worst and best imaginable health states, and are asked to draw a line to the point that best describes their own health. While the EQ-5D index is preference based, the EQ-VAS is not.
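To show what a scoring algorithm of this kind does, the sketch below scores the three-level EQ-5D (EQ-5D-3L, the version with 243 health states) using an additive decrement model. The coefficients are those commonly reported for the UK time trade-off value set (Dolan, 1997); they are given here as an assumption and should be checked against the official EuroQol value set before any real use.

```python
# Decrements commonly reported for the UK EQ-5D-3L TTO value set (Dolan, 1997).
# Assumption: check every coefficient against the official EuroQol value set.
CONSTANT = 0.081  # subtracted for any state other than full health (11111)
N3 = 0.269        # subtracted once if any dimension is at level 3
DECREMENTS = {
    "mobility": {2: 0.069, 3: 0.314},
    "self_care": {2: 0.104, 3: 0.214},
    "usual_activities": {2: 0.036, 3: 0.094},
    "pain_discomfort": {2: 0.123, 3: 0.386},
    "anxiety_depression": {2: 0.071, 3: 0.236},
}


def eq5d_3l_index(state: str) -> float:
    """Convert a five-digit EQ-5D-3L state (e.g. "21123") into an index score."""
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError("state must be five digits, each 1, 2 or 3")
    levels = [int(ch) for ch in state]
    if levels == [1, 1, 1, 1, 1]:
        return 1.0  # full health is the upper anchor of the scale
    index = 1.0 - CONSTANT
    # Dictionary order matches the order of digits in the state string
    for dimension, level in zip(DECREMENTS, levels):
        if level > 1:
            index -= DECREMENTS[dimension][level]
    if 3 in levels:
        index -= N3
    return round(index, 3)


for state in ["11111", "11112", "21123", "33333"]:
    print(state, eq5d_3l_index(state))
```

Because the worst state receives a negative index, this value set allows states to be rated worse than dead, which the EQ-VAS does not represent.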

The WHOQOL-BREF is derived from the WHOQOL-100, an instrument developed by a global research team and intended to be applicable cross-culturally[26]. The domains of the WHOQOL-BREF can be broadly categorised into physical health, psychological well-being, social relationships and the environment. Preference-based utility scores are not available for either the WHOQOL-BREF or the WHOQOL-100.

The Health Utilities Index has two main versions: the HUI2, originally developed for use with children, and the HUI3, which is more commonly used with adults. The HUI3 has eight domains: vision, hearing, speech, ambulation, dexterity, emotion, cognition and pain[27].

Social support and relationships

The earliest published use of a social support or relationship measure was in 1999. Twenty-seven measures were identified in this category. Only the Social Support Questionnaire[30] and the Stokes Social Support Network List[31] were used consistently, and neither was developed for dementia carers. The Social Support Questionnaire assesses the respondent’s perceived number of social support contacts and their satisfaction with the social support available. The Stokes Social Support Network List asks respondents to list the people they have regular contact with and whether or not each is a relative; the size and composition of the respondent’s social network are then determined. The Stokes Social Support Network List is a recent measure, appearing in publications dated 2006–2010.
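As a rough illustration of how network size and composition can be derived from such a list, the sketch below tallies a hypothetical respondent’s contacts. The `Contact` record and the two summary statistics are simplifications assumed for illustration, not the published scoring rules of the Stokes Social Support Network List.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """One person listed by the respondent (hypothetical record format)."""
    name: str
    is_relative: bool


def network_summary(contacts: list) -> dict:
    """Network size, number of relatives and proportion of relatives."""
    size = len(contacts)
    relatives = sum(1 for c in contacts if c.is_relative)
    proportion = relatives / size if size else 0.0
    return {"size": size, "relatives": relatives, "proportion_relatives": proportion}


# Hypothetical respondent
listed = [
    Contact("daughter", True),
    Contact("neighbour", False),
    Contact("brother", True),
    Contact("friend from church", False),
]
print(network_summary(listed))
```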

Staff competency and morale

Staff competency and morale measures were included from 1994 onwards. Twenty-one measures were identified, of which only two were used in four or more studies: the Maslach Burnout Inventory[32] and the Approaches to Dementia Questionnaire[33]. Burnout is described as the emotional exhaustion and cynicism experienced by staff in people-facing roles[32]; its consequences include low quality of care, low morale and high staff turnover. The Approaches to Dementia Questionnaire assesses the carer’s attitudes towards the care recipient.

Discussion

The key to selecting appropriate outcome measures is defining what an intervention targets, and therefore what a measure must be able to capture. As can be seen in Table 2, the composition of measures included in dementia carer research has changed over time. In earlier years, mood measures were the most prevalent; while this is still true of current research, the gap between the use of mood and burden measures has narrowed. Measures capturing social support and relationships are also more commonly used now.

Whichever instrument is used, NICE prefers results to be converted into QALYs to allow comparisons across different illnesses and interventions[2]. To satisfy QALY methodology, quality weights must be based on preferences and anchored on an interval scale with full health (1) and death (0) as reference points[34]. Preference-based generic instruments, such as the EQ-5D, are preferred; however, ‘when EQ-5D utility data are not available, direct valuations of descriptions of health states based on standardised and validated HRQL measures included in the relevant clinical trial(s) may be submitted. In these cases, the valuation of descriptions should use the time trade off method in a representative sample of the UK population, with “full health” as the upper anchor, to retain methodological consistency with the methods used to value the EQ-5D’[2]. Validity of the selected instrument is important for results to be meaningful; the most popular measures in the QoL category have been validated with members of the general population.
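For a health state judged better than dead, the time trade-off method asks a respondent to find the number of years in full health, x, that they regard as equivalent to a longer period, t, in the health state; the value of the state is then x/t on the scale anchored at full health (1) and death (0). A minimal sketch, using hypothetical responses and leaving out the separate procedure needed for states judged worse than dead:

```python
from statistics import mean


def tto_value(years_full_health: float, years_in_state: float) -> float:
    """Time trade-off value x / t for a state judged better than dead."""
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("years_full_health must lie between 0 and years_in_state")
    return years_full_health / years_in_state


# Hypothetical indifference points from five respondents, each offered t = 10 years
responses = [7.0, 8.5, 6.0, 9.0, 7.5]
values = [tto_value(x, 10.0) for x in responses]
print(f"Mean TTO value: {mean(values):.2f}")
```

A respondent who will give up no time has valued the state as equivalent to full health (1), while one who will give up all of it has valued it as equivalent to death (0).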

Carer and patient QALYs are rarely aggregated; however, one trial of befriending for carers of people with dementia reported the incremental cost-effectiveness ratio (ICER), with QALYs calculated from the EQ-5D, both for the carer alone and for the carer and person with dementia combined[35]. The intervention was not cost-effective when the ICER was calculated for the carer alone, but became cost-effective when spill-over effects on the person with dementia were incorporated. Aggregation of QALYs therefore needs to be undertaken cautiously, and the information used to calculate the resulting ICERs should be stated explicitly to allow comparison with interventions for which QALYs have not been aggregated.
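The effect of aggregation on an ICER can be seen with hypothetical numbers. In the sketch below, all figures are invented for illustration and are not taken from the befriending trial[35]: the intervention costs £1,000 more per carer and yields 0.02 QALYs for the carer and 0.04 QALYs for the person with dementia. The £20,000 per QALY threshold is an assumption, chosen as the lower end of the range NICE has typically applied.

```python
def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when there is no QALY difference")
    return delta_cost / delta_qaly


THRESHOLD = 20_000  # £ per QALY, assumed for illustration

delta_cost = 1_000.0      # hypothetical incremental cost per carer (£)
carer_qaly_gain = 0.02    # hypothetical carer QALY gain
patient_qaly_gain = 0.04  # hypothetical spill-over gain for the person with dementia

scenarios = {
    "carer only": icer(delta_cost, carer_qaly_gain),
    "carer + person with dementia": icer(delta_cost, carer_qaly_gain + patient_qaly_gain),
}
for label, ratio in scenarios.items():
    verdict = "below" if ratio <= THRESHOLD else "above"
    print(f"{label}: £{ratio:,.0f} per QALY ({verdict} threshold)")
```

With these numbers the carer-only ICER is £50,000 per QALY, above the assumed threshold, whereas the aggregated ICER is about £16,667, below it. The same cost appears in both ratios, so the conclusion turns entirely on which QALYs are counted, which is why the basis of any aggregated ICER needs to be reported.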

Of the most popular instruments in the QoL category, only the EQ-5D weights were derived using the time trade-off method. The SF-6D and HUI3 were valued using visual analogue scale and standard gamble methods, and the WHOQOL-BREF does not have preference-based scoring. Three possible explanations for differences in health state valuations between measures have been put forward: coverage of the descriptive systems, sensitivity of the dimensions, and the valuation methods used[36]. Instruments that describe more health states can detect smaller changes in health status and are more appropriate for research in which small health gains are expected[37], such as research involving carers. The HUI3 can describe 972,000 health states and the SF-6D either 7,500 (SF-12 version) or 18,000 (SF-36 version), whereas the EQ-5D describes only 243. The EQ-5D is known to show a ‘ceiling effect’, with responses clustering at the best health states. In contrast, the SF-6D appears to have a ‘floor effect’, with responses clustered at the lower end of the scale. The floor effect is amplified in population groups with more physical health problems, so it may be less of an issue in research with carers of people with dementia: although many carers do have health problems, it is reasonable to assume that they have sufficient physical health to cope with the physical aspects of caring.
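The number of health states an instrument can describe is the product of the number of response levels in each of its dimensions. The sketch below reproduces the figures quoted above; the per-dimension level counts are taken from the instruments’ published classification systems as an assumption and should be checked against each instrument’s documentation.

```python
from math import prod

# Number of response levels per dimension (assumed from published classifications)
DESCRIPTIVE_SYSTEMS = {
    # mobility, self-care, usual activities, pain/discomfort, anxiety/depression
    "EQ-5D (three-level)": [3, 3, 3, 3, 3],
    # physical functioning, role limitation, social functioning, pain,
    # mental health, vitality
    "SF-6D (SF-12 version)": [3, 4, 5, 5, 5, 5],
    "SF-6D (SF-36 version)": [6, 4, 5, 6, 5, 5],
    # vision, hearing, speech, ambulation, dexterity, emotion, cognition, pain
    "HUI3": [6, 6, 5, 6, 6, 5, 6, 5],
}

state_counts = {name: prod(levels) for name, levels in DESCRIPTIVE_SYSTEMS.items()}
for name, count in state_counts.items():
    print(f"{name}: {count:,} health states")
```

Because the count grows multiplicatively, adding one level to a single dimension enlarges the descriptive system far more than its modest change in wording might suggest.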

The World Health Organization defines health as ‘a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity’, a definition unchanged since 1948[38]. Seven determinants of health have also been suggested: income and social status, education, physical environment, social support networks, genetics, health services and gender[39]. This reinforces the need to go beyond measuring physical health and to consider other attributes affecting QoL. This is particularly relevant for dementia carers, as interventions for them primarily aim to relieve burden rather than to improve physical health.

While the EQ-5D covers physical domains well, it has only one question on mental well-being. Because physical domains dominate, it is not particularly sensitive to changes in carers of people with dementia, whose physical health may not change over time even though their QoL is affected. Al-Janabi et al.[11] raised this issue, arguing that measuring health-related outcomes for carers places a ‘patient’ identity on them. In a cross-sectional study in which carers of people with dementia completed the HUI2, Neumann et al.[40] found that the stage of Alzheimer’s disease was a negative predictor of patient utility (as reported by carers completing the HUI2 as proxies); however, the utility that carers reported for themselves was insensitive to the stage of the care recipient’s dementia. For research involving carers of people with dementia, it may therefore be necessary to include additional outcome measures alongside a generic primary outcome measure for cost-effectiveness analysis.

Disease-specific instruments have been found to be better than generic instruments at detecting QoL changes[41]. Their main advantage is that they are sensitive to changes associated with the disease in question, so studies using them may require smaller sample sizes. A disadvantage is that co-morbidities may be overlooked: by focusing on QoL changes associated with one particular illness, separate health issues are ignored. As people with dementia and their carers tend to be older, co-morbidities and side effects are particularly relevant. Disease-specific instruments are also typically focused on the person with the illness, so a population-specific measure may be more appropriate for carers. Population-specific measures cover a broader range of domains than disease-specific instruments, with the additional benefit of being more sensitive than a generic instrument. This review found that the most popular instruments in the burden category were developed specifically to measure burden in dementia carers, combining disease-specific with population-specific domains.
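The link between sensitivity and sample size can be made concrete with the standard normal-approximation formula for comparing two group means, n per arm = 2(z₁₋α/₂ + z₁₋β)² / d², where d is the standardised effect size. The sketch below assumes, for illustration only, that a more sensitive instrument registers the same intervention as a larger standardised effect; the effect sizes are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm for a two-sided, two-sample comparison of means."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical standardised effects of one intervention on two instruments
for instrument, d in [("less sensitive generic measure", 0.2),
                      ("more sensitive specific measure", 0.5)]:
    print(f"{instrument} (d = {d}): {n_per_arm(d)} per arm")
```

Since n scales with 1/d², raising the detectable standardised effect from 0.2 to 0.5 cuts the required sample by a factor of about six.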

This review found 29 studies that included details of costs; however, most of these were only partial economic evaluations providing cost-outcome descriptions. Where cost-effectiveness analyses had been performed, the unit of effect was typically time, e.g. cost per additional year that the person with dementia lived at home, or cost per hour reduction in daily time spent on care tasks. Cost-utility analysis was included in three studies[35, 42, 43], using the EQ-5D, the HUI2 and the Caregiver Quality of Life Instrument as outcome measures; all three are suitable for QALY calculations. The study that conducted a cost-utility analysis using the HUI2[42] aggregated carer and patient QALYs, which, as discussed above, is rarely done and departs from traditional QALY methodology. Nine of the studies listing costs were protocols: seven planned to conduct cost-utility analysis using the EQ-5D, and two planned to use the SF-12 or SF-36.

Overall, burden and mood measures were the most frequently used. The earliest article retrieved was published in 1987 and included four mood measures and one QoL measure. Measures in the mood category covered a broad range of symptoms, including overall mental well-being, anxiety, depression and sleep quality. A variety of social support measures were used, and the two most frequently used were not specific to dementia carers. Social support measures have grown in popularity but are still used less frequently than burden, mastery, mood or QoL measures. The least frequently used category was staff competency and morale. A large number of unspecified measures were found, mainly because poor reporting of study methods prevented the authors of this review from identifying the measure used. The increased use of reporting guidelines such as CONSORT[44] has improved the quality of trial reporting in recent years.

Future directions

The ICECAP index of capability has been developed to measure attributes of QoL rather than influences on QoL, such as health[45]. The underlying theory is that QoL does not decrease because of poorer health itself, but because of the limitations on what one can do as a result of poor health; that is, individuals value the activities they can undertake rather than health itself. In this sense, instruments such as the EQ-5D are only proxy measures of QoL rather than direct measures[46]. Two versions of the ICECAP are available: the ICECAP-O, suitable for people aged 65 and over, and the ICECAP-A, suitable for adults aged 18 and over. The domains of the ICECAP-O are love and friendship; thinking about the future; doing things that make you feel valued; enjoyment and pleasure; and independence. These domains were developed to measure capability in older members of the general population[47] and overlap to some degree with the categories of burden, mastery, mood, quality of life, and social support and relationships. The domains of the ICECAP-A are similar: security; love and friendship; independence; achievement; and enjoyment and pleasure. An algorithm to convert ICECAP scores into QALYs is not yet available. One way around this is to map ICECAP scores onto EQ-5D scores, although constructing the data set needed for a valid mapping would require considerable time and financial resources.

The capability framework has also led to the development of the Adult Social Care Outcomes Toolkit (ASCOT)[48], an instrument that measures social care-related QoL. Although the ASCOT does not specifically measure carer well-being, it is a step towards acknowledging the importance of the care environment in which a person lives. Its domains include: control over daily life; personal cleanliness and comfort; food and drink; personal safety; social participation and involvement; occupation; accommodation cleanliness and comfort; and dignity. While these domains are similar to those of the ICECAP, the ASCOT has the advantage of being a preference-based measure, with scoring that reflects the preferences of the general population[49].

Conclusion

Few studies currently incorporate economic evaluations alongside clinical trials as routine practice. The choice of instrument used to measure QoL has implications for whether a treatment is considered cost-effective and potentially funded. Health economists need to choose instruments appropriate for the population and the expected outcomes. Researchers also need to consider ease of administration and clarity of instruments to ensure that as many participants as possible complete the questionnaires. Carers of people with dementia already have limited time, so overburdening them with lengthy questionnaires should be avoided. If an instrument is not sensitive enough to detect changes in the QoL of carers of people with dementia, the effect of an intervention that participants experience as beneficial may be underestimated. Use of capability-based instruments such as the ICECAP and ASCOT should enable decision-makers to compare the value of health and social services that may improve an individual’s QoL without necessarily improving their health with that of interventions that affect both health and QoL[47].