Key Points for Decision Makers

• Methods for resource-use measurement based on patient recall have been relatively neglected compared with outcome measurement; a workshop discussing the issues highlighted the importance of addressing this imbalance.

• Topics identified for a future research agenda included measurement of total versus condition-related resource use, the implications of a lack of gold standard, validation issues, recall periods, patient burden and the standardisation of resource-use measures.

• We believe that costing methodologies must be afforded the same consideration as is devoted to outcome measures.

1 Introduction

1.1 Background

Resource-use measurement is one of the most challenging parts of an economic evaluation to get right, and is correspondingly time consuming. Despite this, in comparison with outcomes research and other areas of health economics, resource-use measurement is the subject of very limited research efforts. In 1999 a review of methods for ‘assessing the costs of healthcare technologies in clinical trials’ suggested that funders should establish an archive of data collection instruments and referred to the need to determine appropriate recall periods, develop standard questionnaires and determine validity and reliability of questionnaires [1]. Thirteen years later, there has been some progress on all the issues raised, but there remains considerable room for improvement.

Despite the opportunities afforded by increasing access to electronically stored routine healthcare data in the UK, patient-reported resource-use measures (RUMs) are likely to be necessary in economic evaluation for the foreseeable future, and it is therefore important to ensure that they are of high quality. RUMs are developed in many different formats, and might include questionnaires, diaries or interviews, administered by post, in person or by telephone. Currently, researchers frequently duplicate effort by independently reinventing RUMs. Funded by a grant from the Medical Research Council (MRC) Network of Hubs for Trials Methodology Research, the Database of Instruments for Resource-Use Measurement (DIRUM) was created at http://www.dirum.org to provide a repository where researchers can share RUMs and methods [2]. The database now houses over 50 instruments, and is rapidly growing as more RUMs are submitted. Researchers are able to download and use the RUMs within the parameters of the permissions granted by the instrument developer. DIRUM also provides a preliminary archive of previous methodological work on resource-use measurement, which can be used as a platform for future research.

1.2 Workshop on Resource-Use Measurement Based on Patient Recall: Issues, Challenges and DIRUM

Following the development of DIRUM, a related workshop on resource-use measurement based on patient recall was held at the University of Birmingham, UK, in October 2011. Of the 42 attendees, the majority came from UK universities, but representatives from the UK National Health Service (NHS) and the pharmaceutical industry were also present. Speakers were drawn from a pool of UK academics with experience of resource-use measurement; the workshop offered an opportunity to discuss the issues and to stimulate methodological research. The subjects were chosen as priority areas by the speakers, and therefore represent a selected rather than systematic summary of topics. As the participants were primarily practising health economists, no introductory material on measuring costs alongside clinical trials was included; there are, however, many textbooks available that cover the subject (see Morris et al., for example [3]). The presentation slides are available on the DIRUM website (http://www.dirum.org).

Themes presented and discussed during the workshop were in the context of the UK, and included descriptions of the state of the art, the challenges associated with developing RUMs, means of improving practice and key items for a research agenda. In this Current Opinion article, we summarise these themes with the aim of stimulating methodological research into resource-use measurement and thereby improving the accuracy of resource-use estimates in economic evaluation. Based on the workshop content, we argue that insufficient research resource is currently directed towards resolving resource-use measurement issues, and that this area should be more highly prioritised for research.

2 Resource-Use Measurement: the State of the Art

2.1 Good Practice in Questionnaire Design

A recent review showed that patient self-reported questionnaires are widely used, but suffer from a significant amount of missing data [4]. A number of practices can be identified that improve questionnaire design and, in theory, minimise missing RUM data. A Cochrane review and meta-analysis of methods to increase response rates to postal and electronic questionnaires identified over a hundred different strategies that had been evaluated through randomised controlled trials [5]. Although none of these referred specifically to patient-completed RUMs, some of the results can be generalised. For example, in terms of the administration of RUMs, a personalised approach identifying the originating institution and assuring confidentiality is most effective, while first-class mailing results in higher response rates than second class. Questionnaires should be as short as possible while still identifying the main costs from the chosen perspective [6], and filter questions should be used to direct completers past sections that are not relevant to them. The most relevant questions should be placed first; for RUMs, these would concern the main cost drivers.
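Filter questions also complicate the quantification of missing data: a blank answer in a skipped section is structural, not item non-response. The minimal sketch below illustrates the distinction; the column names and data are hypothetical, not drawn from any study discussed here.

```python
# Minimal sketch: distinguishing structural missingness (respondent
# routed past the question by a filter) from item non-response.
# Column names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "saw_gp": [True, True, False, True],  # filter question
    "gp_visits": [3, None, None, 2],      # asked only if saw_gp is True
})

# Respondent 3 was routed past the question, so only respondents with
# saw_gp == True count towards the item non-response denominator.
routed = df[df["saw_gp"]]
nonresponse_rate = routed["gp_visits"].isna().mean()
print(f"Item non-response among routed respondents: {nonresponse_rate:.0%}")
```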

A recent study aimed to determine whether the use of resource-use logs (diaries given to patients for recording resource use in real time) could reduce the amount of missing data in a subsequent patient-completed RUM [7]. Overall, the differences between trial arms were mostly negligible. However, for community-based NHS services there was a significant improvement in completion rates with the log. It was concluded that a simplified resource-use log, designed primarily using tick boxes, has the potential to be a useful tool for reducing the amount of missing data in RUMs.

2.2 Experiences with Existing Questionnaires

Although many different RUMs have been developed, few have been as comprehensively tested as the Client Service Receipt Inventory (CSRI) [8]. The questionnaire was originally developed in the UK in the mid-1980s for evaluations of the closure of long-stay hospitals (particularly psychiatric and ‘mental handicap’ hospitals). The patient group had little experience of filling in forms and many of them used a wide range of health and other services. Taking approximately 20 minutes to administer, the questionnaire covers areas including the individual’s background, accommodation and living situation, employment history, social security benefits, health service usage and support from informal carers. There are now believed to be more than 200 different versions of the CSRI, adapted to cater, for example, for different disease areas, healthcare systems, languages and modes of administration. The validity and reliability of the instrument are considered good, with consistency generally well demonstrated [9–12].

Another established instrument is the Annotated Patient Cost Questionnaire (APCQ) [13]. In 1998, a need was identified by UK Health Economists’ Study Group (HESG) members for a standard questionnaire that could be used in a prospective economic evaluation to measure costs relating directly to patients and informal caregivers. A working party was set up, with the broad aim of defining the items of resource use that could reasonably be obtained directly from patients. A menu of questions was developed for different contexts, from which researchers could assemble patient-completed cost questionnaires, reducing the ad hoc practice of individual researchers producing their own questionnaires from scratch. It was hoped that this more standardised approach would improve the transferability of results across different studies.

An empirical study was conducted to test how well the APCQ performed in practice [14]. From the main version, a derived questionnaire was readily developed for dialysis patients, covering travel costs to and from the dialysis unit, time spent receiving dialysis and travelling, and other healthcare facility costs, and was found to be easy for patients to complete. The performance of the dialysis questionnaire was examined in several ways, with internal consistency tests showing good agreement, and test–retest reliability studies demonstrating consistent answers.
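As a concrete illustration of what a test–retest check can look like, the sketch below quantifies agreement between repeated answers to a categorical travel question using Cohen's kappa. The data, the question and the choice of statistic are our illustrative assumptions, not the analysis reported for the APCQ.

```python
# Hypothetical test-retest reliability check: the same respondents
# answer the same categorical question on two occasions, and agreement
# beyond chance is summarised with Cohen's kappa. Illustrative only.
from sklearn.metrics import cohen_kappa_score

test = ["car", "taxi", "car", "bus", "car", "ambulance"]
retest = ["car", "taxi", "car", "car", "car", "ambulance"]

kappa = cohen_kappa_score(test, retest)
print(f"Test-retest kappa: {kappa:.2f}")
```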

Experiences from developing these questionnaires have suggested that a slimline approach, in which only the dominant cost drivers are sought, may be appropriate. Standardisation should be based on broad principles rather than being an attempt to create identical RUMs for all occasions. The effort expended and attention to detail should be proportionate to the expected contribution of different services or resources to total costs; for example, an inpatient episode contributes far more to total cost than an infrequently accessed therapy group, and it is clearly more important to assess use of the former accurately. Questions must also be sensitive to the needs of groups who may have cognitive or addiction problems, and sensitive to the topic under consideration (crime, or an expectation of losing benefits, for example, may be aspects of resource use that provoke particular concerns amongst respondents).

3 Challenges Associated with Developing and Applying RUMs

The first challenge to arise is whether it is more appropriate to use an available questionnaire (such as those described above), or to develop a new one. Apart from being less costly, existing questionnaires have the advantage of providing consistency across studies, but even the most appropriate existing questionnaire may not represent the best way to capture key information for any given study. Researchers need to consider whether the marginal benefits of developing and testing a bespoke questionnaire will outweigh the marginal costs.

The design of new questionnaires requires consideration of whether to capture condition-related or total resource use. Whilst in principle only those resources relating to the condition in question are relevant, the definition of ‘condition related’ may not always be obvious. For example, a study on blood glucose control that included only diabetes-related resource use might regard a hospital admission for a fractured leg as not relevant, when the accident may have been due to an episode of hypoglycaemia.

There are also a number of important issues that apply regardless of whether a new or existing questionnaire is used. For example, while it may seem obvious that ‘patient recall’ data will be obtained from patients, this will not be possible with children or cognitively impaired adults, and the literature on preference elicitation indicates that in many circumstances the question of who to ask is not straightforward [15, 16]. Similarly, the decision whether or not to capture participants’ resource use prior to the start of a study has to be taken in context. In large trials, randomisation should ensure an even distribution between study arms of those who are routinely high resource users but, in smaller studies, it may be necessary to capture pre-study use of resources as a covariate [17]. The level of detail requested can also be problematic, as patients’ lack of knowledge of health service structures may lead them to misclassify resource usage. The cost implications of inputs by consultants versus junior doctors are significant, but respondents are unlikely to know the grade of doctor who treated them, or even whether the health professional in question was a doctor or a nurse.
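A minimal sketch of what using pre-study resource use as a covariate might look like in practice is a baseline-adjusted regression of follow-up costs on trial arm. The simulated data, variable names and simple OLS specification below are our assumptions for illustration; real cost data are typically skewed and often call for generalised linear models or bootstrapping.

```python
# Illustrative baseline-adjusted cost comparison between trial arms.
# All data are simulated; this is a sketch, not a recommended analysis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200
baseline = rng.gamma(2.0, 500.0, n)  # pre-study costs (skewed)
arm = rng.integers(0, 2, n)          # randomised allocation
cost = 200 * arm + 0.8 * baseline + rng.gamma(2.0, 300.0, n)

df = pd.DataFrame({"arm": arm, "baseline": baseline, "cost": cost})
model = smf.ols("cost ~ arm + baseline", data=df).fit()
# The coefficient on 'arm' is the incremental cost adjusted for baseline use
print(model.params["arm"])
```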

3.1 Validation Issues

The validation of cost questionnaires represents another area of considerable uncertainty. Validity assessment includes considering the extent to which an instrument measures what it purports to represent, and the degree to which it might be helpful in answering a particular question. New questionnaires require validation to ensure the appropriateness of the response categories, the clarity of the instructions, and the layout, format and length of the questionnaire. However, the absence of a gold standard can make this problematic.

Only about 30 % of studies funded by the UK Health Technology Assessment (HTA) programme described a RUM validation process [18], which compares poorly with the validation of patient-reported quality-of-life measures. However, these outcome measures are based on patient opinion rather than provable information, are generic enough to apply across many settings, and represent a snapshot of health at a particular time; in contrast, RUMs are based on verifiable events, are completely specific to the context of the trial being undertaken, and cover a time period spanning the complete trial. Owing to the cognitive processes involved in responding to questionnaires and the nature of the data being collected, there may be some intrinsic conceptual challenges associated with validating RUMs. In the cognitive response model, comprehension of the question is followed by retrieval of the relevant data before a judgment can be made and a final response given [19, 20]; much of the internal debate concerning a patient-reported quality-of-life measure takes place at the judgment stage, whilst for a resource-use question the internal debate may well arise at the retrieval stage. Given the different cognitive processes involved, it may not be reasonable to consider validity for patient-reported quality-of-life measures and costs in the same way.

Validity measurement can take several forms [21], although the empirical literature for RUMs has tended to focus on content and criterion validity. Some claims of content validity are expressed in terms of comparison with the content of previously used questionnaires; however, such comparisons may not be adequate as evidence of validity: an instrument cannot be considered validated simply because it resembles another, itself unvalidated, instrument. Criterion validity considers whether a measure performs well in relation to an identified gold standard; however, although a gold standard for costs may be provided by routine databases, the assumption that these systems are accurate and complete is questionable.

Construct validity, commonly studied in outcomes measurement, focuses on expected relationships: an instrument is said to have construct validity if relationships with other measurements are as expected given the construct being assessed. However, for resource-use data the absolute quantities are likely to be just as important as any expected correlations and may be much more difficult to assess.
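The sketch below illustrates why correlation alone is a weak check for resource-use data: simulated self-reports that systematically overstate visit counts can correlate almost perfectly with routine records while the absolute quantities, and hence the costs, are biased. The data are fabricated purely for illustration.

```python
# Illustration: high correlation between self-report and routine records
# can coexist with biased absolute quantities. Simulated data only.
import numpy as np

rng = np.random.default_rng(1)
records = rng.poisson(4, 50)                # counts from routine database
self_report = records + rng.poisson(1, 50)  # systematic over-reporting

r = np.corrcoef(records, self_report)[0, 1]
mean_diff = (self_report - records).mean()
print(f"correlation = {r:.2f}, mean over-report = {mean_diff:.1f} visits")
# A correlation-based check would look reassuring here, yet total costs
# derived from the self-reports would be systematically inflated.
```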

Some early content validation could potentially be carried out by patient representatives on trial steering committees, by patient groups and via qualitative interviews. Ideally, questionnaires should also be piloted to pinpoint questions with poor response rates and to allow the questionnaire to be shortened where particular resources are rarely consumed (provided the pilot sample is truly representative of the study population). However, it may not be feasible to validate every combination of questionnaire, condition, context and adaptation, and doing so might not represent a good use of research resources; validating individual questions, types of question or methods may be more appropriate. Currently, there is no consensus on whether a specific measure, or a general approach to measurement, should be validated.

4 Improving Practice

Workshop participants made many suggestions for improving the design and administration of RUMs. For example, sources of unit costs should be identified at the outset of a study to help specify questions at the appropriate level of detail in the RUM; this makes many RUMs inherently country-specific, as the granularity of unit costs varies between settings. Pilot or modelling studies should be used more extensively to identify important cost drivers, where greater care should be taken to elicit valid responses. The patient should be treated as an essential source of information about the relevance of particular questions; different cultural or ethnic populations may differ in their resource use, and the questionnaire might need to be adapted accordingly. DIRUM may have a role in allowing future trialists to identify and pilot relevant RUMs before deciding whether to develop a new instrument.
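To make the link between unit costs and question granularity concrete, the sketch below costs a single patient's reported resource counts against a unit-cost table. The item names and cost values are invented for illustration and are not taken from any national tariff; the point is that a question is only worth asking at a level of detail for which a unit cost exists.

```python
# Minimal costing sketch: patient-reported counts multiplied by unit
# costs. Item names and values are hypothetical, not from any tariff.
unit_costs = {"gp_visit": 36.0, "a_and_e_attendance": 124.0, "inpatient_day": 400.0}

patient_counts = {"gp_visit": 3, "a_and_e_attendance": 1, "inpatient_day": 0}

total = sum(unit_costs[item] * n for item, n in patient_counts.items())
print(f"Total cost for this patient: £{total:.2f}")
```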

Good-practice guidelines are required to improve methodological consistency. A recent review of the methods of resource-use data collection employed in clinical trials funded by the UK HTA programme [18] found that the majority of trials collected patient-level data and, of these, most used questionnaires, forms or interviews. Only a small proportion showed explicit evidence of having identified items of resource use at the planning stage, through consultation with healthcare professionals or a review of the published economic literature; however, it is likely that some form of item identification will have been undertaken even if not explicitly reported. From the validation procedures described in this review, a good-practice checklist was developed, covering perspective, identification of resource-use items, planning for data collection and analysis, resource-use data collection, piloting, validation, non-trial estimates of resource use, methods for costing and standardisation of a reporting format [18]. This has since been developed into a flow diagram detailing an ideal way of approaching instrument development, encompassing planning, development, piloting, analysis and deployment of instruments (Fig. 1).

Fig. 1 Flowchart describing the processes for developing resource-use measures

5 Research Agenda

One of the aims of the workshop was to identify a future research agenda. A number of challenges affect any potential research; for example, there is little funding available specifically for methodological research on measuring resource use. The health economics community does not have a culture of validation for resource-use measurement, and lacks experience of the specialist psychometric techniques required to conduct validation studies effectively; however, learning from other disciplines, such as market research or psychology, is a possible route to tackling these problems.

The workshop concluded with an open discussion of potential topics for future research. We have summarised each topic in Table 1 with the hope of stimulating such research. However, this list is not intended to be exhaustive, nor is it prioritised in order of importance.

Table 1 Potential topics identified for developing into a research agenda

6 Conclusion

While there are many aspects of resource-use measurement for which the health economics community has not yet reached consensus, one area on which participants appeared to agree was that there is a paucity of evidence regarding the collection of resource-use data and an urgent need for this deficiency to be addressed. We believe it is time to build on the enthusiasm shown at the workshop, and that the many methodological challenges raised during it should now be addressed in order to generate firm recommendations for researchers. We hope that this Current Opinion article will stimulate debate and galvanise the health economics community into action, resulting in better tools with which to generate the cost-effectiveness estimates that ultimately determine the interventions that patients will and will not be entitled to receive.