The first challenge to arise is whether it is more appropriate to use an available questionnaire (such as those described above), or to develop a new one. Apart from being less costly, existing questionnaires have the advantage of providing consistency across studies, but even the most appropriate existing questionnaire may not represent the best way to capture key information for any given study. Researchers need to consider whether the marginal benefits of developing and testing a bespoke questionnaire will outweigh the marginal costs.
The design of new questionnaires requires consideration of whether to capture condition-related or total resource use. Whilst in principle only those resources relating to the condition in question are relevant, the definition of ‘condition related’ may not always be obvious. For example, a study on blood glucose control that included only diabetes-related resource use might regard a hospital admission for a fractured leg as not relevant, when the accident may have been due to an episode of hypoglycaemia.
There are also a number of important issues that apply regardless of whether a new or existing questionnaire is used. For example, while it may seem obvious that ‘patient recall’ data will be obtained from patients, this will not be possible with children or cognitively impaired adults, and the literature on preference elicitation indicates that in many circumstances the question of whom to ask is not straightforward [15, 16]. Similarly, the decision whether to capture participants’ resource use prior to the start of a study has to be taken in context. In large trials, randomisation should ensure that routinely high resource users are evenly distributed between study arms but, in smaller studies, it may be necessary to capture pre-study resource use for inclusion as a covariate [17]. The level of detail requested can also be problematic, as patients’ lack of knowledge of health service structures may lead them to misclassify resource use. The cost implications of inputs by consultants versus junior doctors are significant, but respondents are unlikely to know the grade of the doctor who treated them, or perhaps even whether the health professional in question was a doctor or a nurse.
Validation Issues
The validation of cost questionnaires represents another area of considerable uncertainty. Validity assessment includes considering the extent to which an instrument measures what it purports to represent, and the degree to which it might be helpful in answering a particular question. New questionnaires require validation to ensure the appropriateness of the response categories, the clarity of the instructions, and the layout, format and length of the questionnaire. However, the absence of a gold standard can make this problematic.
Only about 30 % of studies funded by the UK Health Technology Assessment (HTA) programme described an RUM validation process [18], which compares poorly with validation of patient-reported quality of life measures. However, these outcome measures are based on patient opinion rather than provable information, may apply to many cases, and represent a snapshot of health at a particular time; in contrast, RUMs are based on verifiable events, are completely specific to the context of the trial being undertaken, and cover a time period spanning the complete trial. Owing to the cognitive processes involved in responding to questionnaires and the nature of the data being collected, there may be some intrinsic conceptual challenges associated with validating RUMs. Under the cognitive response model, in which comprehension of the question is followed by retrieval of the relevant data before a judgment can be made and a final response given [19, 20], much of the internal debate for a patient-reported quality of life measure takes place at the judgment stage, whereas for a resource-use question it may well arise at the retrieval stage. Given the different cognitive processes involved, it may not be reasonable to consider validity for patient-reported quality of life measures and for RUMs in the same way.
Validity measurement can take several forms [21], although the empirical literature for RUMs has tended to focus on content and criterion validity. Some examples of content validity are expressed in terms of comparison with content in previously used questionnaires; however, these may not be adequate as evidence of validity, since an instrument cannot logically be regarded as validated merely because it resembles another (unvalidated) instrument. Criterion validity considers whether a measure performs well in relation to an identified gold standard; however, although a gold standard for costs may be provided by routine databases, the assumption that these systems are accurate and complete is questionable.
Construct validity, commonly studied in outcomes measurement, focuses on expected relationships: an instrument is said to have construct validity if relationships with other measurements are as expected given the construct being assessed. However, for resource-use data the absolute quantities are likely to be just as important as any expected correlations and may be much more difficult to assess.
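The distinction between expected correlations and absolute quantities can be illustrated with a minimal sketch. The data below are entirely hypothetical (invented visit counts, not drawn from any study), and the function names are illustrative only. The example assumes five patients who each report exactly half of the GP visits held in a routine record: the correlation between the two sources is perfect, yet the self-reported totals understate use by an average of three visits per patient.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_difference(reported, recorded):
    """Mean of (reported - recorded); negative values indicate under-reporting."""
    return mean(r - c for r, c in zip(reported, recorded))


# Hypothetical GP visit counts for five patients (illustrative values only).
recorded = [2, 4, 6, 8, 10]  # visits according to a routine record
reported = [1, 2, 3, 4, 5]   # visits recalled by the patient (half of each)

r = pearson_r(reported, recorded)
bias = mean_difference(reported, recorded)

print(f"correlation = {r:.2f}")
print(f"mean difference = {bias} visits per patient")
```

In this constructed case a correlation-based assessment alone would suggest excellent agreement, while any cost estimate derived from the self-reported counts would be half of that derived from the records.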
Some early content validation could potentially be carried out by patient representatives on trial steering committees, by patient groups, and through qualitative interviews. Ideally, questionnaires should also be piloted, both to identify questions that attract poor responses and to allow the questionnaire to be shortened by removing items for resources that are not consumed (provided the pilot is truly representative of the study population). However, it may not be feasible to validate every combination of questionnaire, condition, context and adaptation, and doing so might not represent a good use of research resources; validating individual questions, types of question or methods may be more appropriate. Currently, there is no consensus on whether a specific measure, or a general approach to measurement, should be validated.