Background

Outcome selection and reporting in randomised controlled trials (RCTs) are often problematic. Heterogeneity in the outcomes measured across studies of the same disease or treatment may hamper effective evidence synthesis. A systematic review of oesophageal studies, for example, found 10 different measures for postoperative mortality, which were often undefined [1]. In addition, selective reporting of outcomes puts trials at risk of outcome reporting bias and can mean treatment effects are exaggerated [2]. These issues may be further complicated for patient reported outcomes (PROs). PROs are typically assessed using questionnaires (patient reported outcome measures (PROMs)), and many validated questionnaires are available because PROMs have been developed by different groups and disciplines (for example, clinical versus psychological) or for differing purposes (for example, measurement of health in generic populations versus disease-specific patient groups). A single PROM can be made up of numerous scales and single items, and generic and disease-specific PROMs are often combined to assess a range of relevant health domains within an RCT. This means that different (and often ill-defined) outcomes may be reported, and the multiplicity of items and scales may also allow statistically significant PRO endpoints, rather than pre-specified ones, to be selected for reporting, increasing the risk of outcome reporting bias. Problems are further accentuated for PROs because the terminology of scales and items across PROMs is not universally agreed, meaning data synthesis across studies is difficult when different questionnaires are used. While there is overlap in the issues that are measured, there is also variation because PROMs have been developed by different methods and for different purposes. A potential solution to these challenges is to develop and use core outcome sets.

Core outcome sets (COSs) are an agreed minimum set of outcome domains to be measured and reported in all trials of a particular treatment or condition [3]. The routine measurement of COSs has the potential to facilitate data synthesis and reduce outcome reporting bias by standardising the outcomes that are measured across studies. This has been emphasised by the COMET (Core Outcome Measures in Effectiveness Trials) initiative, which supports the development and application of COSs for pragmatic (effectiveness) trials [4]. Pragmatic trials are designed to assess whether an intervention is effective in routine clinical practice, and outcomes therefore need to be relevant and important to patients as well as clinicians and other key decision-makers [5]. In many cases these are the outcomes that are assessed with PROMs, particularly if the questionnaire has been developed with patient input [6]. The availability of so many different PROMs, however, makes it difficult to select which of the measured health domains are ‘core’. The aim of this study, therefore, was to explore and report methods to identify PRO domains from the wealth of available PROMs and to use this approach to inform the development of a COS for use in pragmatic trials in a specific condition. Consensus on which outcomes to include in the final core set, and the methods to achieve this, are the focus of further research.

Methods

This study was undertaken within one disease site and treatment, radical treatment for oesophageal cancer, selected because the research team have clinical and PROM expertise in this area and have previously tried to summarise PRO evidence [7–9]. There were three phases of work: (1) a systematic literature review to identify validated PROMs used in oesophageal cancer studies and the scope of these instruments; (2) a detailed content analysis to explore PROM diversity; and (3) categorisation of PROM content into health domains (Figure 1).

Figure 1 Methods to identify PRO domains to inform a core outcome set.

Identification of PROMs used in oesophageal cancer studies

A systematic review was performed to identify and present the scope of existing validated PROMs in order to provide knowledge of the current state of PRO measurement in this field.

Search strategy

Electronic searches of the MEDLINE, Embase, PsycINFO and CINAHL databases were performed for the period between January 2006 and May 2011. The search strategy included terms for patient-reported outcomes, oesophageal cancer, surgery and chemotherapy, radiotherapy or combined therapy (see Additional file 1). Searches were limited to studies published in the English language. Relevant studies published prior to 2006 were identified from a previous systematic review [8]. Abstracts of identified records were screened for inclusion and full-text articles were assessed for eligibility by one of three reviewers (RW, MJ, RCM), with reasons for exclusion documented. No studies were excluded based on a risk of bias assessment or judgement of methodological quality because the purpose of the current study was to identify PROs rather than examine the quality of the data or treatment effect.

Selection criteria

Included were studies that used at least one validated PROM to evaluate health-related quality of life (HRQL) after radical treatment of oesophageal cancer, including surgical, chemotherapy and/or radiotherapy interventions. Valid PROMs were defined as those that had been tested for psychometric validity and reliability in appropriate patient populations with methodology verified from published papers. No restrictions on study design or sample size were applied. Studies of palliative treatment, comparisons of clinician- or hospital-related factors, and those limited to investigating satisfaction with care or health utilities were excluded.

Data extraction

Data were extracted using a pre-designed form, piloted before full data extraction with a sample of included studies. Study publication date, design and treatment intervention, the name of the PROM(s), the reported PRO scales and single items, and details of any additional non-validated questions were extracted. These were recorded by one reviewer (RM) and checked by additional members of the study team (MJ, MAGS). The validated PROMs were obtained, including other validated disease-specific PROMs known to authors. Verbatim names for the PRO scales and single items as termed by the PROM developers were extracted and all PROM items (scale components and any single items) were recorded. Data were stored in an electronic database.

Examination of PROM content

A detailed content analysis of the identified instruments was performed to explore the diversity of PROs in this field. Verbatim names for scales and single items were listed. Scales with identical names and others that were similar (defined as having at least one identical word) were documented, counted and compared for consistency and overlap of the component items.
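The name comparison described above can be illustrated with a minimal sketch. It assumes scale names are compared as lower-cased word sets; the PROM labels and the list of scales are hypothetical placeholders rather than data from the study, and the functions `identical_names` and `similar_names` are illustrative names only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (PROM label, verbatim scale name) pairs; not study data.
scales = [
    ("PROM A", "physical function"),
    ("PROM B", "physical function"),
    ("PROM C", "physical wellbeing"),
    ("PROM D", "role physical"),
    ("PROM E", "pain"),
    ("PROM F", "pain"),
]


def words(name):
    """Return the set of lower-cased words in a scale name."""
    return set(name.lower().split())


def identical_names(scales):
    """Map each verbatim name used by more than one PROM to those PROMs."""
    by_name = defaultdict(list)
    for prom, name in scales:
        by_name[name.lower()].append(prom)
    return {name: proms for name, proms in by_name.items() if len(proms) > 1}


def similar_names(scales):
    """List pairs of distinct names that share at least one identical word."""
    names = sorted({name.lower() for _, name in scales})
    return [(a, b) for a, b in combinations(names, 2) if words(a) & words(b)]


identical = identical_names(scales)
similar = similar_names(scales)
```

In this toy input, ‘physical function’ and ‘pain’ are each flagged as identical names used by two PROMs, and the three ‘physical’ names form three similar pairs. A fuller version would probably need to ignore common words such as ‘of’ or ‘and’, which this sketch does not do.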

Categorisation into health domains

To synthesise the existing content of instruments and provide a framework for future core set development, all PROM items (scale components and any single items) were examined and systematically categorised into conceptual health domains according to the issue they addressed. This was performed by expert methodologists (an oesophageal cancer surgeon and a psychologist) with experience of questionnaire development in health-related quality of life research and cancer (JMB and MAGS), based on their knowledge of, and practised skill in, grouping questionnaire items in this field. Health domains were defined as generic aspects of quality of life affected by health or disease-specific issues and symptoms [10]. Further domains were defined until saturation, that is, until all individual PROM items had been mapped onto a domain. Issues addressed in non-validated questions were additionally mapped to domains to verify that the conceptual health domains encompassed all outcomes measured in the included studies. Mapping of items to domains was checked for completeness and consistency by two authors (IK and RCM) and a patient advocate working within oncology research to maximise validity and reliability of the method. Discrepancies were resolved by discussion within the study team and with the senior author (JMB). Data were recorded electronically.
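The saturation criterion above (every item mapped onto a domain) can be expressed as a simple completeness check. The following is a minimal sketch only: the item wording is paraphrased from examples in this paper, the ‘cough’ domain label is a hypothetical placeholder, and the helper names are illustrative rather than part of the study's method.

```python
# Hypothetical item list; wording is paraphrased for illustration.
items = [
    "ability to walk certain distances",
    "need for help with self-care",
    "waking at night because of coughing",
    "coughing following eating",
]

# Item -> domain assignments made so far. The "cough" label is an
# assumed placeholder, not a domain name taken from the study tables.
mapping = {
    "ability to walk certain distances": "physical activity/activities of daily life",
    "need for help with self-care": "physical activity/activities of daily life",
    "waking at night because of coughing": "cough",
}


def unmapped(items, mapping):
    """Return items that have not yet been assigned to any domain."""
    return [item for item in items if item not in mapping]


def is_saturated(items, mapping):
    """Saturation is reached when no item is left without a domain."""
    return not unmapped(items, mapping)


remaining_before = unmapped(items, mapping)
saturated_before = is_saturated(items, mapping)

# Assign the outstanding item, then re-check.
mapping["coughing following eating"] = "cough"
saturated_after = is_saturated(items, mapping)
```

The check only confirms completeness; whether each assignment is conceptually right still depends on the expert review and discussion described above.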

Results

Identification of PROMs used in oesophageal cancer studies

A total of 1,351 records were screened for inclusion and 111 full-text articles were assessed for eligibility. Of these, 56 were excluded because they did not meet the criteria for eligibility, including seven studies that used PROMs without sufficient psychometric validation. Some 55 relevant articles reporting 56 studies were identified (Table 1) [11–65]. Almost all studies (n = 54, 96%) included data on PROs after surgery, either alone or with neoadjuvant chemo/radiotherapy. Nineteen validated PROMs were used (Table 1) [56, 66–83]: nine for gastrointestinal diseases, five cancer-specific instruments and five generic instruments. One oesophageal-specific PROM was adapted from a cancer instrument (adapted Rotterdam Symptom Checklist). Three were earlier versions of an updated PROM (EORTC QLQ-C36, QLQ-OES24 and MOS SF20). The most frequently used PROMs were the EORTC QLQ-C30 (n = 34, 61%) and the disease-specific module EORTC QLQ-OES18 or its earlier version, QLQ-OES24 (n = 27, 48%). PROMs were not always used in their entirety, with evidence of selective outcome reporting of scales and single items in 33 (59%) studies, although there was variation across studies in the outcomes that were selected (data not shown). Twenty-one (37%) studies added a total of 74 non-validated items. A further two validated disease-specific PROMs, the EORTC QLQ-OG25 [84] and EQOL [85], were sourced from the authors’ knowledge; neither had been used in a published study between its development and validation and the time of the search (May 2011).

Table 1 Oesophageal cancer studies (n = 56) using validated PROMs (n = 21)

Examination of PROM content

There were 116 scales (composed from 574 individual items) and 32 single items in total, with 94 different verbatim scale/item names (Table 2). ‘Pain’ and ‘physical function’ were the most common verbatim names for a scale, used in six different PROMs, but other PROM scale names were also very similar (for example, physical wellbeing, physical problems, physical distress, physical activity, role physical) (Table 3). Some scales with identical names, however, had different component items. For example, ‘physical function’ in one PROM consisted of seven items relating to tiredness/fatigue, feeling unwell, waking up at night, changes in appearance, physical strength, endurance and feeling unfit [72], whereas ‘physical function’ in another PROM consisted of five items that referred to strenuous activity, ability to walk certain distances, time spent in bed or a chair, and need for help with self-care [74]. Similar heterogeneity was found for PROs assessed with single items. For example, ‘cough’ in one PROM assessed waking at night because of coughing [67], whereas in another it assessed coughing following eating [69]. Although the two items assessed slightly different aspects of coughing, they had the same name (‘cough’), so reporting would refer only to cough and not to the actual issue being assessed by the item.

Table 2 Verbatim names of PROM scales and single items
Table 3 Identical and similar names for PRO scales used in different PROMs

Categorisation into health domains

All individual PROM items (n = 606) were categorised into 32 conceptual generic or symptom-specific domains by the study authors (Table 4). Illustrative examples of this categorisation process are provided for some of the generic health domains (Table 5). The most commonly assessed health domain (concept), that is, the health domain that most PROM items mapped to, was emotional function, assessed in 18 of the 21 PROMs. Other commonly assessed health domains were ‘pain/pain-related swallowing’ (assessed in 14 different PROMs), ‘physical activity/activities of daily life’ (in 13 PROMs) and ‘appetite/eating/taste’ (in 12 PROMs). Uncommon domains were ‘spiritual issues’ (assessed in one PROM) and ‘dizziness/dumping’ (assessed in two PROMs). Non-validated questions predominantly focused on eating and were therefore mapped onto the ‘appetite/eating/taste’ domain. A patient advocate checked the categorisation of items into health domains and there was no difference of opinion.
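The per-domain counts reported above (number of items, and number of PROMs assessing each domain) can be derived from item-level records with a simple tally. The sketch below uses hypothetical records and PROM labels; only the domain names and the totals 574 and 32 are taken from this paper, and `summarise` is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical records: (PROM label, item text, assigned domain).
records = [
    ("PROM A", "felt tense", "emotional function"),
    ("PROM A", "felt depressed", "emotional function"),
    ("PROM B", "worried", "emotional function"),
    ("PROM B", "pain when swallowing", "pain/pain-related swallowing"),
    ("PROM C", "loss of appetite", "appetite/eating/taste"),
]


def summarise(records):
    """Return {domain: (number of items, number of distinct PROMs)}."""
    items_per_domain = defaultdict(int)
    proms_per_domain = defaultdict(set)
    for prom, _item, domain in records:
        items_per_domain[domain] += 1
        proms_per_domain[domain].add(prom)
    return {
        domain: (items_per_domain[domain], len(proms_per_domain[domain]))
        for domain in items_per_domain
    }


summary = summarise(records)

# Consistency check on the totals reported in the text:
# 574 items within scales plus 32 single items give 606 items overall.
total_items = 574 + 32
```

Here `summary["emotional function"]` is `(3, 2)`: three items drawn from two PROMs. Counting distinct PROMs separately from items matters because a single PROM can contribute several items to the same domain.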

Table 4 Categorised PRO health domains showing number of items in existing PROMs assessing each domain
Table 5 Selected categorised health domains and example PROM items mapped to these domains

Discussion

This study comprehensively analysed PROs from studies in radical treatment for oesophageal cancer. Some 116 scales and 32 single items were identified from 21 validated PROMs. As many as 94 different verbatim names were used to describe PRO scales and single items and although many names were similar, content examination revealed component questions did not always address comparable issues. In-depth examination and categorisation of PROM contents concluded that together they addressed 32 different health domains demonstrating the vast overlap between PROMs.

Our findings show how evidence synthesis of oesophageal cancer PROs may be hampered because of the range of PROMs used in trials and the multiple scales and single items within them, often with inconsistent and non-transparent terminology. Core outcome sets aim to reduce this problem by identifying and prioritising the important health domains to be measured in all studies. The development of core outcome sets in other clinical areas has been undertaken using a range of methods, which differ in particular in their approach to including PROs [86–89]. In rheumatoid arthritis, for example, the initial American College of Rheumatology (ACR) core set was developed by a committee of experts (16 professionals in rheumatoid arthritis trials, health services research and biostatistics) who reviewed the literature on the validity of trial outcomes (for example, sensitivity to change or how well an outcome predicted or correlated with a definite clinical change) and used a nominal group process to recommend and reach consensus on a list of core outcomes. The list was presented and finalised at a specialist international conference (OMERACT: Outcome Measures for Rheumatology in Clinical Trials) and contained both clinical outcomes and PROs, although patients were not involved in the consensus process. Outcomes were specific (for example, number of swollen joints) or more general domains (for example, functional status), with recommendations on how to measure the outcomes decided later [86]. Subsequent OMERACT conference discussions and workshops deliberately involved patients and led to the addition of fatigue to the ACR core set [87, 90], and continued work using interviews with patients identified further important PROs [5]. This led to the development of a ‘patient core set’ of disease-specific and global outcome domains, derived solely from patient opinion, to complement the professional ACR core set [88].

Our current study methodology ensures that the patient perspective and relevant PROs inform the development of a core set of outcome domains from an early stage, because it examines the content of validated PROMs, which are developed with significant patient involvement. The identified PRO domains will be prioritised using a Delphi method to reach consensus on the important domains to include in the core outcome set, alongside clinical outcomes [1]; this is the focus of future work. Patients, surgeons and clinical nurse specialists will be surveyed to ensure the opinions of all key stakeholders are sought, an approach recommended by the COMET initiative [4].

This study included a detailed systematic search to identify PROs measured in oesophageal cancer studies and used rigorous methodology to identify health domains; however, it does have weaknesses. The categorisation of question items into health domains was performed by two experts and independently checked by other members of the research team, including a patient advocate, but it is possible that others may have categorised items differently. Inter-rater reliability statistics could have been recorded to describe agreement between the experts when categorising items. Future work is therefore needed to standardise and validate this method. In addition, presentation of the methodology to a greater number of patients or patient representatives could strengthen the robustness and reliability of the categorisation process.
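As an illustration of the kind of inter-rater reliability statistic mentioned above, the following sketch computes Cohen's kappa for two raters assigning items to domains. The choice of Cohen's kappa is an assumption made here for illustration; the study does not specify a statistic, and the ratings shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical domain assignments for six items by two experts.
expert_1 = ["emotional", "emotional", "pain", "pain", "eating", "eating"]
expert_2 = ["emotional", "emotional", "pain", "eating", "eating", "eating"]

kappa = cohens_kappa(expert_1, expert_2)
```

With these hypothetical ratings the experts agree on five of six items (observed agreement 5/6) against a chance agreement of 1/3, giving a kappa of 0.75. With 32 domains and some rarely used categories, a statistic of this kind would need cautious interpretation.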

Conclusion

In summary, this study demonstrates that there is diversity in the PROMs selected to evaluate radical treatment for oesophageal cancer. Within and between PROMs there is a lack of clarity between named scales and items and the underlying health domains being assessed, meaning data synthesis is limited. A methodology for identifying important PRO health domains is proposed, which can be used to inform the development of a core set of health domains. Following this, it will be necessary to determine accurate and efficient ways to measure these core domains, drawing on item banks developed by initiatives such as PROMIS (Patient Reported Outcomes Measurement Information System) and COSMIN (Consensus-based Standards for the selection of health status Measurement Instruments) [91, 92].