Background

Patient reported outcome (PRO) measures include health status assessments and measures for health-related quality-of-life (HRQOL), symptom reporting, satisfaction with care, treatment satisfaction, economic impact, and specific dimensions of patient experience such as depression and anxiety [1]. The US Food and Drug Administration (FDA) adopts a much broader definition [2]: “A PRO is any report coming directly from patients about a health condition and its treatment”; that is, PROs capture patients’ perspectives on how illness or new therapies affect their general well-being. There is growing interest among clinicians, researchers, industry and policy-makers in routinely collecting PROs to facilitate timely, patient-centred and evidence-based care. For example, the UK National Health Service (NHS) has been implementing a world-leading initiative for the routine collection of PROs that initially covered a few selected elective surgeries (e.g. unilateral hip replacement, unilateral knee replacement, groin hernia surgery and varicose vein surgery) [3] and is soon to expand to many other conditions, such as mastectomy and breast cancer. In the USA, the Patient-Reported Outcomes Measurement Information System (PROMIS), a National Institutes of Health-funded initiative begun in 2004, provides a publicly available web-based resource that can be used to measure key health symptoms and HRQOL [4]. Traditional paper-based PRO instruments are limited by their lack of flexibility, language and literacy requirements [5, 6], possible inappropriateness for minority groups [7, 8], lack of timeliness (in generating instantaneous, clinically meaningful interpretations) [9] and inability to adopt state-of-the-art measurement science such as Item Response Theory (IRT) and Computerised Adaptive Testing (CAT) [10].
To overcome the difficulty of integrating the administration and analysis of PRO instruments into clinical practice, researchers are developing and validating alternatives to traditional paper-based instruments, such as office-based touch-screen computers [11–13], telephone-based interactive voice-response (IVR) systems [14–16], hand-held computers [17, 18], mobile phones [19–21] and, more recently, the Internet [22–24]. Rationales [25–28] put forward for measuring PROs in a cancer setting include, but are not limited to: 1) better communication and shared decision making by patients and providers; 2) assessing the health status of patients entering therapy and identifying treatable problems; 3) determining the degree and sources of the patient’s decreased ability to function; 4) distinguishing between types of problems, including physical, emotional and social; 5) detecting adverse effects of therapy; 6) monitoring the effects of disease progression and response to therapy; 7) informing decisions about changing treatment plans; and 8) predicting the course of disease and outcomes of care.

However, despite growing interest and the urging of leading experts to use routinely collected PROs for all cancer patients, there has been no updated comprehensive review of the evidence on the impact of such a strategy on patients, service providers and organisations. The most recent review [26] focused only on cancer studies with a clinical trial design and assessed only a limited number of outcomes. The current project aims to provide a much-needed comprehensive update, including all relevant quantitative studies investigating the effectiveness of routine PRO collection in cancer patients. The review research questions were:

1. What are the impacts of composite measures of PROs collected on cancer patients during treatment with regard to:

   a) Provider behaviour for improving care delivered;

   b) Organisational changes within health care settings for improving processes and models of care (e.g. targeting and tailoring care);

   c) Improving clinical outcomes for patients; and

   d) Improving patient experience of care (e.g. self-care).

2. What mechanisms are involved in the link between PROs and the impacts identified in 1(a)?

3. What factors moderate the extent of the impacts identified in 1(a)?

Methods

Existing systematic reviews and rationale for the current review

To develop an efficient search and review strategy, over 200 existing reviews on the same or similar topics were first systematically examined (identified in a broad search covering PROs and quality-of-life measures between January 2000 and October 2011). Three reviews [26–28] were identified as the baseline reviews for this project, and their review strategies were carefully examined in terms of aim and scope, time span, search strategy and search terms, articles included, and conclusions drawn. The three systematic reviews are summarised in Table 1.

Table 1 A comparison of three baseline reviews

Review search strategy

Analysis of the above three systematic reviews demonstrates the importance of search strategies in determining which literature is included in a review, which in turn may influence the conclusions drawn. Valderas et al.’s (2008) [27] review excluded three of the five clinical trials on cancer patients that were included in Marshall et al.’s (2006) [28] review. Luckett et al.’s (2009) review [26] excluded one article (Taenzer et al. (2000) [33], a before-after study) from Marshall et al.’s review [28]. A mixed-methods search was therefore developed to maximise the identification of recent literature in a short period of time. The search was conducted in six different ways, as follows:

1. A text-based search strategy was developed based on previous reviews. To elicit previous reviews, a search was conducted for the text terms ‘patient reported outcome*’, ‘self-reported’ and ‘self-assessed’ anywhere in the title, abstract and key words, combined with ‘quality of life’, ‘symptom’, ‘functional status’, ‘health status’, ‘patient satisfaction’ and ‘unmet need*’ anywhere in the title, abstract and key words. For original articles, the same strategy was used but restricted to records with ‘neoplasm’ or ‘cancer’ in the key words. The search results were restricted to January 2000 to October 2011 (the full search strategy is listed in Additional file 1: Appendix 1).

2. All reviews (over 200 in total, on various topics and not limited to cancer patients) were evaluated, and three baseline reviews were used as the starting point for our top-down and bottom-up search strategy. We chose these three reviews because 1) they are all systematic reviews that could help to shape the structure or strategy of the current review (although not necessarily restricted to cancer patients); and 2) they were published after 2005.

3. All articles citing the 7 key randomised controlled trials [33–39] listed in the above reviews were examined (a bottom-up approach). References were also sought from the most recently published trials, editorials and commentaries (a top-down approach). The citation-tracking feature of Scopus™ made this strategy feasible.

4. Simplified text terms (i.e. patient reported outcome, PRO, PROM, quality of life, QOL) were used in a web search to identify grey literature.

5. The publications of leading researchers and experts in the field (identified through the advice of the Cancer Institute NSW (CINSW), editorials, review articles and the most cited articles) were purposefully searched in order to analyse their references and citations.

6. The websites of some key cancer centres were also searched for more detailed information.

The search was limited to the Scopus™ database, as it is the largest abstract and citation database of peer-reviewed literature and quality web sources, including 100% coverage of MEDLINE titles and EMBASE. It also tracks, analyses and visualises publication results, which suits our top-down and bottom-up search strategy well.
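The Boolean structure of the text-based search in the first step of the list above can be sketched as a small query builder: phrases are ORed within each concept group, and the groups are ANDed together with a date window. This is a minimal illustration only, assuming Scopus advanced-search field codes (`TITLE-ABS-KEY`, `KEY`, `PUBYEAR`); the helper names are hypothetical, and the authoritative strategy is the one listed in Additional file 1: Appendix 1.

```python
# Sketch only: field codes follow Scopus advanced-search conventions as we
# understand them; term lists are copied from the first search step above.

PRO_TERMS = ["patient reported outcome*", "self-reported", "self-assessed"]
OUTCOME_TERMS = ["quality of life", "symptom", "functional status",
                 "health status", "patient satisfaction", "unmet need*"]
CANCER_TERMS = ["neoplasm", "cancer"]


def or_group(field, terms):
    """Quote each phrase and OR them together inside one field code."""
    return field + "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(restrict_to_cancer, first_year=2000, last_year=2011):
    """AND the concept groups together and add a publication-year window."""
    parts = [or_group("TITLE-ABS-KEY", PRO_TERMS),
             or_group("TITLE-ABS-KEY", OUTCOME_TERMS)]
    if restrict_to_cancer:
        # Original articles: cancer terms must appear in the key words
        parts.append(or_group("KEY", CANCER_TERMS))
    # Year granularity only: the January/October month limits of the
    # published search would need a separate date filter.
    parts.append(f"PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1}")
    return " AND ".join(parts)


review_query = build_query(restrict_to_cancer=False)   # to find reviews
article_query = build_query(restrict_to_cancer=True)   # to find original articles
print(article_query)
```

Building the query from term lists keeps the review-level and article-level searches identical except for the cancer keyword restriction, which is the only difference the text describes.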

Aim, study selection and endpoints of the review

In this review, the aim was to synthesize the evidence on the impact of routinely collected PROs on patients, providers and health organisations. The frameworks proposed by Greenhalgh and colleagues [25] and by Abernethy and colleagues [40] were adopted to guide our evaluation of the existing literature. Greenhalgh et al. [25] proposed a framework (Figure 1) depicting the mechanisms that link the routine collection of PROs to changes in patient outcomes. The authors suggest that the multilayer mediators (i.e. changes to doctor-patient communication, monitoring treatment responses, detecting unrecognised problems, changes to patient health behaviour, changes to clinicians’ management plans, and improved patient satisfaction) have complex relationships among them. Studies that reveal these relationships may help explain whether and how the underlying mechanisms of routinely collected PROs work to improve the intended outcomes.

Figure 1

A hypothetical framework to understand the impact of routinely collected PROs on patient health outcomes (adapted from Greenhalgh et al. (2005) [25] with permission).

Recently, Abernethy and colleagues [40] have argued that the routine collection of PROs can have an impact beyond the patient level: by addressing the logistics of data linkage, it could ensure that the system grows to accommodate other clinical- and health system-level issues, for example evaluating the comparative effectiveness of treatments, monitoring quality of care, and translating basic science findings into clinical practice (Figure 2). The integration of data systems will fuel rapid learning in cancer care at the national and societal levels (see Figure 2a and b), making many types of research and system learning possible across institutions and health sectors. The benefits and implications of such a rapid learning health care system may include, but are not limited to, strong and effective quality improvement (QI), increased transparency, accountability and public reporting, better health system performance (monitoring, planning, financing, evaluating, responding) and better quality of care.

Figure 2

(a) A data linkage framework; (b) a learning health care system. Adapted from Abernethy et al. (2010) [40] with permission.

Combining both frameworks, we developed a list of outcome indicators (Table 2) against which each eligible study was assessed. To capture not only doctors’ experience with patients after collecting PROs but also that of other health service providers (i.e. nurses, allied health workers), the term ‘patient-provider communication’ was used instead of the ‘doctor-patient communication’ proposed by Greenhalgh et al. [25]. To answer review questions 2 and 3, all possible explicit mediation effects in the included studies were reviewed by examining whether a path analysis, or a mediation analysis using a multiple, staged regression approach, was presented in the paper. To examine potential moderating effects, each study was examined to determine whether it explicitly tested an interaction (moderating) effect or implicitly conducted a subgroup analysis. Possible significant mediating or moderating effects were recorded as part of the review endpoints in Table 3, and our inferences and discussion were based on these results.
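For readers unfamiliar with the staged-regression approach to mediation mentioned above, the sketch below shows its logic on simulated data, using the widely used three-stage sequence associated with Baron and Kenny: regress the outcome on the intervention, the mediator on the intervention, and then the outcome on both. The data, effect sizes and function names are all hypothetical; this is not the analysis used by any included study.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope of y on a single predictor x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_slopes(x, m, y):
    """OLS slopes of y on x and m jointly, from centred normal equations."""
    mx, mm, my = mean(x), mean(m), mean(y)
    xc = [a - mx for a in x]
    mc = [a - mm for a in m]
    yc = [a - my for a in y]
    sxx = sum(a * a for a in xc)
    smm = sum(a * a for a in mc)
    sxm = sum(a * b for a, b in zip(xc, mc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    smy = sum(a * b for a, b in zip(mc, yc))
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det


def staged_mediation(x, m, y):
    """Three staged regressions; returns the path coefficients."""
    c = slope(x, y)                   # stage 1: total effect X -> Y
    a = slope(x, m)                   # stage 2: X -> mediator M
    c_prime, b = two_slopes(x, m, y)  # stage 3: Y on X and M together
    return {"c": c, "a": a, "b": b, "c_prime": c_prime, "indirect": a * b}


# Simulated (not real) data: x = PRO feedback arm (0/1),
# m = communication score, y = outcome score.
random.seed(42)
n = 400
x = [i % 2 for i in range(n)]
m = [0.8 * xi + random.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.1 * xi + random.gauss(0, 1) for xi, mi in zip(x, m)]

result = staged_mediation(x, m, y)
print({k: round(v, 3) for k, v in result.items()})
```

With ordinary least squares on the same sample, the total effect decomposes exactly as c = c′ + a·b, which serves as a quick internal check. A real analysis would also need an uncertainty estimate for the indirect effect (for example, bootstrapping).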

Table 2 Outcome indicators assessed for each eligible study included in the review
Table 3 The characteristics of design and study quality

Inclusion and exclusion criteria

The inclusion criteria were: 1) substantial content presenting empirical evidence on the impact of routinely collected PROs on at least one of the outcomes listed in Table 2; 2) adult cancer patients; 3) an oncologic setting, including inpatient, outpatient and outreach services; and 4) use of a composite PRO. We defined a composite PRO as one that is often based on a well-developed instrument, aims to measure a substantial aspect of the patient’s condition (or treatment), and has at least 4 items. To reflect the demanding and complex nature of evaluating the impact of routinely collected PROs, eligible studies included a variety of designs, including but not limited to randomised controlled trials (RCTs), controlled before-after trials (CBA) and interrupted time series (ITS). ITS designs are longitudinal, with repeated measurements and at least three data points both before and after the intervention point. Surveys and clinical audits were also included if they provided quantitative results relevant to the listed outcomes.
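The screening rules above can be read as a single eligibility predicate. The sketch below encodes them under stated assumptions: the `Study` record and its field names are hypothetical, the design labels are illustrative, and because the definition says composite PROs are only *often* instrument-based, instrument provenance is not enforced.

```python
from dataclasses import dataclass

# Publication types excluded outright by the review (labels are illustrative)
EXCLUDED_TYPES = {"opinion", "theoretical", "historical", "review",
                  "device feasibility"}


@dataclass
class Study:
    """Hypothetical screening record; not the authors' extraction form."""
    design: str                   # e.g. "RCT", "CBA", "ITS", "survey", "audit"
    english: bool
    adult_patients: bool
    oncologic_setting: bool       # inpatient, outpatient or outreach
    reports_table2_outcome: bool
    quantitative_results: bool
    pro_items: int
    pro_measures_substantial_aspect: bool
    its_points_before: int = 0
    its_points_after: int = 0


def is_composite_pro(s: Study) -> bool:
    # Only item count and scope are enforced in this sketch
    return s.pro_measures_substantial_aspect and s.pro_items >= 4


def is_eligible(s: Study) -> bool:
    if s.design in EXCLUDED_TYPES or not s.english:
        return False
    if not s.quantitative_results:    # e.g. purely qualitative studies
        return False
    if s.design == "ITS" and min(s.its_points_before, s.its_points_after) < 3:
        return False                  # ITS needs >= 3 points on each side
    return (s.adult_patients and s.oncologic_setting
            and s.reports_table2_outcome and is_composite_pro(s))


trial = Study("RCT", True, True, True, True, True, pro_items=30,
              pro_measures_substantial_aspect=True)
short_its = Study("ITS", True, True, True, True, True, pro_items=30,
                  pro_measures_substantial_aspect=True,
                  its_points_before=2, its_points_after=6)
print(is_eligible(trial), is_eligible(short_its))
```

In practice several of these fields (for example, whether an aspect is "substantial") are reviewer judgements, so a predicate like this only documents the decision logic rather than replacing dual screening.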

Studies were excluded if they were non-English language articles, opinion or theoretical articles, historical descriptions, review articles, feasibility studies of PRO collection devices, studies of paediatric cancer patients, or qualitative studies with no substantial quantitative results on the review endpoints.

Data extraction and quality assessment

Electronic search results were downloaded into EndNote bibliographic software. Two reviewers (JC, LO) independently screened all titles and abstracts of citations identified by the electronic search, applied the selection criteria to potentially relevant papers, and extracted data from included studies using a standardised form. Any disagreements about which studies to include were resolved by consensus.

All studies were classified in two domains: Domain 1 compared sample characteristics with population-wide characteristics, and Domain 2 focused on study design. The data extraction form was adapted from other review studies using the outcome measures discussed above (see Table 4). For each eligible study, we recorded the lead author, country and jurisdiction, design, sample, outcome measures, PROs used, timing of feedback and intervention, members of the medical teams given feedback, management plans offered to teams, and training (see Table 5). All qualifying studies were listed chronologically with the outcome indicators (see Table 3).

Table 4 The components, rating criteria and symbol, and categories used in summarising the study evidence in the current study
Table 5 The impact and effect sizes of the studies on patients, care providers and organisations*

In Domain 1, each study sample in which PROs were routinely collected was rated on a 4-point scale representing how closely the participants overlapped with the characteristics and needs of the intended study population (1 star = very weakly related; 4 stars = very strongly related). For example, for a study conducted in the US on a sample of lung cancer patients, the degree of overlap between the study sample and US lung cancer patients overall was assessed by considering the study setting, sample size and sampling frame, response rate, loss to follow-up, and characteristics of the study sample. In Domain 2, study design was rated on a 4-point scale, with 1 star indicating the weakest design and 4 stars the strongest. Four stars indicated a randomised trial or experimental study; 3 stars indicated a controlled trial, pre–post trial with control (controlled before–after trial), time series, or observational cohort with multivariable adjustment; 2 stars indicated a pre–post trial without control, observational cohort study without multivariable adjustment, cross-sectional study without multivariable adjustment, analysis of time trends without control, or well-designed qualitative study; and 1 star indicated a case series, other qualitative study, or descriptive survey study.
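The Domain 2 design rating described above is essentially a lookup table. The sketch below encodes it, with category labels paraphrased from the text and a hypothetical function name; Domain 1 is omitted because it rests on reviewer judgement rather than a fixed rule.

```python
# Domain 2 star ratings, transcribed from the rating rules above.
# Labels are paraphrased; the mapping is illustrative, not the authors' tool.
DESIGN_STARS = {
    "randomised trial": 4,
    "experimental study": 4,
    "controlled trial": 3,
    "controlled before-after trial": 3,
    "time series": 3,
    "cohort with multivariable adjustment": 3,
    "pre-post trial without control": 2,
    "cohort without multivariable adjustment": 2,
    "cross-sectional without multivariable adjustment": 2,
    "time trends without control": 2,
    "well-designed qualitative study": 2,
    "case series": 1,
    "other qualitative study": 1,
    "descriptive survey": 1,
}


def domain2_rating(design: str) -> str:
    """Return the star string for a design; unknown designs need manual review."""
    stars = DESIGN_STARS.get(design.strip().lower())
    if stars is None:
        raise ValueError(f"design not covered by the Domain 2 scale: {design!r}")
    return "*" * stars


for d in ["Randomised trial", "controlled before-after trial", "case series"]:
    print(f"{d:<32} {domain2_rating(d)}")
```

Raising an error for an unlisted design, rather than defaulting to a score, keeps ambiguous cases with the reviewers.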

Revised appraisal criteria were adapted from guidelines on the assessment of quality improvement interventions [58, 59]. A global rating was also created using the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) system [60]. The British Medical Journal has recommended the GRADE system since 2006 for grading evidence in submitted clinical guideline articles. It has multiple advantages and is useful for systematic reviews and health technology assessments, as well as for evaluating research on clinical guidelines. The global rating in the current study integrated the Domain 1 and Domain 2 ratings with intervention fidelity (the success of the intervention strategy and the patients’ and providers’ adherence to it), dose–response gradient, precision and validity of outcomes (potential confounding factors and biases), and uncertainty about the direction of results. The global rating had three categories, indicating that a study should carry great (3 checks), moderate (2 checks) or little (1 check) weight when considering the strength of evidence (see Table 4). Any experimental research reported in the manuscript was performed with the approval of an appropriate ethics committee.

No attempt was made to quantitatively synthesize the results as the data were too heterogeneous to support pooling.

Results

The multi-method search strategy yielded 27 publications eligible for inclusion in the review, a substantial increase compared with most recent reviews. A detailed flowchart of the search strategies and their results is presented in Figure 3. The results and conclusions were based on these 27 studies, despite the large number of full-text articles retrieved.

Figure 3

PRISMA Flow Diagram illustrating the systematic review process from electronic searching through to study inclusion.

Of the 27 publications, 16 were randomised controlled trials, 2 were before-after studies and 9 were observational studies; 11 studies were published before 2009. The characteristics and quality of the studies are presented in Table 5, and their impact on the outcome indicators in Table 3. Trowbridge et al. (1997), the only article from the 1990s, was included in two of the previous reviews and is therefore listed in the summary tables for comparison.

Overview of study quality

There has been a marked increase in the volume and quality of the studies published recently in this area. Of the 16 randomised controlled trials included in this review, 7 were published between 2010 and 2011. The quality of studies published since 2010 is also demonstrably improved with much larger sample sizes, including 3 trials [54, 57, 61] with a sample size greater than 200 and 2 trials with a sample size over 580 [51, 56].

However, despite the increased volume and improved quality of the studies, there remains a lack of large cluster randomised controlled trials, a design recommended by Fayers [62], who argues that cluster RCTs are well suited to overcoming the limitations of simple RCTs. System intervention trials, such as routine collection of PROs with feedback to clinicians and systems, are well known to be prone to cross-contamination and to investigator and participant biases. Two recently published studies [54, 57] were continuations of an earlier study by Velikova et al. (2004) [36]. Most studies reviewed did not systematically examine outcomes and mechanisms, and placed more emphasis on processes than on outcome measures [25]. All studies were conducted in a limited setting (often a single centre), restricting the generalisability of the findings.

No studies have adopted a comprehensive theoretical model and framework, despite the repeated demand from leading researchers in the area [25, 63–65]. All studies focused on the patient and health professional level within the clinic setting. No study to date has examined the impact of collecting PROs on health care organisations, health system improvement, quality improvement or population health at a system or societal level.

Overview of study findings

Impact on patient-provider communication

Of the 27 studies included in this review, 4 [39, 47, 51, 53] did not examine or report the effect of a routinely collected PRO on patient-provider communication. Of the 23 studies that did report such an impact, 21 (91.3%) reported a positive effect, including well-designed and well-conducted large RCTs [33, 36, 37, 54, 56, 57]. One study reported no significant improvement in patient-provider communication, possibly because its patients had less severe disease (only 37% received anticancer therapy, reducing the need for treatment-related communication) [38]. Another study, which reported a negative effect, had an already high level of communication at baseline (a ceiling effect leaving little room for further improvement) [34].

Impact on monitoring treatment response

Although most of the 27 included studies did not explicitly state that examining the impact on monitoring treatment response was an objective, 11 of them did report such an outcome (Table 3) [16, 20, 36, 41, 45–51]. All 11 studies found a strong or modest effect of implementing PROs on increased monitoring of treatment response. The strongest effects occurred in studies that focused on monitoring outpatients’ symptoms, side effects and toxicity during and after chemotherapy. In particular, real-time patient-reported symptoms and toxicity (captured through innovative mobile phone-based, web-based or IVR systems) significantly improved the monitoring of treatment response.

Impact on detecting unrecognised problems

Although it seems intuitively plausible that routinely collected PROs give service providers (and patients) better opportunities to detect unrecognised problems through growing awareness and improved communication and monitoring, only 16 of the 27 studies reported results related to the detection of unrecognised problems (Table 3). Of these 16 studies, 15 [16, 20, 33, 36, 37, 39, 41, 43, 45–50, 53] reported either a strong or moderate positive impact on detecting unrecognised problems. However, a study by McLachlan and colleagues [38] found no difference between the intervention and control arms.

Impact on changes to patient health behaviour

No study to date has provided a systematic evaluation on the impact of collecting PROs on changes to patient health behaviour. It is unknown whether and how patient health behaviours have been changed.

Impact on changes to patient management

Of the 17 studies that provided results on changes to patient management, 13 [11, 20, 33, 37, 39, 45, 46, 48–50, 52, 53, 55] reported either a strong or modest positive effect, whilst 4 [34–36, 54] found no such effect. However, it is worth noting that 10 studies did not provide any information about changes to patient management, and when such results were reported, they were often incompletely described.

Impact on patient satisfaction

Of the 16 studies that reported results related to patient satisfaction, 13 [11, 16, 20, 37, 41, 43–46, 48, 49, 52, 54] reported a very strong to moderate positive effect on patient satisfaction. Of the three studies [33, 34, 38] that did not find such an effect, one [33] reported a possible ceiling effect: both the intervention and control groups had very high baseline satisfaction, potentially preventing a significant difference between the two arms from emerging during follow-up.

Impact on health outcomes

Of the 15 studies that reported results related to health outcomes, 13 [20, 35–37, 39, 41–43, 45, 47, 50, 51, 53] reported some positive improvement, ranging from modest to strong, while two [34, 38] found no such effect. Symptoms, side effects and toxicity appear most likely to improve, followed by emotional wellbeing. There is little evidence of improvement in overall HRQOL or in social wellbeing.

Impact on quality improvement, transparency, accountability and public reporting, and on better system performances (monitoring, planning, financing, evaluating, responding)

No study to date has provided either a meaningful, explicit framework or relevant evidence on these endpoints.
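The per-endpoint counts reported in this section can be cross-checked with a few lines of code. The (reported, positive) pairs below are transcribed from the Results text, with "positive" pooling strong, moderate and modest effects; the script only re-derives the not-reported counts and the share of positive findings (for example, 21 of 23, or 91.3%, for communication). The names are hypothetical.

```python
# (studies reporting the endpoint, studies reporting a positive effect),
# transcribed from the Results text; "positive" pools strong, moderate
# and modest effects.
TALLIES = {
    "patient-provider communication": (23, 21),
    "monitoring treatment response": (11, 11),
    "detecting unrecognised problems": (16, 15),
    "changes to patient health behaviour": (0, 0),
    "changes to patient management": (17, 13),
    "patient satisfaction": (16, 13),
    "health outcomes": (15, 13),
}
TOTAL_STUDIES = 27


def summarise(tallies, total):
    """Derive not-reported counts and the positive share for each endpoint."""
    rows = []
    for endpoint, (reported, positive) in tallies.items():
        share = f"{100 * positive / reported:.1f}%" if reported else "n/a"
        rows.append((endpoint, reported, total - reported, positive, share))
    return rows


for name, rep, not_rep, pos, share in summarise(TALLIES, TOTAL_STUDIES):
    print(f"{name:<36} reported={rep:>2} not reported={not_rep:>2} "
          f"positive={pos:>2} ({share})")
```

Laying the counts side by side makes the reporting gap visible: several endpoints were not reported by a third or more of the 27 studies.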

Overall strength and direction of evidence

Overall, there is strong evidence that routinely collected PROs, with feedback, improve patient-provider communication and increase patient satisfaction (Table 6). There is some evidence that they improve the monitoring of treatment responses and the detection of unrecognised problems, and weak but positive evidence that, over time, they lead to changes in patient management. Despite some encouraging results, there is still considerable uncertainty about the impact of routinely collected PROs, with feedback, on patient health outcomes. There is little or no evidence that they have led to significant improvements in quality improvement, transparency, accountability and public reporting, or in system performance at a population health or societal level. Beyond clinical trials and clinical practice, their impact on health services research and population health is largely unknown.

Table 6 The overall strength and direction of evidence

Potential moderating factors and links between routine PRO collection (with feedback) and patient outcomes

Although the evidence is limited, the following appear to be critical links between the intervention and the intended outcomes: routinely collected PROs with sufficiently intensive feedback (multiple times over a sustained period) [13, 39, 44, 54]; feedback targeting multiple stakeholders (doctors, nurses, allied health workers and patients) [35, 52] with simple, clear, graphical and longitudinally meaningful interpretation of the results; and sufficient training for both health professionals and patients [20, 57]. There is also evidence that, for complex issues such as depression and low social functioning, routine screening and feedback may need to be integrated with other strategies, such as decision-making aids, education, clear management plans and clinical pathways including referrals, in order to change patient outcomes [43, 49, 51]. There is preliminary evidence that some impacts of PROs may be more pronounced in subgroups with more severe problems at baseline (e.g. depression, symptoms) [38, 42, 65]. More studies are needed to fully understand these mediating and moderating effects.
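Subgroup-based moderation of the kind described above can be expressed as an interaction contrast: the intervention-minus-control difference in one subgroup minus the same difference in another. The sketch below uses invented change scores purely for illustration, and the function names are hypothetical; a formal test would fit a regression with an arm × subgroup interaction term and report its uncertainty.

```python
def mean(values):
    return sum(values) / len(values)


def arm_effect(arms):
    """Mean change in the intervention arm minus mean change in the control arm."""
    return mean(arms["intervention"]) - mean(arms["control"])


def subgroup_contrast(data, group_a, group_b):
    """Per-subgroup effects plus their difference (the interaction contrast)."""
    effects = {group: arm_effect(arms) for group, arms in data.items()}
    return effects, effects[group_a] - effects[group_b]


# Invented change scores (higher = more improvement); not data from any study.
example = {
    "high baseline severity": {"intervention": [6, 8, 7, 9], "control": [3, 4, 2, 3]},
    "low baseline severity": {"intervention": [2, 3, 2, 1], "control": [2, 1, 2, 3]},
}
effects, interaction = subgroup_contrast(
    example, "high baseline severity", "low baseline severity")
print(effects, interaction)
```

A positive contrast here would mean the intervention helps the high-severity subgroup more, which is the pattern the preliminary evidence suggests; with real data, the contrast is only meaningful together with its confidence interval.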

Discussion

There is very strong evidence supporting the notion that routinely collected PROs with timely feedback enhance patient-provider communication. This finding is consistent with previous reviews conducted in both cancer [26] and non-cancer settings [25, 27, 28]. There is also strong evidence supporting the notion that routinely collected PROs significantly improve the monitoring of treatment response.

There is reasonably strong evidence supporting the notion that routinely collected PROs help identify unrecognised problems in a wide variety of settings. However, even among the studies that reported results related to unrecognised problems, more comprehensive and valid measures need to be developed. Such measures would help clarify the specific impact of PROs on identifying underreported and unrecognised problems for different cancer patients in different settings.

Overall, there is reasonable evidence favouring the hypothesis that implementing a system for routinely collecting PROs brings positive changes to patient management in settings where a patient management plan is integrated with the routine collection of PROs. Simple routine feedback of PROs alone may not be sufficient to improve patient management and outcomes [48]; other resources, such as education, referral services and a detailed patient management plan following the PROs, may be needed [43]. There is also a need to develop better measures of change in patient management, as such change is often complex and difficult to quantify [57].

There is strong evidence supporting the notion that routinely collected PROs with timely feedback significantly enhance patients’ experience and satisfaction. The experience and satisfaction of other stakeholders, such as patients’ family members, caregivers and health professionals, may also have improved without being measured or reported. Future research to further understand stakeholder experience after implementing routine PRO collection would be desirable.

Although the evidence that routinely collected PROs may improve health outcomes is positive, it is weak and needs to be confirmed by better-designed studies covering a larger set of well-developed outcome measures. There is also a need to understand the impact on long-term health outcomes such as survival. Most of the studies included in this review did not focus on health outcomes, and some of the positive improvements occurred only on selected measures. It is not clear how far these improvements can be generalised across different settings.

There is a variety of models for routinely collecting PROs and feeding the data back to different stakeholders. Because cancer patients differ vastly in background, type and stage of cancer, prognosis, treatment and position on the life-course continuum, caution should be exercised when applying the general observations above to any particular setting. For example, recent studies demonstrated a positive impact of routinely collected PROs on symptom control through either web-based or mobile phone-based approaches; however, such positive impacts were less pronounced for HRQOL.

Limitations of the current review

Our review has several limitations. First, no attempt was made to contact authors for potentially unpublished data on the topic, so some grey literature or studies under preparation for publication may have been missed. Second, given the multitude of endpoints and the different types of studies involved, assessing the eligibility of potential studies required some degree of subjective judgement. Third, our application of the GRADE system was rather simplistic, constrained by the large number of endpoints and the variability of the included studies. These limitations may introduce some uncertainty into the synthesis of the results. Fourth, our study followed a systematic review approach that included both experimental trials and quantitative observational studies, but it did not include qualitative studies, which may provide additional insight into the questions raised. This is particularly relevant to questions 2 and 3, as the included studies provided very little quantitative evidence on them. It is important to note that, despite our efforts to base the review endpoints on solid, well-established causal and theoretical frameworks (providing insight into not only whether but also how the introduction of PROs affects patient outcomes), the causal mechanisms and process endpoints included in the current review are by no means exhaustive. Other important causal mechanisms may be better explored using a realist review approach [66].

Conclusions

There is growing evidence supporting the routine collection of PROs to enable better, patient-centred care, especially in cancer settings. There is strong evidence that well-implemented routine collection of PROs enhances patient-provider communication and improves patient satisfaction, and growing evidence that it also improves the monitoring of treatment response and the detection of unrecognised problems. However, the evidence base was weak for its impact on changes to patient management and on health outcomes, and non-existent for changes to patient health behaviour, strong and effective quality improvement, increased transparency, accountability, public reporting and better health care system performance. These evidence gaps require further committed and well-planned research alongside the use of well-accepted PROs. Decision-making agencies are well positioned to leverage the rapid advancement of different PRO models, the application of item response theory and computerised adaptive testing in developing PROs, and the acceptance of such technology by patients and health professionals over the last decade. Real-time, routinely collected PROs will enable the development of a rapid learning health system with the potential to advance our knowledge of drug development, unsurpassed models of cancer patient care and a more patient-centred health care system.