Introduction

With an increasing emphasis on patient-centred care, there is a growing interest in outcome measures most relevant to patients [1,2,3,4,5,6]. Patient-reported outcome measures (PROMs) comprise data collected from individual patients and include an array of outcomes such as symptoms, daily functioning, and health-related quality of life (HRQoL). PROMs are increasingly used in daily clinical practice for individual patient management [7]. Individual PROMs data incorporate the patient’s perspective on their health status and can detect the issues that are most bothersome to the individual patient. Reporting these issues to both patients and clinicians improves patient-physician communication [8,9,10]. This may support shared decision-making, and therefore offers considerable potential to enhance quality of care and clinical outcomes [8, 11]. A second application of PROMs data feedback is the use of aggregated PROMs scores collected in clinical studies or trials to inform patients about treatment harms and benefits [12].

Currently, when PROMs are incorporated within clinical practice, raw or summarized PROMs data are fed back to patients and/or clinicians using different graphic visualization formats [13]. To understand and apply the information during clinical encounters, patients and clinicians must be able to interpret such formats correctly [13]. A previous review by Bantug et al. [13] reported that the majority of patients and clinicians were able to interpret plain or straightforward graphs, and suggested that future research should focus on optimizing graphic visualization strategies. Since the publication of that review in 2016, considerable new evidence on this topic has become available. Moreover, the focus of recent studies has shifted towards the effect of aspects such as score directionality [14,15,16,17] and axis labelling [1, 15] on correct interpretation by patients and clinicians. Furthermore, there is increased attention to guiding the clinical interpretation of PROMs data, e.g. to distinguish severe or clinically relevant symptoms [1, 3,4,5, 14, 18]. For instance, displaying score thresholds and warnings when scores change over time would be helpful in daily practice [1], as would scores from reference populations to which individual scores can be compared [1, 5, 14]. Both facilitate the correct use of scores during clinical encounters. The focus on clinical interpretation has led to the introduction of funnel plots [2, 19], heat maps [4], and icon arrays [3, 15, 18, 20], underlining the relevance of assessing a wider variety of graphic visualization formats.

While worldwide implementation of PROMs data collection in clinical practice keeps progressing [21], new evidence has become available on graphic visualization formats that support the interpretation of PROMs scores by patients and clinicians. This systematic literature review aims to (1) summarize the latest evidence on graphic visualization formats for PROMs in clinical practice, by extracting preferences and interpretation accuracy for patients and clinicians, and (2) investigate how clinically relevant PROMs scores are distinguished, in order to guide the clinical interpretation of PROMs scores during clinical encounters.

Methods

A systematic literature review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [22].

Search strategy

An exhaustive search strategy was developed by a medical librarian (E.W.) experienced in systematic literature searches. The following search terms were included: (“Patient reported outcome (PRO)” OR “Health-Related Quality of Life”) AND “data presentation/display” AND “health communication” AND (“cancer” OR “clinical decision-making”). We focused the search on oncology because we work in an oncological setting and because most studies on this topic have been conducted in oncology, as confirmed by our search and previously shown by Bantug et al. [13]. Literature from January 2000 to July 2020 was searched in MEDLINE (accessed through PubMed), Embase (accessed through the Ovid platform), PsycINFO (accessed through the Ovid platform), and CINAHL. The full search strategy is included in Additional file 1: Table 1. Duplicates were removed using the Bramer method [23]. A forward and backward reference check was performed on all articles in the final selection.

Review procedure

After removing duplicates, two researchers (E.A., I.F.) independently screened the abstracts of potentially relevant studies, eliminating articles according to the predefined inclusion and exclusion criteria (see below). In case of discrepancies, a third researcher (K.d.L.) was consulted. Subsequently, both researchers (E.A., I.F.) independently reviewed the full-text version of each paper for inclusion in the final selection. Disagreements were discussed among the three researchers (E.A., I.F., and K.d.L.) to reach consensus.

Studies were included if they: concerned patients aged 18 years or older treated in clinical practice; addressed communication of individual-level or group-level PROMs data, using graphic visualization formats, to either patients or clinicians; and were published in English. Exclusion criteria were: studies without an English full-text version; systematic literature reviews; Delphi studies; and studies in which PROMs had not been completed by patients, or were not applied or visually presented to patients and/or clinicians.

Analyses

Two researchers (E.A., K.d.L.) independently extracted data from the included articles. The findings of both researchers were compared and verified by a third researcher (I.F.). First, study characteristics were described, including study goal, population, and the type of PROMs data that was visualized. Second, findings about visualization preferences and interpretation accuracy were extracted, as were methods and strategies for guiding the interpretation of scores during clinical encounters. The findings were described for patients and clinicians separately, and a distinction was made between individual PROMs data for use during clinical encounters, and mean or aggregated PROMs data that could be included in treatment decision-making.

Quality assessment

Two researchers (E.A., K.d.L.) independently assessed the methodological quality of the papers and compared their final judgments. The Critical Appraisal Skills Programme (CASP) checklists were used for the methodological assessment of the included papers. CASP enables systematic assessment of the trustworthiness, relevance, and results of published papers through sets of criteria for qualitative studies, randomized controlled trials, and cohort studies [24]. The reviewers scored the papers per criterion with a positive or negative response, or ‘not applicable/unknown’. Studies with a positive score for half or more of the criteria were deemed of sufficient quality [24].
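To make this decision rule concrete, the following minimal Python sketch applies the ‘half or more’ criterion; how ‘not applicable/unknown’ responses are handled (excluded from the denominator here) is our assumption, as the text does not specify it.

```python
def sufficient_quality(ratings: dict) -> bool:
    """Apply the review's CASP rule: sufficient quality when half or more
    of the criteria are scored positive [24]. We assume 'not applicable/
    unknown' responses are excluded from the denominator."""
    applicable = [r for r in ratings.values() if r in ("positive", "negative")]
    positives = applicable.count("positive")
    return bool(applicable) and positives >= len(applicable) / 2

# Example: 5 positive of 9 applicable criteria (one 'n/a') -> sufficient
ratings = {f"criterion_{i}": "positive" for i in range(1, 6)}
ratings.update({f"criterion_{i}": "negative" for i in range(6, 10)})
ratings["criterion_10"] = "not applicable/unknown"
print(sufficient_quality(ratings))  # True
```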

Results

Our search retrieved 1673 studies, of which 414 duplicates were removed. Of the remaining 1259 studies, 1186 were excluded because they did not describe PROMs data visualization (Fig. 1). The full-text articles of the 73 remaining eligible studies were then assessed; 48 of these were excluded based on the inclusion and exclusion criteria, and/or because no full-text version was available (n = 15, all conference abstracts). Ultimately, 25 studies were included in this review for data extraction (Additional file 1: Table 2). After quality assessment following the CASP criteria, all studies had a positive score for half or more of the criteria (Additional file 1: Table 3).

Fig. 1

PRISMA flow chart of literature search and review process

Most studies used either a mixed-methods design (n = 12), including human-centered design, or a qualitative design (n = 9), including interviews (Fig. 2). Sample sizes ranged from 8 (an interview study) to 1017 (a survey study). Studies had been carried out in different clinical domains and in different countries, studying different graphic visualization formats and designs. The majority of studies included participants during or after treatment, whereas nine studies used hypothetical settings. PROMs data formats were either based on individual patient scores presented to patients (n = 17) and/or clinicians (n = 14), or based on mean group-reported data, from for instance clinical trials, presented to patients (n = 10) and clinicians (n = 8). The different graphic visualization formats that were studied are presented in Fig. 3. The results are presented separately for patients and clinicians, subdivided into preferences and interpretation accuracy.

Fig. 2

Studies included in this review were carried out: a using different designs; b in different countries; c investigating different graphic visualization formats*; and d among different clinical domains. RCT: Randomized Controlled Trial; USA: United States of America; UK: United Kingdom. *Studies included multiple formats

Fig. 3

Different graphic visualization formats, presenting the domain of ‘emotional functioning’ as an example. Longitudinal: a line graph, including threshold line; b bar chart, including threshold line; c heat map. Cross-sectional: d icon array; e funnel plot; f pie chart

Individual level PROMs data—patients

An overview of the extracted data for patients on individual and group level is presented in Table 1.

Table 1 Summary of data extraction: visualization strategies and preferences, interpretation accuracy, comparators; use of PRO data on individual and group level, in patients

Preferences

Of the 25 included studies, 12 assessed patients’ preferences. Patients appreciated that their individual PROMs scores were presented visually over time [16, 25]. Six studies reported that patients preferred bar charts over other graphic visualization formats for reporting longitudinal PROMs data [1, 3,4,5,6, 14]. Bar charts were ‘visually clear’ and facilitated ‘appraisal at a glance’, since they are a generally well-known format for presenting data [2]. Five studies reported that line graphs were preferred as well [1, 6, 15, 16, 25]. Patients experienced line graphs as ‘easy to understand’ and ‘quick to retrieve relevant information from’ regarding their symptoms and well-being [1]. Furthermore, line graphs were preferred for overall ease of understanding and usefulness [16].

Interpretation accuracy

Eight studies assessed patients’ interpretation accuracy for visualizations of individual PROMs data. Bar charts were more often interpreted correctly (87.8%) than line graphs (74.3%) [14]. In a study by Geerards et al. [26] among 1386 participants from the general population, 92.9% of participants thought that bar charts were clear, and 82.4% thought bar charts were accurate. In a study by Loth et al. [17] among 40 patients, 90% reported that bar charts were “very easy” or “rather easy” to understand. In contrast, Grossman et al. [27] described that half of their 12 included patients failed to interpret bar charts correctly, and even patients who read them successfully often required multiple attempts to do so; these participants were, however, introduced to bar charts for the first time [27]. One study reported that line graphs could be interpreted accurately by patients, although an undefined proportion of patients needed initial guidance on how to read them [1]. Liu et al. [28] reported that of 25 patient respondents, several did not notice or understand the longitudinal left-to-right nature of a line graph, particularly those with limited health literacy. Furthermore, a few patients misinterpreted a line drawn between two data points as conveying information about the period between visits [28].

Group-level or aggregated PROMs data—patients

The aim of presenting aggregated or mean PROMs scores to patients is to inform them about potential treatment outcomes, in order to support shared decision-making. In the included studies, aggregated PROMs data were mainly derived from clinical studies or RCTs. These were often presented to patients to (hypothetically) inform them about differences in outcomes between two treatments [12, 15, 20, 29,30,31,32,33].

Preferences

Seven studies assessed patients’ preferences for presenting aggregated outcomes over time. Four studies reported that patients preferred line graphs over bar and pie charts [12, 14, 16, 30]. Line graphs were preferred because of their relative simplicity, straightforward layout, and perceived helpfulness [12, 16, 30]. The preference for line graphs decreased when confidence intervals were presented [12]. For presenting proportional data, three studies reported that patients preferred pie charts [18,19,20]. Pie charts were easy to read and allowed information to be obtained quickly [20]. Smith et al. [18] reported that of 39 patients, 55% preferred pie charts, compared to 25% for icon arrays and 20% for bar charts. Van Overveld et al. [19] reported that among 37 patients, pie and bar charts were both preferred.

Interpretation accuracy

Seven studies assessed patients’ interpretation accuracy for aggregated outcomes presented over time. In four studies, patients most often correctly interpreted line graphs [14, 16, 29, 30], with interpretation accuracy ranging from 98 to 100% [16, 30]. Directionality mattered, however: 56% of patients correctly answered questions for “better” formatted lines (i.e. a higher score means better outcomes), compared to 41% for “more” (i.e. a higher score means more symptoms) and 39% for “normed” (i.e. scores compared to reference scores) line graphs [29]. Line graphs showing the results of two treatment options resulted in decisions that reflected adequate comprehension of the information in the graph [14]. One study reported that patients’ accuracy was highest for pie charts and icon arrays, compared to bar charts [20]. Icon arrays were also easy for patients to understand [20].

Individual level PROMs data—clinicians

An overview of the extracted data for clinicians on individual and group level is presented in Table 2.

Table 2 Summary of data extraction: visualization strategies and preferences, interpretation accuracy, comparators; use of PRO data on individual and group level, in clinicians

Preferences

Thirteen studies assessed clinicians’ preferences regarding visualization of PROMs scores from individual patients. In general, clinicians appreciated viewing PROMs scores repeatedly over time, in order to track their patients’ symptom experiences [14,15,16, 25, 35, 36]. Moreover, six studies showed that bar charts were most preferred for plotting longitudinal individual PROMs data [1, 2, 4, 5, 14, 17], and line graphs were preferred in four studies [1, 14,15,16, 29]. However, Brundage et al. [37] and Izard et al. [3] both found that no single approach to longitudinal data was universally appealing or preferred by clinicians. In contrast, a study among 227 health professionals by Kuijpers et al. [4] showed that medical specialists (46%) and nurses (32%) most often preferred heat maps over line graphs and bar charts.

Interpretation accuracy

Thirteen studies assessed clinicians’ interpretation accuracy of graphic visualization formats for individual-level PROMs data. Multiple studies showed that clinicians’ interpretation accuracy was similar across different graphic visualization formats: both line graphs and bar charts were found easy to understand and were interpreted accurately [3, 4, 18]. Hartzler et al. [36] reported that twelve interviewed clinicians generally felt that both line graphs and bar charts provide a useful basis for comparison. Pictographs were reported to be the least helpful for clinicians, although clinicians stated that these could be helpful for patients with limited literacy [3, 6]. Uniformity in the directionality of scores could increase clinicians’ interpretation accuracy across different graph formats. However, Brundage et al. [37] found that adding asterisks for clinical significance and confidence limits around scores did not improve clinicians’ interpretation accuracy.

Group-level or aggregated PROMs data—clinicians

Preferences

Seven studies reported that, when presented with aggregated PROMs data, clinicians mostly compared data between two treatments or compared scores to mean reference population scores. Brundage et al. [16] stated that for comparing treatments, 18 out of 20 clinicians preferred formats displaying multiple time points, with the highest preference for line graphs with normed scores or confidence intervals, or bar charts. Smith et al. [18] and Tolbert et al. [29] described a preference for line graphs, with over 75% of included clinicians preferring this graphical format; bar charts received less support, as only 10% of 40 clinician participants preferred them [18]. Van Overveld et al. [19], however, reported that bar charts were preferred because they were easy to read. For presenting proportional data, pie charts were most preferred in three of eight studies [18, 20, 34]. To meet a variety of preferences, Liu et al. [28] found that a dynamic dashboard gave clinicians the opportunity to customize the formats to their own needs.

Interpretation accuracy

Six studies assessed clinicians’ interpretation accuracy for aggregated data. Line graphs usually contributed to clinicians’ ease of understanding PROMs scores [16, 29]. Additionally, pie charts with proportional data were most often interpreted accurately by clinicians, for example when pie charts presented a proportional change in outcomes compared to baseline [20]. Clinicians were divided on whether p-values, confidence intervals, and normed scores contributed to their interpretation accuracy of aggregated PROMs data [16, 18]. Furthermore, directional consistency, balanced information, feedback, and clear labeling could improve interpretation accuracy [18, 19].

Throughout many of the included studies, challenges were described that may affect the correct interpretation of visualized PROMs data: (1) directional inconsistency, i.e. a higher functioning score means better health, whereas a lower symptom score also means better health [5, 16, 17, 34, 37]; (2) a lack of standardization rules for interpretation and visualization, which may cause inaccurate interpretation [2, 25, 35]; (3) the need to design multiple formats per target group, as no ‘one-size-fits-all’ solution in graphic visualization exists for both patients and clinicians [3, 6, 28]; (4) the timing of providing feedback on PROMs visualization, as this affects the assessment experience [14, 26]; and (5) patients’ opposition to PROMs use in clinical practice [19, 36]. We summarize these challenges and the proposed solutions in Table 3.

Table 3 Challenges and factors for improvement to consider when implementing visual individual PROMs feedback in clinical practice

How to distinguish clinically relevant PROMs scores

Clinically alarming scores were put in perspective by comparing current scores to the patient’s previous scores or to norm population scores. Ten studies described strategies to distinguish clinically relevant scores in practice [3,4,5, 16,17,18, 35]. These studies were conducted internationally among patients with different cancer diagnoses. The most frequently used comparator scores were the patient’s own previous PROMs data [3,4,5, 16,17,18, 35]. Additionally, four studies used a norm population (i.e. patients with the same disease) to determine the clinical relevance of scores [1, 5, 19, 37]. Patients pointed out that the comparison of their own data with scores from ‘patients like them’ (i.e. same sex and age) was most valuable: it puts the patient’s functioning into the perspective of what is regarded as ‘normal’ [5].
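As an illustrative sketch of these comparator strategies (not a method from any included study), the Python function below flags a hypothetical 0–100 functioning score against the patient’s previous score and a norm population; the 10-point change criterion and the one-standard-deviation norm band are invented for illustration and are not validated cut-offs.

```python
def flag_score(current, previous=None, norm_mean=70.0, norm_sd=12.0,
               min_change=10.0):
    """Flag a PROMs domain score (0-100, higher = better) as potentially
    alarming. All thresholds are illustrative, not validated cut-offs."""
    flags = []
    # Comparator 1: the patient's own previous score (most used comparator)
    if previous is not None and previous - current >= min_change:
        flags.append("worsened since previous assessment")
    # Comparator 2: a norm population of 'patients like them'
    if current < norm_mean - norm_sd:
        flags.append("below the norm population range")
    return flags

print(flag_score(current=45, previous=62))
# ['worsened since previous assessment', 'below the norm population range']
```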

Studies showed different methods of applying color to highlight scores: visualizing clinically non-alarming scores in green and clinically alarming scores in red [17]; red and green shading to show undesirable and desirable score ranges, respectively [4, 5, 18, 25]; a background fading from red (bottom) to green (top) [1]; and traffic light colors (i.e. green, orange, red) [2].

Smith et al. [18] reported that 74% of patients and 80% of clinicians preferred green shading of non-alarming scores or red shading of alarming scores. Hildon et al. [2] reported that traffic light colors could improve interpretation accuracy across different graphic visualization formats, since these are universally recognized. Loth et al. [17] reported that 93% of the studied patients correctly interpreted the meaning of traffic light colors. Nonetheless, PROMs visualization strategies must include a detailed explanatory legend of the meaning and interpretation of colors and scores [4].

Other visualization strategies included the use of red circles around important scores, which were interpreted more accurately than green shading [34]. Furthermore, threshold lines across score bars were used to indicate whether scores were better or worse than threshold values; these techniques were favored by 69% of patients and 70% of clinicians [17, 18]. In addition, Snyder et al. [34] reported that a threshold line was significantly more often rated ‘very clear’, and most useful, compared with green shading and red circles. Another study found that a dotted threshold line was preferred over a solid threshold line to indicate alarming scores in bar charts [5]. Lastly, exclamation points can be used to indicate possibly concerning score changes; this was the preferred method for 79% of 39 patients but only 40% of 40 clinicians [18].
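As a sketch only, the matplotlib snippet below combines three of the reported highlighting strategies in one bar chart: green/red coloring of scores [17], a dotted threshold line [5], and a red circle around the alarming score [34]. The scores and the threshold of 50 are hypothetical, not validated clinical values.

```python
import matplotlib.pyplot as plt

# Hypothetical emotional-functioning scores (0-100, higher = better)
visits = ["Jan", "Apr", "Jul", "Oct"]
scores = [72, 65, 38, 55]
threshold = 50  # illustrative cut-off, not a validated clinical threshold

fig, ax = plt.subplots()
# Green for non-alarming and red for alarming scores [17]
ax.bar(visits, scores,
       color=["tab:green" if s >= threshold else "tab:red" for s in scores])
# Dotted threshold line across the score bars [5]
ax.axhline(threshold, linestyle=":", color="black", label="threshold")
# Red circle around the alarming score [34]
for x, s in enumerate(scores):
    if s < threshold:
        ax.plot(x, s, marker="o", markersize=28, markerfacecolor="none",
                markeredgecolor="red", linestyle="none")
ax.set_ylim(0, 100)
ax.set_ylabel("Emotional functioning (0-100, higher = better)")
ax.legend()
plt.show()
```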

Discussion

This systematic review included 25 studies from different healthcare settings throughout Europe, the US, and Canada that reported on patients’ and clinicians’ preferences and interpretation accuracy for the visualization of PROMs scores. Overall, only a limited body of literature, covering few graphic visualization formats, was identified. The findings can be summarized as follows. For individual-level PROMs data, patients and clinicians preferred line graphs and bar charts, which were considered visually clear. Bar charts were most often interpreted correctly by patients, while clinicians had high interpretation accuracy across all graph formats. For group-level PROMs data, pie charts and line graphs were preferred: patients most often interpreted line graphs correctly, and for proportional data, clinicians most often accurately interpreted pie charts. To guide clinical interpretation by distinguishing clinically relevant scores, PROMs scores were most often compared to patients’ previous scores, followed by comparison to mean norm population scores. Correct interpretation can be supported by highlighting patients’ clinically alarming scores with colors, by using threshold lines across score bars or lines, or by circling alarming scores.

Furthermore, we looked into the challenges that may hinder the interpretation of graphic visualization formats. An underlying cause of incorrect interpretation of graphs may be the lack of standardized rules for interpretation; variability exists in score directionality (e.g. higher scores can indicate either better or worse outcomes) and scaling (e.g. scores ranging from 0 to 100 indicating the worst-to-best possible scores, or scores ‘normed’ to a defined population). Meaningful interpretation of PROMs scores is further complicated by the way the statistical and clinical significance of findings (i.e. thresholds to distinguish clinical importance) are addressed [16, 18, 38, 39]. Therefore, effort must be made to present PROMs scores to patients and clinicians more accurately by: improving directional consistency through standard descriptive labels [34, 37]; clear label ratings (i.e. consistent scales ranging from 0 to 100 [37]); detailed information on the meaning of high and low scores [17]; simple iconography [25]; and brief definitions clarifying what the PROMs scores represent [25]. Furthermore, it was suggested to visualize only a limited number of symptoms and to ask patients to prioritize the symptoms they want feedback on [35].

The aforementioned factors for guiding the clinical interpretation of graphic visualization formats of PROMs data were also identified in a Delphi consensus study by Snyder et al. [15]. In that study, a panel of 15 doctor or nurse clinicians, 10 participants who identified as patient or caregiver advocates, 12 researchers, and 6 members of journal editorial boards was asked to review data display issues and give their perspectives in order to develop consensus statements. The authors concluded that graphic visualization formats of PROMs data have enormous potential to promote patient-centred care, but that it is critical that patients and clinicians understand what PROMs scores mean. More specifically, they recommended using exceptionally clear labelling, titling, and other annotations to address potential confusion about the direction of scores, and warned against mixing score directions in a single display. Furthermore, descriptive labels along the y-axis are expected to be helpful for conveying score meaning [15]. The Setting International Standards in Analyzing Patient-Reported Outcomes and Quality of Life Endpoints Data in cancer clinical trials (SISAQOL) guideline [40] reported recommendations on the directionality of scale scores similar to those of Snyder et al. [15]. To enhance clinicians’ interpretation of PROMs scores, it recommended reducing the number of metrics presented (e.g. a maximum of six bars in bar graphs and four lines in line graphs), using coloured arrows (e.g. green for better and red for worse scores), and accompanying more complex displays, such as funnel plots, with a detailed interpretation [40]. This guideline will contribute to the standardization of rules for interpretation and visualization.
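As one possible way to impose the recommended directional consistency and 0–100 scaling (a sketch consistent with the recommendations in [34, 37], not a published algorithm), raw scores can be rescaled so that a higher score always means a better outcome:

```python
def normalize_score(raw, lo, hi, higher_is_better):
    """Rescale a raw PROMs score to a common 0-100 axis where a higher
    score always means a better outcome."""
    pct = (raw - lo) / (hi - lo) * 100
    return pct if higher_is_better else 100 - pct

# A symptom scale (0-4, higher = more symptoms) and a functioning scale
# (0-100, higher = better) mapped onto the same 'higher = better' axis:
print(normalize_score(3, 0, 4, higher_is_better=False))    # 25.0
print(normalize_score(80, 0, 100, higher_is_better=True))  # 80.0
```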

Another suggestion for clarifying PROMs visualization formats is to develop a dynamic dashboard for PROMs feedback [3, 6, 28]. In this way, patients and clinicians can switch between different graphic visualization formats. Options could include turning comparisons with norm or threshold scores on or off, as not all patients may want to see them. Based on the hypothesis that serving individual preferences facilitates interpretation accuracy, this could improve the interpretation of PROMs scores as well. However, implementing a dynamic dashboard comes with its own challenges, such as access to and availability of suitable software and sufficient IT staff to support such a dashboard.
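A minimal sketch of such a dashboard switch, assuming matplotlib as the rendering backend and hypothetical data: the viewer chooses the format, and the norm comparison is drawn only on request.

```python
import matplotlib.pyplot as plt

def render(visits, scores, fmt="line", norm=None):
    """Render one PROMs domain in the viewer's preferred format; the norm
    comparison is optional rather than always shown."""
    fig, ax = plt.subplots()
    if fmt == "line":
        ax.plot(visits, scores, marker="o")
    elif fmt == "bar":
        ax.bar(visits, scores)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    if norm is not None:  # drawn only when the user opts in
        ax.axhline(norm, linestyle="--", label="norm population mean")
        ax.legend()
    ax.set_ylim(0, 100)
    plt.show()

# The same data under two different user choices:
render(["Jan", "Apr", "Jul", "Oct"], [72, 65, 38, 55], fmt="bar")
render(["Jan", "Apr", "Jul", "Oct"], [72, 65, 38, 55], fmt="line", norm=70)
```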

Lastly, the implementation of PROMs hinges on more factors than the visualization of data, starting with the value patients and clinicians perceive in discussing PROMs during clinical encounters. Nine studies included in our review noted that PROMs were perceived as valuable [4, 5, 12, 16, 25,26,27, 35, 36], whereas five studies showed mixed results regarding the usefulness of PROMs [1, 3, 14, 17, 19]. How patients value PROMs may determine their interest in graphic visualization of PROMs, which could have affected the results presented in this review.

This systematic review is limited by the fact that only papers published in English were included; studies in other languages describing locally successful implementations of PROMs data feedback might have been missed. Furthermore, this review makes no distinction between objective interpretation accuracy (i.e. does a person actually interpret scores correctly) and subjective interpretation accuracy (i.e. does a person report interpreting scores correctly). Only two included studies distinguished between these types of interpretation accuracy [4, 17]; for the other studies, based on the description of the study methods, we believe objective interpretation accuracy was assessed. Future research may therefore investigate potential differences between actual and perceived interpretation accuracy.

Another potential bias is that, for the majority of included studies, it remains unclear how questions about preferences and interpretation of the presented visualization format(s) were framed to the study population. In addition, study participants may have had different background knowledge about graphic visualization formats, the presentation of formats, and the content of PROMs data. This could confound our findings about how data and graphs were interpreted, understood, and valued.

Furthermore, this review did not consider different levels of patients’ health literacy, since this information was not presented in the majority of the articles reviewed. Patients with a higher level of health literacy can have different preferences compared to patients with lower levels of health literacy [3]. Some patients with limited health literacy may understand neither the longitudinal left-to-right nature of data nor the temporal connection between different graphical elements [28]. Health literacy should therefore be included as a factor in studies dealing with data interpretation, for example measured with the validated Health Literacy Questionnaire (HLQ) [41]. Furthermore, health literacy can differ by age, gender, and/or education level, and should therefore be studied among these separate groups of patients in relation to PROMs visualization [42, 43].

In conclusion, no graphic visualization format was predominant in terms of preferences or interpretation accuracy for either patients or clinicians. To guide the clinical interpretation of scores during clinical encounters, PROMs scores can be compared to patients’ previous scores, to mean scores from a norm population, or to score thresholds. Furthermore, colors, threshold lines, or circles around alarming scores can visualize the clinical meaning of PROMs scores relative to these comparators. Finally, a detailed clarification of each graph may be essential for accurate interpretation. Visualization strategies should therefore include descriptions of PROMs directionality, as well as standard descriptive labels, brief definitions, and a limited number of symptoms per graph.