Key Points

This is the first study using a representative sample to show that people greatly overestimate the intended frequency of the verbal risk descriptors used to label side effect risk in patient information leaflets, especially when describing mild side effects.

Small changes to the wording used in the verbal risk descriptors will not solve this problem; the issue was the same for the three different forms of wording that we tested.

More radical changes (including abandoning the use of verbal risk descriptors) should be considered.

1 Introduction

In Europe, all medicines prescribed or sold over-the-counter must be accompanied by a comprehensive patient information leaflet (PIL), which is required to present the risk of potential side effects in ‘clear and understandable terms for the patient’ [1]. Over 70% of patients receiving a drug for the first time will read the accompanying PIL [2].

Effectively communicating information about the risk of side effects is difficult [3]. In 1998, European Commission (EC) guidelines suggested grouping adverse effects within a PIL according to five frequency bands and using a verbal label for each one [4]. Side effects could be ‘very common’ (experienced by more than one in ten patients), ‘common’ (up to one in ten), ‘uncommon’ (up to one in 100), ‘rare’ (up to one in 1000), or ‘very rare’ (up to one in 10,000). This way of quantifying the risk of adverse reactions was originally suggested by the Council for International Organizations of Medical Sciences (CIOMS) Working Group III in 1995 [5], and therefore is also applicable outside the European Union. Although not based on any empirical evidence [6], these EC recommended verbal risk descriptors have become widely used. Of the 50 most frequently dispensed medicines in England and Wales in 2012, 76% of the PILs used these verbal risk descriptors [7].

Since the guidelines were published, several studies have suggested that these verbal risk descriptors are problematic [8, 9]. It has been shown that UK students overestimate the risk associated with each of the verbal risk descriptors [10]. In another UK study, only seven out of 180 participants provided probability estimates for the verbal risk descriptors ‘common’ and ‘rare’ that fell within the EU guideline’s frequency range [11]. Studies with patients have found similar overestimations [12, 13], as has research with physicians, pharmacists and lawyers [14, 15]. In part, estimations seem to depend on side effect type, with ‘mild’ side effects given higher estimations than ‘severe’ side effects when described as ‘common’ or ‘rare’ [11].

Since these findings, the guidelines have been updated and suggest PILs should combine the verbal and numerical expressions (e.g. ‘very common, more than 1 in 10 people’) [16]. However, research has shown that this may not lead to more accurate side effect risk estimates than the verbal format [17] and still leads to significant risk overestimations when compared with numerical frequency bands alone [18]. The continued use of the same ‘very common’ to ‘very rare’ wording still appears to present problems for the public, despite the addition of numerical information.

These findings are troubling. If patients systematically overestimate the risk of side effects from their medications, this may reduce their adherence and also increase the risk of symptoms occurring as a result of a nocebo effect [19]. Previous studies have limitations, however, often being based on small samples that are not representative of the general population. No study so far has used a national representative sample. Nor has any study sought to identify whether there are demographic or psychological factors that make individuals more or less likely to correctly estimate the numerical risk represented by these verbal descriptors. Psychological characteristics such as beliefs about medicines, optimism or perceived sensitivity to medicines are associated with medicine side effect expectations [20, 21] and may also affect how people perceive the risk information about medication.

In this study, we used a large cross-sectional survey of a representative sample of 18- to 65-year-olds in England in order to:

  1. Assess how people interpret the risk associated with the EC recommended verbal risk descriptors and two sets of alternative verbal risk descriptors.

  2. Investigate if people interpret the risk associated with verbal risk descriptors differently depending on whether they describe a mild or severe side effect.

  3. Determine whether demographic or psychological factors are associated with correctly interpreting the risk implied by the EC recommended verbal risk descriptors.

2 Methods

2.1 Design

We commissioned the market research company Ipsos MORI to conduct an online survey of adults aged between 18 and 65 years living in England. Data collection took place between 18 March and 1 April 2016. This study was approved by the Psychiatry, Nursing and Midwifery Research Ethics Committee at King’s College London (reference HR-15/16-2104).

The same study was used to assess in detail factors associated with patients’ expectations of side effects conveyed by verbal labels of risk, the results of which have been submitted elsewhere.

2.2 Participants

Ipsos MORI recruited participants from an existing panel of people living in England who were willing to take part in internet surveys (approximately n = 160,000). We excluded people aged over 65 because of a concern that older adults who are members of an internet survey panel are not representative of the general population of older adults [22, 23]. Potential participants were emailed a link to the survey. After providing informed consent and clicking through to begin the survey, participants were allocated to receive questions about either mild or severe side effects. This was decided by the survey software on the basis of which condition had the lowest number of completed responses at that time. Panel participants typically receive points for every survey they complete; for our survey, participants received points equivalent to 75 pence.

2.3 Sample Size

Quotas based on participant age and gender (interlocked), location, and working status were used to ensure that the sample reflected the known demographic profile of adults aged 18–65 in England, according to data from the National Readership Survey [24]. We intended to recruit 1000 participants to provide us with a sample error of about plus or minus 3% at the total sample level.


2.4 Questionnaire Development

Where possible, we used or adapted items that had been previously developed and tested for their reliability and validity. We piloted all items with five members of the general public and rephrased items where necessary to improve clarity.

2.5 Primary Outcome: Verbal Risk Descriptor Probability Estimates

We included several items to assess participant understanding of the verbal risk descriptors. These asked people to consider a PIL for an unnamed drug that stated, for example, that ‘nausea is common’. Items then asked participants to estimate how many out of 10,000 people who take the drug would develop that side effect. We used the same denominator for every item because changing it between questions might have confused participants. We chose a denominator of 10,000 so that participants would never need to give a response of less than one. Our choice to get participants to use numbers is supported by Schwartz and Woloshin [25], who argue that despite concerns that people do not understand the use of numbers, representative populations are competent in using numbers as a decision aid for choosing between two drugs [26].

Participants were asked about either mild side effects (‘headache’ or ‘nausea’) or severe side effects (‘seizure’ or ‘difficulty breathing’) depending on which condition they had been assigned to. We used the EC recommended verbal risk descriptors ‘very common’, ‘common’, ‘uncommon’, ‘rare’ and ‘very rare’ supplemented with the terms ‘very uncommon’ and ‘extremely rare’. We also included other terms used in the risk communication literature [27,28,29], which could be combined into similar scales based on likelihood (very likely, likely, somewhat unlikely, fairly unlikely, very unlikely, extremely unlikely) and chance (very high chance, high chance, fair chance, low chance, very low chance, extremely low chance). Verbal risk descriptors were presented in a random order to participants.

2.6 Demographic Factors

We asked participants about their age, gender, ethnicity, highest level of education, employment status and whether they or anyone in their household had a long-standing illness, disability or infirmity.

2.7 Psychological Factors

We adapted the single-item literacy screener [30] assessing health literacy and asked participants to rate how often they needed help reading PILs, and included one item that asked how often they read PILs when taking a new medication. Both were rated from 1 (‘never’) to 5 (‘always’). We used one item from the Health Anxiety Inventory [31] to assess health anxiety, which asked participants to select one of four statements describing their feelings over the past 6 months. These ranged from 1 ‘I do not worry about my health’ to 4 ‘I spend most of my time worrying about my health’.

We assessed optimism using the Revised Life Orientation Test [32], which consists of six items (plus four filler items), giving a total score from 5 to 30, with higher scores indicating higher optimism. Participant beliefs about medicines were assessed using the overuse and harm general subscales of the Beliefs about Medicines Questionnaire (BMQ) [33], providing scores from 5 to 20 for each subscale, with higher scores indicating higher perceived overuse or harm. Finally, how sensitive participants thought they were to medicines was assessed using the Perceived Sensitivity to Medicines scale [34], giving a score from 5 to 25, with higher scores indicating higher perceived sensitivity.

2.8 Analysis

We grouped side effect frequency estimates given by participants into the same bands as those suggested for use by the EC guidelines [4] and produced histograms to show the frequency with which each band was selected for each verbal descriptor.
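As an illustration of this banding step, the sketch below maps an estimate out of 10,000 onto the five EC frequency bands and tallies the results. The upper three band limits follow the ranges reported in the Results (1001–10,000, 101–1000 and 11–100); the limits for ‘rare’ and ‘very rare’, and the treatment of an estimate of zero, are our assumptions. The function name and the example responses are hypothetical and are not study data.

```python
from collections import Counter

# Upper bound per 10,000 people for each EC band, lowest band first.
EC_BANDS = [
    (1, "very rare"),        # up to 1 in 10,000 (assumed to include 0)
    (10, "rare"),            # up to 1 in 1000 (assumed range 2-10)
    (100, "uncommon"),       # up to 1 in 100
    (1000, "common"),        # up to 1 in 10
    (10000, "very common"),  # more than 1 in 10
]


def ec_band(estimate: float) -> str:
    """Return the EC frequency band for an estimate out of 10,000."""
    if not 0 <= estimate <= 10000:
        raise ValueError("estimate must lie between 0 and 10,000")
    for upper, label in EC_BANDS:
        if estimate <= upper:
            return label
    raise AssertionError("unreachable: estimate already validated")


# Hypothetical responses for a single descriptor (illustrative only).
example_estimates = [5000, 2500, 1000, 300, 50, 5000]
band_counts = Counter(ec_band(e) for e in example_estimates)
print(band_counts)
```

The resulting counts per band are what a histogram for one verbal descriptor would display.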

We carried out a series of Mann–Whitney U tests to test if participants’ median estimates differed between mild and severe side effects.
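For readers unfamiliar with the test, the sketch below computes the Mann–Whitney U statistic directly from its definition: the number of pairs in which a value from the first group exceeds a value from the second, with ties counted as one half. It is a minimal illustration and not the SPSS procedure used in this study; it returns only the statistic, not a p value, and the two groups of estimates are invented for the example.

```python
from typing import Sequence


def mann_whitney_u(xs: Sequence[float], ys: Sequence[float]) -> float:
    """U statistic for xs: count of (x, y) pairs with x > y, ties count 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Invented estimates out of 10,000 (illustrative only, not study data).
mild = [5000, 3000, 2500, 1000]
severe = [1000, 500, 100, 50]

u_mild = mann_whitney_u(mild, severe)
u_severe = mann_whitney_u(severe, mild)
print(u_mild, u_severe)
```

The two U values always sum to the product of the group sizes, so a value of U close to that product (or to zero) indicates that one group's estimates are almost always higher than the other's. In practice a statistics package, for example `scipy.stats.mannwhitneyu`, would be used to obtain the associated p value.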

We used a series of multinomial logistic regressions to test if any demographic or psychological characteristics were associated with participants under-, over- or correctly estimating the numerical risk associated with the EC recommended verbal risk descriptors. Participants’ numerical risk estimates were first recoded as an underestimate, overestimate or correct estimate for each descriptor. For the verbal risk descriptors of very common and very rare (where it is not possible to overestimate or underestimate, respectively), binary logistic regressions were carried out instead. For each of the regressions, all demographic variables and side effect types (mild or severe) were added to the regression in one block, and each psychological variable was added on its own, controlling for the previously entered variables.
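The recoding step described above can be sketched as follows. The band limits for ‘common’ and ‘uncommon’ match the ranges reported in the Results; the limits for ‘rare’ and ‘very rare’ are our assumptions, and the dictionary and function names are hypothetical. The sketch also shows why only two outcomes are possible for ‘very common’ and ‘very rare’, which is the reason binary rather than multinomial models were needed for those descriptors.

```python
# Inclusive (low, high) limits per 10,000 people for each EC descriptor.
EC_RANGES = {
    "very rare": (0, 1),           # assumed to include 0
    "rare": (2, 10),               # assumed
    "uncommon": (11, 100),
    "common": (101, 1000),
    "very common": (1001, 10000),
}


def recode(descriptor: str, estimate: float) -> str:
    """Classify an estimate out of 10,000 against the descriptor's EC band."""
    if not 0 <= estimate <= 10000:
        raise ValueError("estimate must lie between 0 and 10,000")
    low, high = EC_RANGES[descriptor]
    if estimate < low:
        return "underestimate"
    if estimate > high:
        return "overestimate"
    return "correct"


print(recode("common", 2000))       # above the 'common' band
print(recode("uncommon", 5))        # below the 'uncommon' band
print(recode("very common", 9000))  # top band cannot be overestimated
```

Because the top band ends at 10,000 and the bottom band starts at zero, the function can never return ‘overestimate’ for ‘very common’ or ‘underestimate’ for ‘very rare’.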

For all analyses, answers of ‘don’t know’ or ‘prefer not to say’ were excluded. A maximum of 3% of participants answered ‘don’t know’ for any one question where this was an option, and this was 1% for ‘prefer not to say’. All analyses were carried out using SPSS 22. Because the frequency of participants’ side effect numerical estimates for each of the verbal risk descriptors did not change by more than 0.2% when using data weighted by age, gender, region and working status, we used unweighted data for our analyses.

3 Results

3.1 Sample Characteristics

A total of 1003 participants completed the survey and were included in the final sample (see Fig. 1 for response rates). Demographic information for the participants is given in Table 2.

Fig. 1 Participant flow through the survey

3.2 People’s Interpretation of the Verbal Risk Descriptors

Figure 2 shows the frequency of participants’ numerical risk estimates for each verbal descriptor. Two distinct distributions were apparent: those for ‘high risk’ verbal descriptors, which portrayed a side effect as very common, common, very likely, likely, very high chance, high chance, or fair chance, and those for ‘low risk’ verbal descriptors, which portrayed a side effect as uncommon, rare, unlikely and so on. Within these two groups, the distributions were nearly identical regardless of what adjective (e.g. common, likely, chance) or adverb (e.g. very, high, fair) was used.

Fig. 2 Participants’ estimates of the meaning for each verbal risk descriptor: a European Commission recommended verbal descriptors; b likely verbal descriptors; c chance verbal descriptors. *Added in for this study

For the ‘high risk’ verbal descriptors, most participants (84.4% and 81.2%, respectively) thought ‘very common’ and ‘common’ meant a risk of 1001–10,000 per 10,000 patients (i.e. more than one in ten) for mild side effects, with similar percentages seen for ‘very likely’ (83.8%), ‘likely’ (83.2%), ‘very high chance’ (85.2%), ‘high chance’ (84.0%), and ‘fair chance’ (74.7%). This pattern repeated itself for severe side effects, with the majority (56.3–71.2%) of participants giving estimates that corresponded to a risk of 1001–10,000 (more than one in ten). For ‘low risk’ verbal descriptors, the majority (61.8–76.7%) of participants provided estimates of 101–1000 (up to one in ten) or 11–100 (up to one in 100) for mild side effects, with this dropping to 43.5–70.6% for severe side effects.

3.3 The Effect of the Severity of Side Effects on People’s Interpretation of the Verbal Risk Descriptors

Participants’ numerical risk estimates were lower for each verbal descriptor when it described severe side effects compared to mild side effects (all p values <0.001, see Fig. 3).

Fig. 3 Median estimates out of 10,000 given for each verbal risk descriptor: a European Commission recommended verbal descriptors; b likely verbal descriptors; c chance verbal descriptors. *Added in for this study. Bars represent the interquartile range

3.4 The Association of Demographic and Psychological Factors with People’s Numerical Estimates for the EC Recommended Verbal Risk Descriptors

The proportions of participants giving correct, over- or underestimates for the EC recommended verbal risk descriptors are shown in Table 1. Table 2 shows the associations of demographic and psychological variables with participants’ numerical estimates. Older participants and those from ethnic minorities were generally less likely to overestimate the numerical risk of the verbal risk descriptors. Participants with no academic qualifications were 63% less likely than participants with university degrees to give correct estimates for ‘very common’, and 56% less likely than participants with degrees to overestimate ‘common’. Participants who had someone in their household with a long-term illness or disability were 129% more likely than those without to underestimate ‘rare’. By far the most influential factor was whether the descriptor related to mild or severe side effects. In general, mild side effects were more likely to be overestimated than severe side effects.

Table 1 Frequency of how many people provided numerical risk estimates for each EC recommended verbal risk descriptor that were correct or incorrect according to the corresponding EC frequency bands
Table 2 Demographic and psychological factors associated with how well participants guessed the numerical risk estimate for very common, common, uncommon, rare and very rare

Most psychological characteristics had no association with whether participants estimated the numerical risk in accordance with the EC frequency bands. Optimism showed a small but significant effect for two verbal risk descriptors, with each 1-point increase in optimism resulting in participants being 4% less likely to overestimate ‘uncommon’ and 6% less likely to underestimate ‘rare’ descriptors. Belief about the harm of medicines also showed a small but significant effect for the ‘very common’ descriptor, with each 1-point increase in harm score resulting in participants being 6% less likely to give correct estimates. Finally, for each 1-point increase in health illiteracy, participants were 24% less likely to provide a correct estimate for ‘very common’. Similarly, for each 1-point increase in health illiteracy, participants were 27% less likely to overestimate ‘common’ compared with estimating it in accordance with the EC frequency bands.
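The percentages in this section can be read as changes in odds, on the assumption (ours, since the text does not state it) that they were derived from the odds ratios produced by the logistic regressions. Under that assumption, ‘63% less likely’ corresponds to an odds ratio of 0.37 and ‘129% more likely’ to an odds ratio of 2.29. These odds ratios are back-calculated here for illustration and are not quoted from Table 2. The sketch also shows how a per-point effect compounds over several scale points; the function names are hypothetical.

```python
def odds_ratio_from_percent(percent_change: float) -> float:
    """Odds ratio implied by a percentage change in odds (negative = less likely)."""
    return 1.0 + percent_change / 100.0


def percent_change_over_points(odds_ratio: float, points: int) -> float:
    """Percentage change in odds after `points` one-unit increases in a predictor."""
    return (odds_ratio ** points - 1.0) * 100.0


# '63% less likely' and '129% more likely' (percentages taken from the text).
or_education = odds_ratio_from_percent(-63)
or_household = odds_ratio_from_percent(129)

# '24% less likely' per 1-point increase in health illiteracy (from the text).
or_literacy = odds_ratio_from_percent(-24)
two_point_change = percent_change_over_points(or_literacy, 2)

print(round(or_education, 2), round(or_household, 2), round(two_point_change, 1))
```

Because odds ratios multiply, a 2-point increase in the predictor corresponds to roughly a 42% reduction in odds under this reading, rather than 48%.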

3.5 Post-hoc Analyses

As an additional analysis, we tested how much variance could be explained by entering the predictors all together in one model. For predicting correct or incorrect estimates for the verbal risk descriptors ‘very common’ and ‘very rare’ using binary logistic regression, both models were a good fit for the data, with both Hosmer and Lemeshow tests being non-significant (both p values > 0.068). However, using Nagelkerke’s R², the models only explained 10.3% of the variance in estimates for ‘very common’ and 3.1% of the variance in estimates for ‘very rare’. Similarly, for predicting correct, over- or underestimates for the verbal risk descriptors ‘common’, ‘uncommon’ and ‘rare’ using multinomial logistic regressions, the models were a good fit for the data, with all χ² tests being non-significant (all p values > 0.232). Again, however, using Nagelkerke’s R², the models only explained 12.3% of the variance in estimates for ‘common’, 10.6% of the variance in estimates for ‘uncommon’ and 8.7% of the variance for ‘rare’ outcomes.

4 Discussion

4.1 Summary of Main Findings and Interpretation

There are several key findings from our work. First, the verbal risk descriptors recommended by the EC and commonly used in PILs are not accurately interpreted by members of the public as meaning the same things as the associated numerical expression. As shown in previous studies [10, 12, 14, 15], people widely overestimate what they mean in terms of frequencies. This overestimation of risk is important, making people less inclined to take their medication [10, 35] and leading to potentially self-fulfilling expectations of symptoms [36]. This overestimation generalises to other verbal risk descriptors not recommended by the EC; simply tweaking the wording of the verbal risk descriptors in use seems unlikely to resolve this problem. Instead, the issue may be a fundamental mismatch between how we are attempting to communicate (with five different divisions of frequency) and how people understand risk. Our data suggest people view verbal risk descriptors as meaning either ‘likely’ or ‘not likely’—all descriptors are mentally reinterpreted as one of those two versions.

Second, it is hard to establish who will interpret the risk information in accordance with the corresponding EC frequency bands. Even combining all our predictors into one model did little to explain the variance in estimates across the different verbal risk descriptors. Only age and side effect type showed a consistent effect. This follows previous research that has shown that older people have lower perceptions of risk in general [37]. The finding that mild side effects were particularly likely to be overestimated has also been shown previously [11] and may reflect the influence of the availability heuristic [38]; given that people will have had more experience with headaches than seizures, it seems likely that people will find it easier to recall an example of a headache, elevating their perceived likelihood. Ethnicity, education and health illiteracy did have an effect, but only for the higher verbal risk descriptors. White participants, participants with academic qualifications and those who were more health literate were more likely to overestimate these high verbal risk descriptors. It is possible that white participants are more likely to overestimate than participants from ethnic minorities, as it has been shown that they are prescribed more medications [39] and as such may have more experience with medication side effects. It is surprising that participants with academic qualifications and those with higher health literacy were more likely to overestimate, and we are unsure why this might be.

4.2 Implications for Side Effect Reporting Guidelines and Clinical Practice

Verbal risk descriptors have long been favoured for the presentation of side effect risk on the basis that they can break up long lists of side effects into more manageable sections based on frequency and convey the uncertainty of risk, and because some people are more comfortable with verbal than numerical information [40]. Current guidelines suggest PILs should combine the verbal risk descriptors with the numerical expressions [16]; however, we argue this is not enough, as it still leads to overestimation [17, 18]. If combined expressions are used, it remains important to use the correct verbal risk descriptor that is interpreted by people in the same way as the numerical expression that is associated with it. However, our survey has shown that verbal risk descriptors as a whole mislead rather than inform, leading readers to greatly overestimate their risk of side effects. We suggest that PILs should abandon the use of these verbal risk descriptors and instead side effect risk should be grouped under numerical frequency bands only. As well as having implications for PILs, the results of this survey also point out the need for clinical practitioners to reassure patients that side effects are much less likely than patients think. In addition, as mild side effects were overestimated more than severe, we suggest that practitioners may wish to focus in particular on correcting misperceptions about the likelihood of mild side effects.

4.3 Strengths and Weaknesses, and Future Research

This study is strengthened by its large sample size and the fact that it was demographically representative of 18- to 65-year-olds in the English population. While it is possible to question the validity of the data as it is unknown if online participants read the questions properly or if they were distracted with other tasks whilst completing the study [41], this issue may not be as big as suspected [42], and was partly offset by our exclusion of participants for ‘straightlining’ or ‘speeding’. It is limited, however, in terms of selection bias, as we do not know whether members of market research panels are psychologically representative of the general population in terms of attitudes to medicines and their risk of side effects.

It is possible that the finding that psychological variables poorly predict participants’ estimates might be due to a lack of quality in the measures used to capture these variables. This is unlikely for optimism, belief about medicines, and perceived sensitivity to medicines, which were measured using well validated scales; however, health anxiety, health literacy and PIL reading behaviour were assessed by modifying validated scales and creating bespoke items for this study.

The response mode we chose for participants when estimating the numerical risk of the verbal risk descriptors also could have affected our results. Participants were asked to give a number out of 10,000; however, past research has suggested that open-ended questions such as this are more susceptible to risk overestimation compared with questions that require selecting an answer from a few different response options [43]. We chose this method to make it easier for participants to express small probabilities, and to allow participants to give their exact thoughts rather than having to choose from select options covering a broad range of answers. Nevertheless, we would be interested in future research to see how the results differ comparing these different types of response options for estimating the numerical risk of verbal descriptors.

In addition, many of the questions used in the survey were hypothetical, e.g. estimating risk of side effects to an imagined drug. Future research should replicate this study with patients given a newly prescribed medication to remove any limitations relating to the hypothetical scenario used in our survey. We excluded people aged over 65 because of concerns about how representative they are in online surveys. However, people aged over 65 are the heaviest medication consumers [44]; therefore, extension of our findings to this age group would be useful. Research should also examine whether use of numerical, rather than verbal, descriptors produces more realistic risk estimates among participants. As with verbal risk descriptors, different numerical formats are possible (e.g. reframing the risk in terms of the number/proportion of people who remain side effect free). Identifying the best way of presenting this information remains an important goal.

5 Conclusion

Members of the public commonly overestimate the risk associated with verbal risk descriptors. It may be difficult, if not impossible, to find the perfect verbal risk descriptors that are interpreted by the public in line with the different levels of side effect risk. It may be that PILs should abandon the use of verbal risk descriptors altogether. This will limit the opportunity people have to overestimate the likelihood of side effects, allowing patients to make informed decisions about their medication and reducing the occurrence of side effects brought on from negative expectations, e.g. due to the nocebo effect.