1 Background

Health Canada is the federal department responsible for maintaining and improving the health of Canadians [1]. Its mission is to make Canada one of the healthiest countries in the world through evidence-based decision making, public consultations, and risk communications, and by encouraging Canadians to take an active role in their personal health [1]. Part of Health Canada’s responsibility is to identify, assess, and communicate safety information to Canadians using a variety of risk communication tools. Specific tools are selected according to the content, urgency, target audience, and developer of the risk communication [2]. These tools include Dear Healthcare Professional Letters, Notices to Hospitals, the Canadian Adverse Reaction Newsletter, Fact Sheets, Product Monographs, Recall Notices, Public Communications, Information Updates, It’s Your Health Publications, Foreign Product Alerts, and Public Advisories (PAs). PAs are a particularly important risk communication tool used for urgent and high-risk issues [2]. The importance of PAs is highlighted in their definition: “to inform the public of possible serious health hazards and enable Canadians to make informed decisions concerning the continued use of marketed health products” [2].

A PA template revision occurred in 2010 with the goal of improving the quality and accessibility of risk communications for the public. The revisions aligned Health Canada’s PAs with those of international regulators and attempted to clarify information through several mechanisms. The new template clearly identified health risks, actions to address the identified risks, and ways that Canadians could protect themselves (see Electronic Supplementary Material I for original and revised PA examples). These changes were recommended by several external drivers (e.g., the Office of the Auditor General of Canada and the Carlin Jury recommendations) and endorsed by Health Canada’s Expert Advisory Committee on the Vigilance of Health Products [3, 4]. The revised template shifted from a media-based “press release” format to a patient-directed “question–answer” style format and included the use of prioritized message order, boxed text, visual cues, and key bullets. Although not part of the template revisions themselves, easy-to-read titles and a reading grade level of 6–8 were targeted for all public communications, as per recommendations made in Health Canada’s Clear Writing Guide. Taken together, these revisions attempted to improve the way Health Canada communicated risk information to the general public.

Communicating health risk information is a key part of risk management and public health education [5, 6]. Health Canada’s risk communications are written with the assumption that readers are able to understand and make use of the information being provided, which is typically determined by the health literacy level of the reader. Health literacy is a person’s “ability to access, understand, evaluate and communicate information to promote, maintain and improve health in various life-course settings” [7]. Skills that contribute to health literacy include reading, writing, listening, speaking, numeracy, critical analysis, and communication and interaction skills [8]. Health literacy extends beyond general literacy, as it requires the reader to understand concepts related to science and medicine [8]. This poses a challenge in providing useful, evidence-based risk information while engaging Canadians through text they can understand.

The field of health literacy has experienced significant innovation throughout the past 25 years and continues to be a topic of interest as an important contributor to overall health [9, 10]. Individuals with low health literacy are more likely to mismanage chronic illness, use preventative services less often, and have poorer health in general [11, 12]. Studies have shown that the link between health literacy and population health may have direct implications on healthcare spending and patient decision making [13]. Yet nearly 60 % of Canadians aged 16 years and older do not have the minimum health literacy levels needed to fully understand the health information they receive [13]. This is even more pronounced in Canadian seniors, more than 80 % of whom have poor health literacy levels [13, 14]. Previous internal work found that PAs were written at a graduate reading level, well above the average health literacy level of the general public.

Although we provide a general definition of health literacy, the literature has not yet reached consensus on how to define and measure it [15]. Furthermore, health literacy is often inferred from tests that measure the ‘health literacy burden’ of materials (i.e., how difficult materials would be for an individual with a particular health literacy level) rather than by direct means, such as focus groups [16, 17]. This has led to a variety of tools being used to measure the health literacy burden, many of which vary in validity and applicability [18–20].

For this study, two types of health literacy tools were used to compare PAs: readability formulas and suitability assessment of materials (SAM) tests. These two tools were selected for three key reasons. First, readability and SAM tests were found to be very common in the literature as a means to evaluate the clarity or health literacy burden of printed and/or computer-viewed materials [21–23]. Second, the tools were inexpensive, easy to use, and not resource intensive, an important consideration during times of limited government spending [24]. Finally, they provided a systematic, reliable way to evaluate health literacy concepts in written and visual materials [24, 25].

Although readability and SAM tests overlap in what they measure, the two tools differ in what data they can provide. Readability tests use mathematical formulas to measure word length, number of syllables per word, number of words per sentence, and number of sentences per paragraph [26]. They provide an objective score that loosely translates into what school grade equivalent would be needed for an individual to read and understand the text [26]. For example, a text that scored 6.0 would generally be appropriate for a grade 6 student in elementary school.
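To make the mechanics concrete, the following minimal Python sketch computes the Flesch-Kincaid Grade Level, one widely used grade-equivalent formula. It is offered for illustration only and is not the implementation used by the readability software in this study; in particular, the syllable counter is a deliberately crude heuristic.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels,
    treating a trailing silent 'e' as non-syllabic."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    if word.lower().endswith("e") and len(groups) > 1:
        return len(groups) - 1
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

sample = ("Consult your healthcare practitioner if you have used "
          "any of these products and are concerned about your health.")
print(f"Approximate grade level: {flesch_kincaid_grade(sample):.1f}")
```

Note how the formula rewards short sentences and short words: lengthening either the average sentence or the average word pushes the grade equivalent upward, regardless of how clearly the ideas are expressed.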

Readability software was compared with a test that takes additional health literacy factors into consideration. The SAM test considers content, literacy demand, graphics, layout, font style, stimulatory factors, and motivational cues when evaluating texts [27]. The SAM test is subjective but has undergone extensive validation across various cultures supporting it as a reflection of how individuals with low health literacy would judge materials [27]. Members of the Johns Hopkins School of Medicine, the University of North Carolina School of Public Health, and the Veterans Affairs Hospital contributed to determining whether SAM scores can measure how clear written text is to a patient with low health literacy [27]. Although not validated by Health Canada, studies on risk communication tools support the validity of SAM tests in this context [28, 29].

The goal of this study was to compare PAs written using the original (Pre-format) and the revised (Post-format) template through readability and SAM tests. Additionally, the tests themselves were evaluated for their usefulness and applicability in a regulatory setting.

2 Materials and Methods

This retrospective study collected PAs from Health Canada’s website (http://www.hc-sc.gc.ca/dhp-mps/medeff/advisories-avis/index-eng.php). Only PAs written for marketed health products posted between 3 May 2009 and 4 May 2011 were considered for this study. A “health product” was defined as a pharmaceutical, biologic, natural health product, or medical device. A total of 92 PAs were originally collected. Non-English PAs were then excluded, leaving 46 PAs for analysis (14 “Pre-format change” and 32 “Post-format change”).

2.1 Readability Tests

Assessment of readability was performed using Readability Studio 2009, version 3.2.7.0 (Oleander Software, Vandalia, OH, USA). PA text was entered into the software by the evaluators. Non-body text, such as the advisory number, date, and the “for immediate release” disclaimer, was not included. For “Post-format change” samples, the “Related Health Canada Web Content” section was also omitted, since it is not directly part of the risk communication. All graphics, tables, corresponding titles, and captions were excluded from the readability tests. Bullet points were converted automatically by the software into sentences.
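As a rough illustration of these preprocessing rules, the following Python sketch drops non-body lines and converts bullet points into stand-alone sentences. The boilerplate patterns are assumptions for demonstration purposes and do not reflect the actual behavior of Readability Studio.

```python
import re

# Hypothetical non-body patterns (advisory number, date line, disclaimer);
# real PAs may format these differently.
BOILERPLATE = re.compile(
    r"^(advisory\s+(no\.|number)|\d{1,2}\s+\w+\s+\d{4}$|for immediate release)",
    re.IGNORECASE,
)

def prepare_for_scoring(raw: str) -> str:
    """Drop non-body lines and convert bullet points into stand-alone
    sentences, roughly mirroring the study's preprocessing."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or BOILERPLATE.match(line):
            continue
        if line.startswith(("-", "*", "•")):              # bullet point
            line = line.lstrip("-*• ").rstrip(".") + "."  # make it a sentence
        kept.append(line)
    return " ".join(kept)

raw_pa = """For immediate release
3 May 2009
Do not use product X if you are pregnant.
- Consult your healthcare practitioner
- Report any adverse reactions to Health Canada"""
print(prepare_for_scoring(raw_pa))
```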

Seven different readability tests were performed on each PA and then compared with each other to determine school grade equivalents. An average of all seven readability tests was also obtained for each PA. These tests show the education level that would be required to understand the text, with results expressed in a school grade equivalent. Table 1 provides a list of the tests used, the applicable scoring ranges, and how these scores translate into school grade equivalents.

Table 1 Readability test score ranges and associated school grade equivalency [30–35]

2.2 Suitability Assessment of Materials (SAM) Tests

Suitability assessment of materials testing was independently performed on each PA by three different evaluators. PAs were printed and assessed in this format. The method required evaluators to score document elements as 0 (“inadequate”), 1 (“adequate”), or 2 (“superior”) within five categories (Table 2). Element scores were summed and divided by the total possible SAM score to create a percentage. Percentages were interpreted as follows: 70–100 % indicated superior material suitable for low health literacy individuals; 40–69 % indicated adequate material that may or may not be understood by low health literacy individuals; and 0–39 % indicated unsuitable material that would not be understood by low health literacy individuals. For more details on how to score materials, an explanation of each category and element, and the construct validation of the SAM, refer to Doak et al. [27].
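The following minimal Python sketch illustrates the scoring arithmetic described above. The element names follow the adapted test (Table 2), but the scores themselves are hypothetical and serve only to show the calculation; omitting a non-applicable element also removes it from the total possible score.

```python
# Hypothetical element scores for a single PA
# (0 = "inadequate", 1 = "adequate", 2 = "superior").
element_scores = {
    "Purpose": 2,
    "Content Topics": 2,
    "Scope": 1,
    "Writing Style": 1,
    "Use of Learning Aids": 0,
    "Relevance": 1,
    "Captions": 0,
    "Layout": 1,
    "Subheadings": 2,
    "Typography": 1,
}

def sam_percentage(scores: dict) -> float:
    """Sum the element scores and divide by the total possible (2 per element)."""
    return 100.0 * sum(scores.values()) / (2 * len(scores))

def suitability_band(pct: float) -> str:
    """Map a SAM percentage onto the suitability bands used in this study."""
    if pct >= 70:
        return "superior: suitable for low health literacy individuals"
    if pct >= 40:
        return "adequate: may or may not be understood"
    return "not suitable: would not be understood"

pct = sam_percentage(element_scores)
print(f"SAM score: {pct:.0f} % ({suitability_band(pct)})")
```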

Table 2 The adapted suitability assessment of materials test, including categories and elements, used in this study [27]

Since the SAM was originally designed for patient education print materials, not all categories applied to the assessment of PAs. As a result, a modified SAM test was performed on all PAs in the study, with certain categories/elements removed because they were irrelevant or not applicable. Summary reviews, cover graphics, and lists/tables were not included because PAs do not typically contain those items. “Interaction Used” was removed because PAs are not designed to work through an interactive process. Lastly, “Cultural Appropriateness” was removed because it was outside the scope of this study.

2.3 Statistical Analysis

Statistical analyses were performed with GraphPad Prism 5 (GraphPad Software, La Jolla, CA, USA). The Mann-Whitney U test was used to compare “Pre-format change” and “Post-format change” PAs and assess the significance of observed differences. Differences were considered statistically significant at p < 0.05. All values are expressed as the mean ± standard error of the mean.
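For illustration, the following Python sketch performs an equivalent analysis using SciPy in place of GraphPad Prism. The score vectors are fabricated placeholders, not study data (the actual scores appear in Electronic Supplementary Material II).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Fabricated placeholder SAM percentages, not study data.
pre = rng.normal(51, 5, size=14)   # "Pre-format change", n = 14
post = rng.normal(69, 5, size=32)  # "Post-format change", n = 32

# Two-sided Mann-Whitney U test; significance threshold p < 0.05.
u_stat, p_value = stats.mannwhitneyu(pre, post, alternative="two-sided")

# Values reported as mean +/- standard error of the mean.
for label, x in (("Pre", pre), ("Post", post)):
    print(f"{label}: {x.mean():.1f} +/- {stats.sem(x):.1f}")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3g}")
print("Significant at p < 0.05" if p_value < 0.05 else "Not significant")
```

The Mann-Whitney U test is a reasonable choice here because grade levels and SAM percentages are ordinal-like scores with no guarantee of normality, and the two groups are of unequal size.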

3 Results

3.1 Readability Tests

The results in Fig. 1 show the average readability score, in grade levels, of PAs developed with either the original (“Pre-format change”) or revised (“Post-format change”) template. The majority of PAs fell into the grade 13–15 range regardless of the template used. Furthermore, there was little to no difference observed between readability tests (see Electronic Supplementary Material II for readability and SAM scores).

Fig. 1 Comparison of the average readability results using Public Advisories “Pre-format change” (n = 14) versus “Post-format change” (n = 32)

3.2 SAM Tests

The results in Fig. 2 show the average SAM score for PAs using the original or revised template. On average, PAs written using the original template scored adequately, at 51 %; PAs written using the revised template also scored adequately, at 69 %. The average increase of 18 percentage points was statistically significant (p < 0.001). PAs were also grouped by similar products and drug classes, but no differences in scoring trends were observed.

Fig. 2 Comparison of the average suitability assessment of materials results using Public Advisories “Pre-format change” (n = 14) versus “Post-format change” (n = 32). Values are mean ± standard error of the mean. SAM suitability assessment of materials

Individual categories in the SAM test were analyzed and, as seen in Fig. 3, three improved significantly when the revised template was used: “Literacy Demand,” “Graphics,” and “Layout and Typography” increased by 25.7 (p < 0.001), 19.7 (p = 0.020), and 27.8 (p < 0.001) percentage points, respectively.

Fig. 3 Comparison of the average suitability assessment of materials results, by category, using Public Advisories “Pre-format change” (n = 14) versus “Post-format change” (n = 32). Values are mean ± standard error of the mean. C Content, G Graphics, L Literacy Demand, L&M Learning Stimulation and Motivation, L&T Layout and Typography, SAM suitability assessment of materials

An analysis of elements within the aforementioned categories indicated select improvements (Table 3). Two “Literacy Demand” factors, namely “Writing Style” (p < 0.001) and “Use of Learning Aids” (p < 0.001), improved significantly after the template was revised. In the “Graphics” category, “Captions” significantly improved (p < 0.001) from inadequate to adequate. “Layout” and “Subheadings” improved significantly (p < 0.001) in the “Layout and Typography” category, while “Typography” decreased significantly (p = 0.042). The use of subheadings improved from inadequate to superior. None of the elements under “Content” or “Learning Stimulation” changed significantly.

Table 3 Comparison of the average suitability assessment of materials results, by category and element, of public advisory “Pre-format change” (n = 14) versus “Post-format change” (n = 32). Values are mean ± standard error of the mean

4 Discussion

This study compared PAs, before and after a template revision, using two different health literacy tools: readability and SAM tests. Readability tests are objective and provide a quantitative assessment of the text, limiting subjectivity. They tend to be crude, however, as they give an idea of text difficulty without taking the document as a whole into account [25]. Text layout, organization of information, and pictures are completely ignored, even though they may be important in reducing the health literacy burden for the reader [25]. The SAM test, on the other hand, is subjective in nature and assesses text while considering many factors omitted in a readability test [27]. Although it can assess whether text is adequate for low health literacy individuals, it is more prone to subjectivity and bias [27]. The results of each test are discussed below.

4.1 Readability Tests

The average reading grade level for PAs did not change with the template revision. The readability test results for PAs remained at a grade 13–15 level (i.e., requiring a college or university education to understand) after implementation of the revised template. These results were consistent among all seven readability tests used and demonstrated that no obvious advantage exists in using one method over another in terms of sensitivity and specificity. Although these findings are not surprising, given the limited impact the template change had on content development, this result highlights the need for further attention to how content is written and developed for PAs.

Readability tests were inexpensive and not resource intensive but were limited, overall, in their usefulness and applicability in a regulatory setting. The limitations were intrinsic: readability tests use mathematical formulas that account only for factors such as the number of words per sentence, syllables per word, and so on [36]. As such, readability scores need to be scrutinized when used alone to avoid misinterpretation: shortening words and sentences does not necessarily make text easier to understand; people do not process text the same way a computer does; and readability formulas do not capture other important parts of the health literacy burden [36]. As mentioned earlier, many factors affect the complexity of scientific and medical literature; therefore, caution is warranted when using readability tests as a standalone measure, and they are best combined with more robust tests [36].

4.2 SAM Tests

SAM tests consider a number of relevant factors, such as presentation, context, and the use of images, to measure the difficulty of a given text [27]. Although only capable of providing an estimate of the health literacy burden, SAM tests consider a greater array of health literacy factors than readability tests. Prioritized message order, boxed text, visual cues, and other factors contributed to better SAM scores in the revised PAs. PAs using the original format typically scored poorly (below 60 %) in many of the SAM categories. The overall “Pre-format change” average was ranked “Adequate,” but near the low end of the scale, at 51 % (Fig. 2). PAs using the revised template showed a significant improvement, with the overall average increasing by 18 percentage points (p < 0.001) and shifting towards the high end of the adequacy scale, at 69 %. This was only 1 percentage point short of an average score of “Superior,” indicating that most materials were nearly suitable for low health literacy individuals.

Improvements in “Literacy Demand” were due to the use of an active voice, adoption of a more conversational style of writing, and the addition of learning aids. Greater use of the active voice in “Post-format change” PAs was apparent throughout, particularly in the “What You Should Do” section. For example, “Pre-format change” PAs would recommend contacting a healthcare professional in the following manner: “Consumers who have purchased ‘product X’ are advised not to use the product and to consult with a medical professional if they have used the product and have concerns about their health.” “Post-format change” PAs, however, would state: “Consult your healthcare practitioner if you have used any of these products and are concerned about your health.” This section used an imperative tone and opened with action verbs, such as “Consult,” “Read,” and “Report.”

Other improvements included the use of “road signs,” or headers, which added structure and allowed the reader to better sort the information. Improved sentence structure and a more deliberate use of context ensured that important health-related information was more visible than in previous PAs. Improvements in context were important but must be considered in relation to other elements. DeWalt et al. [37] reported that risk communication providers sometimes believe that context alone dictates the readability and usability of a document. In reality, context is only one component of a clear risk communication and cannot solely determine how well the information will be understood by the end user. For this reason, DeWalt et al. [37] created a toolkit designed to address health literacy-based barriers in a variety of ways without over-relying on context.

Another category that significantly improved was “Graphics,” which scored poorly with “Pre-format change” PAs in two areas: “Relevance” and “Captions.” “Relevance” was inadequate because PAs generally failed to illustrate key points visually or contained visual distractions. “Captions” were rarely included or failed to provide the reader with a quick reference to the graphic. Although this category improved significantly (p = 0.020), from not suitable to adequate, the failure to reach a superior score indicates that PAs did not fully capitalize on the potential for using graphics effectively. The “Post-format change” PAs did, however, use pictures, tables/charts, and other visual aids more often. Most of the PAs using the revised template included a photograph of the particular health product along with a short caption (typically the name of the product). These photographs were meant to be simple and provide readers with a visual aid to facilitate product recognition. The use of images has been shown to improve attention to and recall of health material, thus playing a significant role in reducing the health literacy burden of information [38].

The “Layout and Typography” category experienced the greatest increase, as the revised template focused mainly on format elements such as font, layout, subheadings, and “chunking.” Font was standardized, illustrations were added in a logical sequence, and colored boxes were used to highlight important text and divide it, along with headers, into easy-to-read sections. “Layout” and “Subheadings or ‘Chunking’” showed significant increases in SAM scores after the template revision was implemented, making this the category with the largest impact on improving SAM scores. Interestingly, “Typography” decreased significantly (although only marginally in score). This was most likely attributable to printer settings when the PAs were produced for analysis, as evaluators noted that font sizes were smaller for several revised PAs even though the original source material was standardized for type size.

The “Content” category of the PAs remained unchanged with the template revision. This result was not surprising given that there was no change in the need for risk communications, the type of information communicated, or the scope of the PAs’ objectives. As such, the SAM scores for “Purpose,” “Content Topics,” and “Scope” remained similar between “Pre-format change” and “Post-format change” PAs. The overall score in the “Content” category remained superior, but this does not preclude further improvements in future PAs. Including the purpose directly in the title, tailoring the scope of the information to the target audience, and providing a short summary at the end of the document could improve SAM scores for the “Content” section.

Similarly, the SAM score for the “Learning Stimulation and Motivation” category remained unchanged after the template revisions. This result was also not surprising since the format change did not focus on adding desired behaviors or motivational points. Although this category scored in the superior range for both “Pre-format change” and “Post-format change” PAs, this was likely due to how information was presented and not because of interactive components. As other media (e.g., social media) become more prevalent in the risk communication process, this category may need to be studied further to determine how best to capitalize on elements related to “Learning Stimulation and Motivation.”

Overall, the SAM test emerged as a useful and applicable tool for evaluating health product risk communications in a regulatory setting. The tool was inexpensive and provided a more robust analysis of PAs before and after a template revision. The results also highlighted the impact the PA template had on SAM scores, providing targets for further improvement.

4.3 Limitations

There were several limitations to this study that the authors would like to acknowledge. Cultural analysis was omitted from the SAM test because ensuring that risk communications issued by Health Canada are sensitive and motivating to Canada’s broad range of ethnic groups was considered outside the scope and resources of this study. Further study in this regard would add another dimension to the findings in terms of how readers from various cultural and linguistic backgrounds may absorb the information relayed by PAs.

A similar analysis of French PAs would undoubtedly make the study more generalizable. Given that Canada’s two official languages are English and French, future studies of French PAs would provide more insight into their health literacy burden.

As previously stated, the SAM test is subjective by nature, which can lead to significant bias in the end results. This subjectivity can negatively impact inter-rater reliability, since evaluators may interpret elements differently. The use of more than one evaluator is recommended to reduce the potential for bias; however, all evaluators should discuss the relevance of test elements and how scoring will be conducted before testing begins. For example, agreeing in advance on what text will be included, what counts as a table versus a list or picture, and how readability will be measured can help improve consistency among evaluators’ results.

Finally, assessment of comprehension by means of public consultations was not performed as part of this study. Although conducting public consultations and focus groups would vastly improve understanding of the use and comprehensibility of PAs, these measures are resource intensive. The SAM was designed and validated with this in mind and attempts to gather consultation-like data in its assessment of health information. Furthermore, the SAM test measures the health literacy burden, which can be used to infer how clear the material will be for low health literacy individuals. That being said, the SAM results should be supported with consultations, if resources are available, to measure how clear a health product risk communication is to the target audience.

5 Conclusion

Implementation of a revised PA template reduced the health literacy burden of Health Canada’s PAs as measured by SAM tests; however, the revisions did not improve readability. The SAM test, as described in this study, is a useful, applicable tool in a regulatory setting. The findings are important to drug regulators who communicate risk information pertaining to health products, but should also be considered by industry and public sectors as a best practice for measuring the health literacy burden of risk communications.

Future research should focus on supporting data obtained from SAM tests with public consultations (such as focus groups) to more accurately validate this test as a measure of risk communication effectiveness. In addition, studies should investigate whether the SAM test is an informative tool for evaluating the effectiveness of French risk communications from the standpoint of clarity and readability.