This study compared PAs, before and after a template revision, using two different health literacy tools: readability and SAM tests. Readability tests are objective and provide a quantitative assessment of the text, limiting subjectivity. They tend to be crude, however, as they give an idea of text difficulty without taking the entire document into account. Text layout, organization of information, and pictures are completely ignored, even though they may be important in reducing the health literacy burden for the reader. The SAM test, on the other hand, is subjective in nature and assesses text while considering many factors omitted in a readability test. Although it can assess whether text is adequate for low health literacy individuals, it is more prone to subjectivity and bias. The results of each test are discussed below.
The average reading grade level for PAs did not change with the template revision. The readability test results for PAs remained at a grade 13–15 level (i.e., requiring a college or university education to understand) after implementation of the revised template. These results were consistent among all seven readability tests used and demonstrated that no obvious advantage exists in using one method over another in terms of sensitivity and specificity. Although these findings are not surprising, given the limited impact the template change had on content development, this result highlights the need for further attention to how content is written and developed for PAs.
Readability tests were inexpensive and not resource intensive but were limited, overall, in their usefulness and applicability in a regulatory setting. The limitations were intrinsic: readability tests use mathematical formulas to account for factors such as the number of words per sentence and syllables per word. As such, readability scores need to be scrutinized when used alone to avoid misinterpretation: shortening words or sentences does not necessarily make text easier to understand; people do not process text the same way a computer does; and readability formulas do not capture other important parts of the health literacy burden. As mentioned earlier, many factors affect how difficult scientific and medical literature is to understand; therefore, readability tests should be used with caution as a standalone measure and are best combined with more robust tests.
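To make concrete how little such a formula sees, the following is a minimal sketch of one widely used readability measure, the Flesch-Kincaid grade level. This section does not name the seven tests applied in the study, so the choice of formula here is an assumption made for illustration. The syllable counter is a rough vowel-group heuristic, and the function names are hypothetical. The two sample sentences are the "What You Should Do" wordings quoted in this discussion, with the inner quotation marks removed.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per run of consecutive vowels (a, e, i, o, u, y)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: depends only on sentence length and syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Example wordings quoted in this discussion (pre- and post-format change).
pre = ("Consumers who have purchased product X are advised not to use the product "
       "and to consult with a medical professional if they have used the product "
       "and have concerns about their health.")
post = ("Consult your healthcare practitioner if you have used any of these products "
        "and are concerned about your health.")

print(round(flesch_kincaid_grade(pre), 1), round(flesch_kincaid_grade(post), 1))
```

Under this heuristic the shorter, imperative sentence receives the lower grade, but the score reflects only word and syllable counts. Layout, headers, captions, and images never enter the calculation, and a single sentence says nothing about the grade level of a whole PA.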
SAM tests consider a number of relevant factors such as presentation, context, and the use of images to measure the difficulty of a given text. Although only capable of providing an estimate of the health literacy burden, SAM tests consider a greater array of health literacy factors than readability tests. Prioritized message order, boxed text, visual cues, and other factors contributed to better SAM scores in the revised PAs. PAs using the original format typically scored poorly (below 60 %) in many of the SAM categories. The overall “Pre-format change” average was ranked “Adequate”, but near the low end of the scale, at 51 % (Fig. 2). PAs using the revised template showed a significant improvement, with the overall average increasing by 18 percentage points (p < 0.001) and shifting towards the high end of the adequacy scale, at 69 %. This was only 1 percentage point away from achieving an average score of “Superior,” indicating that most materials were nearly suitable for low health literacy individuals.
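The arithmetic behind these percentages can be sketched as follows. This is an illustrative sketch, not the study's scoring procedure: it assumes the conventional SAM scheme in which each factor is rated 0, 1, or 2, factors judged not applicable are excluded from the maximum, and the resulting percentage is banded as "Not suitable" (0 to 39 %), "Adequate" (40 to 69 %), or "Superior" (70 to 100 %). Those bands agree with the values reported above (51 % and 69 % both "Adequate," with 69 % one point short of "Superior"). The factor names and ratings in the example are hypothetical.

```python
from typing import Dict, Optional


def sam_percent(ratings: Dict[str, Optional[int]]) -> float:
    """Percent score: sum of ratings over the maximum for applicable factors.

    Each rating is 0, 1, or 2; None marks a factor treated as not applicable
    (assumed convention; e.g. an omitted cultural analysis).
    """
    applicable = [r for r in ratings.values() if r is not None]
    if not applicable:
        raise ValueError("at least one applicable factor is required")
    if any(r not in (0, 1, 2) for r in applicable):
        raise ValueError("ratings must be 0, 1, or 2")
    return 100.0 * sum(applicable) / (2 * len(applicable))


def sam_category(percent: float) -> str:
    """Map a percent score to the assumed SAM adequacy bands."""
    if percent >= 70:
        return "Superior"
    if percent >= 40:
        return "Adequate"
    return "Not suitable"


# Hypothetical ratings for a single PA (factor names are illustrative only).
example = {
    "purpose": 2,
    "reading_grade_level": 0,
    "writing_style": 1,
    "captions": 1,
    "layout": 2,
    "cultural_match": None,  # excluded from the maximum
}
score = sam_percent(example)
print(f"{score:.0f} % -> {sam_category(score)}")
```

Because not-applicable factors shrink the denominator, two PAs rated on different numbers of factors remain comparable on the percentage scale.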
Improvements in “Literacy Demand” were due to the use of an active voice, adoption of a more conversational style of writing, and addition of learning aids. Greater use of active voice in “Post-format change” PAs was apparent throughout, particularly in the “What You Should Do” section. For example, “Pre-format change” PAs would recommend contacting a healthcare professional in the following manner: “Consumers who have purchased ‘product X’ are advised not to use the product and to consult with a medical professional if they have used the product and have concerns about their health.” “Post-format change” PAs, however, would state: “Consult your healthcare practitioner if you have used any of these products and are concerned about your health.” In the revised PAs, this section used an imperative tone and started with action verbs, such as “Consult,” “Read,” and “Report.”
Other improvements included the use of “road-signs,” or headers, which added structure and allowed the reader to better sort the information. An improved sentence structure, through a more dedicated use of context, ensured that important health-related information was more visible than in previous PAs. Improvements in context were important, but must be considered in relation to other elements. DeWalt et al. reported that risk communication providers sometimes believe that context dictates the readability and usability of a document. In reality, context is only one component of clear risk communication and cannot, on its own, determine how well the information will be understood by the end user. For this reason, DeWalt et al. created a toolkit that was designed to address health literacy-based barriers in a variety of ways without over-relying on context.
Another category that significantly improved was “Graphics.” “Graphics” scored poorly with “Pre-format change” PAs in two areas: “Relevance” and “Captions.” “Relevance” was inadequate because PAs generally failed to illustrate key points visually or contained visual distractions. “Captions” were rarely included or failed to provide a quick reference to the reader about the graphic. Although this category improved significantly (p = 0.020), from not suitable to adequate, failure to reach a superior score provides evidence that PAs did not fully capitalize on the potential for using graphics effectively. The “Post-format change” PAs did, however, use pictures, tables/charts, and other visual aids more often. Most of the PAs using the revised template included a photograph of the particular health product along with a short caption (typically the name of the product). These photographs were meant to be simple and provide readers with a visual aid to facilitate product recognition. The use of images has been shown to improve attention to and recall of health material, thus playing a significant role in reducing the health literacy burden of information.
The “Layout and Typography” category experienced the greatest increase, as the revised template focused mainly on format elements such as font, layout, subheadings, and “chunking.” Font was standardized, illustrations were added in logical sequence, and colored boxes were used to highlight and divide important text and headers into easy-to-read sections. “Layout” and “Subheadings or ‘Chunking’” had significant increases in SAM scores after the template revision was implemented, making “Layout and Typography” the category that had the largest impact on improving SAM scores. Interestingly, “Typography” decreased significantly (although only marginally in score). This was most likely attributable to printer settings when PAs were produced for analysis, as evaluators noted that font sizes were smaller for several revised PAs even though the original source material was standardized for type size.
The “Content” category of the PAs remained unchanged with the template revision. This result was not surprising given that there was no change in the need for risk communications, the type of information that was communicated, or the scope of the PA’s objectives. As such, the SAM scores for “Purpose,” “Content Topics,” and “Scope” remained similar between “Pre-format change” and “Post-format change” PAs. The overall score in the “Content” category remained superior, but this does not preclude further improvements in future PAs. Including the purpose directly in the title, tailoring the scope of the information to the target audience and providing a short summary at the end of the information could improve SAM scores for the “Content” section.
Similarly, the SAM score for the “Learning Stimulation and Motivation” category remained unchanged after the template revisions. This result was also not surprising since the format change did not focus on adding desired behaviors or motivational points. Although this category scored in the superior range for both “Pre-format change” and “Post-format change” PAs, this was likely due to how information was presented and not because of interactive components. As other media (e.g., social media) become more prevalent in the risk communication process, this category may need to be studied further to determine how best to capitalize on elements related to “Learning Stimulation and Motivation.”
Overall, the SAM test emerged as a useful and applicable tool for evaluating health product risk communications in a regulatory setting. The tool was inexpensive and provided a more robust analysis of PAs before and after a template revision than readability tests alone. The results also highlighted the impact the PA template had on SAM scores, providing targets for further improvement.
There were several limitations to this study that the authors would like to acknowledge. Cultural analysis was omitted from the SAM test. Ensuring that risk communications issued by Health Canada were sensitive and motivating to a broad range of ethnic groups was considered outside the scope and resources of this study. Further study in this regard would add another dimension to the findings in terms of how readers from various cultural and linguistic backgrounds may absorb the information relayed by PAs.
A similar analysis of French PAs would make the findings more generalizable. Given that the two official languages of Canada are French and English, future studies of French PAs would provide more insight into their health literacy burden.
As previously stated, the SAM test is subjective by nature, which can lead to significant bias in the end results. This subjectivity can negatively impact inter-rater reliability since evaluators may interpret elements differently. The use of more than one evaluator is recommended to reduce the potential for bias; however, all evaluators should discuss the relevance of test elements and how scoring will be conducted before testing begins. For example, deciding what text will be included, what counts as a table versus a list or picture, and how readability will be measured can help improve reliability among different evaluator results.
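One common way to check whether such a pre-scoring discussion has worked is to quantify agreement between two evaluators. The sketch below computes Cohen's kappa, which corrects raw agreement for the agreement expected by chance. This study does not report which agreement statistic, if any, was used, so the choice of kappa, the function name, and the example ratings are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters scoring the same items on a categorical scale."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same rating by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical 0/1/2 ratings from two evaluators on the same eight SAM factors.
evaluator_1 = [2, 1, 0, 2, 1, 1, 2, 0]
evaluator_2 = [2, 1, 1, 2, 1, 0, 2, 0]
print(round(cohens_kappa(evaluator_1, evaluator_2), 2))
```

A kappa near 1 indicates that evaluators interpreted the test elements consistently, whereas a value near 0 indicates agreement no better than chance and suggests that scoring rules need further discussion before testing continues.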
Finally, assessment of comprehension by means of public consultations was not performed as part of this study. Although conducting public consultations and focus groups would vastly improve understanding of the use and comprehensibility of PAs, these measures are resource intensive. The SAM was designed and validated with this in mind and attempts to gather consultation-like data in its assessment of health information. Furthermore, the SAM test measures the health literacy burden, which can be used to infer how clear the material will be for low health literacy individuals. That being said, the SAM results should be supported with consultations, if resources are available, to measure how clear a health product risk communication is to the target audience.