Introduction

Breast cancer is the leading cause of cancer death in women worldwide. In the United States, it is the leading cause of cancer death among women aged 20–59 years [1]. Numerous risk factors for the development of breast cancer have been identified [2]. Factors such as age at menarche and family history are non-modifiable risk factors that contribute significantly to lifetime risk [3]. Modifiable risk factors, such as alcohol consumption and obesity, involve lifestyle choices that an individual could alter to reduce her personal risk of breast cancer [4]. Breast cancer risk assessment tools are used to give patients a sense of their level of risk, to better individualize screening recommendations, and to inform women about modifiable risk behaviors [5].

Today, the Internet is an important source of risk information. A 2013 U.S.-based survey found that 59 % of people had searched the Internet for health information [6]. The risk knowledge gained by an Internet search is influenced not only by quality of information but also by an individual’s health literacy. Health literacy is “the degree to which individuals can obtain, process, and understand the basic health information and services they need to make appropriate health decisions” [7]. Limited health literacy is common, with 36 % of the U.S. population estimated to have limited health literacy [8]. When looking for health information, adults with limited literacy skills are more likely to use search terms that are broad, choose sites with an 11th grade or higher reading level, and click on advertisements [9]. For breast cancer risk assessment tools, health literacy can affect the user’s ability to comprehend the instructions to accurately complete the risk tool, as well as the user’s ability to comprehend the risk tool’s output.

Numerous studies have evaluated the importance of health literacy in cancer risk communication. Individuals with limited literacy skills are less likely to understand the purpose of cancer screening and less able to apply relative risk reduction information to their own personal cancer risk [10–12]. This has a significant impact on the health of individuals with limited health literacy, as they report higher distress about developing cancer and lower rates of breast cancer screening [7, 13].

One opportunity to improve health outcomes for individuals with limited health literacy is to improve the design of health information [7]. The format of risk communication in online cancer risk assessment tools has been previously profiled [14]; however, the readability of breast cancer risk assessment tools has not been assessed. Readability describes the difficulty or ease of reading informational materials and consists of many factors in addition to the reading grade level, such as content and typography.

This study describes the overall readability of website pages that host or link to a breast cancer risk assessment tool. To accomplish this goal, this study evaluated not only a site’s reading grade level but also a number of other formatting and content characteristics that contribute to comprehension as guided by the Suitability Assessment of Materials [15]. This study then looks to identify areas of improvement for these websites.

Methods

Search protocol

To complete an Internet search for websites containing or linking to a breast cancer risk prediction model, we entered the following search terms: calculate breast cancer risk, breast cancer risk calculator, estimate breast cancer risk, assess breast cancer risk, and breast cancer risk assessment. We searched each term on three different search engines: Google, Bing, and Yahoo. Searches were performed in a web browser that did not have any search history, as search engines can customize returns based on previous searches. All searches were completed on June 12, 2014 to ensure no variability in return based on date of search.
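
As a rough illustration of the search protocol, the sketch below enumerates every combination of search term and search engine. The function name `build_search_plan` is hypothetical; the terms and engines are those listed above.

```python
from itertools import product

# Search terms and engines as listed in the search protocol.
SEARCH_TERMS = [
    "calculate breast cancer risk",
    "breast cancer risk calculator",
    "estimate breast cancer risk",
    "assess breast cancer risk",
    "breast cancer risk assessment",
]
SEARCH_ENGINES = ["Google", "Bing", "Yahoo"]


def build_search_plan(terms, engines):
    """Return one (engine, term) pair for every search to be run."""
    return [(engine, term) for engine, term in product(engines, terms)]


plan = build_search_plan(SEARCH_TERMS, SEARCH_ENGINES)
print(len(plan), "searches in total")
```

Five terms on three engines give 15 searches, each of which would be run in a browser with no search history.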

To track the websites returned, we gave each unique site an ID number. A website was considered unique if its base site had not yet been returned on any search engine for any term. For example, www.brightpink.org/Risk-Factors and www.brightpink.org/knowledge-is-power/assess-your-risk/ share the base site brightpink.org, so they received the same ID number and were treated as a single site for analysis.
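
The ID assignment can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes that the "base site" is the host name with any leading "www." removed, and the helper names `base_site` and `assign_site_ids` are hypothetical.

```python
from urllib.parse import urlparse


def base_site(url: str) -> str:
    """Return the host name without a leading 'www.' (assumed definition of base site)."""
    # urlparse only fills in the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def assign_site_ids(urls):
    """Give each base site an ID number in order of first appearance."""
    ids = {}
    for url in urls:
        # setdefault keeps the existing ID when the base site was already seen.
        ids.setdefault(base_site(url), len(ids) + 1)
    return ids


returned = [
    "www.brightpink.org/Risk-Factors",
    "www.brightpink.org/knowledge-is-power/assess-your-risk/",
    "https://www.example.org/tool",  # placeholder URL for illustration only
]
print(assign_site_ids(returned))
```

Both brightpink.org pages map to one ID, while the placeholder site receives the next one.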

Inclusion/exclusion criteria

A unique site was included in the sample if it contained either (1) a breast cancer risk assessment tool or (2) a link to a breast cancer risk assessment tool. Sites that were developed with the intention of bringing patients into the physician’s office for risk assessment and did not offer the risk tool online were therefore not included. We excluded sites that presented the risk tool in the context of research articles, news articles, blogs, or forums, and sites that contained only links. Sites that were non-U.S. based were excluded on the basis that a risk assessment tool developed for a non-U.S. population might not be applicable to U.S. patients. Other excluded sites were inaccessible, featured non-relevant content, or required an app or software download. We did not exclude ads because, as noted above, individuals with limited health literacy skills have been shown to preferentially click on ads [9].

Upon evaluation of sites that met the inclusion criteria, three of the coded sites evaluated the user for Hereditary Breast and Ovarian Cancer Syndrome and two sites featured a tool that evaluated the user for multiple cancers: breast, prostate, colon, melanoma, and lung cancer. These sites were included as they featured a tool that brought the user’s attention to their personal risk of breast cancer.

Two of the coded sites featured a breast cancer risk assessment tool that did not give output, but rather featured a series of yes/no questions and stated that the patient may be at higher risk for breast cancer if they answered yes to any question. These sites were included because they provide the user with more personalized breast cancer risk information than the general population risk.

Assessment of readability

To assess the readability of the breast cancer risk assessment tools, we evaluated sites using the Suitability Assessment of Materials (SAM) and the SMOG Readability Formula [15, 16]. The SAM evaluates materials based on 22 factors that fall into one of six categories: content, literacy demand, graphics, layout and typography, learning stimulation and motivation, and cultural appropriateness (see eText 1 in the Supplement). Each SAM factor is given a score of 2 for superior, 1 for adequate, or 0 for not suitable. The sum of all factor scores for a website yields an overall rating of superior, adequate, or not suitable for the site. The SMOG Readability Formula generates a numerical reading grade level based on the number of polysyllabic words in 30 sentences of text. For sites that did not contain 30 sentences, the SMOG offers a conversion table based on assessment of available sentences (see eText 2 in the Supplement).
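
The two scoring steps can be sketched in code under stated assumptions. For the SMOG, the sketch uses the commonly published regression form of the formula, which scales the polysyllable count to 30 sentences; the hand-count procedure and conversion table referred to above may give slightly different values. For the SAM, the cut-offs of 70 % and 40 % of the maximum possible score are the ones commonly cited for the instrument and are an assumption here, since the text does not state them; factors scored "not applicable" are passed as `None` and dropped from the total. Both function names are hypothetical.

```python
import math


def smog_grade(polysyllable_count: int, sentence_count: int = 30) -> float:
    """SMOG reading grade from the number of words with three or more syllables."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    # Scale the count to a 30-sentence sample, then apply the regression form.
    scaled = polysyllable_count * (30 / sentence_count)
    return 1.0430 * math.sqrt(scaled) + 3.1291


def sam_rating(factor_scores) -> str:
    """Overall SAM rating from per-factor scores of 0, 1, 2, or None (not applicable)."""
    applicable = [s for s in factor_scores if s is not None]
    if not applicable:
        raise ValueError("at least one applicable factor is required")
    percent = 100 * sum(applicable) / (2 * len(applicable))
    # Assumed cut-offs: 70-100 superior, 40-69 adequate, below 40 not suitable.
    if percent >= 70:
        return "superior"
    if percent >= 40:
        return "adequate"
    return "not suitable"


print(round(smog_grade(75), 1))
print(sam_rating([2] * 10 + [1] * 8 + [None] * 4))
```

Taking the polysyllable count as an input avoids automated syllable counting, which is itself error-prone and would be a separate source of disagreement between coders.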

Two independent researchers coded the same sections of 10 of the sites to demonstrate inter-rater agreement of 80 % or greater. To maximize potential output from risk tools, researchers were instructed to input an increased risk profile (>60 years old, no births, high BMI, etc.). For those SAM items on which researchers did not agree, a meeting was held with a third researcher who is an expert in health literacy, and guidelines for interpretation of SAM criteria were refined (see eText 3 in the Supplement). Another 4 sites were then coded by both researchers, achieving >0.8 agreement on all 22 factors of the SAM. The same procedure was applied to the SMOG Readability Formula, with 100 % agreement. Once good inter-coder reliability was achieved, one coder coded the remaining websites.
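
A simple way to express the agreement check is shown below. This sketch assumes that agreement was measured as the proportion of sites on which the two coders gave the same score for a factor; the text does not say whether a chance-corrected statistic was used instead. The function name and the scores are hypothetical.

```python
def percent_agreement(coder_a, coder_b) -> float:
    """Proportion of items on which two coders assigned the same score."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


# Hypothetical scores from two coders for one SAM factor across five sites.
coder_a = [2, 1, 0, 2, 1]
coder_b = [2, 1, 1, 2, 1]
agreement = percent_agreement(coder_a, coder_b)
print(agreement, agreement >= 0.8)
```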

Within each site, coders evaluated the pages that contained the introduction to the breast cancer risk assessment tool, the tool itself and the output. If the site just linked to the breast cancer risk assessment tool, coders evaluated the page containing the link to the breast cancer risk assessment tool. We decided to code the entirety of a webpage, even if only one paragraph contained breast cancer risk assessment tool information because a user would not be able to find the risk tool information without reading the entire page. Website coding occurred between July 1, 2014 and January 31, 2015.

Statistical analysis

Site characteristics and each of the 22 SAM factors were described using simple frequencies. To facilitate useful discussion about the factors that contributed substantially to overall scores, results were stratified by overall SAM score. SMOG reading grade level was summarized using the mean.

To evaluate the effect of host organizations on readability, we stratified SAM factors, as well as SMOG reading grade level, by host organization type. Cancer centers, hospitals, and private practices were listed as separate organization types because the target population and patient education goals for these institutions are likely different [14]. Other organization types were commercial industry, healthcare industry, online media, advocacy/non-profit, government, and research group. For SAM factors, a χ² test of independence was performed to compare stratified groups. For SMOG, one-way ANOVA was performed to compare mean reading grade level between stratified groups. All statistical analysis was performed using Stata 13.1 (College Station, TX: StataCorp LP).
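
The analysis itself was run in Stata; the sketch below only illustrates how the two test statistics are computed, using plain Python and made-up numbers. It returns the χ² statistic with its degrees of freedom and the one-way ANOVA F statistic, and leaves p-values to a statistics package. With nine organization types and three rating levels, the degrees of freedom are (9 − 1)(3 − 1) = 16. Both function names are hypothetical.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a table of observed counts.

    Rows are groups (e.g. organization types); columns are outcome levels.
    Every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def one_way_anova_f(groups):
    """Return the F statistic comparing the means of several groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up 2 x 2 table and made-up reading grade levels, for illustration only.
print(chi_square_independence([[10, 20], [20, 10]]))
print(one_way_anova_f([[11, 12, 13], [12, 13, 14], [13, 14, 15]]))
```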

Results

Our search returned 576 sites of which 42 met inclusion criteria and were ultimately coded (see Fig. 1). A complete list of the coded sites, sorted by overall SAM rating, can be found in Table 1. Only 21.4 % of sites achieved an overall superior rating, 64.3 % were deemed adequate, and 14.3 % were rated not suitable.
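
The site counts behind these percentages are not listed here, but they can be recovered by arithmetic from the 42 coded sites. The sketch below does so; the resulting counts are inferred from the reported percentages rather than taken from the tables, and the function name is hypothetical.

```python
TOTAL_CODED_SITES = 42

# Percentages as reported for the overall SAM rating.
REPORTED_PERCENT = {"superior": 21.4, "adequate": 64.3, "not suitable": 14.3}


def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of sites that a reported percentage corresponds to."""
    return round(percent * total / 100)


counts = {rating: implied_count(p, TOTAL_CODED_SITES) for rating, p in REPORTED_PERCENT.items()}
print(counts, sum(counts.values()))
```

The inferred counts sum to 42, and each one rounds back to the reported percentage.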

Fig. 1 Flow chart of website collection

Table 1 Complete list of websites hosting or linking to breast cancer risk assessment tools, sorted by rating

In terms of website content, 52.4 % of sites hosted a breast cancer risk assessment tool, while the remaining sites linked to one or more tools (see Table 2). The majority of risk assessment tools, 61.9 %, used a Gail-based model. The second most common tool was Krames Staywell (9.5 %). All sites that linked to (rather than hosted) a breast cancer risk assessment tool linked to the National Cancer Institute’s Gail-based model. Output of the risk assessment tools varied, with 83.3 % of tools yielding a numerical output (e.g., “Your lifetime risk for breast cancer is 11.6 %”) as opposed to a word output (e.g., “Your lifetime risk for breast cancer is higher than average”).

Table 2 Website characteristics

Sub-analysis was used to compare the frequency of obtaining a superior rating in a given SAM factor, stratified by overall SAM rating. Contribution of an individual SAM factor to the overall score was considered substantial when >50 % of sites with an overall superior SAM rating received a superior score in that factor, while sites rated adequate or not suitable overall received a superior score at a frequency that dropped by 25 % in that same factor. The 6 of 22 SAM factors that contributed substantially to the overall superior rating were Content, Writing style, Context, Layout, Subheadings, and Model behavior (see Table 3). Description of the components of these 6 factors can be found in eTable 1 in the Supplement. The factors that overall superior rated sites failed to achieve at a 50 % level were Summary, Reading grade, Vocabulary, Cover graphic, Relevance of illustrations, and Interaction. Description of the components of these 6 factors is found in Table 4. The Graphic type, List/tables explained, Captions, and Culture image factors were not included in this list because most sites were scored as “not applicable.”
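
The criterion for a substantial contribution can be written as a small rule. The sketch below is one reading of it and rests on an assumption: the 25 % drop is taken to mean a difference of at least 25 percentage points between the superior group and each of the other two groups. The function name and the example frequencies are hypothetical and are not values from Table 3.

```python
def contributes_substantially(superior_pct, adequate_pct, not_suitable_pct,
                              min_share=50.0, min_drop=25.0) -> bool:
    """Apply the assumed reading of the 'substantial contribution' rule.

    Each argument is the percentage of sites in that overall-rating group
    that scored superior on the factor in question.
    """
    return (
        superior_pct > min_share
        and superior_pct - adequate_pct >= min_drop
        and superior_pct - not_suitable_pct >= min_drop
    )


# Hypothetical frequencies for three factors.
print(contributes_substantially(88.9, 40.7, 16.7))
print(contributes_substantially(44.4, 10.0, 0.0))
print(contributes_substantially(77.8, 66.7, 50.0))
```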

Table 3 Frequency of obtaining a superior rating for an individual factor, stratified by overall rating
Table 4 Areas of suggested improvement for all sites

The website’s host affiliation was broken down into several categories (see Table 2). Online media, hospitals, cancer centers, advocacy, and healthcare industry made up the majority of our host sites. Chi-squared analysis of the frequency distribution of achieving a superior rating in a given SAM factor, stratified by website affiliation, was statistically significant for 3 of the 22 SAM factors: Summary (χ²(16) = 27.25, p = 0.039), Vocabulary (χ²(16) = 27.65, p = 0.035), and Interaction (χ²(8) = 15.54, p = 0.049). Online media and government sites were more likely to provide a summary, whereas none of the sites hosted by a private practice in this study provided a summary. Commercial industry sites were more likely to use common words instead of technical jargon. Only sites hosted by cancer centers made use of interaction to communicate risk. Overall SAM rating was not affected by host affiliation (χ²(16) = 19.45, p = 0.25).

The average SMOG reading grade level of the sites was grade 12.1 (SD 1.6, range 9–15). Comparison of the mean SMOG reading grade levels, stratified by website affiliation, using one-way ANOVA yielded no significant difference in means (p = 0.50).

Discussion

The readability of breast cancer risk assessment tools is an important component of effective risk communication. Because the uptake of Internet-based health information is so prevalent, high-quality Internet-based risk communication has the potential to alert users to personal, modifiable risk behaviors. Sites with low readability, however, could mislead users with limited health literacy about their breast cancer risk or discourage them from using a breast cancer risk assessment tool.

This study is the first to evaluate the readability of online breast cancer risk assessment tools and the information accompanying those tools. Using three search engines, we identified 42 unique sites that hosted or linked to a breast cancer risk assessment tool. Sites were hosted by a variety of organizations, the most frequent being online media, such as WebMD, and healthcare organizations, such as cancer centers, public hospitals, and private practices. The tool that most sites hosted or linked to was Gail-based. The Gail model was developed for use by healthcare providers, while most sites were developed for general public use. The development of an applicable, validated risk tool does not always lead to easy-to-communicate risk information for the broader population.

Using the SAM to rate readability, only 21.4 % of sites achieved an overall superior rating. Factors that contributed most to an overall superior rating were Content, Writing Style, Context, Layout, Subheadings, and Model Behavior. This means that sites that were rated as superior overall were rated as such because they started with an intro that explained why breast cancer is important or what the user was about to read (Context), featured content focused on desired behaviors (Content), spoke with an active voice (Writing Style), modeled behaviors specifically, for example, “you should have no more than one drink a day” (Model Behavior), subdivided long lists with subheadings (Subheadings), and presented the material in a visually easy to follow format (Layout).

Factors that even overall superior rated sites were unlikely to feature were Summary, Reading grade, Vocabulary, Cover graphic, Relevance of illustrations, and Interaction. This means that the majority of sites, regardless of rating, did not use a reading grade level of 5th grade or lower, common words or explanations for technical jargon, a friendly purposeful opening image, any illustrations at all, interactive learning, or end with a review of key points. This indicates that these are areas that websites could improve to increase the readability of their site.

The recommended reading grade level for patient-directed health information is 5th or 6th grade [17]. The average SMOG reading grade level of our sites was 12.1 and did not differ significantly based on host affiliation. While valid and easily reproducible, the SMOG reading grade level is on average one or two grade levels higher than other reading grade levels because the SMOG reports the grade level required for 100 % comprehension [17]. Even so, the average reading grade level is above the reading skills of many patients. We recommend that developers of health information materials use common wording (e.g., doctor instead of physician), words with fewer syllables, and shorter sentences.

Strengths and limitations

The SAM tool was not originally developed for evaluation of web-based information, but rather paper brochures, booklets or audiovisual materials. The SAM has been used to evaluate web-based materials by numerous studies and organizations; however, certain aspects of the SAM are not directly translatable to a web-based platform. The SAM cannot take into consideration how many pages a user must click on to find all of the information. A brochure has a single trajectory, whereas a web user may follow any number of links on a website to a number of ends. In addition, the scores of many SAM factors are subjective. We created our own breast cancer risk assessment tool-specific guidelines for interpretation of the SAM factors in order to reach inter-rater agreement (see eText 3 in the Supplement). Inter-rater agreement of the SAM has not been validated in the past.

Webpages are constantly in flux, so the content and accessibility of sites could have changed since the conclusion of our evaluation. Additionally, this mode of risk communication is limited to individuals with access to the Internet. Individuals with limited literacy rely more on interpersonal sources of information than their counterparts [8]. However, there is no reason to assume that clearer materials benefit only individuals with limited health literacy. Most patients prefer easy-to-read materials, and guidelines suggest that health communication take “universal precautions” to ensure that materials are accessible for all individuals across literacy levels [15].

Conclusion

The readability of breast cancer risk assessment tools and the information accompanying those tools is critical to effective risk communication. Many studies have validated the tools, but none has yet addressed the importance of making those tools accessible to our most at-risk population, adults with limited health literacy. Our study has identified the factors that affect whether a website achieves a superior readability score. If sites altered their content based on these scores, a larger audience of users could more readily understand crucial breast cancer risk information.