Introduction

COVID-19 is a disease caused by the novel coronavirus SARS-CoV-2, with presentations ranging from asymptomatic infection to serious respiratory distress and death. With a total of almost 1,300,000 cases in Canada and more than 24,000 deaths [1], it is a concerning disease affecting the lives of everyone. In addition, there is a higher risk of COVID-19-related severe disease and serious outcomes resulting in death in patients with cancer [2]. With this novel viral threat, it is of utmost importance that the general public can stay updated and search for guidance and advice. Unlike in prior global pandemics, most Canadians can now access Internet-based resources, which has changed how information about COVID-19 is sought. Statistics Canada reported that, in 2018, 91% of Canadians aged 15 and older used the Internet, including 71% of seniors [3]. Many cancer patients, especially younger ones, use the Internet as a source of patient information. A trend analysis in the USA showed increasing Internet usage in cancer survivors from 53.5% in 2011 to 69.2% in 2017 [4]. With “Coronavirus” being internationally the second most-searched term on Google in 2020 [5], it is highly likely that cancer patients are looking up information on COVID-19 on the Internet.

The quality of online health information varies widely. In addition, digital disparities among the patient population can make evaluating resources and applying them to decision-making more challenging. Validated structured tools exist to appraise the quality of online information and can help guide both patients and clinicians toward quality resources. It is essential for healthcare professionals (HCPs) to know how to help patients navigate Internet resources by understanding the quality of online information and sharing reliable links.

In the current literature, there is a study published in BMJ assessing the quality of COVID-19 online resources focusing on prevention and treatment. This study identified variable quality of Internet resources by running 12 search terms through one search engine; however, it did not assess COVID-19 topics other than prevention and treatment (such as symptoms, vaccines, prognosis) and was not specific to cancer patients and their unique concerns [6]. A Brazilian thematic analysis discussed the impact of COVID-19 and childhood cancer online information on family functioning and found that there was little information specific to the relationship of COVID-19 with childhood cancer [7]. This study used only Google as a search engine, was specific to information on childhood cancer, and concentrated on language and family functioning rather than quality of online educational resources. While it is likely that cancer patients look to online resources for information about COVID-19, there are currently no published studies examining the quality of online COVID-19 resources for patients with cancer.

The purpose of this study is to evaluate the quality of online education resources on the topic of COVID-19 and cancer with respect to accountability, interactivity, site organization, readability, and quality of content. A validated structured rating tool was used to assess each of these categories. The results of this study will provide information on the strengths and weaknesses of current COVID-19 and cancer online resources, assist healthcare professionals in recommending the most reliable Internet resources for patients, and contribute to developing new and improved web-based educational materials to facilitate patient-physician communication.

Methods

Study Design

This study applied a validated structured rating tool to evaluate the top websites relevant to providing information on COVID-19 and cancer for patients. To generate a list of websites, an Internet search using the term “COVID-19 and Cancer” was performed with 2 metasearch engines (Yippy, Dogpile) and 1 search engine (Google) on December 28, 2020. Yippy and Dogpile are metasearch engines that combine results from several search engines, including Google, Yahoo, and Bing. Completing the search in this manner allows Google to be weighted more heavily, which reflects its popularity in usage [8]. The search term was decided through a process of iterative consultation with experienced oncologists and researchers in digital health literacy. One relatively simple search term was used to reach the broadest and largest number of hits. All searches were conducted in Google Chrome on a computer running macOS Catalina. Chrome Incognito mode was used for all searches to prevent any personalized settings from affecting search results. The unique URLs of all websites from the search were recorded. Three hundred ninety-eight total sites were identified (including overlap between the three search tools used).

Inclusion and exclusion criteria were applied to the search results. To be included, the websites had to provide information about COVID-19 in the context of a cancer diagnosis and/or treatment, specifically mentioning one or more of the following: risk of COVID infection in cancer patients; effectiveness or safety of the COVID vaccine for cancer patients; cancer screening, appointments, or tests during COVID; and/or how to lower risk of COVID infection for cancer patients.

Exclusion criteria were the following: duplicated host site links, broken links, sites with only links and no other information, links to other publication(s) or search engine(s), blogs or discussion boards without information directed towards patients, sites exclusively for fundraising or advertising purposes, sites for professionals only (e.g., primary journal article), news articles, unrelated sites that did not provide patient information, websites requiring a paid subscription, websites not in English, dictionaries without relevant information for patients, or non-websites (direct-to-pdf, YouTube videos, Word documents, etc.). Thirty-seven websites were shortlisted upon passing inclusion and exclusion criteria, and these were all included for analysis. Each site was then given a value based on its average ranking in the three search engines’ order of appearance. The average rankings were then listed in ascending order. These 37 sites were a representation of websites that provided patients with information relevant to COVID-19 and cancer, ranked from most to least likely encountered.
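The ranking step described above can be illustrated with a minimal Python sketch. The function name, the example URLs, and the handling of sites that appear in only some engines (averaging over the engines where the site appears) are assumptions made for illustration; the text does not state how missing positions were treated.

```python
from statistics import mean


def rank_sites(positions_by_engine):
    """Order sites by their mean position across search engines.

    positions_by_engine maps an engine name to an ordered list of site
    URLs (the first entry is the first result). Assumption: a site that
    is missing from an engine is averaged only over the engines where
    it does appear.
    """
    positions = {}
    for results in positions_by_engine.values():
        for position, site in enumerate(results, start=1):
            positions.setdefault(site, []).append(position)
    averages = {site: mean(ranks) for site, ranks in positions.items()}
    # Ascending order: lowest average position first, ties broken by name.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))


# Hypothetical result lists, not data from the study.
example = {
    "google": ["site-a.example", "site-b.example", "site-c.example"],
    "yippy": ["site-b.example", "site-a.example"],
    "dogpile": ["site-c.example", "site-a.example"],
}
ranking = rank_sites(example)
```

A different convention for missing positions (for example, assigning a penalty rank) would change the order, so the choice should be stated alongside any ranked list produced this way.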

Next, the ranked list of websites was evaluated using a structured website evaluation tool [9]. This tool was initially developed in 2009 and has been validated iteratively over the last decade using design-based research principles, applying it to multiple types of cancer [lung, breast, pancreatic, skin, colorectal, lymphoma, esophageal, prostate, thyroid, brain (GBM), hepatocellular, and gynecologic malignancies] as well as multiple users to assess its interrater reliability [10, 11]. Broadly, the tool assesses websites’ accountability, interactivity, organization, readability, and quality of content. Accountability criteria were adapted from the Health On the Net (HON) Foundation code [12], DISCERN scale [13], and Journal of the American Medical Association (JAMA) benchmark criteria [14], including evaluating whether aspects of authorship and attribution were disclosed. Interactivity was evaluated from a derivation of Abbott’s scale [15], assessing the presence of a search engine, video or audio support, a discussion board or forum, educational support materials, and the ability to submit queries to the webmaster. Readability was determined using www.read-able.com by inputting the introduction and risk factors sections of each website into the tool. If a section on risk factors was not present, either symptoms or prevention was used together with the introduction. The Flesch-Kincaid grade level score, Flesch-Kincaid reading ease, and Simple Measure of Gobbledygook (SMOG) index were recorded for each site. Finally, quality of content was evaluated based on coverage, accuracy, and objectivity. An author reviewed and summarized information from both UpToDate [16, 17] and the Canadian Cancer Society (CCS) [18] to develop a reference fact sheet. These two resources were chosen as benchmarks because they are both peer-reviewed evidence-based sites. In addition, UpToDate was used in prior studies that applied the same website evaluation tool.
Past studies also used the National Comprehensive Cancer Network (NCCN); however, there was insufficient information on the NCCN site for patients regarding COVID-19 and cancer at the time of analysis (only links provided), so CCS was used as the alternative. The relevant sections included were definition of COVID-19, incidence/prevalence of COVID-19, etiology and/or risk factors for COVID-19 infection and mortality, specific concerns and considerations for cancer patients during COVID-19, symptoms of COVID-19, prevention and risk reduction of COVID-19 disease and complications, COVID-19 detection and workup, current treatment recommendations for COVID-19, information for cancer patients on the COVID-19 vaccine, and COVID-19 prognosis. Website coverage and accuracy were scored out of 2 for each section: 0/2 if the website had no information on that topic or all information was inaccurate, 1/2 if the website had some information on that topic but at least one fact was inaccurate or it did not meet all information criteria for full points, and 2/2 if the website met all criteria in that section and all information was accurate. The fact sheet and scoring criteria were reviewed by the principal investigator, an actively practicing radiation oncologist; after a comprehensive discussion, a consensus information sheet was completed (see Supplemental Table 1). A global accuracy score was given to each website based on concordance of information with the reference sheet. Sites were also given a score for objectivity if they used no persuasive language or viewpoints. Finally, the total score was determined by adding scores in all categories. See Table 1 for all category components used for evaluation. The highest-ranking websites were identified based on their overall score, shown in Table 2.
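The 0/1/2 rubric for coverage and accuracy described above can be expressed as a short Python sketch. The section keys, the function names, and the representation of a reviewer's judgment as counts of accurate and inaccurate facts plus a "meets all criteria" flag are hypothetical simplifications; the actual scoring was done by reviewers against the consensus fact sheet.

```python
# Short keys for the ten content sections listed in the text (names are
# illustrative labels, not taken from the evaluation tool itself).
SECTIONS = [
    "definition", "incidence_prevalence", "etiology_risk_factors",
    "cancer_specific_considerations", "symptoms", "prevention",
    "detection_workup", "treatment", "vaccine", "prognosis",
]


def section_score(n_accurate, n_inaccurate, meets_all_criteria):
    """Return 0, 1, or 2 for one section following the stated rubric."""
    if n_accurate == 0:
        # Covers both "no information" and "all information inaccurate".
        return 0
    if n_inaccurate == 0 and meets_all_criteria:
        return 2
    # Some information, but incomplete or with at least one inaccuracy.
    return 1


def coverage_accuracy_total(site_sections):
    """Sum section scores; sections absent from the input score 0."""
    total = 0
    for name in SECTIONS:
        if name in site_sections:
            total += section_score(*site_sections[name])
    return total


# Hypothetical site: (accurate facts, inaccurate facts, meets all criteria).
example_site = {
    "definition": (3, 0, True),
    "vaccine": (1, 1, False),
    "symptoms": (0, 2, False),
}
example_total = coverage_accuracy_total(example_site)
max_total = 2 * len(SECTIONS)
```

Under these assumptions the ten sections contribute at most 20 points; the remaining categories of the tool (accountability, interactivity, organization, readability, global accuracy, objectivity) are scored separately and are not modeled here.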

Table 1 Components of evaluation tool
Table 2 Highest-ranking COVID-19 and cancer websites according to the standardized evaluation tool

To determine interrater reliability of website evaluation, both authors used the structured rating tool to independently code 10 randomly selected sites. After the first iteration, the kappa value and intraclass correlation coefficient (ICC) were calculated to be greater than 0.7 for each category. Thus, interrater reliability was felt to be stable. Subjectively, though, some discrepancy was identified in website classification, which was resolved through clarifying discussion. The remaining 27 sites were coded by one reviewer. The results were then evaluated with descriptive statistics. All figures for data were generated using GraphPad Prism 9.1.9 for Mac, GraphPad Software, San Diego, CA, USA, www.graphpad.com.
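As an illustration of the agreement check, the following Python sketch computes Cohen's kappa for two raters. The text reports only a "kappa value", so the choice of Cohen's kappa is an assumption, and the ratings shown are hypothetical, not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of one binary criterion on 10 sites.
rater_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(rater_1, rater_2)
```

In this hypothetical case the raters agree on 9 of 10 sites and chance agreement is 0.5, so kappa comes out near 0.8, which would clear the 0.7 threshold mentioned above.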

Results

In the initial search, “COVID-19 and cancer” generated 4,530,000 hits on Google but provided only 120 viable links. Yippy resulted in 36,800,000 hits with only 17 viable links. Dogpile does not disclose number of hits but provided 261 links. After applying inclusion and exclusion criteria, there were 32 Google websites, 7 Yippy websites, and 5 Dogpile websites shortlisted. Overlapping (duplicated) websites between search engines were deleted, and 37 unique websites remained. These websites were ranked based on average place in order of appearance on the three search engines. All 37 sites were included and analyzed. See Supplemental Fig. 1 for a depiction of the process of finalizing the list of websites used for analysis.

Website Affiliations/Classifications

Websites were classified into 4 categories depending on their ownership (Fig. 1).

Fig. 1 Website affiliations of COVID-19 and cancer websites

Accountability

Accountability was evaluated with regard to disclosure of authorship, affiliations and credentials of author(s), attribution to reliable sources, disclosure of site ownership, external links, and recency of updated information (date of creation and last update or modification). Authorship was disclosed in only 43% (16/37) of websites, 38% (14/37) stated the authors’ affiliations, and 35% (13/37) stated the authors’ credentials. Twenty-four percent (9/37) of sites cited sources and each of these had at least one reliable reference (e.g., journal article, peer-reviewed site, academic or government site, textbook). Most of the sites with sources, 78% (7/9) to be exact, had three or more sources cited. In addition, 92% (34/37) of sites disclosed ownership, including any sponsoring or advertising on the site. There was at least one external link (not advertising) provided on 59% (22/37) of the sites, with only 5% (1/22) of them having 50% or fewer links accessible. Recency of creation and updates was the final measure of accountability. Most websites (76%, 28/37) revealed date of creation, 43% (16/37) stated a last date of modification, and 32% (12/37) were updated less than 3 months before the date of search (December 28, 2020).

Interactivity

Ninety-five percent (35/37) of sites used at least one interactive element. The specific aspects for interactivity are listed in Table 1. Search engines were the most common, present in 89% (33/37) of websites. Seventy-six percent (28/37) included video or audio support such as informational videos about COVID-19 and cancer. Some sites offered discussion boards and forums (24%, 9/37), while other sites had educational support materials such as workshops or modules (30%, 11/37). Sixty-eight percent (25/37) of sites enabled questions to be sent to the webmaster or author regarding COVID-19 queries.

Organization

Site organization was evaluated based on inclusion of five structural features: headings, subheadings, pictures/diagrams/tables, hyperlinks, and absence of advertisement. Eleven percent (4/37) of sites employed all five organizational tools, 65% (24/37) used four, 16% (6/37) used three, 8% (3/37) used two, and no sites used one or zero tools.

Readability

The distribution of reading levels was evaluated using the Flesch-Kincaid (FK) grade level score and the SMOG index (Fig. 2).
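For reference, the readability measures named above can be computed from simple text counts using the standard published formulas, as in this Python sketch. The study obtained its scores from www.read-able.com; whether that tool counts sentences and syllables in exactly this way is not known, and the counts in the example are hypothetical.

```python
import math


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw text counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words, sentences, syllables):
    """Flesch reading ease; higher values indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def smog_index(polysyllables, sentences):
    """SMOG grade; polysyllables are words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts for a passage (not taken from any study website).
fk_grade = flesch_kincaid_grade(words=300, sentences=15, syllables=480)
reading_ease = flesch_reading_ease(words=300, sentences=15, syllables=480)
smog = smog_index(polysyllables=40, sentences=15)
```

With these counts the passage scores at roughly an eleventh- to twelfth-grade level on both grade-based measures, well above a sixth-grade target.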

Fig. 2 Readability of COVID-19 and cancer websites based on education level determined by FK grade level and SMOG index

Content Quality

Content quality was assessed based on both coverage and accuracy. Coverage was evaluated based on the presence or absence of the following: definition of COVID-19, incidence/prevalence of COVID-19, etiology and/or risk factors for COVID-19, special considerations for cancer patients during COVID-19, COVID-19 symptoms, prevention of COVID-19, COVID-19 detection and workup, information on the COVID-19 vaccine for cancer patients, COVID-19 treatment strategies, and COVID-19 prognosis. The coverage and accuracy of content regarding these COVID-19 and cancer topics are shown in Fig. 3. For global overall accuracy, 76% (28/37) of websites presented entirely accurate information, 22% (8/37) were mostly accurate, and only 1 website (3%) was identified as mostly inaccurate. Commercially owned sites had the highest prevalence of accurate information at 92% (11/12), though the one remaining site was the only website identified as mostly inaccurate. Seventy-one percent (12/17) of nonprofit sites had entirely accurate information. Interestingly, only 1 of the 3 identified government-owned sites was classified as entirely accurate, as the other two government-owned sites stated that no COVID-19 vaccine existed even though at the time of analysis (January 10, 2021), the vaccine was approved and being administered [16, 17].

Fig. 3 Coverage and accuracy of content by topic of COVID and cancer websites

Table 2 summarizes the websites with the highest total scores using the standardized evaluation tool. The lowest score in our study was 11/59 and the highest score was 47/59. Only four websites scored more than 40/59, nine websites scored between 30 and 40, twelve scored between 20 and 30, and twelve scored below 20/59.

Discussion

The Internet is a key source of information for cancer patients, and with COVID-19 being one of the most popular search terms on the Internet, cancer patients are likely using the Internet as a means to learn about COVID-19 in the context of their cancer diagnosis. With the added stress of a cancer diagnosis, it is essential for patients to be able to access accurate, readable, and up-to-date information on COVID-19 and cancer. However, studies evaluating the quality of English websites on COVID-19 and cancer are lacking. In addition, online information for patients is difficult to regulate and there are few standards in place, as website accreditation, while recommended, is not often implemented. Thus, it is essential for healthcare professionals (oncologists, general practitioners, etc.) to help patients navigate Internet resources and find reliable, high-quality websites to access educational materials. To our knowledge, there has been only one study to date regarding COVID-19 and cancer resources, which was specific to childhood cancer and focused on thematic analysis and the concept of family functioning and support rather than quality analysis of the educational content [7]. Our study used a validated structured rating tool [9] to evaluate all relevant COVID-19 and cancer websites that patients may likely encounter. To do this, we looked at accountability, interactivity, organization, readability, and content quality. Of note, our study incorporated steps to confirm a high level of interrater reliability, which is a significant strength in our website classification.

In our comprehensive study, it was found that online resources for COVID-19 and cancer are sparse (only 37 relevant sites identified) and quality is quite variable. A likely explanation for the low number of relevant sites is that COVID-19 was only identified in December 2019 and recommendations are quickly changing, so the ability to produce patient information may be difficult. It was also found in our study that complete and accurate information about COVID-19 and cancer is lacking in most websites. More often than not, authorship and source citations were not present. With both of these accountability measures excluded in so many websites, it is likely difficult for patients to assess the validity and reliability of these Internet sources.

A previous BMJ study assessed the quality of COVID-19 prevention and treatment information and found variable website quality. This study found low EQIP, DISCERN, and JAMA scores for evaluated websites. The Ensuring Quality Information for Patients (EQIP) tool is a checklist for criteria including quality of written work, design, and coherence; the DISCERN tool is used to evaluate quality of information for treatment choices; and the JAMA benchmark evaluates credibility of Internet resources in authorship, attribution, disclosure, and recency of updates. Only a few websites were classified as “high-scoring” in each index [6]. Another study by Jayasinghe et al. analyzed quality of information in websites about COVID-19 targeted for the general public. This study utilized three search engines (Yahoo, Google, Bing) in assessing the top 100 COVID-19 websites and found that the majority of sites had moderate to low scores in readability (Flesch reading ease score), usability, reliability, and quality [19]. Both of these studies are useful in analyzing quality of COVID-19 resources in general, but they do not assess the unique topics and concerns of cancer patients and did not utilize strategies such as Incognito mode to minimize bias incurred by the authors’ search history.

In terms of accountability, it has been previously demonstrated in studies that only one-fifth of patients can recall the website name or affiliation immediately after reading online medical information [20]. This shows that most patients pay minimal attention to the source and validity of the sites they read. While education on measures of website quality such as authorship, citations, and website update frequency may help, our study found that the majority of COVID-19 and cancer websites did not include these measures, with only a third of websites disclosing full authorship information, and 76% of websites lacking source citations completely. Recency of updates was also an issue, with only a third of websites having been updated less than 3 months before the search date. This is especially problematic for a topic such as a pandemic that evolves quickly due to unpredictable changes in guidelines and new evidence. It is also important to note that recency of updates greatly affects accuracy of information. Sites that were updated more recently were more likely to have up-to-date, accurate information about topics such as COVID-19 vaccine approval and use. When sites are not frequently updated, this can compromise patients’ assessment of site reliability.

With respect to content quality, our study assessed coverage and accuracy, which showed variable results. Most websites presented entirely accurate information, though one website presented mostly inaccurate information. Out of all the topics analyzed on January 10, 2021, COVID-19 vaccine information was the most likely to be inaccurate when covered. This may be because the vaccine was only recently approved for use in Canada (Pfizer-BioNTech COVID-19 vaccine was authorized for use on Dec. 9, 2020 and Moderna COVID-19 vaccine was authorized on Dec. 23, 2020) [21]. Thus, the authors of the inaccurate sites may not yet have had time to update their online resources. In addition, it was somewhat surprising that government-owned sites were the most likely to have inaccurate information. Two out of the three government sites included for analysis stated that no COVID-19 vaccine had been made even though the vaccine was already approved and being administered before and on the date of analysis. One hypothesis is that government sites may require more bureaucratic steps to disseminate official online content and, as a result, may not update websites as frequently as other types of sites (commercial, non-profit organizations). In addition, the analysis was done in early to mid-January, and the government may have had reduced staffing during and after the winter holidays (i.e., Christmas, Kwanzaa, Hanukkah, New Year’s). Nonetheless, the results of this study may prompt government-based sites to consider updating information more regularly to ensure patients have timely access to accurate educational materials or, at minimum, to note the last updated date and add a caveat that information may not be entirely up-to-date, along with links to more frequently updated and reliable sites.
Interestingly, commercially owned websites had the highest prevalence of accurate information at 92%, but the one site that was identified as mostly inaccurate was also commercially owned. Non-profit organization websites were in the middle with 71% presenting entirely accurate information. These results indicate that the ownership of a website should not be the only indicator of accuracy for patient information and a more detailed analysis of website content is needed.

Coverage is also important to address when it comes to content quality. Most websites (84%) discussed special considerations for cancer patients in the context of COVID-19, including information on how cancer is a risk factor for more severe COVID-19 consequences, the importance of weighing risks and benefits of cancer care versus COVID-19 infection risk, and advice for readers to seek medical professional help from their cancer care team. This was reassuring since this category of information is likely relevant for patients searching “COVID-19 and cancer” as a search term. Other topics well-covered by websites included COVID-19 risk factors and prevention of COVID-19 infection. The least covered category (5%) was COVID-19 incidence and prevalence, and there was no information on any sites about the incidence of the disease in cancer patients specifically. COVID-19 prognosis and treatment were also poorly covered, with no information specific for cancer patients in either of these categories. The lack of this information online is notable, as past studies have shown that most metastatic cancer patients want details about prognosis and treatment, with the caveat of negotiation for the extent, format, and timing of information [22]. It is useful, then, for HCPs to be aware of areas where there is lack of online coverage on COVID-19 and cancer; with this information, they can tailor appointments with patients to bridge these knowledge gaps.

Readability is another area of importance, as online information is useless to a patient if it cannot be comprehended. The current recommendation by the National Institutes of Health (NIH) and the American Medical Association for ideal health information text readability is below a sixth-grade level [23]. Unfortunately, no websites met this criterion using either the SMOG index or the Flesch-Kincaid grade level score. Jayasinghe et al.’s study analyzing general COVID-19 online resources used an older NIH recommendation of seventh grade and below, which resulted in three websites meeting criteria using the SMOG index, but it did not specify whether any of those websites were below a sixth-grade level [19]. The high level of education needed to comprehend COVID-19 and cancer websites is problematic, as cancer patients with lower literacy levels would be less likely to obtain the information needed to properly protect themselves against COVID-19, for example, when it comes to prevention and vaccine information.

To improve comprehension, websites can incorporate organizational features such as headings, subheadings, pictures/diagrams/tables, and hyperlinks, while omitting advertisements. Past studies have found that including hyperlinks within main text and using visual aids enhances reading comprehension [24], especially when there is high readability. In addition, a paper by Sbaffi and Rowley showed that advertisements can negatively impact site credibility, while a clear layout improves credibility [25]. All of the COVID-19 and cancer websites used at least two of the five identified organizational features. Eleven percent of sites used all five organizational features and the majority (65%) used four out of five organizational features.

Another strategy that has been shown to improve credibility of sites is interactive features [25]. Interactivity has been shown to positively influence readers’ perceptions, attitudes, and behavioral intentions when using online sites, as well as improve health-mediating outcomes. In this study, interactive features included search engines, video/audio support, discussion boards/forums, educational support materials, and contact information for readers to submit questions regarding COVID-19 and cancer. Notably, almost all sites (95%) incorporated at least one interactive element, with search engines being the most common (89%). Only 68% of sites had contact information to allow queries to be sent to the author or webmaster.

There are several limitations to this study. First, only English websites were assessed, so future research is needed to determine the quality of COVID-19 and cancer websites in other languages. Second, all searches were performed on one computer at one geographic location. This may influence the results that were displayed by the search engines, as geographic location can impact the list of hits. In an attempt to minimize bias, the author completed the searches using Chrome Incognito mode, so any previous history or cookies did not impact the search; however, Incognito mode cannot block the user’s IP address. Future studies may repeat this search in different geographic locations to assess changes in search results and site quality. Third, since the search was completed in 1 day, it only represents quality of Internet resources as a snapshot in time; as websites are updated, content and quality of information may change. Finally, a limitation is the lack of diversity in search terms, such as alternative names for “COVID-19.” In future studies, variations of search terms such as “Coronavirus 2019” or “SARS-CoV-2” could be used.

Conclusion

This study is the first to evaluate quality of COVID-19 and cancer resources using a structured, validated rating tool with respect to accountability, interactivity, organization, readability, and content quality. It has shown that many COVID-19 and cancer websites lack authorship and citations, which makes validating their trustworthiness difficult. In addition, most websites are written at a high school or university reading level, which may be difficult for some patients to understand. Information accuracy was good in some areas, but lacking in others, such as vaccine updates, which emphasizes the importance of more frequent site updating, especially with a topic like COVID-19 that is highly dynamic. Our study found that information coverage was also lacking, with few websites discussing incidence, prognosis, or treatment of COVID-19. Only a small number of COVID-19 and cancer websites provided both accurate and complete informational materials. The top-scoring websites included in this study can be referred to by physicians looking to recommend online COVID-19 and cancer resources to patients (Table 2). In addition, we have created a patient-friendly checklist with the easy-to-remember acronym WAARI ORCA for factors and guiding questions to consider when assessing online health information, shown in Fig. 4. This can be used as a resource for patients (HCPs are welcome to access it as well) looking to evaluate and discuss health information websites. It is essential for physicians and other healthcare professionals to not only direct patients to reliable websites, but to also find resources to guide clinical conversations addressing patients’ specific questions and concerns.

Fig. 4 Recommendation checklist for assessing online health information for physician–patient discussions