INTRODUCTION

Health information privacy is a core value of American health care. The privacy of communication between healthcare providers and patients is protected by law, but no equivalent protection exists for individuals performing health-related searches on patient-facing websites. Aggregated consumer data obtained from searches on patient-facing websites can be used to create a health profile, which can even be merged with non-health-related information. These data profiles can be sold to other companies or used to curate targeted advertising that follows the individual user. Although this data collection is acknowledged, the extent of such data releases and their specificity to the individual are often unknown to users.1 Therefore, we used a privacy inspection tool2 to determine the prevalence and type of data tracking on commonly searched government and non-government health-related websites.

METHODS

Our sample was restricted to US-based websites drawn from the most trafficked health websites, as measured by the website traffic monitoring service SimilarWeb (www.similarweb.com), and from all sites on the Medical Library Association’s list of recommended websites for health information as of October 17, 2020.

To identify website monitoring and data collection, each website URL was examined with Blacklight, an internet-based tool that tests how websites surveil their users.3 For each website, we present data on the use of ad tracking, the use of third-party tracking and identification “cookies,” and the availability of user data for Facebook and Google Analytics tracking.
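To make the notion of a third-party request concrete, the following is a minimal sketch and not Blacklight's actual implementation. It assumes a hypothetical list of request URLs recorded while a page loads, and it approximates a site's registrable domain by the last two labels of the host name, a shortcut that is wrong for suffixes such as ".co.uk". The function names, page URL, and request log are all illustrative assumptions.

```python
from typing import Iterable, Set
from urllib.parse import urlsplit


def naive_site(url: str) -> str:
    """Approximate the registrable domain as the last two host labels.

    Assumption for illustration only: a real tool would consult the
    Public Suffix List, because this shortcut fails for ".co.uk" etc.
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def third_party_sites(page_url: str, request_urls: Iterable[str]) -> Set[str]:
    """Return the sites contacted that differ from the page's own site."""
    first_party = naive_site(page_url)
    sites = (naive_site(u) for u in request_urls)
    return {s for s in sites if s and s != first_party}


# Hypothetical page and request log (not data from the study).
page = "https://www.example-health.org/conditions/asthma"
requests_seen = [
    "https://www.example-health.org/static/app.js",
    "https://cdn.example-health.org/img/logo.png",
    "https://www.google-analytics.com/collect?v=1",
    "https://connect.facebook.net/en_US/fbevents.js",
]
print(sorted(third_party_sites(page, requests_seen)))
```

Requests to the page's own subdomains are treated as first-party here; anything else is counted as a third party, regardless of whether it is actually used for advertising.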

Data were described using frequencies and measures of central tendency. Analysis of variance (ANOVA) and the chi-square test were used to assess differences in means and frequencies, respectively, between government, non-profit, and commercial health websites. All analyses were conducted in R version 4.0.2 (R Foundation).
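As an illustration of the two tests named above, the sketch below computes the one-way ANOVA F statistic and the Pearson chi-square statistic from first principles. It is not the analysis code used in this study (which was run in R); the function names and the toy numbers are hypothetical, and p-values, which require the F and chi-square distributions from a statistics package, are deliberately left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def chi_square_stat(table):
    """Return (chi2, df) for a contingency table given as rows of counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical tracker counts per website in three categories (toy data).
toy_groups = [[1, 2, 3], [4, 6, 8], [10, 15, 20]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(round(f_stat, 2), df_b, df_w)

# Hypothetical counts: rows are categories, columns are (shares, does not share).
toy_table = [[0, 4], [2, 2], [4, 0]]
chi2, df = chi_square_stat(toy_table)
print(round(chi2, 2), df)
```

With cell counts as small as those in the toy table, the chi-square approximation is unreliable; the example only shows how the statistic is assembled from observed and expected counts.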

RESULTS

The average number of ad trackers across included websites was 2.11 (SD 0.60), 7.15 (SD 6.26), and 15.84 (SD 10.29) for government, non-profit, and commercial health websites, respectively (p < 0.001). The average number of third-party cookies across included websites was 1.11 (SD 1.05), 10.85 (SD 11.9), and 25.08 (SD 25.45) for government, non-profit, and commercial health websites, respectively (p = 0.003). Regarding websites informing Facebook of user activity, 0 (0.0%) government websites, 10 (50.0%) non-profit websites, and 15 (60.0%) commercial websites provided user data to Facebook. Regarding websites informing Google Analytics of user activity, 6 (67.7%) government websites, 14 (70.0%) non-profit websites, and 16 (64.0%) commercial websites provided user data to Google Analytics. Summary results are reported in Table 1, while results for individual websites can be found in Table 2.

Table 1 Mean and count frequency of data provided to outside organizations by government, non-profit, and commercial websites
Table 2 Compiled List of Reviewed Websites

DISCUSSION

All health websites studied provided data to ad trackers and used third-party cookies. Popular commercial websites used substantially more third-party cookies and ad trackers than non-profit websites, which in turn used more than the average government website.

Health-related websites often serve as a supplement to care, where patients can find answers and further explanations of disease and treatment options. Data provided this way can be used to construct a profile of personal health information and subsequently be provided or sold to other companies to improve advertising targeting, as suggested in the current report by the observed data sharing with Google and Facebook. The degree to which this information is used this way is unknown.4 However, this appears to be a common practice, and a recent study of COVID-19-related websites found similar results regarding the prevalence of third-party tracking.5 Greater clarity about how websites use collected health data may allow better identification of online resources that maximize the privacy of users while also providing helpful medical information.

This study has several limitations. First, we used a single software program to examine website tracking, and the algorithms used to monitor data privacy may vary between tools.6 Second, we limited our search to selected commonly trafficked websites; a broader sample may provide more granularity among health-related websites. Finally, it is not fully known what the websites do with this information, or what its overall benefits or harms are.

We found that every category of health-related website examined provided information to ad trackers and created third-party cookies. Furthermore, providing data to Facebook and Google for targeted advertising was relatively common, particularly among commercial and non-profit health-related websites in our sample. Searching for personal health information is not a private action, and patients and providers must account for this when they search for health information online.