Cancer Data Visualization: Developing Tools to Serve the Needs of Diverse Stakeholders

This article describes how cancer data visualization tools can catalyze novel research and inform local cancer control planning. In recent years, cancer centers across the nation have increasingly embraced geospatial tools to respond to the unique distributions of disease and multilevel risk factors within their catchment areas. Sylvester Comprehensive Cancer Center in Miami, Florida, recently developed a publicly available, interactive, web-based cancer data visualization tool called SCAN360. Using SCAN360 as an example, we discuss key considerations and challenges for the development of cancer data visualization tools that serve the needs of stakeholders in community, academic, and government settings. Cancer data visualization tools are useful for stakeholders from diverse sectors and may facilitate a range of cancer control efforts, from identifying regions for targeted outreach and engagement, to informing the allocation of resources for government and community initiatives, and motivating collaborative, interdisciplinary cancer research.


Introduction
In the emerging era of precision public health [1], strategically aligning cancer control programming with local geographies of cancer burden, and the spatial organization of health determinants and population characteristics therein, is of utmost importance. In recent years, NCI-designated cancer centers (CCs) across the nation have been encouraged to embrace geospatial approaches to cancer control planning in order to meet NCI-designation requirements around community outreach and engagement (COE). Specifically, as outlined in the current funding opportunity announcement, CCs must define and justify their catchment area using geographically defined boundaries (e.g., census tracts, zip codes, county lines, and state lines), monitor a variety of factors that characterize the local distribution of disease (e.g., underrepresented populations, socioeconomic status, demographic factors, rurality, and environmental exposures), and engage communities in cancer control efforts of particular relevance to the catchment area population [2].
Characterizing the geographic variability of cancer has been a core function of cancer epidemiology for centuries. Urban-rural differences in cancer frequency were first described in the literature as early as the 1840s [3], and maps of cancer mortality in Britain date back to the 1870s [4]. Monitoring geographic trends in cancer outcomes has provided the impetus for important surveillance initiatives in the USA, including NCI's Surveillance, Epidemiology, and End Results (SEER) program, established in response to the National Cancer Act of 1971, and the CDC's National Program of Cancer Registries, which followed the Cancer Registries Amendment Act of 1992 [5]. In recent decades, there has been increasing recognition that place-based characteristics, including social and built environments, shape the interplay of health determinants across the cancer continuum by, for example, supporting or undermining healthy lifestyle behaviors and access to health-promoting resources, and affecting exposure to infection and environmental carcinogens [6,7]. Research on the relationship between spatial context and cancer has grown rapidly. In their 2018 review, Korycinski et al. found that of the more than 800 publications from NCI-designated cancer centers that incorporated geospatial approaches, a majority (59%) had been published since 2010 and over 90% since 2000 [8]. A similar trend was reported in a 2015 review of studies examining cancer and the neighborhood social and built environment, where 82% of all reviewed articles had been published since 2010 [6]. Geocoded cancer registry data are frequently linked to external datasets like the US Census and Behavioral Risk Factor Surveillance System [9] to analyze relationships between cancer outcomes, population attributes, social determinants of health, and built environment characteristics.
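A minimal sketch of the linkage step described above, assuming cases have already been geocoded to census tracts and that both sources share the tract GEOID as a join key. The record layouts, field names, and tract identifiers are hypothetical, and the resulting rates are crude (not age-standardized).

```python
from collections import Counter

# Hypothetical geocoded registry records; each case carries the census tract
# GEOID assigned during geocoding (field names are illustrative only).
registry_cases = [
    {"case_id": 1, "site": "prostate", "tract_geoid": "12086000101"},
    {"case_id": 2, "site": "prostate", "tract_geoid": "12086000101"},
    {"case_id": 3, "site": "breast", "tract_geoid": "12086000102"},
    {"case_id": 4, "site": "prostate", "tract_geoid": "12086000102"},
]

# Hypothetical tract-level census attributes keyed by the same GEOID.
census_tracts = {
    "12086000101": {"population": 4000, "median_income": 38000},
    "12086000102": {"population": 5000, "median_income": 72000},
    "12086000103": {"population": 3000, "median_income": 51000},
}


def link_cases_to_census(cases, tracts, site):
    """Count cases of one site per tract and attach census attributes.

    Every tract in the census table is kept (zero cases if none were
    reported), i.e., a left join with the census table on the left.
    """
    counts = Counter(c["tract_geoid"] for c in cases if c["site"] == site)
    linked = {}
    for geoid, attrs in tracts.items():
        n = counts.get(geoid, 0)
        linked[geoid] = {
            **attrs,
            "cases": n,
            # Crude rate per 100,000 residents.
            "crude_rate": 100_000 * n / attrs["population"],
        }
    return linked


linked = link_cases_to_census(registry_cases, census_tracts, "prostate")
for geoid, row in linked.items():
    print(geoid, row["cases"], round(row["crude_rate"], 1), row["median_income"])
```

Keeping zero-case tracts in the output matters for mapping: an area with no reported cases is different from an area with no data.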
Data visualization through mapping facilitates a range of public health research and advocacy efforts, from identifying and prioritizing resource allocation in regions that experience a disproportionate cancer burden, to generating hypotheses on the etiology of cancer and cancer disparities (e.g., by comparing the geographic distribution of disease to the spatial organization of proposed determinants). In the USA, visualizing the geography of cancer outcomes has been a part of the NCI's national surveillance efforts for nearly 50 years; since 1975, NCI has published national, county-level mortality atlases [10]. Since then, the use of maps by researchers and public health agencies and organizations to describe cancer trends has expanded greatly alongside advances in the accessibility and technological capabilities of geographic information systems (GIS) software [11]. Today, numerous web-based, interactive mapping tools are publicly available that describe the cancer burden at global, national, and state levels (e.g., The International Agency for Research on Cancer's Cancer Today tool [12], NCI's GIS Portal for Cancer Research [13], The US Cancer Statistics Data Visualizations Tool [14], and the North American Association of Cancer Registries' Cancer Maps [15]).
The utility of existing interactive visual tools for prompting targeted public health action around locally specific cancer needs is limited. A primary limitation is lack of granularity; the smallest geographic unit that public-facing mapping tools typically accommodate is county level. This is highly problematic for urban areas, where 80% of the US population currently resides, because county-level aggregations obscure important variability in the cancer burden and cancer needs of diverse communities that exist within a county border [16]. Additional drawbacks of many cancer data visualization tools include limited integration of area-level social and built environment characteristics, which are often captured in different datasets, and limited integration of community input on which characteristics to include based on local perceptions of conditions that impede health from cell to society. These drawbacks limit the ability of visualization tool users to appreciate how factors operating across multiple levels of influence interact to affect disease etiology.
Sylvester Comprehensive Cancer Center (SCCC) in Miami, Florida, recently developed a publicly available cancer data visualization tool called SCAN360 [7]. The tool was developed in response to a community stakeholder who wanted a resource that would allow her to better appreciate the risk conditions that were overrepresented within her neighborhood of residence and amenable to intervention or political advocacy. The tool was created by an interdisciplinary team of faculty and community stakeholders who combined their respective expertise to create a resource that could serve multiple, mutually beneficial purposes; for faculty from SCCC, COE requirements for NCI designation were at the forefront of planning. The interactive web-based platform integrates data on 19 cancer types from Florida's cancer registry with a variety of other area-based and/or geocoded indicators of spatial context and composition from sources like the US Census Bureau, the Florida Department of Health, the Robert Wood Johnson Foundation, and the Environmental Protection Agency. Although SCAN360 primarily focuses on cancer, data for several other disease outcomes are also available (e.g., hepatitis B, hepatitis C, and HIV). Through the platform, users select attributes of interest from point-and-click drop-down menus to generate a variety of figures and choropleth maps, where geographically defined areas are colored with varying degrees of saturation to reflect value ranges on variables of interest. Cancer data visualizations are available for various aggregations according to sex, race/ethnicity, age group, and five levels of geography: state, county, municipality- and census-defined places, and city districts.
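To make the choropleth idea concrete, the sketch below bins area-level rates into quantile classes and gives each class a progressively more saturated shade of one hue. Quantile classification, the number of classes, and the green endpoints are illustrative assumptions; SCAN360's actual classification scheme and palette are not described here.

```python
import bisect
import statistics

LIGHT = (237, 248, 233)  # pale green for the lowest class (illustrative)
DARK = (0, 109, 44)      # saturated dark green for the highest class


def class_breaks(values, k):
    """Return the k - 1 cut points that split values into k quantile classes."""
    return statistics.quantiles(values, n=k, method="inclusive")


def class_of(value, cuts):
    """Index of the class a value falls in (0 = lowest); ties go to the lower class."""
    return bisect.bisect_left(cuts, value)


def class_color(index, k):
    """Interpolate from LIGHT to DARK so that higher classes are more saturated."""
    t = index / (k - 1)
    r, g, b = (round(lo + t * (hi - lo)) for lo, hi in zip(LIGHT, DARK))
    return f"#{r:02x}{g:02x}{b:02x}"


# Hypothetical age-standardized rates per 100,000 for five areas.
rates = {"Area A": 80.0, "Area B": 95.0, "Area C": 120.0, "Area D": 150.0, "Area E": 200.0}
K = 3
cuts = class_breaks(list(rates.values()), K)
colors = {name: class_color(class_of(v, cuts), K) for name, v in rates.items()}
print(cuts)
print(colors)
```

Quantile classes put roughly equal numbers of areas in each shade; other schemes (equal intervals, natural breaks) can make the same data look quite different, which is worth keeping in mind when interpreting any choropleth map.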
In the following sections, using SCAN360 as an example, we will discuss considerations for the development of cancer data visualization tools for stakeholders from diverse sectors, and how coordinated efforts around cancer data visualization can serve the needs of the entire NCI network moving forward.

Cancer Data Visualization for Diverse Stakeholders
Stakeholders from different sectors (e.g., academia, government, community) may have differing needs (e.g., setting research priorities, making decisions about resource allocation, and mobilizing community organization efforts) and differing levels of statistical education when engaging with visualization tools. In this section, we will discuss the value of tools like SCAN360 for diverse stakeholders and considerations for enhancing their utility.

Community Collaborators and Outreach Teams
Maps are important resources for community mobilization around local health issues and resources. Providing visualizations that are locally meaningful requires aggregation at appropriate levels of geography. Community members who use SCAN360 can view maps that show the cancer burden at the neighborhood level. The two maps in Fig. 1, which show age-standardized prostate cancer incidence rates for Miami-Dade County overall and for neighborhood boundaries available as municipality- and census-defined places, demonstrate the value of this granularity. In Fig. 1a, the viewer's attention is drawn to a group of neighborhoods in the northern region of Miami-Dade County, and several in the southern region, that are shaded a darker green to represent higher incidence. Clicking on a specific neighborhood boundary displays a text box with the neighborhood name and the corresponding incidence rate with its 95% confidence interval. This map could help galvanize community-based organizations in neighborhoods like Miami Gardens, North Miami, and Golden Glades (in the north) and Princeton, Homestead, and Florida City (in the south) to coordinate prostate cancer outreach in their regions. The county-level aggregation in Fig. 1b colors the entire region with a single shade and saturation level, masking the local variability in the prostate cancer burden. For CCs, visualization tools like SCAN360 can also inform the efforts of outreach and engagement teams who strive to match local needs with opportunity and to tailor education and screening initiatives to gaps in cancer prevention and health promotion. As an example, SCCC recently launched the Game Changer Vehicle, an air-conditioned motorhome equipped with private rooms for consultation, cancer screening, and multimedia education.
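Age-standardized rates with 95% confidence intervals, like those displayed in Fig. 1, are commonly produced by direct standardization. The sketch below assumes age-specific case counts and populations for one area plus a set of standard-population weights. It uses a simple normal approximation for the interval, whereas gamma-based intervals are often preferred for small counts, and it is not necessarily the method SCAN360 uses.

```python
import math


def age_standardized_rate(cases, population, std_weights, per=100_000, z=1.96):
    """Directly age-standardized rate with a normal-approximation CI.

    cases, population: counts per age group for the area of interest.
    std_weights: standard-population proportions per age group (sum to 1).
    Returns (rate, lower, upper), each expressed per `per` persons.
    """
    if not (len(cases) == len(population) == len(std_weights)):
        raise ValueError("age-group vectors must have equal length")
    if abs(sum(std_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    # Weighted sum of age-specific rates d_i / n_i.
    rate = sum(w * d / n for d, n, w in zip(cases, population, std_weights))
    # Treating counts as Poisson, Var(d_i / n_i) = d_i / n_i**2.
    var = sum(w ** 2 * d / n ** 2 for d, n, w in zip(cases, population, std_weights))
    se = math.sqrt(var)
    lower = max(0.0, rate - z * se)
    return rate * per, lower * per, (rate + z * se) * per


# Hypothetical three-age-group example (weights are not an official standard).
rate, lo, hi = age_standardized_rate(
    cases=[5, 20, 40],
    population=[20_000, 15_000, 8_000],
    std_weights=[0.5, 0.3, 0.2],
)
print(f"{rate:.1f} per 100,000 (95% CI {lo:.1f}-{hi:.1f})")
```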
Using visualizations from SCAN360, outreach teams at Sylvester identified communities that experience an elevated cancer burden and strategically stationed the vehicle in those communities on a routine basis to maximize service reach. Furthermore, maps built for custom geographies that correspond to regions targeted for interventions (e.g., locations visited by the Game Changer Vehicle) can be used to monitor changes in disease patterns over time. Provided enough data are collected, maps based on small geographies could make it possible to visualize the increased detection of early-stage cancer, or the reduction in late-stage cancer, as an area becomes saturated with equitable screening opportunities.
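Monitoring a custom geography over time amounts to aggregating case records that fall inside a user-defined set of small areas and tracking a summary, such as the share of staged cases diagnosed at a late stage, by period. The region definition, field names, and toy records below are hypothetical; counts this small would be suppressed in practice, and far more data would be needed for a stable trend.

```python
from collections import defaultdict

# Hypothetical custom region: tracts around the stops of an outreach vehicle.
CUSTOM_REGION = {"T1", "T2", "T3"}


def late_stage_share_by_period(cases, region):
    """Share of staged cases diagnosed at a late stage, per period, within region.

    Cases outside the region or with unknown stage are ignored.
    """
    late = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        if case["tract"] in region and case["stage"] in ("early", "late"):
            total[case["period"]] += 1
            late[case["period"]] += case["stage"] == "late"
    return {period: late[period] / total[period] for period in sorted(total)}


cases = [
    {"tract": "T1", "period": "2010-2014", "stage": "late"},
    {"tract": "T2", "period": "2010-2014", "stage": "late"},
    {"tract": "T3", "period": "2010-2014", "stage": "early"},
    {"tract": "T9", "period": "2010-2014", "stage": "late"},    # outside region
    {"tract": "T1", "period": "2015-2019", "stage": "early"},
    {"tract": "T2", "period": "2015-2019", "stage": "early"},
    {"tract": "T3", "period": "2015-2019", "stage": "late"},
    {"tract": "T3", "period": "2015-2019", "stage": "unknown"},  # unstaged
]
shares = late_stage_share_by_period(cases, CUSTOM_REGION)
print(shares)
```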
Several important caveats apply to neighborhood-level data in publicly available cancer visualization tools. First, data for small geographies are often too sparse to generate stable estimates of disease burden, and this sparsity raises further ethical concerns around patient confidentiality. SCAN360 addresses this by suppressing values for geographies with fewer than ten cases and by displaying confidence intervals around point estimates. This approach is not perfect; although public health researchers are likely to understand the meaning of a confidence interval, lay members of the community may not. Visually conveying uncertainty in a manner that is accessible and intuitive for a wide audience is challenging; however, several techniques may be considered. One option is a bivariate map that combines two visual variables where, for example, hue represents data values and saturation represents the degree of uncertainty. Researchers at the University of Washington Interactive Data Lab have conceptualized and tested an extension of the bivariate approach, called Value-Suppressing Uncertainty Palettes, which restricts the range of color output according to the magnitude of uncertainty [17]. An important next step for SCAN360, and one the team is currently developing, will be to incorporate bivariate approaches that visualize uncertainty and encourage users to draw conservative conclusions when appropriate. To understand the value of these plots, imagine a traditional choropleth map that shows locations with high cancer rates in bright red. Some places, particularly small geographies, may have very high rates but a large amount of uncertainty. Conventional ways of expressing that uncertainty, such as wide confidence intervals (or wide credible intervals), do not display naturally on choropleth maps. To show the uncertainty, the planned maps fade the bright red areas toward gray as a function of uncertainty. Thus, a location with a very high rate and a lot of uncertainty would fade to a pale rose, whereas a location with a very high rate and a narrow confidence interval would remain bright red. The end result is a visualization that draws the eye to the places that are deeply concerning because the rate is both high and reasonably certain.
An additional caveat is that the sub-county geographies in this tool are defined administratively and may not reflect neighborhood boundaries as residents experience them [18]. To ensure that visualizations are useful for community collaborators, SCAN360 developers are building the capacity to draw custom boundaries and generate estimates for the communities they represent.
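The small-count suppression rule and the planned fade-toward-gray encoding discussed in this section can be sketched as a single coloring function. The white-to-red ramp, the gray endpoint, and the use of the confidence interval's width relative to the rate as the uncertainty measure are assumptions for illustration; this continuous blend is a simplification inspired by, not an implementation of, Value-Suppressing Uncertainty Palettes.

```python
SUPPRESS_BELOW = 10         # SCAN360 suppresses areas with fewer than ten cases
WHITE = (255, 255, 255)
HIGH_RED = (215, 25, 28)    # illustrative color for the highest rates
GRAY = (190, 190, 190)      # illustrative color for maximal uncertainty


def blend(a, b, t):
    """Linear interpolation between two RGB colors (t=0 gives a, t=1 gives b)."""
    return tuple(round(x + t * (y - x)) for x, y in zip(a, b))


def cell_color(rate, lower, upper, cases, max_rate, max_rel_width=1.0):
    """Hex color for one map area, or None if the value must be suppressed.

    The rate sets how far the color moves from white toward red; the
    relative CI width (upper - lower) / rate then pulls that color toward
    gray, reaching full gray at max_rel_width.
    """
    if cases < SUPPRESS_BELOW:
        return None  # too few cases to display
    value_color = blend(WHITE, HIGH_RED, min(rate / max_rate, 1.0))
    rel_width = (upper - lower) / rate if rate > 0 else max_rel_width
    uncertainty = min(rel_width / max_rel_width, 1.0)
    r, g, b = blend(value_color, GRAY, uncertainty)
    return f"#{r:02x}{g:02x}{b:02x}"


# Two areas with the same high rate but very different precision.
confident = cell_color(200.0, 190.0, 210.0, cases=400, max_rate=200.0)
uncertain = cell_color(200.0, 50.0, 350.0, cases=12, max_rate=200.0)
suppressed = cell_color(260.0, 60.0, 700.0, cases=4, max_rate=200.0)
print(confident, uncertain, suppressed)
```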
Additionally, the SCAN360 team has prototyped tools to support comparisons (and valid inferences about differences) between groups. In the hands of statistical experts, there is a place for comparing groups with traditional, p-value-based methods. However, these techniques carry a risk of false discoveries driven by multiple comparisons, and the potential harm to entire neighborhoods from such false discoveries is real. Our prototypes allow users to specify a meaningful difference based on clinical/medical criteria or policy thresholds, using rules such as a 10% increase or a difference of 20 cases per 100,000. Differences are shown using "stoplight plots." Each plot is a grid, with rows for different statistics or features (e.g., female breast cancer, prostate cancer among Black males) and columns showing the rates for a specified place of interest, the state, and the USA. Colored circles resembling the red, yellow, and green lights of American traffic signals (with blue in place of green in colorblind mode) appear between the listed rates for the place of interest, state, and country to indicate clinically meaningful differences between those locales. In "expert mode," the stoplight symbols are annotated to indicate differences that are statistically significant at the traditional p-value threshold of 0.05. Additional annotations can flag meaningful differences that remain statistically significant after adjusting for multiple comparisons.
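A sketch of the stoplight logic, assuming a user-chosen relative or absolute threshold for a meaningful difference. The color coding (red for meaningfully higher than the comparison rate, green or blue for meaningfully lower, yellow otherwise) is an assumption about how the circles might be assigned. Holm's step-down procedure is shown only as one common multiple-comparison adjustment; the prototypes' actual adjustment is not specified.

```python
def stoplight(place_rate, ref_rate, threshold, kind="relative", colorblind=False):
    """Classify the difference between a place's rate and a reference rate.

    kind="relative": threshold is a fraction of the reference (0.10 = 10%).
    kind="absolute": threshold is in rate units (e.g., 20 per 100,000).
    Assumed coding: "red" = meaningfully higher, "green" ("blue" in
    colorblind mode) = meaningfully lower, "yellow" = no meaningful difference.
    """
    if kind == "relative":
        margin = threshold * ref_rate
    elif kind == "absolute":
        margin = threshold
    else:
        raise ValueError("kind must be 'relative' or 'absolute'")
    diff = place_rate - ref_rate
    if diff > margin:
        return "red"
    if diff < -margin:
        return "blue" if colorblind else "green"
    return "yellow"


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: which tests remain significant.

    Compare the k-th smallest p-value (k = 0, 1, ...) with alpha / (m - k)
    and stop at the first one that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] > alpha / (m - k):
            break
        reject[i] = True
    return reject


# Hypothetical rates per 100,000 for one place versus the state.
print(stoplight(180.0, 150.0, 0.10))                                  # red
print(stoplight(140.0, 150.0, 0.10))                                  # yellow
print(stoplight(120.0, 150.0, 20, kind="absolute", colorblind=True))  # blue
print(holm_reject([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

In the last line, all four p-values fall below 0.05 unadjusted, but only two survive the Holm adjustment, which is the kind of distinction an "expert mode" annotation could surface.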
Finally, the SCAN360 team is exploring how to show rates over time. Traditional visualizations in academic journals show change over time by plotting time on the x-axis and an outcome on the y-axis, with a heavy line showing the trend. Some journals add thin vertical lines, called whiskers, to represent the uncertainty bounds, but many readers are not trained to interpret these bounds. Following the lead of researchers like Jessica Hullman, we are experimenting with dynamic graphs to show uncertainty [19]. For example, instead of a static trend line, we can show a line that wiggles up and down within the uncertainty range. Where a traditional plot would show a trend line with wide whiskers, the viewer sees a line that wiggles across that wide range; where it would show a precise estimate with short whiskers, the viewer sees a line that wiggles within a small range. These plots give the immediate impression either that an estimate is "all over the place" or that it is "stable." Developing these displays is critical for disseminating cancer statistics to a broad audience: if we fail to convey the uncertainty of our estimates, users of visualization tools may place too much trust in our mathematically rigorous best guesses.
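The wiggling lines described above correspond to hypothetical outcome plots (HOPs), in which an animation cycles through individual draws from the sampling distribution instead of showing a summary interval. The sketch below generates the frames only, assuming yearly estimates with normal standard errors and drawing each year independently; a real implementation might instead draw whole trend lines (e.g., slope and intercept) to keep frames smooth, and rendering the animation is left to a plotting library.

```python
import random


def hop_frames(estimates, std_errors, n_frames=20, seed=0):
    """Draw frames for a hypothetical outcome plot.

    Each frame is one plausible trajectory: every year's value is drawn from
    a normal distribution centered on its estimate with its standard error.
    Played in sequence, wide uncertainty looks like a line that jumps around
    and narrow uncertainty like a line that barely moves.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(mu, se) for mu, se in zip(estimates, std_errors)]
        for _ in range(n_frames)
    ]


def frame_spread(frames, year_index):
    """Range of values one year takes across frames (how much it 'wiggles')."""
    values = [frame[year_index] for frame in frames]
    return max(values) - min(values)


years = [2014, 2015, 2016, 2017, 2018]           # x-axis values for plotting
estimates = [120.0, 118.0, 117.0, 115.0, 112.0]  # hypothetical rates per 100,000
precise = hop_frames(estimates, [2.0] * len(years), n_frames=100)
imprecise = hop_frames(estimates, [25.0] * len(years), n_frames=100)
print(round(frame_spread(precise, 0), 1), round(frame_spread(imprecise, 0), 1))
```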

Academic Researchers
Historically, mapping the spatial distribution of disease has led to major advancements in the field of epidemiology by generating hypotheses on disease etiology and risk factors. A prime example of this was the identification of asbestos as a carcinogen following observations of unusually high rates of lung cancer in coastal counties with ship-building industries on post-World War II mortality maps [20]. To date, SCAN360 has primarily benefited interventional research on cancer prevention and early detection by helping investigators identify communities where proposed programming can have the greatest impact. Investigators pursuing implementation research around HPV self-sampling are currently using SCAN360 to prioritize the selection of pharmacy-based primary care clinic partners in neighborhoods with excess cervical cancer incidence. Others are using SCAN360 to identify areas with a high burden of tobacco-associated cancers for launching an investigator-initiated trial of a new point-of-care tool for detecting head and neck cancers.
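One common way to quantify "excess" incidence in a neighborhood is the standardized incidence ratio (SIR): observed cases divided by the cases expected if the neighborhood experienced reference (e.g., county or state) age-specific rates. The sketch below uses Byar's approximation for the 95% confidence interval and flags an area when the lower bound exceeds 1. The populations, reference rates, and observed count are invented, and the investigators mentioned above may have used a different criterion.

```python
import math


def expected_cases(pop_by_age, reference_rates_by_age):
    """Cases expected if the area had the reference age-specific rates."""
    return sum(n * r for n, r in zip(pop_by_age, reference_rates_by_age))


def sir_byar(observed, expected, z=1.96):
    """Standardized incidence ratio with Byar's approximate confidence interval."""
    sir = observed / expected
    if observed > 0:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
    else:
        lower = 0.0
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return sir, max(lower, 0.0) / expected, upper / expected


pop = [10_000, 8_000, 5_000]          # hypothetical population by age group
ref_rates = [0.0001, 0.0002, 0.0004]  # hypothetical reference rates per person
exp_cases = expected_cases(pop, ref_rates)
sir, lo, hi = sir_byar(12, exp_cases)
excess = lo > 1.0                     # CI lies entirely above the reference
print(round(exp_cases, 2), round(sir, 2), round(lo, 2), round(hi, 2), excess)
```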
As epidemiologists increasingly turn to geospatial approaches for understanding relationships between cancer outcomes and social and built environments, GIS-based visualizations (e.g., bivariate or side-by-side choropleth maps) may facilitate the observation of new patterns that merit further inquiry and intervention. Of course, due to the limitations of ecological analyses, observed or suspected associations should be interpreted with appropriate caution and understood as hypothesis generating. In addition to the visualization challenges related to data sparsity, it can be difficult to match measures of disease burden with those of spatial context when sub-county aggregations are not readily available or are available only for dissimilar geographic boundaries. Population-based surveys designed by CCs to collect geocoded data from representative samples of their catchment areas may be warranted, and longitudinal cohort studies may be particularly valuable for visualizing temporal effects [21]. Novel applications of community-engaged research methodologies for collecting geospatial data (e.g., community-based participatory research, participatory mapping/GIS, citizen science, or crowdsourcing) might also be instrumental for acquiring more granular data on social and built environment characteristics that affect cancer, while strengthening academic-community partnerships [8, 22-24]. Examples of community-engaged mapping approaches include surveying local residents to define neighborhood boundaries [24]; working with residents to collect data on "activity spaces" (i.e., the spatial movement of people in a neighborhood during day-to-day life) to understand the geographic accessibility of neighborhood resources and quantify community participation [25]; and recruiting community voices in the development of novel metrics of neighborhood health [26,27].
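When disease counts and contextual measures exist only for mismatched boundaries, one workaround is to apportion counts from one geography to another with crosswalk weights, such as the share of a ZIP code's residential addresses located in each census tract (crosswalk files of this kind are published, for example, by HUD). The sketch below uses invented ZIP labels and weights and assumes that counts are distributed within each ZIP code the same way addresses are, an assumption that introduces error of its own.

```python
from collections import defaultdict

# Hypothetical crosswalk: share of each ZIP code's residential addresses in
# each census tract (shares for one ZIP code sum to 1).
ZIP_TO_TRACT = {
    "ZIP_A": {"T1": 0.6, "T2": 0.4},
    "ZIP_B": {"T2": 0.25, "T3": 0.75},
}


def allocate_to_tracts(zip_counts, crosswalk):
    """Apportion ZIP-level counts to tracts in proportion to crosswalk weights."""
    tract_counts = defaultdict(float)
    for zip_code, count in zip_counts.items():
        weights = crosswalk[zip_code]
        if abs(sum(weights.values()) - 1.0) > 1e-6:
            raise ValueError(f"weights for {zip_code} do not sum to 1")
        for tract, w in weights.items():
            tract_counts[tract] += count * w
    return dict(tract_counts)


tract_counts = allocate_to_tracts({"ZIP_A": 100, "ZIP_B": 40}, ZIP_TO_TRACT)
print(tract_counts)
```

Because weights for each ZIP code sum to 1, the allocation preserves the overall total, which is a useful sanity check when building such tables.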
Population health scientists are not the only group of researchers who stand to benefit from tools like SCAN360. Because epigenetic alterations can reflect environmental exposures, factors related to the spatial context of neighborhoods are also gaining traction in basic sciences like epigenetics [28,29]. Visualization tools may help basic science researchers to appreciate where and why epigenetic alterations occur, and lead to the identification of specific signatures that are amenable to targeted intervention. As others have noted, multilevel research linking molecular mechanisms to population health patterns will require interdisciplinary teams of basic scientists, population health researchers, and geographers [21]. Visualization tools like SCAN360 may be instrumental in catalyzing collaborative research efforts like these and disseminating the results to a wide audience.

Coordination Between Federal, State, and Regional Cancer Control Efforts
Cancer data visualizations can also inform policymakers on the development of public health initiatives and resource allocation. Legislatively mandated state cancer control and research advisory boards are a prime potential audience. In Florida, the state's advisory council approves a 5-year statewide cancer control and research plan outlining key objectives for outcomes across the cancer continuum. Regional collaboratives then prioritize objectives for their region and work with community stakeholders to monitor and meet them. Collaboratives have often nominated priority objectives for their region based on their knowledge of the local cancer burden, the expertise of regional members, or feedback from the region itself. To make the decision process more data driven, SCAN360 developers have started building a dashboard that shows regional collaboratives how their region performs on key objectives in the state's cancer control plan relative to the plan's targets, allowing them to prioritize the objectives with the greatest need. In the past, regional collaboratives have been able to access this type of data and visualization primarily in ad hoc ways; with SCAN360, collaboratives will have ready access to a comprehensive view of cancer in their regions, enabling data-driven and locally impactful decisions.
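A minimal sketch of the dashboard's core comparison, assuming each plan objective has a numeric target and a known direction (whether higher or lower values are better). Objective names, regional values, and targets are hypothetical and are not taken from Florida's cancer plan.

```python
def prioritize_objectives(region_values, targets, higher_is_better):
    """Rank plan objectives by how far a region falls short of each target.

    The shortfall is divided by the target so that objectives on different
    scales (percent screened, rate per 100,000) can be compared. Objectives
    already meeting their target receive a gap of 0.
    """
    gaps = {}
    for objective, value in region_values.items():
        target = targets[objective]
        if higher_is_better[objective]:
            shortfall = target - value
        else:
            shortfall = value - target
        gaps[objective] = max(0.0, shortfall / target)
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


region_values = {"crc_screening_pct": 58.0, "late_stage_breast_pct": 40.0, "adult_smoking_pct": 12.0}
targets = {"crc_screening_pct": 80.0, "late_stage_breast_pct": 30.0, "adult_smoking_pct": 14.0}
higher_is_better = {"crc_screening_pct": True, "late_stage_breast_pct": False, "adult_smoking_pct": False}

ranked = prioritize_objectives(region_values, targets, higher_is_better)
for objective, gap in ranked:
    print(f"{objective}: {gap:.0%} short of target")
```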
Looking Forward: Leveraging Visualization to Serve the Needs of the NCI Network
NCI's designation requirements emphasize the importance of matching outreach and engagement efforts to the unique characteristics of geographically defined CC catchment areas. Visualization tools allow CCs to visually appreciate how targeted outreach and engagement may help attenuate key cancer risk factors and risk conditions, and ultimately the distribution of disease, over time. They also provide a way for diverse stakeholders to appreciate the power of academic-community collaboration.
Our examples of current and prospective applications of SCAN360 within SCCC's catchment area and across the state are translatable to CCs nationwide. Increasing utilization of geospatial approaches by CCs for outreach and engagement, as well as research, will create opportunities for novel multisite collaborations. Ethical concerns related to sharing geocoded patient data and potential breaches of confidentiality have challenged multi-institution health research historically; standard methodologies for both (a) protecting patient data while maintaining geographic integrity and (b) facilitating human subjects' research approvals for multisite studies are needed [21]. Additionally, coordinated efforts to harmonize the measures that populate visualization tools and are applied in geospatial analyses will enhance the feasibility of these collaborations.
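Geographic masking is one family of methods for protecting patient locations while retaining approximate geography. The sketch below shows donut masking, which moves each point a random distance between a minimum and a maximum radius in a random direction. It is offered as one example of the kind of standard methodology called for above, not as a recommendation. The radii and coordinates are illustrative assumptions, and coordinates are assumed to be projected in meters rather than latitude/longitude.

```python
import math
import random


def donut_mask(x, y, r_min, r_max, rng):
    """Displace a point by a random distance in [r_min, r_max] in a random direction.

    The inner radius guarantees a minimum displacement (privacy); the outer
    radius bounds the loss of geographic accuracy.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Sampling r**2 uniformly spreads points evenly over the annulus area.
    r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + r * math.cos(angle), y + r * math.sin(angle)


rng = random.Random(42)
x0, y0 = 580_000.0, 2_850_000.0  # hypothetical projected coordinates (meters)
masked = [donut_mask(x0, y0, 50.0, 500.0, rng) for _ in range(1000)]
distances = [math.hypot(x - x0, y - y0) for x, y in masked]
print(round(min(distances), 1), round(max(distances), 1))
```

In practice, radii are often scaled to local population density so that each masked point could plausibly belong to a minimum number of residents.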

Conclusion
Interactive visualization tools are useful for a variety of stakeholders. They may help community collaborators identify regions for targeted outreach; inform the design of multilevel interventions and selection of study sites by prevention scientists; generate novel research questions on the relationships between cancer etiology, cancer outcomes, and spatially organized determinants of health; motivate interdisciplinary research; and guide how governmental initiatives and regional cancer collaboratives allocate cancer control resources in their communities. Data sparsity, patient confidentiality and conveying statistical uncertainty require special consideration when designing interactive visualization tools for diverse audiences that include the lay public and professionals with different areas of expertise. As comprehensive cancer centers nationwide monitor and respond to the cancer burden within their geographies, visual tools may help demonstrate the effect of outreach and engagement on cancer control across the entire NCI network over time.
Data Availability Not applicable.
Code Availability Not applicable.

Conflict of Interest
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.