Background

Breast cancer is the most common cancer in women worldwide and its incidence is increasing, particularly in developing countries, where the majority of cases are diagnosed in late stages [1]. Late stage at diagnosis is an important predictor of cancer mortality. Understanding geographic disparities in late stage breast cancer at diagnosis is critical for cancer control activities. In many areas worldwide, cancer registry systems, available data and mapping technologies can provide information about late stage cancer by geographic region, offering valuable opportunities to identify places where further investigation and interventions are needed. The objective of the present study is to demonstrate the use of available data and geographic information systems to (1) examine geographic disparities in late stage breast cancer incidence, (2) identify factors that are associated with higher rates of late stage breast cancer across different geographic areas, and (3) highlight areas that might benefit from targeted interventions. It is hoped that the approach offered in this work will be utilized broadly, including outside the US, wherever cancer registry systems and technologies offer the same opportunity to identify places that require specific cancer control interventions to reduce the cancer burden.

Late stage breast cancer (LSBC): theoretical foundation

Many factors contribute to LSBC at diagnosis. Among the major factors are the underlying biological aggressiveness of the disease [2], demographic and socio-economic characteristics [3–6], health insurance status [7, 8], accessibility of healthcare and diagnostic services [9–11], and the availability and utilization of screening tests [12].

Studies suggest that black women are more often diagnosed with LSBC than white women [13–15]. Furthermore, African-American/Black women are more likely than other US racial and ethnic groups to develop aggressive breast cancer that is estrogen and progesterone receptor negative and negative for human epidermal growth factor, and to have distant metastases at diagnosis [16]. Evidence on ethnic disparities suggests that Hispanic women experience lower incidence rates of LSBC than non-Hispanic women [17].

Financial resources have been associated with breast cancer stage. Low socioeconomic status and poverty have been correlated with higher rates of late-stage breast cancer [5, 18, 19]. Other research evidence suggests that poor literacy limits patients’ understanding of cancer screening and of symptoms of cancer, potentially adversely affecting their stage at diagnosis [20].

In studies that examined health insurance, having no insurance or having Medicaid coverage was associated with a higher proportion of LSBC [8, 21]. Kuzmiak [8] found that, compared to insured patients, uninsured patients had a 66 % higher likelihood of presenting with LSBC.

Furthermore, it has been suggested that limited accessibility of health care, measured by lower availability of mammography facilities and/or primary care physicians, delays the detection of early stage breast cancer [10, 22–25].

Other evidence suggests that a lower density of mammography facilities is associated with a higher proportion of late stage breast cancer at diagnosis in rural areas [9], where there is more limited access to mammography facilities [25] and women are less likely to have received a mammogram [26]. Conversely, some studies found that geographical access to mammography facilities and primary care providers was not correlated with stage at diagnosis [27]. McLafferty [28, 29] found an “urban disadvantage”: higher proportions of late stage breast cancer in urban areas.

Many of the aforementioned contributors to LSBC are correlated with each other, change over time, and vary across geographical areas [13, 18, 20, 21]. Some factors such as the availability, cost, and accessibility of mammography screening and diagnostic services are potentially modifiable and addressing them could result in more women being diagnosed earlier when the disease is more amenable to treatment.

The aforementioned evidence of the major contributors to LSBC served as the theoretical foundation for selection of particular variables in our study. The next section provides an overview of these variables and data sources.

Methods

Data and variables

Data from the Surveillance, Epidemiology, and End Results (SEER) Program of the United States National Cancer Institute [30] were used as the source of the outcome (dependent) variable in the analysis, late stage breast cancer incidence. SEER comprises 18 population-based cancer registries covering 28 % of the total US population. The information collected includes primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and follow-up for patients’ vital status. SEER provides baseline measures of cancer incidence rates, survival and prevalence statistics, and changes in incidence and survival trends over time for several geographic units (listed from smallest to largest: County, Health Service Area, State, and Nation), and makes public use files available to support population-based cancer research.

This study utilized SEER data on patients who were diagnosed with LSBC in the following eight states, which comprise 26 % of the US population: California, Georgia, Iowa, Louisiana, Kentucky, New Jersey, New Mexico, and Utah. Cases of LSBC were defined as females aged 40 years or older who were diagnosed with LSBC between 2006 and 2010. Late stage was defined as stage III or stage IV, using the guidelines of the AJCC (American Joint Committee on Cancer) Cancer Staging Handbook, 6th Edition [31].

In this study, the outcome variable is the age-adjusted incidence rate of late stage breast cancer (LSBC) for females 40 years of age and older who were diagnosed between 2006 and 2010 (the rates were age-adjusted to 19 age groups using the 2000 US standard population).
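Age adjustment of this kind is usually done by direct standardization: each age-specific rate is weighted by that age group's share of a standard population. The sketch below is illustrative only and is not the SEER software's implementation; the function name, the three age groups, and all numbers are hypothetical, whereas the study used the age groups of the 2000 US standard population.

```python
def age_adjusted_rate(cases, population, std_population, per=100_000):
    """Directly standardized rate: weighted sum of age-specific rates.

    cases, population, std_population are parallel lists, one entry per
    age group. std_population may be counts or weights; it is normalized.
    """
    if not (len(cases) == len(population) == len(std_population)):
        raise ValueError("all inputs must cover the same age groups")
    total_std = sum(std_population)
    rate = 0.0
    for c, n, s in zip(cases, population, std_population):
        if n > 0:  # skip empty age groups to avoid division by zero
            rate += (c / n) * (s / total_std)
    return rate * per


# Hypothetical three-group example (not study data).
cases = [10, 30, 60]
population = [50_000, 40_000, 30_000]
std_weights = [0.5, 0.3, 0.2]  # hypothetical standard-population shares
rate = age_adjusted_rate(cases, population, std_weights)
print(round(rate, 1))  # rate per 100,000 women
```

Because the same standard weights are applied everywhere, rates computed this way can be compared across areas with different age structures.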

The independent variables in the analysis represent some of the major factors that were found to contribute to LSBC including: socio-demographic and economic characteristics (race, ethnicity, literacy, education, median income, health insurance coverage); accessibility to health care (urban–rural residence); availability of screening services (density of obstetrics and gynecology physicians and FDA approved mammography facilities) and utilization of mammography (percentage of women who utilized mammogram). The variables tested in the model with a detailed definition, data sources, and data timeline are shown in Table 1.

Table 1 Variables and data sources

The socio-demographic data were obtained from the US Census Bureau 2010 data file and include the following variables: percentage of Black population, median household income, English literacy, education, health insurance coverage, and rural–urban residence.

Data on the number of OBGYN specialists in defined geographic areas were obtained from the Health Resources and Services Administration (HRSA) Area Resource File (ARF) [34], with estimates for the year 2010. These data were normalized by the total population to calculate a measure of OBGYN specialists per person.

The number of FDA approved mammography facilities was available from the US Food and Drug Administration (FDA), with estimates for the year 2010. These data were normalized by the total population to generate a density measure of the number of facilities per person.

Breast cancer screening measures were available from the small area estimates (SAE) file [33]. These are model-based, bias-corrected measures of the percentage of female population ages 40 and older who reported having a mammography test in the past 2 years—estimates for the period 2000–2003.

Geographic unit of analysis

Data for all variables in this analysis were initially assigned to a county, the smallest geographic unit for which SEER reports data. However, since the LSBC data are too sparse to provide stable 5-year incidence rates at the county level, the data were aggregated into health service areas (HSAs) to ensure stability of rates. HSAs were originally defined by the National Center for Health Statistics as geographic areas comprising one or more counties, delineated such that most residents obtain hospital care from the same set of hospitals [32]. The original HSAs were modified by the National Cancer Institute so that any HSA that crossed state or SEER Registry boundaries was split, and all counties in a given HSA fall within one state and/or SEER Registry. Under the modified definition, there are 944 HSAs in the US, containing 3141 counties. Because the modified HSAs are delineated using the geopolitical boundaries of counties and states, they are compatible with many data systems (e.g. census data), which increases the possibilities for data analysis [32].
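The county-to-HSA aggregation described above amounts to summing counts and populations over the counties in each HSA before computing any rate or density. The following is a minimal sketch under stated assumptions: the county names, the county-to-HSA lookup, and all values are hypothetical, and the rate shown is a crude rate, not the age-adjusted rate used in the study.

```python
from collections import defaultdict


def aggregate_to_hsa(county_rows, county_to_hsa):
    """Sum county-level counts into HSA-level totals."""
    totals = defaultdict(lambda: {"cases": 0, "population": 0, "facilities": 0})
    for row in county_rows:
        hsa = county_to_hsa[row["county"]]
        for key in ("cases", "population", "facilities"):
            totals[hsa][key] += row[key]
    return dict(totals)


def summarize(hsa_totals, per=100_000):
    """Crude incidence rate and facility density for each HSA."""
    return {
        hsa: {
            "crude_rate": t["cases"] / t["population"] * per,
            "facilities_per_person": t["facilities"] / t["population"],
        }
        for hsa, t in hsa_totals.items()
    }


# Hypothetical counties and lookup table (not study data).
county_rows = [
    {"county": "A", "cases": 5, "population": 10_000, "facilities": 1},
    {"county": "B", "cases": 15, "population": 30_000, "facilities": 3},
    {"county": "C", "cases": 8, "population": 20_000, "facilities": 2},
]
county_to_hsa = {"A": "HSA-1", "B": "HSA-1", "C": "HSA-2"}

hsa_totals = aggregate_to_hsa(county_rows, county_to_hsa)
hsa_summary = summarize(hsa_totals)
```

Summing numerators and denominators first, rather than averaging county rates, keeps small counties from dominating the HSA-level estimate.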

Analysis

All statistical analyses were performed in SPSS 16.0 software. First, we plotted the mean rates of late stage breast cancer by SEER state with confidence intervals; we then ran an Analysis of Variance (ANOVA) with Bonferroni adjustment to test whether the observed variations in the mean rates of LSBC between the states were statistically significant (p < 0.05) and, if so, which states differed significantly. Second, we classified the HSA-level incidence rates of LSBC for all eight SEER states using tertiles and computed the proportion of HSAs in each class by state. In addition, we generated a map of incidence rates of LSBC to show the location of HSAs with high, medium, or low rates across the eight SEER states. Next, we ran a “backward” stepwise linear regression to determine the factors that best explain LSBC incidence for females ages 40 and older. Our analysis included the ten potential predictor variables listed in Table 1 and individual states as covariates. Backward stepwise regression fits a multiple regression repeatedly, each time removing the predictor that contributes least to the model, until all remaining predictors meet the retention criterion. The final model contains the variables that best explain the variation in the dependent variable. Finally, we mapped the strongest predictors of LSBC by HSA of each SEER state in order to examine their geographic distributions and identify areas that may be in need of intervention. All maps were created using ESRI (Environmental Systems Research Institute, Inc. Redlands, CA) ArcGIS 10.1 software.
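The backward elimination loop can be sketched in a few lines. This is not the SPSS procedure: as a simplifying assumption, the sketch drops the predictor with the smallest absolute t statistic until every remaining |t| reaches a fixed threshold (`t_min`, hypothetical), whereas a statistics package would apply a p-value criterion. All names and data below are hypothetical.

```python
def invert(a):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    k = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def solve_ols(X, y):
    """Least-squares coefficients and their standard errors."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


def backward_eliminate(columns, y, t_min=2.0):
    """Refit repeatedly, dropping the weakest predictor each round."""
    names = list(columns)
    while names:
        # Design matrix: intercept column plus the current predictors.
        X = [[1.0] + [columns[name][r] for name in names]
             for r in range(len(y))]
        beta, se = solve_ols(X, y)
        t = {name: abs(beta[i + 1] / se[i + 1]) for i, name in enumerate(names)}
        weakest = min(t, key=t.get)
        if t[weakest] >= t_min:
            break  # every remaining predictor meets the criterion
        names.remove(weakest)
    return names


# Hypothetical data: y depends on x1; x2 is unrelated.
columns = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [3, 1, 4, 1, 5, 9, 2, 6],
}
y = [48.5, 45.5, 44.3, 41.7, 40.2, 37.8, 36.4, 33.6]
kept = backward_eliminate(columns, y)
```

With these values the sketch retains only `x1`. Categorical covariates such as state would enter as indicator columns, with one category left out as the reference.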

Results

Figures 1 and 2 illustrate the variation in the mean rates of LSBC for women 40 years of age and older, by state. The mean for the eight SEER states is 46.3 per 100,000 women. Ranked from highest to lowest, New Jersey has, on average, the highest incidence rate of LSBC (48.2 per 100,000 women), followed by Georgia, Kentucky, Louisiana, California, Utah, Iowa, and New Mexico, which has the lowest (33.6 per 100,000).

Fig. 1 Mean rates and confidence intervals of LSBC incidence, females, ages 40 and above, 2006–2010

Fig. 2 LSBC incidence by HSA, females, ages 40 and above, 2006–2010

To test for significant differences in late stage breast cancer incidence among the eight states, we ran an analysis of variance and obtained an F statistic of 4.241 with p < 0.0001, which indicates that the variation in mean rates of LSBC by state is statistically significant. The results of the Bonferroni test (Table 2) reveal where the significant differences lie: New Mexico has, on average, significantly lower incidence rates of LSBC than New Jersey (p < 0.0001), Georgia (p < 0.0001), Kentucky (p < 0.0001), and California (p < 0.003). There are no significant differences in mean rates of LSBC between the other states.
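For readers unfamiliar with the test, the F statistic compares the variation between state means with the variation of HSA rates within states, and the Bonferroni adjustment tightens the significance threshold according to the number of pairwise comparisons. The sketch below is illustrative and does not reproduce the SPSS output; the function names and the three small groups are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison threshold when all pairs of groups are compared."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs, n_pairs


# Hypothetical rates for three groups (not study data).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

# Eight states give 28 pairwise comparisons.
per_test_alpha, n_pairs = bonferroni_alpha(0.05, 8)
```

Dividing the threshold by the number of comparisons is equivalent to multiplying each pairwise p-value by that number and comparing it with the original threshold.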

Table 2 States that have significantly different LSBC incidence rates

Figure 2 is a map of LSBC incidence by HSA. Incidence rates are classified into tertiles (low, medium, and high values), and the percentage of HSAs in each class is summarized by state in Table 3. New Jersey, the state with the highest mean (48.2 per 100,000 women), has 70 % of its HSAs with high incidence rates of LSBC (>46.0 per 100,000 women) and the remaining 30 % with medium rates (40.1–46.0). In contrast, New Mexico, the state with the lowest mean (33.5), has 80 % of its HSAs with low rates (16.0–40.0) and 20 % with medium rates. In California, most of the HSAs (60 %) have medium rates. Georgia, Kentucky, and Louisiana each have a greater proportion of HSAs in the medium and high ranges, while Utah and Iowa have proportionately more HSAs in the low and medium ranges.
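The classification step can be sketched as follows. The default cut points (40.0 and 46.0 per 100,000) are the class boundaries quoted above; the function names and the example rates are hypothetical, and `tertile_cutoffs` shows one way such boundaries might be derived from a pooled list of rates rather than the authors' exact procedure.

```python
import statistics
from collections import Counter


def tertile_cutoffs(rates):
    """Two cut points that split the pooled rates into three groups."""
    return statistics.quantiles(rates, n=3)


def classify_rate(rate, low_max=40.0, medium_max=46.0):
    """Assign an HSA rate to the low, medium, or high class."""
    if rate <= low_max:
        return "low"
    if rate <= medium_max:
        return "medium"
    return "high"


def class_shares(rates_by_state):
    """Percentage of each state's HSAs falling in each class."""
    shares = {}
    for state, rates in rates_by_state.items():
        counts = Counter(classify_rate(r) for r in rates)
        shares[state] = {c: 100.0 * counts[c] / len(rates)
                         for c in ("low", "medium", "high")}
    return shares


# Hypothetical HSA rates for one state (not study data).
shares = class_shares({"State A": [38.0, 42.5, 47.1, 50.3]})
```

Using cut points pooled over all eight states, rather than per state, is what makes the class percentages comparable between states.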

Table 3 Percentage of HSAs with low, medium, or high rates of LSBC by SEER state

Table 4 shows the descriptive statistics for the variables that were entered into the stepwise regression, while Table 5 shows which of those variables best explain the variation in LSBC incidence. The null hypothesis being tested is that there is no association between late stage breast cancer and each predictor, given the other predictors in the model. We rejected the null hypothesis based on the following results: model R = 0.493 and adjusted R² = 0.205, indicating that about 20 % of the total variance in LSBC incidence rates is explained by the regression model. The standardized coefficients show that mammography density is the predictor most strongly associated with LSBC, with a negative effect. On average, for each unit increase in mammography density, the LSBC incidence rate decreases by 0.383.
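The relation between the reported R and the adjusted R² can be written out explicitly. Here n (the number of HSAs in the regression) and p (the number of predictors retained in the final model) are symbols introduced for illustration; their values are not restated in this section, so only the first identity is evaluated numerically.

```latex
R^2 = (0.493)^2 \approx 0.243,
\qquad
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

The adjusted value is always at most R², which is consistent with the reported 0.205 lying below 0.243; the gap reflects the penalty for the number of predictors relative to the number of HSAs.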

Table 4 Descriptive statistics for variables entered in the regression analysis
Table 5 Best model fit: effects of state, demographic, and health characteristics on LSBC incidence rates

Other HSA-level factors that explain the variation in LSBC across the eight SEER states include the percentage of the population with a college degree or higher and English literacy (both significantly negatively associated with LSBC incidence), and the percentage of Black population (significantly positively associated with LSBC). After adjusting for these four predictors, the states of New Jersey, California, and Kentucky still have significantly higher rates of LSBC than New Mexico (the reference category).

The maps illustrate geographic disparities in the density of FDA approved mammography facilities (Fig. 3), the population with a BA degree or higher (Fig. 4), and the Black population (Fig. 5), the three strongest predictors of LSBC found in this study. More importantly, the maps highlight where to focus targeted interventions. Our model suggests that areas with a low density of mammography facilities, low educational attainment, and a high percentage of Black population tend to have higher incidence rates of late stage breast cancer. In the first map (Fig. 3), red-shaded areas represent HSAs with high incidence rates of LSBC (>46.0 per 100,000) and, at the same time, fewer than 3 FDA approved mammography facilities per person. In the second map (Fig. 4), shaded areas highlight HSAs with high incidence rates of LSBC and, at the same time, relatively low educational attainment (less than 10 % of the population having a BA degree or higher). The last map (Fig. 5) highlights the areas where both the incidence of LSBC and the proportion of Black population are high.

Fig. 3 Number of FDA approved mammography facilities per person by HSA

Fig. 4 Percent of population with BA degree or higher by HSA

Fig. 5 Percent of Black population by HSA

Conclusions

In this study we sought to describe the geographic disparities in late stage breast cancer incidence across eight states in the US and to identify areas where LSBC is common, where further research could help clarify the reasons for the high incidence of late stage diagnoses, and where interventions could be used to modify the factors contributing to the high rates of LSBC. For example, identifying areas with higher rates of LSBC and the factors contributing to them may help show where resources are needed to increase screening for breast cancer and to provide greater availability of services that can provide more aggressive treatments.

We found heterogeneity in the incidence of late stage diagnosis across the eight states examined. New Jersey had the highest rate, at 48 per 100,000 women, and New Mexico the lowest, at 33.5 per 100,000 women, about 30 % lower than that of New Jersey. Our results indicate that New Mexico has significantly lower incidence rates of LSBC than four other states.

We also found that a lower density of screening mammography units within HSAs was associated with higher rates of late stage breast cancer in our sample. This is consistent with prior findings that limited accessibility of health care, measured by lower availability of mammography facilities and/or primary care physicians, delays the detection of early stage breast cancer [10, 11, 22–25]. Other studies suggest that a lower density of mammography facilities is associated with a higher proportion of late stage breast cancer at diagnosis in rural areas [9], where there is more limited access to mammography facilities [26] and women are less likely to have received a mammogram [27]. Our population-based sample covered all parts of the states studied, and so included both urban and rural areas. In our study, however, the percentage of the HSA’s population that was urban did not contribute to late stage breast cancer rates.

McLafferty [28, 29] found an “urban disadvantage”: higher proportions of late stage breast cancer in urban areas. The driving factor of late stage breast cancer may be limited use of mammography facilities, whether the barrier is access and distance or is demographic and socioeconomic. Again, our study neither supports nor contradicts these findings, given that the urban–rural factor was not a significant predictor of late stage.

Financial resources have been associated with breast cancer stage. For example, low socioeconomic status and poverty have been correlated with higher rates of late-stage breast cancer [5, 18, 19]. In the current study the percentage of people within an HSA who had a college education was inversely associated with rates of late stage breast cancer across the eight states. A college education is likely a surrogate for a higher income as well as higher educational attainment. In addition, it has been suggested that poor literacy limits patients’ understanding of cancer screening and of symptoms of cancer, potentially adversely affecting their stage at diagnosis [20]. In our study, poor English literacy was significantly associated with higher rates of late stage breast cancer.

In studies that examined health insurance, having no insurance or Medicaid coverage is associated with a higher proportion of LSBC [8, 21]. In the current study an increased percentage of people within HSAs who had private health insurance did not contribute to late stage breast cancer rates.

The US cancer registry systems and available data and mapping technology can provide detailed information about late stage breast cancer by geographical regions, offering valuable opportunities to identify cancer-related health disparities and areas where further investigation and interventions are needed. We mapped and highlighted Health Service Areas that have any combination of high late stage breast cancer incidence and significantly associated factors; the obvious motivation was to identify areas that might benefit from targeted interventions.

Further work, however, is needed to capture additional factors that drive the differences in LSBC incidence among states but did not emerge as significant in our model. Identifying the underlying reasons for geographic variation presents many challenges, including missing data and measurement issues. In addition, some area-level data are collected and released for research in varying time periods. This is a limitation of our study, in which the timelines of some data differ substantially; it could have affected our results if there were secular trends in specific predictors or in LSBC. Furthermore, factors available at the geographical level are often imprecise or do not fully capture important underlying domains. For example, the density of obstetricians and gynecologists, used as an indicator of one type of physician who commonly refers women for mammographic screening, and the density of mammographic facilities, used as a proxy for the capacity of a geographic area to provide mammograms to its population of women, are inherently limited measures. Nevertheless, reducing cancer-related geographic disparities is an important goal.

Strengths of the study included population-based cancer registries that capture almost 100 % of all cancers in defined geographic areas, well-documented and standardized methods across states for categorizing data from registries into stage, coverage of roughly 28 % of the US population, and the opportunity to compare late stage rates across states in contrast to most earlier research studies, which focused on smaller geographic units. Geo-spatial analysis of patterns of late stage breast cancer can be useful to inform targeted interventions in the areas that need them the most.

In conclusion, this study suggests that in the eight US states examined, higher rates of late stage breast cancer are more common in areas with a higher percentage of Black population (disparity in race) and in areas where English literacy, the percentage of the population with a college degree (disparity in socio-demographic characteristics), and screening availability are low. The approach described in this work may be utilized both within and outside the US, wherever cancer registry systems, area-level potential predictor variables, and mapping technologies are available, to identify and better characterize areas with high risks of cancer or cancer-related outcomes that may benefit from further investigation and interventions to reduce the cancer burden.