Introduction

The state and national Environmental Public Health Tracking (EPHT) Programs are building a network of integrated health and environmental data to provide nationally consistent data and information related to environmental health. These data will be used to perform a variety of environmental health surveillance functions such as monitoring trends in environmental hazards and disease, exploring relationships between environmental hazards and disease, identifying populations at risk, and guiding intervention and prevention strategies (www.cdc.gov/nceh/tracking).

Participants from academic institutions, the US Centers for Disease Control and Prevention (CDC), environmental agencies, state health departments, and data steward organizations are working together in teams to define the nationally consistent data and indicators that will be part of the Network. These data steward organizations are made up of members who collect and manage health data from a variety of sources such as hospital and vital statistics records. With regard to air pollution, the EPHT program is first focusing on ambient levels of ozone and PM2.5 because these data are routinely collected across the country and there are established associations of short-term exposure to these pollutants with adverse health outcomes. The EPHT Air Team was formed to develop recommendations for ambient ozone and PM2.5 data indicators and combined air quality and health measures to be included in the EPHT Network. As part of these efforts, the team has begun discussions on how to produce and disseminate air quality health impact indicators to the public, environmental health professionals, and policy makers.

Health and environmental agencies have a long history of tracking trends in health and environmental factors separately. For example, environmental agencies monitor ambient air to ensure that it meets regulatory standards and they disseminate reports describing geographic and temporal trends in air pollution. Health agencies track trends in cardiovascular and respiratory diseases through survey data, hospital administrative data, and mortality registries. EPHT efforts will add value to the current surveillance systems by facilitating the analysis of linked health and air quality data in order to better characterize the ambient air–health relationships and estimate public health impacts. EPHT analysis of ambient air–health relationships can potentially help identify susceptible sub-populations and changes over time in the numbers of adverse health outcomes related to ambient air pollution. Ongoing surveillance activities will also generate hypotheses requiring further research. Research studies are often conducted as a one-time project, designed to answer specific unresolved questions. It is important to note that EPHT activities are not meant to become or usurp an ambient air epidemiology research program; rather, the goal is to integrate well-developed analytic methods into routine surveillance. The key issue for EPHT is how to “track associations” between ambient air and health outcomes in a consistent, reliable, and sustainable manner that supports public health practice (Paulu and Smith 2008) while at the same time integrating the knowledge we have gained from research studies.

Health agencies are in a unique position to routinely and promptly estimate the regional and local health impacts of air pollution in their jurisdictions. This stems from the health agencies’ access to health data resolved at fine geographic and temporal scales, which can be used for surveillance purposes without going through the lengthy review processes that encumber research. They can coordinate these efforts with environmental agencies in their jurisdictions to help focus program activities to address the adverse outcomes of exposure to air pollutants.

This paper focuses on the EPHT program’s progress to date on developing nationally consistent data and methods to measure the public health impacts of ambient air quality. We describe the types of air quality and health outcome data that will initially be available and how they might be used to estimate associations and public health impacts of ambient levels of ozone and PM2.5. We list some of the obstacles that will need to be overcome in order to share health data and to produce air–health impact estimates that meet the needs of a variety of stakeholders.

The benefit of generating public health impact estimates is that the information can more clearly communicate the effects of air pollution to policy makers than separate air or health indicators. While it may be useful to report that the average annual PM2.5 level in a city is 10 μg/m3, it may be more informative to report the magnitude of health impacts such as the number of deaths or years of life lost attributable to these levels. The EPA periodically conducts national-level health impact assessments to estimate the costs and benefits of proposed regulations (EPA 1999, 2005). Several groups have performed similar analyses at the sub-state level to analyze local air quality issues (Hubbell and Fann 2008). However, within the U.S., the health burden of air pollution is not regularly reported for local areas as part of public health practice.

EPHT data

The EPHT network will provide access to a wide range of standardized health and environmental data and indicators from the participating state and national networks (CDC 2006). These will be made available to users at varying levels of detail and with levels of security commensurate with the sensitivity of the data. This section will describe the air and hospitalization data that will be available on the EPHT network.

Health data

Participating EPHT state health departments will be creating standardized data on hospital inpatient admissions for asthma and acute myocardial infarction. These standardized data will allow the state and national EPHT programs to develop tools for routine surveillance activities which can include the development of health impact estimates of air pollution at the local, state, and national level. There is currently no national hospital database available for EPHT. However, 46 states and the District of Columbia require the reporting of hospital discharge data according to individual state laws. The data elements include day and time of admission, primary and secondary discharge disease diagnosis, the address, and the date of birth which can be used to calculate age.

The challenge with creating a national hospital discharge database is that many of the data elements of interest, including ZIP code and day of admission, are considered to be confidential. States vary in their policies that permit use of these data. Generally, access to these data elements requires a review by the data stewards and/or an Institutional Review Board (IRB) for each individual study or surveillance project. Though these data are available for studies which link confidential health data to environmental data, public health personnel are often required to destroy the linked data upon study completion due to concerns over risk of disclosure of confidential information. Providing confidential sub-county data with the admission dates to the National EPHT network may not be feasible without clear assurances that the data stewards can maintain control of who has access to the data and for what specific purposes. Though research projects often have well-defined research protocols listing the hypotheses being tested and the research methods used, surveillance activities are often not as clearly defined. In 2007, the EPHT state health departments provided de-identified aggregated health data and public health indicators, such as numbers and rates of inpatient hospital admissions summarized by month, year, and county, to the National EPHT Network. Development of health impact estimates at the state and local level may require health data at finer geographic scales. Counties are large, containing diverse neighborhoods with varying rates of disease and risk factors, so smaller areas are needed to preserve the geographic variability of the information. For example, ZIP code level maps, which are smoothed or aggregated to preserve confidentiality, could identify areas in need of additional public health programs. Analyses of the short-term effects of pollution will also require access to daily counts of health events. The day of admission, however, is considered confidential information. Though confidentiality limitations may prevent states from the routine sharing of finely resolved data, these limitations should not be a barrier for states to use their own data in conjunction with nationally consistent methods to conduct surveillance activities. The EPHT program is exploring creating a partially de-identified subset of the data containing daily counts of admissions by ZIP code which could be shared on a secure network for EPHT analyses.

Hospitalization data stewards have recently begun to augment the databases of hospital inpatient stays with data on emergency room (ER) visits. In New York, there are approximately three and a half times as many emergency room visits for asthma as there are inpatient admissions, greatly improving case ascertainment. ER indicators may be added to the national network when more data become available.

Air data

The Air Quality System (AQS) database (http://www.epa.gov/ttn/airs/airsaqs/index.htm) contains ambient air pollution data collected by EPA, state, local, and tribal air pollution control agencies to assess air quality. AQS also contains meteorological data and descriptive information about each monitoring station. There are roughly 1,000 monitors across the US that measure ambient ozone and 1,000 that measure PM2.5.

The AQS data are generally considered to provide the most accurate estimates of air quality at a given time and place; however, the times and locations of the monitored data are limited. Monitors can be located some distance from where people live and spend time and thus may not accurately reflect the ambient levels people are exposed to. In addition, daily monitoring values are often not available. For example, ozone is often only monitored in the warmer months while measures for PM2.5 are often available only every third day. Many different approaches have been used to assign exposure levels to the health data. These methods range in complexity from assigning the value from the nearest monitor or the average of monitors within a county, to estimating concentrations across space using interpolation techniques such as kriging.
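The two simplest assignment approaches named above can be sketched in a few lines. This is a minimal illustration only, not the procedure used by EPHT or EPA: the monitor records, field names, coordinates, and the optional distance cutoff are all hypothetical, and missing daily values are represented as `None`.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_monitor_value(lat, lon, monitors, field, max_km=None):
    """Return (monitor_id, value) for the closest monitor with a non-missing
    value, or None if no such monitor lies within max_km."""
    best = None
    for m in monitors:
        if m[field] is None:  # skip days on which this monitor did not report
            continue
        d = haversine_km(lat, lon, m["lat"], m["lon"])
        if max_km is not None and d > max_km:
            continue
        if best is None or d < best[0]:
            best = (d, m["id"], m[field])
    return None if best is None else (best[1], best[2])

def county_mean(monitors, county, field):
    """Average of non-missing monitor values within one county (None if no data)."""
    values = [m[field] for m in monitors
              if m["county"] == county and m[field] is not None]
    return sum(values) / len(values) if values else None

# Hypothetical one-day snapshot of PM2.5 monitors (values in ug/m3).
monitors = [
    {"id": "A", "lat": 42.65, "lon": -73.75, "county": "Albany", "pm25": 12.0},
    {"id": "B", "lat": 42.70, "lon": -73.80, "county": "Albany", "pm25": None},
    {"id": "C", "lat": 42.80, "lon": -73.95, "county": "Schenectady", "pm25": 9.0},
]

# Exposure for a ZIP code centroid (hypothetical coordinates).
print(nearest_monitor_value(42.70, -73.81, monitors, "pm25"))
print(county_mean(monitors, "Albany", "pm25"))
```

In this example the closest monitor (B) has no value for the day, so the assignment falls back to the next closest monitor. This is the kind of gap in space and time that motivates the modeled estimates described in the following paragraphs.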

The EPA Community Multiscale Air Quality (CMAQ) modeling system is one model that incorporates the important physical and chemical functions associated with the dispersion and transformations of air pollution (www.epa.gov/asmdnerl/CMAQ, www.cmascenter.org). CMAQ approaches air quality as a whole by including state-of-the-science capabilities for modeling multiple air quality problems, including tropospheric ozone, fine particles, air toxics, acid deposition, and visibility degradation. CMAQ relies on emission estimates from various sources, including the EPA Office of Air Quality Planning and Standards’ current emission inventories, measured emission rates from major utility stacks, and model estimates of natural emissions from biogenic and agricultural sources. CMAQ also relies on meteorological predictions that include assimilation of meteorological observations as constraints. Emissions and meteorology data are fed into CMAQ and run through various algorithms that simulate the physical and chemical processes in the atmosphere to provide estimated concentrations of the pollutants. The grid resolutions for CMAQ are typically 36 × 36 km per grid for the “parent” domain, and nested within that domain are 12 × 12 km grids. The parent domain typically covers the continental United States, and the nested 12 km grid covers the Eastern or Western United States. Currently, 12 km resolution is recommended for most applications as the highest resolution. Improvements will be made to the CMAQ modeling system as emission inventories and chemistry models are further developed by the scientific community.

The EPA has developed statistically based ambient air estimates for use in EPHT. These estimates are derived through the combination of the AQS monitoring data with CMAQ modeled data using a hierarchical-Bayesian (HB) space–time statistical model (Banerjee et al. 2004; Holland et al. 2003). This approach attempts to combine the best characteristics of each source of spatial information for prediction over time when both sources of data are available. The model assumes that each source provides information about the underlying true pollutant surface. Air monitors are assumed to measure the true pollutant surface with some measurement error, but no bias. In contrast, numerical model output is assumed to approximate the variability of the true surface while exhibiting both measurement error and bias (additive and multiplicative) across space and time. Also, the model allows for the inclusion of covariates, such as daytime population density, to account for possible pollution–population relationships.

This Bayesian hierarchical space–time model gives more weight to accurate monitoring data in areas where monitoring data exists and relies on bias-adjusted model output in non-monitored areas. The model assumes that both monitoring data and CMAQ data provide good information about the same underlying pollutant surface, but with different measurement error structures. This approach offers the ability to predict important pollution gradients and uncertainties that might otherwise be unknown using interpolation results based solely on air monitoring data. These surrogate measures of air quality can be compared to adverse health outcomes.
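The data-level assumptions described above can be written compactly. The following is only a schematic sketch of those assumptions, not the full specification of the EPA model. All symbols are introduced here for illustration: $Y(s,t)$ is the unobserved true pollutant surface at location $s$ and day $t$, $Z(s,t)$ is a monitor measurement, $X(s,t)$ is the CMAQ output for the grid cell containing $s$, $a$ and $b$ are the additive and multiplicative bias terms, and $\varepsilon$ and $\delta$ are zero-mean error terms.

```latex
\begin{align}
Z(s,t) &= Y(s,t) + \varepsilon(s,t)
  && \text{(monitor: measurement error, no bias)} \\
X(s,t) &= a(s,t) + b(s,t)\,Y(s,t) + \delta(s,t)
  && \text{(CMAQ: additive and multiplicative bias plus error)}
\end{align}
```

Under this sketch, predictions of $Y(s,t)$ are informed mainly by $Z$ where monitors exist and by the bias-adjusted $X$ elsewhere, which matches the weighting behavior described in the text.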

Table 1 compares the features of the two types of air quality data being proposed for use in EPHT, (1) ambient air quality monitoring data from the AQS and (2) statistically based ambient air estimates that result from monitored and modeled air quality data. The datasets will provide alternate characterizations of daily 8-h maximum ozone concentrations and daily average PM2.5 concentrations. Each method has its own strengths and weaknesses, and either could be used to create air quality or linked air–health indicators for analyses within an EPHT framework. A summary of some of the types of air characterization data that should be considered for use in EPHT was provided at a recent workshop on methodologies for tracking health effects of air pollution (White 2008).

Table 1 Features of the two types of air quality data being proposed

Linked data

To better understand the impacts of ambient air quality on cardiovascular and respiratory disease hospitalizations, it would be useful for the data to be linked both in time and by geography. The hospitalization records can either be geocoded to the street address or to the ZIP code centroid. This provides geographic coordinates for each hospitalization record. The air monitoring data or statistically based air data also contain geographic coordinates along with the dates for which the air pollution levels are estimated. This allows for the linking of the data by day and location (Haley et al. 2007).
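The linkage by day and location can be illustrated with a small sketch. It assumes, purely for illustration, that admissions have already been geocoded to a ZIP code and that the air estimates have already been assigned to the same ZIP codes. The record layouts, field names, and values are hypothetical.

```python
from collections import Counter
from datetime import date

def link_daily_counts(admissions, air):
    """Join daily admission counts to air estimates on (ZIP code, date).

    admissions: list of dicts with 'zip' and 'admit_date' keys.
    air: dict mapping (zip, date) -> pollutant concentration.
    Days with an air estimate but no admissions are kept with a count of 0,
    because analyses of daily counts need the zero-event days as well.
    """
    counts = Counter((a["zip"], a["admit_date"]) for a in admissions)
    linked = []
    for zip_code, day in sorted(air):
        linked.append({
            "zip": zip_code,
            "date": day,
            "admissions": counts.get((zip_code, day), 0),
            "ozone_ppb": air[(zip_code, day)],
        })
    return linked

# Hypothetical asthma admissions and daily 8-h maximum ozone estimates.
admissions = [
    {"zip": "12203", "admit_date": date(2001, 7, 3)},
    {"zip": "12203", "admit_date": date(2001, 7, 3)},
    {"zip": "12208", "admit_date": date(2001, 7, 4)},
]
air = {
    ("12203", date(2001, 7, 3)): 61.0,
    ("12203", date(2001, 7, 4)): 48.0,
    ("12208", date(2001, 7, 4)): 52.0,
}

for row in link_daily_counts(admissions, air):
    print(row)
```

The output contains one row per ZIP code and day rather than one row per patient. That is the form of the partially de-identified daily-count dataset discussed earlier, although small counts in such a table can still carry disclosure risk.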

This type of linked data set could be considered confidential by the data stewards since both day and geographic coordinates are provided for each case, so it might not be placed on an open-access EPHT network. Access to the underlying health data requires that specific protocols be approved by the IRB and/or the data stewards, and that the data be destroyed when the analyses are completed. For example, in one EPHT project, the departments of health in New York, Wisconsin, and Maine obtained approval to share hospitalization data among the states, creating a combined linked dataset that each state could use to answer specific questions. Each state negotiated with its own IRB and data owners for approval to share with the other states. Once the project was completed, the dataset was destroyed. In the future, more flexible agreements are needed that will allow EPHT staff longer term access to confidential data. This will facilitate surveillance by eliminating lengthy project-by-project IRB and data steward approval processes and by preserving cleaned geocoded data.

One option being discussed within the EPHT program is for each state to prelink the air and confidential health datasets, and then remove some of the confidential fields prior to making it accessible on the national network. For example, one could create an analytical dataset to measure the short-term health effects of air pollution using case-crossover analysis; the dataset would contain the required case/control links between air pollution, weather, and ZIP code level socioeconomic data without giving the ZIP code or date of admission. However, this dataset could be deconstructed by a computer-savvy individual to reveal the actual date and ZIP code, by matching to the publicly available air pollution, weather, and SES data. Another option is to develop a mechanism for people to access and analyze the confidential data without ever having to “see” it. Software programs could be developed that would allow users to run the analysis and return only the results which are non-confidential. This might put a burden on the data stewards who typically do not manage hospitalization databases in a geographic format that could be easily linked with other datasets. The software analysis methods would need to be flexible while also preventing the inadvertent release of confidential data.

The health impacts of air pollution can also be reported at a cruder resolution by applying “off-the-shelf” concentration–response (C-R) functions to air quality data and summarized health data. For example, the long-term health effects of PM2.5 cannot be estimated from raw surveillance data; these health impacts are based on C-R functions from cohort studies. Thus, health and air quality data by county and year could be used for producing county-level air quality–mortality impact indicators.
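As an illustration of how an off-the-shelf C-R function might be applied to county-level data, the sketch below uses a log-linear health impact function, a form commonly used in health impact assessment. The choice of functional form, the relative risk, the baseline rate, the population, and the concentrations are all assumptions made for the example and are not values from the EPHT program or from any cited study.

```python
import math

def beta_from_rr(rr, increment):
    """Convert a relative risk reported per `increment` units of pollutant
    into a log-linear C-R coefficient per single unit."""
    return math.log(rr) / increment

def attributable_cases(baseline_rate, population, beta, delta_c):
    """Cases attributable to concentrations delta_c above a reference level,
    assuming a log-linear C-R function:
        cases = baseline_cases * (1 - exp(-beta * delta_c))
    """
    baseline_cases = baseline_rate * population
    return baseline_cases * (1.0 - math.exp(-beta * delta_c))

# Hypothetical county: annual mean PM2.5 of 10 ug/m3 compared with a
# reference level of 5 ug/m3, 300,000 residents, and a baseline mortality
# rate of 800 per 100,000 per year.
beta = beta_from_rr(rr=1.06, increment=10.0)  # assumed RR of 1.06 per 10 ug/m3
deaths = attributable_cases(
    baseline_rate=800 / 100_000,
    population=300_000,
    beta=beta,
    delta_c=10.0 - 5.0,
)
print(f"Estimated attributable deaths per year: {deaths:.1f}")
```

The result depends directly on the assumed coefficient and reference level, so an indicator built this way should report both alongside the estimate.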

Figure 1 summarizes the interaction among the EPHT air quality and health data, simple air and health indicators, linked air–health analyses and indicators, Census data, and external C-R functions. The EPHT analyses are shown to focus on identifying temporal, geographic, and demographic differences in C-R functions, which would provide additional information to reduce the bias in the local health impact indicators.

Fig. 1 Interaction among the EPHT air quality and health data, simple air and health indicators, linked air–health analyses and indicators, Census data, and external C-R functions

Progress to date

In the first years of the EPHT project (2002–2005), CDC, EPA, and the health departments of New York, Wisconsin, and Maine collaborated in the Public Health Air Surveillance Evaluation (PHASE) Project. As part of this project, the three state health departments worked in parallel to estimate the short-term effects of PM2.5 and ozone on hospitalizations within their respective states using air characterization data provided by EPA. The air data were derived from hierarchical Bayesian models and from spatial interpolation using kriging. The PHASE Team selected the EPA’s statistically derived air quality data primarily to fill in the missing space and time components of the air monitoring data while maintaining the “ground truth” found in these data. They calculated statewide estimates of relative risk (e.g., percent increase in risk of hospitalization per 10 μg/m3 increase in PM2.5) and the corresponding attributable risk (e.g., number of hospitalizations triggered by the acute impacts of ozone pollution above background levels in 2001) using case-crossover analysis. In addition, since the Maine Department of Health had access to 4 years of emergency room data, they also measured the association between asthma emergency room visits and ambient air quality over time (Paulu and Smith 2008).

The PHASE Team selected case-crossover analysis since the method can accommodate assigning exposure estimates to individual subjects in a single analysis rather than conducting multiple time-series analyses for each exposure area. Case-crossover analysis has been shown to be a comparable and alternative methodology to Poisson time-series regression analysis (Lu and Zeger 2007; Fuentes 2008). In this design, cases serve as their own controls. A subject’s exposure near the time of a health event (case period) is compared with exposures at previous or subsequent points of time when that subject was a non-case (control period). The case-crossover method controls for confounders such as age, gender, ethnicity, or area since cases are their own controls. The case-crossover approach can also be used to control for potential time-varying confounders such as seasonality by design, thereby obviating the need for complex modeling used in other time-series methods. This design can also be used to assess effect modifiers. With this study design, we might explore if the C-R function varies over time, between sociodemographic groups or with proximity to specific sources of pollution. For example, Xu et al. (2008) conducted case-crossover analyses in an area near Pittsburgh, where a steel coke plant had operated for many years. The study revealed reductions in cardiorespiratory disease hospitalizations associated with reductions of ambient levels of coarse particulates (PM10) with the closing of the plant. In addition, a decrease in the C-R function was noted which could be the result of the changes in the composition of the particulate matter due to the closing of the plant. The PHASE Team developed a technical background report summarizing the methodology (Haley et al. 2007) and the Case-Crossover Analysis Tool (C-CAT) to facilitate the calculation (Abraham et al. 2006), so that these types of analyses could be carried out in the future with additional state partners and years of data.
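To make the selection of control periods concrete, the sketch below picks referent days with a time-stratified scheme: control days are the other days in the same calendar month that fall on the same day of the week as the case day. This particular scheme is an assumption chosen for illustration. The text does not state which referent selection rule the PHASE Team or C-CAT used, and the function name is hypothetical.

```python
import calendar
from datetime import date

def referent_days(case_day):
    """Time-stratified referent selection for a case-crossover design.

    Returns every date in the same year and month as case_day that falls on
    the same weekday, excluding the case day itself. Matching on weekday
    controls for day-of-week patterns, and restricting to one month controls
    for season and long-term trend by design.
    """
    days_in_month = calendar.monthrange(case_day.year, case_day.month)[1]
    referents = []
    for d in range(1, days_in_month + 1):
        candidate = date(case_day.year, case_day.month, d)
        if candidate != case_day and candidate.weekday() == case_day.weekday():
            referents.append(candidate)
    return referents

# Hypothetical admission on Wednesday, 18 July 2001.
case = date(2001, 7, 18)
print(referent_days(case))
```

Exposure on the case day would then be compared with exposure on these referent days for the same person, typically with conditional logistic regression.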

Future activities

Future EPHT linked air quality health analyses would involve analyzing, for state or sub-state areas, the association between air quality data and health tracking data, identifying statistical relationships. Such analyses could describe the degree to which individual-level risk factors, such as age and co-morbid conditions, and community level risk factors, such as poverty, modify the association between air pollution and health outcomes. The analyses can also track time trends in the relationship between air pollution and health (Paulu and Smith 2008; Shin et al. 2008; Burnett et al. 2005). The resulting C-R functions, if properly quality assured, could serve as a basis for tracking air quality health relationships in order to target preventative public health activities.

The weakness with state-specific analyses and cohort studies is that the effect of exposures to ambient air pollution on health events is relatively small and imprecisely estimated, and thus the confidence intervals may be wider than the measured effects. This was seen in two previous single-state EPHT case-crossover studies which looked at short-term exposures to air pollution. The first measured the associations of asthma emergency room visits and ozone by year, sex, and age group in the State of Maine (Paulu and Smith 2008) while the second study looked at cardiovascular hospitalizations and fine particle pollution in New York State (Haley et al. submitted for publication). Similarly, the confidence interval range was larger than the measured health effect in a large American Cancer Society cohort study of cardiopulmonary mortality and long-term exposure to PM2.5 (Krewski et al. 2005). In the future, the EPHT program could statistically combine analyses from multiple states to provide more robust estimates following the methodology of other multi-city or multi-state projects such as NMMAPS (Dominici et al. 2003) and APHEIS (LeTertre et al. 2005; Boldo et al. 2006).
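One simple way to combine state-specific results is inverse-variance (fixed-effect) pooling of the log relative risks. The sketch below uses that approach only because it is compact. It is not the hierarchical method used in the multi-city projects cited above, and the state estimates shown are hypothetical.

```python
import math

def pool_fixed_effect(estimates):
    """Inverse-variance pooling of state-specific coefficients.

    estimates: list of (log_rr, standard_error) tuples, one per state, where
    log_rr is the log relative risk per 1 unit of pollutant.
    Returns (pooled_log_rr, pooled_standard_error).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

def percent_increase(log_rr, se, per=10.0):
    """Express a coefficient as percent increase in risk per `per` units,
    with an approximate 95% confidence interval."""
    def to_pct(x):
        return (math.exp(per * x) - 1.0) * 100.0
    return to_pct(log_rr), to_pct(log_rr - 1.96 * se), to_pct(log_rr + 1.96 * se)

# Hypothetical log relative risks per 1 ug/m3 PM2.5 from three states.
state_estimates = [(0.0010, 0.0009), (0.0006, 0.0007), (0.0014, 0.0011)]

pooled, pooled_se = pool_fixed_effect(state_estimates)
est, lo, hi = percent_increase(pooled, pooled_se)
print(f"Pooled: {est:.2f}% per 10 ug/m3 (95% CI {lo:.2f}% to {hi:.2f}%)")
```

Each hypothetical state interval here includes zero, while the pooled standard error is smaller than any single state's. Fixed-effect pooling assumes one common effect across states. If the C-R function truly differs between states, a random-effects or hierarchical model would be the more appropriate choice.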

The difficulty is that C-R functions developed in other geographic areas or during other time periods may not be directly transferable to the local area of interest (Hubbell and Fann 2008). C-R functions could vary due to many factors such as differences in the susceptibility of the population, access to health care, medical treatment, exposure, and pollutant mix. The results of state and sub-state analyses in conjunction with existing peer-reviewed literature will thus play an important role in developing local estimates of the acute impact of air pollution. Through the process of continuous improvement and development of practical and uniform methods, health impact assessments will become more accurate at the local level.

The EPHT program is currently developing nationally consistent health and air quality data that can be routinely used to calculate measures of association and public health impacts of ozone and PM2.5. The interagency agreement between CDC and EPA led to the production of daily statistically derived pollutant concentrations across the continental US between 2001 and 2005, with the hope that these estimates will be produced annually thereafter.

Though the air data will reside on a national network, it is not yet clear where nationally consistent health and linked air–health data will reside due to confidentiality concerns. Will these data be held with data stewards or with the state and national EPHT networks? CDC will need to work closely with state health departments and data stewards to address this question.

Closer collaboration between environmental health agencies and academic institutions is needed within the EPHT program to train public health professionals within the state health departments and implement consistent methods that can be used to routinely estimate the health impacts of air pollution. Among a number of technical and scientific issues, the new methodology will need to consider individual and neighborhood level effect modification. EPHT states will need to work together with experts to review analytic results when health impacts are updated. Currently, there is no up-to-date system in place to easily share and combine the results of analyses in a central repository.

Communication of air quality health information is important for EPHT, and the program will need to understand how the public health messages disseminated by APHEIS and other groups have been received by the public and policy makers (Wartenberg 2008). Clearly defining the intent and audience for air impact indicators will be an important first step.