Abstract
Combining detailed county-to-county migration data with Toxics Release Inventory data and fine-scale \(\hbox{PM}_{2.5}\) concentration levels, we investigate the relationship between internal migration, the income of migrant and non-migrant households, and county-level differences in environmental quality. We show that households moving to “cleaner” counties are relatively “richer”, a result consistent with a sorting by income in the spirit of Tiebout (1956). An implication of this finding is that internal migration could contribute to the persistence of disparities in pollution exposure at the county level.
1 Introduction
Inequalities in exposure to pollution among income groups are a source of major public concern as they seem to persist over time (Colmer et al. 2020; Jbaily et al. 2022). From the perspective of environmental justice, the question is then whether households self-select by income across areas with different levels of environmental quality. An ancillary question is “Are these correlations between income and pollution the result of firms’ strategic decisions based on local demographic characteristics?” This paper explores the first question using internal migration data from the United States (US).Footnote 1 We analyze whether out-moving households’ destination choices are consistent with a sorting by income across levels of pollution.
The idea that internal migration could, at least partially, create the income-pollution correlations reported in the literature can be theoretically motivated by Tiebout (1956)’s canonical sorting model. Households “vote with their feet” and sort across areas that provide their optimal bundle of private goods, housing, and neighborhood characteristics. If environmental quality is an amenity that is valued by households, but these households differ in terms of their income, a location choice model in the spirit of Tiebout (1956) predicts a stratification of local areas by income, where poorer households end up in more polluted areas.
To explore this mechanism empirically, we use detailed county-to-county migration data from 2010 to 2014 provided by the US Internal Revenue Service (IRS). We combine this dataset with county-level pollution and demographic data for the home and destination counties. An interesting feature of the IRS dataset is that we observe the income of moving and non-moving households. To the best of our knowledge, this is the first study that directly identifies a sorting by income across areas with different levels of environmental quality using county-to-county migration data. The IRS county-to-county migration data series provides a comprehensive view of internal migration patterns within the US. It captures information from 95 to 98% of all tax filers, making it the most extensive data source available for tracking population movements between counties.
Our outcome of interest is the relative income of households moving from a home county h to a destination county d. It is defined for each home and destination county pair (h, d) as the ratio of the average income of households moving from h to d to the average income of households staying in h. Using a linear specification with county-pair fixed effects, we estimate the relationship between the relative income of out-migrant households and environmental quality at their chosen destination. If households self-select across locations as predicted by Tiebout’s sorting model, we expect out-moving households from county h to sort by income with lower-income households from h moving to the more polluted destinations among the set of destinations of county h, while higher-income households choose less polluted areas.
To measure environmental quality in a county, we consider two indicators of local pollution: \(\hbox{PM}_{2.5}\) concentrations (Meng et al. 2019) and the number of facilities reporting to the Environmental Protection Agency (EPA)’s Toxics Release Inventory (TRI). \(\hbox{PM}_{2.5}\) is known to have adverse health effects and is considered a good general measure of air pollution. The chemicals covered by the TRI Program are typically local pollutants and were chosen because they pose a threat to the environment and human health. The data on toxic waste management is easily accessible on the EPA website, where households can obtain information about polluting facilities in their home and potential destination counties.Footnote 2
In all our specifications, we use a wide range of econometric controls that could affect households’ decision to move to a particular county (e.g., employment opportunities, amenities, and other demographic characteristics). Destination choices are based on the comparison between the home area and the possible destinations, and the attributes of a destination county might be viewed differently by individuals living in different home counties. For this reason, all the county-level attributes are expressed as the difference between the values for the home and destination counties. To address concerns regarding the endogeneity of local air pollution levels, we adopt an instrumental variable (IV) approach, following prior research (Bento et al. 2015; Lang 2015). This approach exploits changes in PM\(_{2.5}\) concentrations driven by a county’s designation of non-attainment or re-attainment status by the EPA under the Clean Air Act Amendments (CAAA).
Consistent with the existence of a sorting by income, we find that, on average, households that move to a county with a lower \(\hbox{PM}_{2.5}\) concentration, or fewer TRI reporting facilities, are relatively richer. We check the robustness of our results by looking at various sub-samples of our data: for example, we restrict our analysis to within-state or out-of-state migration, and we exclude large and sparsely populated counties. Overall, our findings suggest that household self-selection across destinations could play a role in the persistence of inequalities in exposure to pollution at the county level.
With our specification, we observe the destinations households chose, but we do not observe the set of destinations they could have selected from (their choice set). In the Appendix, we provide additional insights to address this concern by examining a destination model with home and destination county characteristics. This allows us to study internal migration patterns and the trade-offs between moving costs, job opportunities, and other amenities faced by US households.
We contribute to the literature in several ways. We first add to the growing literature studying destination choices of internal migrants. Using the IRS data, Curtis et al. (2015) and DeWaard et al. (2016) investigate migrant destinations after Hurricane Katrina, while Frey (2009) and Molloy et al. (2011) explore more generally the possible explanations for the US migration slowdown. Davies et al. (2001) use repeated cross-sections of the IRS data to study the relationship between interstate migration, relative economic opportunities, and the cost of moving.Footnote 3 The novelty of our paper lies in the combination of migration patterns at the county level with measures of local environmental quality. This allows us to study how households sort across destination counties based on their income.
The heterogeneity in the responses to changes in location characteristics among different types of migrants has received limited attention in the internal migration literature. Recent exceptions are Baum-Snow and Hartley (2020), Aydemir and Duman (2021) and Chen et al. (2022).Footnote 4 Chen et al. (2022) is of particular interest for our paper, as they find different effects of air pollution on net internal migration flows in China across gender, education and occupation. Besides the fact that we focus on an alternative household attribute (income), our study differs in the identification strategy. Instead of estimating our destination choice model for different income groups, we explore the existence of a sorting by income by regressing a direct measure of relative income of out-moving households on differences in local pollution.
Research focusing on residential mobility and local environmental quality commonly uses changes in aggregate demographic characteristics and difference-in-difference (or similar) approaches to investigate whether households locally or regionally sort by income (Been and Gupta 1997; Kahn 2000; Cameron and McConnaha 2006). However, studies have shown that these approaches might be problematic. Building on a standard sorting model, Banzhaf and Walsh (2008) illustrate that the predictions of difference-in-difference models regarding the effect of changes in local environmental quality on average income are ambiguous, except for “large” changes.Footnote 5 More fundamentally, Depro et al. (2015) note that these approaches do not identify the impact of local pollution on residential mobility because this impact depends on the characteristics of both the home and destination areas. Without information about the destination county, we do not know whether households that changed residence moved to a more (or less) polluted area compared to the home county.
In our paper, we overcome this issue by relying on the IRS migration data, which is very detailed and refers directly to the yearly flows of in- and out-migrants in a given county. An alternative approach based on the structural estimation of households’ willingness to pay is proposed in Depro et al. (2015) or Freeman et al. (2019). While their analyses concentrate on specific communities within Los Angeles County or selected cities in China, benefiting from detailed data, our study encompasses all counties in the contiguous US. One exception is Close and Phaneuf (2017) who also use the IRS county-to-county migration data to estimate residents’ marginal willingness to pay to avoid air pollution. However, our analysis diverges as we specifically examine the interplay between migrants’ income and local environmental quality.
2 Conceptual Framework
The motivation for our analysis is based on Tiebout’s (1956) canonical model of residential sorting, which has inspired more recent general equilibrium models of location choice (e.g., Banzhaf and Walsh 2008). In these models, there is a set of possible locations that differ in terms of their level of public goods or amenities and households have to choose where to live subject to a budget constraint. Because households prefer neighborhoods with “better amenities”, the housing demand in those areas is higher, leading to higher housing prices. Households are therefore facing a trade-off between consumption and local amenities. In particular, a low-income household might not be willing to pay as much as a high-income household to live in a high-amenity neighborhood, as they have to prioritize necessary goods, such as everyday clothing or food. At equilibrium, if households differ only in terms of their income, they will sort by income across locations with different levels of amenities. If the local amenity of interest is environmental quality, Tiebout’s result implies that poorer households end up in more polluted areas, while richer households can afford areas with better environmental quality.
To examine whether the observed correlations between income and pollution could be, at least partially, attributed to households self-sorting across locations by income, we rely on internal migration data and explore the relationship between destination choices and income of households changing residence in a given year. Consider the set of households who decided to move from their home area h in year t and the set of associated destinations \(d \in \{1,\ldots ,N\}\) that differ in their level of environmental quality. If these households choose where to move subject to a budget constraint, a residential sorting model in the spirit of Tiebout (1956) predicts that out-migrant households sort by income level from the most polluted to the least polluted destinations in \(\{1,\ldots ,N\}\). Out-migrant households with an average income higher than the average income of households staying in area h move to locations that are less polluted than their home area, and we expect the least polluted destinations to attract the households with the highest income (relative to the income of households in h). Similarly, out-migrant households with an average income lower than the average income of households staying in h move to more polluted destinations.
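This motivates the following empirical specification:
\[\hbox{Relative Income}_{d,h,t} = \gamma _1 \Delta E_{d-h,t} + \gamma _2 \Delta X_{d-h,t} + \alpha _{dh} + \tau _t + \varepsilon _{d,h,t} \qquad (1)\]
where our dependent variable is the relative income of out-migrant households and is defined for each pair of home and destination locations (h, d) as the ratio of the average income of households moving from h (where they were living in year \(t-1\)) to d to the average income of households staying in county h in year t:
\[\hbox{Relative Income}_{d,h,t} = \frac{\hbox{average income of households moving from } h \hbox{ to } d \hbox{ in year } t}{\hbox{average income of households staying in } h \hbox{ in year } t}\]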
The vector \(\Delta E_{d-h,t} = E_{d,t} - E_{h,t}\) contains our pollution measures and indicates how polluted a destination d (\(E_{d,t}\)) is, relative to the level of pollution in the home area (\(E_{h,t}\)). \(\Delta X_{d-h,t} = X_{d,t} - X_{h,t}\) includes a set of control variables (differences between destination and home characteristics). This specification captures the idea that destination choices are based on the comparison between the home area and the possible destinations. The location-pair fixed effects, \(\alpha _{dh}\), account for the unobservable time-invariant heterogeneity among pairs of locations. \(\tau _t\) represents the year fixed effects and \(\varepsilon _{d,h,t}\) is the error term. In Eq. (1), \(\Delta E_{d-h,t}\) and \(\Delta X_{d-h,t}\) are lagged by 1 year to capture the environmental and socio-economic conditions prevailing in the home and destination areas during the year preceding the households’ move to their new residence in area d.
The coefficient of interest is \(\gamma _1\). Given the characteristics of a home area h, it captures how the relative income of out-moving households is associated with differences in pollution across destinations. A sorting by income across destinations is consistent with \(\gamma _1<0\): when the level of pollution at destination decreases relative to the home area, the income of households who choose that destination increases (relative to the income of households staying back).
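As a rough illustration of how \(\gamma _1\) is identified, the sketch below computes relative income for a county pair and then estimates a slope after demeaning within each (h, d) pair, which is what the pair fixed effects do. It is a simplified sketch, not the estimation code used in the paper: the function names and the toy numbers are hypothetical, there is a single pollution regressor, and year effects and controls are omitted.

```python
from collections import defaultdict

def relative_income(avg_income_movers, avg_income_stayers):
    """Ratio of movers' average income (h -> d) to stayers' average income in h."""
    return avg_income_movers / avg_income_stayers

def within_pair_slope(rows):
    """Slope of relative income on the pollution gap after demeaning by county pair.

    rows: iterable of (home, dest, year, rel_income, delta_pollution).
    Demeaning within each (home, dest) pair absorbs the pair fixed effect.
    Year effects and other controls are omitted in this sketch.
    """
    by_pair = defaultdict(list)
    for home, dest, _year, y, x in rows:
        by_pair[(home, dest)].append((y, x))

    sxy = 0.0
    sxx = 0.0
    for obs in by_pair.values():
        y_bar = sum(y for y, _ in obs) / len(obs)
        x_bar = sum(x for _, x in obs) / len(obs)
        for y, x in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx

# Hypothetical toy panel: two county pairs observed in two years.
toy_rows = [
    ("A", "B", 2010, 0.9, 1.0),
    ("A", "B", 2011, 0.8, 2.0),
    ("A", "C", 2010, 1.2, -1.0),
    ("A", "C", 2011, 1.3, -2.0),
]
gamma_1_hat = within_pair_slope(toy_rows)
```

In this toy panel the estimated slope is negative: within each pair, years in which the destination is more polluted relative to the home county are years in which the movers are relatively poorer, which is the pattern a sorting by income predicts.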
3 Data and Preliminary Evidence
We compile a comprehensive county-level data file on local pollution and migration for the 48 contiguous US states. We supplement this data file with county-level demographic data. We provide descriptive statistics in Table 1. We have data for 3109 counties over a 5-year sample period, which gives us 15,545 county-year observations. 37% of counties are categorized as metropolitan counties and 42.5% are defined as urban counties.Footnote 6 Finally, about 7% of the counties are coastal counties (excluding the Great Lakes).
3.1 Migration Data
We gather county-to-county migration data from the IRS data files for the 3109 counties located in the contiguous US between 2010 and 2014. The IRS uses individual federal tax returns between two tax years and identifies migrants and non-migrants in a county.Footnote 7 A snapshot of this data, where the home county is Autauga in Alabama (State code = 1 and County code = 1), is provided in Table A.1 (Appendix A). For example, we observe that 10 households move to Shelby County in Tennessee from Autauga in Alabama and, on average, they have the highest level of income—they earn about 2.17 times more than non-migrants in Autauga County. By contrast, the average income of households moving to Clayton County in Georgia is 86% lower than the average income of non-migrants in Autauga.
As shown in Table 1, around 1800 households move out of a county every year and household net migration is slightly positive.Footnote 8 The average relative income is 0.717 indicating that the income of households moving from a home county h to a different county was on average 28% lower than the income of households staying in county h. In Fig. 1, we present the average net migration for all US counties between 2010 and 2014. Counties with positive net migration are blue and those with negative net migration are red. We observe that many households move to Florida, the suburbs of Dallas, Houston, and San Antonio/Austin areas, while many seem to move out of Northeast counties.
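The link between a relative income value and the percentage gap quoted in the text is simple arithmetic, sketched here with a hypothetical helper name.

```python
def income_gap_percent(relative_income):
    """Percent by which movers' average income differs from stayers'.

    Negative values mean movers earn less than households staying in h.
    """
    return (relative_income - 1.0) * 100.0

gap = income_gap_percent(0.717)  # about -28.3, i.e. roughly 28% lower
```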
In terms of destination choices, 2968 counties sent households to at least one other county in the US in 2010, with an average of 25 different destinations per county. Finally, every year, the proportion of out-moving households relocating to a different state is about 30%.
The IRS dataset is considered an attractive data source for migration research in the US (Frey 2009; Molloy et al. 2011; Curtis et al. 2015). Because this dataset is based on administrative records, it is available annually and is relatively comprehensive (it covers 95–98% of US tax filers and their dependents). The dataset has one main limitation. Even though the overwhelming majority of householders file tax returns, some categories of the population are likely to be underrepresented in the data, namely undocumented populations, the elderly, and college students (Gross 2003; DeWaard et al. 2016). Hauer and Byars (2019) provide a comparison of the three main sources of migration data in the US (i.e., the Decennial Census long form, the American Community Survey, and the IRS data). They note that, despite the limitation in terms of population coverage, the IRS dataset is the largest migration data source for count flows between counties (e.g., the ACS data contains about 2% of the observations in the IRS data) and is more appropriate for researchers interested in annual comparisons of migration patterns.Footnote 9
3.2 Pollution Measures
For each county, we construct two measures that capture different aspects of local environmental quality.
PM\(_{2.5}\) data. \(\hbox{PM}_{2.5}\) is ambient fine particulate matter with a diameter that is generally 2.5 micrometers or smaller. \(\hbox{PM}_{2.5}\) can remain airborne for long periods and travel hundreds of miles. The concentration in a given location consists of both locally emitted \(\hbox{PM}_{2.5}\) (due to industrial activity or traffic congestion) and pollution released elsewhere that is transported by the wind. Particulate matter is a widely used measure of local environmental quality (Deryugina et al. 2019; Chen et al. 2022; Greenstone and Hanna 2014). It is well established that ambient \(\hbox{PM}_{2.5}\) has adverse effects on the human respiratory system, especially for children, and increases mortality risk.
This paper takes advantage of the availability of recent and fine-scale annual \(\hbox{PM}_{2.5}\) concentration estimates compiled by Meng et al. (2019).Footnote 10 The annual ambient \(\hbox{PM}_{2.5}\) concentration data is available at a \(0.01^{\circ }\) by \(0.01^{\circ }\) resolution. We therefore map the concentration estimates to US county boundaries.Footnote 11 The average \(\hbox{PM}_{2.5}\) concentration at the county level is \(7.63\,{\upmu }\hbox{g}/\hbox{m}^3\) over our sample period.
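One simple way to map gridded concentrations to counties is to average the grid cells assigned to each county. The sketch below assumes that rule and a pre-computed lookup from cell centres to county codes; both are assumptions made for illustration, and the aggregation actually used in the paper may differ. The function name and inputs are hypothetical.

```python
from collections import defaultdict

def county_mean_pm25(cells, cell_to_county):
    """Average gridded PM2.5 values over the cells assigned to each county.

    cells: dict mapping a (lat, lon) cell centre to a concentration in ug/m3.
    cell_to_county: dict mapping the same cell centres to a county code.
    Cells without a county assignment are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for centre, value in cells.items():
        county = cell_to_county.get(centre)
        if county is None:
            continue
        totals[county] += value
        counts[county] += 1
    return {county: totals[county] / counts[county] for county in totals}

# Hypothetical grid cells and county assignment.
toy_cells = {(32.50, -86.50): 8.0, (32.51, -86.50): 10.0,
             (30.70, -87.70): 6.0, (0.0, 0.0): 99.0}
toy_lookup = {(32.50, -86.50): "01001", (32.51, -86.50): "01001",
              (30.70, -87.70): "01003"}
county_pm25 = county_mean_pm25(toy_cells, toy_lookup)
```

An unweighted mean treats every cell equally; an area- or population-weighted mean would be a natural alternative when cells straddle county borders.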
TRI data. The TRI is a US database established by law that requires private and government facilities to report annually on their waste management and pollution prevention activities. The reporting requirements are detailed in section 313 of the Emergency Planning and Community Right-to-Know Act (EPCRA). A plant has to report to the TRI if it belongs to a North American Industry Classification System (NAICS) code identified by the EPA’s TRI Program, and manufactures, processes or uses designated hazardous or toxic chemicals above a reporting threshold set by the EPA. In addition, the plant must have at least 10 full-time employees. The facilities subject to mandatory reporting are denoted as TRI reporting facilities or TRI reporters.
The TRI Program covers hundreds of chemicals that are known to pose a threat to human health and the environment, including lower birth weight or higher infant mortality rates (Currie and Schmieder 2009; Agarwal et al. 2010; Currie et al. 2015). The EPA website provides plant-level waste management information on quantities of toxic waste recycled, combusted for energy recovery, treated, released (to the air, water, and land) or otherwise disposed of, both on- and off-site for each chemical. The EPA has also developed a toxicity-weighted index, which gives the total plant-level environmental releases (on-site and off-site) across all media and all chemicals (in pounds).
We construct our TRI-related measure in two different ways: (1) the number of TRI reporting facilities in a county, and (2) county-level total on-site toxic releases.Footnote 12 Note that we exclude off-site releases as they correspond to toxic chemicals transferred to a receiving facility (for disposal), which may not necessarily be located in the same county. Given the self-reported nature of TRI releases, the number of TRI reporting facilities in a county is our preferred measure of pollution.
Even though preliminary data on plant-level waste management practices is available with a 1-year lag on the EPA website, the final TRI National Analysis is released by the EPA with a lag of 2 years. For example, the TRI data available to potential migrant households in 2010 provide information about toxic chemicals released in 2008. For this reason, we use the number of TRI reporting facilities and total on-site toxic releases lagged by 2 years in all our estimations. On average, there are about 7 TRI reporting facilities in a county and about 960,000 pounds of toxic releases (on-site and off-site) per county. 21.3% of counties did not have any TRI reporting facility over the sample period (about half of them were metro or urban counties). These numbers are relatively stable over time.
Compared to \(\hbox{PM}_{2.5}\), which can travel long distances, the chemicals covered by the TRI Program are very local pollutants, and a county might be too large to represent the population closely exposed to polluting activities.Footnote 13 However, our focus is not on the actual health impact of toxic chemicals but rather on households’ reactions to an increase in the number of polluting facilities in their area. The publication of TRI data might play a role in their migration decisions even if they are not directly affected by the presence of a TRI facility.Footnote 14
Since the first public announcement in June 1989, the TRI data has received continuous media attention. Hamilton (2005) shows that public concerns about toxic pollution can spill over to adjacent areas through media reports. Saha and Mohr (2013) document that many of the newspaper articles released shortly after the publication of EPA data reported toxic releases from the largest polluters in relatively large areas (states, counties, or metropolitan areas).
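The two TRI measures and the 2-year lag can be sketched as follows. The record layout and function names are hypothetical; the only rules taken from the text are that facilities are counted per county, that off-site releases are excluded, and that households moving in year t are matched to TRI data for year t − 2.

```python
from collections import defaultdict

def tri_measures(facility_records):
    """Count reporting facilities and sum on-site releases by (county, year).

    facility_records: iterable of
        (county, year, facility_id, onsite_lbs, offsite_lbs),
    with one record per facility-chemical report.
    Off-site releases are ignored, as in the text.
    """
    facilities = defaultdict(set)
    onsite = defaultdict(float)
    for county, year, facility_id, onsite_lbs, _offsite_lbs in facility_records:
        facilities[(county, year)].add(facility_id)
        onsite[(county, year)] += onsite_lbs
    return {key: (len(ids), onsite[key]) for key, ids in facilities.items()}

def lagged_tri(measures, county, migration_year, lag=2):
    """TRI measures matched to migration_year, taken from year t - lag.

    Returns (0, 0.0) when the county has no reporting facility in that year.
    """
    return measures.get((county, migration_year - lag), (0, 0.0))

# Hypothetical records: facility F1 reports two chemicals in 2008.
toy_records = [
    ("01001", 2008, "F1", 100.0, 50.0),
    ("01001", 2008, "F2", 200.0, 0.0),
    ("01001", 2008, "F1", 25.0, 0.0),
    ("01001", 2009, "F1", 10.0, 0.0),
]
toy_measures = tri_measures(toy_records)
tri_for_2010 = lagged_tri(toy_measures, "01001", 2010)
```

Using a set of facility identifiers keeps a plant that reports several chemicals from being counted more than once.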
3.3 Other Factors Affecting Migration Decisions
Based on the determinants identified in the migration literature, we consider 3 categories of control variables (\(\Delta X_{d-h,t}\)) that affect county-to-county migration patterns and might also lead to a sorting by income: (1) factors related to income and employment opportunities, (2) other local amenities, and (3) social and demographic characteristics.
The significant expenses associated with changing residence, including moving costs, are recognized in the migration literature as key factors influencing migration patterns. The distance between counties is a standard measure of such costs; because it does not vary over time within a county pair, our model captures it through the county-pair fixed effects.
3.3.1 Income and Employment Opportunities
Previous research (Davies et al. 2001; Hatton and Tani 2005; Beine and Coulombe 2018) finds that differences in earnings and in local economic or labor market conditions are correlated with households’ decision to change residence. The Unemployment rate at the county level is compiled from the US Department of Commerce, Bureau of Economic Analysis, while the annual median household income comes from the Small Area Income and Poverty Estimates Program (US Census Bureau).Footnote 15 Over our sample period, the median household income is about $45,000, while the unemployment rate is about 8%.
We also collect data from the Quarterly Census of Employment and Wages (QCEW) provided by the Bureau of Labor Statistics (BLS). The county business patterns report the number of establishments and employment by industry. To account for the size of the economy in the county or the employment opportunities (without capturing activities related to TRI chemicals), we consider the number of non-TRI reporting establishments, which are all the establishments that are not in a NAICS code identified by the TRI Program. To address the concern that our measures of local pollution capture the industrial composition of a county, we control for the Number of manufacturing establishments, defined as the number of establishments in the NAICS codes 31–33.
3.3.2 Other Amenities and County Demographic Characteristics
Counties with desirable recreational amenities might be more attractive to moving households. Using the county business patterns from the BLS, we compute the county-level number of amenity establishments as the number of establishments in NAICS 71 (Arts, Entertainment, and Recreation), and NAICS 72 (Accommodation and Food Services). We account for Metro, urban, and rural moving patterns within a county pair using a dummy variable to identify movements from a Metro or urban county to a rural county.
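The establishment counts described above depend only on the leading digits of each establishment's NAICS code. A minimal sketch, assuming establishment-level codes are available as strings (the function name and input format are hypothetical; the list of TRI-covered NAICS codes is not reproduced here, so the non-TRI count is left out):

```python
def count_by_sector(naics_codes):
    """Count establishments in manufacturing (31-33) and amenity (71, 72) sectors.

    naics_codes: iterable of NAICS codes as strings, one per establishment.
    Sector membership is read from the first two digits of the code.
    """
    counts = {"manufacturing": 0, "amenity": 0, "other": 0}
    for code in naics_codes:
        sector = code[:2]
        if sector in {"31", "32", "33"}:
            counts["manufacturing"] += 1
        elif sector in {"71", "72"}:
            counts["amenity"] += 1
        else:
            counts["other"] += 1
    return counts

# Hypothetical list of establishments in one county.
toy_counts = count_by_sector(["311811", "332710", "722511", "713940", "541110"])
```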
As suggested by some studies (McCormick 1997; Hatton and Tani 2005), house prices could be an important driver of destination choices. House prices could also capitalize the value of local amenities. In our empirical analysis, we use the house price index from the Federal Housing Finance Agency, which captures the evolution of house prices within an area.Footnote 16 We expect areas with a higher index (i.e., that have experienced a larger increase in house prices relative to the base year) to attract relatively richer households.
Internal migration and destination choices are also affected by demographic characteristics, e.g., age and level of education (Baum-Snow and Hartley 2020), presence of minorities (Boustan 2010), or population density (Davies et al. 2001).Footnote 17 In our empirical specification, we include the following county-level demographic characteristics: college ratio (American Community Survey), estimated county population, median age and Black and Hispanic population ratios (US Census Bureau, Population Division).Footnote 18 Over our sample period, the average county has around 100,000 inhabitants, and the average Black and Hispanic ratios are both around 9%.
3.4 Preliminary Evidence of Disparities in Destination Choices
Figures 2, 3 and 4 illustrate the large inter-county variations in pollution over our sample period. Darker red indicates a higher level of pollution (\(\hbox{PM}_{2.5}\) concentration, number of TRI reporters normalized by each county’s total number of establishments or toxic releases), while non-polluting counties are in white. TRI reporters (and toxic releases) are present in counties all across the US (other than in the middle of the country). This is very different from Fig. 3, in which the highest \(\hbox{PM}_{2.5}\) concentrations are all located in the eastern part of the country. This suggests that the TRI and \(\hbox{PM}_{2.5}\) pollution measures capture different aspects of pollution. Over our sample period, the correlation between \(\hbox{PM}_{2.5}\) and the number of TRI reporting facilities is 0.120.
A prerequisite for identifying a sorting by income using migration data is that there is enough variation in out-migrant income and county-level pollution across destinations chosen by households from a given home county. Table 2 provides descriptive statistics by year for home counties with at least two destinations. In 2010, the average home county has a relative income range of 0.99. In other words, given our definition of relative income (see Sect. 2), the difference in average income between the richest and the poorest groups of out-migrant households is on average roughly equal to the average income of households staying in the home county. For the average home county in 2010, the range of \(\hbox{PM}_{2.5}\) concentrations at destination is \(3.53\,{\upmu }\hbox{g}/\hbox{m}^3\). This corresponds to approximately 1/3 of the range of \(\hbox{PM}_{2.5}\) concentrations across all counties in 2010.
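The range statistics discussed here can be computed per home county as the maximum minus the minimum over its destinations. The sketch below uses a hypothetical function name and toy values, and applies the same restriction to home counties with at least two destinations.

```python
from collections import defaultdict

def destination_ranges(pairs):
    """Max-minus-min of a destination-level value for each home county.

    pairs: iterable of (home, dest, value), where value is for example the
    relative income of movers from home to dest, or PM2.5 at dest.
    Home counties with fewer than two destinations are dropped.
    """
    by_home = defaultdict(list)
    for home, _dest, value in pairs:
        by_home[home].append(value)
    return {home: max(values) - min(values)
            for home, values in by_home.items() if len(values) >= 2}

# Hypothetical relative incomes for one year.
toy_pairs = [("A", "B", 0.6), ("A", "C", 1.5), ("A", "D", 1.0), ("E", "A", 0.9)]
toy_ranges = destination_ranges(toy_pairs)
```

Averaging these per-county ranges across home counties gives the kind of summary figure reported for each year.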
4 Instrumental Variables for \(\hbox{PM}_{2.5}\)
The evolution of air pollution levels over time is potentially endogenous due to the presence of omitted variables that could be correlated with both local air pollution and households’ decisions to relocate. To address this concern, an important body of literature (Chay and Greenstone 2005; Bento et al. 2015; Lang 2015; Isen et al. 2017) uses EPA non-attainment designations after the enactment of the CAAA as an instrument to measure changes in air pollution.
4.1 The Clean Air Act Amendments (CAAA)
In response to the adverse health impacts of consistently elevated concentrations of major air pollutants, the US Congress enacted the Clean Air Act of 1970. Major amendments were subsequently added in 1977 and 1990. A pivotal component of the 1970 Clean Air Act is the establishment of federal air quality standards, known as the National Ambient Air Quality Standards (NAAQS), for key air pollutants. This legislation mandates the EPA to designate each county as either in attainment or non-attainment status for each pollutant, contingent upon whether the relevant standard is exceeded.
A county’s designation as out of attainment has important implications due to stringent regulations imposed by the CAAA on polluting entities within non-attainment areas. Specifically, states and counties are obligated to devise a State Implementation Plan (SIP) for EPA approval, outlining strategies to mitigate pollution levels in non-attainment counties and establish plant-specific regulations for each significant pollution source.
In 1987, the EPA began regulating particulates less than 10 micrometers in diameter (PM\(_{10}\)), for which the negative health effects were deemed particularly severe.Footnote 19 Designation of non-attainment areas for PM\(_{10}\) occurred in 1990. In July 1997, the EPA promulgated new standards for PM\(_{2.5}\) specifically, and federal designation of attainment status began in 2005 (based on 2001 through 2003 air quality monitoring data), with a subsequent revision in 2006 and re-designation starting in 2009 (based on concentrations from 2006 to 2008). After consistently meeting the EPA’s standard for a pollutant for three consecutive years, a county can apply for designation as a “re-attainment county”. However, if the county fails to maintain this standard at any point, it is re-designated as a non-attainment county.
4.2 Attainment Status as an Instrument
We focus on changes in counties’ attainment designations for particulate matter to identify exogenous changes in air pollution. We extract county-level attainment status data for the six key pollutants covered by the 1990 CAAA from the “Green Book Non-attainment Areas” available on the EPA’s website.Footnote 20 In 2009 (beginning of the re-designation following the revision of PM\(_{2.5}\) standards), 265 counties in our sample were in non-attainment status for PM\(_{2.5}\). 89 counties changed status over our sample period.
The model outlined in Sect. 2, i.e., Eq. (1), is based on differences in characteristics within county pairs. Our primary instrument for the difference between destination and home county PM\(_{2.5}\) concentrations is therefore the difference in attainment status for particulate matter (\(\Delta\)PM NonAttainment\(_{d-h,t}\) = PM NonAttainment\(_{d,t}\) - PM NonAttainment\(_{h,t}\)). Rather than employing a simple binary indicator capturing whether a county is in attainment or not, we adopt a more nuanced approach, reflecting the persistence of non-attainment status and its potentially heterogeneous impact on air quality improvements. For each county k in our sample (\(k=h,d\)), the variable PM NonAttainment\(_{k,t}\) takes values from 0 to 3. It is equal to zero if county k is designated in attainment in year t. Values from 1 to 3 indicate the frequency of non-attainment designation up to year t (based on the 1987, 1997 and 2006 standards).
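The construction of the instrument can be sketched as follows. The coding rule used here is one reading of the description above and is an assumption: a county in attainment in year t gets 0, and a non-attainment county gets the number of particulate matter standards (out of the 1987, 1997 and 2006 standards) under which it has been designated non-attainment up to year t. Function names are hypothetical.

```python
def pm_nonattainment(in_attainment_now, nonattainment_designations_to_date):
    """Instrument value in 0..3 for one county-year (assumed coding rule).

    in_attainment_now: True if the county is in attainment in year t.
    nonattainment_designations_to_date: number of PM standards (1987, 1997,
    2006) under which the county has been designated non-attainment so far.
    """
    if in_attainment_now:
        return 0
    if not 1 <= nonattainment_designations_to_date <= 3:
        raise ValueError("a non-attainment county needs 1 to 3 designations")
    return nonattainment_designations_to_date

def delta_pm_nonattainment(dest_value, home_value):
    """Destination minus home, matching the d - h convention of Eq. (1)."""
    return dest_value - home_value

# Hypothetical pair: the destination is in attainment, the home county has
# been designated non-attainment under all three standards.
home_value = pm_nonattainment(False, 3)
dest_value = pm_nonattainment(True, 0)
delta = delta_pm_nonattainment(dest_value, home_value)
```

Under this coding the difference ranges from -3 to 3, with negative values indicating a destination with a lighter non-attainment history than the home county.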
Moreover, we consider an over-identified model by adding a second instrument, \(\Delta \ Attainment\ County_{d-h,t}\), which is a dummy variable indicating whether both counties h and d are designated as attainment counties for all the pollutants covered by the CAAA in year t. Formally, the first-stage regression in this two-stage least squares estimator is:
The other controls (\(\Delta X\)) are the same as in our baseline model (1). This formulation is consistent with Auffhammer et al. (2009), who examine how a county’s designation in year t affects PM\(_{10}\) concentrations in year t at different monitors located in this county. It reflects the fact that non-attainment designations in year t are based on monitoring data from the past 3 years (see also Lang 2015).Footnote 21 In the second stage, we use the predicted differences in PM\(_{2.5}\) concentrations from Eq. (2) in place of the actual values in Eq. (1).
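A first-stage equation consistent with these definitions can be sketched as follows. The coefficient symbols \(\pi_1,\ldots,\pi_4\), the error term \(u\), and the notation \(\alpha_{hd}\) and \(\tau_t\) for the county-pair and time effects are assumed notation introduced here for illustration; \(\Delta \text{TRI}_{d-h,t}\) stands for whichever TRI-related measure is used in a given specification.

```latex
\begin{equation}
\Delta \log \text{PM}_{2.5,\,d-h,t}
  = \pi_1\, \Delta \text{PM NonAttainment}_{d-h,t}
  + \pi_2\, \Delta \text{Attainment County}_{d-h,t}
  + \pi_3\, \Delta \text{TRI}_{d-h,t}
  + \Delta X_{d-h,t}\, \pi_4
  + \alpha_{hd} + \tau_t + u_{d-h,t}
\tag{2}
\end{equation}
```

The two excluded instruments enter only this equation, which is what makes the model over-identified with a single endogenous regressor.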
Differences in attainment status of destination and home counties serve as a valid instrument. Prior research confirms that county-level non-attainment status is correlated with subsequent air pollution levels within a county (Auffhammer et al. 2009; Lang 2015). Moreover, conditional on other observable home and destination county characteristics, differences in non-attainment status affect migration decisions only through their impact on local air pollution. One issue would arise if efforts aimed at reducing pollution to maintain attainment (e.g., limiting manufacturing activity) independently affected location choices. However, we believe that with our extensive set of control variables (including, for example, the number of manufacturing establishments and the unemployment rate), we effectively control for such possible channels.
5 Results
Based on the conceptual framework outlined in Sect. 2, we explore the existence of a sorting by income across US counties by estimating how the relative income of households moving from county h to county d covaries with differences in environmental quality between home and destination counties (coefficient \(\gamma _1\) in Eq. (1)). For a given year, we can only compute the relative income of a county pair (h, d) if some households move from h to d. Therefore, we do not have a balanced panel.
With our specification, we observe the destinations households selected, but we do not observe their original choice set. To gain insights into the factors influencing households’ destination choices, we analyze, in Appendix B, the correlation between household migration flows and the characteristics of both the origin and destination counties. We find that disparities in local air pollution levels (\(\hbox{PM}_{2.5}\)) are strongly associated with the patterns of migration between counties. In other words, alongside employment opportunities (proxied by the difference in unemployment rates between origin and destination) and moving costs (proxied by the distance between two counties), environmental quality seems to play a role in households’ selection of destination. However, this analysis does not allow us to identify sorting by income, as households moving to different locations might have heterogeneous attributes.
In Table A.3, we provide summary statistics for the regression variables in Eq. (1). The average relative income for a county pair (h, d) is lower than 1, suggesting that households moving to a different county have a lower income than households staying back. At the same time, the difference in the number of TRI reporting plants between the destination county and home county is slightly positive. On average, the destination counties have more TRI reporting facilities than the home counties.
The estimation results of Eq. (1), with county-pair fixed effects and time effects, for all US counties are presented in Table 3.Footnote 22 In all specifications, we cluster standard errors at the county-pair level. For completeness, Columns 1 and 2 report the results from OLS specifications. Columns 3 and 4 show the results when \(\Delta\)Log of PM\(_{2.5}\) is instrumented using differences in counties’ attainment status (\(\Delta\)PM NonAttainment and \(\Delta\)Attainment County). For these specifications, first-stage results, the Hansen J-statistic and relevant F-statistics are reported. In Columns 1 and 3, the TRI-related measure is the number of TRI reporting facilities, while Columns 2 and 4 show the results when the TRI-related measure is the county-level total toxic releases on site.
For the variables \(\Delta\)Log number of TRI reporters\(_{d-h,t}\) and \(\Delta\)Log total releases on site\(_{d-h,t}\), a zero may have two different meanings. First, it is possible that the destination and home counties have the same number of TRI reporting plants or the same total releases. Second, it is possible that both counties have no TRI reporting facilities or no toxic releases. To distinguish these two cases, we include a dummy that equals one when neither county has any TRI reporters and a dummy that equals one when neither county has any toxic releases.
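The role of these dummies can be sketched in a few lines. The function name is hypothetical, and the use of log(1 + x) to handle counties with zero reporters is an assumption made for this illustration; the text does not state which transformation is applied to zeros.

```python
import math


def tri_difference(dest_count, home_count):
    """Return (delta_log, both_zero) for one county pair in one year.

    delta_log is the destination-minus-home difference in log TRI reporters.
    ASSUMPTION: log(1 + x) is used so that counties with zero reporters are
    defined; the source does not specify the transformation.
    both_zero flags pairs in which neither county has any TRI reporter.
    """
    both_zero = int(dest_count == 0 and home_count == 0)
    delta_log = math.log1p(dest_count) - math.log1p(home_count)
    return delta_log, both_zero


# Two hypothetical pairs with the same zero difference but different meanings:
same_positive = tri_difference(5, 5)   # equal, non-zero counts -> (0.0, 0)
both_absent = tri_difference(0, 0)     # no reporters in either  -> (0.0, 1)
```

Without the dummy, both pairs would contribute the same regressor value even though only the first involves counties that host polluting facilities.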
In all specifications, the coefficients of our pollution measures have the expected sign. When the county-level \(\hbox{PM}_{2.5}\) concentration (or the number of TRI reporting facilities) increases in the destination county compared to the home county, the relative income of households moving to that county decreases. In the IV specifications (Columns 3 and 4), the coefficients on both the TRI-related measures and PM\(_{2.5}\) levels are statistically significant at the 1% level. Our results are therefore consistent with a sorting by income across destinations with different levels of environmental quality: households that move to “cleaner” counties are relatively “richer”. In Columns 3 and 4, first-stage results confirm that our instruments are highly correlated with \(\hbox{PM}_{2.5}\) concentration levels. Moreover, the F-statistic suggests that weak instruments are not an issue, while the p-value associated with the Hansen J-statistic implies that we cannot reject the null hypothesis that the instruments are valid. It is also interesting to note that the IV estimates of the PM\(_{2.5}\) coefficients are much larger (in absolute terms) than the OLS estimates.
Beyond these variables of particular interest to us, estimates in Table 3 reveal that households also sort across locations with different economic opportunities. Destinations where the median income and the level of economic activity (measured by the number of non-TRI establishments) are higher or the unemployment rate is lower (relative to the home county) attract on average relatively richer households.
Wealthier households also tend to prefer less populated areas (everything else being equal). As expected, there is a positive (but only statistically significant at a level of 10%) association between the relative household income of out-migrants and the difference in house price index, i.e., households moving to counties where property values have increased more (relative to the base year) are “richer”. Finally, differences in Hispanic population ratio between the home and destination counties are strongly correlated with the relative household income of out-migrants. Destination counties with a larger proportion of Hispanic residents (compared to the home county) attract relatively poorer households.
5.1 Robustness Checks
In this section, we explore the robustness of the relationship between differences in pollution measures and relative income of out-migrants. All the results are reported in Tables A.4 and A.5 and are obtained by estimating Eq. (1) using the IV estimator with county-pair fixed effects and time effects.
5.1.1 Within-State and Out-of-State Migration
The 2019 Current Population Survey documents that the primary motivation for moving varies by type of mover. In particular, long-distance moves are primarily motivated by employment opportunities, while shorter-distance moves are mostly associated with housing-related reasons.
Table A.4 (panel A) reports the estimates of the main variables of interest for the within-state sample in column 2 (home and destination counties belong to the same state) and the out-of-state sample in column 3 (column 1 shows the results for all contiguous counties and is the same as column 3 in Table 3). The results for the other explanatory variables are available upon request. The coefficients on \(\hbox{PM}_{2.5}\) concentrations and TRI-related measures have the expected sign. However, the coefficient associated with the number of TRI reporters is statistically insignificant for interstate migration.
5.1.2 Personal Income
Next, we use relative individual income (instead of household income) as our dependent variable in Eq. (1). The personal income of out-migrants (respectively non-migrants) is obtained by dividing the aggregate income of out-migrants (respectively non-migrants) by the number of individuals moving out (respectively staying back). The sign and magnitude of the coefficients associated with differences in PM\(_{2.5}\) concentrations (Table A.4, panel B) are similar to those obtained using relative household income in Table 3 and panel A of Table A.4.
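The per-person income measure described above amounts to the following calculation. This is a minimal sketch: the function names and the example figures are hypothetical, and the ratio is taken as out-migrants over non-migrants, in line with the interpretation of relative income given for Table A.3.

```python
def per_capita_income(aggregate_income, individuals):
    """Aggregate income divided by the number of individuals."""
    if individuals <= 0:
        raise ValueError("number of individuals must be positive")
    return aggregate_income / individuals


def relative_personal_income(migrant_income, migrant_count,
                             stayer_income, stayer_count):
    """Per-capita income of out-migrants relative to non-migrants.

    Values below 1 mean that movers earn less per person than stayers.
    """
    movers = per_capita_income(migrant_income, migrant_count)
    stayers = per_capita_income(stayer_income, stayer_count)
    return movers / stayers


# Hypothetical county pair: 10 movers with 400 (thousand USD) in aggregate
# income, 100 stayers with 5000 (thousand USD) in aggregate income.
ratio = relative_personal_income(400.0, 10, 5000.0, 100)  # 40 / 50
```

Dividing by individuals rather than households removes differences in household size between movers and stayers from the dependent variable.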
5.1.3 Sub-samples of Counties
To address the concern that counties are relatively large areas in which pollution or income might vary significantly, we estimate Eq. (1) for different sub-samples of our data. First, we restrict our sample to (1) county-to-county migration within metro counties (which are typically smaller), (2) county-to-county migration within urban and metro counties, or (3) county-to-county migration from urban/metro counties to any type of county. Results are reported in Table A.5 (columns 1–3). Second, we exclude from our sample the 10% largest counties in terms of land area (column 4). Finally, in columns 5 and 6, we split our sample of counties into two groups based on their population density.Footnote 23 In sparsely populated counties, residential options are likely limited to a few small towns. As a result, individuals may have fewer choices about where to live to avoid pollution. In contrast, densely populated counties offer households greater flexibility to relocate within the county. Our results for \(\hbox{PM}_{2.5}\) remain qualitatively the same for all sub-samples of counties.
5.1.4 Crime Rate
The crime rate is another type of amenity known to affect migration decisions. We do not include this variable in our main analysis because the FBI crime rate in per capita terms is missing for around one third of our observations, mostly rural counties. Note that our results (available upon request) remain qualitatively unchanged when we estimate our model with crime rate as an additional control.
6 Conclusion
This study introduces a novel method for examining the presence of income-based sorting across locations with diverse environmental qualities. Our empirical approach integrates detailed county-to-county IRS migration data, fine-scale \(\hbox{PM}_{2.5}\) concentrations data, and TRI data to explore the relationship between pollution levels and the migration patterns of US households from 2010 to 2014. Our findings suggest that destination counties with lower pollution levels than the migrants’ home counties attract wealthier households. From the perspective of Environmental Justice, this outcome is consistent with households self-selecting based on income across areas with different levels of environmental quality. Furthermore, our research contributes to the existing literature on internal migration by emphasizing how socioeconomic characteristics shape households’ responses to differences in destination attributes.
Given the growing political attention to inequalities in pollution exposure, our findings hold relevance for informing environmental policies. In particular, our results illustrate that addressing environmental justice issues requires not only refining air-quality standards and TRI reporting, but also addressing income disparities, as households “vote with their feet”.
However, this study has its limitations. Firstly, beyond income, a fundamental issue underlying the Environmental Justice movement is the correlation between race and pollution exposure. It would be interesting to apply the methodology proposed in this paper to investigate how differences in local environmental quality influence the racial/ethnic composition (as defined and collected by the US Census) of migration flows between different counties. Unfortunately, the lack of such information in the IRS data does not allow us to perform this analysis.
A second potential limitation of our study is the necessity to aggregate pollution data at the county level due to the availability of IRS data only at that level. As highlighted by Banzhaf et al. (2019), this aggregation may hide within-county variations in pollution exposure and give rise to the “ecological fallacy”. However, these authors note that the ecological fallacy generally tends to mask environmental injustices in coarser data, suggesting that our results could be interpreted as conservative lower bounds. An ideal future extension would involve combining county-level migration data with detailed individual data to better capture household-specific and within-county pollution variations.
Finally, in our sample, we only observe household income when households relocate from one location to another. The original choice set remains unobserved. While the destination choice model in Appendix B partially addresses this concern, fully characterizing and modeling the complete choice set would extend beyond the scope of this paper. Despite these limitations, our work underscores the importance of considering socioeconomic factors in addressing environmental inequalities.
Notes
In addition, local newspapers regularly report on the toxic waste releases of local polluting firms (Campa 2018).
A few studies use direct measures of internal migration in other countries than the US. Hatton and Tani (2005) derive a measure of internal migration from the records of the British National Health Services, while Beine and Coulombe (2018) use internal migration data derived by Statistics Canada from income tax reports. Both studies focus on the impact of immigration on internal migration.
Aydemir and Duman (2021) examine how the role of migrant networks at destination differs across migrant types (in terms of skills, age at migration, and reason of migration), while Baum-Snow and Hartley (2020) document that the trade-offs between local amenities and economic opportunities vary substantially across households with different socioeconomic characteristics.
Using a two-community example, they demonstrate that improvements in the environmental quality of the most polluted area give rise to in-migration from the poorest households of the other community, leading to an increase in average income in both communities. Only large improvements (i.e., improvements that reverse the ordering of communities in terms of local pollution) produce non-ambiguous predictions.
To define metropolitan (metro), urban and rural counties, we use the 2013 rural–urban continuum codes provided by the US Department of Agriculture’s Economic Research Service. Counties are divided as either metro or non-metro counties (2013 Office of Management and Budget). Metro counties are then distinguished by the population size of their metro area, and non-metropolitan (non-metro) counties by their degree of urbanization and adjacency to a metro area.
Note that the income reported in this dataset reflects the income earned during the year preceding the filing year. Depending on the exact date at which a household moved from one county to the other, this income may include income earned both in the home and destination county.
Net migration in a county in a given year is defined as the difference between the total number of households that move into that county from all the other counties (in-migration) and the total number of households that move out (out-migration).
Hauer and Byars (2019) explain that this data exists in multiple files making its analysis rather cumbersome. This has probably hindered the widespread adoption of this valuable resource for US migration scholarship.
These estimates were obtained from chemical transport modeling, satellite remote sensing, and ground-based measurements.
Note that these estimates of \(\hbox{PM}_{2.5}\) concentration provide a better measure of the county-level concentrations than air pollution data from the EPA’s Air Quality System database. This database provides hourly data at the pollution-monitor level for pollutants that are regulated by the Clean Air Act. A county’s pollution-monitor readings may not adequately measure the average pollution exposure for county residents due to the sparse placement of monitors within counties (Deryugina et al. 2019).
Not all TRI reporting plants necessarily report chemical releases, as they might be managing their toxic waste in a different way. We provide an example of TRI reporters and total releases in Table A.2. The facilities with TRI plant IDs 36067NNCMP100JE and 3606WSHRMN74CUN reported releases for all the years shown in the sample, while the plant with TRI plant ID 36066TWBCL17JES did not have any releases in 2010 and 2011, but still reported to the TRI Program. In our dataset, they all appear as TRI reporting facilities for the entire sample period.
Using individual level data, Currie et al. (2015) show that the openings or closings of toxic plants (i.e., plants reporting a release to the TRI Program) have an impact on birth outcomes within a 1-mile radius of the plant location. Other studies (Currie and Schmieder 2009; Agarwal et al. 2010) identify health effects of TRI chemicals at the county level.
Some studies using hedonic methods (Currie et al. 2015; Mastromonaco 2015) have found that the information provided by the TRI affects the housing market within one or two miles of the plant location only, but Bayer et al. (2009) show that costs associated with relocation might cause hedonic methods (which typically assume that individuals are free to move) to underestimate households’ willingness to pay for air quality.
The SAIPE are available for all counties, starting in 2011. For 2009 and 2010, we linearly extrapolate the county median income based on the data from 2011 to 2014.
We use 2008 as a base year.
Davies et al. (2001) document that in the context of state-to-state migration, individuals are more likely to move to more populated areas and provide two interpretations for this result: (1) these areas are perceived as offering better economic opportunities, and (2) search costs are lower as information about available opportunities in those areas is usually more easily accessible.
The college ratio is defined as the proportion of the population (older than 25 years old) with a college degree. The Black (respectively Hispanic) population ratio is defined as the Black (respectively Hispanic) population divided by the total county population. The Black (respectively Hispanic) population are the individuals who indicated “Black or African American alone” (respectively “Hispanic alone”) as the response to the question “What is this person’s race?” in the US Census.
The 1970 Clean Air Act authorized the EPA to enforce a NAAQS for total suspended particles (particles less than 100 \(\upmu \hbox{m}\) in diameter).
The legal framework permits counties to be categorized as “partial attainment”, wherein only specific geographic regions within the county are designated as non-attainment. For analytical purposes, we treat these partially attaining counties as non-attainment counties, as they remain subject to regulatory measures.
We estimate Eq. (1) for various subsets of our control variables and results (available upon request) remain qualitatively unchanged.
We rank all the counties according to their population density in 2010 (first year of our sample) and counties in the top decile are categorized as high density counties (column 6 of Table A.5). The results for the other lower-density counties are shown in column 5.
The PPML estimator corrects for both under- and over-dispersion. Gourieroux et al. (1984) showed that a consistent and asymptotically normal estimator can be obtained without specifying the probability density function of the disturbances that represent a specification error in the parameter of the Poisson distribution. Further, Silva and Tenreyro (2011) showed that, even if the conditional variance is not proportional to the conditional mean, the PPML estimator remains consistent and is generally well-behaved when the proportion of zeros is large.
99.21% of our 48,313,860 observations are zeros.
We use the number of Superfund sites on the National Priority List (NPL), provided by the EPA. The NPL is the list of sites of national priority among the known releases or threatened releases of hazardous substances, pollutants, or contaminants throughout the United States and its territories. As for the TRI-related measures, we include a dummy when both counties do not have any Superfund sites.
We re-estimate our models with different subsets of control variables and the main results (available upon request) are qualitatively the same.
References
Agarwal N, Banternghansa C, Bui L T (2010) Toxic exposure in America: estimating fetal and infant health outcomes from 14 years of TRI reporting. J Health Econ 29(4):557–574
Auffhammer M, Bento A M, Lowe S E (2009) Measuring the effects of the clean air act amendments on ambient PM10 concentrations: the critical importance of a spatially disaggregated analysis. J Environ Econ Manag 58(1):15–26
Aydemir A, Duman E (2021) Migrant networks and destination choice: evidence from moves across Turkish provinces. IZA discussion paper
Banzhaf H S, Walsh R P (2008) Do people vote with their feet? An empirical test of Tiebout. Am Econ Rev 98(3):843–63
Banzhaf S, Ma L, Timmins C (2019) Environmental justice: the economics of race, place, and pollution. J Econ Perspect 33(1):185–208
Baum-Snow N, Hartley D (2020) Accounting for central neighborhood change, 1980–2010. J Urban Econ 117:103228
Bayer P, Keohane N, Timmins C (2009) Migration and hedonic valuation: the case of air quality. J Environ Econ Manag 58(1):1–14
Been V, Gupta F (1997) Coming to the nuisance or going to the barrios—a longitudinal analysis of environmental justice claims. Ecol Law Quart 24(1):1–56
Beine M, Coulombe S (2018) Immigration and internal mobility in Canada. J Popul Econ 31(1):69–106
Bento A, Freedman M, Lang C (2015) Who benefits from environmental regulation? Evidence from the Clean Air Act Amendments. Rev Econ Stat 97(3):610–622
Boustan L P (2010) Was postwar suburbanization “white flight”? Evidence from the Black migration. Q J Econ 125(1):417–443
Cameron T A, McConnaha I T (2006) Evidence of environmental migration. Land Econ 82(2):273–290
Campa P (2018) Press and leaks: Do newspapers reduce toxic emissions? J Environ Econ Manag 91:184–202
Chay K Y, Greenstone M (2005) Does air quality matter? Evidence from the housing market. J Polit Econ 113(2):376–424
Chen S, Oliva P, Zhang P (2022) The effect of air pollution on migration: evidence from China. J Dev Econ 156:102833
Close B T, Phaneuf D J (2017) Valuation of local public goods: migration as revealed preference for place
Colmer J, Hardman I, Shimshack J, Voorheis J (2020) Disparities in PM2.5 air pollution in the United States. Science 369(6503):575–578
Currie J, Schmieder J F (2009) Fetal exposures to toxic releases and infant health. Am Econ Rev 99(2):177–183
Currie J, Davis L, Greenstone M, Walker R (2015) Environmental health risks and housing values: evidence from 1600 toxic plant openings and closings. Am Econ Rev 105(2):678–709
Curtis K J, Fussell E, DeWaard J (2015) Recovery migration after Hurricanes Katrina and Rita: spatial concentration and intensification in the migration system. Demography 52(4):1269–1293
Davies PS, Greenwood MJ, Li H (2001) A conditional logit approach to US state-to-state migration. J Reg Sci 41(2):337–360
De Silva DG, Hubbard TP, Schiller AR (2016) Entry and exit patterns of “Toxic” firms. Am J Agric Econ 98(3):881–909
De Silva DG, McComb RP, Schiller AR, Slechten A (2021) Firm behavior and pollution in small geographies. Eur Econ Rev 103742
Depro B, Timmins C, O’Neil M (2015) White flight and coming to the nuisance: Can residential mobility explain environmental injustice? J Assoc Environ Resour Econ 2(3):439–468
Deryugina T, Heutel G, Miller NH, Molitor D, Reif J (2019) The mortality and medical costs of air pollution: evidence from changes in wind direction. Am Econ Rev 109(12):4178–4219
DeWaard J, Curtis KJ, Fussell E (2016) Population recovery in New Orleans after Hurricane Katrina: exploring the potential role of stage migration in migration systems. Popul Environ 37(4):449–463
Freeman R, Liang W, Song R, Timmins C (2019) Willingness to pay for clean air in China. J Environ Econ Manag 94:188–216
Frey W (2009) The great American migration slowdown. Brookings Institution, Washington, DC
Gourieroux C, Monfort A, Trognon A (1984) Pseudo maximum likelihood methods: theory. Econometrica 52(3):681–700
Greenstone M, Hanna R (2014) Environmental regulations, air and water pollution, and infant mortality in India. Am Econ Rev 104(10):3038–72
Gross E (2003) US population migration data: strengths and limitations. Internal Revenue Service Statistics of Income Division, Washington, DC. http://www.irs.gov/pub/irs-soi/99gross_update.doc
Hamilton J (2005) Regulation through revelation: the origin, politics, and impacts of the Toxics Release Inventory Program. Cambridge University Press, Cambridge
Hatton T J, Tani M (2005) Immigration and inter-regional mobility in the UK. Econ J 115(507):342–358
Hauer M, Byars J (2019) IRS county-to-county migration data, 1990–2010. Demogr Res 40:1153–1166
Isen A, Rossin-Slater M, Walker W R (2017) Every breath you take-every dollar you’ll make: the long-term consequences of the Clean Air Act of 1970. J Polit Econ 125(3):848–902
Jbaily A, Zhou X, Liu J, Lee T-H, Kamareddine L, Verguet S, Dominici F (2022) Air pollution exposure disparities across US population and income groups. Nature 601(7892):228–233
Kahn M E (2000) Smog reduction’s impact on California county growth. J Reg Sci 40(3):565–582
Lang C (2015) The dynamics of house price responsiveness and locational sorting: evidence from air quality changes. Reg Sci Urban Econ 52:71–82
Mastromonaco R (2015) Do environmental right-to-know laws affect markets? Capitalization of information in the Toxic Release Inventory. J Environ Econ Manag 71:54–70
McCormick B (1997) Regional unemployment and labour mobility in the UK. Eur Econ Rev 41(3–5):581–589
McFadden D (1973) Conditional logit analysis of qualitative choice behavior. In: Zarembka P (ed) Frontiers in econometrics. Academic Press, New York
Meng J, Li C, Martin R V, van Donkelaar A, Hystad P, Brauer M (2019) Estimated long-term (1981–2016) concentrations of ambient fine particulate matter across North America from chemical transport modeling, satellite remote sensing, and ground-based measurements. Environ Sci Technol 53(9):5071–5079
Molloy R, Smith C L, Wozniak A (2011) Internal migration in the United States. J Econ Perspect 25(3):173–96
Rosenthal S S, Strange W C (2004) Evidence on the nature and sources of agglomeration economies. In: Handbook of regional and urban economics, vol 4. Elsevier, pp 2119–2171
Saha S, Mohr RD (2013) Media attention and the Toxics Release Inventory. Ecol Econ 93:284–291
Silva J S, Tenreyro S (2011) Poisson: some convergence issues. Stata J 11(2):207–212
Tiebout C (1956) A pure theory of local expenditures. J Polit Econ 64(5):416–424
Acknowledgements
We thank George Deltas, seminar audiences from Lancaster University and KU Leuven, and participants at the 50th Annual Conference of EARIE, the 2022 Annual European Association of Environmental and Resource Economists Conference, the 48th annual Regional Science Association International-British and Irish Section conference, and the 2023 Regional Studies Association annual conference for comments and suggestions. We also thank the editor and two anonymous referees for helpful comments that significantly improved the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no relevant material or financial interests that relate to the research described in this paper.
Appendices
Appendix A
See Tables A.1, A.2, A.3, A.4 and A.5.
Appendix B: Households Location Decisions
In this section, we investigate how US households choose where to move given the trade-offs between moving costs, job opportunities, and environmental and other amenities.
1.1 Net Migration Patterns
The structure of the IRS data allows us to calculate the net migration for a given county for a given year. Net migration in a county in a given year is defined as the difference between the total number of households that move into that county from all the other counties (in-migration) and the total number of households that move out (out-migration). A snapshot of the dataset used to estimate the net migration model is presented in Table B.1. If we consider the same example as in the main text (Autauga County in Alabama), we obtain that, for 2013, net migration is positive while, for 2014, it is negative.
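The net migration calculation can be sketched from a list of county-to-county flows. The record layout (year, home county, destination county, number of households), the county codes and the flow sizes below are hypothetical and are not taken from the IRS files or from Table B.1.

```python
from collections import defaultdict


def net_migration(flows):
    """Aggregate county-to-county flows into net migration by (year, county).

    flows: iterable of (year, home, destination, households) records.
    Each flow adds to the destination's in-migration and to the home
    county's out-migration; net migration is in-migration minus out-migration.
    """
    net = defaultdict(int)
    for year, home, destination, households in flows:
        net[(year, destination)] += households
        net[(year, home)] -= households
    return dict(net)


# Hypothetical flows between two counties labelled "A" and "B".
example_flows = [
    (2013, "A", "B", 100),  # 100 households move from A to B
    (2013, "B", "A", 120),  # 120 households move from B to A
    (2014, "A", "B", 90),
    (2014, "B", "A", 70),
]
result = net_migration(example_flows)
# County A gains 20 households in 2013 and loses 20 in 2014.
```

The example also shows why net figures hide information: county A's net change of 20 households in 2013 says nothing about the 220 households that actually moved.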
We explore how the year-to-year changes in net migration covary with lagged pollution measures and demographic characteristics in the home county h:
We denote \(\textbf{E}\) as the vector containing our two pollution measures at the county level. \(\textbf{X}\) are county demographics. \(\alpha\) and \(\tau\) are county/state and time fixed effects. \(\epsilon\) is the error term. We estimate this empirical model using a simple linear regression, where PM NonAttainment and Attainment County are used to instrument PM\(_{2.5}\) concentration levels in the home county. We have 3109 counties over the period 2010–2014.
We present our county-level net migration results in Table B.2. In columns 1 and 3, we use the log number of TRI reporters, while in columns 2 and 4, we use the log of total toxic releases (in pounds). Columns 1 and 2 include state and time fixed effects, while columns 3 and 4 show the estimation results with county and time fixed effects.
In column (1), the coefficients associated with the number of TRI reporters and PM\(_{2.5}\) concentration in a county are negative and statistically significant. Once we allow for county fixed effects, the coefficients of all our pollution measures become insignificant. However, the IV estimates with county fixed effects have to be interpreted with caution as the first-stage F-statistic is substantially lower than those in columns (1) and (2), with state fixed effects. Finally, a higher unemployment rate in county h is associated with a lower net migration in county h.
Using net migration patterns and changes in local environmental quality does not allow us to identify a sorting by income across US counties. Net migration can be zero or minimal if a similar number of people move into and out of a county, but the households moving into or out of the county may have very different characteristics and come from counties with very different levels of pollution (see Depro et al. 2015). To address this issue, we formulate a destination choice model that exploits information regarding both the origin and destination counties, enabling us to explore internal migration patterns within the US.
1.2 Destination Choice Model
We consider the destination choice of a household i, located in county h, who has to decide where to move among N possible destination counties. In our model, we assume that a household migration decision is based on the comparison of home and destination attributes (defined in section 3 of the paper). We express the indirect utility from destination d for household i as follows:
where \(\Delta E_{d-h,t}\) contains our measures of environmental quality, while \(\Delta X_{d-h,t}\) includes other characteristics of county d (relative to county h) that might attract migrants (e.g., differences in unemployment rates, wages, number of amenity establishments, house price index). \(D_{dh}\) includes the proxy for moving costs. The disturbance term \(\epsilon _{i,d,h,t}\) is independent and identically distributed.
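Read together with the definitions above, a linear-in-differences indirect utility would take a form like the following sketch. The coefficient vectors \(\beta\), \(\gamma\) and \(\delta\) are notation assumed here for illustration; only the regressors and the disturbance term are named in the text.

\[
V_{i,d,h,t} = \beta' \Delta E_{d-h,t} + \gamma' \Delta X_{d-h,t} + \delta' D_{dh} + \epsilon_{i,d,h,t}
\]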
In a utility-maximization framework, household i will choose to move to county d at time t if the indirect utility associated with moving to county d is at least as large as the indirect utility they could obtain by moving to any other county k, i.e., \(V_{i,d,h,t}\ge V_{i,k,h,t}\,\text {for all}\,k\ne d\), and \(d,k \in \{1,\ldots ,N\}\). To obtain closed-form expressions for a household’s choice probabilities, we assume that \(\epsilon _{i,d,h,t}\) follows a Type 1 extreme value distribution. We also assume that each household knows their private costs and expected utility. This asymmetric information assumption enables us to convert the discrete actions of households into continuous location choice probabilities. Using the results of McFadden (1973), we model household i’s choice of destination location (county d) using a conditional Logit model:
where \(m_{i,d,h,t}\) equals 1 if household i, living in county h, chooses location d and 0 otherwise. By modeling a household's migration choice as a function of the difference between home and destination characteristics, we assume that households moving to the same destination county but from different home counties are treated differently. In that respect, the home county characteristics can be interpreted as the average characteristics of households from county h.
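The conditional Logit choice probabilities implied by the Type 1 extreme value assumption can be sketched as follows: each destination's probability is the exponential of its systematic utility divided by the sum of exponentials over all destinations. The coefficient values and the three candidate destinations below are hypothetical and chosen only to show the mechanics; they are not estimates from the paper.

```python
import math

# Hypothetical coefficients on destination-minus-home differences.
BETA_POLLUTION = -0.8   # utility weight on the pollution difference
GAMMA_WAGE = 0.5        # utility weight on the wage difference
DELTA_DISTANCE = -0.3   # utility weight on the moving-cost proxy

def systematic_utility(d_pollution, d_wage, distance):
    """Deterministic part of the indirect utility for one destination."""
    return (BETA_POLLUTION * d_pollution
            + GAMMA_WAGE * d_wage
            + DELTA_DISTANCE * distance)

def choice_probabilities(utilities):
    """Conditional Logit probabilities: exp(V_d) / sum_k exp(V_k)."""
    v_max = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(v - v_max) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Three made-up destinations that differ only in the pollution gap:
# cleaner than home, same as home, dirtier than home.
destinations = [(-1.0, 0.2, 1.0), (0.0, 0.2, 1.0), (1.0, 0.2, 1.0)]
probs = choice_probabilities([systematic_utility(*d) for d in destinations])
for d, p in zip(destinations, probs):
    print(f"pollution gap {d[0]:+.1f}: choice probability {p:.3f}")
```

With a negative pollution coefficient, the cleaner destination receives the highest probability and the dirtier one the lowest, holding the other attributes fixed.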
Our summary statistics in Table 1 indicate that, on average, about 1800 households move out of a given county to another county each year. Given that each household has 3108 different counties to choose from, we have, on average, about 5.6 million observations for a given county in a given year. As we consider the 3109 counties in the lower 48 states, the number of observations is about 17.4 billion for a given year, or 87 billion for the whole sample period (5 years). With this many observations, estimating the location choice model using the conditional Logit technique becomes computationally intensive and challenging. We overcome this issue by adopting an approach similar to the one used by Rosenthal and Strange (2004), De Silva et al. (2016) and De Silva et al. (2021) when studying firm location decisions. Due to the volume of data, they aggregate firm entry by location to obtain the number of firms entering each area in a given year, and they estimate their models using a Poisson pseudo-maximum likelihood (PPML) estimator with year fixed effects. De Silva et al. (2016) also show that, when studying entry decisions of firms reporting releases to the TRI, the results (in terms of sign and significance of the coefficients) of the conditional Logit and PPML estimations are very similar.
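The size of the household-level choice data quoted above follows from simple multiplication, reproduced here as a short check. The variable names are chosen for this illustration; the inputs are the rounded figures stated in the text.

```python
MOVERS_PER_COUNTY = 1800   # approximate movers per county per year (Table 1)
N_COUNTIES = 3109          # counties in the lower 48 states
N_YEARS = 5                # 2010-2014

# Each mover chooses among all counties except the home county.
choices_per_household = N_COUNTIES - 1

# One observation per (household, candidate destination) pair.
obs_per_county_year = MOVERS_PER_COUNTY * choices_per_household
obs_per_year = obs_per_county_year * N_COUNTIES
obs_total = obs_per_year * N_YEARS

print(f"{obs_per_county_year:,}")  # 5,594,400 (about 5.6 million)
print(f"{obs_per_year:,}")         # 17,392,989,600 (about 17.4 billion)
print(f"{obs_total:,}")            # 86,964,948,000 (about 87 billion)
```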
We therefore aggregate household movements from a home county to a destination county. For each of the 3109 counties in a given year, there will be 3108 choices of destination counties. This gives us 9,662,772 (\(3109 \times 3108\)) county-to-county observations per year. Since we have 5 years, our aggregated sample size is 48,313,860 observations. Let \(H_{dht}\) denote the total number of households moving from county h to county d at time t. We estimate the following specification using the PPML methodFootnote 24:
where \(\tau _t\) is a time effect. \(D_{dh}\) includes the proxy for moving costs. We use geographical distance as a proxy for the costs associated with moving from one county to another. We define the geographical distance between two counties as the distance between the centroids of the home and destination counties. As moving to a different state is associated with additional costs (e.g., transferring vehicle insurance and registration documentation, or applying for a new driver's license), we include a dummy variable Out of State that takes the value 1 for interstate migration.
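The two moving-cost proxies described above can be computed along the following lines. This is a sketch under stated assumptions: the text does not say how centroid distance is measured, so the great-circle (haversine) formula and the mean Earth radius used here are assumptions, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def centroid_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two centroids given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def out_of_state(home_state, destination_state):
    """Dummy equal to 1 for interstate moves and 0 for within-state moves."""
    return 1 if home_state != destination_state else 0

# Made-up centroids for a home county and a destination county.
distance = centroid_distance_km(35.0, -97.5, 33.0, -96.8)
interstate = out_of_state("OK", "TX")
print(f"distance: {distance:.1f} km, Out of State: {interstate}")
```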
In this estimation, we maintain the integrity of the original conditional Logit setup. Initially, our interest lies in analyzing the cross-sectional variation of location choices within a given year, while accounting for household unobservable heterogeneity through household fixed effects. Given that households tend to move infrequently, each household is likely to feature only once during our 5-year period. Consequently, a conditional Logit model is estimated by incorporating fixed effects by year exclusively. Similarly, in our final PPML specification, we introduce time fixed effects to capture cross-sectional variations across destination counties.
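To make the PPML step concrete, the sketch below fits a one-regressor Poisson pseudo-maximum likelihood model, with conditional mean \(\exp(b_0 + b_1 x)\), by Newton's method. It is a toy version only: the actual specification includes many covariates and year fixed effects, and the data, parameter values and function name here are hypothetical. The outcome is set to its noiseless conditional mean so that the fit can be checked exactly; PPML only requires a non-negative outcome, not Poisson-distributed counts.

```python
import math

def ppml_fit(y, x, iterations=25):
    """Fit E[y | x] = exp(b0 + b1 * x) by Poisson pseudo-maximum likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient) of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step using the closed-form inverse of the 2 x 2 matrix.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1

# Hypothetical county-pair data: flows decay with log distance.
TRUE_B0, TRUE_B1 = 2.0, -0.5
log_distance = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
flows = [math.exp(TRUE_B0 + TRUE_B1 * d) for d in log_distance]

b0_hat, b1_hat = ppml_fit(flows, log_distance)
print(f"intercept: {b0_hat:.4f}, distance coefficient: {b1_hat:.4f}")
```

Because the model is estimated in levels with an exponential mean, county pairs with zero flows can stay in the sample, which matters when most pairs record no movers.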
Note that we are unable to use our IV approach or incorporate county-pair fixed effects into equation (4). Despite having 48 million observations, the variable \(H_{dht}\) is strictly positive for only 0.79% of them, resulting in estimation issues.Footnote 25 With only five observations per county pair and a substantial number of county pairs exhibiting zeros over the 5-year period, there is insufficient temporal variation within county pairs.
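The sparsity of the aggregated sample can be put in absolute terms with a short calculation based on the figures stated above. The variable names are chosen for this illustration, and the count of positive flows is approximate because the 0.79% share is itself rounded.

```python
N_COUNTIES = 3109
N_YEARS = 5
SHARE_POSITIVE = 0.0079  # share of county-pair-year observations with movers

# Ordered (home, destination) pairs, excluding a county paired with itself.
pairs_per_year = N_COUNTIES * (N_COUNTIES - 1)
total_observations = pairs_per_year * N_YEARS

# Approximate number of observations with at least one moving household.
positive_observations = round(SHARE_POSITIVE * total_observations)
zero_observations = total_observations - positive_observations

print(f"{pairs_per_year:,}")       # 9,662,772
print(f"{total_observations:,}")   # 48,313,860
print(f"{positive_observations:,} positive, {zero_observations:,} zero")
```

Under these figures, roughly 380,000 county-pair-year cells carry all of the observed migration, which is why within-pair variation over five years is so limited.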
As our models do not allow for county fixed effects, we are able to include an additional measure of environmental quality: the number of listed Superfund sites within a county in a given year.Footnote 26 We also account for Metro, urban, and rural moving patterns using dummies to identify households moving within Metro counties, urban to Metro counties, Metro to urban counties, rural to Metro counties, Metro to rural counties, and urban to rural counties. Further, we control for moving patterns among coastal and non-coastal counties using dummies to identify households moving from coastal to coastal counties, to a coastal county from a non-coastal county, and to a non-coastal county from a coastal county.
Table B.3 summarizes our results for the destination choice model estimated for all contiguous US states. Specifications differ in terms of the pollution measures included in the regressions. The coefficients of two of our pollution measures (i.e., \(\hbox{PM}_{2.5}\) and Superfund sites) have the expected sign and are statistically significant. When the concentration of \(\hbox{PM}_{2.5}\) or the number of Superfund sites at the destination increases relative to the home levels, fewer households migrate to that county. The parameter estimated in column (3) shows that a 1% increase in the difference of \(\hbox{PM}_{2.5}\) concentrations between home and destination counties is associated with an 11% decrease in the number of households moving between these home and destination counties. The coefficients of our TRI-related pollution measures are not significant, but results in columns (1), (3) and (4) show that households are about half as likely to move between counties that do not have any TRI reporters (or toxic releases) as between county pairs where one of the counties has at least one TRI reporting facility.
Households tend to avoid counties with higher unemployment rates. In addition, counties with more manufacturing establishments (relative to the home county) appear to be less attractive. Among the amenity variables included in the model, the coefficient of the house price index is statistically significant (at the 10% level): households are more likely to move to places where house prices have increased less relative to the base year. In terms of demographic characteristics, our estimates reveal that if the proportion of individuals with a college degree in the home county increases compared to the proportion in the destination county (i.e., \(\Delta\) College ratio\(_{d-h,t-1}\) decreases), we observe more households moving from county h to county d. This is consistent with prior evidence that more-educated individuals are more likely to move.
Finally, moving costs are strongly associated with household decisions to move to a particular destination county. Households are less likely to move out of state, and they tend not to move far, as the coefficient of the distance variable is negative.Footnote 27
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
De Silva, D.G., Schiller, A.R., Slechten, A. et al. Tiebout Sorting and Toxic Releases. Environ Resource Econ 87, 2487–2520 (2024). https://doi.org/10.1007/s10640-024-00893-8