1 Introduction

The direct effects of regional development interventions may be amplified by positive spillovers or offset by negative spillovers. Yet applied studies often ignore these spillovers, because popular impact evaluation frameworks treat spatial units as independent of surrounding areas. An emerging set of studies finds that spillovers should be considered. For example, De Castris et al (2023) examine whether European cohesion policy targeted at less developed areas increases subsequent local economic growth (at the NUTS-3 level); allowing for spillovers gives total (positive) impacts about one-fifth larger than direct effects. Gambina and Mazzola (2023) find a similar result for Italian provinces, with national cohesion projects having even larger indirect effects. Spillovers also occur at a finer spatial scale. For example, Qiu and Tong (2021) study effects of Light Rail Transit (LRT) on housing prices in Edmonton, Canada. A spatial difference-in-differences (SDID) model yields a negative direct effect, with prices C$20,000 lower for homes within 2 km of an LRT station, which is exacerbated by a further C$5,000 fall from indirect effects that result from property price spillovers. A model that ignores spillovers yields estimated treatment effects that are only two-thirds as large.

A particularly salient place to study these issues is China, which uses many regional development interventions, and whose pace of growth means that samples of just a couple of decades can observe changes that may have taken far longer to unfold in other places. Under China’s hierarchical governance, these interventions often involve altering administrative ranks. While cities elsewhere may arise organically in a bottom-up way from agglomeration forces, under China’s top-down approach only the central government may bestow city status on a particular area. Therefore, an ongoing aspect of urbanization in China is administrative conversions, with over 250 counties upgraded to city status (as either districts or county-level cities) in the last two decades, which is more than ten percent of all counties (Fig. 1).Footnote 1 Potentially important impacts follow from this upgrading because cities have larger quotas for converting land from agricultural use to urban use, can levy higher taxes on urban construction and keep a larger share of revenue from land sales, have higher local public sector employment and generally more prominence for investment.

Fig. 1 Cumulative number of counties upgraded: 2001–2019

Table 1 Potential benefits for a county in getting upgraded to city or district status

In light of the importance of this particular type of regional development intervention, a growing empirical literature uses econometric models to evaluate impacts of administrative upgrading. These studies use panel data to test if various indicators of economic activity, such as GDP or night-time lights, grow faster in the upgraded counties compared to their peers that were not upgraded. This literature has yielded mixed findings; an early study found counties upgraded from 1994 to 1997 performed no better than those that stayed as counties, for an evaluation period ending in 2004 (Fan et al 2012). A more recent study of the same episode found positive impacts becoming apparent after 2004, especially in more densely populated eastern parts of China where initial agglomeration forces were stronger (Tang 2021).Footnote 2

There are several empirical challenges confronting these studies and the one that we examine concerns potential spillovers. These could occur if mobile resources, such as labour and capital, move from neighbouring counties into newly upgraded cities, causing a negative effect (Chen and Partridge 2013). This matters not just distributionally but also empirically; if comparison groups are subject to a negative spillover it can exaggerate apparent treatment effects if the spillover effect is ignored (Tang 2021). Typically, the studies in this literature use difference-in-differences frameworks that generally rely on a stable unit treatment value assumption (SUTVA) of independence between the spatial units (Delgado and Florax 2015).

The spillovers could also be positive if upgraded counties become growth poles after relaxation of their land-related constraints (Lichtenberg and Ding 2009), allowing economic activity in hinterlands to also flourish. County-to-district upgrades often lower transport costs from newer, better, infrastructure that has network efficiencies, improving market access and firm productivity (Tang and Hewings 2017). Indeed, classic development theories emphasize a role for positive spillovers when explaining the variation in economic performance; these may be between industrial sectors (Rosenstein-Rodan 1943), through demand-side linkages (Park and Johnston 1995) but also through geographic externalities (Ravallion 2005).

Despite the potential importance of spillovers for understanding effects of the county upgrading process in China, the literature to date has mainly ignored them or else used only partial or informal approaches. For example, Tang (2021) drops counties that are closer than 50 km to upgraded counties (amounting to 16% of county-year observations), in case the nearby counties had been subject to some spillover from the treated units, but he otherwise uses an econometric framework that assumes that spatial units are independent. Along the same lines, Zeng and You (2022) drop 46%, 57% and 66% of their county-year observations based on removing those within bands of 50 km, 60 km, and 70 km from upgraded counties in case of spatial spillovers in their outcome measure. The particular distance thresholds do not have an underlying theoretical basis, such as a spatial econometric model that could allow for both local and global spillovers. Relatedly, Bo (2020) includes the treatment status of contiguous neighbours in his prefectural-level study, and also includes the treatment status of same-province prefectures as a covariate. However, as discussed below, only local spillovers will have their effects observed with this specification, which is effectively a spatial lag of the covariate; if there are also global spillovers, these will operate through the spatial lag of the outcome variable and so will be missed with this more restricted framework.

Given this limited and only partial evidence on the role of spillovers, the current study applies spatial econometric models to a 20-year panel of almost 2500 county-level units to examine effects of converting China’s counties to cities (either as county-level cities or as districts). We use GDP and night-time lights as our indicators of local economic activity. We start with a very general spatial autoregressive model with spatial autoregressive errors (SARAR) that encompasses some popular models, such as the spatial Durbin model, spatial lag model, and spatial error model. In this particular setting and for the period we study, the spatial lags appear to be relevant for the outcomes, the covariates and the errors. The positive direct effects on economic activity of a county being upgraded are amplified through positive indirect effects. The indirect spillovers are especially apparent in the eastern regions of China where economic activity and population are more densely concentrated. The models without spatial lags give estimated effects of county upgrading that are only two-fifths to two-thirds as large as the estimated total effects coming from the spatial models. Given that models without spatial lags have been almost exclusively what is used to date, these findings suggest that the literature studying county upgrading in China might benefit from more widespread use of spatial models that explicitly allow spillover effects to be studied. More generally, our example adds to the growing list showing the importance of allowing for spatial spillovers.

In the next section we provide details on China’s administrative hierarchy, describe some pathways through which counties can be upgraded, and show that prior studies in this literature have not used spatial econometric models. Our data and methods are discussed in Section 3. Section 4 has the results, using both county-level GDP and night-time lights as indicators of economic activity. Section 5 has the discussion and conclusions.

2 Background and context

2.1 China’s administrative hierarchy

In China’s administrative hierarchy four levels of local government exist: province, prefecture, county, and town. The central government can bestow city status at any of the first three levels (Fan et al 2012). For example, four cities directly under the central government—Beijing, Tianjin, Shanghai and Chongqing—have equivalent status to provinces. Notably, even though they are designated as cities, this is an administrative rather than a functional characterization: Chongqing, for example, is as large as the country of Austria and has third-level units (districts and counties) whose population density ranges from 26,000 people per square kilometer (Yuzhong district, which is the main business area) to as low as 60 people per square kilometer (Chengkou county). Likewise, the second level units designated as prefectural-level cities (which have a rising share over time, as shown below) consist not only of core urban areas, whose subsidiary spatial units are districts, but also may include counties that are largely rural, and county-level cities that have some modest urban development.

The variation in population density of the three main types of third-level spatial units illustrates their differing degree of urbanity, even though they all have the same de jure rank. In the 2020 census, the median population density of districts was 930 residents per km², the median for county-level cities was 345 persons per km² and the median density for counties was 135 persons per km². In other words, county-level cities are almost three times more densely populated than counties, and districts are almost three times more densely populated than county-level cities.Footnote 3

In addition to density, the important differences between the three types of third-level spatial units relate to administrative power, fiscal benefits, land-related issues, reputation and the priority they may receive in the plans of decision makers at higher-levels (Table 1). For example, more land is available for urban construction and higher taxes can be imposed on this activity once a county becomes a county-level city (Lichtenberg and Ding 2009; Fan et al 2012) while conversion to district status also reduces land-related hurdles to the expansion of urban built-up area (Ma 2005). China has a complex system of inter-governmental fiscal transfers (Tan and Tan 2024), and transfers from prefectural-level governments to districts generally exceed those going to counties (Lu and Wang 2023). There are also more subtle benefits of being a district or county-level city that include factors related to prestige and policy priority (Chung and Lam 2004; Shen 2007; Deng et al 2022).

2.2 Pathways to upgrading

There are several ways for a county to be upgraded to become either a county-level city or a district. Some counties adjacent to existing big cities are converted to districts to aid expansion of the existing urban area and relieve pressure on higher density districts nearer to the central city. For example, between 2001 and 2015 five counties on the outskirts of Beijing were converted to districts and one of these (Daxing) subsequently was chosen for Beijing’s second international airport. In the neighbouring Tongzhou District, which was a county until 1997, the new Beijing Municipal Administrative Center is being developed by transferring many of the city government offices out of the central city, along with movements of the headquarters of some centrally-administered state-owned enterprises.

In less urbanized regions of China, a driving force for the conversion of counties into districts has been the growing territorial coverage of prefectural-level cities (given that there must be at least one district per prefectural-level city). At the time of the 2000 census, spatial units at the second subnational level that were organized in the form of prefectural-level cities covered territory that contained just under 87% of China’s resident population (Fig. 2, top panel). By the time of the census in 2020, 94% of China’s population resided in the areas organized as prefectural-level cities, with expansion especially in southwest and northern China (Fig. 2, bottom panel). The difference between the two maps is 36 prefectural-level cities in 2020 that did not have that status in the year 2000; the need for each prefectural level city to have at least one district is a driving force for some counties to be converted (as shown below).Footnote 4 While there are still some physically large areas not organized as prefectural-level cities, as seen in the map in the bottom panel of Fig. 2, these large areas (especially in Xinjiang) have only a low share of population and economic activity.

Fig. 2 The growing spread of prefectural-level cities from 2000 to 2020 in China

A good example of this second pathway comes from Haidong prefectural-level city in Qinghai province, to the east of the provincial capital, Xining. When this prefectural-level city was created in 2013, one of Haidong’s six counties, Ledu, got upgraded to become a district (as prefectural-level cities are required to have at least one district). Two years later, adjacent Ping’an county, which borders Xining to the west, also got upgraded. This sequence is shown in Fig. 3, with the administrative status of each of the constituent spatial units in Haidong shown in 2012, 2014, and 2016. The maps also show a non-administrative indicator of urban areas, based on DMSP night-time lights (with red for the brightest areas, orange and green for less bright areas and white for unlit (completely rural) areas).Footnote 5 The main urban axis through Haidong is on a slightly tilted east-to-west orientation, along the Huangshui river (a tributary of the Yellow river); a route also used by China’s G6 Beijing–Lhasa Expressway. The upgrading for Ledu and Ping’an reflects this increasing urban activity linked to Xining, 40 km to the west. It is also notable that the upgrading of one county was quickly followed by the upgrading of an adjacent one; indicating some non-independence between spatial units.

Fig. 3 County-to-district upgrading in Haidong City, Qinghai Province, China

The other common type of county upgrading is to county-level city; about 50 counties underwent this type of upgrade in the last two decades, contributing to about one-fifth of the total shown in Fig. 1. The county to county-level city conversion had been practiced more widely in the 1980s, when China’s urbanization strategy prioritized the development of small cities. According to the standards in place at the time, almost one-half of all counties met the criteria for conversion in terms of their share of non-agricultural hukou population, economic activity (GDP and industrial output) and local budgetary revenues (Chung and Lam 2004). Yet standards may not have been uniformly applied given that some counties without much agglomeration potential still got upgraded (Fan et al 2012). This type of upgrading then fell out of favour for a decade from the mid-1990s, as shown clearly in year-by-year counts of upgrades recorded by China’s Ministry of Civil Affairs (see Fig. 2, in Tang 2021).

2.3 Prior research evaluating effects of county upgrading

The economic effects of de facto upgrading of counties to county-level city or district status have attracted interest from researchers. The studies in this literature apply panel data econometric models to various samples (usually of third-level spatial units), to test if there is faster post-upgrading growth in indicators of economic activity; typically these studies use local GDP as the economic activity indicator but a couple use luminosity. Our summary of a selection of these studies is in Table 2, which also includes a few where some of the results are for spatial units at the prefectural level (e.g. Deng et al 2022) because of the similarity of data and methods and also because the urbanization processes studied are similar. Typically, a difference-in-differences framework is used as the non-experimental evaluation approach, given that the counties subject to upgrading are unlikely to be a random selection.

Table 2 Selected studies analysing upgrading of county-level units in China

A key feature of the difference-in-differences approach is that it typically treats spatial units as independent of neighbours. In other words, a stable unit treatment value assumption (SUTVA) is relied upon. While alternative difference-in-differences estimators that relax this assumption of independence have been developed using spatial econometric models (e.g., Delgado and Florax 2015), these have not been used in the studies of upgrading in China. In fact, none of the studies we reviewed seem to use spatial econometric models (as seen from column 3 of Table 2), although there is some discussion of possible spillovers in a few of the papers. For example, Tang (2021) and Zeng and You (2022) drop some untreated counties if they are near to the upgraded counties, as sensitivity analyses in case outcomes for the nearby counties had been affected by spillovers coming from the treated units. Perhaps the closest to using a spatial model is Bo (2020), who includes treatment status of contiguous neighbours; a spatial lag of a covariate can generate local but not global spillovers (LeSage and Pace 2009), so this specification restricts the types of spillovers.Footnote 6

3 Data and methods

Our review of studies evaluating effects of giving selected subnational units in China city status shows that spatial econometric models have not been used, even for studies with a (limited) discussion of spillover effects. In this section, the data and estimation framework used here to shed some light on the nature of these spillovers are described. In contrast to the prior studies, we are less interested in quasi-causal analyses of particular treatments (such as converting counties to county-level cities). Instead, our interest is in using a comprehensive modelling framework to help to establish where and how spillovers may occur.Footnote 7 In doing so, we adopt the language of spatial econometrics (especially LeSage and Pace (2009)) in terms of direct, indirect, and total “impacts” even though some branches of economics increasingly reserve the term “impacts” for contexts with explanatory variables subject to either explicit manipulation (as in randomized control trials) or that vary due to some naturally occurring (partial) randomization that allows use of instrumental variables (Gibson et al 2023). Even if the patterns that we find are considered as correlations these patterns of spatial spillovers should still be present (and so should be accounted for) in other studies that use more formal quasi-causal models. In other words, the results reported below are partly an exercise in allowing the data to speak, to see whether modelling decisions that a priori rule out spatial spillovers have a sound basis.

3.1 Data sources

In order to empirically analyse spillovers we use two measures of economic activity—local GDP and local luminosity. China reports GDP for different types of spatial units in a variety of publications from the National Bureau of Statistics (NBS). We rely on three main sets of these: (i) annual editions of the China Statistical Yearbook (county-level) (in Chinese it is Zhongguo Xianyu Tongji Nianjian (Xianshi Juan)), (ii) annual editions of the China City Statistical Yearbook (known as Zhongguo Chengshi Tongji Nianjian), and (iii) annual editions of the Statistical Yearbook for each city or province (for example, the Beijing Statistical Yearbook) (NBS, various dates). These different types of publications are needed because, for example, annual GDP of districts comes out in a different report than the annual GDP of counties. Each edition reports on GDP the previous year, so we use the 2001 to 2020 editions to obtain annual GDP data from 2000 to 2019. Overall, we have a balanced panel of annual GDP data for each of n = 2342 units at the 3rd level of the sub-national administrative hierarchy, where these units maintain a consistent spatial definition from 2000 to 2019.

The luminosity data are DMSP annual composites from satellites F14, F15, F16 and F18 that collectively cover each year from 2000 to 2019.Footnote 8 The stable lights product used here (where ‘stable’ simply means that ephemeral lights from sources such as fires are removed) provides 6-bit digital numbers (DN) that range from 0 to 63 (with higher values for greater luminosity). The technical details on the steps used to create these data are in Baugh et al (2010) and Ghosh et al (2021).Footnote 9 We use the sum of lights for each third-level spatial unit in each year, following previous research for China that finds that this luminosity indicator is the best proxy for local economic activity (Zhang and Gibson 2022).

To find when counties were converted to either county-level city or district status we compared administrative settings of all spatial units listed in the 2000 population census with the listings for the same places in the 2010 and 2020 population censuses. These comparisons identified counties whose status had changed, and we then went to the specific page for each of those places in the Baidu Encyclopaedia (baike.baidu.com). Within the place-specific page the administrative history section gave the precise date of any change in administrative status.

The summary statistics for the two outcome measures and for the indicator of whether a county was upgraded are reported in Appendix Table 1. We show results for observations that are year-by-spatial unit, and also time-averaged results for the ever-upgraded subset of the spatial units. The baseline differences shown by this comparison are controlled for by the spatial fixed effects in the estimation framework discussed below.

3.2 Estimation framework

Spatial econometric methods let us examine the nature of possible spillovers, and are used for this purpose in many contexts (Krisztin et al 2020; Asyahid and Pekerti 2022). A key aspect of these models aiding the study of spillovers is that possible interactions between spatial units are summarized with a \(N \times N\) spatial weights matrix, W. In this study we use a row normalized contiguity weights matrix that has values of one for neighbours and zero otherwise, with a diagonal of zeros because a spatial unit cannot neighbour itself. At the level of spatial disaggregation that we use, the average spatial unit in China has six neighbours.

In what follows, the proxy for economic output in spatial area i in year t is denoted as \(O_{it}\), where the two proxy variables we use are log GDP in our main specification, and the log of the sum of night-time lights in our sensitivity analysis. The indicator for whether a spatial unit has been upgraded is \(D_{it}\), the \(\mu_{i}\) are time-invariant fixed effects for each spatial unit, the \(\vartheta_{t}\) are year fixed effects, and \(e_{it}\) is a random error. By using the spatial weights matrix we can allow for spatial lags, which are averages of these variables over the neighbouring units.
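As a minimal sketch of the weights matrix and spatial lag just described, the following builds a row-normalized contiguity matrix and applies it to an upgrade indicator. The four-unit map, the `neighbours` dictionary and the function name are hypothetical and chosen only for illustration; they are not the data or code used in the study.

```python
import numpy as np

def row_normalized_contiguity(neighbours, n):
    """Build an n x n row-normalized contiguity matrix from an adjacency dict."""
    W = np.zeros((n, n))
    for i, js in neighbours.items():
        for j in js:
            if i != j:  # zero diagonal: a unit cannot neighbour itself
                W[i, j] = 1.0
    row_sums = W.sum(axis=1, keepdims=True)
    # Divide each row by its neighbour count; rows without neighbours stay zero
    np.divide(W, row_sums, out=W, where=row_sums > 0)
    return W

# Hypothetical toy map: four units arranged in a line, 0 - 1 - 2 - 3
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
W = row_normalized_contiguity(neighbours, 4)

D = np.array([0.0, 1.0, 0.0, 0.0])  # only unit 1 is upgraded
WD = W @ D                           # spatial lag: share of neighbours upgraded
print(WD)
```

With row normalization, each element of `WD` is the average of `D` over a unit's neighbours, so it can be read as the share of neighbouring units that have been upgraded.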

Our starting point is a very general model, which is a spatial autoregressive model with spatial autoregressive errors (SARAR).Footnote 10 This model allows for spatially lagged dependent variables, spatially lagged independent variables and spatially lagged errors:

$$O_{it} = \lambda WO_{it} + \beta_{1} D_{it} + \beta_{2} WD_{it} + \mu_{i} + \vartheta_{t} + \rho Wu_{it} + e_{it}$$
(1)

The SARAR model allows for changes in an outcome variable in a given area to have effects on contemporaneous outcomes in other areas (via the autoregressive spatial lag of the dependent variable, if \(\lambda \ne 0\)). It also allows changes in independent variables (such as being converted from county to city status) to affect not only own-area outcomes but also outcomes in neighbouring areas (if \(\beta_{2} \ne 0\)). The \(\rho Wu_{it}\) term allows for spatial autocorrelation, where the composite error for a given area, \(u_{it} = \rho Wu_{it} + e_{it}\), correlates (\(\rho\)) with a weighted average of the errors from surrounding areas. Equation (1) nests a spatial Durbin model if \(\rho = 0\), a spatial auto-regressive model (aka spatial lag model) where only the dependent variable is spatially lagged if \(\beta_{2} = \rho = 0\), a spatial error model where only the errors are spatially lagged (if \(\lambda = \beta_{2} = 0\)), and the most restrictive of all, which is an aspatial model with no spatial lags (if \(\lambda = \beta_{2} = \rho = 0\)). The aspatial model has been the approach underpinning the previous studies of county upgrading (Table 2). The encompassing nature of Eq. (1) allows for a general-to-specific model selection strategy which appears to be more robust than the reverse simple-to-general selection strategy, especially if there are any anomalies in the Data Generating Process (Mur and Angulo 2009).
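To make the structure of Eq. (1) concrete, the sketch below simulates one cross-section from a SARAR process and lists the parameter restrictions that give each nested model. It is an illustration under stated assumptions: the six-unit ring map, the parameter values and the omission of the fixed effects are all hypothetical simplifications, not estimates or data from this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Hypothetical ring of six units: each has two neighbours, rows sum to one
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

lam, beta1, beta2, rho = 0.4, 0.07, 0.03, 0.2   # illustrative values only
D = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])    # unit 0 is upgraded
e = rng.normal(scale=0.01, size=n)
I = np.eye(n)

# Spatially autocorrelated error: u = rho * W u + e
u = np.linalg.solve(I - rho * W, e)
# Outcome (fixed effects omitted): O = lam * W O + beta1 * D + beta2 * W D + u
O = np.linalg.solve(I - lam * W, beta1 * D + beta2 * (W @ D) + u)

# Restrictions on Eq. (1) that yield each nested model
NESTED = {
    "SARAR": {},
    "spatial Durbin": {"rho": 0},
    "spatial lag": {"beta2": 0, "rho": 0},
    "spatial error": {"lam": 0, "beta2": 0},
    "aspatial": {"lam": 0, "beta2": 0, "rho": 0},
}
```

Solving the two linear systems, rather than looping over units, reflects the simultaneity in the model: every unit's outcome depends on its neighbours' outcomes in the same period.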

An important feature of spatial econometric models is that lags of either the outcome variable or of independent variables (but not of errors) mean that total effects of changes in an independent variable—e.g. whether a county is upgraded—may be quite different to what the regression coefficient on the dummy variable for being upgraded shows. Thus, while \(\hat{\beta }_{1}\) is the object of interest in the typical model without spatial lags, in the spatial models when either the spatial lags of outcomes or the spatial lags of independent variables are non-zero then \(\hat{\beta }_{1}\) does not capture the total effect of a change in the administrative status of a county. A useful decomposition of the more complex spatial relationships that occur relies on rewriting Eq. (1) in matrix notation (for simplicity, subscripts are dropped and fixed effects and error terms combined in v because the errors do not affect this decomposition) as:

$$O = \left( {I - \lambda W} \right)^{ - 1} \left( {D\beta_{1} + WD\beta_{2} } \right) + \left( {I - \lambda W} \right)^{ - 1} v{ }$$
(2)

Following Elhorst (2012), the \(N \times N{ }\) matrix of partial derivatives can be written (noting that diagonal elements of W are zero) as:

$$\frac{\partial O}{{\partial D_{k} }} = \left( {I -\lambda W} \right)^{ - 1} \left( {\beta_{1k} I_{N} + \beta_{2k} W} \right)$$
(3)

where \(D_{k}\) is the upgrade status in spatial unit k. The total marginal effect on output that is associated with a county being upgraded has two components, a direct one and an indirect one, that may both vary over space. The estimator that we use follows LeSage and Pace (2009) in reporting a single direct effect that averages the diagonal elements of the matrix in (3) and a single indirect effect that averages the row sums of the non-diagonal elements of that matrix. Indirect effects arise not just from adjacent area units, if \(\beta_{2k} \ne 0,\) but also from (potentially) all areas through the spatial autoregressive effect if \(\lambda \ne 0\). Thus, there can be both local and global spillovers and when these are accounted for, averages from the matrix of derivatives \(\partial O/\partial D_{k}\) may be quite different to the estimated direct impact effect, \(\hat{\beta }_{1}\).
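The averaging described above can be sketched numerically as follows. This is an illustration of the decomposition applied to the matrix in Eq. (3), not the estimation code used for the results; the function name, the six-unit ring map and the parameter values are hypothetical.

```python
import numpy as np

def average_impacts(W, lam, beta1, beta2):
    """Average direct, indirect and total effects from the matrix in Eq. (3)."""
    n = W.shape[0]
    I = np.eye(n)
    # S = (I - lam * W)^{-1} (beta1 * I + beta2 * W)
    S = np.linalg.solve(I - lam * W, beta1 * I + beta2 * W)
    direct = np.trace(S) / n      # mean of the diagonal elements
    total = S.sum() / n           # mean of the row sums
    indirect = total - direct     # mean row sum of the off-diagonal elements
    return direct, indirect, total

# Hypothetical ring of six units with row-normalized weights
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

direct, indirect, total = average_impacts(W, lam=0.4, beta1=0.07, beta2=0.03)
# With a row-normalized W, total equals (beta1 + beta2) / (1 - lam)
print(direct, indirect, total)
```

Setting `lam` and `beta2` to zero returns a direct effect equal to `beta1` and an indirect effect of zero, which is the aspatial case where the regression coefficient alone captures the total effect.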

4 Results

The results of estimating Eq. (1) and then imposing various restrictions on the parameters and estimating the nested models are given in Table 3 (for GDP) and Table 4 (for luminosity). The nesting restrictions are rejected in all cases, so that the SARAR models appear to be the most data-acceptable models for both GDP and luminosity. The discussion therefore concentrates mostly on the results in column (1) for the SARAR model, and then by way of contrast on column (5) for the standard two-way fixed effects panel data model that does not allow for any spatial lags. The column (5) models are similar to the models used in the literature to date (such as some of those studies summarized in Table 2).
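For contrast with the spatial models, the aspatial restriction (\(\lambda = \beta_{2} = \rho = 0\)) reduces Eq. (1) to a two-way fixed effects regression. A minimal sketch of that estimator for a balanced panel with a single regressor is given below, using double demeaning to remove the unit and year effects. The function name and the simulated panel are hypothetical, and this is not the estimation code behind column (5).

```python
import numpy as np

def twfe_beta(O, D):
    """Two-way fixed effects slope for balanced (units x years) arrays."""
    def demean(X):
        # Remove unit means and year means, then add back the grand mean
        return (X - X.mean(axis=1, keepdims=True)
                  - X.mean(axis=0, keepdims=True) + X.mean())
    o, d = demean(O), demean(D)
    return (d * o).sum() / (d * d).sum()

# Hypothetical noise-free panel: 4 units, 6 years, staggered upgrades
rng = np.random.default_rng(1)
N, T = 4, 6
D = np.zeros((N, T))
D[0, 2:] = 1.0            # unit 0 upgraded from year 2 onwards
D[1, 4:] = 1.0            # unit 1 upgraded from year 4 onwards
mu = rng.normal(size=(N, 1))      # unit fixed effects
theta = rng.normal(size=(1, T))   # year fixed effects
O = 0.06 * D + mu + theta         # illustrative coefficient, not an estimate

beta_hat = twfe_beta(O, D)
```

Because the simulated outcome has no spatial lags and no noise, the estimator recovers the illustrative coefficient; in data generated with spillovers, this coefficient would not equal the total effect.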

Table 3 Relationships between county upgrading and the change in economic output (log GDP) in China: 2000 to 2019
Table 4 Relationships between county upgrading and the change in luminosity (log DMSP) in China: 2000 to 2019

The lack of data acceptability for any of the nesting restrictions indicates that for the spatial units and period we study, the interactions are occurring through the spatial lags of the outcomes, the lags of the treatment, and the lags of the errors. There are two consequences of this pattern. First, partial approaches that only allow for local spillovers, such as including the treatment status of nearby spatial units (e.g., as in Bo 2020), may not reveal the full pattern of spillovers. Second, the regression coefficients by themselves do not tell the full story and the matrices of marginal effects based on Eq. (3) need to be taken into consideration.

The results of the marginal effects calculations are reported in the “average impacts” rows of Tables 3 and 4, using the decomposition due to LeSage and Pace (2009). These results indicate that the positive direct relationship between a county being upgraded and economic activity is amplified by positive indirect effects. For example, from column (1) of Table 3 it is apparent that the upgrading is associated with GDP being 13% higher, comprised of seven percent as the direct effect and six percent as the indirect effect (thus, the spillovers and feedback effects provide a component that is almost as large as the direct effect).

When luminosity is used as the proxy for local economic activity, as a sensitivity analysis in case of mistrust in China’s GDP figures, upgrading is associated with 18% higher activity (10 percent direct and eight percent indirect). One caveat to this result is that DMSP data on night-time lights are subject to blurring, and so tend to overstate similarity for nearby areas (Zhang et al 2023).Footnote 11 To provide some evidence on this issue, Fig. 4 has time-series of Moran’s I statistics from all n = 2342 county-level spatial units each year. The I statistic is a popular measure of spatial autocorrelation, ranging from −1 to +1 (higher values show greater similarity). The Moran’s I for the GDP of the third-level spatial units is around 0.15 but it is far higher, at around 0.4, for DMSP night-time lights.Footnote 12 The higher I statistic most likely is an overstatement; more spatially precise luminosity data that are free of blurring issues, from NASA’s Black Marble series, have I statistics about halfway between values for DMSP and for GDP. Hence, there is likely to be an exaggerated similarity of nearby areas with the DMSP data, which could contribute to the appearance of larger spillovers. Thus, we consider the results in Table 4 to be an upper bound for the contribution of indirect spillover effects.
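For readers unfamiliar with the statistic, a minimal sketch of the standard Moran's I formula, \(I = (n/S_{0})\,(z'Wz)/(z'z)\) with \(z\) the deviations from the mean and \(S_{0}\) the sum of all weights, is given below. The function name, the six-unit ring map and the two toy patterns are hypothetical illustrations and are unrelated to the series plotted in Fig. 4.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x and spatial weights matrix W."""
    z = np.asarray(x, dtype=float) - np.mean(x)  # deviations from the mean
    s0 = W.sum()                                 # sum of all weights
    return (len(z) / s0) * (z @ W @ z) / (z @ z)

# Hypothetical ring of six units with row-normalized weights
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

clustered = [1, 1, 1, -1, -1, -1]    # similar values sit next to each other
alternating = [1, -1, 1, -1, 1, -1]  # every neighbour has the opposite value

i_clustered = morans_i(clustered, W)      # positive spatial autocorrelation
i_alternating = morans_i(alternating, W)  # negative spatial autocorrelation
```

The clustered pattern gives a positive statistic and the alternating pattern a negative one, matching the interpretation that higher values show greater similarity between neighbours.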

Fig. 4 Spatial autocorrelation in GDP and night-time lights data

Comparing the results in column (1) with those in column (5) of Tables 3 and 4 provides some insight into possible distortions in the literature on effects of county upgrading in China. If spatial lags are not included in the models (a restriction that is inconsistent with the data, according to the nesting tests), GDP appears to be just under six percent higher in the upgraded counties, post-upgrading (based on column (5) of Table 3). Yet the SARAR model results, which are the ones supported by the data, show total upgrading effects of 13% higher GDP. In other words, the standard model that a priori rules out spillovers only shows about two-fifths of the effect. If we instead use luminosity as the economic activity indicator, the same pattern is visible, with the aspatial model in column (5) of Table 4 giving an effect that is only about two-thirds of what the data-consistent model in column (1) shows. In other words, it is possible that prior studies have understated the economic effects of converting counties to cities because they have not allowed for positive spillovers.

4.1 Heterogeneity analysis

There is huge variation across space in the density of economic activity in China and so some geographic heterogeneity in the strength of the spillovers is likely. We explore this in Fig. 5, which presents results for the Eastern region of China versus the rest. The usual disaggregation divides China into Eastern, Middle, and Western regions as the top-level geographic breakdown but these regions are quite unequal in economic size. The Eastern region produces 56% of GDP, is home to 46% of residents (as of the 2020 census) but has only 17% of the land area. While the direct effects on GDP of converting counties to cities are of similar magnitude in the East and the rest (at just under eight percent higher GDP), the indirect spillover effects are twice as large in the East as they are in the rest of China (ten percent versus five percent). The results in Fig. 5 are consistent with what Tang (2021) finds, of stronger total effects of upgrading in the East where the initial agglomeration forces are stronger.

Fig. 5 Spillovers appear to especially matter in China’s Eastern regions

There are at least two implications of stronger indirect spillovers in the high-density Eastern region. In terms of measurement, China lacks a Metropolitan Statistical Area (MSA) designation for large population nuclei plus adjacent communities that are highly integrated with them, as typically seen in commute-to-work patterns (Forstall and Chan 2015). Elsewhere, such as in the United States, the MSA concept helps to study the spatial patterns of economic development without limits imposed by data based on administrative boundaries. The links between nearby areas in Eastern China seen in the strong spillovers suggest an MSA approach could be useful, even if other parts of China do not require this statistical innovation (because of the weaker spillovers). Relatedly, in terms of policy, China in the last decade saw dispersed urbanization, as smaller cities experienced faster growth in resident population (comprising both de jure holders of local hukou and de facto residents with hukou from elsewhere) than big cities, which are especially concentrated in the Eastern region. This was a sharp change from the prior period (between the 2000 and 2010 censuses), when big and small cities both had urban population growth rates of above 8% per annum. A dispersed urbanization may potentially forgo some agglomeration gains that could produce up to 20% higher per capita GDP under concentrated patterns of urban growth (MGI, 2009). The stronger spillovers in the Eastern region favour a concentrated form of urbanization there, if the gains from agglomeration are to be exploited.

5 Discussion and conclusions

In this paper we have used econometric models that explicitly allow the pattern of spatial spillovers to be revealed through the estimated coefficients on the various spatial lags. In contrast, prior analyses of the regional development intervention we study—administrative conversion of some of China’s counties into cities—use estimation frameworks where the observations are treated as independent of their neighbours and so do not allow the pattern of spillovers to be freely revealed. We find that all three types of spatial lags—of the outcomes, the treatment, and the errors—are statistically relevant, with restrictions to arbitrarily set any of these lags to zero rejected by the data. Consequently, positive direct relationships between a county being upgraded and post-upgrading local economic growth are amplified by positive indirect relationships. These positive indirect relationships operate both locally and globally. If these spillovers are ignored, the upgrading effects seem only two-fifths as large, according to the economic activity indicator that we consider more reliable, which is the reported GDP of China’s third level sub-national units (counties, county-level cities, and districts).
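To make the distinction between local and global spillovers concrete, the following sketch computes average direct, indirect, and total effects for a model with spatial lags of the outcome and of the treatment, using the standard partial-derivative decomposition for spatial models, in which the effects matrix is (I − ρW)⁻¹(βI + θW). The coefficient values (`rho`, `beta`, `theta`), the six-unit ring of neighbours, and all function names are hypothetical and are not our estimates. The spatial lag of the errors does not enter these partial derivatives and so is omitted.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spatial_multiplier(W, rho, terms=200):
    """Approximate (I - rho*W)^(-1) by the series I + rho*W + rho^2*W^2 + ...

    Each extra power of W reaches one step further out (neighbours of
    neighbours, and so on), which is what makes these spillovers global.
    The series converges when W is row-normalised and |rho| < 1.
    """
    n = len(W)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]
    for _ in range(terms):
        power = [[rho * x for x in row] for row in matmul(power, W)]
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


def effects(W, rho, beta, theta):
    """Average direct, indirect and total effects of a unit change in treatment."""
    n = len(W)
    # beta is the own effect; theta*W is the local spillover from treated neighbours.
    local = [[beta * (i == j) + theta * W[i][j] for j in range(n)] for i in range(n)]
    S = matmul(spatial_multiplier(W, rho), local)
    direct = sum(S[i][i] for i in range(n)) / n
    total = sum(sum(row) for row in S) / n
    return direct, total - direct, total


# Hypothetical setup: six units on a ring, each with two equally weighted neighbours.
n = 6
W = [[0.5 if (i - j) % n in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]

direct, indirect, total = effects(W, rho=0.3, beta=0.08, theta=0.02)
print(f"direct={direct:.4f} indirect={indirect:.4f} total={total:.4f}")
```

With a row-normalised weights matrix the average total effect equals (β + θ)/(1 − ρ), so setting ρ = θ = 0, as an aspatial model implicitly does, leaves only β and removes the indirect component entirely.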

Our objective in reporting these results is to inform the growing literature that sets out to evaluate impacts of China’s county upgrading process. While there has been some limited discussion of spatial spillovers in this literature, the empirical approaches that have been used are informal, such as dropping observations within various distance bands of treated units in case the observations were affected by spillovers. One study did include the upgrade status of adjacent spatial units (Bo 2020) but that approach implicitly restricts spillovers to operate locally rather than globally. Instead, the focus of much of the existing literature has been to provide reassurance to readers that trends in economic activity for counties whose status had not changed were a reasonable counterfactual for what might have happened in the upgraded counties if they had not been upgraded. In some ways our approach is the reverse; we use a modelling framework that lets the data speak freely about the nature of the spatial spillovers, even if the patterns revealed may be considered as just conditional correlations. We have no reason to believe that the spillover patterns would be different if a quasi-causal framework had been used that involved testing for parallel trends, trimming on propensity scores and so on. In other words, we expect that the spillover patterns when the data are allowed to speak freely should also show up in quasi-causal analyses as long as those analyses do not a priori limit the nature of the spillovers that can be revealed.

There are several ways to extend the current analysis. A more nuanced way to allow for agglomeration forces that generate spatial spillovers might use asymmetric weights so that a more populous urban area has a greater effect on an adjacent county than the reverse. In the Section II example of Ledu District, in Haidong prefectural-level city of Qinghai, it is likely no coincidence that the next county upgraded was Ping’an, to the west, rather than the Minhe Hui and Tu Autonomous County to the east. The provincial capital of Xining, with a population of two million, is just 40 km to the west; in the other direction the next big city—Lanzhou in Gansu province—is almost 200 km away. So, a gravitational pull for urban development in Haidong is probably more towards the west, where the nearest large market is, and a weights matrix that allows for asymmetries would be one way to recognise such patterns. Presumably, such a weights matrix could be combined with a difference-in-differences setup to extend the estimator proposed by Delgado and Florax (2015) that allows for spatial interactions.
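One way such asymmetric weights might be built is sketched below: each neighbour’s weight is scaled by that neighbour’s population, and the matrix is then divided by its largest row sum so that relative magnitudes are preserved. This is only an illustrative construction under our own assumptions; the three units, their populations, and the function name `asymmetric_weights` are hypothetical and do not correspond to the weights matrix used in our estimates.

```python
def asymmetric_weights(adjacency, population):
    """Population-scaled weights: W[i][j] is the influence of unit j on unit i.

    Each link is weighted by the population of the *source* unit j, so a
    populous neighbour exerts more pull than a small one. Dividing by the
    largest row sum (rather than normalising each row separately) keeps
    the asymmetry between W[i][j] and W[j][i] intact.
    """
    n = len(adjacency)
    raw = [[adjacency[i][j] * population[j] for j in range(n)] for i in range(n)]
    scale = max(sum(row) for row in raw)
    return [[x / scale for x in row] for row in raw]


# Hypothetical layout: a large city to the west, a county in the middle,
# and another county to the east; only adjacent units are linked.
names = ["large_city", "middle_county", "outer_county"]
population = [2_000_000, 300_000, 400_000]  # illustrative figures only
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

W = asymmetric_weights(adjacency, population)
for name, row in zip(names, W):
    print(name, [round(x, 3) for x in row])
```

In this toy layout the weight of the large city on the middle county exceeds both the reverse weight and the weight of the outer county on the middle county, which is the kind of one-sided gravitational pull described above.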

Another extension would consider treatment intensity. In our results, all converted counties get the same value of the treatment indicator (= 1). A more flexible approach would allow for a sequence of treatments, such as from county to county-level city and then to district. It remains to be seen whether these extensions would alter our core finding of the need to allow for spillovers when evaluating China’s regional development interventions. To the extent that the finding of positive spillovers persists, it suggests that upgrading has successfully created agglomeration effects, especially in the Eastern region. In contrast, some of the early evaluations, such as Fan et al (2012), reached more pessimistic conclusions about upgrading and so there may be grounds to revisit these earlier findings.

Moving beyond China, our study also has implications for the international literature. First, we provide another example to add to the emerging evidence from Europe and North America that spillovers should be considered in studies that evaluate regional development interventions. If these spillovers are a priori ruled out by using empirical approaches that rely on stable unit treatment value assumptions, the results may provide a misleading guide to the actual impact of the intervention. Second, our results raise a methodological question for developers of new spatial estimation approaches to consider. There are at least two sets of modelling decisions faced in applied research studying regional development interventions: the approach to causal effects estimation (including just estimating conditional correlations as we do), and the framework for allowing any spatial spillovers to reveal themselves. Some recent causal effects studies allowing spatial spillovers start with particular spatial models, such as a spatial Durbin model or a spatial lag model, and generalize the DID model with the particular structure of spatial lags embedded in their choice of spatial model. Instead, here we started with a more general spatial model and tested to see if nested models, like the spatial Durbin model, were data-acceptable; restrictions to nest the special case models were rejected in all cases. So the question arises as to which decision to focus on first: the causal effects estimation framework or the spillover effects framework. A related question is whether the two decisions interact, as may occur if first settling upon a particular spatial model, so as to implement a particular causal effects approach, obscures some spatial effects that could have been revealed if a different causal effects framework had been used.