Impacts of Urban Rail Transit on City Growth: Evidence from China

This research examines the effects of urban rail transit (URT) on city growth, measured by increases in population, gross domestic product (GDP) and the employment rate. Forty cities in China that had URT systems by the end of 2019 are taken as the study sample. Data on URT extent, population, GDP, employment rate and five types of control variables (individual, people's living, economic, science and education, and infrastructure) are collected, and their suitability for panel analysis is verified. Panel data models are applied to analyze the effect of URT on city growth, and the robustness of the estimates is assessed. The study further analyzes heterogeneity in the effects of URT across cities with different levels of economic development. The estimates indicate that the opening and expansion of URT have a positive effect on city population. URT also promotes the development of the urban economy and increases employment opportunities. Nevertheless, because of population migration, URT has little effect on the employment rate. In addition, the positive effect of URT on urban growth is most pronounced in cities with relatively high levels of economic development.


Introduction
Urban rail transit (URT), a vital part of urban public transportation, is generally regarded as a good with positive externalities. It has become an important tool for promoting city growth [1,2] and makes travel within the city more convenient [3], which increases urban economic activity and thereby promotes the overall development of the city.
However, not everyone agrees that introducing URT into a city is a rational choice. URT is a durable good that requires massive investment [4]. A survey of 12 Chinese cities by the Urban Rail Transit Association found that the ratio of revenue to expenditure for URT lines was only about 54%, which means that URT lines are not profitable even once they are in operation. The costs of URT are so high that URT projects require substantial subsidies to maintain normal operations [5]. Opponents of URT argue that the costs of introducing it far exceed the value it brings. To justify these subsidies, proponents frequently argue that URT can promote urban growth [6]. To determine whether URT projects are worth the investment, scholars and policymakers worldwide have widely studied the spillover effects of URT on city growth [7,8,10].
Saxe et al. [11] applied ordinary least squares (OLS) to identify the spillover effect of the Sheppard Subway Line on net greenhouse gas emissions within the city. However, OLS multiple regression does not account for heteroscedasticity and autocorrelation in the random disturbance terms, which makes the estimates inefficient and the conventional standard errors unreliable. Borck [12] used a quantitative equilibrium model and found that a large investment in public transit could decrease pollution by 0.017. In recent years, however, difference-in-differences (DID) and spatial econometric models have been the most common approaches to investigating the spillover effects of URT on air quality [13,14]. The operation of URT can improve a city's air quality, but railway construction has a large negative impact on the environment [15]. Li et al. [4] demonstrated that the reduction in air pollutants resulting from the expansion of URT could generate economic benefits by improving citizens' health. Xiao et al. [16] applied a spatial DID method, which adds DID terms to spatial econometric models, and pointed out that URT affects different air pollutants in different ways.
Compared with time series or cross-sectional data, panel data can capture not only the impacts of URT within one city but also the differences among those impacts across cities, which substantially mitigates the problem of endogeneity. Gibbons and Machin [17] found a positive relationship between URT and property prices by analyzing panel data for London from 1997 to 2001. Similarly, Arum and Fukuda [18] applied a panel data model with individual effects and concluded that public transit played an important part in determining residential land values. The hedonic pricing model, commonly estimated on panel data in real estate studies, has gradually become the main tool for analyzing changes in real estate prices [19]. Following the introduction of URT, the real estate industry and city amenities have developed vigorously [20][21][22][23]. Moreover, Trojanek and Gluszak [13] examined 34,274 housing price observations from 2008 to 2015 in Warsaw and found that URT could affect property prices even before it was put into operation.
To date, studies of the impacts of URT on population, GDP and employment rate as measures of city growth remain relatively rare, although some results have been reported [24][25][26]. Panel data models and DID models are widely used in these studies. URT can affect population in diverse ways [27]. Beyazit [28] analyzed changes in population and employment before and after the Istanbul Metro was put into operation and reported increases in both. However, these increases may have been caused by factors other than URT. To establish a causal relationship between URT and city growth, Calvo et al. [29] used OLS and found a positive relationship between URT and population. Building on this work, Gonzalez-Navarro and Turner [6] estimated an updated panel data model that addressed confounding dynamics and omitted variables and, after examining 632 of the largest cities in the world, reported that URT had economically insignificant effects on population. This conclusion seems to contradict previous studies; however, when the analysis is extended to the whole world, country-specific characteristics may be averaged out, so the results need not apply to every country. Few studies have concentrated on the impact of URT on population in developing countries, which is a main focus of this study. Moreover, URT has a positive effect on the economic development of a city [30] by affecting GDP and employment [31]. Zhang [5] applied a DID model and showed a positive effect of URT on GDP growth, an effect that was much larger in megacities with populations of more than 6.15 million. It also makes sense to study the impact of URT on GDP per capita, which reflects changes in citizens' living standards.
At present, most studies of the externalities of URT are empirical: although positive spillover effects of URT are observed, the mechanism driving these changes has not yet been identified. Furthermore, when applying panel data models, studies typically treat city effects as fixed effects without testing whether this is appropriate, and they ignore time effects. This research therefore uses panel data models that consider both city and year effects to investigate the causal relationship between URT and city growth.
The rest of this paper is organized as follows. Section 2 describes the data and presents a preliminary analysis. Section 3 presents the methods for data verification and the panel data models. Section 4 analyzes the suitability of the data and the regression results, verifies the robustness of the results, and discusses heterogeneity. Section 5 provides the conclusions and related policy recommendations.

Data Description
Investigating the spillover effect of URT within a single country can effectively reduce problems with endogenous variables [32]. Since 2011, the State Council, the Ministry of Industry and Information Technology and other departments have successively issued policies to support and standardize the development of URT in mainland China. The past decade has witnessed rapid development of URT: operating length roughly quadrupled, from 1,708.4 km in 2010 to 6,902.5 km in 2019, and growth is expected to continue at a high pace over the next few years. Mainland China therefore provides a good sample for studying the externalities of URT. The 40 cities that had URT in operation by the end of 2019 were selected for study, and Figure 1 shows their distribution. The studied cities are relatively concentrated in southeastern China, reflecting the country's unbalanced and inadequate regional development. Heterogeneous effects are therefore analyzed later in this study.
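As a quick arithmetic check on the growth figures quoted above, the following sketch computes the overall growth factor and the implied compound annual growth rate. It uses only the two national totals stated in the text.

```python
# National URT operating length in mainland China, as quoted in the text (km).
km_2010 = 1708.4
km_2019 = 6902.5
years = 2019 - 2010  # nine annual growth steps

growth_ratio = km_2019 / km_2010
# Compound annual growth rate over the period.
cagr = growth_ratio ** (1 / years) - 1

print(f"growth ratio: {growth_ratio:.2f}x")  # → growth ratio: 4.04x
print(f"CAGR: {cagr:.1%}")                   # → CAGR: 16.8%
```

A factor of about 4.04 corresponds to sustained growth of roughly 17% per year over 2010 to 2019.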
City growth, the explained variable in this study, is measured by three city variables, population, GDP and employment rate, as shown in Table 1. Compared with resident population, registered population reflects population change more accurately. Furthermore, real GDP measures the level of the urban economy more accurately than nominal GDP. Consequently, this study uses registered population, real GDP and the urban employment rate as the population, GDP and employment rate variables, respectively. These data derive mainly from the National Bureau of Statistics website.
As the main explanatory variable, URT extent is measured both by a dummy variable and by operating kilometers, that is, the length of URT lines in operation in the city, as shown in Table 1. These data were collected manually for 2010 to 2019, and each observation is a city-year [6]. URT data derive mainly from the China Urban Rail Transit Annual Report.
Other factors that might affect city growth are included as control variables to address omitted variables, which helps mitigate endogeneity. These control variables are described in Table 2. Because some data could not be obtained, missing values are imputed with the expectation-maximization algorithm. High collinearity among the explanatory variables of a regression model inflates the variance of the parameter estimates and makes them imprecise. Figure 2 shows the Pearson correlation coefficients among the study variables: the darker a square, the stronger the positive linear correlation between the two variables, and strongly correlated pairs would make the estimates imprecise. According to …
Figure 3 describes the quantitative relationship between population and URT. Figure 3a presents the cross-sectional relationship in 2019; the abscissa is the population of each studied city, and the ordinate is the corresponding URT operating kilometers. The slope of the fitted curve between population and URT is positive and indicates that cities …
Figure 4 describes the relationship between URT and GDP. The cross-sectional relationship is shown in Fig. 4a, and the time series relationship for Beijing is presented in Fig. 4b; both show a positive relationship between URT and GDP. Similarly, Fig. 5a and b show the cross-sectional and time series relationships between URT and the employment rate, respectively. In this study, the employment rate is calculated as the ratio of employees to the sum of employees and registered unemployed persons. Because the number of registered unemployed is generally smaller than the actual number of unemployed, the calculated employment rate may be higher than the actual rate.
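The employment-rate definition above, and why it tends to overstate the true rate, can be made concrete with a small sketch. The city and all counts below are hypothetical and serve only to illustrate the direction of the bias.

```python
def employment_rate(employees: float, unemployed: float) -> float:
    """Employment rate as defined in the text: employees / (employees + unemployed)."""
    return employees / (employees + unemployed)

# Hypothetical city (illustrative numbers, not from the study's data).
employees = 950_000
registered_unemployed = 30_000  # only those who register as unemployed
actual_unemployed = 50_000      # assumed true total, larger than the registered count

reported = employment_rate(employees, registered_unemployed)
true_rate = employment_rate(employees, actual_unemployed)
print(f"reported: {reported:.2%}, with all unemployed counted: {true_rate:.2%}")
```

Because registered unemployment undercounts actual unemployment, the reported rate in this example exceeds the rate computed with all unemployed counted.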
Nevertheless, the employment rate calculated in this study is still informative. There appears to be no direct connection between URT and the employment rate.
China has several urban agglomerations, including the Beijing-Tianjin-Hebei, Yangtze River Delta and Pearl River Delta urban agglomerations, which could induce spatial autocorrelation among city variables. That is, the GDP, population and employment rate of cities within an urban agglomeration may interact spatially, and regression results would be biased if the city variables in the panel data were strongly spatially dependent.
As shown in Eqs. (1) and (2), Moran's I and Geary's C are commonly used to measure the degree of spatial autocorrelation of a variable. Moran's I generally lies between -1 and 1; a value greater than 0 indicates positive spatial correlation, and the larger its absolute value, the stronger the spatial correlation. Geary's C generally lies between 0 and 2; a value less than 1 indicates positive spatial correlation. Geary's C and Moran's I therefore move in opposite directions.
Moreover, nonstationary time series can lead to spurious regression or spurious correlation. Accordingly, unit-root tests are carried out to increase the credibility of the results. The study data cover 40 cities over 10 years and thus form a short panel. Given these characteristics, the Levin-Lin-Chu (LLC) and Im-Pesaran-Shin (IPS) unit-root tests are applied.
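Returning to the spatial statistics, the sketch below implements the standard global forms of Moran's I and Geary's C for a spatial weight matrix W, which Eqs. (1) and (2) are presumed to follow. The toy weight matrix (four cities on a line with binary contiguity) is an assumption for illustration; the study's own weighting scheme is not stated here. For panel data, the statistics would be computed year by year.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    s0 = w.sum()  # sum of all spatial weights
    return (x.size / s0) * (z @ w @ z) / (z @ z)

def gearys_c(x, w):
    """Global Geary's C: ((n - 1) / (2 S0)) * sum_ij w_ij (x_i - x_j)^2 / sum_i z_i^2."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    s0 = w.sum()
    sq_diff = (x[:, None] - x[None, :]) ** 2  # pairwise squared differences
    return ((x.size - 1) / (2 * s0)) * (w * sq_diff).sum() / (z @ z)

# Toy example: four cities on a line, binary contiguity weights (neighbours = 1).
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
smooth = [1, 2, 3, 4]        # similar values next to each other
alternating = [1, 4, 1, 4]   # dissimilar neighbours

print(morans_i(smooth, w), gearys_c(smooth, w))            # I > 0, C < 1
print(morans_i(alternating, w), gearys_c(alternating, w))  # I < 0, C > 1
```

The two toy series show the opposite movement of the statistics: positive autocorrelation pushes I above 0 and C below 1, and negative autocorrelation does the reverse.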

Model Establishment
A pooled panel data model, which is shown in Eq. (3), is established to investigate the causal relationship between URT and city growth.
In Eq. (3), $Y_{it}$ denotes the city variable of city $i$ in year $t$, $U_{it}$ represents the extent of URT, $X_{it}$ denotes the vector of control variables, and $\varepsilon_{it}$ represents the stochastic disturbance term. For each city $i$, $\varepsilon_{it}$ may be autocorrelated across years, and it may have different variances for different values of $X_{it}$. Standard errors are therefore clustered by city to address autocorrelation and heteroscedasticity. However, Eq. (3) treats the panel data as cross-sectional data, ignoring heterogeneity among cities or years that may be correlated with the independent variables and would lead to estimation bias. To address this deficiency, two sources of unobserved heterogeneity, city-specific effects and time effects, must be handled. The solution is to add city effects and year effects to the model and then test whether each is necessary. City effects are specific to each city in the sample and do not vary over time, whereas year effects are specific to each year and are common to all cities.
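The pooled specification with city-clustered standard errors can be sketched in NumPy as below. This is an illustration on simulated data rather than the study's estimation code. The CR1 small-sample adjustment is a common convention that the text does not specify, and all data-generating values are assumptions.

```python
import numpy as np

def pooled_ols_cluster(y, X, clusters):
    """Pooled OLS with cluster-robust (CR1) standard errors.

    y: (N,) outcome; X: (N, K) regressors including a constant column;
    clusters: (N,) cluster labels, here city identifiers.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # "Meat" of the sandwich: sum over clusters of (X_g' e_g)(X_g' e_g)'.
    # This allows arbitrary correlation of residuals within a city
    # (autocorrelation across years) and heteroscedasticity across cities.
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        mask = clusters == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)

    # Small-sample adjustment commonly labelled CR1.
    n_groups = groups.size
    adj = (n_groups / (n_groups - 1)) * ((n - 1) / (n - k))
    vcov = adj * (xtx_inv @ meat @ xtx_inv)
    return beta, np.sqrt(np.diag(vcov))

# Simulated panel: 40 cities x 10 years (illustrative values only).
rng = np.random.default_rng(0)
n_cities, n_years = 40, 10
city = np.repeat(np.arange(n_cities), n_years)
u = rng.uniform(0, 100, city.size)             # URT extent, km
city_shock = rng.normal(0, 5, n_cities)[city]  # shared within a city
y = 10 + 0.5 * u + city_shock + rng.normal(0, 1, city.size)

X = np.column_stack([np.ones_like(u), u])
beta, se = pooled_ols_cluster(y, X, city)
print("coefficients:", beta.round(3), "clustered SE:", se.round(3))
```

Clustering by city leaves the coefficients unchanged; it only changes the standard errors, which become robust to within-city correlation of the disturbances.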
Adding city effects $\mu_i$ to Eq. (3) yields the city effects model shown in Eq. (4). Here $\mu_i$ can be treated either as a constant or as a random disturbance term, a choice rarely discussed in previous studies; this study provides a systematic way to make this choice for such panel data models.
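In the notation of this section, Eqs. (3) and (4) can be written in the standard linear form below. The intercept $\alpha$ and the coefficient symbols $\beta$ (on URT extent $U_{it}$) and $\delta$ (on the control vector $X_{it}$) are labels introduced here for illustration; $\mu_i$ is the city effect and $\varepsilon_{it}$ the disturbance term.

```latex
\begin{align}
Y_{it} &= \alpha + \beta U_{it} + X_{it}'\delta + \varepsilon_{it} \tag{3}\\
Y_{it} &= \alpha + \beta U_{it} + X_{it}'\delta + \mu_i + \varepsilon_{it} \tag{4}
\end{align}
```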
If $\mathrm{cov}(\mu_i, X_{it}) \neq 0$, the endogeneity of $\mu_i$ cannot be ignored, and $\mu_i$ should be treated as fixed effects. For OLS to be applicable, $\mu_i$ must then be eliminated. The first approach subtracts the time average for each city $i$ from both sides of Eq. (4), as shown in Eq. (5), so that $\mu_i$ cancels out. This estimator uses only within-city deviations, so the estimates remain unbiased even though $\mathrm{cov}(\mu_i, X_{it}) \neq 0$. However, the subtraction also eliminates other observable time-invariant variables, such as city individual variables. The second approach represents the city effects $\mu_i$ with dummy variables $D_i$, as described in Eq. (6): a dummy variable is added for each studied city to capture that city's characteristics, and the heterogeneity of each city is estimated through the coefficients $\gamma_i$. The more significant the coefficients on the city dummies, the stronger the case for including city effects in the model. However, this approach requires a large number of dummy variables when the sample contains many cities.
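The two fixed-effects approaches give numerically identical slope estimates (a consequence of the Frisch-Waugh-Lovell theorem), and both differ from pooled OLS when the city effects $\mu_i$ are correlated with the regressors. The sketch below checks this on simulated data with a single regressor; the data-generating values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cities, n_years = 40, 10
city = np.repeat(np.arange(n_cities), n_years)

# City effects mu_i, deliberately correlated with URT extent U_it,
# so that cov(mu_i, U_it) != 0 (the fixed-effects case).
mu = rng.normal(0, 5, n_cities)
u = 20 + 3 * mu[city] + rng.normal(0, 5, city.size)
true_beta = 2.0
y = true_beta * u + 10 * mu[city] + rng.normal(0, 1, city.size)

def demean_by(v, groups):
    """Subtract each group's mean (the within transformation)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

# Approach 1: within transformation, then OLS on the demeaned data.
y_w, u_w = demean_by(y, city), demean_by(u, city)
beta_within = (u_w @ y_w) / (u_w @ u_w)

# Approach 2: least-squares dummy variables (LSDV), one dummy per city.
# Using all N dummies and no common intercept is equivalent to an
# intercept plus N - 1 dummies with a baseline city.
D = (city[:, None] == np.arange(n_cities)).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([u, D]), y, rcond=None)
beta_lsdv = coef[0]

# Pooled OLS ignoring mu_i, for comparison: biased here because cov(mu_i, U_it) != 0.
X_pooled = np.column_stack([np.ones_like(u), u])
beta_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

print(f"within: {beta_within:.4f}  LSDV: {beta_lsdv:.4f}  pooled: {beta_pooled:.4f}")
```

The within and LSDV slopes agree to numerical precision and lie close to the true value of 2.0, while pooled OLS absorbs the correlated city effects and overstates the slope.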
If $\mathrm{cov}(\mu_i, X_{it}) = 0$, $\mu_i$ should be treated as random effects, and the city random effects model is estimated by generalized least squares. The Hausman test can be used to test for correlation between $\mu_i$ and $X_{it}$ by checking whether the difference between the coefficients estimated by the city fixed effects and city random effects models converges to zero, which identifies the more appropriate model.
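The Hausman comparison can be sketched as a small function. The standard statistic is $H = (b_{FE} - b_{RE})'[V_{FE} - V_{RE}]^{-1}(b_{FE} - b_{RE})$, compared with a chi-square distribution whose degrees of freedom equal the number of compared coefficients. The coefficient and variance values in the example are hypothetical, and the pseudo-inverse is a design choice for cases where $V_{FE} - V_{RE}$ is not positive definite in finite samples.

```python
import numpy as np
from scipy import stats

def hausman(b_fe, b_re, v_fe, v_re):
    """Hausman statistic comparing fixed- and random-effects estimates.

    Under the null that the city effects are uncorrelated with the
    regressors, H is asymptotically chi-square with degrees of freedom
    equal to the number of compared coefficients.
    """
    diff = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    v_diff = np.atleast_2d(np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float))
    # Pseudo-inverse guards against V_FE - V_RE being singular.
    h = float(diff @ np.linalg.pinv(v_diff) @ diff)
    p_value = float(stats.chi2.sf(h, diff.size))
    return h, p_value

# Hypothetical estimates for a single URT coefficient (not the study's values).
h, p = hausman(b_fe=[2.0], b_re=[2.6], v_fe=[[0.010]], v_re=[[0.006]])
print(f"H = {h:.1f}, p = {p:.2e}")
```

A small p value rejects the random effects specification, which is how a Hausman p value of 0.000 is read in favor of fixed effects.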
Building on Eq. (4), year effects $\lambda_t$ are added to test whether unobservable effects that vary across years remain, yielding the double effects model shown in Eq. (7). This model is estimated by converting the year effects $\lambda_t$ into dummy variables $D_t$, as shown in Eq. (8). The more significant the coefficients on the year dummies, the stronger the case for including year effects.
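In standard notation, the double effects model and its year-dummy version can be sketched as below. Here $\alpha$, $\beta$ and $\delta$ are labels introduced for the intercept, the URT coefficient and the control coefficients; $\mu_i$ are the city effects, $\lambda_t$ the year effects, and $\eta_s$ (also a label introduced here) the coefficient on the dummy for year $s$. With 2010 as the baseline year, its dummy is omitted.

```latex
\begin{align}
Y_{it} &= \alpha + \beta U_{it} + X_{it}'\delta + \mu_i + \lambda_t + \varepsilon_{it} \tag{7}\\
Y_{it} &= \alpha + \beta U_{it} + X_{it}'\delta + \mu_i
          + \sum_{s=2011}^{2019} \eta_s D_{s,t} + \varepsilon_{it},
\qquad D_{s,t} = \begin{cases} 1 & t = s \\ 0 & t \neq s \end{cases} \tag{8}
\end{align}
```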

Empirical Results
Table 3 lists the results of the spatial correlation tests for the variables of interest. The results indicate that the degree of spatial correlation of each variable is close to zero, which means that there is almost no spatial correlation in the panel data. However, some variables, such as U and pop, are spatially correlated at the 10% significance level (i.e., p < 0.10). It is therefore important to account for spatial correlation when assessing the robustness of the panel data estimates, to address possible endogeneity in the variables of interest. Table 4 shows the results of the unit-root tests for the variables of interest. Almost all p values of the test statistics are close to zero, rejecting the null hypothesis of a unit root; the study data are thus stationary, and the panel data model is suitable for this study.
Table 5 provides the primary estimates of the quantitative relationship between URT and population. The main explanatory variable is URT operating kilometers, and the explained variable is population. In Table 5, the numbers in parentheses are the p values for the corresponding coefficients, and the asterisks indicate that the corresponding data are unavailable; parentheses and asterisks have the same meaning in all subsequent tables. Column 1 presents the main results for the basic panel data model using pooled regression, with standard errors clustered at the city level to address autocorrelation and heteroscedasticity. Columns 2 and 3 report the results of the two estimation approaches for the city fixed effects model, and column 4 presents the results of the city random effects model. The p value of the Hausman test is 0.000, which rejects the random effects specification in favor of the city fixed effects model.
Column 5 reports the results of the double effects model. In column 6, the variable of interest is replaced by a dummy variable equal to 1 if the city has opened URT and 0 otherwise. Corresponding to column 3 of Table 5, Table 6 reports the city fixed effects and their p values, calculated using the second (dummy variable) approach. Shanghai is the baseline group, so its fixed effect is omitted. The coefficients on the city dummy variables are significant at the 1% level for most cities, so including city effects in the model is indispensable. Corresponding to column 5 of Table 5, Table 7 reports the year effects estimated by the double effects model and their p values, with 2010 as the baseline year. None of the year effect coefficients is significant at the 1% level, indicating no significant year-specific shifts relative to 2010. Year effects are therefore not included. In summary, the city fixed effects model fits the data best.
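The year effects above are judged coefficient by coefficient. A common complementary check, sketched here as an assumption rather than as the study's procedure, is a joint F-test of all year dummies that compares residual sums of squares with and without them. The inputs are hypothetical, and the classical F form assumes homoscedastic, independent errors; with city-clustered standard errors, a cluster-robust Wald test would be the analogue.

```python
from scipy import stats

def joint_f_test(rss_restricted, rss_unrestricted, q, n_obs, k_unrestricted):
    """F-test that q coefficients (e.g. the year dummies) are jointly zero.

    rss_restricted: residual sum of squares without the year dummies;
    rss_unrestricted: with them; k_unrestricted: number of estimated
    parameters in the unrestricted model.
    """
    df_resid = n_obs - k_unrestricted
    f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)
    return f_stat, stats.f.sf(f_stat, q, df_resid)

# Hypothetical inputs: 400 city-years, 9 year dummies (2011-2019, baseline 2010),
# and an assumed 60 parameters in the unrestricted model.
f_stat, p_value = joint_f_test(105.0, 100.0, q=9, n_obs=400, k_unrestricted=60)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```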

General Characteristics
According to the city fixed effects model, columns 2 and 3 in Table 5 show that the opening and operation of URT significantly increase the urban population. A 10-km increase in URT operating length leads to a population increase of 30,000 people, significant at the 1% level. A likely reason is that low travel costs and efficient URT service allow citizens to travel more conveniently within the city [29], which in turn promotes many aspects of the city's development [5]. This development makes the city more attractive and draws more people to migrate there. Table 8 reports the impacts of URT on GDP and the employment rate estimated with the city fixed effects model. Columns 1, 2 and 3 present the effect of URT on GDP, while columns 4, 5 and 6 present its effect on the employment rate. Columns 1 and 4 use the first estimation approach of the city fixed effects model, columns 2 and 5 use the second approach, and columns 3 and 6 replace the variable of interest with the URT dummy variable.
The results show that URT has a positive effect on GDP: a 10-km increase in URT operating length leads to a 2.914-billion-yuan increase in GDP, significant at the 1% level. URT makes travel within the city more convenient and saves travel time, so citizens can devote more energy and time to creating value, thereby promoting the urban economy. Nevertheless, the coefficient of URT on the employment rate is very small and nonsignificant, indicating that URT has little or no effect on the employment rate. By analyzing the elasticities of city variables with respect to URT, this study identifies the underlying reasons why URT has no impact on the employment rate.

Robustness Tests
The robustness of the results in Sect. 4.2 is tested in several ways to verify the causal relationship between URT and city variables. The first method replaces the main explanatory variable with the logarithm of URT operating kilometers and the explained variables with the logarithms of population and GDP, so that the coefficients of ln U represent the elasticities of the city variables with respect to URT. City-years in which URT was not yet in operation are discarded, because the logarithm of zero is undefined. If the coefficient of ln U remains positive and significant, the positive effect of URT on city variables is confirmed. Table 9 lists the estimates after this change of variables. Columns 1 and 3 apply the first estimation approach of the city fixed effects model, while columns 2 and 4 apply the second. The elasticities of population and GDP with respect to URT are positive and significant at the 1% level: a 10% increase in URT operating kilometers leads to a 4.14% increase in population and a 2.71% increase in GDP. URT does not change the employment rate significantly. These elasticity results are consistent with the previous results.
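The log-log specification, and the loss of city-years with zero operating kilometers, can be illustrated with a within (city fixed effects) regression on simulated data. The true elasticity of 0.4, the opening years and all other data-generating values are assumptions chosen for the sketch. In a log-log model, the slope is read as the percentage change in the outcome for a 1% change in operating kilometers.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cities, n_years = 40, 10
city = np.repeat(np.arange(n_cities), n_years)
year = np.tile(np.arange(n_years), n_cities)

# Hypothetical opening years (index 0-9); operating km is zero before opening.
opening = rng.integers(0, n_years, n_cities)
km = np.where(year >= opening[city], rng.uniform(20, 600, city.size), 0.0)

# Assumed data-generating process: ln(pop) = city effect + 0.4 * ln(km) + noise.
true_elasticity = 0.4
log_pop = rng.normal(6.0, 0.5, n_cities)[city] + rng.normal(0, 0.05, city.size)
in_operation = km > 0
log_pop[in_operation] += true_elasticity * np.log(km[in_operation])

# City-years without URT are dropped: ln(0) is undefined.
ln_u = np.log(km[in_operation])
ln_pop = log_pop[in_operation]
c = city[in_operation]

def demean_by(v, groups):
    """Within transformation; every city keeps at least one year in this sketch."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

x_w, y_w = demean_by(ln_u, c), demean_by(ln_pop, c)
elasticity = (x_w @ y_w) / (x_w @ x_w)
print(f"dropped city-years: {city.size - c.size}, estimated elasticity: {elasticity:.3f}")
```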
However, the effect of URT on GDP is weaker than its effect on population, which implies that URT has a negative effect on GDP per capita. Combining the impacts of URT on population and the employment rate, although URT does not increase the employment rate of the city, the number of employees increases because of population growth. URT can therefore promote the development of the urban economy and create employment opportunities. However, the population attracted by URT contributes less to GDP per person than the existing residents do. A plausible explanation lies in population flows: people migrating to cities with URT usually do not have strong consumption capacity. Moreover, the larger working-age population attracted by URT brings with it nonworking dependents such as older people and children, which lowers GDP per capita and the employment rate.
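The step from the two elasticities to a negative effect on GDP per capita follows from the identity $\ln(\mathrm{GDP}/\mathrm{pop}) = \ln \mathrm{GDP} - \ln \mathrm{pop}$. Plugging in the point estimates reported above for URT operating kilometers $U$, and ignoring their sampling uncertainty, gives:

```latex
\frac{\partial \ln(\mathrm{GDP}/\mathrm{pop})}{\partial \ln U}
  = \frac{\partial \ln \mathrm{GDP}}{\partial \ln U}
  - \frac{\partial \ln \mathrm{pop}}{\partial \ln U}
  \approx 0.271 - 0.414 = -0.143
```

To a first-order approximation, a 10% increase in operating kilometers therefore corresponds to a decline of roughly 1.4% in GDP per capita.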
Although the spatial tests indicate that the spatial relationships between cities are weak, some test results are still significant at the 10% level. Spatial econometric models (SEC) are commonly regarded as effective tools for identifying spatial relationships among variables [33]. This study therefore applies SEC as the second method to guard against interference from spatial correlation in the estimates. Four spatial econometric models are introduced: the spatial autoregressive model (SAR), spatial error model (SEM), spatial autocorrelation model (SAC) and spatial Durbin model (SDM). If the coefficient of U remains positive and significant, the causal relationships between URT and city variables are supported. The SEC results are shown in Table 10. According to the Akaike information criterion (AIC) and Bayesian information criterion (BIC), SDM fits the data best. Every coefficient of U in Table 10 is positive and significant at the 1% level, which means that the primary estimates are robust.
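For reference, the four spatial specifications named above are commonly written in the cross-sectional forms below. $W$ is a spatial weight matrix, $X$ stacks URT extent and the control variables, $\rho$ is the spatial-lag coefficient, $\phi$ the spatial-error coefficient and $\theta$ the coefficients on spatially lagged regressors. These symbols are labels chosen here, and panel versions add the city (and possibly year) effects.

```latex
\begin{align*}
\text{SAR:}\quad & Y = \rho W Y + X\beta + \varepsilon\\
\text{SEM:}\quad & Y = X\beta + u, \qquad u = \phi W u + \varepsilon\\
\text{SAC:}\quad & Y = \rho W Y + X\beta + u, \qquad u = \phi W u + \varepsilon\\
\text{SDM:}\quad & Y = \rho W Y + X\beta + W X \theta + \varepsilon
\end{align*}
```

SDM nests SAR (when $\theta = 0$), which is one reason information criteria are used to choose among these specifications.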
In addition, a placebo test is carried out as the third method to verify the robustness of the results from another perspective. Unobserved factors unrelated to URT might increase population or GDP at the same time as URT opens. In that case, the coefficient of U could be positive and significant even if URT had no substantial impact on the explained variable. To rule this out, the actual opening year of URT in each studied city is replaced by a virtual (placebo) opening year, and the URT dummy variable is redefined to equal 0 before the virtual opening year and 1 otherwise. If the coefficients of U change from significant to nonsignificant, this indirectly supports a causal relationship between URT and city variables. Conversely, if the coefficients remain significant, the unobserved factors, rather than URT, are responsible for the positive relationships between URT and population or GDP. Columns 1, 3 and 5 in Table 11 show the real effects of URT, while columns 2, 4 and 6 show the virtual effects. The coefficients in columns 2 and 4 are nonsignificant, indicating that the panel data estimates are robust.
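The construction of the real and placebo opening dummies can be sketched as follows. The text does not say how each virtual opening year was chosen, so the rule used here (shifting the real opening year three years earlier) and the example city's 2016 opening are hypothetical.

```python
import numpy as np

def opening_dummy(years, opening_year):
    """1 from the (real or virtual) opening year onward, 0 before it."""
    return (np.asarray(years) >= opening_year).astype(int)

def placebo_opening_year(actual_opening_year, lead):
    """Hypothetical rule: place the virtual opening `lead` years before the real one."""
    return actual_opening_year - lead

years = np.arange(2010, 2020)
actual = 2016  # hypothetical city
real_dummy = opening_dummy(years, actual)
virtual_dummy = opening_dummy(years, placebo_opening_year(actual, lead=3))
print("real:   ", real_dummy.tolist())
print("virtual:", virtual_dummy.tolist())
```

In the placebo regression, the virtual dummy takes the place of the real one; a nonsignificant coefficient on it is what the robustness argument requires.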

Heterogeneous Attributes
Although the general impacts of URT on city growth have been presented and verified, not all cities respond to URT in the same way [9], so it is important to examine whether the effects of URT differ across cities at different development levels. In mainland China, city tier is a standard measure of a city's overall development level, so cities are classified by tier in this study. Table 12 reports the heterogeneous effects of URT on population and GDP; each column uses the second approach of the city fixed effects model. Columns 1, 2 and 3 present the effects of URT on population in first-tier, second-tier and third-tier cities, respectively. Columns 4, 5 and 6 are analogous, with GDP as the explained variable.
Comparing the coefficients across columns shows that URT increases population and GDP more strongly in first-tier and second-tier cities than in third-tier cities. Thus, the positive effect of URT increases with a city's level of economic development. This result is consistent with the previous conclusion. The opening and expansion of URT attract more people to migrate to a city, and the resulting population growth strains urban infrastructure and public services. Because infrastructure and public services are more complete and developed in first-tier and second-tier cities, URT can have a greater positive impact on their growth.

Conclusions
In this research, 40 cities that had URT in operation by the end of 2019 were selected to investigate the effects of URT on population, GDP and employment rate using panel data models. Robustness tests were carried out to verify the estimates, and the heterogeneous effects of URT on city growth were analyzed by classifying the sample cities according to their level of economic development.
The results indicate that URT can increase population and GDP and create employment opportunities for the city. However, URT has a negative effect on GDP per capita and little effect on the employment rate. The positive effects of URT on population and GDP are more pronounced in more economically developed cities. Thus, although opening and expanding URT lines can promote the urban economy, the additional population attracted would also bring great challenges to urban infrastructure and public services. Policymakers should therefore take a cautious attitude towards introducing or expanding URT. When URT is opened or expanded, supporting urban infrastructure should be developed in step. Policies should also focus on attracting and developing industries to provide more job opportunities at the same time. In addition, policymakers should give priority to developing URT in cities with higher levels of economic development to gain greater benefits. URT may also affect other measures of city development and citizens' living standards, such as total factor productivity, which requires further study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.