1 Introduction

The effect of infrastructure on aggregate output and productivity has long been debated by researchers and practitioners. Many countries have relied on investment in infrastructure to boost output and promote economic development. Among them is China, whose zest for infrastructure construction started as early as the 1980s and continued in the new century with various development programmes aiming at stimulating regional growth and reducing regional development disparities. Overall, the country spent an average 8.5% of its GDP on infrastructure over the period of 1992–2011, overtaking the USA and the European Union to become the world’s largest investor in infrastructure.

However, China’s large-scale investment in infrastructure has raised concerns among economists and policy makers. One of the issues is whether massive and continuous injection of capital into infrastructure projects can help the country maintain its growth in a sustainable manner. Another question of interest is whether improving infrastructure in less developed areas can boost regional economic growth. These concerns are not just specific to China, but also relate to the probably oldest debate in the literature on infrastructure. That is, whether infrastructure capital can contribute to aggregate output in the long run.

The literature on the role of infrastructure could perhaps be traced back to the works of Rosenstein-Rodan (1943) and Hirschman (1957), which have highlighted the importance of capital investment in promoting growth. However, it is not until the 1970s (see, e.g. Arrow and Kurz 1970) and in the 1980s and 1990s (see Romer 1986, 1990; Lucas 1988; Barro 1990) that public capital has been theoretically modelled in an aggregate production function. Empirical studies on the impact of infrastructure took off with the seminal work of Aschauer (1989), which concludes that the marginal productivity of public infrastructure spending is two to four times higher than that of private capital. Large output elasticities of infrastructure are also found in subsequent studies by Munnell (1990) and Ford and Poret (1991). However, these findings have been questioned on methodological grounds and the high rates of return to infrastructure investment reported are often dismissed as implausible. One of the major caveats of the earlier studies and, to a less extent, some later ones is the failure to take into account non-stationarity of the data, which may lead to the spurious regression problem—a possible reason for unrealistically high estimates of the productivity of infrastructure (Gramlich 1994). Some studies attempt to avoid this problem by transforming data into first differences, but at the expense of destroying the long-term relationships (Munnell 1992). Another issue of concern is the potential reverse causality between output and infrastructure investment. Failing to take into account the problem runs the risk of jumbling the estimates of the output elasticity of infrastructure with the income elasticity of the demand for infrastructure services, leading to biased estimation results.Footnote 1

This paper joins the discourse by investigating the contribution of infrastructure to aggregate output based on empirical evidence from China. Over the last three decades, China experienced phenomenal economic growth, enlarging regional development disparities and waves of massive infrastructure investment made by both national and provincial governments as part of development policy packages. Thus, Chinese provinces provide an interesting empirical context to shed light on whether infrastructure capital has a long-run effect on aggregate output. To address the aforementioned methodological issues of earlier studies, we follow Calderón et al. (2015) and adopt a panel cointegration approach to deal with the non-stationarity of the variables included in our model and investigate the long-run effect of infrastructure. Although there has been a growing body of literature looking at the impact of infrastructure drawing from data on Chinese provinces (e.g. Demurger 2001; Ding et al. 2008; Fan and Chan-Kang 2008; Shiu and Lam 2008; Yu et al. 2012, 2013; Shi and Huang 2014), the analysis in this paper adopts an empirical strategy which distinguishes itself in some ways from existing China-specific studies.

First, our paper focuses on steady-state long-term relationships, a research issue which is difficult for most earlier studies to address due to short sample periods that do not allow some of infrastructure’s effects (like indirect ones) to set in. More recent studies do employ longer periods (e.g. Shiu and Lam 2008; Yu et al. 2012, 2013), but the sample period used in this paper covers the years over which there were most marked changes in both aggregate output and infrastructure stocks. Secondly, in order to ascertain that estimates of the output elasticity of infrastructure are not confounded with the income elasticity of the demand for infrastructure, this paper tests the direction of causality based on the procedure developed by Dumitrescu and Hurlin (2012). Thirdly, existing China-specific studies usually focus on individual types of infrastructure. This paper considers the multidimensional aspect of infrastructure by constructing synthetic indicators, using principal component analysis (PCA) from four core infrastructure assets, namely electric power, telecommunications, paved roads and railways. Surely, it is important to examine the impact of individual infrastructure types, which may vary to some extent from one category to another. However, as noted by Agénor (2010), different infrastructure networks are complementary to each other. For instance, having electricity to produce commodities but no roads to carry them to the markets limits the productivity effects of a programme designed to expand electricity generation capacity and transmission networks. Therefore, it is the joint availability or operation that will generate more efficiency gains in a growing economy (Agénor 2010). The use of synthetic infrastructure indices in our paper is motivated to capture multidimensionality and the overall availability of infrastructure. Indeed, two composite infrastructure indictors are constructed—one derived from the conventional method of PCA and the other based on Robust PCA which takes into account outliers.

Finally, we do not assume a common linear production function, due to an observed factor specific to each province in our sample. To this end, we derive the pooled mean group estimates as well as the estimates for each of the sample provinces. This makes our approach different from existing studies which usually address heterogeneity in infrastructure’s impact by running separate regressions for sub-samples of geographical areas. The analysis allows us to examine whether there is provincial difference in the infrastructure-growth nexus and explore possible explanations. It also casts light on a related research issue: whether increasing infrastructure investment in poor areas can help reduce regional development disparities.

It should be noted that our analysis using data on China has the potential to provide research insights and policy implications of wider relevance. China has transformed itself from one of the poorest countries in the world to a middle-income economy. Meanwhile, it has witnessed dramatic improvement in infrastructure facilities. The findings from such an empirical context regarding the long-run impact of infrastructure on aggregate output may be interesting to other developing countries. The part of the analysis which attempts to explore provincial differences in infrastructure’s impact and possible underlying factors may be able to shed light on why infrastructure has high productivity in some locations but negligible effects in other places. Related policy implications are not just limited to China, but also relevant to other countries (even developed countries).

The rest of the paper is organised as follows. Next section provides a brief overview of economic growth and infrastructure development in China over the last 30 years. Sections 3 and 4 detail the econometric strategy and data issues, respectively. The results of the empirical analysis are presented in Sect. 5. The last section concludes.

2 Economic growth and infrastructure in China

Since the economic reform initiated in the late 1970s, China has experienced unprecedented economic growth, with GDP growth averaging about 10 per cent over the last three decades. The growth performance of China has, however, been uneven across provinces. The development strategy implemented in the pre-1979 era and the early years of economic reform put emphasis on heavy industries, which concentrated in the north-eastern provinces and part of the central region. Since the mid-1980s, provinces along the eastern coastline were encouraged to grow, where special economic zones were established and more and more firms started to export. The growth momentum of the region was accelerated after 1992 when more favourable policies were granted, making it the most economically advanced area of the country. In order to reduce the resultant regional development disparity and promote economic growth in interior provinces, the ‘Open-up the West’ Strategy was initiated from 2000, followed by the ‘North-east Revival’ and ‘Rise of Central China’ schemes.

Large amounts of resources have been mobilised in China to develop and improve infrastructure facilities. Investment made in the 1980s was to alleviate energy-related bottlenecks, and, therefore, priority was given to the development of the gas and oil, coal, and electric power sectors (Naughton 2007). The transport and telecommunication sectors received much less recourses for the ensuing years, with their share in state fixed-asset investment at around 10% in comparison with that of the energy sector at 20% (Demurger 2001). Things started to change from 1992 when infrastructure was reasserted as a major development priority and implemented as an integral part of preferential policies applied to coastal provinces. This led to fast expansion of transport and telecoms facilities in the region. It was followed by an even faster growth in infrastructure investment around the year 1998, when infrastructure spending was used as an instrument of fiscal policy to stimulate economic growth. In the new millennium, with the unfolding of the programs aimed to promote economic development in the western, central and north-eastern parts of the country, both the central and local governments invested heavily in infrastructure projects, leading to significant improvement in the overall infrastructure endowments in these areas (Yu et al. 2012). In particular, provinces with abundant energy resources (e.g. Xinjiang and Inner Mongolia) received massive funding to build themselves as energy centres of the country. All the aforementioned infrastructure investments were topped up by a recent wave of infrastructure spending following the global financial crisis, making China the world’s largest investor in infrastructure.

3 Econometric methodology

3.1 Model specification

Following the existing literature we adopt an augmented production function in which infrastructure appears alongside physical capital, labour and human capital as factors of production. Keeping with most of earlier studies, the assumption of constant returns to scale is imposed, which leads to the following empirical specification after taking logarithms of the variables and subtracting the labour variable from both sides of the equation:

$$ y_{it} = \beta_{1} k_{it} + \beta_{2} hc_{it} + \beta_{3} z_{it} + \gamma_{i} + \eta_{t} + \varepsilon_{it} . $$
(1)

yit denotes the logarithm of output per worker in province i at time t; kit denotes the logarithm of physical capital per worker; zit is infrastructure capital per workerFootnote 2; and hcit stands for human capital. \( \eta_{t} \) allows for time-specific effects, whilst \( \gamma_{i} \) capture province-specific effects. The residual \( \varepsilon_{it} \) reflects the influence of shocks that affect the (log) level of output per worker which is assumed uncorrelated across provinces and over time.

Estimation results based on Eq. (1) may be spurious if individual variables in the specification are non-stationary. This is a methodological issue which many earlier studies and in particular those using data on Chinese provinces have failed to deal with. The first step in our empirical strategy is, therefore, to ascertain the time series properties of the variables.

3.2 Unit root tests

To test for the order of integration of the variables, we use panel data unit root tests. Indeed, we consider three tests, namely the Levin et al. (2002)-LLC, the Im et al. (2003)-IPS and the Pesaran (2007) panel unit root tests. The first two tests assume cross-sectional independence; however, if provinces are spatially dependent as they usually lie in the same geographical area, the assumption of the independence of error processes will be violated. Thus, the LLC and IPS tests in this context might lead to spurious results. The Pesaran (2007) test allows us to deal with this issue.

The basic framework followed by these tests can be viewed as an extension of the standard (augmented) Dickey–Fuller test and takes the following form:

$$ \Delta w_{it} = g_{it}^{'} \theta + \eta_{i} w_{it - 1} + \sum\limits_{j = 1}^{k} {\rho_{ij} } \Delta w_{it - 1} + \vartheta_{it} , $$
(2)

where wit denotes each variable under consideration, k is the lag length, the vector \( g_{it}^{'} \) includes panel-specific fixed effects or panel-specific fixed time effects and is the corresponding vector of coefficients.

The LLC test assumes the coefficient of the auto-regressive term to be homogeneous across all i (i.e. \( \eta_{i} = \eta \)) and examines the null hypothesis of H0 :η = 0 against the alternative \( H_{1} :\eta < 0 \). In the IPS test, however, the coefficient of the auto-regressive term is allowed to vary across the different units. This test applies a standardised t-bar statistic which is based on estimating separate unit root tests and averaging their ADF t-statistics. That is, \( \bar{t} = \sqrt N \left( {t_{iT} - N^{ - 1} \sum\nolimits_{i = 1}^{N} {E(t_{iT} } )} \right)/\sqrt {N^{ - 1} \sum\nolimits_{i = 1}^{N} {\text{var} (t_{iT} } } ) \), where tiT is the individual ADF t-statistics for the N cross-sectional units, and E(tiT) and Var(tiT) are, respectively, the mean and variance of tiT .

The Pesaran (2007) test addresses cross-sectional dependence by including the cross-sectional mean of the lagged values of \( w_{it} \) and its differences. The corresponding test is then defined as the simple average of the individual cross-sectional ADF regressions: \( {\text{CADF}} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {t_{i} } \), where ti is the t statistic of the OLS estimate of the auto-regressive term in the modified version of Eq. (2).

3.3 Cointegration tests

If the variables in our model are found non-stationary, one can proceed to test whether they are bound together in the long run, i.e. whether they are cointegrated. If so, Eq. (1) represents a long-run relationship between permanent movements in the (log) level of output per worker, infrastructure per worker, physical capital per worker and human capital. Several cointegration tests have been proposed in the literature, including Kao (1999)’s residual panel cointegration test and Pedroni (2000)’s residual panel cointegration test. However, as noted by Calderón et al. (2015), these tests only ascertain the presence of cointegration among the variables; but do not provide any information on the cointegration rank. These tests, therefore, implicitly assume the existence of one cointegrating relationship. In a model with four variables (as in our context), there is a possibility that more than one cointegration relationship exists. In order to ascertain whether there exists cointegration as well as the number of cointegration relations, this paper adopts the Johansen–Fisher panel cointegration test proposed by Larsson et al. (2001).

To illustrate how the test is conducted, let us consider a panel data set that consists of a panel of N cross-sectional units (provinces in our case) (i = 1, … N) observed over T time period (t = 1, … T). The data generating process for each group in our sample can be represented by the following heterogenous vector error correction model:

$$ \Delta y_{it} = \Pi_{i} y_{i, t - 1} + \sum\limits_{k = 1}^{k_{i}- 1} {\varGamma_{ik} } \Delta y_{i, t - k} + u_{it} . $$
(3)

k is the number of lags; \( u_{it} \) is an i.i.d. error term; \( \Pi = \alpha \beta^{'} \) with α being a \( p \times r \) matrix of short-run adjustment coefficients; and \( \beta^{'} \) a \( p \times r \) matrix of long-run cointegrating relations. The Johansen–Fisher panel cointegration procedure consists of testing the hypothesis that all of the N groups in the panel have at most r cointegrating relationships among the p variables. For this end, Larsson et al. (2001) consider the following rank hypotheses, for all i = 1, … N

$$ H_{0} :{\text{ rank}}(\Pi_{i} ) = r_{i} \le r $$
(4)
$$ H_{1} :{\text{ rank}}(\Pi_{i} ) = p. $$
(5)

Similar to the trace statistics from Johansen (1995), the trace statistic for each group i can be expressed as:

$$ LR_{iT} (H(\gamma )|H(p)) = - 2\ln Q_{iT} H(\gamma )|H(p), $$
(6)

where \( H\left( r \right):{\text{rank}}\left( \Pi \right) \le r \) and \( H\left( p \right):{\text{rank}}\left( {\Pi = p} \right) \).

Defining the LR-bar statistic as the average of the N individual trace statistics: \( LR_{iT} (H(\gamma )|H(p)) \), Larsson et al. (2001) propose the use of a standardised LR-bar statistic as a basis for the panel cointegration rank test, which is:

$$ \gamma_{{\,\,\overline{LR}}} (H(\gamma )\left| {H(p))} \right. = \frac{{\sqrt N (\overline{LR}(H(\gamma )\left| {H(p))} \right. - E(Z_{k} ))}}{{\sqrt {\text{var} (Z_{k} )} }}. $$
(7)

\( E(Z_{k} ) \) and \( \text{var} (Z_{k} ) \) are, respectively, the mean and variance of the variable Z which follows the same asymptotic distribution as the individual trace statistics.

To test for the existence of cointegration, one can let r take the value 0 and see whether the null hypothesis \( H_{0} :{\text{rank}}(\Pi ) = 0 \) can be rejected. In order to test whether there is only one single cointegration relation, one can impose the value of 1 on r and see whether the hypothesis \( H_{(r)} :{\text{rank}}(\Pi ) \le 1 \) is true.

3.4 Estimation of panel data model

As our cointegration results show later, there exists a single cointegrating vector among the variables in Eq. (1). We interpret this single cointegration relation as the long-run production function and therefore adopt a single-equation approach to estimate the coefficients of the variables in concern.Footnote 3 Several estimators for cointegrated panel data have been proposed in the literature. In choosing the appropriate estimator for purpose, we adopt the following strategy. First, we take the nature of our data set into consideration. Indeed, our data set consists of 29 Chinese provinces over the period 1985–2012; thus, N = 28 and T = 28 are moderate in size. Second, cross-sectional dependence has become, increasingly, an important issue in panel data estimations. Thus, choosing an estimator that overcomes this problem is crucial in deriving consistent estimates. For this reason, we use two estimation techniques in the main analysis, namely Pesaran (2006)’s common correlated effects mean group (CCEMG) estimator and the augmented mean group (AMG) estimator introduced by Eberhartd (2012). These estimators allow for unobserved correlation across panel members (cross-sectional dependence). Moreover, it has been shown that the CCEMG estimator provides consistent estimates of the slope coefficients and standard errors under the more general case of multifactor error structure and spatial error correlation, and performs well in small samples and can handle the presence of autocorrelation in the residuals and unit roots in the common factors (Pesaran and Tosetti 2011). We also conduct a robustness check by using estimators proposed by Bai et al. (2009).

To implement the CCEMG, the error term in Eq. (1) can be rewritten as having a multifactor structure, in the form of \( \varepsilon_{it} = \omega_{it}^{'}\,f_{t} + v_{t} \). ft is a vector of \( k \times 1 \) unobserved common factors, which affect each of the provinces with different intensities, and \( \upsilon_{t} \) is the province-specific error term which is assumed to be weakly dependent across the cross-sectional units. Rewriting Eq. (1) as \( y_{it} = \delta_{i}^{'} x_{it} + \gamma_{i} + \eta_{t} + \omega_{it}^{'} f_{t} + \upsilon_{t} \), where xit is the set of our regressors, one can obtain the estimated coefficient CCEMG as, \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\delta }_{\text{CCEMG}} = N^{ - 1} \sum\nolimits_{i} {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\delta }_{i} } \).

The AMG estimator improves on the standard mean group by including a ‘common dynamic process’ extracted from a pooled OLS regression of first differences, which provides a panel-equivalent average movement of the unobserved common factors. Common factors are those factors that are time specific and common across provinces. The AMG is a two-stage procedure, which can be expressed as follows:

$$ {\text{stage}}(1): \, \Delta y_{it} = \delta_{i}^{'} \Delta x_{it} + \sum\limits_{t = 2}^{T} {c_{t} } \Delta D_{t} + e_{t} $$
$$ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{c}_{t} \equiv \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mu }_{t} $$
$$ {\text{stage}}(2): \, y_{it} = \delta_{i}^{'} x_{it} + r + c_{i} t + d_{i} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mu }_{t} + e_{t} $$
$$ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\delta }_{\text{AMG}} = N^{ - 1} \sum\limits_{i} {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\delta }_{i} } $$

where we have (T − 1) year dummies D in first difference with corresponding parameter vector \( c_{t} \) from which the year dummy coefficients that are relabelled as \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mu }_{t} \) are collected. In the second stage, \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\mu }_{t} \) is included as an additional regressor in each of the N standard province regressions which also include linear trend terms \( t \) to capture omitted idiosyncratic processes that evolve in a linear fashion over time.

3.5 Panel causality test

The estimates of the output elasticity of infrastructure and associated inference obtained from the single-equation estimation based on Eq. (1) are only valid when infrastructure is weakly exogenous. If not, the estimates obtained could mix up the output elasticity of infrastructure with the income elasticity of the demand for infrastructure. To infer the causal relationship between the variables, we adopt the methodology developed by Dumitrescu and Hurlin (2012). This approach accounts for heterogeneity in the data series and assumes that all coefficients are different across the units in our sample. Dumitrescu and Hurlin (2012) extend the Granger (1969) contribution to panel data. To illustrate, let us assume \( x_{it} \) and \( y_{it} \) are the two stationary series (which could be the differences of non-stationary variables). To test whether \( x_{{}} \) Granger causes \( y_{{}} \), the following underlying model can be used:

$$ y_{it} = a_{1} + \sum\limits_{k = 1}^{K} {\theta_{ik} } y_{it - k} + \sum\limits_{k = 1}^{K} {\delta_{ik} } x_{it - k} + \varpi_{it} $$
(8)

with \( i = 1, \ldots ,N \) and \( t = 1, \ldots ,T \). The procedure to determine the existence of causality consists of testing for significant effects of past values of \( x_{{}} \) on the present value of \( y_{{}} \). The null hypothesis, which corresponds to the absence of causality for all individuals in the panel, can thus be defined as:

$$ H_{0} :\delta_{i1} = \delta_{i2} = \cdots = \delta_{iK} = 0,\quad \forall i = 1, \ldots ,N. $$
(9)

The alternative hypothesis can be written as:

$$ H_{1} :\delta_{i1} = \delta_{i2} = \cdots = \delta_{iK} = 0,\quad \forall i = 1, \ldots ,N_{1} , $$
(10)
$$ \delta_{i1} \ne 0,\;{\text{or}}\;\delta_{i2} \ne 0\;{\text{or}}\; \ldots \;{\text{or}}\;\delta_{iK} \ne 0,\quad \forall i = N_{1} + 1,\;N_{1} + 2, \ldots ,N $$

where \( N_{1} \in \left[ {0,N - 1} \right] \) is unknown. For causality to exist for all individuals, the following must hold \( N_{1} < N \); otherwise, the null hypothesis applies. If \( N_{1} = 0 \), this would imply that there is causality for all individuals in the panel.

With the above in mind, Dumitrescu and Hurlin (2012) propose a Wald test derived from the average of individual Wald statistics associated with the test of the non-causality hypothesis for units \( i = 1, \ldots ,N \). They define the average Wald associated with the null homogeneous non-causality hypothesis (HNC) as:

$$ W_{{N\!T}}^{{H\!N\!C}} = \frac{1}{N}\sum\limits_{i = 1}^{N} {W_{iT} } , $$
(11)

where \( W_{iT} \) represents the Wald statistic for the ith cross-sectional unit corresponding to the individual test \( H_{0} :\delta_{i} = 0 \). Using Monte Carlo experiments, Dumitrescu and Hurlin (2012) show that the proposed standardised panel statistics have very good small sample properties and are robust in the presence of cross-sectional dependence.

4 Data

The data used in this paper consist of a balanced panel of 29 Chinese provinces over the period 1985–2012. Table 9 in Appendix provides the list of sample provinces. Output is measured by real GDP per worker in the 1987 prices. The variable physical capital is constructed using the perpetual inventory method. The values of provincial physical capital stocks for the year 1984—the year before the beginning of the sample period—are taken from Zhang and Wu (2004). They are then used to calculate capital stocks for the sample period from 1985 to 2012, based on \( K_{it} = K_{it - 1} (1 - \delta_{it} ) + I_{it} \), where \( \delta_{it} \) denotes the depreciation rate and Iit the gross fixed capital formation. In line with Zhang and Wu (2004), 9.6% is used as the depreciate rate in the calculation.Footnote 4 Calculated capital stocks are converted to the 1987 prices by using the deflators obtained from China Statistical Yearbook and then transformed into the per worker terms by using the data on provincial employment from China Labour Statistical Yearbook. Human capital is proxied as the % of provincial population with at least 9 years of compulsory school education, with data from China Population Statistical Yearbook.

To capture the multidimensional nature of infrastructure capital, we consider four categories of infrastructure assets, namely electric power, telecoms, paved roads and railways. Electricity is measured by total amount of electricity generated, with data obtained from China Energy Statistical Yearbook. Telecoms infrastructure is measured by the number of landlines and cellular phones, and the two transport indicators are total length of paved roads and that of railway routes. The data are collected from China Statistical Yearbook. The indicators are converted into the per 1000 worker terms by using the data on provincial total employment from China Labour Statistical Yearbook. Summary statistics of the variables are provided in Table 1.

Table 1 Descriptive statistics

The strategy used in this paper is to construct a synthetic indicator of the four infrastructure stocks as a measure of the overall availability of infrastructure. By doing so, we can also deal with some empirical difficulties arising from introducing a variety of infrastructure indicators as inputs in the production function. Given that our estimations are based on panel cointegration analysis, it would be computationally difficult to accommodate all the four variables in a single regression (because of over-parametrisation). Another problem relates to multicollinearity if individual infrastructure indicators are highly correlated, as it is the case in our context. Bearing this in mind, we construct a composite index for infrastructure derived from principal component analysis (PCA), in which the underlying variables are standardised in order to abstract from their units of measurement. The analysis shows that the first principal component has an eigenvalue of 2.92, whilst other components have eigenvalues lower than 1. The Kaiser rule suggests that we retain components with eigenvalue higher than 1. Moreover, the first component explains around 73% of the variance of the four variables. Finally, the overall Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy is 72.5% with the individual KMO ranging from 67 to 81%. Therefore, we use the first principle component to construct the composite infrastructure index, as follows:

$$ 0.54 \times { \ln }\left( {\text{paved}} \right)_{it} + 0.55 \times { \ln }\left( {\text{electricity}} \right)_{it} + 0.35 \times { \ln }\left( {\text{rail}} \right)_{it} + 0.53 \times { \ln }\left( {\text{telecommunication}} \right)_{it} , $$
(12)

where paved is the length of paved roads (in kilometres per 1000 workers); electricity is power generation (in 10 million kwh per 1000 workers); rail represents the length of rail lines (in kilometres per 1000 workers); and telecommunication is the number of main lines and mobile phones (per 1000 workers). The synthetic index is highly correlated with the individual infrastructure indicators used in its construction. More precisely, the correlation coefficients are 87% for paved roads, 96% for electricity generation, 74% for railways and 97% for telecommunications.

The standard approach of PCA as adopted above may become less valid because the covariance matrix used in the calculation may be sensitive to outliers (if any). To overcome this potential problem, we construct another composite infrastructure index based on a robust principal component analysis, as follows:

$$ 0.49 \times { \ln }\left( {\text{paved}} \right)_{it} + 0.52 \times { \ln }\left( {\text{electricity}} \right)_{it} + 0.48 \times { \ln }\left( {\text{rail}} \right)_{it} + 0.51 \times { \ln }\left( {\text{telecommunication}} \right)_{it} $$
(13)

We use this infrastructure index in the main regression analysis. Nevertheless, we also report the results using the composite indicator in which outliers are not dealt with, as the robustness check.

5 Empirical results

The analysis starts with checking the stationarity of the variables and testing for cointegration. It then moves on to the estimation of the long-run elasticities of output with regard to infrastructure and other inputs in the production function as specified in Eq. (1).

5.1 Panel unit root tests and cointegration results

Table 2 reports the results of the panel unit root tests. Both the IPS and LLC tests suggest that the null hypothesis of unit root cannot be rejected at the 5% significance level. The two tests are less valid if there is cross-sectional dependence, which is the case in our context (see Table 10 in Appendix). The results from the Pesaran (2007) test are therefore more insightful, which also point to non-stationarity of the variables. However, applying the tests to the first difference of the variables produces results which lead to the rejection of the null of non-stationarity.Footnote 5 We therefore conclude that all the variables are I(1).

Table 2 Panel unit root results

The next step is to test for cointegration. The results are summarised in Table 3, with the upper panel reporting those from the Johansen–Fisher test and the lower part the panel LR-bar test. Both the trace and eigen statistics from the Johansen–Fisher test indicate the rejection of the null of no cointegration, and we therefore conclude that the variables are cointegrated. In order to find out the number of cointegration vectors, we rely on the panel LR-bar test. It is evident from the results that the null that the maximum rank is zero (i.e. no cointegration) is rejected. However, the test does not reject the maximum rank of 1. Based on these results, it can be concluded that the null hypothesis of a common cointegrating rank for all provinces in the panel cannot be rejected. This implies that for each group in our sample, there exists one cointegrating relationship, which can be interpreted as the infrastructure-augmented production function as expressed in Eq. (1).

Table 3 Panel cointegration test results

5.2 Estimation results from long-run cointegration relationship

As discussed earlier, the Pesaran (2006)’s common correlated effects mean group (CCEMG) estimator and the augmented mean group (AMG) estimator introduced by Eberhartd (2012) are adopted to estimate Eq. (1). One of the reasons to employ them relates to the concern of cross-sectional dependence. A test of cross-sectional dependence against the residuals obtained from the standard mean group estimator yields a test statistic of 44.93 (with a p value of 0.00). This confirms the existence of the issue and justifies the use of CCEMG and AMG.

Table 4 reports the results using the synthetic infrastructure index based on robust PCA. The first thing to notice is that the two econometric methods produce very similar results: the parameter estimates are very similar in magnitude except for the variable of human capital which is nevertheless statistically insignificant. The estimated coefficient of physical capital is 0.254 from CCEMG and 0.225 from AMG, suggesting a long-run positive output elasticity of this variable. The magnitude of the coefficient is a bit lower than that found in the most influential cross-country studies using data sets which include observations on China (e.g. Canning 1999; Canning and Bennathan 2002; Calderón and Servén 2004; Candelon et al. 2013; Calderón et al. 2015). Although caution should be taken when comparing cross-province results like ours with those from cross-country studies, the relatively lower coefficients of physical capital shown in the table may be an indication of problems with China’s extensive growth—a strategy which relies heavily on investing aggressively. The human capital variable registers a positive coefficient and falls just below the 10% significance level based on CCEMG and is statistically no different from zero according to the AMG method.

Table 4 Estimation of long-run elasticities of output (period = 1985–2012; no. of provinces = 29)

Turning to the focus of the study, we find the estimated coefficient of infrastructure is positive and highly significant in statistical terms, irrespective of the econometric techniques used. The magnitude of the coefficient is 0.126 or 0.148, in the range of estimates in cross-country studies which use samples including developing economies and employ indicators of individual infrastructural sectors. The coefficients are also similar in size to those found on electricity and telecoms infrastructure for a full sample of Chinese provinces in Zhang and Ji (2018) and that on transport stocks in Yu et al. (2012). The magnitude of infrastructure’s coefficient in our analysis is relatively larger in comparison with the results of Calderón et al. (2015) in which a composite infrastructure indicator is registered with a coefficient around 0.8 for a sample of 88 industrial and developing countries including China. One plausible explanation is that our larger coefficient may reflect an overall underprovision of infrastructure at the national level in China, whilst the Calderón et al. (2015) study includes some sample countries in which there is an oversupply of infrastructure and/or factors that hamper the materialisation of infrastructure’s contribution to economic growth. Note that infrastructure capital appears twice in the augmented production function of Eq. (1)—once on its own and once as part of total physical capital. It’s more appropriate to interpret the coefficient of the infrastructure variable as the output effect of increasing infrastructure stocks when holding overall physical capital constant. As pointed out by Canning (1999), Calderón and Servén (2004), Candelon et al. (2013) and Calderón et al. (2015), which all adopt the same approach, a positive and significant coefficient of infrastructure indicates that infrastructure capital is more productive than overall capital in boosting aggregate outputs. All in all, the results lead us to conclude that infrastructure is a strong determinant of economic growth in Chinese provinces.

One merit of the two econometric techniques is that they take into consideration unobserved common factors, which in the case of this study are shocks to GDP correlated across provinces. The estimation results shown in the second column of Table 4 are thus less prone to noises caused by events like the 1997 Asian financial crises, the 2008 global financial crises and macroeconomic policies or measures implemented nationwide in China. Moreover, the CCEMG estimators can take into account multifactor error structures and spatial error correlation that are very likely in our sample of Chinese provinces. Therefore, the coefficient estimates obtained from these techniques are consistent even when there is nuisance spatial dependence across provinces (i.e. a province’s growth is affected by changes in other provinces to the extent that the latter deviate from their steady-state equilibrium).Footnote 6

To explore the robustness of the results, we replicate the same exercise using the alternative measure of infrastructure—the composite indicator constructed from the standard PCA. Table 5 reports the results. The estimated coefficients of physical capital are very similar, in terms of size, to those shown in Table 4. So are the estimates of the output elasticity of human capital, although it becomes marginally significant based on the method of CCEMG. It seems that the parameter estimates of the alternative infrastructure indicator are slightly smaller than those of the robust-PCA index. A close look at the two synthetic indices reveals that the outliers fall into two types. One includes observations in the earlier years of the sample period on provinces like Jiangsu, Zhejiang and Shandong, which turned out to have not very low GDP but relatively poor infrastructure endowments at the time. This phenomenon occurred in the early years of China’s economic reform when such provinces had to overstretch on their limited infrastructure facilities which had not been fed with much state investment. The other group of outliers are the data points covering the most recent years of the sample period for provinces like Qinghai, Inner Mongolia and Xinjiang, which have received heavy investment in infrastructure under the ‘Open-up the West’ program and as part of the country’s plan to transform them into major energy suppliers. It seems that leaving the outliers undealt with tends to produce lower estimates for the effects of infrastructure, because for the former group it suggests that infrastructure is not much required to generate high output and for the latter it implies that high infrastructure stocks do not lead to more output. Nonetheless, the coefficient estimates are positive and statistically significant, confirming that infrastructure plays an important role in promoting growth in China.

Table 5 Estimation of long-run elasticities of output: alternative infrastructure index (period = 1985–2012; no. of provinces = 29)

We also conduct another robustness check by adopting the CupBC (continuously updated and bias corrected) and CupFM (continuously updated and fully modified) estimators proposed by Bai et al. (2009). The two estimators, which are iterative in nature, are able to accommodate cross-sectional dependence generated by unobserved global stochastic trends and allow for the joint estimation of the slope parameters and the stochastic trends. Another appealing aspect of the estimators is that they remain valid in the presence of mixed stationary and non-stationary factors, as well as the observed regressors. The estimation results are summarised in Table 6. Overall, they are qualitatively similar to those obtained from the CCEMG and AMG estimators.

Table 6 Estimation of long-run elasticities using estimators proposed in Bai et al. (2009)

The above analysis tells us that a long-run relationship exists among the variables in Eq. (1). To complement the analysis and investigate the direction of causality, we employ the procedure developed by Dumitrescu and Hurlin (2012), as discussed in Sect. 3.5. Table 7 presents the results, which indicate that causality runs from infrastructure to output. This finding remains robust irrespective of the measures of infrastructure used. Thus, it is fair to say that the estimates of infrastructure’s coefficient obtained from our analysis have not mixed up the output elasticity of infrastructure with the income elasticity of the demand for infrastructure.

Table 7 Heterogenous panel causality

So far the analysis has assumed a common linear production function for all provinces. It is possible that there are regional differences in the impact of infrastructure in a country like China. To this end, we obtain the estimates for individual provinces and present the coefficient of the infrastructure variable obtained from the CCEMG and AMG techniques in the first two columns of Table 8. Provinces are grouped into three regions according to the classification which is based principally on geographical location but also reflects the differences in economic development, with the eastern region being the most developed and the western area the least developed. It is evident from the results that the estimated coefficient of infrastructure is positive and significant in all the coastal provinces, with a magnitude larger than that obtained from the full sample in the majority of the cases. It is in general smaller for the provinces in the central region and even turns negative in the cases of Guangxi and Jiangxi (although insignificant). In the western region, the estimated parameter is statistically no different from zero for at least half of the provinces, indicating there is no discernible impact of infrastructure on economic growth or, more precisely, infrastructure stocks there are no more productive than overall physical capital.

Table 8 Estimation of long-run elasticity of output with regard to infrastructure: individual provinces

The cross-province heterogeneity in infrastructure’s effect may be due to various reasons. A possible explanation lies in quality differences in infrastructure across provinces. It is agreed in the literature that the effectiveness of infrastructure depends crucially on its quality. Despite a lack of comprehensive data on the quality of provincial infrastructure stocks, there have been observations that coastal areas are endowed with better quality infrastructure (e.g. more high-speed railways and better maintenance of highways) (Bai and Qian 2010; Yu et al. 2012). Differences in quality and maintenance of infrastructure may have contributed to differences in infrastructure’s growth-enhancing role.

Another plausible explanation is the one we have used to explain the relatively higher output elasticity of infrastructure found in our analysis in comparison with that of the cross-country study of Calderón et al. (2015); that is, the relative shortage or over-supply of infrastructure capital. As indicated by Canning and Pedroni (2007), Aschauer (2000) and Cadot et al. (2006), the extent to which an increase in infrastructure (at the price of lowering investment in other physical capital) can raise aggregate outputs depends on an optimal allocation between the two types of capital. Following this line of argument, it is predicted that the efficacy of infrastructure varies according to the ratio of the two types of capital. To examine whether the relative shortage or oversupply of infrastructure is a possible explanation to the provincial differences in the coefficient estimate, we compute the ratio of infrastructure stocks (the robust-PCA measure) to total physical capital. Due to the lack of data on costs or prices of constructing infrastructure facilities, we are unable to construct ratios with both types of capital measured in the monetary terms. Nonetheless, the ratios constructed in this paper are comparable across provinces and over time. The last column of Table 8 shows the ratios of individual provinces averaged over the sample period. It can be seen that, roughly speaking, high ratios are more likely to associate with low and/or insignificant coefficients of the infrastructure variable.

To shed further light on this, a panel threshold regression proposed by Hansen (1999) is carried out with the infrastructure-to-physical-capital ratio being the threshold variable. This is to see whether the effect of infrastructure varies according to the ratio, or whether there is nonlinearity in the effect conditional on the relative shortage or oversupply of infrastructure stocks. It should be noted that it is practically difficult (if not impossible) to integrate the CCEMG or AMG estimator into a panel threshold regression model. Therefore, we only undertake a standard threshold regression and the magnitude of the coefficients obtained are not directly comparable to those shown in Tables 4 and 5. The estimation reveals that there exists a single threshold and that infrastructure’s coefficient is 0.041 and statistically significant when the ratio is smaller than 165 and it is − 0.020 but statistically insignificant with higher ratios. The result implies that infrastructure complements other physical capital only to the extent that there is no relative oversupply. Although it is by no means a rigorous analysis, the exercise indicates that the marginal productivity of infrastructure tends to be higher in places with relative undersupply of infrastructure and low or even null in locations where infrastructure is too much in relation to other physical capital.

6 Conclusions

Infrastructure investment has long been used as a development tool, but its effectiveness remains a controversial issue among researchers and practitioners. One of the oldest debates in the literature is whether infrastructure capital contributes to aggregate output in the long run. A related question is whether it is beneficial to increase infrastructure investment in less developed regions to boost their growth. This paper attempts to shed empirical light on these issues by employing an infrastructure-augmented production function approach and drawing on the case of China, which has now become the world’s largest investor in infrastructure. A panel data set of 29 Chinese provinces over the period 1985–2012 is analysed based on an econometric strategy which addresses some methodological limitations of earlier literature. Synthetic indices are constructed to capture multidimensionality of infrastructure, based on physical indicators of four types of infrastructure assets. The analysis deals explicitly with non-stationarity of the variables and tests for the direction of causality.

The analysis results show that infrastructure, on average, impacts positively on growth in China. Infrastructure capital is in general more productive than physical capital in boosting output. The causality test indicates that causality runs from infrastructure to growth. However, the analysis on individual provinces suggests that there are provincial differences in the impact of infrastructure, probably due to the differences in the relative overprovision or shortage of infrastructure. From the policy perspective, the findings provide empirical support for development strategies involving the use of infrastructure investment. It is desirable for countries or regions which suffer from inadequate infrastructure to increase infrastructure stocks and benefit in the long run. However, if the over- or under-supply explanation is plausible, the effectiveness of such policies may be limited if adopted by countries (developing and developed ones alike) or areas with high infrastructure stocks relative to, for instance, physical capital. In other words, policy makers should use infrastructure investment as a means of boosting growth or promoting regional development only to the extent that it does not lead to oversupply of infrastructure.