On the estimation of spatial stochastic frontier models: an alternative skew-normal approach

This paper proposes an alternative approach to combining spatial dependence and stochastic frontier models, drawing on the large statistical literature on skew-normal distribution functions. I show how to combine a spatial dependence structure with a stochastic frontier model in a way that (1) is straightforward to estimate, (2) combines spatial dependence and a technical efficiency term in a single error term, and (3) produces consistent estimates. With smaller sample sizes, however, estimation of the parameter governing technical efficiencies becomes imprecise. The consistency of parameter estimation is shown using simulations, and I provide an empirical application that estimates spatially correlated technical efficiencies within a European regional production function context.


Introduction
One of the most distinct features of European regions is that they differ widely in their economic performance, even when controlling for regional characteristics, such as sector structure and population size. Obviously, countries differ in terms of institutions, culture, stability, and so forth, which determine for a large part the international differences in economic performance. However, wealth and income are sometimes even more dispersed within countries than across countries. To illustrate this, Fig. 1 shows the dispersion of relative regional GDP per capita across European countries.
European income seems to be concentrated within large metropolitan areas (most notably the capital cities) such as Paris, London, Luxembourg, Oslo, and Stockholm. Apart from this urban-rural divide, large differences are present as well.
A strongly related research question deals with the exact nature of economic performance and how to measure it. To do so, endowment levels should be taken into account. In the economics literature, this can be reflected by the use of regional production functions (see, e.g. Rodríguez-Pose and Crescenzi 2008; Basile et al. 2012). Given the size of production factors, such as labour and capital, regions should attain a certain production level, but they usually produce suboptimally. The distance between the optimal and actual production level is usually measured by technical (in)efficiencies and modelled stochastically by a stochastic frontier approach.
There is already a sizeable literature dealing with benchmarking regions using regional technical efficiencies modelled by a stochastic frontier approach (see, amongst others, Driffield and Munday 2001; Brock 1999; Puig-Junoy 2001; Puig-Junoy and Pinilla 2008; Alvarez 2007; Otsuka 2017). This literature usually deals with the relative (sectoral) performance of regions, and this is the approach this paper takes as well. The production factors are then usually constituted of the aggregates of various forms of labour (high skilled and low skilled) and capital (both physical and human) within a region.
However, taking only local endowments into account boils down to an absolute location approach: it does not matter where the region is located with respect to its neighbours. Yet the relative location of the region matters as well, as regions are intrinsically connected to each other in networks formed by trade, knowledge spillovers, commuting, and migration (Thissen et al. 2016). It is crucial to control for this spatial dependence, as omitting it might lead to bias, at least in the estimation of technical efficiencies (Anselin 1988).
The literature that combines spatial dependence and stochastic production frontiers is, although relatively recent, already sizeable. Most studies employ a parametric approach, and the enumeration that follows is by no means exhaustive. One of the first parametric studies was Barrios and Ladado (2010), who use an iterative back-fitting algorithm to find consistent parameter estimates, although they do not allow for correlation between the technical efficiency and spatial dependence structure. Pavlyuk (2010) also uses a parametric approach, but does not report how he consistently estimates both the spatial dependence process and technical efficiencies. Fusco and Vidoli (2013) and Vidoli et al. (2016) separate out the error term in a spatial lag structure and technical efficiencies, with an application to the Italian wine sector. Kinfu and Sawhney (2015) apply a spatial stochastic frontier analysis to maternal care in India. Glass et al. (2013) decompose productivity growth using a spatial autoregressive model, whereafter Glass et al. (2016) extend the analysis to a spatial panel setting. Finally, Jiang et al. (2017) apply a fixed effects stochastic frontier model to energy efficiency in Chinese provinces. In addition, there is a smaller literature that resorts to a Bayesian approach and simulation techniques, namely Schmidt et al. (2009), Areal et al. (2010), and Tsionas and Michaelides (2016).
A specific feature that applies to most of the studies above is that they model the spatial dependence and efficiency processes separately (see, e.g. Fusco and Vidoli 2013). Then, as I will argue below, the error term is by definition multivariate as it is a combination of a normal and truncated normal distribution, where one of them or even both are multivariate due to the involved spatial correlation structure, which makes estimation cumbersome.
In contrast, this study applies an alternative approach firmly rooted in the statistical literature. Using a relatively straightforward skew-normal distribution approach, I show how to combine a spatial error structure with a stochastic frontier model in a way that (1) is straightforward to estimate, (2) combines spatial dependence and a frontier model in a single error term, and (3) produces consistent estimates. The latter is shown by a simulation study, where, although all parameters are estimated consistently, it is clear that the estimator of the parameter measuring technical inefficiencies is very inefficient (i.e. has large standard errors) when the number of observations is small. The skew-normal distribution is not often applied in the econometric stochastic frontier literature, a notable exception being Chen et al. (2014), although they look at fixed effects panel models instead of spatial dependence models.
The remainder of this paper is structured as follows. The next section introduces the concept of regional technical efficiencies and discusses some measurement issues. Subsequently, it treats the modelling (and associated estimation) of technical efficiencies in two ways: a mainly econometric and a more statistical one. The last subsection deals with the introduction of spatial dependence in stochastic production frontiers. Section 3 provides simulation results to indicate the performance of the proposed estimation methods within small and realistic samples as usually encountered when benchmarking (European) regions. Section 4 provides an application of spatial stochastic frontier modelling and gives an estimation of the average technical efficiencies of European NUTS2 regions in the period 2000-2010. The last section concludes by indicating how, in the proposed framework, more complex spatial dependence structures can be incorporated in stochastic frontier models. Estimation of these models, however, requires complex multivariate likelihood or simulation techniques.
various forms of shirking on the work floor. Or firms do not have access to the same technology and have therefore different output levels.
This, however, creates a problem. If most firms do not produce according to profit maximization, but systematically lower than that, then traditional production function estimates are biased. 4 Namely, not being able to optimize profits or costs leads to the fact that firms end up beneath an estimated ideal profit level. Consequently, in the literature associated with stochastic production functions, the error terms are usually composed error terms: the traditional error term reflecting noise and a new error term-being strictly positive-measuring a firm's inefficiency.
Analogously to firms, regions with similar inputs do not necessarily attain the same production level either. Partly, this may be due to missing covariates (such as not being able to correctly measure human and social capital), but partly this may be caused by the fact that inputs are not always deployed as efficiently as possible due to local or national institutions, social structures, etc.
To control for this, the regional science literature has borrowed from the firm-specific efficiency literature the concept of regional stochastic production frontier analysis. As Fig. 1 clearly shows, some regions are probably more efficient than others, even within the same country, and this should be reflected when benchmarking those regions.
To illustrate the concept of a regional stochastic production frontier, the left production isoquant in Fig. 2 shows how a region's technical efficiency can be measured. Denote $y$ as the given maximum attainable production a region can get using the production factors $x_1$ and $x_2$, say capital and labour. Regions $A_1$, $A_2$, $A_3$, $B_1$, $B_2$, and $B_3$ are all producing inefficiently. With the production factors $x_1$ and $x_2$ that theoretically enable them to produce $y$, they produce on average $\hat{y}$. The distance between $y$ and $\hat{y}$ is then a measure for the average efficiency. More precisely, average efficiency is defined as the ratio $|0\hat{y}| / |0y|$, where $|0y|$ is the length of the line between the origin and $y$. As a result, technical efficiencies must be smaller than 1.

Fig. 2 Production possibility frontiers in a standard (left) and a spatial structure (right)

4 A similar line of reasoning could be held for minimizing cost functions. The remainder of this section deals with production functions, but note that the same arguments hold for cost functions as well.
Estimation of technical efficiencies may, however, be biased in the presence of spatial dependence or unobserved spatial heterogeneity amongst regions. Namely, when one assumes that only neighbouring regions benefit from other regions' technological knowledge through the traditional Marshallian channels of shared customers and suppliers, shared labour pools or spillover mechanisms (or just through unobserved spatial heterogeneity), then straightforward estimation of technical efficiencies is biased. This can be seen in the right production isoquant depicted in Fig. 2. Assume that regions $A_1$, $A_2$, and $A_3$ belong to country A and regions $B_1$, $B_2$, and $B_3$ belong to country B. It is then quite conceivable, because of spatial unobserved heterogeneity or spatial dependence, that the technical efficiencies of the regions within country A and within country B are related. In Fig. 2, region $B_3$ might not produce inefficiently at all, given the fact that neighbouring regions in country B produce less efficiently. This works the other way around as well. Regions that are very central in a network and surrounded by very efficiently producing regions probably have, besides a strong economic structure, a very favourable relative location as well. In this context, efficiency can be related to advantages of the absolute location, while spatial dependence relates to the relative location. To control for the inefficiency in both production and the geographical location, this paper incorporates a spatial correlation structure in stochastic production frontiers.
To show how one can incorporate spatial dependence in regional cross-sectional stochastic production frontier models, I first concisely revisit the non-spatial stochastic frontier model in Sect. 2.1. Thereafter, in Sect. 2.2, an alternative and not commonly used estimation method is introduced. Finally, I show how one can readily incorporate spatial dependence in stochastic production function frontier models in Sect. 2.3.

Stochastic production frontiers
To start with, assume for simplicity that the production $y_i$ of firm $i$ ($i \in \{1, \ldots, N\}$) can be modelled in a cross section as a Cobb-Douglas production function, thus (using vector notation):

$$y = f(X; \beta)\, TE, \tag{1}$$

where $X$ is the matrix of production factors, $\beta$ the vector of parameters of the Cobb-Douglas production function $f$, and $TE$ denotes the so-called firm-specific technical efficiency. Thus, $TE$ is a distance measure of the firm to the (maximum) production of the best producing firm there is within the sample of firms. As a consequence, $TE$ must be smaller than or equal to one for each firm. Aigner et al. (1977) and Meeusen and van den Broek (1977) already specified (1) by assuming that $TE = \exp(-u)$, where $u$ represents a stochastic variable. Assuming a logarithmic specification yields:

$$\ln(y) = \ln(X)\beta + v - u, \tag{2}$$

where $v$ is a stochastic variable as well, with $u \sim N(0, \sigma_u^2)$ and $v \sim N(0, \sigma_v^2)$, and with the explicit condition that $u > 0$.
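For likelihood purposes, one usually considers the composite stochastic variable $\varepsilon = v - u$. Further, both $u$ and $v$ are usually conveniently considered independent. This enables us to find the marginal density of $\varepsilon$, namely:

$$f(\varepsilon) = \frac{2}{\sigma}\,\phi\!\left(\frac{\varepsilon}{\sigma}\right)\Phi\!\left(-\frac{\sigma_u}{\sigma_v}\,\frac{\varepsilon}{\sigma}\right), \qquad \sigma^2 = \sigma_u^2 + \sigma_v^2, \tag{3}$$

where $\phi$ and $\Phi$ denote the standard normal density and distribution function. Note that the marginal distribution of $\varepsilon$ is a conditional distribution of $u$ and $v$ and that $u$ and $v$ are intertwined by this conditional nature. Note further that an estimate of the technical efficiency can now be obtained by finding the conditional density $f(u \mid \varepsilon)$.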
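As an illustration of the last step, the following is a minimal sketch of how a technical efficiency score could be computed from $f(u \mid \varepsilon)$ in the non-spatial model. It assumes the textbook set-up of specification (2): $u$ half-normal with scale $\sigma_u$, $v$ normal with scale $\sigma_v$, and $u$ and $v$ independent, in which case $u \mid \varepsilon$ is a normal distribution truncated at zero. The function names `conditional_u_params` and `technical_efficiency` are hypothetical and not taken from the paper.

```python
import numpy as np
from scipy import stats


def conditional_u_params(eps, sigma_u, sigma_v):
    """Location and scale of u | eps (a normal truncated at zero).

    Assumes eps = v - u with u half-normal(sigma_u), v normal(sigma_v),
    and u independent of v.
    """
    sigma2 = sigma_u**2 + sigma_v**2
    mu_star = -eps * sigma_u**2 / sigma2
    sigma_star = sigma_u * sigma_v / np.sqrt(sigma2)
    return mu_star, sigma_star


def technical_efficiency(eps, sigma_u, sigma_v):
    """Closed form of E[exp(-u) | eps] under the assumptions above."""
    mu_star, sigma_star = conditional_u_params(eps, sigma_u, sigma_v)
    z = mu_star / sigma_star
    ratio = stats.norm.cdf(z - sigma_star) / stats.norm.cdf(z)
    return ratio * np.exp(-mu_star + 0.5 * sigma_star**2)


# Illustrative values only: a strongly negative residual implies low efficiency.
residuals = np.array([-0.6, -0.2, 0.0, 0.2])
scores = technical_efficiency(residuals, sigma_u=0.5, sigma_v=0.2)
```

The scores lie strictly between 0 and 1 and increase with the residual, which matches the interpretation of $TE = \exp(-u)$ as a distance to the frontier.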
Obviously, estimation of this model with ordinary least squares regression creates a bias because of the simultaneous appearance of the two stochastic variables with one being truncated at zero. The traditional estimation procedure uses a likelihood procedure based on the density in Eq. (3). However, introducing a more complex error structure in specification (2) is rather cumbersome and not very intuitive. The next subsection proposes therefore an alternative specification and corresponding estimation procedure, which is more straightforward to adapt.

A skew-normal approach
The stochastic error structure, $\varepsilon$, in specification (2) dates back to Weinstein (1964) and can be rewritten in its most simple form as the sum of a normal and a truncated normal distributed variable:

$$\varepsilon = \delta\,|U_0| + \sqrt{1 - \delta^2}\,U_1, \tag{4}$$

where $U_0$ and $U_1$ are independent $N(0, 1)$ variables, and $\delta \in (-1, 1)$. Here, the stochastic variable $\varepsilon$ is generated by means of convolution. A different genesis of $\varepsilon$ can be realized by conditioning:

$$\varepsilon = (U_1 \mid U_0 > 0), \tag{5}$$

where $(U_0, U_1)$ is distributed as a bivariate normal random variable with correlation $\delta$. From here, it is quite straightforward to show that both geneses (4) and (5) of $\varepsilon$ lead to the same density function:

$$f(\varepsilon; \alpha) = 2\,\phi(\varepsilon)\,\Phi(\alpha\varepsilon), \tag{6}$$

which is called the skew-normal density function. The parameter $\alpha$ in density (6) is a skewness parameter and determines the shape of the density function. Density (6) is shown in Fig. 3 for some values of the parameter $\alpha$. When $\alpha$ is positive, the density is skewed to the right, and when it is negative, it is skewed to the left. When $\alpha$ is zero, the density becomes a standard normal density function, and if $\alpha \to \infty$ ($-\infty$), then the density converges to the half-normal density $2\phi(\varepsilon)$ for $\varepsilon \geq 0$ ($\leq 0$).
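The equivalence of the two geneses can be checked numerically. The sketch below draws from the convolution and the conditioning representation and compares both samples with the skew-normal distribution as implemented in `scipy.stats.skewnorm`, whose density is $2\phi(z)\Phi(\alpha z)$. The symbols $\delta$ (the weight or correlation, in $(-1, 1)$) and $\alpha = \delta/\sqrt{1-\delta^2}$ (the skewness parameter) follow the usual skew-normal notation and are assumptions here, as are the chosen value $\delta = -0.7$ and the sample size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
delta = -0.7                             # illustrative value, not from the paper
alpha = delta / np.sqrt(1 - delta**2)    # implied skewness parameter
n = 200_000

# Convolution: weighted sum of a half-normal and an independent normal.
u0 = rng.standard_normal(n)
u1 = rng.standard_normal(n)
z_conv = delta * np.abs(u0) + np.sqrt(1 - delta**2) * u1

# Conditioning: second component of a bivariate normal with correlation
# delta, kept only when the first component is positive.
w0 = rng.standard_normal(n)
w1 = delta * w0 + np.sqrt(1 - delta**2) * rng.standard_normal(n)
z_cond = w1[w0 > 0]


def sn_pdf(z, alpha):
    """Skew-normal density 2 * phi(z) * Phi(alpha * z)."""
    return 2.0 * stats.norm.pdf(z) * stats.norm.cdf(alpha * z)


# Kolmogorov-Smirnov distance of each sample to the skew-normal cdf.
ks_conv = stats.kstest(z_conv, stats.skewnorm(alpha).cdf).statistic
ks_cond = stats.kstest(z_cond, stats.skewnorm(alpha).cdf).statistic
```

Both distances should be small for a sample of this size, and a negative $\delta$ produces the left-skewed error that a production frontier requires.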
Conveniently, if $z \sim SN(\alpha)$ and $\ln(y) = \ln(X)\beta + \sigma z$, then by affine transformation $\ln(y) \sim SN(\ln(X)\beta, \sigma^2, \alpha)$ holds, which can be expressed as:

$$\ln(y) \sim 2\,\phi\big(\ln(y) - \ln(X)\beta;\, \sigma^2\big)\,\Phi\!\left(\frac{\alpha\,(\ln(y) - \ln(X)\beta)}{\sigma}\right), \tag{7}$$

where $\phi(\cdot\,; \sigma^2)$ denotes the density of a normal distribution with mean zero and variance $\sigma^2$. Note that in this case $\ln(X)\beta$, $\sigma^2$ and $\alpha$ can be seen as a location parameter, a scale parameter and a skewness parameter, respectively. The direct relation between specification (2) and (7) can be seen as well through stating

$$\ln y_i - \ln(X_i)\beta = (v \mid u > 0) = \varepsilon_i, \tag{8}$$

where $(u, v)$ is distributed as a bivariate normal random variable with correlation $\delta$, and where $\alpha = \delta / \sqrt{1 - \delta^2}$, $\delta = \sigma_u$, and $\sqrt{1 - \delta^2} = \sigma_v$. The latter equality signifies the intrinsic relation between $u$ and $v$ which is implicit in specification (2). Note that specification (2) only holds when $\alpha < 0$. I do not explicitly impose this condition on the model, but choose to leave this as an empirical test.
The seminal paper in this field is by Azzalini (1985). Other relevant references with respect to the skew-normal distribution are, among others, Azzalini and Dalla Valle (1996), Azzalini and Capitanio (1999), Azzalini (2005), and Arellano-Valle and Azzalini (2006, 2008).
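Estimation of the density in (7) is rather straightforward. When using maximum likelihood, the log likelihood can be denoted as:

$$\ln L = N \ln 2 - \frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\varepsilon_i^2 + \sum_{i=1}^{N}\ln\Phi\!\left(\frac{\alpha\,\varepsilon_i}{\sigma}\right), \tag{9}$$

where $\varepsilon_i$ is the $i$th observation of the vector $\varepsilon = \ln(y) - \ln(X)\beta$.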
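A minimal sketch of such a maximum likelihood estimation is given below. It uses the standard location-scale skew-normal log density (skewness term evaluated at the residual divided by the scale), simulated data, and a general-purpose optimizer from SciPy. Several choices are assumptions rather than the paper's procedure: the re-parametrisation ($\sigma = \exp(s)$, $\delta = \tanh(t)$, hence $\alpha = \sinh(t)$), the non-zero starting value for $\delta$, the true parameter values, and the hypothetical function names.

```python
import numpy as np
from scipy import optimize, stats


def neg_loglik(theta, y, X):
    """Negative skew-normal log likelihood of a frontier regression."""
    k = X.shape[1]
    beta, log_sigma, t = theta[:k], theta[k], theta[k + 1]
    sigma = np.exp(log_sigma)   # keeps the scale positive
    alpha = np.sinh(t)          # equals delta / sqrt(1 - delta^2) for delta = tanh(t)
    eps = y - X @ beta
    ll = (np.log(2.0)
          + stats.norm.logpdf(eps, scale=sigma)
          + stats.norm.logcdf(alpha * eps / sigma))
    return -ll.sum()


def simulate(n, beta, sigma, delta, rng):
    """Regressors plus a skew-normal error built by convolution."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    z = (delta * np.abs(rng.standard_normal(n))
         + np.sqrt(1 - delta**2) * rng.standard_normal(n))
    return X @ beta + sigma * z, X


def fit(y, X, delta_start=-0.5):
    """Maximise the likelihood, starting from OLS and a non-zero delta.

    delta = 0 is avoided as a start because the likelihood is stationary
    in the skewness direction there when residuals have mean zero.
    """
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    start = np.concatenate(
        [beta_ols, [np.log(resid.std()), np.arctanh(delta_start)]])
    res = optimize.minimize(neg_loglik, start, args=(y, X), method="BFGS")
    k = X.shape[1]
    return {"beta": res.x[:k],
            "sigma": np.exp(res.x[k]),
            "delta": np.tanh(res.x[k + 1])}


rng = np.random.default_rng(42)
true_beta = np.array([1.0, 0.3, 0.7])   # illustrative values only
y, X = simulate(5000, true_beta, sigma=0.4, delta=-0.9, rng=rng)
estimates = fit(y, X)
```

Working with $t$ and $s$ instead of $\delta$ and $\sigma$ keeps the optimizer inside the admissible parameter space without explicit bounds.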
Skew-normal distributions are not much used in econometrics, but they suit this purpose well. They allow us to use a single error term instead of a composite one, which has some benefits (such as clarity) when working with multivariate distributions. Moreover, the interpretation of the parameters also seems more intuitive (using scale, location, and skewness parameters). A disadvantage of using skew-normal distributions is the need to re-parametrize the parameters in order to estimate them properly.
The next subsection introduces a spatial variant of this distribution function and applies it to both spatial lag and spatial error models.

Spatial dependence in stochastic production frontier models
Adopting the skew-normal distribution enables us to directly adopt a spatial lag modelling approach [or SAR model as defined by LeSage and Pace (2009)] as follows:

$$\ln(y) = \rho W \ln(y) + \ln(X)\beta + \varepsilon, \tag{10}$$

where $\varepsilon$ is defined by (8). Likewise for the spatial error model:

$$\ln(y) = \ln(X)\beta + \nu, \qquad \nu = \lambda W \nu + \varepsilon. \tag{11}$$

Some authors, such as Fusco and Vidoli (2013), choose to directly separate the spatial efficiency term and the spatial error structure, corresponding with model (4). Note that, apart from interpretation issues, this actually raises additional difficulties, because $u$ and $v$ should now be related to each other in a very intricate multivariate way.
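To make the two specifications concrete, the sketch below generates data from a spatial lag and a spatial error model with a skew-normal error term. The symbols $\rho$ (spatial lag parameter) and $\lambda$ (spatial error parameter) follow common spatial econometrics notation and are assumptions here. The ring-shaped weight matrix, the parameter values, and the function names are purely illustrative.

```python
import numpy as np


def ring_weights(n, k=2):
    """Row-standardized weights: each unit is linked to k units on each side."""
    W = np.zeros((n, n))
    for i in range(n):
        for step in range(1, k + 1):
            W[i, (i + step) % n] = 1.0
            W[i, (i - step) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def skew_normal_error(n, sigma, delta, rng):
    """Skew-normal draws: weighted sum of a half-normal and a normal."""
    return sigma * (delta * np.abs(rng.standard_normal(n))
                    + np.sqrt(1 - delta**2) * rng.standard_normal(n))


def simulate_sar(X, beta, rho, W, eps):
    """Spatial lag model: (I - rho W) y = X beta + eps."""
    n = len(eps)
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)


def simulate_sem(X, beta, lam, W, eps):
    """Spatial error model: y = X beta + nu with (I - lam W) nu = eps."""
    n = len(eps)
    return X @ beta + np.linalg.solve(np.eye(n) - lam * W, eps)


rng = np.random.default_rng(7)
n, rho, lam = 256, 0.5, 0.8                  # illustrative values only
W = ring_weights(n)
beta = np.array([1.0, 0.3, 0.7])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
eps = skew_normal_error(n, sigma=0.3, delta=-0.8, rng=rng)
y_sar = simulate_sar(X, beta, rho, W, eps)
y_sem = simulate_sem(X, beta, lam, W, eps)
```

In both cases a single skewed error term carries the inefficiency, and the spatial filter is applied either to the dependent variable or to the error.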
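If we now adopt the standard notation that $A = [I - \rho W]$ and $B = [I - \lambda W]$, then the log likelihood specified in (9) is straightforwardly adapted to a spatial lag or spatial error model. For instance, the log likelihood for a stochastic frontier model with spatial dependence in the error term resolves to:

$$\ln L = N \ln 2 + \ln|B| - \frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\,\varepsilon'\varepsilon + \sum_{i=1}^{N}\ln\Phi\!\left(\frac{\alpha\,\varepsilon_i}{\sigma}\right), \tag{12}$$

where $\varepsilon = B\,(\ln(y) - \ln(X)\beta)$.
Finally, I need to calculate the technical efficiencies resulting from the vector $\varepsilon$. For this, I need an expression for $(u \mid \varepsilon)$. Dominguez-Molina et al. (2003) give a generic expression for $u \mid \varepsilon$, namely a normal distribution whose mean and variance depend on $\varepsilon$ and the model parameters. Estimating, e.g., the likelihood in (12) yields the parameter estimates and the residual vector $\hat{\varepsilon}$, which can then be used to draw from $u \mid \varepsilon$ by simulation and derive the expectation for each region.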
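A minimal sketch of a log likelihood of this type is given below: the non-spatial skew-normal log likelihood evaluated at the spatially filtered residuals, plus the log determinant of $B = I - \lambda W$ as the Jacobian term. The standard location-scale skew-normal parametrisation is assumed, and the function name, the ring-shaped weight matrix, and all parameter values are hypothetical.

```python
import numpy as np
from scipy import stats


def sem_frontier_loglik(beta, sigma, alpha, lam, y, X, W):
    """Log likelihood of a skew-normal frontier with a spatial error term."""
    n = len(y)
    B = np.eye(n) - lam * W
    sign, logdet = np.linalg.slogdet(B)      # stable log|B|
    if sign <= 0:
        return -np.inf                       # outside the admissible range
    eps = B @ (y - X @ beta)                 # spatially filtered residuals
    ll = (np.log(2.0)
          + stats.norm.logpdf(eps, scale=sigma)
          + stats.norm.logcdf(alpha * eps / sigma))
    return logdet + ll.sum()


# Illustrative data: ring neighbours, spatial error parameter 0.8.
rng = np.random.default_rng(1)
n = 256
eye = np.eye(n)
W = 0.5 * (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1))
beta = np.array([1.0, 0.3, 0.7])
sigma, delta, lam = 0.3, -0.8, 0.8
alpha = delta / np.sqrt(1 - delta**2)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
eps = sigma * (delta * np.abs(rng.standard_normal(n))
               + np.sqrt(1 - delta**2) * rng.standard_normal(n))
y = X @ beta + np.linalg.solve(eye - lam * W, eps)

ll_spatial = sem_frontier_loglik(beta, sigma, alpha, lam, y, X, W)
ll_nonspatial = sem_frontier_loglik(beta, sigma, alpha, 0.0, y, X, W)
```

Setting the spatial parameter to zero recovers the non-spatial log likelihood, so the two specifications can be compared directly on the same data. In practice the negative of this function would be passed to a numerical optimizer.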

Simulation
Anticipating the empirical application in the next section, I set up the simulation with the following cross-sectional structure (in vector notation):

$$\ln(y) = \beta_0 + \ln(K)\beta_1 + \ln(L)\beta_2 + \varepsilon, \qquad \varepsilon = v - u. \tag{15}$$

Table 1 gives the results of a simulation exercise with only a frontier model. Here, the number of observations (250, 1000, 10,000) is varied as well as the simulated value of $\delta$ (−0.2, −0.5, −0.8). All parameter estimates, except for $\hat{\delta}$, behave as expected and conform to theory: they converge to their true values as the sample size gets bigger. $\hat{\delta}$ also converges to its true value, but only for large sample sizes and large true $\delta$'s in absolute value. Moreover, its standard deviation is much larger than that of the other parameters, making this parameter relatively imprecise to estimate.
To simulate a spatial stochastic frontier model, I use the spatial weight matrix, $W$, from the empirical application (see Sect. 4). So, the model I now estimate is the same as model (15), but now I have in addition the following specification for the error term $\nu$:

$$\nu = \lambda W \nu + \varepsilon, \tag{16}$$

with $\varepsilon$ as before. The size of the weights matrix is 256 × 256, so the sample size is restricted. Therefore, I now vary $\delta$ (with −0.2, −0.5, and −0.8) and $\lambda$ (with 0.2, 0.5, and 0.8). Because of time constraints, I now draw $\ln(K)$, $\ln(L)$, $u$ and $v$ 100 times. Table 2 presents the simulation results of the corresponding frontier model with a spatial error structure. In line with Table 1, it is clear that with small sample sizes (in this case 256), the parameter measuring technical efficiencies ($\delta$) is not very precisely estimated, whereas all other parameters are very close to their true values. When the true $\delta$ is closer to one in absolute value or, to a lesser extent, when $\lambda$ gets higher, estimation becomes slightly more efficient, but not by much.

Empirical application: the efficiency of European regions
In this section, I apply the concept of spatial stochastic frontiers to European NUTS-2 regional production functions. The next subsection first describes concisely the data, and the subsequent subsection gives the estimation results.

Data and specification
NUTS-2 (Nomenclature of Units for Territorial Statistics) is a geocode standard for referencing the subdivisions of European countries for statistical purposes, where the addition 2 stands for the geographical level of more or less provinces. I use two databases. For labour, I use the European regional database by Cambridge Econometrics: a database containing detailed sectoral information about the regional provision of labour (see Cambridge Econometrics 2015). For regional gross value added and capital, I adopt the supply and use tables as used previously in Thissen et al. (2016) and explained in detail in Thissen and Diodato (2013a, b). This allows us to deal with one of the prevailing data problems in this literature: the calculation of the capital stock. Typically, this is done with a perpetual inventory method. However, this could be problematic, since shocks in the capital stock (e.g. by deaths or migrations of firms) do not manifest themselves in the short run. Because there is information on regional value added of capital ($VK$) across regions, sectors and years (so $VK_{r,s,t} = r_{r,s,t} K_{r,s,t}$), I can circumvent this problem by using data on sector-specific interest rates for capital and thus calculate the capital stock per region, year, and sector ($K_{r,s,t}$). To avoid idiosyncratic shocks, the data used are the mean over the period 2000-2010, and the economic sectors that they comprise are: agriculture, energy and manufacturing, construction, distribution market services, and non-market services. The countries included in the estimation can be seen in Fig. 1 and are basically all EU25 countries except for Romania and Bulgaria. The total number of NUTS-2 regions in the dataset is 256. Their geographical distribution is shown in Fig. 1.
To define the spatial weight matrix W, I use a k-nearest neighbour algorithm with k = 4 , where the k-nearest neighbours get a weight of 1. The weights of all other neighbours are set at 0. Finally, I row-standardize W.
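The construction of such a weight matrix can be sketched as follows. The example assumes that each region is represented by a single point (for instance a centroid) with distinct planar coordinates and that neighbours are ranked by Euclidean distance. Neither the distance measure nor the reference points are specified in the text, and the function name `knn_weights` and the random coordinates are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_weights(coords, k=4):
    """Row-standardized k-nearest-neighbour spatial weight matrix.

    coords: (n, 2) array of distinct point coordinates, one row per region.
    """
    n = coords.shape[0]
    tree = cKDTree(coords)
    # Ask for k + 1 neighbours: the nearest point is the region itself.
    _, idx = tree.query(coords, k=k + 1)
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx[:, 1:].ravel()] = 1.0        # weight 1 for the k neighbours
    return W / W.sum(axis=1, keepdims=True)  # row-standardize


# Illustrative coordinates for 256 hypothetical regions.
rng = np.random.default_rng(3)
coords = rng.uniform(size=(256, 2))
W = knn_weights(coords, k=4)
```

After row-standardization each of the four neighbours receives a weight of 0.25. The resulting matrix is generally not symmetric, because being among a region's nearest neighbours is not a mutual relation.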
These data allow us to estimate the following Cobb-Douglas function:

$$\ln Y_r = \beta_0 + \ln(L_r)\beta_1 + \ln(K_r)\beta_2 + \varepsilon_r, \tag{17}$$

where $Y$ is gross value added, $L$ is the number of workers multiplied by the average hours worked per week, $K$ is the amount of capital, $r$ is the region, and $\varepsilon_r$ is an error term that can be distributed normally or skew normally. The next subsection provides the results for various sectors and specifications of the production function of (17).
Table 3 gives the results for various econometric specifications for the energy and manufacturing sector. I start with the OLS estimation. The factor rewards (or output elasticities) for capital and labour do not conform to theory (typically, labour should get an elasticity of around 0.7 and capital of around 0.3), although they are not significantly different from constant returns to scale. A frontier analysis does not alter those strange results, although the likelihood improves significantly. Finally, allowing for spatial dependence (whether in a SEM or a SAR frontier model) does not change the estimates of the factor rewards considerably. However, it is clear that a SEM frontier model performs best in terms of log likelihood. Moreover, a $\lambda$ of almost 0.8 indicates significant spatial dependence in the error terms, which should be reflected in the estimations of the regional technical efficiencies. Figure 4 shows the technical efficiencies across European regions for the energy and manufacturing sector as generated by a model with an error structure as given by Eq. 8. Technical efficiencies are measured between 0.2 and 0.6, and clearly, they are spatially correlated, with relatively high technical efficiencies in the centre of Europe (as in France, Belgium, the Netherlands, and Germany) and relatively low technical efficiencies in Poland, Portugal, Greece, and the northern part of the UK.
Figure 5 shows the difference between the efficiencies generated by a SEM frontier model and the non-spatial technical efficiencies. Clearly, the introduction of spatial dependence ensures that regions in the centre become less efficient and regions in the periphery become relatively more efficient. (The distribution of technical efficiencies over regions becomes more homogeneous.) In other words, regions in the periphery produce technically inefficiently, but less so when taking their geographical location into account. Thus, where regions are located matters just as much as their economic structure.

Results
As the frontier model with spatial error structure performs best in Table 3, I extend the analysis and apply this model to other sectors. The results are shown in Table 4. Clearly, and as already indicated by the simulation exercise, the other sectors do not show evidence of a frontier model structure, as the parameters governing technical efficiencies are not only all close to zero but also have very large standard errors.

In conclusion
The main aim of this paper is to introduce spatial dependence in stochastic production frontier analysis. I do so by using a skew-normal distribution function approach, which I argue is (1) straightforward to estimate, (2) able to combine spatial dependence and technical efficiencies, and (3) able to produce consistent estimates. These results can be interpreted using the discussion on relative and absolute geographic location. The size of endowments and thus maximum attainable production are caused by a mixture of absolute geographic location and historical path dependence. The same holds for regional efficiencies, as it can be argued that they are mainly caused by institutions and social structures. However, spatial dependence measures the location within the network and could thus be a measure for the relative location. Central regions simply perform better because they have better access to production inputs and technology. When comparing regions' performance, it would be fairer to control for the region's location within the network. Obviously, there is more to this, because technical efficiencies themselves may be spatially dependent instead of the error structure in total. (For instance, there are spillovers in the adoption of new technology that improve efficiency, or there are specific, spatially concentrated institutions, such as former guilds or unions, that prohibit the adoption of new technologies.) In any case, when looking at the efficiency of regions, taking into account spatial dependence, whether in the inefficiency part or not, strongly affects the estimates of technical efficiencies in the energy and manufacturing sector.
Unfortunately, the parameter which governs technical efficiencies (in this case the skew-normal parameter $\delta$ or the parameter $\sigma_u$ in the traditional literature) is volatile when the parameter itself is small or when the number of observations is small (which is typically the case in spatial econometrics applications), whether in a spatial setting or not. The simulation exercise shows that this does not affect the other parameters, but that one should be careful in drawing strong conclusions when applying (spatial) frontier analyses with a small number of observations, such as when analysing European regional performance.
For our empirical application, when looking at the energy and manufacturing sector in European regions, taking spatial dependence into account controls more or less for the core-periphery pattern in Europe. Thus, regions in the periphery do not produce that inefficiently only because of their economic structure, but partly as well because of their location and the related diminished access to knowledge and information. Obviously, the estimations are restrictive regarding the data and specification I use. Ideally, one would like to model larger regional datasets, to test the alternative skew-normal approach to spatial stochastic frontier models. A viable avenue for further research would be to use regional panel data instead of cross-sectional data.