Robust stochastic frontier analysis: a Student’s t-half normal model with application to highway maintenance costs in England
Abstract
The presence of outliers in the data has implications for stochastic frontier analysis, and indeed any performance analysis methodology, because they may lead to imprecise parameter estimates and, crucially, lead to an exaggerated spread of efficiency predictions. In this paper we replace the normal distribution for the noise term in the standard stochastic frontier model with a Student’s t distribution, which generalises the normal distribution by adding a shape parameter governing the degree of kurtosis. This has the advantages of introducing flexibility in the heaviness of the tails, which can be determined by the data, as well as containing the normal distribution as a limiting case, and we outline how to test against the standard model. Monte Carlo simulation results for the maximum simulated likelihood estimator confirm that the model recovers appropriate frontier and distributional parameter estimates under various values of the true shape parameter. The simulation results also indicate the influence of a phenomenon we term ‘wrong kurtosis’ in the case of small samples, which is analogous to the issue of ‘wrong skewness’ previously identified in the literature. We apply a Student’s t-half normal cost frontier to data for highways authorities in England, and this formulation is found to be preferred by statistical testing to the comparator normal-half normal cost frontier model. The model yields a significantly narrower range of efficiency predictions, which are non-monotonic at the tails of the residual distribution.
Keywords
Stochastic frontier analysis, Robust analysis, Student's t, Outliers, Wrong Kurtosis
JEL Codes
C12, C13, C44, C46, D24, R42, R53
1 Introduction
Frontier analysis is concerned with the measurement of efficiency relative to an estimated production or cost frontier. The presence of noise in the sample is potentially problematic in two ways: it affects the position of each decision making unit (DMU) relative to the frontier, and it affects the precision of the estimates of the shape of the frontier itself. The magnitudes of these effects vary from one method to another: deterministic methods, such as data envelopment analysis (DEA) (Charnes et al. 1978) and corrected ordinary least squares (COLS) are particularly sensitive, given that they make no allowance for noise. In contrast, Stochastic Frontier (SF) Analysis explicitly controls for noise, mitigating its impact on the estimated frontier and on individual efficiency scores.
The range of efficiency scores may still be very large in the presence of data with outlying observations. The empirical motivation for this paper is our finding of an implausibly wide range of efficiency scores in our work studying cost drivers and cost efficiency in the highways maintenance operations of local authorities in England, which utilises detailed data on operating and capital expenditure provided by each authority. This can be narrowly explained as being due to a combination of under-reporting and over-reporting of expenditure, unobserved investment cycle effects, and extreme weather events. However, we also came across this issue in a number of other datasets, across numerous sectors including hotels in Taiwan (Chen 2007), container ports (Cullinane et al. 2006), regional iron and steel production in China (Lin and Wang 2014), crop farms in Poland (Latruffe et al. 2004), French insurance companies (Fecher et al. 1993), Belgian municipalities (De Borger and Kerstens 1996), US banks (Bauer et al. 1998), and police forces in England and Wales (Drake and Simper 2003). In general, as observed by Gupta and Nguyen (2010), financial data are often heavy tailed. For example, Berger and Humphrey (1991) observe a heavy tailed distribution of costs in banking, an industry to which efficiency analysis is often applied. As such, the issue addressed in this paper has broad applicability across the performance literature.
In this paper we outline and discuss the merits and empirical application of a new stochastic frontier model which accommodates the influence of outlying observations. This is the stochastic frontier model with a Student’s t distribution for the noise term. The advantages of this model over previous proposals lie in its flexibility, since the degree of kurtosis is no longer fixed but allowed to vary with the degrees of freedom parameter. In fact, the Student’s t distribution nests (or more precisely, contains as a limiting case) the normal distribution as the degrees of freedom parameter approaches infinity. For any given distribution of u_{i}, our model encompasses the standard SF model. This enables testing down against the standard model, in contrast to previous proposals which utilise non-nested specifications, and lets the data determine the extent to which outlying observations influence the kurtosis of the noise error term. Thus our model is an original and significant contribution to the literature, not just in being able to better accommodate outlying observations in efficiency analysis relative to the standard SF model, but it is the first contribution to contain as a testable limiting case the standard SF model. As such our model provides a natural extension to the tools of practitioners in the field.
The structure of this paper is as follows: Section 2 reviews existing methods available in the literature to handle a large number of outliers in frontier analysis. Section 3 introduces a t-truncated normal stochastic frontier model for dealing with heavy-tailed noise and discusses efficiency prediction. Testing of the normal distribution hypothesis is considered in Section 4. Section 5 presents Monte Carlo evidence on the performance of the maximum simulated likelihood estimator of the model. Section 6 applies the Student’s t-half normal model to data on highways maintenance costs in England and compares the results to those obtained from the normal-half normal model, and Section 7 gives our summary and conclusions. Some technical results appear in the Appendices.
2 Approaches to outliers in stochastic frontier analysis
Given that the presence of outliers, where these are attributed to noise rather than inefficiency, implies a leptokurtic^{1} noise distribution, the normal distribution usually assumed for v is inadequate since it is mesokurtic for any given parameter values. Intuitively, we may therefore expect outliers in the data to result in an exaggerated spread of efficiency scores for two reasons: first, because of an inflated estimate of the scale of the distribution of u, and second because of insufficient shrinkage of u towards its mean, especially at the extremes. That is to say, if leptokurtosis in the noise term due to outliers is not taken into account, residuals will be attributed disproportionately to inefficiency rather than noise, particularly in outlying observations. This motivates the development of alternative SF models that can accommodate outliers.
A review of the literature in this area provided by Stead et al. (2018) covers several different approaches to outliers: the use of alternative efficiency predictors, thick frontier analysis, heteroskedastic SF models, and the use of alternative noise distributions. Other possible approaches include right truncation of the inefficiency distribution to restrict the range of possible efficiency predictions, and detecting and removing outliers.
The detection and removal (or reweighting) of outliers is a common approach to dealing with outliers in regression analysis, with outliers identified on the basis of the extent to which an observation has a disproportionate effect on parameter estimates, captured by Cook’s distance (Cook 1977), Mahalanobis distance (Mahalanobis 1936), and similar measures. However, existing methods may not be appropriate in the case of skewness of the composed error. For example, Cook’s Distance explicitly assumes normally distributed errors, while in SF analysis the composed error has a skew normal distribution (Azzalini 1985), and in the normal-exponential case it follows what is known as an exponentially modified Gaussian distribution (Grushka 1972). One approach to outliers could be to derive measures of influence and leverage appropriate for SF models, although these would depend upon the particular specification used.
Finally, we can account for outliers by assuming an appropriate distribution for v. In principle, any distribution that is symmetric, centred around zero and unimodal is an appropriate candidate for the distribution of v. Although far more attention has been paid in the literature to the distribution of u, several such alternatives have been suggested. Outside of the SF literature, Lange et al. (1989) suggest the use of the Student’s t distribution for the error terms as a robust alternative to OLS. Lange and Sinsheimer (1993) discuss estimation when errors are drawn from the logistic, slash, t, and contaminated normal distributions. Note that the latter is simply the mixture of two normal distributions with the same mean but differing variances. All of these distributions have heavier tails than the normal distribution, and therefore offer greater robustness to outliers. Reviews of the SF literature by Greene (2008) and Parmeter and Kumbhakar (2014) include some discussion of alternative noise distributions.
In the context of SF analysis, Tancredi (2002) proposes a model in which v is drawn from a Student’s t distribution and u from a half t distribution. Tchumtchoua and Dey (2007) study the Student’s t-half t model in a Bayesian setting. Griffin and Steel (2007) and Hajargasht and Griffiths (2018) also discuss estimation of Bayesian SF models with Student’s t noise using Markov Chain Monte Carlo (MCMC) and variational Bayes methods, respectively. Nguyen (2010) introduces two additional alternative heavy tailed distributions for v, the Laplace distribution and the Cauchy distribution (note that the latter is a Student’s t distribution with degrees of freedom equal to one), and derives formulae for Cauchy-truncated Cauchy and Cauchy-half Cauchy SF models. Applications are shown in Gupta and Nguyen (2010). Horrace and Parmeter (2018) study Laplace-truncated Laplace and Laplace-exponential^{2} SF models. Noting that the Laplace distribution is ordinary smooth, in contrast to the normal distribution, which is supersmooth, they conjecture that the Laplace distribution may be advantageous with respect to the deconvolution of ε into v and u, since optimal convergence rates for deconvolution problems decrease with the smoothness of the noise distribution, being in particular much slower when the noise distribution is supersmooth rather than ordinary smooth (Fan 1991; 1992).
The performance of these models in the presence of outliers is interesting, but under-explored. Following the previous discussion, given the use of heavy-tailed distributions for v, we might expect these models to be more robust to outliers in terms of parameter estimation, and in terms of yielding less extreme efficiency predictions for outlying observations. Indeed, Tancredi (2002) shows that as sε → ∞, f_{u|ε} (u|ε) becomes completely flat in the Student’s t-half t model, in contrast to the normal-half normal model, in which f_{u|ε} (u|ε) becomes a degenerate distribution at zero as sε → ∞. Similarly, Horrace and Parmeter (2018) show that f_{u|ε} (u|ε), and hence E(u|ε), is constant for positive values of sε in the Laplace-exponential case. Focusing more explicitly on outliers, Stead et al. (2018) propose a model in which v follows a logistic distribution and u follows a half normal distribution, and show that the model results in a smaller estimate for VAR(u) and yields a considerably narrower spread of efficiency scores than the normal-half normal model, with little change in exp[E(−u|ε)] for large |sε|. Each of these cases suggests that the choice of a heavy tailed distribution for v significantly affects the prediction of efficiency, and the uncertainty in prediction, at one or both tails, producing less extreme efficiency predictions.
3 The Student’s t-truncated normal SF model
In this section, we propose a robust SF model in which v follows a Student’s t distribution and u follows a truncated normal distribution. The Student’s t distribution for v has a number of attractive properties in the present context. First, it is a heavy tailed distribution, and thus more robust to the presence of outliers. Second, it avoids the arbitrary assumptions on the degree of kurtosis in v embedded in existing models: the normal, logistic, and Laplace distributions have excess kurtosis of 0, 1.2, and 3, respectively, regardless of parameter values. Ultimately, the kurtosis of v is an empirical question, and therefore we should ideally use a distribution for v for which kurtosis is flexible. The kurtosis of the Student’s t distribution depends upon the degrees of freedom parameter. Third, the Student’s t distribution nests the Cauchy distribution when the degrees of freedom parameter is equal to one and the normal distribution as it approaches infinity. Therefore an SF model with a Student’s t distribution for v encompasses a model in which v follows a Cauchy or normal distribution for any given distribution of u. This enables testing against these alternatives. In the latter case, we are testing against the standard SF model, which could be interpreted as a test of robustness to outliers.
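To illustrate how the kurtosis of the Student’s t distribution varies with the degrees of freedom parameter, the short sketch below uses the standard result that the excess kurtosis is 6/(a − 4) for a > 4, infinite for 2 < a ≤ 4, and undefined otherwise. The function name is hypothetical and chosen only for this illustration.

```python
import math


def t_excess_kurtosis(a):
    """Excess kurtosis of a Student's t distribution with a degrees of freedom."""
    if a > 4:
        return 6.0 / (a - 4.0)
    if a > 2:
        return math.inf  # variance is finite but the fourth moment is not
    return math.nan      # variance is not finite, so kurtosis is undefined


if __name__ == "__main__":
    # Excess kurtosis shrinks towards the normal value of 0 as a grows.
    for a in (5, 6, 9, 10, 100):
        print(a, round(t_excess_kurtosis(a), 3))
```

Under this formula, a = 9 reproduces the excess kurtosis of the logistic distribution (1.2) and a = 6 that of the Laplace distribution (3), which gives a rough sense of where those fixed-kurtosis alternatives sit within the Student’s t family.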
Below, we derive simulated log likelihood functions and efficiency predictors for the Student’s t-truncated normal SF model, and discuss estimation and hypothesis testing. Results for the Student’s t-half normal and Student’s t-exponential models can be obtained via some simple modifications. Extensions to other distributions of u are straightforward if the quantile function of that distribution has a closed form, while in many other cases—as with the Student’s t-gamma—the simulated log likelihood function becomes slightly more complex.
3.1 Formulation and estimation
Substituting (13) or (14) into (11) gives the simulated log-likelihood function for the Student’s t-truncated normal and Student’s t-exponential models, respectively. Other proposed distributions for u, such as the Weibull (Tsionas 2007) and Rayleigh (Hajargasht 2015) distributions, also have closed form quantile functions. However, even in cases in which the quantile function has no analytical expression, such as the gamma distribution, forming the simulated log-likelihood is possible; see Greene (2003).
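The general idea of simulating the likelihood through the quantile function of u can be sketched as follows for the Student’s t-half normal case. This is only a minimal illustration under stated assumptions, not a reproduction of equations (11) to (14): it assumes a composed error ε = v − su (with s = 1 for a production frontier and s = −1 for a cost frontier), a scaled Student’s t density for v, and half normal draws obtained by inverting the half normal distribution function. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_logpdf(v, sigma_v, a):
    """Log density of a Student's t variate with scale sigma_v and a degrees of freedom."""
    z = v / sigma_v
    return (math.lgamma((a + 1) / 2) - math.lgamma(a / 2)
            - 0.5 * math.log(a * math.pi) - math.log(sigma_v)
            - (a + 1) / 2 * math.log1p(z * z / a))


def half_normal_quantile(p, sigma_u):
    """Quantile function of |N(0, sigma_u^2)| at p in (0, 1)."""
    return sigma_u * NormalDist().inv_cdf(0.5 + 0.5 * p)


def simulated_loglik(eps, sigma_v, sigma_u, a, draws, s=1):
    """Sum over observations of log[(1/R) * sum_r f_v(eps_i + s * u_r)].

    Assumes eps = v - s*u, so that v = eps + s*u for each simulated u_r.
    """
    u = [half_normal_quantile(p, sigma_u) for p in draws]
    total = 0.0
    for e in eps:
        logs = [t_logpdf(e + s * u_r, sigma_v, a) for u_r in u]
        m = max(logs)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(l - m) for l in logs) / len(logs))
    return total


if __name__ == "__main__":
    # Evenly spaced midpoints stand in for the uniform draws in this example.
    draws = [(r + 0.5) / 200 for r in range(200)]
    residuals = [0.3, -0.8, 1.5, 0.1, -2.4]  # made-up values of y - x'beta
    print(simulated_loglik(residuals, 1.0, 1.0, 5.0, draws, s=-1))
```

In practice the residuals would be computed from the frontier parameters inside the objective function, and the whole expression would be passed to a numerical optimiser.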
Proceeding with the Student’s t-truncated normal variant of the model, (11) and (13) give the simulated log-likelihood function. First order conditions for maximisation are given in Appendix B. One remaining issue is the method of taking random draws. We prefer to use Halton draws, which aim for good coverage of the unit interval rather than randomness; this significantly reduces the number of draws needed to approximate the integral (see Greene (2003) for a fuller discussion).
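For readers unfamiliar with Halton draws, the sketch below generates a one-dimensional Halton sequence using the radical inverse in a prime base. The function names and the choice to discard the first few points (a common but optional convention) are assumptions of this illustration, not details taken from our estimation code.

```python
def halton(index, base=2):
    """Radical inverse of a positive integer index in the given base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result


def halton_draws(n, base=2, skip=10):
    """Return n Halton points in (0, 1), discarding the first `skip` points."""
    return [halton(i, base) for i in range(skip + 1, skip + n + 1)]


if __name__ == "__main__":
    # The base-2 sequence starts 1/2, 1/4, 3/4, 1/8, ... and fills the
    # unit interval progressively more evenly.
    print([halton(i) for i in range(1, 8)])
```

Because successive points fill the gaps left by earlier ones, the sequence covers the unit interval more evenly than the same number of pseudo-random uniforms.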
3.2 Parameter identification
Note that since the moments of the Student’s t distribution are variously undefined or infinite for small a, this is also the case for moments of ε, so these estimators exist only when a is sufficiently large. Further, just as the method of moments approach to estimation of the standard SF model breaks down when μ_{3} ≤ 0, μ_{4} must be sufficiently large that the denominator in (22) is positive for method of moments to be used in this case.^{4} In Section 5, we undertake Monte Carlo simulations to analyse the performance of the model for a variety of values of a: a = 2, a = 4, a = 5, a = 10, and a → ∞, the latter corresponding to v ~ N(0,1).
Well-known results from Waldman (1982) show that the OLS estimator is a stable stationary point of the log-likelihood function for the standard model, and that the identification of the σ_{u} parameter hinges on the skewness of the OLS residuals. Horrace and Parmeter (2018) show that, while these results do not apply to the SF model with Laplace noise since OLS is not the limiting estimator as Var(u) → 0, an analogous result applies: the least absolute deviations (LAD) estimator is a stationary point of the log-likelihood^{5}. It is trivial to show that a similar result applies to our model, i.e. that the ML estimator of a regression model with Student’s t errors is a stationary point in our log-likelihood function. Horrace and Wright (forthcoming) show that such a stationary point exists under very weak distributional assumptions about v and u. This suggests that identification of σ_{u} in our model depends upon the skewness of the residuals from the Student’s t regression model. However, while Horrace and Parmeter (2018) point out that in the Laplace case the LAD stationary point is not stable due to non-differentiability of the log-likelihood function in the limiting case, this does not appear to apply in the Student’s t case, in which the log-likelihood function is everywhere differentiable.
3.3 Efficiency prediction
4 Hypothesis testing concerning the noise distribution
We present simulation evidence regarding the distribution of the LR, under the null hypothesis that v is normally distributed, in Appendix C. We do this to verify that the results of Chen and Liang (2010) do apply in this case. Note that we could reparameterise the model to include an inverse degrees of freedom parameter, 1/a, in which case the standard model is the limiting case as 1/a → 0; however, this boundary remains an open boundary. In particular, we note that in this setting, and the two other examples appearing in the literature discussed above (Weibull and upper truncation point in the inefficiency distribution), the null hypothesis corresponds to a parameter value at an open (and not closed) boundary. Chen and Liang (2010) state that their result still holds when the boundary is an open boundary, but we wish to verify this. The evidence presented in Appendix C lends support to the idea that \(LR\sim \chi _{1:0}^2\) under the null hypothesis. We therefore consider this test to be appropriate for this purpose.
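As a practical illustration, the sketch below computes a p-value for this test, assuming that \(\chi _{1:0}^2\) denotes the equal-weight mixture of a point mass at zero and a \(\chi _1^2\) distribution. The function names and the log-likelihood values in the example are hypothetical.

```python
import math


def lr_statistic(loglik_t, loglik_normal):
    """Likelihood ratio statistic for Student's t noise against normal noise."""
    return 2.0 * (loglik_t - loglik_normal)


def lr_pvalue_boundary(lr):
    """p-value under a 50:50 mixture of a point mass at zero and chi-squared(1)."""
    if lr <= 0:
        return 1.0
    # P(chi2_1 > x) = erfc(sqrt(x / 2)), since chi2_1 is the square of N(0, 1).
    return 0.5 * math.erfc(math.sqrt(lr / 2.0))


if __name__ == "__main__":
    lr = lr_statistic(-120.4, -123.1)  # made-up log-likelihood values
    print(round(lr, 3), round(lr_pvalue_boundary(lr), 4))
```

Under this mixture the 5% critical value is approximately 2.706, the 10% critical value of the \(\chi _1^2\) distribution, so the test rejects normality more readily than a naive comparison against \(\chi _1^2\) would.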
5 Monte Carlo simulations
And v is a standard Student’s t random variable with degrees of freedom parameter a = 2, a = 4, a = 5, a = 10, and a → ∞, respectively. Note that in the latter case, v is normally distributed, so we draw from the standard normal distribution. For each of these DGPs, we then vary the number of observations used in each replication. We consider the cases N = 100, N = 200, and N = 1000. This will give some insight into the small sample performance of the model.
Table 1 Normal-half normal vs. Student's t-half normal model under various data generating processes (1000 replications)
Sample size | | N = 100 | | | | | | N = 200 | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model | | Normal-half normal | | | Student’s t-half normal | | | Normal-half normal | | | Student’s t-half normal | | |
DGP 1 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 0.765 | 1.429 | 1.641 | 1.241 | 1.427 | 0.589 | 0.824 | 1.528 | 1.876 | 1.237 | 1.239 | 0.514 |
β _{1} | 1 | 1.001 | 0.983 | 0.547 | 0.991 | 0.993 | 0.166 | 0.974 | 0.991 | 0.547 | 0.994 | 0.998 | 0.116 |
σ _{v} | 1 | 2.243 | 1.974 | 2.209 | 0.995 | 1.032 | 0.295 | 2.537 | 2.209 | 2.207 | 1.025 | 1.053 | 0.219 |
σ _{u} | 1 | 1.326 | 0.004 | 2.948 | 0.847 | 0.433 | 4.397 | 1.211 | 0.004 | 2.795 | 0.922 | 0.718 | 4.936 |
a | 2 | ∞ | ∞ | – | 4.1E+04 | 2.073 | 7.0E+05 | ∞ | ∞ | – | 2.233 | 2.118 | 0.689 |
DGP 2 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 1.132 | 1.057 | 0.636 | 1.246 | 1.484 | 0.583 | 1.156 | 0.983 | 0.577 | 1.260 | 1.377 | 0.522 |
β _{1} | 1 | 0.996 | 0.999 | 0.163 | 0.997 | 1.000 | 0.146 | 0.997 | 1.001 | 0.110 | 0.998 | 1.000 | 0.098 |
σ _{v} | 1 | 1.301 | 1.316 | 0.307 | 1.008 | 1.056 | 0.262 | 1.339 | 1.353 | 0.242 | 1.032 | 1.066 | 0.207 |
σ _{u} | 1 | 0.831 | 0.990 | 0.839 | 0.706 | 0.333 | 0.755 | 0.810 | 1.057 | 0.771 | 0.687 | 0.578 | 0.674 |
a | 4 | ∞ | ∞ | – | 3.3E+07 | 4.545 | 8.3E+08 | ∞ | ∞ | – | 8.4E+05 | 4.473 | 2.6E+07 |
DGP 3 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 1.143 | 1.010 | 0.603 | 1.221 | 1.383 | 0.595 | 1.174 | 0.985 | 0.534 | 1.229 | 1.197 | 0.527 |
β _{1} | 1 | 1.005 | 1.008 | 0.150 | 1.004 | 1.004 | 0.137 | 1.006 | 1.008 | 0.100 | 1.004 | 1.004 | 0.094 |
σ _{v} | 1 | 1.203 | 1.218 | 0.265 | 0.982 | 1.027 | 0.263 | 1.247 | 1.250 | 0.194 | 1.023 | 1.057 | 0.205 |
σ _{u} | 1 | 0.829 | 1.010 | 0.779 | 0.749 | 0.580 | 0.759 | 0.784 | 1.030 | 0.700 | 0.723 | 0.764 | 0.680 |
a | 5 | ∞ | ∞ | – | 2.8E+08 | 5.759 | 4.6E+09 | ∞ | ∞ | – | 3.0E+07 | 5.652 | 8.3E+08 |
DGP 4 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 1.120 | 0.987 | 0.519 | 1.158 | 1.078 | 0.571 | 1.155 | 1.005 | 0.464 | 1.166 | 1.016 | 0.509 |
β _{1} | 1 | 0.999 | 1.001 | 0.121 | 1.001 | 1.000 | 0.122 | 1.002 | 1.003 | 0.092 | 1.003 | 1.004 | 0.092 |
σ _{v} | 1 | 1.060 | 1.082 | 0.215 | 0.936 | 0.985 | 0.249 | 1.096 | 1.095 | 0.161 | 0.983 | 1.001 | 0.191 |
σ _{u} | 1 | 0.856 | 1.050 | 0.660 | 0.821 | 0.957 | 0.730 | 0.814 | 1.017 | 0.592 | 0.808 | 1.027 | 0.652 |
a | 10 | ∞ | ∞ | – | 2.9E+08 | 10.983 | 2.6E+09 | ∞ | ∞ | – | 2.1E+08 | 11.015 | 2.4E+09 |
DGP 5 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 1.075 | 0.959 | 0.478 | 1.067 | 0.897 | 0.546 | 1.078 | 0.955 | 0.404 | 1.041 | 0.892 | 0.453 |
β _{1} | 1 | 1.000 | 0.997 | 0.124 | 1.000 | 0.996 | 0.127 | 1.003 | 1.001 | 0.084 | 1.004 | 1.001 | 0.085 |
σ _{v} | 1 | 0.936 | 0.944 | 0.178 | 0.864 | 0.889 | 0.218 | 0.973 | 0.972 | 0.137 | 0.914 | 0.914 | 0.165 |
σ _{u} | 1 | 0.902 | 1.062 | 0.593 | 0.923 | 1.159 | 0.682 | 0.900 | 1.047 | 0.501 | 0.956 | 1.156 | 0.568 |
a | ∞ | ∞ | ∞ | – | 5.9E+08 | 361.929 | 7.6E+09 | ∞ | ∞ | – | 2.8E+08 | 152.462 | 2.2E+09 |
Sample size | | N = 1000 | | | | | |
---|---|---|---|---|---|---|---|
Model | | Normal-half normal | | | Student’s t-half normal | | |
DGP 1 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 1.157 | 1.706 | 2.206 | 1.220 | 1.091 | 0.400 |
β _{1} | 1 | 0.988 | 0.997 | 0.491 | 0.999 | 0.998 | 0.049 |
σ _{v} | 1 | 3.115 | 2.607 | 3.412 | 1.047 | 1.052 | 0.125 |
σ _{u} | 1 | 0.759 | 0.003 | 2.733 | 0.724 | 0.876 | 0.517 |
a | 2 | ∞ | ∞ | – | 2.118 | 2.107 | 0.274 |
DGP 2 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 1.183 | 0.941 | 0.497 | 1.188 | 1.043 | 0.401 |
β _{1} | 1 | 1.000 | 1.000 | 0.049 | 0.999 | 1.001 | 0.045 |
σ _{v} | 1 | 1.390 | 1.378 | 0.146 | 1.033 | 1.032 | 0.130 |
σ _{u} | 1 | 0.774 | 1.084 | 0.645 | 0.772 | 0.957 | 0.519 |
a | 4 | ∞ | ∞ | – | 4.359 | 4.222 | 0.971 |
DGP 3 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 1.160 | 0.946 | 0.455 | 1.191 | 1.044 | 0.408 |
β _{1} | 1 | 1.000 | 0.998 | 0.044 | 1.000 | 0.999 | 0.042 |
σ _{v} | 1 | 1.281 | 1.264 | 0.122 | 1.031 | 1.024 | 0.132 |
σ _{u} | 1 | 0.805 | 1.087 | 0.588 | 0.771 | 0.966 | 0.531 |
a | 5 | ∞ | ∞ | – | 5.582 | 5.262 | 1.692 |
DGP 4 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 1.099 | 1.002 | 0.326 | 1.070 | 0.953 | 0.352 |
β _{1} | 1 | 1.000 | 0.998 | 0.040 | 1.000 | 0.998 | 0.040 |
σ _{v} | 1 | 1.120 | 1.112 | 0.090 | 0.999 | 0.990 | 0.118 |
σ _{u} | 1 | 0.878 | 1.005 | 0.414 | 0.925 | 1.074 | 0.450 |
a | 10 | ∞ | ∞ | – | 3.7E+04 | 10.266 | 5.4E+05 |
DGP 5 | True value | Mean | Median | St. Dev. | Mean | Median | St. Dev. |
β _{0} | 1 | 1.034 | 0.990 | 0.213 | 0.971 | 0.922 | 0.232 |
β _{1} | 1 | 1.000 | 0.999 | 0.036 | 1.000 | 0.999 | 0.036 |
σ _{v} | 1 | 0.999 | 0.996 | 0.071 | 0.959 | 0.952 | 0.084 |
σ _{u} | 1 | 0.954 | 1.008 | 0.264 | 1.046 | 1.104 | 0.292 |
a | ∞ | ∞ | ∞ | – | 6.1E+08 | 124.929 | 1.3E+10 |
Across each of the five DGPs, and regardless of the number of observations, the mean parameter estimates from the Student’s t-half normal model are similar to those from the normal-half normal model, and both tend to be reasonably close to the true values. An exception to this is that in the N = 100 and N = 200 cases for the DGPs with finite a, the mean estimates of the degrees of freedom parameter are many times greater than the true values from the DGP.
However, the median estimates of a are much closer to the true values. The reason for the large difference between the mean and median estimates of a is that the distributions of the estimates are skewed by a small number of extremely large values. We find that these very large values occur in repetitions in which the kurtosis of the OLS residuals is low, and specifically where the excess kurtosis (i.e. kurtosis over and above that of the normal distribution) is near to or less than zero. This is significant since, as with the discussion in Section 3.2, it further suggests a link between excess kurtosis and identification of the degrees of freedom parameter a, analogous to the link between skewness and identification of the σ_{u} parameter in the standard model.
In the N = 1000 repetitions, this ‘wrong kurtosis’ issue does not arise in the a = 4 or a = 5 cases, but does in the a = 10 case. In the a = 2 case, ‘wrong kurtosis’ does not arise in either the N = 200 or N = 1000 repetitions. Intuitively, this is due to the fact that the Student’s t distribution with a = 10 already closely resembles the normal distribution, and further increases in a yield relatively small changes to the shape of the distribution. Just as the probability of ‘wrong skew’ arising in the standard model decreases as the number of observations or the signal to noise ratio, or both, increase (Simar and Wilson 2010), it appears that the probability of ‘wrong kurtosis’ is negatively related to sample size and positively related to the degrees of freedom parameter.
Finally, on the subject of ‘wrong kurtosis’, it is important to note that whilst this phenomenon is similar to the ‘wrong skew’ problem in terms of potentially arising due to the luck of small sample draws from a ‘correct kurtosis’ distribution, the practical implications of this occurrence are not as severe as in the wrong skew case. In the wrong kurtosis case, estimation of the Student’s t-half normal stochastic frontier model recovers the normal-half normal model estimates, whilst under ‘wrong skew’ the estimate of the variance of the inefficiency distribution approaches zero, indicating no evidence of inefficiency.
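A simple diagnostic along these lines is to compute the sample excess kurtosis of the OLS residuals before estimation. The sketch below is illustrative only: the function names are hypothetical, and the cut-off of zero is an assumption standing in for ‘near to or less than zero’.

```python
def excess_kurtosis(residuals):
    """Sample excess kurtosis: m4 / m2^2 - 3, using central moments."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    return m4 / (m2 * m2) - 3.0


def has_wrong_kurtosis(residuals, threshold=0.0):
    """Flag samples whose residuals are no more heavy tailed than the normal."""
    return excess_kurtosis(residuals) <= threshold


if __name__ == "__main__":
    light = [-1.0, 1.0, -1.0, 1.0]
    heavy = [-10.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 10.0]
    print(has_wrong_kurtosis(light), has_wrong_kurtosis(heavy))
```

A sample flagged in this way would be expected to yield a very large estimate of a, so that the Student’s t-half normal estimates effectively coincide with those of the normal-half normal model.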
Estimates of σ_{v} from the normal-half normal model for DGPs 1–4 tend to be greater than 1, as expected. For DGPs 2–4, in which VAR(v) is defined, the standard deviations of v are 1.414, 1.291, and 1.118, respectively. Interestingly, the mean estimates of σ_{v} from the normal-half normal model correspond closely to these values. This suggests that, although the normal-half normal model cannot capture the kurtosis of v, it approximates the variance of the noise term relatively well.
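The standard deviations quoted above follow from the variance of a scaled Student’s t variate, Var(v) = σ_{v}^{2} a/(a − 2) for a > 2. The short check below reproduces them with σ_{v} = 1; the function name is hypothetical.

```python
import math


def t_std_dev(a, sigma_v=1.0):
    """Standard deviation of a scaled Student's t variate; requires a > 2."""
    if a <= 2:
        raise ValueError("Var(v) is not finite for a <= 2")
    return sigma_v * math.sqrt(a / (a - 2.0))


if __name__ == "__main__":
    for a in (4, 5, 10):
        print(a, round(t_std_dev(a), 3))
    # 4 1.414
    # 5 1.291
    # 10 1.118
```

This also makes clear why no comparable figure exists for DGP 1: with a = 2 the variance of v is not finite.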
Looking at the standard deviations of the parameter estimates from our model, we can see that the sampling variation is generally less than that from the normal-half normal for the a = 2, a = 4 and a = 5 cases. This reflects the greater sensitivity of the standard model to outlying observations, and suggests that our model is indeed more robust to the presence of heavy tails. For the a = 10 and v ~ N(0,1) cases, the estimates from the normal-half normal model generally have lower standard deviations than those from our model.
Table 2 Summary of within-replication differences in parameter estimates
DGP | Parameter | N = 100 Mean | N = 100 St. Dev. | N = 200 Mean | N = 200 St. Dev. | N = 1000 Mean | N = 1000 St. Dev. |
---|---|---|---|---|---|---|---|
DGP 1 (a = 2) | β _{0} | 0.477 | 1.634 | 0.413 | 1.854 | 0.062 | 2.185 |
DGP 1 (a = 2) | β _{1} | −0.010 | 0.524 | 0.021 | 0.534 | 0.011 | 0.488 |
DGP 1 (a = 2) | σ _{v} | −1.248 | 2.263 | −1.512 | 2.262 | −2.068 | 3.432 |
DGP 1 (a = 2) | σ _{u} | −0.479 | 5.254 | −0.289 | 5.659 | −0.035 | 2.710 |
DGP 1 (a = 2) | \(\sqrt {{\mathrm{Var}}\left( v \right)}\) | – | – | – | – | – | – |
DGP 2 (a = 4) | β _{0} | 0.114 | 0.599 | 0.105 | 0.550 | 0.004 | 0.500 |
DGP 2 (a = 4) | β _{1} | 0.001 | 0.086 | 0.001 | 0.053 | 0.000 | 0.022 |
DGP 2 (a = 4) | σ _{v} | −0.293 | 0.321 | −0.307 | 0.256 | −0.357 | 0.169 |
DGP 2 (a = 4) | σ _{u} | −0.125 | 0.798 | −0.124 | 0.724 | −0.002 | 0.649 |
DGP 2 (a = 4) | \(\sqrt {{\mathrm{Var}}\left( v \right)}\) | 0.199 | 0.685 | 0.131 | 0.343 | 0.051 | 0.127 |
DGP 3 (a = 5) | β _{0} | 0.078 | 0.515 | 0.055 | 0.483 | 0.031 | 0.420 |
DGP 3 (a = 5) | β _{1} | −0.001 | 0.061 | −0.001 | 0.035 | 0.000 | 0.015 |
DGP 3 (a = 5) | σ _{v} | −0.221 | 0.273 | −0.224 | 0.197 | −0.249 | 0.135 |
DGP 3 (a = 5) | σ _{u} | −0.080 | 0.678 | −0.061 | 0.632 | −0.034 | 0.544 |
DGP 3 (a = 5) | \(\sqrt {{\mathrm{Var}}\left( v \right)}\) | 0.111 | 0.341 | 0.067 | 0.211 | 0.037 | 0.103 |
DGP 4 (a = 10) | β _{0} | 0.038 | 0.370 | 0.012 | 0.284 | −0.029 | 0.237 |
DGP 4 (a = 10) | β _{1} | 0.002 | 0.031 | 0.001 | 0.020 | 0.000 | 0.007 |
DGP 4 (a = 10) | σ _{v} | −0.123 | 0.172 | −0.113 | 0.126 | −0.121 | 0.088 |
DGP 4 (a = 10) | σ _{u} | −0.036 | 0.484 | −0.005 | 0.366 | 0.047 | 0.307 |
DGP 4 (a = 10) | \(\sqrt {{\mathrm{Var}}\left( v \right)}\) | 0.033 | 0.164 | 0.011 | 0.094 | −0.002 | 0.063 |
DGP 5 (a → ∞) | β _{0} | −0.009 | 0.249 | −0.037 | 0.186 | −0.064 | 0.075 |
DGP 5 (a → ∞) | β _{1} | 0.000 | 0.022 | 0.001 | 0.010 | 0.000 | 0.002 |
DGP 5 (a → ∞) | σ _{v} | −0.072 | 0.110 | −0.059 | 0.084 | −0.040 | 0.033 |
DGP 5 (a → ∞) | σ _{u} | 0.021 | 0.322 | 0.057 | 0.240 | 0.092 | 0.096 |
DGP 5 (a → ∞) | \(\sqrt {{\mathrm{Var}}\left( v \right)}\) | −0.003 | 0.104 | −0.013 | 0.066 | −0.018 | 0.026 |
From the above, we can see that, as expected following the discussion in Section 2, the Student’s t-half normal model tends to produce lower estimates of σ_{u} for a given replication. This holds for DGPs 1, 2, and 3, in which the kurtosis of v is greatest. Also, for DGPs 1–3, we see that the normal-half normal model yields higher estimates of σ_{v}; however, given the discussion around the differing interpretations of σ_{v} in the two models, we have also looked at the differences in \(\sqrt {{\mathrm{Var}}\left( v \right)}\) directly. Table 2 shows that, while the Student’s t-half normal model tends to yield lower estimates of σ_{v} when a is small, its estimates of the standard deviation of v in fact tend to be higher. This supports the expectation, following the discussion in Section 2 and in Stead et al. (2018), that a model with a heavy-tailed distribution in v should attribute more of the overall error variance to noise than to inefficiency.
6 Application to highways maintenance costs in England
In this section, we apply a Student’s t-half normal model to the dataset on highway maintenance costs in England used by Stead et al. (2018). In England, responsibility for road maintenance is divided between Highways England—until 2015 the Highways Agency—a publicly-owned company responsible for the strategic ‘trunk road network’, and the county councils and unitary authorities, who maintain the non-trunk roads within their boundaries. Our data are from the CQC Efficiency Network^{6} and consist of costs and cost drivers associated with local authorities’ highway maintenance activities.
The majority of previous studies of road maintenance costs have focussed on the issue of marginal costs, and their implications for road pricing, rather than relative efficiency. These use data on motorways and canton roads in Switzerland (Schreyer et al. 2002), motorways in Austria (Sedlacek and Herry 2002), roads in Sweden (Haraldsson 2006; Jonsson and Haraldsson 2008), trunk roads in Poland (Bak et al. 2006; Bak and Borkowski 2009), and motorways and federal roads in Germany (Link 2006; 2009; 2014).
With respect to efficiency studies, Wheat (2017) undertakes the first study of local road maintenance costs in Britain, utilising a forerunner of the dataset under consideration in this paper. The author considers the optimal scale of operation as well as evidence for the cost efficiency of local highway authorities, using a normal-half normal stochastic frontier model. A further study relating to efficiency in road maintenance is that of Fallah-Fini et al. (2009), which applies DEA to data on eight counties of the US state of Virginia, with expenditure, traffic and equivalent single axle loads as inputs, road area and quality indicators as outputs, and climate factors as non-discretionary variables.
We used an unbalanced panel consisting of data on the 70 English unitary authorities and county councils that were members of the CQC Efficiency Network in 2015–16 and supplied data for at least one year from 2009–10 to that year; this gives us 327 observations in total. Cost data were supplied to the network by each authority according to definitions agreed by a working group of network members, and relate to carriageway maintenance activities only, i.e. they exclude costs associated with related activities such as winter service and footway maintenance. The dataset is updated annually for a new round of analysis. In this study we use the dataset from the 2015–16 round, which was the first year that the network ran. We observe large differences in unit costs, with a large number of extreme outliers in both directions, which are clearly the result of reporting errors. As a result, standard SF models yield a wide range of efficiency predictions, motivating the development of robust SF methods.
In line with the previous literature mentioned above—see Link (2014) for a summary—we use road length and traffic variables as output variables. Detailed breakdowns of local authorities’ total road lengths into urban and rural and by classification are publicly available from the Department for Transport (DfT). Roads in the UK are classified as: M (motorways), A, B, classified unnumbered, or unclassified; we hereafter refer to the latter two as C and U, respectively. The trunk road network, maintained in England by Highways England, consists of motorways and trunk A roads, leaving non-trunk A roads and all B, C, and U roads as the responsibility of local authorities. Our road length data therefore exclude motorways and trunk A roads, and likewise we use traffic data supplied by DfT which relate only to local authority maintained roads.
Table 3 Outputs from the Student’s t-half normal and normal-half normal models
Parameter | Student’s t-half normal: Estimate | s.e. | Sig | Normal-half normal: Estimate | s.e. | Sig |
---|---|---|---|---|---|---|
β_{0} | 16.058 | 0.092 | *** | 16.035 | 0.145 | *** |
β_{1} (ln URL) | 0.149 | 0.107 | | 0.127 | 0.171 | |
β_{2} (ln RRL) | 0.895 | 0.113 | *** | 0.917 | 0.179 | *** |
β_{3} (ln URL^{2}) | 0.236 | 0.043 | *** | 0.241 | 0.063 | *** |
β_{4} (ln RRL^{2}) | 0.082 | 0.010 | *** | 0.085 | 0.016 | *** |
β_{5} (ln URL ln RRL) | −0.064 | 0.028 | ** | −0.081 | 0.044 | * |
β_{6} (ln TRAFFIC) | 0.366 | 0.099 | *** | 0.415 | 0.154 | *** |
β_{7} (RDCA) | 0.432 | 0.094 | *** | 0.464 | 0.144 | *** |
β_{8} (RDCBC) | −0.071 | 0.026 | *** | −0.071 | 0.039 | * |
β_{9} (RDCU) | −0.004 | 0.003 | | −0.005 | 0.005 | |
β_{10} (PROP_{UA}) | 8.690 | 1.941 | *** | 7.810 | 3.241 | ** |
β_{11} (PROP_{UB}) | 1.642 | 2.279 | | 0.662 | 3.869 | |
β_{12} (PROP_{UC}) | 0.273 | 1.196 | | 0.448 | 2.054 | |
β_{13} (PROP_{UU}) | 0.969 | 0.547 | * | 1.090 | 0.835 | |
β_{14} (PROP_{RA}) | 2.612 | 1.045 | ** | 2.120 | 1.571 | |
β_{15} (PROP_{RB}) | 2.417 | 1.056 | ** | 2.678 | 1.544 | * |
β_{16} (PROP_{RC}) | 1.015 | 0.641 | | 0.983 | 0.988 | |
β_{17} (YEAR) | 0.038 | 0.011 | *** | 0.045 | 0.017 | *** |
β_{18} (ln WAGE) | 0.779 | 0.223 | *** | 0.891 | 0.340 | *** |
(1−β_{18}) (ln ROCOSM) | 0.221 | – | – | 0.109 | – | – |
σ_{u} | 0.535 | 0.046 | – | 0.568 | 0.015 | – |
σ_{v} | 0.233 | 0.016 | – | 0.276 | 0.030 | – |
a | 5.198 | 1.510 | – | ∞ | – | – |
Log Likelihood | −187.06 | – | – | −189.14 | – | – |
Table 4 Estimated error variances
Variance | Student’s t-half normal | Normal-half normal |
---|---|---|
VAR(u) | 0.104 | 0.117 |
VAR(v) | 0.088 | 0.076 |
VAR(ε) | 0.192 | 0.194 |
Table 4 shows that, in line with our expectations and with the findings of Stead et al. (2018) relating to the logistic-half normal model, the Student’s t-half normal model results in a lower estimated variance of inefficiency than the normal-half normal model, and that more of the total error variance is attributed to v. The overall error variance is also slightly lower, likewise mirroring the results from the logistic-half normal model. Following from this, we expect to find a considerably narrower distribution of efficiency predictions from the Student’s t-half normal model, owing to both the lower estimated VAR(u) and the greater shrinkage towards the mean resulting from the higher estimated VAR(v).
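The variances in Table 4 can be reproduced from the point estimates in Table 3. The sketch below assumes the standard moment formulas, which are not restated in this section: VAR(u) = (1 − 2/π)σ_{u}^{2} for a half normal u, VAR(v) = σ_{v}^{2}a/(a − 2) for a Student’s t v with a > 2, and VAR(v) = σ_{v}^{2} for a normal v. The function names are illustrative only.

```python
import math


def var_half_normal(sigma_u):
    """Variance of a half normal variable with scale parameter sigma_u."""
    return (1.0 - 2.0 / math.pi) * sigma_u ** 2


def var_student_t(sigma_v, a):
    """Variance of a scaled Student's t variable; finite only when a > 2."""
    if a <= 2.0:
        raise ValueError("the variance of a Student's t requires a > 2")
    return sigma_v ** 2 * a / (a - 2.0)


# Point estimates taken from Table 3.
var_u_t = var_half_normal(0.535)        # Student's t-half normal model
var_v_t = var_student_t(0.233, 5.198)
var_u_n = var_half_normal(0.568)        # normal-half normal model
var_v_n = 0.276 ** 2                    # normal noise: variance is sigma_v squared

print(f"Student's t-half normal: VAR(u)={var_u_t:.3f}, VAR(v)={var_v_t:.3f}")
print(f"Normal-half normal:      VAR(u)={var_u_n:.3f}, VAR(v)={var_v_n:.3f}")
```

Note that σ_{v} is lower in the Student’s t-half normal model while VAR(v) is higher, because the factor a/(a − 2) inflates the variance when a is small.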
Table 5 Summary of efficiency scores
Statistic | Student’s t-half normal | Normal-half normal |
---|---|---|
Mean | 0.721 | 0.660 |
Minimum | 0.527 | 0.225 |
Maximum | 0.855 | 0.918 |
Given the similarity of the frontier parameter estimates, as shown in Table 3, the residuals from the Student’s t-half normal and normal-half normal models are highly correlated. However, the relationship between the residuals and the efficiency predictions is shown by Fig. 2 to be substantially different between the two models: for values of residuals between around −0.5 and 1.5, the slope of the function is considerably flatter in the Student’s t-half normal case, so that a change in ε results in a much smaller change in exp[−E(u|ε)]. The most striking difference, however, is that the relationship is non-monotonic in the Student’s t-half normal case, with exp[−E(u|ε)] beginning to decrease slightly for the smallest values of ε and increase very considerably for the largest values of ε. This is in contrast to the standard SF model, and also the logistic-half normal model as shown by Stead et al. (2018), in which the relationship is monotonic.
The explanation for the large reduction in the range of the efficiency predictions relative to those from the standard model is therefore twofold. First, the use of a heavy-tailed distribution for v results in greater shrinkage of efficiency predictions at the tails, as is the case in the logistic-half normal model (Stead et al. 2018). Second, the non-monotonicity in E(u|ε) means that the highest and lowest efficiency scores belong not to the most outlying observations in either direction, as in the standard model, but to observations with less extreme estimated residuals.
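The non-monotonicity can be illustrated numerically. The following is a minimal sketch, not the estimation code used in the paper: it assumes the usual cost frontier composed error ε = v + u, so that E(u|ε) is the ratio of ∫u f_{v}(ε − u)f_{u}(u)du to ∫f_{v}(ε − u)f_{u}(u)du over u ≥ 0, and evaluates both integrals by a simple midpoint rule. Normalising constants cancel in the ratio, so unnormalised density kernels are used. The parameter values are the point estimates from Table 3; the function names, integration range and grid size are arbitrary choices.

```python
import math


def t_kernel(v, sigma_v, a):
    """Unnormalised Student's t density with scale sigma_v and shape a."""
    return (1.0 + (v / sigma_v) ** 2 / a) ** (-(a + 1.0) / 2.0)


def normal_kernel(v, sigma_v):
    """Unnormalised normal density with standard deviation sigma_v."""
    return math.exp(-0.5 * (v / sigma_v) ** 2)


def half_normal_kernel(u, sigma_u):
    """Unnormalised half normal density, evaluated for u >= 0."""
    return math.exp(-0.5 * (u / sigma_u) ** 2)


def expected_u(eps, noise_kernel, sigma_u, upper=8.0, steps=8000):
    """E(u | eps) for eps = v + u, by midpoint-rule integration over [0, upper]."""
    h = upper / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        w = noise_kernel(eps - u) * half_normal_kernel(u, sigma_u)
        num += u * w
        den += w
    return num / den


def t_noise(v):
    # Student's t noise with the Table 3 estimates of sigma_v and a.
    return t_kernel(v, 0.233, 5.198)


def n_noise(v):
    # Normal noise with the Table 3 estimate of sigma_v.
    return normal_kernel(v, 0.276)


for eps in (-4.0, -1.0, 0.0, 1.0, 2.0, 4.0):
    eff_t = math.exp(-expected_u(eps, t_noise, 0.535))
    eff_n = math.exp(-expected_u(eps, n_noise, 0.568))
    print(f"eps={eps:5.1f}  t-half normal: {eff_t:.3f}  normal-half normal: {eff_n:.3f}")
```

Under these assumptions the normal-half normal predictions fall steadily as ε increases, whereas the Student’s t-half normal predictions turn back at both extremes. The values of ε used here extend beyond the range of the estimated residuals purely to make the turning points visible.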
The intuitive explanation for this non-monotonicity in the Student’s t-half normal case is that for outlying observations, the uncertainty associated with exp[−E(u|ε)] increases, and further increases in |ε| are attributed to v to such an extent that there is a reduction in exp[−E(u|ε)]. More formally, we know that the monotonicity of E(u|ε) is linked to the log-concavity property of the distribution of v: specifically, E(u|ε) is a weakly (strictly) monotonic function of ε for any weakly (strictly) log-concave f_{v} (Ondrich and Ruggiero 2001). Goldberger (1983) discusses the log-concavity of the (standard) normal, logistic, Laplace and Student’s t distributions, and these results are easily extended to the nonstandardised cases with scaling σ_{v}. It is notable that each of these distributions is log-concave everywhere, with the sole exception of the Student’s t distribution, which is log-concave where \(\left| v \right| \le \sqrt a \sigma _v\) but log-convex where \(\left| v \right| \ge \sqrt a \sigma _v\), i.e. at the tails^{7}. This explains why E(u|ε) can be non-monotonic, changing direction at the tails, when v follows a Student’s t distribution, whereas E(u|ε) has been noted to be either weakly or strictly monotonic everywhere when v follows a normal, logistic, or Laplace distribution.
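The log-concavity boundary can be checked directly. As a brief sketch, assume the usual scaled Student’s t density with scale σ_{v} and shape a, and let c denote its normalising constant, which plays no role in the result:

\[ \ln f_v\left( v \right) = \ln c - \frac{{a + 1}}{2}\ln \left( {1 + \frac{{v^2}}{{a\sigma _v^2}}} \right) \]

\[ \frac{{d\ln f_v\left( v \right)}}{{dv}} = - \frac{{\left( {a + 1} \right)v}}{{a\sigma _v^2 + v^2}},\quad \frac{{d^2\ln f_v\left( v \right)}}{{dv^2}} = - \left( {a + 1} \right)\frac{{a\sigma _v^2 - v^2}}{{\left( {a\sigma _v^2 + v^2} \right)^2}} \]

The second derivative is negative when \(v^2 < a\sigma _v^2\) and positive when \(v^2 > a\sigma _v^2\), so the density is log-concave in the centre and log-convex in both tails, with the change occurring at \(\left| v \right| = \sqrt a \sigma _v\). With the point estimates in Table 3 (a = 5.198, σ_{v} = 0.233), this boundary lies at approximately |v| = 0.53. Note that the boundary is expressed in terms of the noise term v, not the composed residual ε.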
Generalising slightly beyond this particular empirical application, the finding of non-monotonicity in efficiency predictions with respect to residuals may be useful for applications in economic regulation (RPI-X regulation). This model has interesting incentive properties which may help overcome informational asymmetries in economic regulation between the regulated firms and the regulator. In particular, the non-monotonicity property should help discourage firms from trying to game the process by submitting over-favourable data, e.g. under-reporting costs in the case of cost benchmarking.
As discussed in Section 4, we are interested in testing whether v is normally distributed, in which case the Student’s t-half normal model reduces to its limiting case, the normal-half normal model. In addition to this hypothesis, we are also interested in testing the null hypothesis of no inefficiency. The likelihood ratio follows a \(\chi _{1:0}^2\) distribution in both cases. Log-likelihoods are given in Table 3, and are used to calculate the likelihood ratio statistic as shown in (32). For the first null hypothesis, this gives a likelihood ratio of 4.155 and a corresponding p-value of 0.021. For our second null hypothesis, the Student’s t-half normal model reduces to a regression model with Student’s t errors: we do not report the results of this regression, except the log-likelihood, which is −189.140. The corresponding likelihood ratio statistic is therefore 4.150 and the p-value is 0.021. We therefore reject the null hypotheses of normally distributed v and zero inefficiency, indicating that this model performs better than either the standard SF model or the Student’s t regression model for these data.
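The reported p-values can be checked with a few lines of code. The sketch below assumes that \(\chi _{1:0}^2\) denotes an equally weighted mixture of a point mass at zero and a chi-squared distribution with one degree of freedom; this weighting is an assumption here, although it is consistent with the reported p-values. The function names are illustrative.

```python
import math


def lr_statistic(loglik_unrestricted, loglik_restricted):
    """Likelihood ratio statistic: twice the difference in log-likelihoods."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_1_sf(x):
    """P(X > x) for X ~ chi-squared with 1 d.f., i.e. P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))


def mixture_p_value(lr):
    """p-value under an assumed 50:50 mixture of chi-squared(0) and chi-squared(1)."""
    if lr <= 0.0:
        return 1.0
    return 0.5 * chi2_1_sf(lr)


# Log-likelihoods as printed in Table 3 (two decimal places).
lr_from_table = lr_statistic(-187.06, -189.14)
print(f"LR from rounded log-likelihoods: {lr_from_table:.2f}")

# Likelihood ratio statistics as reported in the text.
for label, lr in (("normal v", 4.155), ("no inefficiency", 4.150)):
    print(f"H0: {label:15s} LR={lr:.3f}  p={mixture_p_value(lr):.3f}")
```

The statistic computed from the two-decimal log-likelihoods in Table 3 differs slightly from the reported 4.155, which is presumably due to rounding of the printed log-likelihoods.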
Overall, our empirical application demonstrates that the t-half normal SF model is indeed supported by our data relative to the more standard normal-half normal SF model. Further, the frontier parameter estimates are plausible and very similar in both models. In terms of the efficiency predictions, the Student’s t-half normal SF model yields a much more plausible spread of efficiency predictions at the tails.
7 Summary and Conclusions
This paper proposes a new stochastic frontier model as a means to account for outlying observations in the context of stochastic frontier analysis. A failure to account for outliers in the standard stochastic frontier model can lead to an exaggeratedly wide range of efficiency predictions, with the efficiency predictions relating to the least efficient firms being implausibly low. This problem is apparent in our application to highway maintenance departments in England, and also in several other applications we identify in the literature, across a range of countries and industries.
We propose a model combining a Student’s t distribution for noise, v, with a half normal distribution for inefficiency, u. Our model is an original and significant contribution to the literature in two respects: it is better able to accommodate outlying observations in efficiency analysis than the standard normal-half normal stochastic frontier (SF) model, and it is the first contribution to contain the standard SF model as a testable limiting case. As such, our model provides a natural extension to the tools of practitioners in the field. The advantages of the Student’s t distribution are that the kurtosis of v is determined by a degrees of freedom parameter which is freely estimated, and that it encompasses the normal distribution as a limiting case as this parameter approaches infinity. This means that the heaviness of the tails of v, reflecting the prevalence of outliers in the data, is flexible, and that testing down to the standard SF model is possible. We derive the log-likelihood and efficiency predictors for the t-half normal SF model, and discuss extension to other distributions for u, which is straightforward.
We consider how to test the null hypothesis of normally distributed noise against the alternative of a heavier tailed distribution. We show that the associated LR test statistic is distributed as a mixture chi-squared through appealing to results in Chen and Liang (2010), and provide simulation evidence that the test statistic does follow the proposed distribution under the null in large samples.
Simulation evidence is provided for the maximum simulated likelihood estimator of our model. As well as showing that the estimator performs well in recovering the true parameters of our DGP for small values of the Student’s t shape (degrees of freedom) parameter, the simulations indicate that when the true DGP is the standard SF (normal-half normal) model, the Student’s t SF model recovers very similar estimates to the standard SF model. This finding, coupled with the ability to test the hypothesis of normally distributed noise against a heavier tailed distribution using the LR test statistic, provides reassurance as to the robustness of a modelling approach based on starting with the Student’s t SF model and testing whether the standard SF model is an appropriate simplifying restriction.
The simulation results also highlight the possibility of ‘wrong kurtosis’, specifically where the excess kurtosis of the OLS residual distribution is less than zero. In this case the maximum simulated likelihood estimates approach those from the normal-half normal model, and as such ‘wrong kurtosis’ is similar to the ‘wrong skew’ previously identified in the literature. Just as the probability of ‘wrong skew’ arising in the standard model decreases as the number of observations or the signal to noise ratio, or both, increase (Simar and Wilson 2010), it appears that the probability of ‘wrong kurtosis’ is negatively related to sample size and positively related to the degrees of freedom parameter. It is important to note that the practical implications of ‘wrong kurtosis’ are not as severe as in the ‘wrong skew’ case. Under ‘wrong kurtosis’, estimation of the Student’s t-half normal stochastic frontier model recovers the normal-half normal model estimates, whilst under ‘wrong skew’ the estimate of the variance of the inefficiency distribution approaches zero, indicating no evidence of inefficiency.
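A simple pre-estimation diagnostic along these lines can be sketched as follows. The code computes the sample skewness and excess kurtosis of a set of OLS residuals and flags ‘wrong kurtosis’ when the excess kurtosis is negative, as defined above. The ‘wrong skew’ sign convention is an assumption based on the usual composed error (positive skew expected for a cost frontier, negative for a production frontier), and the function names and toy residuals are purely illustrative.

```python
def central_moment(xs, k):
    """k-th sample central moment (dividing by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Sample skewness: m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Sample excess kurtosis: m4 / m2^2 - 3 (zero for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


def diagnose(residuals, cost_frontier=True):
    """Flag 'wrong skew' and 'wrong kurtosis' in OLS residuals."""
    skew = skewness(residuals)
    kurt = excess_kurtosis(residuals)
    # Assumed convention: a cost frontier implies positive skew, a production frontier negative skew.
    wrong_skew = skew < 0.0 if cost_frontier else skew > 0.0
    return {
        "skewness": skew,
        "excess_kurtosis": kurt,
        "wrong_skew": wrong_skew,
        "wrong_kurtosis": kurt < 0.0,
    }


# Toy residuals for illustration only (not taken from the paper's data).
flat_tailed = [-2.0, -1.0, 0.0, 1.0, 2.0]
heavy_tailed = [-10.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 10.0]
print(diagnose(flat_tailed))
print(diagnose(heavy_tailed))
```

Such a check is only indicative: it does not replace the LR test against the standard SF model, but it may help anticipate cases in which the Student’s t-half normal estimates will collapse to the normal-half normal ones.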
We apply a Student’s t-half normal model to estimate a cost frontier using a dataset on English local authorities’ highway maintenance costs, and compare the model’s outputs with those from the normal-half normal model. We find similar frontier parameter estimates from the two models, though with reduced standard errors in the t-half normal model. We implement testing against the standard SF model, and find that we are able to reject the null hypothesis of normally distributed v. This implies that it is important to account for heavy tails in our data.
The main empirical differences between the two models are threefold. First, we find a reduced estimate of VAR(u) and an increased estimate of VAR(v). Second, we find a reduced range of efficiency predictions according to exp[−E(u|ε)]. Third, we find a non-monotonic relationship between residuals and exp[−E(u|ε)], in contrast to the standard SF model. This could be a very important feature of the model in practical applications, for example in regulatory settings, where it could incentivise correct reporting of data by regulated firms, as the alternative ‘gaming’ behaviour of under-reporting, say, costs, may actually reduce the firm’s efficiency prediction.
Finally, we have integrated our model into the LIMDEP/NLOGIT econometric software (Greene 2016), and have also written a Stata package enabling estimation of the model^{8}, which means that it is ready for use by practitioners. An interesting avenue for future research would be to consider the impact of Student’s t noise in the context of the various panel data SF specifications, for example models accounting for unobserved heterogeneity.
Footnotes
- 1.
The terms leptokurtic, mesokurtic, and platykurtic are used to denote distributions with positive, zero, and negative excess kurtosis. That is, kurtosis greater than, equal to, or less than that of the normal distribution.
- 2.
Note that the left truncation of a Laplace distribution at or above zero results in an exponential distribution.
- 3.
An alternative approach is to note that the Student’s t distribution is a scale mixture of normal distributions where the mixing distribution is inverse gamma distributed. To estimate a Student’s t-half normal SF model, we could approximate a scale mixture of skew normal distributions with draws from an inverse gamma distribution. It is unclear if this approach has any advantage over that used in this paper, however it is slightly less convenient in terms of implementation, particularly when it comes to changing the distribution of u.
- 4.
Note also that the estimator (21) is substituted into (22) and (23), and (22) into (23), in place of the unknown true values. However, if 2 < a ≤ 4, the method of moments estimator for \(\widetilde \sigma _v\) exists but depends on a, for which the method of moments estimator does not exist. An alternative approach could be to treat a as a fixed tuning parameter, and substitute its value into (23) for \(\widetilde \sigma _v\).
- 5.
It is well known that, just as OLS is the ML estimator for a regression model with normal errors, LAD is the ML estimator for a regression model with Laplace errors.
- 6.
- 7.
Note that as a → ∞, i.e. as the distribution of v approaches normality, we would expect the non-monotonicity seen in Fig. 2 to become less prominent. Simulation evidence, not included here for brevity’s sake, indicates that this is the case: the larger a is, the more closely E(u|ε) corresponds to the normal-half normal case. For sufficiently large a, the relationship may be monotonic within the range of the estimated residuals.
- 8.
The package rfrontier, which can be found at www.its.leeds.ac.uk/bear, enables estimation of stochastic frontier models in which v follows a Student’s t, Cauchy, or logistic distribution.
Notes
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
References
- Aigner D, Lovell CAK, Schmidt P (1977) Formulation and estimation of stochastic frontier production function models. J Econ 6(1):21–37
- Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12(2):171–178
- Bak M, Borkowski P (2009) Marginal Cost of Road Maintenance and Renewal in Poland, CATRIN (Cost Allocation of TRansport INfrastructure cost) Deliverable D6, Annex 2. ITS, University of Leeds, Leeds, UK
- Bak M, Borkowski P, Musiatowicz-Podbial G, Link H (2006) Marginal Infrastructure Cost in Poland, Marginal Cost Case Studies for Road and Rail Transport Deliverable D3, Annex 1.2C. ITS, University of Leeds, Leeds, UK
- Battese GE, Coelli TJ (1988) Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data. J Econ 38(3):387–399
- Bauer PW, Berger AN, Ferrier GD, Humphrey DB (1998) Consistency conditions for regulatory analysis of financial institutions: a comparison of frontier efficiency methods. J Econ Bus 50(2):85–114
- Berger AN, Humphrey DB (1991) The dominance of inefficiencies over scale and product mix economies in banking. J Monet Econ 28(1):117–148
- Charnes A, Cooper WW, Rhodes E (1978) Measuring the efficiency of decision making units. Eur J Oper Res 2(6):429–444
- Chen C-F (2007) Applying the stochastic frontier approach to measure hotel managerial efficiency in Taiwan. Tour Manag 28(3):696–702
- Chen Y, Liang K-Y (2010) On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems. Biometrika 97(3):603–620
- Cook RD (1977) Detection of influential observation in linear regression. Technometrics 19(1):15–18
- Cullinane K, Wang T-F, Song D-W, Ji P (2006) The technical efficiency of container ports: comparing data envelopment analysis and stochastic frontier analysis. Transp Res Part A: Policy Pract 40(4):354–374
- De Borger B, Kerstens K (1996) Cost efficiency of Belgian local governments: a comparative analysis of FDH, DEA, and econometric approaches. Reg Sci Urban Econ 26(2):145–170
- Drake L, Simper R (2003) The measurement of English and Welsh police force efficiency: a comparison of distance function models. Eur J Oper Res 147(1):165–186
- Economou P (2011) On model selection in the case of nested distributions—An application to frailty models. Stat Methodol 8(2):172–184
- Fallah-Fini S, Triantis K, de la Garza JM (2009) Performance measurement of highway maintenance operation using data envelopment analysis: Environmental considerations. In: Proceedings of IIE Annual Conference. Miami, USA: Institute of Industrial Engineers
- Fan J (1991) On the optimal rates of convergence for nonparametric deconvolution problems. Ann Stat 19(3):1257–1272
- Fan J (1992) Deconvolution with supersmooth distributions. Can J Stat / La Rev Can De Stat 20(2):155–169
- Fecher F, Kessler D, Perelman S, Pestieau P (1993) Productive performance of the French insurance industry. J Product Anal 4(1):77–93
- Geweke JF, Keane MP, Runkle DE (1997) Statistical inference in the multinomial multiperiod probit model. J Econ 80(1):125–165
- Goldberger AS (1983) Abnormal selection bias. In: Amemiya T and Goodman LA, (eds). Studies in Econometrics, Time Series, and Multivariate Statistics. New York, USA: Academic Press, 67–84
- Greene WH (2003) Simulated likelihood estimation of the normal-gamma stochastic frontier function. J Product Anal 19(2):179–190
- Greene WH (2008) The econometric approach to efficiency analysis. In: Fried HO, Lovell CAK, Schmidt SS, (eds). The Measurement of Productive Efficiency and Productivity Growth. Second ed. New York, USA: Oxford University Press, 92–159
- Greene WH (2016) LIMDEP Version 11.0 Econometric Modeling Guide. Econometric Software
- Griffin JE, Steel MFJ (2007) Bayesian stochastic frontier analysis using WinBUGS. J Product Anal 27(3):163–176
- Grushka E (1972) Characterization of exponentially modified Gaussian peaks in chromatography. Anal Chem 44(11):1733–1738
- Gupta AK, Nguyen N (2010) Stochastic frontier analysis with fat-tailed error models applied to WHO health data. Int J Innov Manag, Inf & Prod 1(1):43–48
- Hajargasht G (2015) Stochastic frontiers with a Rayleigh distribution. J Product Anal 44(2):199–208
- Hajargasht G, Griffiths WE (2018) Estimation and testing of stochastic frontier models using variational Bayes. J Product Anal 50(1):1–24
- Haraldsson M (2006) Marginal Cost for Road Maintenance and Operation—A Cost Function Approach, Marginal Cost Studies for Road and Rail Transport Deliverable D3, Annex. ITS, University of Leeds, Leeds, UK
- Horrace WC, Parmeter CF (2011) Semiparametric deconvolution with unknown error variance. J Product Anal 35(2):129–141
- Horrace WC, Parmeter CF (2018) A Laplace stochastic frontier model. Econom Rev 37(3):260–280
- Horrace WC, Wright IA (forthcoming) Stationary points for parametric stochastic frontier models. Journal of Business & Economic Statistics
- Hurst S (1995) The Characteristic Function of the Student t Distribution. Financial Mathematics Research Report 006-95. Australian National University
- Jondrow J, Lovell CAK, Materov IS, Schmidt P (1982) On the estimation of technical inefficiency in the stochastic frontier production function model. J Econ 19(2):233–238
- Jonsson L, Haraldsson M (2008) Marginal Costs of Road Maintenance in Sweden, CATRIN (Cost Allocation of TRansport INfrastructure cost) Deliverable D6, Annex 1. VTI, Stockholm, Sweden
- Lange K, Sinsheimer JS (1993) Normal/independent distributions and their applications in robust regression. J Comput Graph Stat 2(2):175–198
- Lange KL, Little RJA, Taylor JMG (1989) Robust statistical modeling using the t distribution. J Am Stat Assoc 84(408):881–896
- Latruffe L, Balcombe K, Davidova S, Zawalinska K (2004) Determinants of technical efficiency of crop and livestock farms in Poland. Appl Econ 36(12):1255–1263
- Lin B, Wang X (2014) Exploring energy efficiency in China’s iron and steel industry: A stochastic frontier approach. Energy Policy 72:87–96
- Link H (2006) An econometric analysis of motorway renewal costs in Germany. Transp Res Part A: Policy Pract 40(1):19–34
- Link H (2009) Marginal Costs of Road Maintenance in Germany, CATRIN (Cost Allocation of TRansport INfrastructure) Deliverable D6, Annex 3. VTI, Stockholm, Sweden
- Link H (2014) A cost function approach for measuring the marginal cost of road maintenance. J Transp Econ Policy (JTEP) 48(1):15–33
- Mahalanobis PC (1936) On the generalized distance in statistics. Proc Natl Inst Sci (India) 2:49–55
- Meeusen W, van Den Broeck J (1977) Efficiency estimation from Cobb-Douglas production functions with composed error. Int Econ Rev 18(2):435–444
- Nguyen N (2010) Estimation of Technical Efficiency in Stochastic Frontier Analysis. PhD thesis, Bowling Green State University
- Ondrich J, Ruggiero J (2001) Efficiency measurement in the stochastic frontier model. Eur J Oper Res 129(2):434–442
- Parmeter CF, Kumbhakar SC (2014) Efficiency analysis: A primer on recent advances. Found Trends Econ 7(3-4):191–385
- Schreyer C, Schmidt N, Maibach M (2002) Road Econometrics–Case Study Motorways Switzerland. UNITE (UNIfication of accounts and marginal costs for Transport Efficiency) Deliverable 10, Annex A1b. ITS, University of Leeds, Leeds, UK
- Sedlacek N, Herry M (2002) Infrastructure Cost Case Studies. UNITE (UNIfication of accounts and marginal costs for Transport Efficiency) Deliverable 10, Annex A1c. ITS, University of Leeds, Leeds, UK
- Self SG, Liang K-Y (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82(398):605–610
- Simar L, Wilson PW (2010) Inferences from cross-sectional, stochastic frontier models. Econom Rev 29(1):62–98
- Stead AD, Wheat P, Greene WH (2018) Estimating efficiency in the presence of extreme outliers: A logistic-half normal stochastic frontier model with application to highway maintenance costs in England. In: Greene WH, Khalaf L, Makdissi P, Sickles RC, Veall M, and Voia M. (eds). Productivity and Inequality. Cham, Switzerland: Springer, 1–19
- Stevenson RE (1980) Likelihood functions for generalized stochastic frontier estimation. J Econ 13(1):57–66
- Tancredi A (2002) Accounting for Heavy Tails in Stochastic Frontier Models. Working Paper No. 2002.16. Department of Statistical Sciences, University of Padua
- Tchumtchoua S, Dey DK (2007) Bayesian estimation of stochastic frontier models with multivariate skew t error terms. Commun Stat - Theory Methods 36(5):907–916
- Train KE (2009) Discrete Choice Methods with Simulation. Second ed. Cambridge, UK: Cambridge University Press
- Tsionas EG (2007) Efficiency measurement with the Weibull stochastic frontier. Oxf Bull Econ Stat 69(5):693–706
- Waldman DM (1982) A stationary point for the stochastic frontier likelihood. J Econ 18(2):275–279
- Wang WS, Schmidt P (2009) On the distribution of estimated technical efficiency in stochastic frontier models. J Econ 148(1):36–45
- Wheat P (2017) Scale, quality and efficiency in road maintenance: evidence for English local authorities. Transp Policy 59:46–53
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.