Robust Stochastic Frontier Analysis: A Student’s t-Half Normal Model with Application to Highway Maintenance Costs in England

The presence of outliers in the data has serious implications for stochastic frontier analysis because they may distort parameter estimates and, crucially, lead to an exaggerated spread of efficiency predictions. One way of increasing the robustness of the model to outliers is to alter the distributional assumptions about the two-sided error component so that it allows for thick tails. Several existing proposals specify thick tailed distributions for both error components in order to arrive at a closed form log-likelihood function, limiting the analyst’s choice of efficiency distribution. Stead et al. (2017) demonstrate that simulation methods may be used to pair a logistically distributed noise term with any of the canonical efficiency distributions, and that a far narrower range of efficiency predictions is obtained by doing so. We extend this idea by proposing a Student’s t distribution for the noise term, which generalises the normal distribution by adding a shape parameter governing the degree of kurtosis, therefore having the advantages of greater flexibility in the thickness of the tails and of nesting the normal distribution. We estimate a Student’s t-half normal cost frontier for highways authorities in England and find that the model yields a significantly narrower range of efficiency predictions. We examine testing against the standard stochastic frontier model.

A.D. Stead, P. Wheat: Institute for Transport Studies, University of Leeds, Leeds, LS2 9LJ, UK; e-mail: a.d.stead@leeds.ac.uk
W.H. Greene: Department of Economics, Stern School of Business, New York University, New York City, New York, 10012, USA


Introduction
Frontier analysis is concerned with the measurement of efficiency relative to an estimated production or cost frontier. The presence of noise in the sample is problematic in two ways: it affects the position of each decision making unit (DMU) relative to the frontier, and it affects the shape of the frontier itself. The magnitudes of these effects vary from one method to another: deterministic methods, such as data envelopment analysis (DEA) (Charnes et al., 1978) and corrected ordinary least squares (COLS), are particularly sensitive, given that they make no allowance for noise. In contrast, stochastic frontier (SF) analysis, or SFA, explicitly controls for noise, mitigating its impact on the estimated frontier and on individual efficiency scores.
Even so, the range of efficiency scores can still be very large when the data contain outlying observations. The specific motivation for this study is our finding of an implausibly wide range of efficiency scores in our work studying cost drivers and cost efficiency in the highways maintenance operations of local authorities in England, which utilises data on operating and capital expenditure provided by each authority. This can be narrowly explained as being due to a combination of under-reporting and over-reporting of expenditure, unobserved investment cycle effects, and extreme weather events. However, we also came across this issue in a number of other data sets, and a casual look at the empirical SF literature seems to indicate that the ranges of individual efficiency predictions are sometimes difficult to reconcile with a priori expectations.
In this paper we consider methods of dealing with outliers in the context of SFA. After consideration of possible existing approaches, we propose a new stochastic frontier model with a Student's t distribution for the noise term. The advantages of this model over previous proposals lie in its flexibility, since the degree of kurtosis is no longer fixed but allowed to vary with the degrees of freedom parameter, and in the fact that the Student's t distribution nests the normal distribution as the degrees of freedom parameter approaches infinity. For any given distribution of the inefficiency term $u$, our model nests the standard SF model. This enables testing against the standard model, in contrast to previous proposals which utilise non-nested specifications.
The structure of this paper is as follows: Section 2 looks at existing methods available to handle a large number of outliers in frontier analysis, and reviews the relevant literature; Section 3 introduces t-truncated normal and t-gamma stochastic frontier models for dealing with heavy-tailed noise and discusses efficiency prediction and hypothesis testing; Section 4 applies the Student's t-half normal model to data on highways maintenance costs in England and compares the results to those obtained from the normal-half normal model; and Section 5 gives our summary and conclusions. Some technical results appear in an Appendix.

Literature Review: Approaches to Outliers in SFA
Our focus is on possible existing approaches to dealing with outliers in the context of SFA. We start by looking at ad-hoc solutions using methods not specifically developed for this purpose but having some potential to mitigate one or more of the symptoms, e.g. by reducing the spread of efficiency scores. These generally have limited usefulness in dealing with outliers and involve a degree of arbitrariness. We then move on to discuss methods more appropriate to dealing with the root cause of these symptoms, the presence of outliers in the data. As in linear modelling generally, these essentially revolve around the detection and removal of outliers or the adoption of alternative distributional assumptions about the noise term.
The standard SF model, as developed by Aigner et al. (1977) and Meeusen and van Den Broeck (1977), is as follows:
$$y_i = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i = v_i - s u_i$$
where the subscript $i$ denotes the observation, $y_i$ is the dependent variable, $x_i$ is a vector of independent variables with coefficient vector $\beta$, and $\varepsilon_i$ is an error term with two components. The noise component, $v_i$, follows a symmetric distribution centred at zero, and the inefficiency component, $u_i$, is drawn from a one-sided distribution. In the production case $s = 1$, while in the cost case $s = -1$.
Many alternative distributions have been proposed for $u_i$, for example the half normal or exponential (Aigner et al., 1977), truncated normal (Stevenson, 1980), or gamma (Greene, 2003) distributions. In comparison, $v_i$ is almost always assumed to follow a normal distribution, although both distributional assumptions are crucial with regard to both the robustness of the parameter estimates to outliers and the decomposition of the estimated residual into noise and inefficiency.
Concerning robustness to outliers, it is worth pointing out that one of the original motivations behind SFA as an alternative to OLS (which after all yields unbiased estimates of the frontier parameters apart from the intercept) was to obtain estimates that are more robust to skewness of the residuals caused by inefficiency. Indeed it was not until the Jondrow et al. (1982) paper that a method of obtaining observation-specific predictions of efficiency was introduced. On the decomposition of the residuals, obtaining these observation-specific efficiency scores proceeds by making some prediction of $u_i$ based on the conditional distribution of $u_i|\varepsilon_i$.
Dropping the subscript $i$, this is given by
$$f_{u|\varepsilon}(u|\varepsilon) = \frac{f_v(\varepsilon + su)\, f_u(u)}{f_\varepsilon(\varepsilon)}$$
where $f_v$ and $f_u$ are the probability density functions of $v$ and $u$ respectively, and $f_\varepsilon$ is the probability density function of the composed error derived as the convolution of the two error components. The usual approach to efficiency prediction is to use the mean of this distribution according to $\exp[-E(u|\varepsilon)]$ (Jondrow et al., 1982) or $E[\exp(-u)|\varepsilon]$ (Battese and Coelli, 1988).
Clearly, any efficiency prediction derived in this way will depend upon the distribution not only of $u$ but also of $v$: Wang and Schmidt (2009) derive the distribution of $E(u|\varepsilon)$ in the normal-half normal case, and show that it approaches the distribution of $u$ only as $\mathrm{Var}(v) \to 0$, and collapses to $E(u)$ as $\mathrm{Var}(v) \to \infty$. The conditional mean predictor is therefore a shrinkage of $u$ towards its mean, with the degree of shrinkage depending upon the distribution of $v$.
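The shrinkage property can be illustrated numerically. The following is a minimal sketch, assuming the well-known closed form of the conditional mean in the normal-half normal model, in which $u|\varepsilon$ is a normal variable with mean $\mu_* = -s\varepsilon\sigma_u^2/\sigma^2$ and standard deviation $\sigma_* = \sigma_u\sigma_v/\sigma$ truncated below at zero, with $\sigma^2 = \sigma_u^2 + \sigma_v^2$. The function name and the parameter values are illustrative assumptions, not estimates from this study.

```python
from math import pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def conditional_mean_u(eps, sigma_u, sigma_v, s=1):
    """E(u | eps) in the normal-half normal model with eps = v - s*u."""
    sigma2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = -s * eps * sigma_u ** 2 / sigma2
    sigma_star = sigma_u * sigma_v / sqrt(sigma2)
    z = mu_star / sigma_star
    # Mean of N(mu_star, sigma_star^2) truncated below at zero
    return mu_star + sigma_star * STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)

# Hypothetical values: production case (s = 1) with a residual of -0.5
sigma_u, eps = 0.3, -0.5
for sigma_v in (1e-6, 0.1, 0.3, 1.0, 1e6):
    print(sigma_v, conditional_mean_u(eps, sigma_u, sigma_v))
```

With almost no noise the prediction is essentially $-s\varepsilon$, i.e. the residual is attributed entirely to inefficiency, while with very large noise variance it collapses to the unconditional mean $\sigma_u\sqrt{2/\pi}$.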
Given that the presence of outliers, where these are attributed to noise rather than inefficiency, implies a leptokurtic (i.e. thick tailed) noise distribution, the normal distribution usually assumed for $v$ is inappropriate since it is mesokurtic (i.e. neither thin nor thick tailed) for any given parameter values. Intuitively, we would therefore expect outliers in the data to result in an exaggerated spread of efficiency scores for two reasons: first, because of an inflated estimate of the scale of the distribution of $u$, and second because of insufficient shrinkage of $u$ towards its mean, especially at the extremes. That is to say, if leptokurtosis in the noise term due to outliers is not taken into account, residuals will be attributed disproportionately to inefficiency rather than noise, particularly in outlying observations. This motivates the development of alternative SF models that can accommodate outliers.

Alternative Efficiency Distributions and Predictors
Given an apparently implausible distribution of efficiency predictions, an obvious option is to adopt an alternative assumption about the distribution of inefficiency. For example, adopting an inefficiency distribution with thinner tails, or perhaps a truncation of the tail as proposed by Lee (1996), Almanidis et al. (2014) and others, should intuitively result in a narrower spread of efficiency predictions by attributing outliers to noise to a greater extent. Clearly, however, if the real problem is outliers in the data, then the effectiveness of this solution will be limited since there will not be enough shrinkage at the tails. Nor does this workaround address the problem of imprecise parameter estimation, and if anything we may be driven to estimate a model in which both the inefficiency and noise distributions are inappropriate.
An even more arbitrary approach would be to adopt an alternative efficiency predictor post-estimation. The two alternatives proposed by Jondrow et al. (1982) are the mean and mode of the distribution of $u|\varepsilon$, with the former the most commonly used in the applied literature. When the distribution of $\varepsilon$ is skewed in the expected direction (i.e. positively in the cost case and negatively in the production case) the conditional mode predictor yields predictions of efficiency that are higher for the most efficient DMUs, but practically the same for the most inefficient DMUs. There is therefore a trade-off with the conditional mode predictor, between obtaining what might appear to be more reasonable efficiency predictions for the most efficient firms and expanding the range of efficiency predictions to be even more unreasonable. Adopting alternative predictors therefore does not seem to be a satisfactory solution.
Another approach would be to qualify the point predictions of efficiency with prediction intervals: Horrace and Schmidt (1996) and Bera and Sharma (1999) propose using the quantile function for the distribution of $u|\varepsilon$. However, Wheat et al. (2014) note that this does not necessarily yield a minimum width interval, which they derive for the normal-half normal model. When the estimated scale of the inefficiency distribution is inflated by outliers, the distribution of $u|\varepsilon$ will have a high variance, and therefore the range of probable values will be wide. However, this really just serves to highlight the problem, and of course the prediction interval will include values even less plausible than the point prediction. A better solution would therefore be to obtain a better estimate of the scale of the distribution of $u|\varepsilon$ by taking outliers into account.
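As a rough illustration of the quantile-based interval, the sketch below inverts the conditional distribution of $u|\varepsilon$ in the normal-half normal case, again assuming it is a normal distribution with mean $\mu_*$ and standard deviation $\sigma_*$ truncated below at zero. It produces an equal-tailed interval, not the minimum width interval of Wheat et al. (2014); the function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_quantile_u(p, eps, sigma_u, sigma_v, s=1):
    """p-th quantile of u | eps in the normal-half normal model (eps = v - s*u)."""
    sigma2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = -s * eps * sigma_u ** 2 / sigma2
    sigma_star = sigma_u * sigma_v / sqrt(sigma2)
    # Probability that the untruncated N(mu_star, sigma_star^2) variable is >= 0
    tail = STD_NORMAL.cdf(mu_star / sigma_star)
    return mu_star + sigma_star * STD_NORMAL.inv_cdf(1.0 - (1.0 - p) * tail)

def equal_tailed_interval(eps, sigma_u, sigma_v, s=1, level=0.95):
    """Equal-tailed prediction interval for u | eps (not the minimum width interval)."""
    alpha = 1.0 - level
    lower = conditional_quantile_u(alpha / 2, eps, sigma_u, sigma_v, s)
    upper = conditional_quantile_u(1.0 - alpha / 2, eps, sigma_u, sigma_v, s)
    return lower, upper

# Hypothetical cost-frontier example (s = -1)
lower, upper = equal_tailed_interval(0.2, sigma_u=0.3, sigma_v=0.2, s=-1)
print(lower, upper)
```

An inflated estimate of $\sigma_u$ raises both $\mu_*$ and $\sigma_*$, which is why such intervals widen when outliers are not accounted for.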

Heteroskedastic SF Models
Outliers may be thought of as a result of heteroskedasticity in one or both error components, with a small proportion of errors having higher variance. Therefore one way of mitigating the impact of outliers is to allow for heteroskedasticity. A number of heteroskedastic SF models have been introduced, although these have tended to focus on heteroskedasticity in $u$: the general approach is to modify the standard SF specification so that the variance or standard deviation of $u$ is modelled as a function of a vector of covariates, as proposed by Reifschneider and Stevenson (1991), Caudill and Ford (1993) and Caudill et al. (1995).
Clearly, however, we would not want outliers to be picked up as heteroskedasticity in $u$, since the estimated variance of $u$ for outlying observations would still be inflated. The application of these models is usually to capture the relationship between a set of covariates and efficiency, similar to the Battese and Coelli (1995) specification in which they enter the pre-truncation mean of a truncated normal inefficiency distribution; Wang (2002) combines these two approaches. Rather, we want to allow for heteroskedasticity in $v$: Hadri (1999) generalised the Caudill et al. (1995) heteroskedastic stochastic frontier model by allowing for heteroskedasticity in both error components. It would be straightforward, therefore, to estimate a model with heteroskedasticity in $v$.
In general, however, the difficulty with using heteroskedastic SF models to control for the presence of outliers is that this would require an appropriate indicator variable for identifying outlying observations, which would have to be identified either on some arbitrary ex-post basis, or on the basis of established techniques for the detection of outliers, in which case it may be better to simply exclude the observations.

Outlier Detection and Removal
In general, one simple approach to outliers is to exclude them from the analysis. This, however, requires that appropriate criteria are used to identify which observations are outlying and therefore candidates for removal. Many competing criteria exist. One approach is to exclude observations that have a disproportionate effect on parameter estimates. In the context of OLS, Cook (1977) proposed a measure of the influence of an individual observation on the parameter estimates, known as Cook's distance. A similar measure sometimes used for the same purpose is the Mahalanobis distance (Mahalanobis, 1936). However, while these measures provide an indication of the influence of an observation, the cut-off beyond which an observation should be removed is arbitrary and involves uncertainty which is not reflected in subsequent modelling.
The biggest drawback of these methods in the current context, however, is that they may not be appropriate in the case of skewness of the composed error. Cook's distance explicitly assumes normally distributed errors, while in SFA the distribution of the composed error depends upon the distributions of $v$ and $u$: in the normal-half normal case, the composed error has a skew normal distribution (Azzalini, 1985), and in the normal-exponential case it follows what is known as an exponentially modified Gaussian distribution (Grushka, 1972), for example. As such, these distributions are more robust to outliers in one direction than the other, and the derivation of some analogous measure of leverage taking this into account would inevitably be bound up with one's distributional assumptions.

Alternative Noise Distributions
Another potential method of dealing with outliers is to change the distributional assumptions about the error term such that the model is more robust to the existence of outliers. As discussed in Section 2.1, we should change the distribution of $v$ specifically, rather than $u$. In principle, any distribution that is symmetric, centred around zero and unimodal is an appropriate candidate for the distribution of $v$, and although far more attention has been paid in the literature to the distribution of $u$, several such alternatives have been suggested.
Outside of the SF literature, Lange et al. (1989) suggest the use of the Student's t distribution for the error terms as a robust alternative to OLS. Lange and Sinsheimer (1993) discuss estimation when errors are drawn from the logistic, slash, t, and contaminated normal distributions: note that the latter is simply the mixture of two normal distributions as described in Section 2.2. All of these distributions have heavier tails than the normal distribution, and therefore offer greater robustness to outliers.
In the context of SFA, Tancredi (2002) proposes a model in which $v$ is drawn from a t distribution and $u$ from a half t distribution (i.e. the left truncation of a Student's t distribution at zero), resulting in a skew t distribution for the composed error, and shows that the distribution of $u|\varepsilon$ becomes flat as $\varepsilon \to \infty$ and therefore the uncertainty associated with efficiency predictions increases. In contrast, as $\varepsilon \to \infty$ in the normal-half normal case the distribution of $u|\varepsilon$ becomes concentrated around zero, implying full efficiency; the t-half t model therefore deals with outliers in a more satisfactory way. The author also notes that, since the t distribution approaches the normal distribution as the degrees of freedom parameter approaches infinity, the model nests the normal-half normal model. The model is applied to the Christensen and Greene (1976) electricity data set.
Nguyen (2010) introduces two additional alternative heavy tailed distributions for $v$: the Laplace distribution and the Cauchy distribution. The latter is paired with both half Cauchy and truncated Cauchy distributions (i.e. the left truncation at zero of a Cauchy distribution with location parameter fixed at zero and freely estimated, respectively), while the former is paired with an exponential distribution for $u$. The Cauchy distribution is potentially problematic since its mean, variance and higher order moments are undefined. This is also the case for the skew Cauchy distribution (Arnold and Beaver, 2000), for which only fractional moments exist. Despite this, the author is able to derive $\exp[-E(u|\varepsilon)]$ for the Cauchy-truncated Cauchy and Cauchy-half Cauchy models, and applications are shown in Gupta and Nguyen (2010).
The Laplace distribution also presents possible issues, given that it has a cusp at the mean and hence a non-differentiable point in the likelihood function. Standard results for consistency and asymptotic normality of maximum likelihood (ML) estimates do not apply. The usefulness of the Laplace-exponential model derived by Nguyen (2010) is also limited by simplifying assumptions the author makes in its derivation, specifically that the scale parameters for $v$ and $u$ in the Laplace-exponential model are the same. Nevertheless, Horrace and Parmeter (2015) derive Laplace-truncated Laplace and Laplace-exponential SF models without such restrictions (the left truncation of a Laplace distribution at or above zero results in an exponential distribution). The authors show that when the variance in $u$ is zero, the estimator of the model reduces to the least absolute deviations (LAD) estimator. That a regression with Laplace errors minimises absolute deviations has previously been noted by Keynes (1911). The distribution of $u|\varepsilon$, and hence any efficiency prediction, is constant for positive values of $\varepsilon$.
The models above each perform better than the standard SF model in the presence of outliers because of their use of heavy tailed distributions of $v$. However, in addition to the specific issues identified above, a drawback of each of the models is that they assume that the distribution of $u$ is a left truncation of the distribution of $v$, since this makes it possible to derive a closed form expression for the log-likelihood; see Proposition 9 of Azzalini and Capitanio (2003), which gives the density function for the sum of one random variable and the absolute value of another random variable when both random variables follow the same elliptical distribution.
While simplifying derivation and estimation of the model, this limits comparability to the standard SF model. A preferable approach would be to combine a heavy tailed distribution of $v$ with any of the canonical distributions for $u$, e.g. half normal or exponential. Ultimately, the kurtosis of $v$ is an empirical question, and therefore we should ideally use a distribution for $v$ for which kurtosis is flexible. Of the alternatives discussed above, the Student's t distribution, in which kurtosis depends upon the degrees of freedom parameter, is ideal in this regard. A further advantage of the t distribution is that it nests the Cauchy distribution when the degrees of freedom parameter is equal to one and the normal distribution as it approaches infinity. By extension, an SF model with a t distribution for $v$ nests a model in which $v$ follows a Cauchy or normal distribution for any given distribution of $u$. This enables testing against these alternatives; in the latter case, we are testing against the standard SF model, which could be interpreted as a test of robustness to outliers.
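The nesting property is easy to check numerically. The sketch below evaluates a scaled Student's t density and compares it with the Cauchy density at one degree of freedom and with the normal density at a very large number of degrees of freedom. The function names, the symbol `a` for the degrees of freedom parameter, and the evaluation points are illustrative assumptions.

```python
from math import exp, lgamma, log, log1p, pi, sqrt

def t_pdf(v, sigma_v, a):
    """Density of a Student's t variable with scale sigma_v and a degrees of freedom."""
    log_const = lgamma((a + 1) / 2) - lgamma(a / 2) - 0.5 * log(a * pi) - log(sigma_v)
    return exp(log_const - 0.5 * (a + 1) * log1p((v / sigma_v) ** 2 / a))

def cauchy_pdf(v, scale):
    """Density of a Cauchy variable centred at zero."""
    return 1.0 / (pi * scale * (1.0 + (v / scale) ** 2))

def normal_pdf(v, sigma):
    """Density of a normal variable centred at zero."""
    return exp(-0.5 * (v / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

print(t_pdf(1.3, 0.5, 1), cauchy_pdf(1.3, 0.5))      # identical at a = 1
print(t_pdf(1.3, 0.5, 1e7), normal_pdf(1.3, 0.5))    # nearly identical for large a
print(t_pdf(5.0, 1.0, 3), normal_pdf(5.0, 1.0))      # far heavier tail at a = 3
```

The last line shows why the t distribution tolerates outliers: five scale units from the centre, the density with three degrees of freedom is several orders of magnitude larger than the normal density.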
Following the discussion above, we propose a robust SF model in which $v$ follows a Student's t distribution and $u$ follows some one-sided distribution, e.g. half normal or exponential. In many cases, this will result in a log-likelihood function with no closed form. Griffin and Steel (2007) briefly discuss how to estimate t-half normal, t-exponential, and t-gamma Bayesian SFA models using the WinBUGS software package, which uses Markov Chain Monte Carlo (MCMC) methods for numerical integration, and apply these to the Christensen and Greene (1976) electricity data set. But the model remains unexplored in the classical framework. We propose the use of simulation methods, as used by Greene (2003) to implement the normal-gamma model, to solve the convolution of the Student's t distribution and the distribution of $u$ and arrive at a simulated likelihood function.
At this point it is worth mentioning a recent proposal by Hofler (2014) with a similar motivation and using similar techniques, to pair a generalised normal distribution for $v$ with a half-normal distribution for $u$. The generalised normal distribution, introduced by Nadarajah (2005), is similar to the Student's t distribution in that it has a kurtosis parameter and nests the normal distribution. However, unlike the Student's t, the generalised normal distribution can be either leptokurtic or platykurtic when its shape parameter is lower than or greater than two, respectively; in principle, an SF model with a generalised normal distribution for $v$ can therefore account not only for heavy tails but also for the converse problem of light tails, making it an attractive choice. When the kurtosis parameter is equal to one, the generalised normal distribution also nests the Laplace distribution.
However, the generalised normal distribution has several properties which hinder maximum likelihood estimation: as the thickness of the tails increases (i.e. the kurtosis parameter goes below two) the distribution acquires a cusp at the mean, and as a consequence it has an infinite number of continuous derivatives only when the kurtosis parameter is a positive, even integer. Otherwise, the number of derivatives is equal to the largest integer less than or equal to the kurtosis parameter. As a consequence the standard results for consistency and asymptotic normality of ML estimates only apply when the kurtosis parameter is greater than or equal to two. When the kurtosis parameter is greater than two (i.e. the distribution is thin tailed) the distribution has only finite support. We conclude that the primary advantage of using a generalised normal distribution for $v$ rather than a Student's t distribution, the ability to account for thin tails in the noise term, is more than outweighed by these issues. It is difficult to imagine what might give rise to light tails in practice. To the best of our knowledge, the generalised normal-half normal model has not yet been implemented.
The following section shows the derivation of an SF model with a Student's t distribution for $v$: we focus on the t-truncated normal and t-gamma models, which nest the t-half normal and t-exponential models, though extension to other distributions for $u$ is straightforward.

The t-Truncated Normal and t-Gamma SF Models
In this section we derive simulated log likelihood functions and efficiency predictors for the t-truncated normal and t-gamma SF models, and discuss estimation and hypothesis testing. Results for the t-half normal and t-exponential models can be obtained via some simple restrictions. Extensions to other distributions of $u$ are shown to be straightforward if the quantile function of that distribution has a closed form, while in many other cases, as with the t-gamma, the simulated log likelihood function becomes slightly more complex.

Formulation and Estimation
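In SFA, the error $\varepsilon$ is composed of a symmetric noise component $v$ and an inefficiency component $u$ which is drawn from a one-sided distribution:
$$\varepsilon = v - su$$
where $s = 1$ for a production frontier and $s = -1$ for a cost frontier. In this study, we assume that $v$ is drawn from a non-standard t distribution, which includes a scale parameter $\sigma_v$, and that $u$ is from a truncated-normal or gamma distribution. For now we shall assume the former, so that the probability density functions $f_v$ and $f_u$ are given by
$$f_v(v) = \frac{\Gamma\left(\frac{a+1}{2}\right)}{\sqrt{a\pi}\,\sigma_v\,\Gamma\left(\frac{a}{2}\right)}\left[1 + \frac{v^2}{a\sigma_v^2}\right]^{-\frac{a+1}{2}}$$
$$f_u(u) = \frac{1}{\sqrt{2\pi}\,\sigma_u\,\Phi(\mu/\sigma_u)}\exp\left[-\frac{(u-\mu)^2}{2\sigma_u^2}\right], \qquad u \ge 0$$
where $\mu$ and $\sigma_u$ are the mean and standard deviation of the pre-truncation distribution of $u$, respectively, $a$ is a shape parameter that determines the kurtosis of the distribution, $\Gamma$ is the gamma function, and $\Phi$ is the standard normal cumulative distribution function. As noted previously, as $a \to \infty$ the density $f_v$ approaches that of the normal distribution.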
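Thus, our models nest the normal-truncated normal and normal-gamma models. Similarly, when $a = 1$ we have a Cauchy distributed noise component. The joint density of $u$ and $\varepsilon$ is given by
$$f_{u,\varepsilon}(u, \varepsilon) = f_v(\varepsilon + su)\, f_u(u)$$
The marginal density of $\varepsilon$ is given by the convolution
$$f_\varepsilon(\varepsilon) = \int_0^\infty f_v(\varepsilon + su)\, f_u(u)\, du$$
This is an integral with no closed form, meaning that it is not possible to give an analytic expression for the log-likelihood function. One solution to this problem is to use simulation to approximate the integral and arrive at a simulated log likelihood function (see Train (2003) for an introduction to maximum simulated likelihood), as proposed by Greene (2003) for the normal-gamma model and Stead et al. (2017) for the logistic-half normal model. We begin by noting that the convolution (7) is the expectation of $f_v(\varepsilon + su)$ where $u$ is drawn from a truncated-normal distribution:
$$h(\varepsilon) = E[f_v(\varepsilon + su) \mid u \ge 0], \qquad u \sim N(\mu, \sigma_u^2)$$
This can be estimated by
$$\hat{h}(\varepsilon_i) = \frac{1}{Q}\sum_{q=1}^{Q} f_v(\varepsilon_i + s u_{iq})$$
where $u_{iq}$ is the $q$-th of $Q$ draws from the truncated-normal distribution for observation $i$. This gives us a simulated probability density function for $\varepsilon_i$:
$$\hat{f}_\varepsilon(\varepsilon_i) = \frac{1}{Q}\sum_{q=1}^{Q} \frac{\Gamma\left(\frac{a+1}{2}\right)}{\sqrt{a\pi}\,\sigma_v\,\Gamma\left(\frac{a}{2}\right)}\left[1 + \frac{(\varepsilon_i + s u_{iq})^2}{a\sigma_v^2}\right]^{-\frac{a+1}{2}}$$
The simulated log-likelihood function for a sample of $N$ observations is
$$\ln L_S = \sum_{i=1}^{N} \ln \hat{f}_\varepsilon(\varepsilon_i)$$
This may be maximised like any conventional log-likelihood function, provided we have our draws from the truncated normal distribution. The usual method of taking draws from a nonuniform distribution is to note that the cumulative distribution function of a continuous random variable, evaluated at that variable, follows a uniform distribution, and therefore the inverse cumulative distribution function, i.e. the quantile function, gives the value of a random variable as a function of a uniformly distributed random variable. We can therefore transform draws from the uniform distribution into draws from any given distribution using the quantile function of that distribution. From Geweke et al. (1997) and Greene (2003), we have the quantile function of a truncated normal random variable:
$$u_{iq} = \mu + \sigma_u \Phi^{-1}\left[\Phi\left(\frac{L - \mu}{\sigma_u}\right) + F_{iq}\left(\Phi\left(\frac{U - \mu}{\sigma_u}\right) - \Phi\left(\frac{L - \mu}{\sigma_u}\right)\right)\right]$$
where $F_{iq}$ is a draw from the standard uniform distribution and $L$ and $U$ are the left and right truncation points, respectively.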
Since we know that $L = 0$ and $U = \infty$, this simplifies to
$$u_{iq} = \mu + \sigma_u \Phi^{-1}\left[\Phi\left(-\frac{\mu}{\sigma_u}\right) + F_{iq}\,\Phi\left(\frac{\mu}{\sigma_u}\right)\right]$$
with $F_{iq}$ a standard uniform draw. At this point, it is useful to note that in order to modify the model so that the one-sided error follows some other distribution, we need only change $u_{iq}$ such that we instead obtain draws from the chosen distribution. The most obvious choices are the exponential and gamma distributions, which are the most widely used one-sided error distributions in SFA after the half normal and truncated normal. For the t-exponential case, we have the quantile function
$$u_{iq} = -\sigma_u \ln(1 - F_{iq})$$
By substituting these into the simulated log-likelihood function (12), we have the log-likelihood function for the t-truncated normal and t-exponential models. Other proposed distributions for $u$, such as the Weibull (Tsionas, 2007) and Rayleigh (Hajargasht, 2015) distributions, also have closed form quantile functions which can likewise be substituted into (12).
The t-gamma case is less straightforward in this respect, since there is no analytical expression for the inverse cumulative distribution function. One way to proceed, however, is to note that the convolution of a t distributed $v$ and a gamma distributed $u$ with shape parameter $P$ and scale parameter $\sigma_u$ is
$$f_\varepsilon(\varepsilon) = \int_0^\infty f_v(\varepsilon + su)\, \frac{u^{P-1}\exp(-u/\sigma_u)}{\sigma_u^{P}\,\Gamma(P)}\, du$$
which is the expectation of $f_v(\varepsilon + su)(u/\sigma_u)^{P-1}/\Gamma(P)$ where $u$ is drawn from an exponential distribution with scale parameter $\sigma_u$. Following the reasoning above, therefore, we arrive at the simulated probability density function for $\varepsilon_i$:
$$\hat{f}_\varepsilon(\varepsilon_i) = \frac{1}{Q}\sum_{q=1}^{Q} f_v(\varepsilon_i + s u_{iq})\, \frac{(u_{iq}/\sigma_u)^{P-1}}{\Gamma(P)}, \qquad u_{iq} = -\sigma_u \ln(1 - F_{iq})$$
The simulated log-likelihood function is
$$\ln L_S = \sum_{i=1}^{N} \ln\left[\frac{1}{Q}\sum_{q=1}^{Q} f_v(\varepsilon_i + s u_{iq})\, \frac{(u_{iq}/\sigma_u)^{P-1}}{\Gamma(P)}\right]$$
We now have simulated log-likelihood functions for the t-truncated normal and t-gamma SFA models, and therefore also for their familiar restrictions, the t-half normal (when $\mu = 0$) and t-exponential (when $P = 1$), respectively. First order conditions for maximisation in both models are given in the Appendix. One remaining issue is the method of taking draws: we prefer to use Halton draws, which aim for good coverage of the unit interval rather than randomness, over pseudo-random draws, since this significantly reduces the number of draws needed to approximate the integral (see Greene (2003) for a fuller discussion).
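The steps above can be sketched in a few lines of code. The following is a minimal illustration for the t-truncated normal model, not the estimation code used in this study: it builds base-2 Halton draws, transforms them with the truncated normal quantile function, and averages the t density over the draws. All function names are hypothetical, the residuals are taken as given rather than computed from a frontier, and for simplicity the same draws are reused for every observation. A closed-form normal-half normal density is included only as a sanity check for the case of very large $a$ and $\mu = 0$.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def halton(n, base=2):
    """First n points of the Halton (van der Corput) sequence, all strictly inside (0, 1)."""
    points = []
    for index in range(1, n + 1):
        value, fraction, k = 0.0, 1.0, index
        while k > 0:
            fraction /= base
            value += fraction * (k % base)
            k //= base
        points.append(value)
    return points

def t_pdf(v, sigma_v, a):
    """Student's t density with scale sigma_v and shape (degrees of freedom) a."""
    log_const = lgamma((a + 1) / 2) - lgamma(a / 2) - 0.5 * log(a * pi) - log(sigma_v)
    return exp(log_const - 0.5 * (a + 1) * log1p((v / sigma_v) ** 2 / a))

def truncated_normal_draw(F, mu, sigma_u):
    """Map a uniform draw F to N(mu, sigma_u^2) truncated below at zero."""
    p_below = STD_NORMAL.cdf(-mu / sigma_u)
    return mu + sigma_u * STD_NORMAL.inv_cdf(p_below + F * (1.0 - p_below))

def simulated_density(eps, sigma_v, sigma_u, a, mu=0.0, s=-1, draws=None):
    """Average of f_v(eps + s*u_q) over the transformed draws."""
    draws = halton(1023) if draws is None else draws
    total = sum(t_pdf(eps + s * truncated_normal_draw(F, mu, sigma_u), sigma_v, a)
                for F in draws)
    return total / len(draws)

def simulated_loglik(residuals, sigma_v, sigma_u, a, mu=0.0, s=-1, draws=None):
    """Simulated log-likelihood for given residuals (same draws for all observations)."""
    draws = halton(1023) if draws is None else draws
    return sum(log(simulated_density(e, sigma_v, sigma_u, a, mu, s, draws))
               for e in residuals)

def normal_half_normal_density(eps, sigma_v, sigma_u, s=-1):
    """Closed-form composed error density, used here only as a check for large a."""
    sigma = sqrt(sigma_v ** 2 + sigma_u ** 2)
    lam = sigma_u / sigma_v
    return 2.0 / sigma * STD_NORMAL.pdf(eps / sigma) * STD_NORMAL.cdf(-s * eps * lam / sigma)

print(simulated_density(0.1, 0.2, 0.3, a=1e7), normal_half_normal_density(0.1, 0.2, 0.3))
```

In practice the simulated log-likelihood would be passed to a numerical optimiser jointly over the frontier parameters, $\sigma_v$, $\sigma_u$, $\mu$ and $a$. Swapping `truncated_normal_draw` for another quantile function, such as `-sigma_u * log(1 - F)` for the exponential case, changes the inefficiency distribution without altering the rest of the sketch.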

Efficiency Prediction
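As discussed in previous sections, the usual approach to generating observation-specific efficiency scores is to predict values based on the distribution of $u|\varepsilon$, which is given by:
$$f_{u|\varepsilon}(u|\varepsilon) = \frac{f_v(\varepsilon + su)\, f_u(u)}{f_\varepsilon(\varepsilon)}$$
The most widely used predictors are based on the mean of this conditional distribution, according to
$$\exp[-E(u|\varepsilon)] = \exp\left[-\int_0^\infty u\, \frac{f_v(\varepsilon + su)\, f_u(u)}{f_\varepsilon(\varepsilon)}\, du\right]$$
or $E[\exp(-u)|\varepsilon]$:
$$E[\exp(-u)|\varepsilon] = \int_0^\infty \exp(-u)\, \frac{f_v(\varepsilon + su)\, f_u(u)}{f_\varepsilon(\varepsilon)}\, du$$
Note that in both cases, since $f_\varepsilon(\varepsilon)$ is not a function of $u$, it can be moved outside the integral.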
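Bearing in mind that $f_\varepsilon(\varepsilon)$ is the convolution of $f_v(\varepsilon + su)$ and $f_u(u)$, we therefore have
$$\exp[-E(u|\varepsilon)] = \exp\left[-\frac{\int_0^\infty u\, f_v(\varepsilon + su)\, f_u(u)\, du}{\int_0^\infty f_v(\varepsilon + su)\, f_u(u)\, du}\right]$$
$$E[\exp(-u)|\varepsilon] = \frac{\int_0^\infty \exp(-u)\, f_v(\varepsilon + su)\, f_u(u)\, du}{\int_0^\infty f_v(\varepsilon + su)\, f_u(u)\, du}$$
Again, for the models we are considering, none of these integrals have closed form solutions, so we approximate them via simulation. The simulated integrals in the denominators of both formulae are given by the simulated density (10), and the integrals in the numerators are the expectations of $u f_v(\varepsilon + su)$ and $\exp(-u) f_v(\varepsilon + su)$ given that $u$ is a random variable with the probability density function $f_u(u)$. We therefore have, with some simplifying and rearranging:
$$\exp[-\hat{E}(u_i|\varepsilon_i)] = \exp\left[-\frac{\sum_{q=1}^{Q} u_{iq}\left[1 + \frac{(\varepsilon_i + s u_{iq})^2}{a\sigma_v^2}\right]^{-\frac{a+1}{2}}}{\sum_{q=1}^{Q}\left[1 + \frac{(\varepsilon_i + s u_{iq})^2}{a\sigma_v^2}\right]^{-\frac{a+1}{2}}}\right]$$
$$\hat{E}[\exp(-u_i)|\varepsilon_i] = \frac{\sum_{q=1}^{Q} \exp(-u_{iq})\left[1 + \frac{(\varepsilon_i + s u_{iq})^2}{a\sigma_v^2}\right]^{-\frac{a+1}{2}}}{\sum_{q=1}^{Q}\left[1 + \frac{(\varepsilon_i + s u_{iq})^2}{a\sigma_v^2}\right]^{-\frac{a+1}{2}}}$$
where $u_{iq}$ is given by (13).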
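Both simulated predictors are weighted averages over the draws, with weights given by the kernel of the t density, so they are cheap to compute once the draws are available. The sketch below is a self-contained illustration for the t-truncated normal case; the function names, the use of base-2 Halton draws, and the parameter values are assumptions made for the example rather than details of the estimation in this study.

```python
from math import exp, log1p
from statistics import NormalDist

STD_NORMAL = NormalDist()

def halton(n, base=2):
    """First n points of the Halton (van der Corput) sequence, all strictly inside (0, 1)."""
    points = []
    for index in range(1, n + 1):
        value, fraction, k = 0.0, 1.0, index
        while k > 0:
            fraction /= base
            value += fraction * (k % base)
            k //= base
        points.append(value)
    return points

def efficiency_predictors(eps, sigma_v, sigma_u, a, mu=0.0, s=-1, n_draws=1023):
    """Return (exp[-E(u|eps)], E[exp(-u)|eps]) approximated by simulation."""
    p_below = STD_NORMAL.cdf(-mu / sigma_u)
    numerator_u = numerator_exp = denominator = 0.0
    for F in halton(n_draws):
        # Truncated normal draw via the quantile function
        u = mu + sigma_u * STD_NORMAL.inv_cdf(p_below + F * (1.0 - p_below))
        # Kernel of the t density; the normalising constant cancels in the ratios
        weight = exp(-0.5 * (a + 1) * log1p(((eps + s * u) / sigma_v) ** 2 / a))
        denominator += weight
        numerator_u += u * weight
        numerator_exp += exp(-u) * weight
    return exp(-numerator_u / denominator), numerator_exp / denominator

# Hypothetical cost-frontier outlier: a large positive residual
print(efficiency_predictors(1.0, 0.2, 0.3, a=3))     # heavy-tailed noise
print(efficiency_predictors(1.0, 0.2, 0.3, a=1e7))   # effectively normal noise
```

For the same residual, the heavy-tailed model attributes more of the deviation to noise and therefore returns a higher efficiency score, which is the mechanism behind the narrower range of predictions discussed in this paper. For very extreme residuals all weights can underflow to zero, so a production implementation would work with log weights.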

Hypothesis Testing
As discussed previously, an attraction of the Student's t distribution in the current context is that it nests the normal distribution as $a \to \infty$, and therefore a stochastic frontier model in which $v$ follows a Student's t distribution nests a model in which $v$ follows a normal distribution, for any given distribution of $u$. This allows us to test down from a model with Student's t noise to a standard SF model, which could be interpreted as a test for thick tails (or for the significance of outliers in the data) and used for model selection. For this purpose, the likelihood ratio test statistic is an obvious choice. This is defined as
$$LR = 2\left(\ln L_S - \ln L_0\right)$$
where $\ln L_S$ is the simulated log-likelihood from the Student's t model and $\ln L_0$ is the log-likelihood from the null model with normally distributed $v$. The standard result that this statistic has a limiting $\chi^2$ distribution with degrees of freedom equal to the difference in dimensionality between the alternative and null models does not apply, since under the null hypothesis the degrees of freedom parameter lies on the boundary of the parameter space. In such cases, Case 5 in Self and Liang (1987) (see also Case 2 in Chen and Liang (2010)) shows that the likelihood ratio statistic follows an asymptotic 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ distributions, denoted $\chi^2_{1:0}$. Economou (2011) applies this result to an analogous problem in survival analysis: that of testing down from a three parameter Burr XII distribution to a two parameter Weibull distribution, which it nests as a 'frailty' parameter approaches zero. A further analogue is testing for the presence of an upper bound on $u$, since an SF model with a tail truncated distribution for $u$ nests the standard SF model as the tail truncation point approaches infinity. Note that under the null hypothesis that $\sigma_u = 0$, the model reduces to a regression model with Student's t errors.
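Because the $\chi^2_0$ component is a point mass at zero, the p-value under the 50:50 mixture is simply half the upper tail probability of a $\chi^2_1$ variable whenever the statistic is positive. A minimal sketch follows; the function name and the log-likelihood values in the usage line are hypothetical and are not results from this study.

```python
from math import erfc, sqrt

def mixture_lr_test(loglik_alternative, loglik_null):
    """LR statistic and p-value under a 50:50 mixture of chi2(0) and chi2(1)."""
    lr = 2.0 * (loglik_alternative - loglik_null)
    if lr <= 0.0:
        # The mixture places all of its mass at or above zero
        return lr, 1.0
    # P(chi2_1 > x) = erfc(sqrt(x / 2)); the chi2_0 part contributes nothing for x > 0
    return lr, 0.5 * erfc(sqrt(lr / 2.0))

# Hypothetical log-likelihoods for the t model and the normal model
lr, p_value = mixture_lr_test(-100.0, -102.0)
print(lr, p_value)
```

One practical consequence is that the 5% critical value is about 2.71, the 10% critical value of a $\chi^2_1$ distribution, rather than the usual 3.84.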

Application to Highways Maintenance Costs in England
In this section, we apply a Student's t-half normal model to the data set on highway maintenance costs in England used by Stead et al. (2017). In England, responsibility for road maintenance is divided between Highways England (until 2015 the Highways Agency), a publicly-owned company responsible for the strategic 'trunk road network', and the county councils and unitary authorities, who maintain the non-trunk roads within their boundaries. Our data are from the CQC Efficiency Network and consist of costs and cost drivers associated with local authorities' highway maintenance activities.
Previous studies of road maintenance costs have focussed on the issue of marginal costs, and their implications for road pricing, rather than relative efficiency: these use data on motorways and canton roads in Switzerland (Schreyer et al., 2002), motorways in Austria (Sedlacek and Herry, 2002), roads in Sweden (Haraldsson, 2006; Jonsson and Haraldsson, 2008), trunk roads in Poland (Bak et al., 2006; Bak and Borkowski, 2009), and motorways and federal roads in Germany (Link, 2006;. One exception to the marginal cost focus of empirical studies is the study of efficiency in road maintenance by Fallah-Fini et al. (2009), which applies DEA to data on eight counties of the US state of Virginia, with expenditure, traffic and equivalent single axle loads as inputs, road area and quality indicators as outputs, and climate factors as non-discretionary variables.
We use an unbalanced panel consisting of data on the 70 English unitary authorities and county councils that were members of the CQC Efficiency Network in 2015-16 and supplied data for at least one year from 2009-10 to that year; this gives us 327 observations in total. Cost data were supplied to the network by each authority according to definitions agreed by a working group of network members, and relate to carriageway maintenance activities only, i.e. they exclude costs associated with related activities such as winter service and footway maintenance. The data set is updated annually for a new round of analysis, and in this study we use the data set from the 2015-16 round, which was the first year that the network ran. We observe large differences in unit costs, with a large number of extreme outliers in both directions, which are clearly the result of reporting errors. As a result, standard SF models yield a wide range of efficiency predictions, motivating the development of robust SF methods.
In line with the previous literature mentioned above-see Link (2014)

[Table 1: parameter estimates from the two models, with log-likelihoods. Surviving values: -186.06, -189.14. Statistical significance at the: * 10% level, ** 5% level, *** 1% level]

We can see from Table 1 that both models result in generally similar parameter estimates; the main difference is in the standard errors of these estimates, which are approximately a third smaller in the t-half normal model than in the normal-half normal model. Note that while the $\sigma_u$ parameters are comparable between the two models, the $\sigma_v$ parameters are not, since the distribution of $v$ varies. The variance of $u$ is given in both cases by:

$$\mathrm{Var}(u) = \frac{\pi - 2}{\pi}\,\sigma_u^2$$

while the variance of $v$ is, in the t-half normal and normal-half normal cases respectively:

$$\mathrm{Var}(v) = \frac{a}{a - 2}\,\sigma_v^2, \qquad \mathrm{Var}(v) = \sigma_v^2$$

where $a$ is the degrees of freedom parameter and the first expression requires $a > 2$. Our estimates of these are compared in Table 2 below.

Table 2 shows that, in line with our expectations and with the findings of Stead et al. (2017) relating to the logistic-half normal model, the Student's t-half normal model results in a lower estimated variance in inefficiency than the normal-half normal model, and that more of the total error variance is attributed to $v$. The overall error variance is also slightly lower, likewise mirroring the results from the logistic-half normal model. Following from this, according to the discussion in Section 1, we expect to find a considerably narrower distribution of efficiency predictions from the t-half normal model, owing to both the lower estimated $\mathrm{Var}(u)$ and the greater shrinkage towards the mean resulting from the higher estimated $\mathrm{Var}(v)$.

Table 3 compares the mean, minimum and maximum efficiency scores from the t-half normal and normal-half normal models obtained via $\exp[-E(u \mid \varepsilon)]$. As expected, the minimum efficiency estimate is considerably higher, and the maximum also significantly lower, in the t-half normal model, and therefore the range of efficiency scores is markedly smaller: in this case less than half as wide. A more complete description is given by Figure 1, which compares the kernel density estimates for the two sets of efficiency scores.
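
The variance comparison above can be sketched in a few lines of Python. This is an illustration only: it assumes the Student's t noise term is parameterised by a scale $\sigma_v$ and a degrees of freedom parameter (called `a` here), and the numerical inputs are hypothetical rather than the estimates reported in the paper.

```python
import math


def var_u_half_normal(sigma_u: float) -> float:
    """Variance of a half normal inefficiency term with scale sigma_u."""
    return (math.pi - 2.0) / math.pi * sigma_u ** 2


def var_v_normal(sigma_v: float) -> float:
    """Variance of a normal noise term with standard deviation sigma_v."""
    return sigma_v ** 2


def var_v_student_t(sigma_v: float, a: float) -> float:
    """Variance of a scaled Student's t noise term with a degrees of freedom.

    The variance is infinite for 1 < a <= 2 and undefined for a <= 1;
    this sketch returns infinity in both cases for simplicity.
    """
    if a <= 2.0:
        return math.inf
    return a / (a - 2.0) * sigma_v ** 2


def noise_share(var_u: float, var_v: float) -> float:
    """Share of the total error variance attributed to noise."""
    return var_v / (var_u + var_v)


# Hypothetical parameter values, chosen only to show the calculation.
vu = var_u_half_normal(sigma_u=0.4)
vv_t = var_v_student_t(sigma_v=0.2, a=4.0)
vv_n = var_v_normal(sigma_v=0.2)
print(f"Var(u) = {vu:.4f}")
print(f"Var(v): t = {vv_t:.4f}, normal = {vv_n:.4f}")
print(f"noise share under t noise = {noise_share(vu, vv_t):.3f}")
```

Because the factor $a/(a-2)$ exceeds one, equal values of $\sigma_v$ imply a larger noise variance under the Student's t assumption, which is why the scale parameters should not be compared directly across the two models.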
Given the similarity of the frontier parameter estimates, as shown in Table 1, the residuals from the t-half normal and normal-half normal models are highly correlated. However, the relationships between the residuals and efficiency predictions are shown by Figure 2.

As discussed in Section 3.3, we are interested in testing two null hypotheses: first, that $v$ is normally distributed, in which case the t-half normal model reduces to the normal-half normal model, and second, that there is no inefficiency. The likelihood ratio statistic follows a $\chi^2_{1:0}$ distribution in both cases. Log-likelihoods are given in Table 1, and are used to calculate the likelihood ratio statistic as shown in (28) for the first null hypothesis: this gives a likelihood ratio of 4.155 and a corresponding p-value of 0.012. For our second null hypothesis, the t-half normal model reduces to a regression model with Student's t errors: we do not report the results of this regression, except the log-likelihood, which is -189.140, so in this case we have a likelihood ratio of 4.150 and a corresponding p-value of 0.012. We therefore reject the null hypotheses of normally distributed $v$ and zero inefficiency, indicating that this model performs better than either the standard SF model or the Student's t regression model in this instance.
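
To illustrate how the same residual can map to different efficiency predictions under the two noise assumptions, the sketch below approximates $\exp[-E(u \mid \varepsilon)]$ by simulation, weighting half normal draws of $u$ by the noise density at $\varepsilon - u$. It assumes a cost frontier with $\varepsilon = v + u$ and a degrees of freedom parameter `a`. The function names and parameter values are hypothetical and are not the paper's estimates, and holding $\sigma_v$ fixed across the two models is a simplification made for illustration, not a like-for-like comparison.

```python
import math
import random


def normal_pdf(v: float, sigma_v: float) -> float:
    """Density of a zero-mean normal noise term."""
    return math.exp(-0.5 * (v / sigma_v) ** 2) / (sigma_v * math.sqrt(2.0 * math.pi))


def student_t_pdf(v: float, sigma_v: float, a: float) -> float:
    """Density of a scaled Student's t noise term with a degrees of freedom."""
    log_const = (math.lgamma((a + 1.0) / 2.0) - math.lgamma(a / 2.0)
                 - 0.5 * math.log(a * math.pi) - math.log(sigma_v))
    return math.exp(log_const - (a + 1.0) / 2.0 * math.log1p((v / sigma_v) ** 2 / a))


def predicted_efficiency(eps, noise_pdf, sigma_u, half_normal_draws):
    """Approximate exp[-E(u | eps)] using draws of u from its half normal prior.

    Each draw is weighted by the noise density at eps - u. The weights can
    underflow to zero for very extreme residuals, which this sketch ignores.
    """
    num = 0.0
    den = 0.0
    for z in half_normal_draws:
        u = sigma_u * z
        w = noise_pdf(eps - u)
        num += u * w
        den += w
    return math.exp(-num / den)


rng = random.Random(0)
draws = [abs(rng.gauss(0.0, 1.0)) for _ in range(20000)]

# Hypothetical parameter values: sigma_u = sigma_v = 1 and a = 3.
for eps in (0.5, 2.0, 5.0):
    eff_n = predicted_efficiency(eps, lambda v: normal_pdf(v, 1.0), 1.0, draws)
    eff_t = predicted_efficiency(eps, lambda v: student_t_pdf(v, 1.0, 3.0), 1.0, draws)
    print(f"eps = {eps:.1f}: normal = {eff_n:.3f}, Student's t = {eff_t:.3f}")
```

With these illustrative values, a large positive residual is attributed mostly to noise under the thick tailed distribution, so the predicted efficiency is pulled back towards the mean rather than falling towards zero.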

Summary and Conclusions
This paper proposes a new stochastic frontier model as a means to account for outliers in the context of SFA. We have reviewed possible methods in the existing literature, including the adoption of alternative distributional assumptions for $v$. Our model develops the approach of Stead et al. (2017), in which maximum simulated likelihood methods are used to estimate a model combining a logistic distribution for $v$ with a half normal distribution for $u$, by using a Student's t distribution for $v$. The advantages of this distribution are that the kurtosis of $v$ is determined by a degrees of freedom parameter which is freely estimated, and that it nests the normal distribution as this parameter approaches infinity. This means that the heaviness of the

In the t-truncated normal and t-gamma cases, respectively, we have: = σ 2 ln ( 48 )