Semi-parametric modelling of inefficiencies in stochastic frontier analysis

We propose a novel penalized splines method to estimate a stochastic frontier model in which the frontier is linear and the inefficiency has a single index structure with an unknown link function and a linear index. The approach is more flexible than the traditional methodology requiring a parametric link function and, at the same time, it does not incur the curse of dimensionality that affects fully non-parametric approaches. The procedure can be easily implemented using existing software. We give conditions for the model to be identified and provide some asymptotic results. We also use Monte Carlo simulations to show that the approach works well in finite samples in many situations when compared to the well-specified maximum likelihood estimator. An application to the residential energy demand of US states is considered. In this case, the penalized splines approach estimates inefficiency functions that deviate substantially from those resulting from parametric maximum likelihood methods previously implemented.


Introduction
Stochastic Frontier Analysis is widely used to compare the efficiency of organizations and inform policy decisions across a variety of sectors, including health, the environment and energy. The estimation of stochastic frontier models often involves strong parametric assumptions regarding the specification of both the efficient frontier and the inefficiency. Although the measures of inefficiency (at least in the sense of Jondrow et al. 1982) seem to be robust to the distribution of the inefficiency (e.g. Ondrich and Ruggiero 2001), there is no evidence about their robustness to the parametric specification of the determinants of inefficiency.
Concerns about the impact of the specification of the efficient frontier and the inefficiency have led to attempts to specify these non-parametrically. Important contributions include Fan et al. (1996), Kumbhakar (2007) and Simar et al. (2017) for the efficient frontier, and Tran and Tsionas (2009) and Parmeter et al. (2017) for the determinants of inefficiency. However, fully non-parametric estimators of the frontier and/or the inefficiency have slow rates of convergence. Even when the number of explanatory variables is as small as four, the computational costs may be very high in any real-world application, especially when the smoothing parameter is selected for each dimension using cross-validation criteria.
This paper builds on these previous attempts to relax the parametric assumptions traditionally used in specifying the inefficiency of a stochastic frontier model and proposes an alternative semi-parametric approach which imposes weaker assumptions on the inefficiency. Although not as general as the fully non-parametric methodology suggested by Tran and Tsionas (2009) and Parmeter et al. (2017), the approach does not incur the curse of dimensionality. Instead, it leads to a fast and accurate estimator of the inefficiency even when the number of explanatory variables is large. The proposed semi-parametric approach relies on writing the inefficiency as a single index with an unknown link function. By so doing, the linear stochastic frontier model becomes equivalent to a partially linear single index model, for which an extensive literature exists (e.g. among others Carroll et al. 1997; Yu and Ruppert 2002; Xia and Härdle 2006; Lian et al. 2015). The paper provides an identification condition which is more natural for a stochastic frontier model than those used in the literature on partially linear single index models (e.g. Xia et al. 1999; Lin and Kulasekera 2007; Dong et al. 2016; Lian et al. 2015), and proposes an estimation procedure based on penalized cubic splines which can be easily implemented with existing statistical software. This offers practitioners the possibility to easily apply this methodology. We illustrate the approach by analyzing an application of stochastic frontier models to energy demand for US states.
A standard assumption used for identification in the literature on partially linear single index models requires the orthogonality of the parameters of the linear and the single index parts. This paper suggests a new test for this assumption, implemented through a bootstrap procedure.
The rest of the paper is organized as follows. The second section describes the model. The third section proposes a methodology to estimate it. Its asymptotic properties are discussed in the fourth section. A Monte Carlo simulation comparing the new method with standard maximum likelihood procedures is presented in the fifth section. The sixth section illustrates an empirical application to the rebound effect for the US states energy demand.

The model
We consider a linear stochastic frontier (SF) model (e.g. Aigner et al. 1977; Meeusen and van den Broeck 1977) of the form:

Y = η + X′β − U + V,  (1)

where Y is the logarithm of output, X′β is the efficient frontier with p-dimensional vector of inputs X, η is the intercept, β is a p-dimensional vector of coefficients, U ≥ 0 is an unobserved inefficiency term, and V is an unobserved idiosyncratic error. It is typically assumed that U and V are uncorrelated given X.
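As a quick illustration of (1), and of why least squares recovers the slopes but not the intercept, the model can be simulated in a few lines. This is a minimal Python sketch; the half-normal inefficiency and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Simulate Y = eta + X'beta - U + V with U >= 0 (half-normal, illustrative).
rng = np.random.default_rng(0)
n, p = 500, 2
eta, beta = 1.0, np.array([0.5, -0.3])
X = rng.normal(4.0, 2.0, size=(n, p))
V = rng.normal(0.0, 0.1, size=n)           # idiosyncratic error
U = np.abs(rng.normal(0.0, 0.5, size=n))   # inefficiency term, U >= 0
Y = eta + X @ beta - U + V

# Since E[U|X] > 0, OLS recovers beta but a downward-shifted intercept.
Z = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
```

Here `coef[1:]` is close to β, while `coef[0]` estimates η − E[U], illustrating that the intercept is identified only through distributional assumptions on U and V.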
Traditionally, a SF model is estimated using Maximum Likelihood (ML) under the assumptions that U ~ |N(0, σ²_U)| and V ~ N(0, σ²_V), although other distributions may be used (e.g. Greene 1990). Under standard assumptions, the slope parameters, β, can also be consistently estimated by Ordinary Least Squares (OLS). Since E[U|X] = g(X) > 0, the intercept η is identified only through distributional assumptions on U and V. More generally, the distribution of the inefficiency U may depend on a vector of observables. In this case, the OLS estimator of β may not be consistent. ML is typically used to estimate model (1) because it yields a consistent and efficient estimator when the model is well specified. Since economic theory provides little insight into the mechanism generating the inefficiency or even its determinants, the likelihood function must rely on strong and often restrictive assumptions.
Then μ, β, α and λ are identified. The proof of this result is in the appendix. We will now discuss the conditions required by Theorem 1. Conditions 1, 2, 3 and 6 are standard for single index models and are needed for the identification of λ and α. Condition 2 is normally implied by the consistency conditions of ordinary least squares when there is no inefficiency term. Precisely, one normally requires that E||X_i||^{1+δ} < Δ < ∞ for some δ > 0 and i = 1, ..., n, which implies Condition 2. In the case under consideration, Condition 3 is satisfied by construction. Condition 6 excludes the case where the function λ is linear. In that case, λ(X′α) = μ₁ + cX′α, so that Y = (μ − μ₁) + X′(β − cα) + ε and the parameters cannot be identified. Condition 4 is discussed in Xia et al. (1999) and is required to ensure that the model can be written uniquely as in (2).
Condition 5 is non-standard and requires some discussion. By construction, E(Y|X) = μ + X′β − λ(X′α), so taking derivatives with respect to the components of X gives ∇_X E(Y|X) = β − λ′(X′α)α. Condition 5 requires that there is a limit point X₀ and a sequence X_j ∈ S, j = 1, ..., such that X_j → X₀ and λ′(X_j′α) → 0. Hence, in a small neighborhood of X₀ it is possible to vary the level of production without affecting the inefficiency. Of course, this assumption could be replaced by the condition that α′β = 0 (e.g. Xia et al. 1999; Lin and Kulasekera 2007). To see how this differs from Condition 5, let v be a p-dimensional vector on the unit sphere, so that v′∇_X E(Y|X) = v′β − λ′(X′α)v′α. If α′β = 0 it is possible to choose v in the space spanned by β (and orthogonal to α) such that v′∇_X E(Y|X) = v′β for all X ∈ S. Therefore, it is possible to modify the input X to change the output and leave the inefficiency unaffected for every X. On the other hand, if Condition 5 holds, this can happen only at a limit point X₀.
Notice that although μ is identified, its components γ and δ are not. Hence, without further assumptions, g(X′α) = λ(X′α) + δ can be identified only up to an additive constant. However, since g(X′α) ≥ 0, one can identify δ by further requiring that δ is finite and inf_X g(X′α) = 0. Such an assumption is quite natural, as it implies that there is a vector X where the expected inefficiency is arbitrarily close to zero. In this case, one can set δ = −inf_X λ(X′α).
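The normalization inf_X g(X′α) = 0 translates into a one-line computation once an estimate of λ is available. Below is a minimal sketch with a stand-in for an estimated λ; the function used is purely illustrative.

```python
import numpy as np

# delta is pinned down by requiring inf g(x'alpha) = 0, i.e. delta = -inf lambda.
u = np.linspace(-2.0, 2.0, 201)
lam = np.sin(u) - 0.2          # stand-in for an estimated lambda (illustrative)
delta = -np.min(lam)           # smallest upward shift making g non-negative
g = delta + lam                # g(x'alpha) >= 0 everywhere, with inf g = 0
```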
A special case of model (2) is the SF model with separability between the inputs to the efficient frontier and the determinants of the inefficiency (e.g. Simar and Wilson 2007; Wang and Schmidt 2002). Partitioning X as X = (X₁′, X₂′)′, separability implies that α = (0′, α₂′)′ and β = (β₁′, 0′)′. This is normally justified by appealing, for instance, to the managers' natural skills, which are increased or reduced by the variables in X₂ but do not depend on the firm's characteristics in X₁. This assumption is often violated, as X₂ may contain some of the variables in X₁ either directly or after a transformation. Model (2) contains the class of SF models satisfying the scaling property (e.g. Simar et al. 1994; Wang and Schmidt 2002). In this case, U = σ_U(X′α)U*, where U* ≥ 0 has a fixed and known distribution independent of X. For instance, if U* ~ |N(0, 1)|, then E[U|X] = √(2/π) σ_U(X′α). The latent variable U* can be interpreted as the firm's base efficiency level (e.g. Alvarez et al. 2006, p. 203). However, notice that model (2) also contains some important specifications considered in the SF literature that do not satisfy the scaling property (e.g. Battese and Coelli 1995; Kumbhakar et al. 1991; Deprins and Simar 1989). Parametric stochastic frontier models with the scaling property are normally estimated by nonlinear least squares (e.g. Simar et al. 1994; Deprins and Simar 1989; Paul and Shankar 2017; Parmeter and Kumbhakar 2014) or ML (e.g. Simar et al. 1994; Wang and Schmidt 2002).
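The moment formula E[U|X] = √(2/π) σ_U(X′α) under the scaling property is easy to verify by simulation. The sketch below uses an illustrative logistic choice of σ_U and an arbitrary index value.

```python
import numpy as np

# Under the scaling property U = sigma_U(X'alpha) U* with U* ~ |N(0, 1)|,
# E[U|X] = sqrt(2/pi) * sigma_U(X'alpha). Monte Carlo check at one index value.
rng = np.random.default_rng(1)
idx = 0.7                                  # a fixed value of X'alpha (arbitrary)
sigma_u = 1.0 / (1.0 + np.exp(-idx))       # illustrative logistic scaling function
u_star = np.abs(rng.normal(size=200_000))  # U* ~ |N(0, 1)|
mean_u = (sigma_u * u_star).mean()
theory = np.sqrt(2 / np.pi) * sigma_u
```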
Fully non-parametric specifications for the inefficiency are considered by Tran and Tsionas (2009) and Parmeter et al. (2017), who employ the approach of Robinson (1988). Although very general, these methodologies are subject to the curse of dimensionality unless X contains only a small number of variables.
Partially linear single-index models like (2) have been studied in the statistics literature, where most papers assume separability between the linear and the non-linear parts (e.g. Carroll et al. 1997; Yu and Ruppert 2002; Xia and Härdle 2006; Yuan 2011). A few articles focus on models without separability and discuss their identification (e.g. Xia et al. 1999; Lin and Kulasekera 2007; Dong et al. 2016; Lian et al. 2015), traditionally based on three assumptions: (a) α′α = 1 with the first nonzero element positive; (b) β′α = 0; and (c) some regularity conditions on the unknown function λ. Dong et al. (2016) have recently shown that a sufficient condition for identification is that (i) λ is bounded (with some regularity conditions) and (ii) the support of X is unbounded (precisely, X is a unit root process). Some of these assumptions may be too restrictive for the estimation of a stochastic frontier model. For example, β′α = 0 implies that a firm can always change the composition of inputs to reduce the inefficiency without affecting the level of production. Similarly, boundedness is violated by the standard assumption that the inefficiency term is exponential.
Notice that one could consider a panel data version of (1) by allowing, for instance, for fixed effects. Provided α, β and λ are the same over both the cross-section and the time-series dimensions, the model is identified and can be estimated using the procedure described below. Although the time-series dependence makes the asymptotic theory more complex, the bootstrap could be used to construct confidence intervals and tests (e.g. Kapetanios 2008). A full investigation of this case is beyond the scope of the paper.

Estimation of the model
In this section, we propose a procedure to estimate model (2) by penalized splines. This method is well-established in the statistical literature (e.g. Ruppert et al. 2003) and can be implemented using standard software.
The fundamental idea is that, for fixed α, β and γ, the function λ can be estimated by minimizing

Q_{n,γ} = Σ_{i=1}^n (1/w²_i)(Y_i − μ − X_i′β + λ(X_i′α))² + γ ∫ (λ″(u))² du

with respect to λ, where 1/w²_i is a suitable weight. The second term is a penalization that prevents over-fitting. If λ belongs to a Sobolev space, it can be shown that the solution of this problem is a natural cubic spline with continuous first and second derivatives at each knot located at the unique points X_i′α, i = 1, ..., n (e.g. among others Hastie and Tibshirani 1993, p. 151). However, placing a knot at each observation is not only time-consuming but also unnecessary when the complexity of λ is moderate. Instead, we approximate the unknown function λ by a linear combination of L + 3 < n cubic splines S₁(u), ..., S_{L+3}(u):

λ(u) ≈ c′S(u), where c = (c₀, c₁, ..., c_{L+3})′ and S(u) = (1, S₁(u), ..., S_{L+3}(u))′,

with S₁(u) = u, S₂(u) = u², S₃(u) = u³, S_l(u) = (u − ξ_l)³₊ for l = 4, ..., L + 3, where ξ₄ < ... < ξ_{L+3} are L fixed knots in the support of λ. If the function λ is smooth and unimodal or monotonic, a number of knots between L = 5 and L = 10 is usually sufficient to make the approximation accurate (e.g. Yu and Ruppert 2002; Ruppert 2002). Notice that in this case we can write Q_{n,γ}(μ, α, β, c). Notice also that we can set the parameter δ as δ̂ = −inf λ̂(X_i′α̂) to guarantee that ĝ(X′α) = δ̂ + λ̂(X′α) ≥ 0 (see Parmeter et al. (2017) for a way to enforce positivity of g when it is fully non-parametric).
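A minimal sketch of the spline approximation is given below. It uses a truncated-power cubic basis and, instead of the integrated-curvature penalty above, the ridge penalty on the truncated-power coefficients that is common in the penalized-spline literature (Ruppert et al. 2003); the target function and all tuning values are illustrative.

```python
import numpy as np

# Build the basis S(u) = (1, u, u^2, u^3, (u - xi_4)^3_+, ..., (u - xi_{L+3})^3_+).
def spline_basis(u, knots):
    cols = [np.ones_like(u), u, u**2, u**3]
    cols += [np.clip(u - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Penalized least squares for c given the index values u (alpha, beta fixed);
# only the truncated-power coefficients are penalized (Ruppert et al. 2003).
def fit_lambda(u, y, knots, gamma):
    S = spline_basis(u, knots)
    D = np.zeros(S.shape[1]); D[4:] = 1.0
    c = np.linalg.solve(S.T @ S + gamma * np.diag(D), S.T @ y)
    return S @ c

rng = np.random.default_rng(2)
u = np.sort(rng.uniform(-2, 2, 300))
y = 1 + np.cos(u) + rng.normal(0, 0.1, size=u.size)   # smooth unimodal target
knots = np.quantile(u, np.linspace(0.1, 0.9, 8))      # L = 8 interior knots
fit = fit_lambda(u, y, knots, gamma=1.0)
```

With a smooth unimodal target, a handful of knots reproduces the function closely, in line with the L = 5 to L = 10 guideline above.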
The parameters α, β and c (defined in (7)) can be estimated for a given γ by minimizing (6). Hence, we can estimate α, β and c as detailed in Algorithm 1. Notice that we impose the sample version of Condition 3 of Theorem 1 as n⁻¹ Σ_{i=1}^n λ(u_i) = 0. The tuning parameter γ can be chosen in different ways: (i) it can be taken as fixed; (ii) it can be allowed to tend to zero as the sample size n tends to infinity; (iii) it can be chosen to minimize criteria based on the prediction error, such as CV or GCV, or more classical criteria such as AIC or Mallows' C_p (e.g. Yu and Ruppert 2002; Wood 2008); (iv) it can be based on the Restricted Maximum Likelihood (REML) criterion following Wood (2011).
The weights 1/w²_i could be set to 1. However, to improve efficiency, one may employ a two-stage procedure in which they are updated along the lines of Lian et al. (2015), so that Var(ε|X) = exp{φ(X′ν)}. Hence, one can use the residuals ε̂_i = Y_i − μ̂ − X_i′β̂ + λ̂(X_i′α̂) to fit a single index model for the absolute residuals by minimizing a penalized least squares criterion with a cubic spline approximation to φ. Then the weights w_i = exp(φ̂(X_i′ν̂)) are updated and the model is re-estimated. Alternatively, one could model the squared residuals instead of the absolute residuals.
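The weight update can be sketched as follows. For brevity, a simple linear fit to the log squared residuals stands in for the cubic-spline approximation to φ, and the data-generating choices are illustrative.

```python
import numpy as np

# Sketch of the second stage: model Var(eps|X) = exp{phi(X'nu)} from the
# residuals and update the weights. A linear phi replaces the spline here.
rng = np.random.default_rng(3)
n = 1000
z = rng.normal(size=n)                        # stand-in for the index X'nu
eps = rng.normal(size=n) * np.exp(0.5 * z)    # so that Var(eps|X) = exp(z)

# Regress log squared residuals on the index; the slope should be near 1.
phi_hat = np.polyfit(z, np.log(eps**2 + 1e-12), 1)
w = np.exp(np.polyval(phi_hat, z))            # updated variance weights
```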
One referee brought to our attention that estimation based on a penalization, as in the procedure suggested above, could potentially hide identification issues.
The following algorithm describes the estimation procedure.
Algorithm 1 Iterative procedure to estimate α, β, c and γ

1. Set starting values for γ̂ ≥ 0, α̂ (with first element non-negative and satisfying α̂′α̂ = 1), β̂, and w_i = 1.
2. Minimize Q_{n,γ̂}(μ̂, α̂, β̂, c) with respect to c.
3. Estimate γ̂ by REML (or other criteria such as GCV, AIC, C_p).
4. Repeat from 2 until a convergence criterion is satisfied.
5. Minimize Q_{n,γ̂}(μ, α, β, ĉ) with respect to μ, α and β subject to the constraint α′α = 1.
6. Repeat from 2 until a convergence criterion is satisfied.
7. Update w²_i.
8. Repeat from 2 until a convergence criterion is satisfied.

Although Q_{n,γ}(μ, α, β, c) can be minimized in several ways, Algorithm 1 has the advantage that it can be easily implemented with existing software. Essentially, Algorithm 1 separates the estimation of α and β from the estimation of λ (or, more precisely, of c and γ). For example, in R the estimation of λ can be implemented with the mgcv package (Wood 2020), while the estimation of α and β solves a nonlinear least squares problem which relies on an optimization routine such as optim (R Core Team 2013) or, for global optimization using genetic algorithms, the GA package (Scrucca 2013). We have experimented with alternatives to Algorithm 1 and found them less reliable.
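To make the alternation concrete, the following toy Python sketch mimics steps 2 and 5 of Algorithm 1 for p = 2, with γ held fixed (so the REML step is omitted) and α′α = 1 enforced by the parametrization α = (cos t, sin t). It is not the mgcv-based implementation described above, and all data-generating values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data from Y = mu + X'beta - lambda(X'alpha) + eps with lambda = 1 + cos.
rng = np.random.default_rng(4)
n = 800
X = rng.normal(0.0, 1.0, size=(n, 2))
alpha0, beta0, mu0 = np.array([0.6, 0.8]), np.array([1.0, -0.5]), 2.0
Y = mu0 + X @ beta0 - (1 + np.cos(X @ alpha0)) + rng.normal(0, 0.05, size=n)

knots = np.linspace(-2.0, 2.0, 8)

def basis(u):
    cols = [np.ones_like(u), u, u**2, u**3]
    cols += [np.clip(u - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_c(u, r, gamma=1.0):
    # Inner step: penalized LS for the spline coefficients c (step 2).
    S = basis(u)
    D = np.zeros(S.shape[1]); D[4:] = 1.0
    return np.linalg.solve(S.T @ S + gamma * np.diag(D), S.T @ r)

def outer_loss(par):
    # Outer step: profile out c, then score (mu, t, beta) (step 5).
    mu, t, beta = par[0], par[1], np.asarray(par[2:])
    a = np.array([np.cos(t), np.sin(t)])          # alpha'alpha = 1 by construction
    u = X @ a
    c = fit_c(u, mu + X @ beta - Y)               # lambda must match mu + X'beta - Y
    return np.sum((Y - mu - X @ beta + basis(u) @ c) ** 2)

x0 = np.array([1.8, 0.9, 0.9, -0.4])
res = minimize(outer_loss, x0, method="Powell")
alpha_hat = np.array([np.cos(res.x[1]), np.sin(res.x[1])])
```

Because the basis contains an intercept and a linear term, μ (and, as in the text, the constant of λ) is only identified up to the sample centering of λ; in this toy code the index direction α is the quantity of interest, and it is recovered up to sign since cos is even.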
The algorithm is similar to the one used by Yu and Ruppert (2002). However, there are some important differences. Firstly, their model separates the variables appearing in the linear and the nonlinear part and thus it is a special case of (2). Secondly, Yu and Ruppert (2002) choose the value of the tuning parameter γ by GCV which penalizes over-fitting less severely than the REML criterion of Wood (2011). We use REML as Wood (2011) convincingly argues that this approach exhibits good stability properties.
Alternative series expansions can be used instead of natural cubic splines. For example, Dong et al. (2016) use an orthogonal series expansion in terms of Hermite polynomials, with the number of terms in the series selected by GCV. Orthogonal series expansions have two drawbacks due to their global nature. First, they are sensitive to outliers (Li and Racine 2007, p. 446; Green and Silverman 1994, p. 2). Second, they tend to exhibit a poor fit in those parts of the support with few observations (Ramsay and Silverman 1997, p. 48; Hastie et al. 2001, p. 140).
Kernel estimation could be used as an alternative to penalized splines. Xia et al. (1999) use an alternative two-stage estimation procedure in which they first approximate E[Y|X′α] and E[X|X′α], two functions of X′α only, with leave-one-out kernel estimators, which we denote Ê_h[Y|X_i′α] and Ê_h[X|X_i′α], where h is the bandwidth. Firstly, α, β and h are estimated by minimizing

Σ_{i=1}^n (Y_i − Ê_h[Y|X_i′α] − (X_i − Ê_h[X|X_i′α])′β)².

Secondly, the function λ is estimated as

λ̂(X_i′α̂) = μ̂ + Ê_h[X|X_i′α̂]′β̂ − Ê_h[Y|X_i′α̂].

Tran and Tsionas (2009) and Parmeter et al. (2017) use a similar approach to estimate a SF model without imposing a single index structure on λ.
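The leave-one-out kernel estimates used in the first stage can be sketched as follows, with a Gaussian kernel and a fixed bandwidth; both choices are illustrative.

```python
import numpy as np

# Leave-one-out Nadaraya-Watson estimate of E[y|u] at each observed u_i.
def loo_nw(u, y, h):
    K = np.exp(-0.5 * ((u[:, None] - u[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)           # exclude observation i from its own fit
    return (K @ y) / K.sum(axis=1)

rng = np.random.default_rng(5)
u = rng.uniform(-2, 2, 400)            # stand-in for the index values X'alpha
y = np.sin(u) + rng.normal(0, 0.1, size=u.size)
m_hat = loo_nw(u, y, h=0.2)
```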
Kernel methods and penalized splines tend to be asymptotically equivalent (Messer 1991). In finite samples, the penalized splines approach tends to estimate λ with lower variance than kernel estimation (e.g. Eilers and Marx 1996). Eilers and Marx (1996) also suggest that penalized splines can often be preferable from a computational point of view in the one-dimensional case and when the number of knots is substantially smaller than the sample size, as in the case considered here.

Asymptotic properties
The asymptotic properties of the procedure outlined in the previous section can be derived following the work of Yu and Ruppert (2002). Therefore, we only report the results here.
These assumptions (C and AN) are standard. Assumption C is needed to ensure that our estimator is consistent, while Assumptions C and AN together imply asymptotic normality.

Theorem 2 Let θ* denote the true parameter value and θ̂ be an estimator of θ. The following results hold.
a. Under Assumption C, if the smoothing parameter γ = o(1), then a sequence of penalized least squares estimators minimizing expression (6) exists and is a strongly consistent estimator of θ*.
b. Under Assumptions C and AN, if the smoothing parameter γ = o(n^{−1/2}), then a sequence of constrained penalized least squares estimators of θ exists, is consistent, and is asymptotically normally distributed.
c. Under Assumptions C and AN, a corresponding result holds when γ > 0 is fixed.

The proof of Theorem 2 follows closely the proofs of Theorems 1 and 2 of Yu and Ruppert (2002) and is not reported here. Theorem 2 allows one to construct tests of hypotheses and confidence intervals based on the asymptotic standard errors. Alternatively, robust standard errors can be obtained using bootstrap procedures. In the application discussed later we use the wild bootstrap.
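The wild bootstrap can be sketched on a simple regression: the fitted values are kept fixed and the residuals are resampled with Rademacher multipliers, which preserves conditional heteroskedasticity. The design below is illustrative.

```python
import numpy as np

# Wild bootstrap standard error for a slope under heteroskedasticity.
rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n) * (0.5 + 0.5 * np.abs(x))
Z = np.column_stack([np.ones(n), x])
b_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ b_hat

slopes = []
for _ in range(500):
    w = rng.choice([-1.0, 1.0], size=n)       # Rademacher multipliers
    y_star = Z @ b_hat + resid * w            # wild-bootstrap response
    b_star, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    slopes.append(b_star[1])
se_wild = np.std(slopes)
```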
It was shown in the discussion following Theorem 1 that one could replace Condition 5 with the stronger but more traditional condition α′β = 0. A test for H₀: α′β = 0 can be based on the statistic α̂′β̂, as the following result shows.
Theorem 3 Assume that as n → ∞:

Precise estimation of the asymptotic variance-covariance matrix Σ* in Theorem 3 is difficult, since it requires estimation of the covariance matrices in Theorem 2, parts b and c, which have large dimensions. However, this problem can be overcome by noticing that Theorem 3 requires the estimation of σ²_* only, and this can be estimated as the bootstrapped variance of √n α̂′β̂ obtained by bootstrapping the residuals of the estimated model.
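The resulting test can be sketched as follows: compare n(α̂′β̂)²/σ̂²_* with a χ²(1) critical value, where σ̂²_* is the bootstrapped variance of √n α̂′β̂. All numerical values below are placeholders, not estimates from the paper.

```python
import numpy as np
from scipy.stats import chi2

# Orthogonality test sketch: n (alpha'beta)^2 / sigma2_hat ~ chi2(1) under H0.
n = 816                                  # e.g. 48 states x 17 years (illustrative)
a_b_hat = 0.012                          # placeholder for alpha_hat' beta_hat
boot = np.random.default_rng(7).normal(0.0, 0.4, size=100)  # placeholder draws
sigma2_hat = boot.var(ddof=1)            # bootstrapped variance of sqrt(n) a'b
stat = n * a_b_hat ** 2 / sigma2_hat
reject_5 = stat > chi2.ppf(0.95, df=1)   # 5% test against chi2(1)
```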

Numerical comparison
This section investigates the finite sample properties of the methodology developed in Section 4 and compares it, using Monte Carlo simulations, to the parametric methods traditionally implemented in the SF literature. We consider a simplified version of the stochastic frontier model (1) with four explanatory variables X_i = (X_{1i}, X_{2i}, X_{3i}, X_{4i})′. We also assume that:
1.-2. V_i and U*_i, i = 1, ..., n, are independent random variables.
3. X_i, V_i, and U*_i are independent of each other.
4. X_{1i}, X_{2i}, X_{4i} are sampled independently of each other from a normal distribution with mean 4 and variance 4, and X_{3i} = 0.5 × X_{2i} + 0.5 × N(4, 4).
5. V_i ~ N(0, σ²_V).
6. U_i = κσ_U(X_i′α)U*_i, where U*_i ~ |N(0, 1)|; κ captures the strength of the signal of U_i in relation to the signal of X′β in (1) and the noise Var(V_i). Precisely:
i. The signal of the frontier is the variance of X_i′β.
ii. The signal of the inefficiency is measured by the variance of U_i. The constant κ is chosen so that Var(U_i) is slightly less than half the signal in the frontier, Var(U_i) = 0.45 Var(X_i′β).
iii. The noise, parametrized by σ²_V = 0.1 Var(U_i), is dominated by the signal of the inefficiency U_i.
7. Five specifications for the inefficiency are considered: i. exponential, σ_U(x) = e^x; ii. logistic, σ_U(x) = (1 + e^{−x})^{−1}; iii. trigonometric, σ_U(x) = 1 + cos(x); iv. partially linear, σ_U(x) = x for x ≤ 4π/9 and σ_U(x) = 1 + cos(2.25x) for x > 4π/9; v. partially constant.
8. Both situations where the inputs to the production function and the determinants of inefficiency are separable and non-separable are considered.
Assumption 6 imposes the scaling property on the inefficiency. This is done for simplicity, as it allows us to control the strengths of the signal and noise in X_i′β, U_i and V_i.
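The design of the regressors in step 4 can be sketched as follows; the sample size is illustrative.

```python
import numpy as np

# Regressor design: X1, X2, X4 independent N(4, 4); X3 correlated with X2.
rng = np.random.default_rng(8)
n = 1000
X124 = rng.normal(4.0, 2.0, size=(n, 3))           # variance 4 => sd 2
X3 = 0.5 * X124[:, 1] + 0.5 * rng.normal(4.0, 2.0, size=n)
X = np.column_stack([X124[:, 0], X124[:, 1], X3, X124[:, 2]])
corr_23 = np.corrcoef(X[:, 1], X[:, 2])[0, 1]      # induced X2-X3 correlation
```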
The exponential and the logistic specifications of the inefficiency in (7.i) and (7.ii), respectively, are often used in practical applications of stochastic frontiers, and they make a natural comparison. They are usually estimated using maximum likelihood, which is asymptotically efficient if the model is well specified, and as such they also represent a useful benchmark. Specifications (7.iii)-(7.v) are used to investigate the cases where the inefficiency is not monotonic in X_i′α, or is linear or constant over a certain interval. The first case captures, for example, the possibility that a young farmer's productivity increases with age but then decreases when the farmer is old (e.g. Wang 2002). Specifications (7.iv) and (7.v) capture inefficiencies in which the parameters are close to being unidentified because the inefficiency is linear or constant on part of the support of the link function.

Table 1 Exponential inefficiency. For the model with separability α = (0.6, −0.8, 0.0, 0.0)′ and β = (0, 0, −0.3, 0.8)′. For the model without separability α = (0.5, −0.5, −0.5, 0.5)′ and β = (0.75, −0.25, −0.25, 0.75)′.
• the maximum likelihood estimator which assumes a logistic inefficiency, ML(l); when separability is also imposed in the estimation it is denoted by ML_s(l).
We investigate how the penalized splines estimators compare to the maximum likelihood estimator both when the latter is well specified and when it is misspecified. The criteria used to compare the estimators are the Median Bias (MB) and the Median Absolute Deviation (MAD). Both criteria are robust to the potential lack of moments of the estimators. The results of 10,000 simulations are summarized in Tables 1-5, covering, respectively, exponential, logistic, trigonometric, partially linear and partially constant inefficiencies. In Tables 1 and 2, the correctly specified ML estimator is highlighted in gray. This is the asymptotically efficient estimator and can serve as a benchmark for comparison.
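The two criteria can be written in a few lines; note how an outlying replication barely affects either measure, which is the point of using medians.

```python
import numpy as np

# Median bias (MB) and median absolute deviation (MAD) of replicated
# estimates around the true parameter value.
def mb(estimates, truth):
    return np.median(np.asarray(estimates) - truth)

def mad(estimates, truth):
    return np.median(np.abs(np.asarray(estimates) - truth))

draws = np.array([0.9, 1.0, 1.1, 1.3, 5.0])   # toy replicated estimates, truth = 1
```

For these toy draws both MB and MAD equal 0.1 despite the outlier at 5.0, whereas the mean bias would be dominated by it.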
As expected, Tables 1 and 2 show that, when correctly specified, the ML estimator tends to perform better than all other estimators of α and β in terms of both MB and MAD. The PS and WPS estimators perform slightly worse in terms of both MB and MAD, while the ML estimator with an incorrectly specified inefficiency can have a large bias for α but not for β. Imposing separability, when it holds, helps the estimation of both α and β for both correctly specified and misspecified estimators.
The PS and WPS estimators of μ perform well in terms of MB but not so well in terms of MAD. Notice that η and δ are identified in the parametric model through the parametric assumptions. However, with a semi-parametric inefficiency they are not identified. As discussed earlier, in this case we set δ̂ equal to the smallest upward shift which makes ĝ(x′α̂) = λ̂(x′α̂) + δ̂ non-negative over the observed sample, so it can be quite far from the "true" δ. This also affects the estimation of η for the penalized spline estimators.
In Tables 3-5 the inefficiencies are specified, respectively, as trigonometric, partially linear and partially constant. In these cases, the PS and WPS estimators are not compared to the correctly specified ML estimator but to incorrectly specified ML estimators resulting from assuming that the inefficiency is exponential or logistic. The tables show that the PS and WPS estimators of α and β perform well overall. To summarize, when compared to the correctly specified ML estimators, the PS and WPS estimators of α and β tend to be close to the ML estimators in terms of both MB and MAD. However, correcting for heteroskedasticity in the penalized spline estimator does not improve the performance. Overall, Tables 1-5 suggest that the PS estimator of α and β works well in all the situations considered.
We now turn to the test for orthogonality of α and β. Table 6 shows the rejection probability of the test of orthogonality based on the PS estimator for different specifications of the inefficiency. The reported numbers are the proportions of 1000 simulations for which the test statistic is larger than the 10% and 5% critical values of a χ² distribution with one degree of freedom. The data have been generated through the eight steps detailed above. For the case of orthogonality, α = (0.6, −0.8, 0.0, 0.0)′ and β = (0, 0, −0.3, 0.8)′. For the model without orthogonality, α = (0.5, −0.5, −0.5, 0.5)′ and β = (0.75, −0.25, −0.25, 0.75)′. The variance of √n α̂′β̂ is estimated using 100 bootstrap replications. Table 6 shows that when α and β are orthogonal the 10% and 5% rejection probabilities can be up to two and a half times higher than the nominal size. When they are not orthogonal, the rejection probabilities are, as expected, much higher.
An application to US residential energy demand

Orea et al. (2015) estimate a stochastic frontier model for residential aggregate energy demand using a panel of 48 US states for the period 1995 to 2011. Their dependent variable is the log of energy consumption, ln(Q), and their explanatory variables are X = [ln(Y), ln(P), ln(POP), ln(AHS), ln(HDD), ln(CDD), SDH]′, where Y is the real disposable personal income, P is the real price of energy, POP is the population size, HDD indicates the heating degree days, CDD indicates the cooling degree days, AHS is the average household size and SDH is the share of detached houses in each state. All variables are demeaned, and time dummies are included in the linear frontier but not in the inefficiency. Orea et al. (2015) assume that the inefficiency satisfies the scaling property, U = σ_U(X′α)U*, where U* ~ |N(0, σ²_{U*})|. They specify two parametric forms for σ_U, exponential, σ_U(X′α) = e^{−α₀−X′α}, and logistic, σ_U(X′α) = (1 + e^{α₀+X′α})^{−1}, referred to respectively as the PA and SC (for partial and super-conservative rebound) specifications by Orea et al. (2015). Notice that α₀ and σ²_{U*} cannot both be identified in the exponential case. In addition, Orea et al. (2015) restrict the parameter α in the inefficiency. Precisely, they assume that the coefficients of ln(HDD), ln(CDD) and SDH are zero and that the composite variable ln(Y/POP) is a determinant of inefficiency (so that the coefficients of ln(Y) and ln(POP) are opposite). The penalized spline estimation is implemented with and without these restrictions, with the resulting estimators denoted by PS(r) and PS(u), or WPS(r) and WPS(u) when correcting for heteroskedasticity. Table 7 displays the estimated parameters and corresponding standard errors for the frontier and the inefficiencies.
The table shows the ML estimates and standard errors assuming exponential and logistic inefficiency obtained by Orea et al. (2015) when imposing constraints on α, the PS and WPS estimates imposing constraints on α (denoted respectively PS(r) and WPS(r)), and the PS and WPS estimates for an unconstrained α (denoted respectively PS(u) and WPS(u)). The standard errors for the PS and WPS estimators are obtained using the wild bootstrap.
When imposing restrictions on α as in Orea et al. (2015), the estimates of the coefficients of the frontier are very close. However, the estimates of α are different because the PS method imposes the standardization α 0 α ¼ 1. When the PS estimator is unconstrained, the estimated coefficients ln(POP), ln(P) and ln(AHS) in the frontier are close to the corresponding ML estimates constraining α, but the remaining estimates of the coefficient of the frontier and the inefficiency are quite different. Notice that correcting for heteroskedasticity leads to the same estimates as for the PS estimator. Figure 1 shows the residuals of the frontier estimation (i.e. Y À X 0β ) as a function of X 0α plotted with the predicted value of the inefficiencyλ. The estimated inefficiencies under the For the model with imposed orthogonality, the parameters are α ¼ ð0:6; À0:8; 0:0; 0:0Þ 0 , β = (0, 0, −0.3, 0.8). For the model without orthogonality, α ¼ ð0:5; À0:5; À0:5; 0:5Þ 0 and β ¼ ð0:75; À0:25; À0:25; 0:75Þ 0 .

Table 7
Coefficient estimates for fully parametric stochastic frontier models with exponential and logistic inefficiency, and for the semi-parametric inefficiency. The estimates and standard errors for the Exponential and Logistic models are from Table 4 of Orea et al. (2015). The standard errors for the PS and WPS models are obtained by the wild bootstrap, and the significance tests for the PS methods are based on the quantiles of the replicated estimates. All models include dummy variables for year. *** indicates significance at 1%; ** indicates significance at 5%; * indicates significance at 10%. All optimizations are implemented using genetic algorithms (Scrucca 2013).
The estimated inefficiencies under the logistic and the exponential specifications are very similar but not identical, and their form is quite different from that of PS(r) and WPS(r). Notice that the correlations between the residuals of the frontier, and between the single indices, are very high across the different estimators (see Table 8), with the exception of PS(u). A close look at the graph suggests that extreme observations are important in determining the estimated inefficiency, and that the implicit assumptions of monotonicity and convexity (everywhere for the exponential specification, and for positive values for the logistic) may be important. In fact, if one re-estimates the PS(r) specification imposing monotonicity and convexity of λ, the resulting inefficiency has a shape much closer to that of Orea et al. (2015), as illustrated by Fig. 2. Thus, the assumption of logistic or exponential inefficiency implicitly shapes the rebound effect. Figure 1 also shows the estimated inefficiency when the restrictions of Orea et al. (2015) are not imposed. In this case, the estimated λ is concave on the left-hand side, indicating that the rebound to efficiency occurs slowly for small values of the single index.
Fig. 1 Estimated inefficiency λ̂(X′α̂) and residuals Y − X′β̂ for the Exponential, Logistic, PS(r), WPS(r), PS(u) and WPS(u) estimators. [Figure not reproduced.]
For the PS estimator of the semi-parametric inefficiency, a confidence interval for the inefficiency is constructed by using 1000 wild bootstrap simulations to obtain estimates of the coefficients α, β and c in (7) and to construct a 95% prediction interval at each X′ᵢα̂. To check the sensitivity of the estimated inefficiency to the estimation procedure, we re-estimate it (a) imposing a large penalty parameter γ = 100 to make the estimated line smoother (larger values only marginally change the estimated inefficiency) and (b) imposing monotonicity (see Fig. 3). The resulting estimated inefficiencies are very close, suggesting that the semi-parametric estimates are less sensitive to the underlying assumptions than the restrictive parametric specifications described earlier.
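The pointwise wild-bootstrap band described above can be sketched as follows. This is a minimal Python illustration (the paper's actual code is in R): the truncated-power spline basis, the Rademacher multipliers, the toy data and all function names are our own illustrative choices, and the sketch bootstraps a single penalized-spline curve rather than the full frontier-plus-single-index model.

```python
import numpy as np

def tp_basis(x, knots, degree=2):
    """Truncated-power basis for a simple penalized spline."""
    cols = [x**d for d in range(degree + 1)]
    cols += [np.clip(x - k, 0, None)**degree for k in knots]
    return np.column_stack(cols)

def fit_pspline(x, y, knots, gamma, degree=2):
    """Penalized least squares: only the truncated-power (knot) terms are penalized."""
    B = tp_basis(x, knots, degree)
    D = np.zeros(B.shape[1])
    D[degree + 1:] = 1.0                       # leave the polynomial part unpenalized
    return np.linalg.solve(B.T @ B + gamma * np.diag(D), B.T @ y)

def wild_bootstrap_band(x, y, knots, gamma, n_boot=1000, seed=0):
    """Pointwise 95% wild-bootstrap band for the fitted curve."""
    rng = np.random.default_rng(seed)
    B = tp_basis(x, knots)
    fitted = B @ fit_pspline(x, y, knots, gamma)
    resid = y - fitted
    fits = np.empty((n_boot, len(x)))
    for b in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=len(x))   # Rademacher multipliers
        y_star = fitted + w * resid                # wild-bootstrap resample
        fits[b] = B @ fit_pspline(x, y_star, knots, gamma)
    lo, hi = np.percentile(fits, [2.5, 97.5], axis=0)
    return fitted, lo, hi

# Toy usage with a hump-shaped curve standing in for the inefficiency
x = np.linspace(-2.0, 2.0, 200)
y = np.exp(-x**2) + 0.1 * np.random.default_rng(1).standard_normal(200)
knots = np.linspace(-1.5, 1.5, 8)
fitted, lo, hi = wild_bootstrap_band(x, y, knots, gamma=1.0, n_boot=200)
```

In the paper's setting the resampled responses would be refitted through the full semi-parametric estimator so that the band reflects the uncertainty in α, β and c jointly; the sketch shows only the resampling-and-refitting mechanics.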

Conclusion
The paper suggests a novel procedure to estimate a stochastic frontier model with a semi-parametric specification for the inefficiency. This is achieved by recasting the stochastic frontier model as a partially linear single index model. The resulting approach is easy to implement, uses existing software and is computationally efficient in comparison to a fully non-parametric specification. It also allows one to test orthogonality between the coefficients of the stochastic frontier and those of the single index in the inefficiency, an assumption often made in partially linear single index models. Monte Carlo simulations have been used to compare the performance of the penalized spline procedure with that of the maximum likelihood estimator, under both a correctly and an incorrectly specified inefficiency. The performance of the penalized spline procedure is close to that of the correctly specified maximum likelihood estimator, which is efficient, and it considerably improves upon a maximum likelihood estimator with an incorrectly specified inefficiency.
An application to the residential energy demand of US states highlights the practical importance of relaxing the parametric assumptions.

Data availability
The dataset for the application in "An application to US residential energy demand" section is available as online Supplementary Material to Orea et al. (2015) in the Dataverse repository, https://doi.org/10.1016/j.eneco.2015.03.016.

Code availability
The R code used in the "Numerical comparison" section will be made available on GitHub at https://github.com/gforchini/SFA_Semiparametric_inefficiency.

Compliance with ethical standards
Conflict of interest The authors declare no competing interests.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Appendix A