Consider a frontier of the form Yt = g(xt; β) and the corresponding observation equation:
$$y_t = \ln g\left( {{{{\mathbf{x}}}}_t;\beta } \right) - \omega {{{\mathrm{u}}}}_t + {{{\mathrm{v}}}}_t,\quad \quad t = 1, \ldots ,T$$
(1)
with vt representing i.i.d. observation errors with a symmetric and unimodal probability density function (pdf), ut being i.i.d. latent variables, taking strictly positive values, that represent the inefficiency terms, and with known \(\omega \in \left\{ { - 1,1} \right\}\); thus the compound error is \({\upvarepsilon}_t = {{{\mathrm{v}}}}_t - \omega {{{\mathrm{u}}}}_t\). Throughout, ut and vt are assumed independent of the regressors xt, though much effort has been made recently to handle the potential endogeneity problem in SF models; we return to this issue in Section 4. Here, g(x; β) is a known function (of unknown parameters β and exogenous variables x) representing the frontier under consideration; it corresponds to, e.g., a cost function if \(\omega = - 1\) or a production function if \(\omega = 1\). Additionally, yt denotes the log of observable cost or output (Yt). We make the following assumptions about vt’s and ut’s:
A1. vt’s are i.i.d. variables, with zero median, symmetric, unimodal, and continuous pdfs: \(p_{{{\mathrm{v}}}}\left( {{{{\mathrm{v}}}};\uptheta ^{\left( {{{\mathrm{v}}}} \right)}} \right)\), \({{{\mathrm{v}}}} \in {\Bbb R}\);
A2. ut’s are nonnegative i.i.d. variables, having continuous pdfs: \(p_{{{\mathrm{u}}}}\left( {{{{\mathrm{u}}}};\uptheta ^{\left( {{{\mathrm{u}}}} \right)}} \right)\), \({{{\mathrm{u}}}} \in {\Bbb R}_ +\);
A3. all vt’s are stochastically independent of all ut’s.
In our view, assumptions A1–A3 are minimalistic and take into account considerations regarding statistical identification of parameters. In particular, we argue that given a sufficiently general form of \(p_{{{\mathrm{u}}}}\left( . \right)\) and \(p_{{{\mathrm{v}}}}\left( . \right)\), the potential gain in statistical fit from relaxing A1–A3 (for example, relaxing the independence of v’s and u’s) would be very limited. It could also weaken overall model identification, from which SFA already suffers to a varying degree: the resulting likelihood function would have very small curvature in certain directions in the parameter space. Hence, although the resulting model would be locally identified almost everywhere, the parameterization would be less convenient from the statistical inference viewpoint. Though a formal investigation of the issue is left for further research, there is an informal motivation for assumptions A1–A3: crucial gains in explanatory power are likely to stem from a generalization of \(p_{{{\mathrm{u}}}}\left( . \right)\) and \(p_{{{\mathrm{v}}}}\left( . \right)\) rather than from relaxing A1–A3.
Once the parametric form of \(p_{{{\mathrm{u}}}}\left( . \right)\) and \(p_{{{\mathrm{v}}}}\left( . \right)\) is set, the vector of all statistical parameters is \(\theta = \left( \beta^{\prime}\; \theta^{(\mathrm{v})\prime}\; \theta^{(\mathrm{u})\prime} \right)^{\prime}\). Statistical inference regarding θ relies on properties of the compound error term εt. In the general case, its density \(p_{\upvarepsilon}\left( . \right)\) is defined by the convolution of the respective densities of vt and ut:
$$p_{\varepsilon}\left( y - \ln g\left( \mathbf{x};\beta \right);\theta^{(\varepsilon)} \right) = \int_{\mathbb{R}_+} p_{\mathrm{v}}\!\left( y - \ln g\left( \mathbf{x};\beta \right) + \omega u;\,\theta^{(\mathrm{v})} \right) p_{\mathrm{u}}\!\left( u;\theta^{(\mathrm{u})} \right)du$$
(2)
Note that in the majority of practical applications vt is assumed to be Gaussian, while \(p_{{{\mathrm{u}}}}\left( . \right)\) is half-normal or exponential. Moreover, for now, we assume that there exists a one-to-one relationship between the parameters of the structural form, i.e., β, θ(v), θ(u), and the parameters of the reduced form, β, θ(ε), so no identification issues arise (possible identification problems are discussed later). However, in the case of stochastic frontier models statistical inference is not restricted to θ’s, since the latent variables (ut’s) are also within the scope of interest, as they represent object-specific inefficiency terms.
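To make the convolution in (2) concrete, the sketch below evaluates it numerically for the textbook normal/half-normal case with ω = 1 and checks the result against the well-known closed-form compound-error density; the parameter values are purely illustrative, and the general GT-GB2 case differs only in the two densities placed in the integrand.

```python
import numpy as np
from scipy import stats, integrate

def p_eps_numeric(eps, sigma_v, sigma_u, omega=1):
    """Compound-error density via the convolution in Eq. (2):
    integrate p_v(eps + omega*u) * p_u(u) over u > 0."""
    integrand = lambda u: (stats.norm.pdf(eps + omega * u, scale=sigma_v)
                           * stats.halfnorm.pdf(u, scale=sigma_u))
    val, _ = integrate.quad(integrand, 0, np.inf)
    return val

def p_eps_closed(eps, sigma_v, sigma_u):
    """Closed-form normal/half-normal compound-error density for omega = 1."""
    s = np.hypot(sigma_v, sigma_u)        # sqrt(sigma_v^2 + sigma_u^2)
    lam = sigma_u / sigma_v
    return 2.0 / s * stats.norm.pdf(eps / s) * stats.norm.cdf(-eps * lam / s)

# the numerical convolution matches the closed form pointwise
for e in (-1.0, 0.0, 0.5, 2.0):
    assert abs(p_eps_numeric(e, 1.0, 0.8) - p_eps_closed(e, 1.0, 0.8)) < 1e-6
```

Replacing `stats.norm` and `stats.halfnorm` with GT and GB2 evaluators gives the general model's likelihood contribution, at the cost of one numerical integral per observation.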
We assume a general parametric distributional form for \(p_{{{\mathrm{v}}}}\left( . \right)\), and \(p_{{{\mathrm{u}}}}\left( . \right)\), making use of two distributions: the generalized t distribution (GT) and the generalized beta distribution of the second kind (GB2, see references in Harvey and Lange 2017), respectively:
$$f_{GT}\left( v;\sigma_{\mathrm{v}},\nu_{\mathrm{v}},\psi_{\mathrm{v}} \right) = \frac{1}{\sigma_{\mathrm{v}}}\,\frac{\psi_{\mathrm{v}}}{\mathrm{B}\!\left( 1/\psi_{\mathrm{v}},\,\nu_{\mathrm{v}}/\psi_{\mathrm{v}} \right)2\nu_{\mathrm{v}}^{1/\psi_{\mathrm{v}}}}\left[ \frac{1}{\nu_{\mathrm{v}}}\left( \frac{\left| v \right|}{\sigma_{\mathrm{v}}} \right)^{\psi_{\mathrm{v}}} + 1 \right]^{-\left( 1+\nu_{\mathrm{v}} \right)/\psi_{\mathrm{v}}},$$
(3)
$$f_{GB2}\left( u;\sigma_{\mathrm{u}},\nu_{\mathrm{u}},\psi_{\mathrm{u}},\tau \right) = \frac{1}{\sigma_{\mathrm{u}}}\,\frac{\psi_{\mathrm{u}}}{\mathrm{B}\!\left( \tau/\psi_{\mathrm{u}},\,\nu_{\mathrm{u}}/\psi_{\mathrm{u}} \right)\nu_{\mathrm{u}}^{\tau/\psi_{\mathrm{u}}}}\left( \frac{u}{\sigma_{\mathrm{u}}} \right)^{\tau-1}\left[ \frac{1}{\nu_{\mathrm{u}}}\left( \frac{u}{\sigma_{\mathrm{u}}} \right)^{\psi_{\mathrm{u}}} + 1 \right]^{-\left( \tau+\nu_{\mathrm{u}} \right)/\psi_{\mathrm{u}}},$$
(4)
with B(.,.) denoting the beta function. The above formulation of the GT distribution assumes that the mode, median, and mean (if the latter exists) are zero, in line with the usual formulation of SF models. The GB2 density is often formulated as \(f_{GB2}\left( {x;a,b,p,q} \right) = \frac{{\left| a \right|x^{ap - 1}}}{{{\rm B}\left( {p,q} \right)b^{ap}}}\left[ {\left( {\frac{x}{b}} \right)^a + 1} \right]^{ - \left( {p + q} \right)}.\) The parameterization in (4), used throughout this paper, is equivalent and implies \(p = \tau /\psi _u\), \(q = \nu _u/\psi _u\), \(a = \psi _u\), \(b = \sigma _u\nu _u^{1/\psi _u}\). The reason for using an alternative parameterization is to emphasize the relationship between the GT and GB2 distributions. Note that with τ = 1, the \(f_{GB2}\left( {.;\sigma _u,\nu _u,\psi _u,1} \right)\) distribution is equivalent to the half-GT distribution with parameters as in \(f_{GT}\left( {.;\sigma _u,\nu _u,\psi _u} \right)\). In other words, the absolute value of a GT variable distributed as \(f_{GT}\left( {.;\sigma _u,\nu _u,\psi _u} \right)\) follows the GB2 distribution \(f_{GB2}\left( {.;\sigma _u,\nu _u,\psi _u,1} \right)\). The reason for not using the double GB2 (instead of GT) for \(p_{{{\mathrm{v}}}}\left( . \right)\) is that it would violate A1 (by double GB2 we mean a symmetric distribution over the real line such that the absolute value of the corresponding variable follows the GB2 distribution). The double GB2 distribution with τ > 1 would be bimodal, whereas τ < 1 implies a lack of continuity at the mode (with the density function approaching infinity from both sides of zero). We find such properties undesirable given the interpretation of the symmetric observation error term vt that is broadly accepted in SFA. However, the framework described here could in principle be extended to encompass such cases, as the resulting density of observables, \(p_{\upvarepsilon}\left( . \right)\), would nevertheless be continuous.
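The half-GT/GB2 relationship is easy to check directly. The sketch below codes densities (3) and (4) as written (the parameter values are arbitrary) and verifies both the factor-of-two identity on the positive half-line and that each density integrates to one:

```python
import numpy as np
from scipy.special import beta as B
from scipy.integrate import quad

def f_gt(v, sigma, nu, psi):
    """Generalized t density, Eq. (3)."""
    const = psi / (B(1 / psi, nu / psi) * 2 * nu ** (1 / psi)) / sigma
    return const * ((abs(v) / sigma) ** psi / nu + 1) ** (-(1 + nu) / psi)

def f_gb2(u, sigma, nu, psi, tau):
    """Generalized beta of the second kind, Eq. (4)."""
    const = psi / (B(tau / psi, nu / psi) * nu ** (tau / psi)) / sigma
    return (const * (u / sigma) ** (tau - 1)
            * ((u / sigma) ** psi / nu + 1) ** (-(tau + nu) / psi))

# |GT| follows GB2 with tau = 1: folding the symmetric GT onto R+ doubles it
for u in (0.1, 0.5, 2.0):
    assert abs(f_gb2(u, 1.0, 5.0, 1.5, 1.0) - 2 * f_gt(u, 1.0, 5.0, 1.5)) < 1e-12

# sanity check: both densities integrate to one
assert abs(quad(lambda v: f_gt(v, 1.0, 5.0, 1.5), -np.inf, np.inf)[0] - 1) < 1e-6
assert abs(quad(lambda u: f_gb2(u, 1.0, 5.0, 1.5, 2.0), 0, np.inf)[0] - 1) < 1e-6
```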
It is clear that σv in (3) and σu in (4) are scale parameters (with σv being analogous to the inverse of precision in Student’s t distribution), while \(\psi\)’s and \(\nu\)’s are shape parameters. In particular, \(\nu _{{{\mathrm{u}}}}\) and \(\nu _{{{\mathrm{v}}}}\) control tail thickness (and, hence, the existence of finite moments – analogously to the degrees-of-freedom parameter in Student’s t distribution). Moreover, for parameters \(\nu _{{{\mathrm{u}}}}\) and \(\nu _{{{\mathrm{v}}}}\) we also consider the limiting cases (of \(\nu _{{{\mathrm{u}}}} \to \infty\) or \(\nu _{{{\mathrm{v}}}} \to \infty\)). In order to analyze the limiting behavior of the normalizing constants and the kernels of (3) and (4), note that (following Harvey and Lange 2017; who cite Davis 1964):
$$\frac{\psi}{\mathrm{B}\left( \tau/\psi,\nu/\psi \right)\nu^{\tau/\psi}}\;\xrightarrow[\nu\to\infty]{}\;\frac{\psi}{\Gamma\left( \tau/\psi \right)\psi^{\tau/\psi}},$$
(5)
$$\left[ \frac{1}{\nu}\left( \frac{\left| x \right|}{\sigma} \right)^{\psi} + 1 \right]^{-\left( \tau+\nu \right)/\psi}\;\xrightarrow[\nu\to\infty]{}\;\exp\!\left[ -\frac{1}{\psi}\left( \frac{\left| x \right|}{\sigma} \right)^{\psi} \right],$$
(6)
where \({{\Gamma }}\left( . \right)\) denotes the gamma function. The above formulations indicate the relationship with the exponential family of densities, nested as limiting cases in the general model used here. In principle, it is also possible to consider the limiting behavior for \(\psi _{{{\mathrm{u}}}}\) or \(\psi _{{{\mathrm{v}}}}\), though these cases are less interesting from the empirical point of view given the usual interpretation in SFA, so this option is not considered here (e.g., \(\psi _{{{\mathrm{v}}}} \to \infty\) implies convergence towards a uniform distribution on an interval). Note that each of the abovementioned parameters governs a specific characteristic of the compound error (i.e., different combinations of the parameters’ values do not lead to similar shapes of \(p_{\upvarepsilon}\left( . \right)\)), and thus under assumptions A1–A3 the GT-GB2 SF model is identified.
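The limits (5) and (6) can be illustrated numerically; with τ = 1 they say that the GT density approaches the generalized error distribution (given explicitly as Eq. (8) below) as ν grows. A minimal sketch with arbitrary parameter values:

```python
import numpy as np
from scipy.special import beta as B, gamma as G

def f_gt(v, sigma, nu, psi):
    """GT density, Eq. (3)."""
    const = psi / (B(1 / psi, nu / psi) * 2 * nu ** (1 / psi)) / sigma
    return const * ((abs(v) / sigma) ** psi / nu + 1) ** (-(1 + nu) / psi)

def f_ged(v, sigma, psi):
    """GED density: the nu -> infinity limit of the GT, cf. (5)-(6) with tau = 1."""
    const = psi / (2 * G(1 / psi) * psi ** (1 / psi)) / sigma
    return const * np.exp(-((abs(v) / sigma) ** psi) / psi)

# for large nu the GT density is already very close to its GED limit
for v in (-2.0, 0.0, 1.0):
    assert abs(f_gt(v, 1.0, 1e6, 1.5) - f_ged(v, 1.0, 1.5)) < 1e-4
```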
We emphasize the following special cases:
1. With \(\psi _{{{\mathrm{v}}}} = 2\), the GT distribution (3) reduces to the Student’s t distribution:
$$f_{ST}\left( z;\sigma,\nu \right) = \frac{1}{\sigma}\,\frac{1}{\mathrm{B}\left( 1/2,\nu/2 \right)\nu^{1/2}}\left[ \frac{1}{\nu}\left( \frac{z}{\sigma} \right)^{2} + 1 \right]^{-\left( 1+\nu \right)/2},$$
(7)
with σ equivalent to σv and ν equivalent to \(\nu _{{{\mathrm{v}}}}\) in (3); consequently, the GB2 distribution in (4) with τ = 1 and \(\psi _{{{\mathrm{u}}}} = 2\) reduces to the half-Student’s t case.
2. As \(\nu _{{{\mathrm{v}}}} \to \infty\), the GT distribution (3) reduces to the Generalized Error Distribution (GED):
$$f_{GED}\left( {z;\sigma ,\psi } \right) = \frac{1}{\sigma }\frac{\psi }{{2{{\Gamma }}\left( {1/\psi } \right)\psi ^{1/\psi }}}\exp \left[ { - \frac{1}{\psi }\left( {\frac{{\left| z \right|}}{\sigma }} \right)^\psi } \right],$$
(8)
consequently, the GB2 distribution with \(\tau\) = 1 and \(\nu _{{{\mathrm{u}}}} \to \infty\) reduces to the half-GED.
3. As \(\nu _{{{\mathrm{u}}}} \to \infty\), the GB2 distribution (4) reduces to the generalized gamma distribution (GG):
$$f_{GG}\left( {z;\sigma ,\tau ,\psi } \right) = \frac{1}{\sigma }\frac{\psi }{{{{\Gamma }}\left( {\tau /\psi } \right)\psi ^{\tau /\psi }}}\left( {\frac{z}{\sigma }} \right)^{\tau - 1}\exp \left[ { - \frac{1}{\psi }\left( {\frac{z}{\sigma }} \right)^\psi } \right].$$
(9)
The original parameterization of the generalized gamma distribution, according to Stacy (1962, Eq. 1), is \(f_{GG}\left( {z;a,d,p} \right) = \frac{p}{{{{\Gamma }}\left( {d/p} \right)a^d}}z^{d - 1}\exp \left[ { - \left( {\frac{z}{a}} \right)^p} \right]\). The relationship between the original parameterization and the one used in (9) is as follows: \(d = \tau\), \(p = \psi\), \(a = \sigma \psi ^{1/\psi }\). The Gaussian case can be obtained as a conjunction of the Student’s t and GED sub-cases, since it requires \(\nu _{{{\mathrm{v}}}} \to \infty\) and \(\psi _{{{\mathrm{v}}}} = 2\) (analogously, GB2 becomes half-normal with τ = 1, \(\nu _{{{\mathrm{u}}}} \to \infty\), and \(\psi _{{{\mathrm{u}}}} = 2\)). Moreover, setting \(\psi _{{{\mathrm{v}}}} = 1\) instead of \(\psi _{{{\mathrm{v}}}} = 2\) leads to the Laplace distribution; analogously, the GB2 with \(\psi _{{{\mathrm{u}}}} = \tau = 1\) and \(\nu _{{{\mathrm{u}}}} \to \infty\) yields the exponential case. The general SF model considered here is labeled GT-GB2, while its special case, which assumes τ = 1, is labeled GT-HGT (as the GB2 distribution is reduced to the half-GT case). Other special cases, with literature references, are listed in Table 1. Also, the density of \({\upvarepsilon}_t\) implied by (2), (3), and (4) in GT-GB2 is in general asymmetric and includes skew-t and skew-normal distributions as special cases.
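These nesting relations are straightforward to verify numerically. The sketch below checks two of them with arbitrary parameter values: the GT with ψ = 2 against SciPy's Student's t, and the GB2 with τ = 1, ψ = 2 and very large ν against the half-normal:

```python
import numpy as np
from scipy import stats
from scipy.special import beta as B

def f_gt(v, sigma, nu, psi):
    """GT density, Eq. (3)."""
    const = psi / (B(1 / psi, nu / psi) * 2 * nu ** (1 / psi)) / sigma
    return const * ((abs(v) / sigma) ** psi / nu + 1) ** (-(1 + nu) / psi)

def f_gb2(u, sigma, nu, psi, tau):
    """GB2 density, Eq. (4)."""
    const = psi / (B(tau / psi, nu / psi) * nu ** (tau / psi)) / sigma
    return (const * (u / sigma) ** (tau - 1)
            * ((u / sigma) ** psi / nu + 1) ** (-(tau + nu) / psi))

# psi = 2: GT is Student's t with scale sigma
for v in (-1.5, 0.3):
    assert abs(f_gt(v, 2.0, 7.0, 2.0) - stats.t.pdf(v / 2.0, 7.0) / 2.0) < 1e-10

# tau = 1, psi = 2, nu large: GB2 approaches the half-normal
for u in (0.2, 1.0, 2.5):
    assert abs(f_gb2(u, 1.0, 1e7, 2.0, 1.0) - stats.halfnorm.pdf(u)) < 1e-5
```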
Table 1 Selected models nested in GT-GB2 SF specification

Note that in the empirical analysis we make an additional assumption: (A4) ut’s follow a non-increasing pdf, and we impose the resulting restrictions on parameters of the GT-GB2 model, excluding τ > 1. We are aware that there is considerable SFA literature on non-monotonic distributions of ut’s, which would indicate their usefulness (see, e.g., Stevenson 1980; Greene 1990; van den Broeck et al. 1994; Griffin and Steel 2004, 2007, 2008). Such applications, however, have assumed restrictive distributional forms of vt’s, which could be the reason why non-monotonic distributions of ut’s have been found relevant in the first place (e.g., due to outliers; see Stead et al. 2018 for a discussion of outlier detection and treatment). In our view, a generalization of the distributional assumptions about vt is sufficient to maintain a similar statistical fit, despite the use of a strictly non-increasing distribution of ut. Thus, A4 is an additional assumption (or restriction) we find plausible, which may also help with identification in SF models in general, but it is not a requirement in the GT-GB2 model. We think that A4 is a reasonable assumption for typical SF models with a single source of inefficiency. However, if analysis of the economic process at hand suggests the existence of many sources of inefficiency, e.g., as in Oh and Shin (2021), the resulting compound error representing total inefficiency (as a sum of source-specific inefficiencies) might indeed violate A4 and follow, e.g., a unimodal distribution with nonzero mode. Nonetheless, as \(p_{{{\mathrm{u}}}}\left( . \right)\) gets closer to a symmetric distribution, the decomposition of errors in SFA becomes more problematic.
This issue is similar in nature to what is witnessed in normal-truncated-normal and normal-gamma SF models, in which different parametric configurations of ut and vt distributions can lead to a practically identical distribution of εt, especially when the mode of u is above zero (Ritter and Simar 1997). Hence, though in our view the GT-GB2 model is identified also without A4 (allowing τ > 1), A4 is economically reasonable in standard cases and results in more regular shapes of the posterior or likelihood.
Finally, we note that some of the forms listed in Table 1 are obtained not through parametric restrictions but as limiting cases. Although this may preclude simple hypothesis testing in such cases, model comparison is still very much feasible, e.g., (i) through likelihood ratios in case of ML-based estimation or (ii) using Bayes Factors and posterior probabilities in case of Bayesian inference. Both approaches are discussed in the next section.
Possible model extensions: varying efficiency distribution
The basic SFA formulation, labelled Common Efficiency Distribution (CED-SFA), does not allow for heterogeneity (across observations) in the inefficiency process. This assumption is relaxed within the Varying Efficiency Distribution model class (VED-SFA; see, e.g., Koop et al. 1997). VED-SF models allow exogenous variables to influence the inefficiency process. They are characterized by the fact that the inefficiency distribution depends on certain covariates (hence, u’s are independent but no longer identically distributed). Within the GT-GB2 framework a natural option is to replace σu in (4) with:
$$\sigma_{\mathrm{u},t} = \exp\left( \gamma + \delta^{\prime}{{{\boldsymbol{w}}}}_t \right),$$
(10)
where wt represents a vector of exogenous covariates driving differences across objects with respect to the inefficiency distribution, and γ and δ are model parameters replacing σu. Importantly, such a general formulation carries over to VED-type versions of all nested special cases listed in Table 1, which results in a whole new class of coherent VED-SFA formulations.
In principle, it is possible to extend the idea to consider individual effects in shape parameters as well. However, it is not obvious whether such a formulation would turn out to be empirically relevant. From an empirical viewpoint, an important distinction would be between covariate-driven σu,t (individual effects in inefficiency terms) and covariate-driven σv,t (heteroskedasticity in the observation error). Such formulations are fully feasible extensions to the GT-GB2 framework presented here.
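As a minimal sketch of (10), assuming a small hypothetical covariate matrix and arbitrary values for γ and δ:

```python
import numpy as np

# hypothetical setup: 5 observations, 2 covariates in w_t
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))                  # rows are the vectors w_t
gamma, delta = -0.5, np.array([0.3, -0.2])   # illustrative parameter values

# Eq. (10): observation-specific scale of the inefficiency distribution
sigma_u_t = np.exp(gamma + W @ delta)

assert sigma_u_t.shape == (5,)
assert np.all(sigma_u_t > 0)                 # the exp link keeps the scale positive
```

The exponential link guarantees σu,t > 0 for any covariate values, which is why it is the natural choice here.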
Possible model extensions: panel data modelling
There are a number of ways to account for panel data structure in the proposed framework. First, we can consider u’s as time-invariant (i.e., object-specific) effects. This is the most traditional setting, in which inefficiency captures persistent effects that differentiate objects’ performances. In other words, all time-invariant differences in performance are attributed to inefficiency and all transient effects are treated as part of the random disturbance (vit). The convolution of densities for vit and ui is
$$f\left( \varepsilon \right) = \prod_{i=1}^{n}\prod_{t=1}^{T} p_{\varepsilon}\!\left( y_{it} - \ln g\left( \boldsymbol{x}_{it};\beta \right);\theta^{(\varepsilon)} \right) = \prod_{i=1}^{n}\left\{ \int_{\mathbb{R}_+}\left\{ \prod_{t=1}^{T} p_{\mathrm{v}}\!\left( y_{it} - \ln g\left( \boldsymbol{x}_{it};\beta \right) + \omega u_i;\,\theta^{(\mathrm{v})} \right) \right\} p_{\mathrm{u}}\!\left( u_i;\theta^{(\mathrm{u})} \right)du_i \right\},$$
(11)
where i is the object index \((i = 1, \ldots ,n)\) and t is the time index \((t = 1, \ldots ,T)\). This results in a likelihood function similar to that implied by Eqs. (2)–(4), except that we now evaluate only n integrals. On the other hand, each integral needs to be calculated over a product of T densities pv. This is more challenging in terms of fine-tuning the integration procedure (specifying relevant waypoints, etc.) and becomes more complex as T increases. We reckon, however, that this is still not as time consuming as evaluating nT integrals in the baseline (pooled) model, so the net effect is likely an increase in computational speed.
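A sketch of the time-invariant-u likelihood in (11), using the normal/half-normal special case and a linear log-frontier purely for illustration; the simulated data and all parameter values are hypothetical:

```python
import numpy as np
from scipy import stats, integrate

def loglik_panel(y, X, beta, sigma_v, sigma_u, omega=1):
    """Log-likelihood under Eq. (11): one integral over u_i per object,
    each taken over a product of T observation-error densities."""
    ll = 0.0
    for y_i, X_i in zip(y, X):                    # loop over the n objects
        resid = y_i - X_i @ beta                  # T residuals of object i
        def integrand(u):
            return (np.prod(stats.norm.pdf(resid + omega * u, scale=sigma_v))
                    * stats.halfnorm.pdf(u, scale=sigma_u))
        val, _ = integrate.quad(integrand, 0, np.inf)
        ll += np.log(val)
    return ll

# small simulated panel: n = 4 objects, T = 6 periods
rng = np.random.default_rng(1)
n, T, beta = 4, 6, np.array([1.0, 0.5])
X = rng.normal(size=(n, T, 2))
u = stats.halfnorm.rvs(scale=0.6, size=n, random_state=rng)
y = np.einsum('ntk,k->nt', X, beta) + rng.normal(0.0, 0.3, size=(n, T)) - u[:, None]

assert np.isfinite(loglik_panel(y, X, beta, 0.3, 0.6))
```

The loop evaluates n one-dimensional integrals, compared with nT in the pooled model, which is the computational trade-off discussed above.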
Another popular approach is to add another latent variable (αi) representing an object-specific effect, much like in the True Random Effects SF model (Greene 2004). In this setting inefficiency captures transient effects that differentiate objects’ performances over time, while the random effect αi takes on the persistent part. Thus, time-invariant differences are attributed to heterogeneity of the frontier: \(y_{it} = \ln g\left( {{{{\boldsymbol{x}}}}_{it};\beta } \right) + {{{\mathrm{v}}}}_{it} - \omega {{{\mathrm{u}}}}_{it} + \alpha _i\). Assuming that αi and \(\sigma _\alpha ^2\) have the traditional natural-conjugate normal-inverted gamma priors, we can employ hybrid MCMC procedures which involve “wrapping” traditional data augmentation techniques (e.g., Gibbs sampling) around the procedure discussed in Section “Inference: likelihood-based methods for the GT-GB2 model”.
The third option, probably the easiest to implement, is to follow an approach based on the True Fixed Effects SF model (Greene 2004). From a Bayesian perspective, modelling fixed effects amounts to having each prior on αi with a different scale parameter (e.g., \(\sigma _{\alpha _i}\)). Assuming that \(\alpha _i\sim N\left( {0,\sigma _{\alpha _i}^2} \right)\), we can simplify the problem by considering the intercept, e.g., \(\beta _0\sim N( {0,\sigma _{\beta _0}^2} )\), and the fixed effect jointly as \(\delta _i = \beta _0 + \alpha _i\). Hence, in practice this approach amounts to estimating an object-specific intercept δi of the frontier under the following prior: \(\delta _i\sim N(0,\sigma _{\alpha _i}^2 + \sigma _{\beta _0}^2 + 2\rho _i\sigma _{\alpha _i}\sigma _{\beta _0})\), where ρi is the correlation between αi and β0. If we assume independence between αi and β0, the prior on δi simplifies to \(N(0,\sigma _{\alpha _i}^2 + \sigma _{\beta _0}^2)\).
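The variance formula for the combined prior on δi can be confirmed with a quick Monte Carlo sketch (the scale values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_alpha, sigma_beta0 = 0.5, 1.0

# independent normal draws for the fixed effect alpha_i and the intercept beta_0
alpha = rng.normal(0.0, sigma_alpha, size=200_000)
beta0 = rng.normal(0.0, sigma_beta0, size=200_000)
delta = beta0 + alpha        # delta_i = beta_0 + alpha_i

# under independence (rho_i = 0): Var(delta_i) = sigma_alpha^2 + sigma_beta0^2
assert abs(delta.var() - (sigma_alpha ** 2 + sigma_beta0 ** 2)) < 0.02
```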
There are many other panel data modelling techniques proposed within the SFA literature. However, the abovementioned ones can be relatively easily incorporated within the proposed framework.