Abstract
The econometric analysis of cost functions rests on the conditional distribution of the cost Y given the level of the outputs \(X\in {\mathbb {R}}_+^p\) and a set of environmental variables \(Z\in {\mathbb {R}}^d\). The model describes the conditional distribution of Y given \(X\ge x\) and \(Z=z\). In many applications, the dimension of Z is naturally large, and a fully nonparametric specification of the model is limited by the curse of dimensionality. Most approaches so far are based on two-stage estimation, applicable when the frontier level does not depend on the value of Z. But even when the frontier is separable in this sense, the estimation procedure suffers from several problems, mainly due to the inherent bias of the estimated efficiency scores and the poor rates of convergence of the frontier estimates. In this paper we suggest an alternative semi-parametric model which avoids the drawbacks of the two-stage methods. It is based on a class of models called Proportional Incremental Cost Functions (PICF), adapted to our setup from the Cox proportional hazards models extensively used in survival analysis for duration models. We define the PICF model, examine its properties and propose a semi-parametric estimation procedure. This modeling approach avoids the first-stage nonparametric estimation of the frontier and the curse of dimensionality, while keeping the parametric \(\sqrt{n}\) rates of convergence for the parameters of interest. We also derive \(\sqrt{n}\)-consistent estimators of the conditional order-m robust frontiers (which, in contrast to the full frontier, may depend on Z) and prove that the resulting estimators are asymptotically Gaussian. We illustrate the flexibility and the power of the procedure with simulated examples and with real data sets.
Notes
Of course, in practice we use individual bandwidths \(h_{j}\) for each component of Z. So, in the notation that follows, \(h^{d}\) has to be understood as \(\prod _{j=1}^{d} h_{j}\). By doing so, and by using product kernels, we are able to detect irrelevant components in the conditioning; see, e.g., Hall et al. (2004) and Li et al. (2013) for details.
This explains some abuse of language in this literature, where the partial frontiers are sometimes considered as robust versions of the full frontier. We try to avoid this confusion.
If a is normalized such that, for some \((z_0,x_0)\), \(a(z_0, \beta (x_0))=1\), the baseline model represents the cost process for this particular production unit.
Whether the separability condition holds is an empirical issue, even if some argue that it may be a reasonable assumption in many situations for economic or technical reasons. The assumption is easy to test in practice, as described in Daraio et al. (2018) and Simar and Wilson (2020). In all the real data examples in Sect. 4.2 below, the test was applied and the separability assumption was not rejected.
We limit our presentation to the case of no ties in the \(Y_{i}\) and no censoring, which is mostly the case in our setup of cost efficiency analysis. The marginal likelihood can easily be extended to the case of ties and censored data (where only the minimum between Y and some censoring value is observed). See, e.g., Kalbfleisch and Prentice (1980).
In the simple case where \(a(z,\beta (x)) = e^{\beta '(x)z}\), the expression of \(\ell (\beta (x))\) simplifies [see equation (4.6) in Kalbfleisch and Prentice (1980)] and explicit expressions for the gradient and the hessian can be derived.
Similar developments could be done for the conditional order-\(\alpha \) frontiers.
As explained in the Appendix, we use the \({\widetilde{S}}\) notation for survivor functions when we condition on \(X=x\), to distinguish them from S, where we condition on \(X\ge x\).
Since \(Q^{-1}(y,x,z)\) is specified, the value of y corresponding to a quantile \(u\in [0,1]\) is given by \(y=Q(u,x,z)\) and can be found numerically by solving \(y=\arg \min _{y} | Q^{-1}(y,x,z) - u|\), which is easy since \(Q^{-1}\) is monotone in y.
References
Aragon Y, Daouia A, Thomas-Agnan C (2005) Nonparametric frontier estimation: a conditional quantile-based approach. Econom Theory 21:358–389
Bădin L, Daraio C, Simar L (2012) How to measure the impact of environmental factors in a nonparametric production model. Eur J Oper Res 223:818–833
Cazals C, Florens JP, Simar L (2002) Nonparametric frontier estimation: a robust approach. J Econom 106(1):25
Cazals C, Fève F, Florens JP, Simar L (2016) Nonparametric instrumental variables estimation for efficiency frontier. J Econom 190:349–359
Charnes A, Cooper WW, Rhodes E (1981) Evaluating program and managerial efficiency: an application of data envelopment analysis to program follow through. Manag Sci 27:668–697
Cox DR (1972) Regression models and life tables. J R Stat Soc Ser B 34:187–220
Daouia A, Gijbels I (2011) Robustness and inference in nonparametric partial frontier modeling. J Econom 161:147–165
Daraio C, Simar L (2005) Introducing environmental variables in nonparametric frontier models: a probabilistic approach. J Prod Anal 24(1):93–121
Daouia A, Simar L (2007a) Nonparametric efficiency analysis: a multivariate conditional quantile approach. J Econom 140:375–400
Daouia A, Simar L (2007b) Advanced robust and nonparametric methods in efficiency analysis: methodology and applications. Springer, New York
Daouia A, Florens JP, Simar L (2010) Frontier estimation and extreme values theory. Bernoulli 16(4):1039–1063
Daouia A, Florens JP, Simar L (2012) Regularization of non-parametric frontier estimators. J Econom 168:285–299
Daraio C, Simar L, Wilson PW (2018) Central limit theorems for conditional efficiency measures and tests of the “separability” condition in nonparametric, two-stage models of production. Econom J 21:170–191
Florens JP, Simar L, Van Keilegom I (2014) Frontier estimation in nonparametric location-scale models. J Econom 178:456–470
Grambsch PM, Therneau TM (1994) Proportional Hazards tests and diagnostics based on weighted residuals. Biometrika 81(3):515–526
Hall P, Racine JS, Li Q (2004) Cross-validation and the estimation of conditional probability densities. J Am Stat Assoc 99(468):1015–1026
Härdle WK, Simar L (2019) Applied multivariate statistical analysis, 5th edn. Springer, Switzerland
Jeong SO, Park BU, Simar L (2010) Nonparametric conditional efficiency measures: asymptotic properties. Ann Oper Res 173:105–122
Kalbfleisch JD, Prentice RL (1980) The statistical analysis of failure time data. Wiley, New York
Kneip A, Simar L, Wilson PW (2015) When bias kills the variance: central limit theorems for DEA and FDH efficiency scores. Econom Theory 31:394–422
Li Q, Lin J, Racine JS (2013) Optimal bandwidth selection for nonparametric conditional distribution and quantile functions. J Bus Econ Stat 31(1):57–65
Mammen E (1992) When does bootstrap work? Asymptotic results and simulations. Springer, Berlin
Park B, Simar L, Weiner Ch (2000) The FDH estimator for productivity efficiency scores: asymptotic properties. Econom Theory 16:855–877
Simar L (2003) Detecting outliers in frontiers models: a simple approach. J Prod Anal 20:391–424
Simar L, Wilson PW (2007) Estimation and inference in two-stage, semi-parametric models of production processes. J Econom 136(1):31–64
Simar L, Wilson PW (2011) Two-stage DEA: caveat emptor. J Prod Anal 36:205–218
Simar L, Wilson PW (2020) Hypothesis testing in nonparametric models of production using multiple sample splits. J Prod Anal 53:287–303
Simar L, Vanhems A, Van Keilegom I (2016) Unobserved heterogeneity and endogeneity in nonparametric frontier estimation. J Econom 190:360–373
Tibshirani R (1997) The Lasso method for variable selection in the Cox model. Stat Med 16:385–395
Tsiatis AA (1981) A large sample study of Cox’s regression model. Ann Stat 9(1):93–108
Wilson PW (1993) Detecting outliers in deterministic nonparametric frontier models with multiple outputs. J Bus Econ Stat 11:319–323
Additional information
F. Fève and J.P. Florens acknowledge funding from the French National Research Agency (ANR) under the Investments for the Future (Investissement d’Avenir), Grant ANR-17-EURE-0010.
A Appendix: Simulation of data
Simulating a data set \(\{(X_{i}, Y_{i},Z_{i})\}_{i=1}^{n}\) should be done with care. Usually researchers specify a model for the frontier function \(\varphi _{0}(x)\) then select a model to simulate values of \(X_{i}\) and \(Z_{i}\) and finally generate \(Y_{i}\) for \(X=X_{i}\) and \(Z=Z_{i}\). Here we have to generate the sample according to our PICF model which specifies that the survival function \(S(y | X\ge x, Z=z) = \left[ S_{0}(y|X\ge x)\right] ^{a(z,\beta (x))}\) for some basic survival function \(S_{0}(y|X\ge x)\) and some given functions \(a(\cdot ,\cdot )\) and \(\beta (\cdot )\). So we need to recover from our model the conditional distribution of Y given \(X=x\) and \(Z=z\) derived from the PICF. We will denote by \({\widetilde{S}}(y |X=x, Z=z)\) this conditional survival function where we use the \({\widetilde{S}}\) notation when we condition on \(X=x\), to distinguish from \(S(y| X\ge x, Z=z)\) defined above, where we condition on \(X\ge x\). Consider for instance the corresponding quantile function
where Q is monotone decreasing with u. We know that \(U=Q^{-1}(Y,x,z)\) is uniform on [0, 1] and independent of X and Z, so an easy way to simulate Y given \(X=x\) and \(Z=z\), is to generate \(U_{i}\) as uniform on [0, 1] and then define \(Y_{i}=Q(U_{i},X_{i},Z_{i})\).
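The sampling step just described can be sketched in Python for a case where Q is available in closed form. The survivor used here, \(Y = x + \text{Exp}(e^{z})\), is a hypothetical choice made only to keep the inverse explicit; it is not the PICF model, and the function names are illustrative.

```python
import math
import random

def survivor(y, x, z):
    """Hypothetical survivor of Y given X = x, Z = z: Y = x + Exp(rate e^z)."""
    return math.exp(-math.exp(z) * (y - x)) if y > x else 1.0

def quantile(u, x, z):
    """Closed-form inverse of `survivor` in y; decreasing in u."""
    return x - math.log(u) / math.exp(z)

rng = random.Random(0)
sample = []
for _ in range(5000):
    x, z = rng.uniform(0.0, 2.0), rng.uniform(-1.0, 1.0)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    sample.append((x, quantile(u, x, z), z))

# Probability integral transform: survivor(Y, X, Z) should look uniform on [0, 1].
pit = [survivor(y, x, z) for x, y, z in sample]
mean_pit = sum(pit) / len(pit)
```

If the draws are correct, `mean_pit` should be close to 1/2 and every simulated cost should lie on or above its frontier value x.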
The general form of Q(u, x, z) can be obtained as follows in order to satisfy the PICF model. Some simple algebra leads to the equation
So, for the PICF model, the function Q must satisfy
Taking the derivative with respect to x (with some abuse of notation below, for \(x\in {\mathbb {R}}^{p}\) the derivative \(\partial _{x}^{p}\) has to be understood as \(\partial ^{p}/(\partial x_{1}\ldots \partial x_{p})\)) and equating both sides, we obtain
After some tedious but simple mathematical developments, this leads to the equation
which allows us to define (at least numerically) its reciprocal Q(u, x, z) for any (u, x, z). The expression simplifies greatly if we introduce additional assumptions into the model we want to simulate.
Indeed, if we assume that the joint conditional survival function satisfies the Cox model, i.e. \(S_{XY}(x,y | z)= \left[ S_{0}(x,y)\right] ^{a(z,\beta (x))}\) where \(S_{0}(x,y) = S_{0}(y|X\ge x) S_{0}(x)\), we have
where again \({\widetilde{S}}_{0}(y|x)\) is \({\widetilde{S}}_{0}(y|X=x)\), the counterpart of the baseline survivor \(S_{0}(y|X\ge x)\) when conditioning on \(X=x\). We also have \(S_{X}(x|z)= (S_{0}(x))^{a(z,\beta )}\). Therefore, Eq. (A.3) simplifies into
In addition if we assume that \(\beta (x)=\beta \), the derivative of both sides of (A.7) with respect to x simplifies. Note also that \(f_{X}(x| z)= a(z,\beta ) f_{0}(x) (S_{0}(x))^{a(z,\beta )-1}\). After some simplifications this leads to the equation
So given the function \(a(z,\beta )\), the survival \(\widetilde{S}_{0}(y|X=x)\) and the baseline density of X, \(f_{0}(x)\), we can compute \(S_{0}(y|X\ge x)\) by (A.6), and then the conditional survival \({\widetilde{S}}(y|X=x,Z=z)\). By inverting (A.8), we have the quantile function \(y=Q(u,x,z)\) for any u (at least numerically) and then we can simulate a value \(Y_{i}\), for a given \((X_{i},Z_{i})\) according to the PICF model.
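As an illustration of this recipe, the following Python sketch simulates a sample in the scalar case \(p=d=1\). Everything specific in it is an assumption made for the example, not taken from the paper: \(a(z,\beta )=e^{\beta z}\), a baseline with \(X\sim \text{Exp}(1)\) and \(Y=X+\text{Exp}(1)\) (so that \({\widetilde{S}}_{0}(y|X=x)=e^{-(y-x)}\) for \(y\ge x\)), and Z uniform on \([-1,1]\). For this baseline, \(S_{0}(y|X\ge x)=(1+y-x)e^{-(y-x)}\) in closed form. The conditional survivor is taken as \({\widetilde{S}}(y|X=x,Z=z)={\widetilde{S}}_{0}(y|X=x)\left[ S_{0}(y|X\ge x)\right] ^{a(z,\beta )-1}\), which is what differentiating \(\left[ S_{0}(x,y)\right] ^{a(z,\beta )}\) with respect to a scalar x gives under the Cox assumption on the joint survivor with constant \(\beta \); it should be checked against the displayed equations before being reused. The inversion in y is done by bisection, and all function names are hypothetical.

```python
import math
import random

def a_fn(z, beta):
    """Assumed proportionality factor a(z, beta) = exp(beta * z), scalar z."""
    return math.exp(beta * z)

def s0_tilde(y, x):
    """Assumed baseline survivor of Y given X = x (frontier at y = x)."""
    return math.exp(-(y - x)) if y > x else 1.0

def s0_cond(y, x):
    """Baseline survivor of Y given X >= x, closed form for this toy baseline."""
    w = max(y - x, 0.0)
    return (1.0 + w) * math.exp(-w)

def q_inv(y, x, z, beta):
    """Survivor of Y given X = x, Z = z: s0_tilde * s0_cond ** (a - 1)."""
    return s0_tilde(y, x) * s0_cond(y, x) ** (a_fn(z, beta) - 1.0)

def q(u, x, z, beta, tol=1e-10):
    """Invert q_inv in y by bisection (q_inv is decreasing in y)."""
    lo, hi = x, x + 1.0
    while q_inv(hi, x, z, beta) > u:  # expand until the root is bracketed
        hi = x + 2.0 * (hi - x)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q_inv(mid, x, z, beta) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(n, beta, seed=0):
    """Draw (X_i, Y_i, Z_i), i = 1..n, under the assumptions listed above."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)
        # S_X(x|z) = S_0(x) ** a = exp(-a x), i.e. X | Z = z is Exp(rate a)
        x = rng.expovariate(a_fn(z, beta))
        u = 1.0 - rng.random()  # uniform on (0, 1]
        data.append((x, q(u, x, z, beta), z))
    return data

data = simulate(500, beta=0.5, seed=1)
```

A useful sanity check for any such design is to integrate `q_inv(y, t, z, beta)` against the density of X given \(Z=z\) over \(t\ge x\) and compare the result with \(S_{X}(x|z)\left[ S_{0}(y|X\ge x)\right] ^{a(z,\beta )}\), which is the PICF relation the simulated data are supposed to satisfy.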
About this article
Cite this article
Fève, F., Florens, JP. & Simar, L. Proportional incremental cost probability functions and their frontiers. Empir Econ 64, 2721–2756 (2023). https://doi.org/10.1007/s00181-023-02386-x
DOI: https://doi.org/10.1007/s00181-023-02386-x