
Proportional incremental cost probability functions and their frontiers

Empirical Economics

Abstract

The econometric analysis of cost functions is based on the conditional distribution of the cost Y given the level of the outputs \(X\in {\mathbb {R}}_+^p\) and a set of environmental variables \(Z\in {\mathbb {R}}^d\). The model describes the conditional distribution of Y given \(X\ge x\) and \(Z=z\). In many applications the dimension of Z is naturally large, and a fully nonparametric specification of the model is limited by the curse of dimensionality. Most approaches so far rely on two-stage estimation, which presupposes that the frontier level does not depend on the value of Z. But even when the frontier is separable, the estimation procedure suffers from several problems, mainly the inherent bias of the estimated efficiency scores and the poor rates of convergence of the frontier estimates. In this paper we suggest an alternative semi-parametric model which avoids the drawbacks of the two-stage methods. It is based on a class of models called Proportional Incremental Cost Functions (PICF), adapted to our setup from the Cox proportional hazards models extensively used in survival analysis for duration models. We define the PICF model, examine its properties and propose a semi-parametric estimator. With this approach we avoid the first-stage nonparametric estimation of the frontier and the curse of dimensionality, while keeping the parametric \(\sqrt{n}\) rates of convergence for the parameters of interest. We are also able to derive \(\sqrt{n}\)-consistent estimators of the conditional order-m robust frontiers (which, in contrast to the full frontier, may depend on Z), and we prove the asymptotic normality of the resulting estimators. We illustrate the flexibility and power of the procedure with simulated examples and some real data sets.
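The contrast between the full frontier and the conditional order-m frontier can be illustrated numerically. The sketch below assumes, for illustration only, the exponential link \(a(z,\beta) = e^{\beta' z}\) (the simple case of Note 7), a hypothetical baseline \(S_0(y|X\ge x) = e^{-\lambda (y-\varphi)}\) for \(y \ge \varphi\) at a fixed x, and the usual order-m cost frontier \(E[\min(Y^1,\ldots,Y^m)]\) with the \(Y^j\) drawn independently from Y given \(X\ge x\), \(Z=z\) (Cazals et al. 2002). Since \(P(\min \ge y) = S(y|X\ge x,Z=z)^m = S_0(y|X\ge x)^{m\,a(z,\beta)}\), the order-m frontier equals \(\varphi + \int_\varphi^\infty S_0^{m a}\,dy\), whereas raising \(S_0\) to a positive power leaves its support, and hence the full frontier \(\varphi\), unchanged. The function names are ours.

```python
import numpy as np

def picf_survival(y, z, beta, baseline_surv):
    """PICF conditional survival S(y | X >= x, Z = z) = S0(y | X >= x) ** a(z, beta),
    with the (assumed) exponential link a(z, beta) = exp(beta' z); x is held fixed."""
    a = np.exp(np.dot(beta, z))
    return baseline_surv(y) ** a

def order_m_cost_frontier(m, z, beta, baseline_surv, lower, upper, n_grid=200_001):
    """Order-m cost frontier E[min(Y^1, ..., Y^m) | X >= x, Z = z].

    For costs bounded below by `lower`, E[min] = lower + int_lower^inf S(y)^m dy.
    The integral is truncated at `upper` and evaluated with the trapezoid rule."""
    y = np.linspace(lower, upper, n_grid)
    s_m = picf_survival(y, z, beta, baseline_surv) ** m
    return lower + np.sum(0.5 * (s_m[1:] + s_m[:-1]) * np.diff(y))

# Hypothetical baseline at a fixed x: full frontier phi, exponential excess cost with rate lam.
phi, lam = 1.0, 2.0
baseline = lambda y: np.exp(-lam * np.maximum(y - phi, 0.0))
beta = np.array([1.0])

for z_val in (-0.5, 0.5):
    for m in (5, 50):
        fm = order_m_cost_frontier(m, np.array([z_val]), beta, baseline, phi, phi + 10.0)
        print(f"z={z_val:+.1f}  m={m:3d}  order-m frontier={fm:.4f}")
```

For this particular baseline the integral has the closed form \(\varphi + 1/(\lambda m e^{\beta' z})\), which the numerical values should reproduce; as m grows, every order-m frontier approaches \(\varphi\), whatever the value of z.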


[Figs. 1–15: images not available]


Notes

  1. Endogeneity means that the parameters of interest are determined not by the conditional distribution but by the joint distribution of Y and some other variables; see Cazals et al. (2016) or Simar et al. (2016).

  2. Of course, in practice we use individual bandwidths \(h_{j}\) for each component of Z. So, in the notation that follows, \(h^{d}\) has to be understood as \(\prod _{j=1}^{d} h_{j}\). By doing so, and using product kernels, we are able to detect irrelevant components in the conditioning; see, e.g., Hall et al. (2004) and Li et al. (2013) for details.

  3. This explains some abuse of language in this literature, where the partial frontiers are sometimes regarded as robust versions of the full frontier. We try to avoid this confusion.

  4. If a is normalized such that \(a(z_0, \beta (x_0))=1\) for some \((z_0,x_0)\), the baseline model represents the cost process for this particular production unit.

  5. Whether the separability condition holds is an empirical issue, even if some argue that it may be a reasonable assumption in many situations for economic or technical reasons. This assumption is easy to test in practice, as described in Daraio et al. (2018) and Simar and Wilson (2020). In all the real-data examples in Sect. 4.2 below, the test was applied and the separability assumption was not rejected.

  6. We limit our presentation to the case of no ties among the \(Y_{i}\) and no censoring, which is mostly the case in our setup of cost efficiency analysis. The marginal likelihood can easily be extended to the case of ties and censored data (where only the minimum between Y and some censoring value is observed); see, e.g., Kalbfleisch and Prentice (1980).

  7. In the simple case where \(a(z,\beta (x)) = e^{\beta '(x)z}\), the expression of \(\ell (\beta (x))\) simplifies [see equation (4.6) in Kalbfleisch and Prentice (1980)], and explicit expressions for the gradient and the Hessian can be derived.

  8. Similar developments could be done for the conditional order-\(\alpha \) frontiers.

  9. As explained in the Appendix, we use the \({\widetilde{S}}\) notation for survivor functions when we condition on \(X=x\), to distinguish it from S, where we condition on \(X\ge x\).

  10. Since \(Q^{-1}(y,x,z)\) is specified, the value of y corresponding to a quantile \(u\in [0,1]\) is given by \(y=Q(u,x,z)\) and can be found numerically by solving \(y=\arg \min _{y} | Q^{-1}(y,x,z) - u|\), which is easy since \(Q^{-1}\) is monotone in y.
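Notes 6 and 7 refer to the marginal (rank-based) likelihood of Kalbfleisch and Prentice (1980) without ties or censoring, and to its explicit gradient and Hessian when \(a(z,\beta (x)) = e^{\beta '(x)z}\). The following is a minimal sketch of the standard Cox partial log-likelihood with that link, where the cost Y plays the role of the duration. It assumes a fixed evaluation point x, treats \(\beta (x)\) as a constant vector there, and assumes the sample passed in consists of the units with \(X_i \ge x\); that localization is our assumption, not necessarily the authors' estimator, and the function names are ours.

```python
import numpy as np

def cox_partial_loglik(beta, y, z):
    """Cox partial log-likelihood, gradient and Hessian with a(z, beta) = exp(beta' z).

    Assumes no ties in y and no censoring (the setting of Note 6). The cost y plays
    the role of the duration: the risk set of unit i is {j : y_j >= y_i}."""
    order = np.argsort(-y)            # decreasing cost: each risk set is a prefix
    z = z[order]
    eta = z @ beta
    c = eta.max()                     # shift for numerical stability (cancels out)
    w = np.exp(eta - c)
    s0 = np.cumsum(w)                                     # sum of w_j over the risk set
    s1 = np.cumsum(w[:, None] * z, axis=0)                # sum of w_j z_j
    s2 = np.cumsum(w[:, None, None] * z[:, :, None] * z[:, None, :], axis=0)
    loglik = np.sum(eta - c - np.log(s0))
    zbar = s1 / s0[:, None]                               # weighted mean of z over the risk set
    grad = np.sum(z - zbar, axis=0)
    hess = -np.sum(s2 / s0[:, None, None] - zbar[:, :, None] * zbar[:, None, :], axis=0)
    return loglik, grad, hess

def fit_cox_newton(y, z, max_iter=50, tol=1e-10):
    """Maximize the partial likelihood by Newton-Raphson, starting from beta = 0."""
    beta = np.zeros(z.shape[1])
    for _ in range(max_iter):
        _, grad, hess = cox_partial_loglik(beta, y, z)
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated check: S(y | z) = exp(-y) ** exp(beta' z), i.e. Y | Z ~ Exponential(rate exp(beta' z)).
rng = np.random.default_rng(0)
n, beta_true = 2000, np.array([1.0, -0.5])
z_obs = rng.normal(size=(n, 2))
y_obs = rng.exponential(scale=1.0 / np.exp(z_obs @ beta_true))
beta_hat = fit_cox_newton(y_obs, z_obs)
print(beta_hat)
```

Sorting by decreasing cost turns every risk set into a prefix, so three cumulative sums give the log-likelihood, gradient and Hessian in \(O(n \log n + n p^2)\) time. Tied costs would require a Breslow- or Efron-type correction, which this sketch does not implement.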

References

  • Aragon Y, Daouia A, Thomas-Agnan C (2005) Nonparametric frontier estimation: a conditional quantile-based approach. Econom Theory 21:358–389

  • Bădin L, Daraio C, Simar L (2012) How to measure the impact of environmental factors in a nonparametric production model. Eur J Oper Res 223:818–833

  • Cazals C, Florens JP, Simar L (2002) Nonparametric frontier estimation: a robust approach. J Econom 106(1):25

  • Cazals C, Fève F, Florens JP, Simar L (2016) Nonparametric instrumental variables estimation for efficiency frontier. J Econom 190:349–359

  • Charnes A, Cooper WW, Rhodes E (1981) Evaluating program and managerial efficiency: an application of data envelopment analysis to program follow through. Manag Sci 27:668–697

  • Cox DR (1972) Regression models and life tables. J R Stat Soc B 34:187–220

  • Daouia A, Gijbels I (2011) Robustness and inference in nonparametric partial frontier modeling. J Econom 161:147–165

  • Daouia A, Simar L (2007a) Nonparametric efficiency analysis: a multivariate conditional quantile approach. J Econom 140:375–400

  • Daouia A, Simar L (2007b) Advanced robust and nonparametric methods in efficiency analysis: methodology and applications. Springer, New York

  • Daouia A, Florens JP, Simar L (2010) Frontier estimation and extreme values theory. Bernoulli 16(4):1039–1063

  • Daouia A, Florens JP, Simar L (2012) Regularization of non-parametric frontier estimators. J Econom 168:285–299

  • Daraio C, Simar L (2005) Introducing environmental variables in nonparametric frontier models: a probabilistic approach. J Prod Anal 24(1):93–121

  • Daraio C, Simar L, Wilson PW (2018) Central limit theorems for conditional efficiency measures and tests of the “Separability” condition in nonparametric, two-stage models of production. Econom J 21:170–191

  • Florens JP, Simar L, Van Keilegom I (2014) Frontier estimation in nonparametric location-scale models. J Econom 178:456–470

  • Grambsch PM, Therneau TM (1994) Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81(3):515–526

  • Hall P, Racine JS, Li Q (2004) Cross-validation and the estimation of conditional probability densities. J Am Stat Assoc 99(468):1015–1026

  • Härdle WK, Simar L (2019) Applied multivariate statistical analysis, 5th edn. Springer, Switzerland

  • Jeong SO, Park BU, Simar L (2010) Nonparametric conditional efficiency measures: asymptotic properties. Ann Oper Res 173:105–122

  • Kalbfleisch JD, Prentice RL (1980) The statistical analysis of failure time data. Wiley, New York

  • Kneip A, Simar L, Wilson PW (2015) When bias kills the variance: central limit theorems for DEA and FDH efficiency scores. Econom Theory 31:394–422

  • Li Q, Lin J, Racine JS (2013) Optimal bandwidth selection for nonparametric conditional distribution and quantile functions. J Bus Econ Stat 31(1):57–65

  • Mammen E (1992) When does bootstrap work? Asymptotic results and simulations. Springer, Berlin

  • Park B, Simar L, Weiner Ch (2000) The FDH estimator for productivity efficiency scores: asymptotic properties. Econom Theory 16:855–877

  • Simar L (2003) Detecting outliers in frontiers models: a simple approach. J Prod Anal 20:391–424

  • Simar L, Wilson PW (2007) Estimation and inference in two-stage, semi-parametric models of production processes. J Econom 136(1):31–64

  • Simar L, Wilson PW (2011) Two-stage DEA: caveat emptor. J Prod Anal 36:205–218

  • Simar L, Wilson PW (2020) Hypothesis testing in nonparametric models of production using multiple sample splits. J Prod Anal 53:287–303

  • Simar L, Vanhems A, Van Keilegom I (2016) Unobserved heterogeneity and endogeneity in nonparametric frontier estimation. J Econom 190:360–373

  • Tibshirani R (1997) The Lasso method for variable selection in the Cox model. Stat Med 16:385–395

  • Tsiatis AA (1981) A large sample study of Cox’s regression model. Ann Stat 9(1):93–108

  • Wilson PW (1993) Detecting outliers in deterministic nonparametric frontier models with multiple outputs. J Bus Econ Stat 11:319–323


Author information

Correspondence to Léopold Simar.

Additional information


F. Fève and J.P. Florens acknowledge funding from the French National Research Agency (ANR) under the Investments for the Future (Investissement d’Avenir), Grant ANR-17-EURE-0010.

A Appendix: Simulation of data

Simulating a data set \(\{(X_{i}, Y_{i},Z_{i})\}_{i=1}^{n}\) should be done with care. Usually, researchers specify a model for the frontier function \(\varphi _{0}(x)\), then select a model to simulate values of \(X_{i}\) and \(Z_{i}\), and finally generate \(Y_{i}\) for \(X=X_{i}\) and \(Z=Z_{i}\). Here we have to generate the sample according to our PICF model, which specifies that the survival function is \(S(y | X\ge x, Z=z) = \left[ S_{0}(y|X\ge x)\right] ^{a(z,\beta (x))}\) for some baseline survival function \(S_{0}(y|X\ge x)\) and some given functions \(a(\cdot ,\cdot )\) and \(\beta (\cdot )\). So we need to recover from our model the conditional distribution of Y given \(X=x\) and \(Z=z\) implied by the PICF. We denote this conditional survival function by \({\widetilde{S}}(y |X=x, Z=z)\), using the \({\widetilde{S}}\) notation when we condition on \(X=x\) to distinguish it from \(S(y| X\ge x, Z=z)\) defined above, where we condition on \(X\ge x\). Consider the corresponding quantile function

$$\begin{aligned} y = Q(u,x,z) = {\widetilde{S}}^{-1}(u |X=x,Z=z), \end{aligned}$$
(A.1)

where Q is monotone decreasing in u. We know that \(U=Q^{-1}(Y,X,Z)\) is uniform on [0, 1] and independent of X and Z, so an easy way to simulate Y given \(X=x\) and \(Z=z\) is to generate \(U_{i}\) as uniform on [0, 1] and then define \(Y_{i}=Q(U_{i},X_{i},Z_{i})\).
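The inverse-transform step can be sketched as follows. Purely for illustration, it uses a hypothetical conditional survivor function with an explicit quantile function, \({\widetilde{S}}(y|X=x,Z=z) = e^{-\lambda(z)(y-\varphi(x))}\) for \(y \ge \varphi(x)\), with \(\varphi(x)=\sqrt{x}\) and \(\lambda(z) = e^{z}\). This model is not derived from the PICF, and all names are ours. The last lines check that \(U=Q^{-1}(Y,X,Z)\) returns the uniform draws.

```python
import numpy as np

# Hypothetical conditional model (illustration only, NOT derived from the PICF):
#   S~(y | X = x, Z = z) = exp(-lam(z) * (y - phi(x)))  for y >= phi(x),
# with frontier phi(x) = sqrt(x) and rate lam(z) = exp(z).
def phi(x):
    return np.sqrt(x)

def lam(z):
    return np.exp(z)

def cond_survival(y, x, z):
    """Q^{-1}(y, x, z) = S~(y | X = x, Z = z)."""
    return np.exp(-lam(z) * np.maximum(y - phi(x), 0.0))

def quantile(u, x, z):
    """Q(u, x, z) = S~^{-1}(u | X = x, Z = z), decreasing in u."""
    return phi(x) - np.log(u) / lam(z)

rng = np.random.default_rng(42)
n = 100_000
x_sim = rng.uniform(1.0, 4.0, size=n)      # any design for (X, Z) may be used here
z_sim = rng.normal(size=n)
u_sim = 1.0 - rng.uniform(size=n)          # uniform on (0, 1], avoids log(0)
y_sim = quantile(u_sim, x_sim, z_sim)      # Y_i = Q(U_i, X_i, Z_i)

# U = Q^{-1}(Y, X, Z) returns the uniform draws (up to rounding) and is unrelated to Z.
u_back = cond_survival(y_sim, x_sim, z_sim)
print(np.max(np.abs(u_back - u_sim)), np.corrcoef(u_back, z_sim)[0, 1])
```

Whenever Q has no closed form, only the function `quantile` changes; it can be replaced by a numerical inversion of \(Q^{-1}\), as suggested in Note 10.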

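The general form of \(Q(u,x,z)\) that satisfies the PICF model can be obtained as follows. Integrating \(Q^{-1}(y,t,z)={\widetilde{S}}(y|X=t,Z=z)\) over \(t\ge x\) with respect to the conditional density of X given \(Z=z\) (law of total probability) leads to the equation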

$$\begin{aligned} \text {Prob}(Y \ge y \mid X\ge x, Z=z) =\frac{\int _{x}^{\infty } Q^{-1}(y,t,z) f_{X}(t | z) \textrm{d}t}{S_{X}(x|z)}. \end{aligned}$$
(A.2)

So, for the PICF model, the function Q must satisfy

$$\begin{aligned} \int _{x}^{\infty } Q^{-1}(y,t,z) f_{X}(t | z) \textrm{d}t = S_{X}(x|z) \left[ S_{0}(y|X\ge x)\right] ^{a(z,\beta (x))}. \end{aligned}$$
(A.3)

Taking the derivative with respect to x (with some abuse of notation, for \(x\in {\mathbb {R}}^{p}\) the derivative \(\partial _{x}^{p}\) has to be understood as \(\partial ^{p}/(\partial x_{1}\ldots \partial x_{p})\)) and equating both sides, we obtain

$$\begin{aligned} -Q^{-1}(y,x,z) f_{X}(x| z)= & {} - f_{X}(x| z) \left[ S_{0}(y|X\ge x)\right] ^{a(z,\beta (x))} \nonumber \\{} & {} + S_{X}(x|z) \partial _{x}^{p} \left\{ \big [S_{0}(y|X\ge x)\big ]^{a(z,\beta (x))} \right\} . \end{aligned}$$
(A.4)

Dividing both sides by \(-f_{X}(x| z)\) leads to the equation

$$\begin{aligned} Q^{-1}(y,x,z)&= {\widetilde{S}}(y|X=x,Z=z) \nonumber \\&=\left[ S_{0}(y|X\ge x)\right] ^{a(z,\beta (x))} - \frac{S_{X}(x| z)}{f_{X}(x| z)}\,\partial _{x}^{p} \left\{ \big [S_{0}(y|X\ge x)\big ]^{a(z,\beta (x))} \right\} , \end{aligned}$$
(A.5)

which allows us to define (at least numerically) its inverse \(Q(u,x,z)\) for any \((u,x,z)\). The expression simplifies greatly if we introduce additional assumptions into the model we want to simulate.
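In the general case, Eq. (A.5) can be evaluated directly once \(S_0(y|X\ge x)\), \(a(z,\beta(x))\), \(S_X(x|z)\) and \(f_X(x|z)\) are available. The sketch below does this for a scalar output (p = 1), replacing the x-derivative by a central finite difference. The example is hypothetical and chosen so that the exact answer is known: baseline \(S_0(x)=e^{-x}\) for X, baseline cost survivor \({\widetilde{S}}_0(y|X=x)=e^{-(y-x)}\) for \(y\ge x\) (so that (A.6) gives \(S_0(y|X\ge x)=(1+y-x)e^{-(y-x)}\) for \(y \ge x\)), \(a(z,\beta)=e^{\beta z}\) with a constant \(\beta\), and the Cox-type joint model, so that \(S_X(x|z)=e^{-a x}\) and \(f_X(x|z)=a e^{-a x}\). For this example the derivative can be done by hand and gives \(S_0(y|X\ge x)^{a-1}e^{-(y-x)}\), which the finite-difference value should match. All names are ours.

```python
import math

def picf_cond_survival_A5(y, x, z, s0_geq, a_fun, s_x, f_x, h=1e-5):
    """Evaluate Eq. (A.5) for a scalar output (p = 1):
        S~(y | X = x, Z = z) = G(x) - S_X(x | z) / f_X(x | z) * dG/dx,
    where G(x) = S0(y | X >= x) ** a(z, beta(x)); dG/dx uses a central difference.

    s0_geq(y, x) -> S0(y | X >= x), a_fun(z, x) -> a(z, beta(x)),
    s_x(x, z) -> S_X(x | z), f_x(x, z) -> f_X(x | z)."""
    def g(t):
        return s0_geq(y, t) ** a_fun(z, t)
    dg_dx = (g(x + h) - g(x - h)) / (2.0 * h)
    return g(x) - s_x(x, z) / f_x(x, z) * dg_dx

# Hypothetical example with a constant beta (see the lead-in for the derivation).
BETA = 1.2

def s0_geq(y, x):
    d = max(y - x, 0.0)
    return (1.0 + d) * math.exp(-d)      # S0(y | X >= x) obtained from (A.6)

def a_fun(z, x):
    return math.exp(BETA * z)            # does not depend on x in this example

def s_x(x, z):
    return math.exp(-a_fun(z, x) * x)    # S_X(x | z) = S0(x) ** a

def f_x(x, z):
    return a_fun(z, x) * math.exp(-a_fun(z, x) * x)

y, x, z = 2.0, 0.5, 0.3
print(picf_cond_survival_A5(y, x, z, s0_geq, a_fun, s_x, f_x))
```

Because `a_fun` receives x, the same function also covers an x-dependent \(\beta(x)\), although no closed-form check is then available.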

Indeed, if we assume that the joint conditional survival function satisfies the Cox model, i.e. \(S_{XY}(x,y | z)= \left[ S_{0}(x,y)\right] ^{a(z,\beta (x))}\) where \(S_{0}(x,y) = S_{0}(y|X\ge x) S_{0}(x)\), we have

$$\begin{aligned} S_{0}(y|X\ge x) = \frac{\int _{x}^{\infty } {\widetilde{S}}_{0}(y|t) f_{0}(t) \textrm{d}t}{S_{0}(x)}, \end{aligned}$$
(A.6)

where again \({\widetilde{S}}_{0}(y|x)\) stands for \({\widetilde{S}}_{0}(y|X=x)\), the counterpart of the baseline survivor function \(S_{0}(y|X\ge x)\) when conditioning on \(X=x\). We also have \(S_{X}(x|z)= (S_{0}(x))^{a(z,\beta (x))}\). Therefore, Eq. (A.3) simplifies into

$$\begin{aligned} \int _{x}^{\infty } Q^{-1}(y,t,z) f_{X}(t | z) \textrm{d}t = \left[ \int _{x}^{\infty } {\widetilde{S}}_{0}(y|t) f_{0}(t) \textrm{d}t \right] ^{a(z,\beta (x))}. \end{aligned}$$
(A.7)

In addition, if we assume that \(\beta (x)=\beta \), the derivative of both sides of (A.7) with respect to x simplifies. Note also that \(f_{X}(x| z)= a(z,\beta ) f_{0}(x) (S_{0}(x))^{a(z,\beta )-1}\). After some simplification, this leads to the equation

$$\begin{aligned} Q^{-1}(y,x,z) =&{\widetilde{S}}(y|X=x,Z=z) \nonumber \\ =&\left[ S_{0}(y|X\ge x)\right] ^{a(z,\beta )-1} \widetilde{S}_{0}(y|X=x). \end{aligned}$$
(A.8)

So, given the function \(a(z,\beta )\), the survivor function \(\widetilde{S}_{0}(y|X=x)\) and the baseline density of X, \(f_{0}(x)\), we can compute \(S_{0}(y|X\ge x)\) by (A.6), and then the conditional survivor function \({\widetilde{S}}(y|X=x,Z=z)\) by (A.8). Inverting (A.8) (at least numerically) gives the quantile function \(y=Q(u,x,z)\) for any u, from which we can simulate a value \(Y_{i}\) for a given \((X_{i},Z_{i})\) according to the PICF model.
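The recipe based on (A.6) and (A.8) can be sketched in a few lines. The ingredients are hypothetical choices of ours: \(S_0(x)=f_0(x)=e^{-x}\), \({\widetilde{S}}_0(y|X=x)=e^{-(y-x)}\) for \(y\ge x\) (frontier \(\varphi(x)=x\)), and \(a(z,\beta)=e^{\beta z}\) for a scalar z. The integral in (A.6) is computed with the trapezoid rule on a truncated range, (A.8) is then evaluated directly, and \(Q(u,x,z)\) is obtained by bisection, which relies on the monotonicity of \(Q^{-1}\) in y noted in Note 10. Under the Cox-type joint model, \(S_X(x|z)=S_0(x)^{a(z,\beta)}=e^{-a x}\), so X given \(Z=z\) is exponential with rate \(a(z,\beta)\); this is how the \(X_i\) are drawn. All function names are ours.

```python
import numpy as np

# Hypothetical ingredients: baseline X ~ Exp(1), i.e. S0(x) = f0(x) = exp(-x), and
# baseline cost survival S~0(y | X = x) = exp(-(y - x)) for y >= x (frontier phi(x) = x).
def s0_tilde(y, t):
    return np.exp(-np.maximum(y - t, 0.0))

def f0(t):
    return np.exp(-t)

def s0_x(x):
    return np.exp(-x)

def s0_given_geq(y, x, upper=80.0, n_grid=40_001):
    """Eq. (A.6): S0(y | X >= x) = int_x^inf S~0(y | t) f0(t) dt / S0(x).
    Truncated at `upper` (assumed beyond both y and the mass of f0); trapezoid rule."""
    t = np.linspace(x, upper, n_grid)
    g = s0_tilde(y, t) * f0(t)
    return np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t)) / s0_x(x)

def picf_cond_survival(y, x, z, beta):
    """Eq. (A.8) with the assumed link a(z, beta) = exp(beta * z):
    S~(y | X = x, Z = z) = S0(y | X >= x) ** (a - 1) * S~0(y | X = x)."""
    a = np.exp(beta * z)
    return s0_given_geq(y, x) ** (a - 1.0) * s0_tilde(y, x)

def picf_quantile(u, x, z, beta, width=40.0, tol=1e-9):
    """y = Q(u, x, z): solve S~(y | x, z) = u by bisection on [x, x + width].
    Valid because S~ is decreasing in y, equals 1 at the frontier y = x, and is
    (assumed) below u at y = x + width."""
    lo, hi = x, x + width
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if picf_cond_survival(mid, x, z, beta) > u:
            lo = mid        # survival still above u: the quantile lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Simulate a small PICF sample; X | Z = z ~ Exponential(rate a(z, beta)).
rng = np.random.default_rng(1)
beta = 0.8
z_sim = rng.uniform(-1.0, 1.0, size=5)
x_sim = rng.exponential(scale=1.0 / np.exp(beta * z_sim))
u_sim = rng.uniform(size=5)
y_sim = np.array([picf_quantile(u, x, z, beta) for u, x, z in zip(u_sim, x_sim, z_sim)])
print(np.column_stack([z_sim, x_sim, y_sim]))
```

With \(\beta = 0\) the model collapses to the baseline, \(a = 1\), and the bisection should return \(x - \log u\); this gives a quick sanity check of the numerical inversion before simulating larger samples.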

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fève, F., Florens, JP. & Simar, L. Proportional incremental cost probability functions and their frontiers. Empir Econ 64, 2721–2756 (2023). https://doi.org/10.1007/s00181-023-02386-x



  • DOI: https://doi.org/10.1007/s00181-023-02386-x
