Two-stage DEA: caveat emptor

Journal of Productivity Analysis


This paper examines the widespread practice in which data envelopment analysis (DEA) efficiency estimates are regressed on some environmental variables in a second-stage analysis. In the literature, only two statistical models have been proposed in which second-stage regressions are well-defined and meaningful. In the model considered by Simar and Wilson (J Econ 136:31–64, 2007), truncated regression provides consistent estimation in the second stage, whereas in the model proposed by Banker and Natarajan (Oper Res 56:48–58, 2008a), ordinary least squares (OLS) provides consistent estimation. This paper examines, compares, and contrasts the very different assumptions underlying these two models, and makes clear that second-stage OLS estimation is consistent only under very peculiar and unusual assumptions on the data-generating process that limit its applicability. In addition, we show that in either case, bootstrap methods provide the only feasible means for inference in the second stage. We also comment on ad hoc specifications of second-stage regression equations that ignore the part of the data-generating process that yields data used to obtain the initial DEA estimates.


  1. One could perhaps assume that the joint density of input-output vectors includes a probability mass along the frontier, but given the bias of the DEA frontier estimator and the resulting mass of observations for which the corresponding DEA efficiency estimate will equal unity, it is difficult to imagine how such a model could be identified from the model in Kneip et al. (2008). In addition, the properties of DEA estimators in such a model are unknown.

  2. In the model considered by SW, inefficiency explicitly depends on the environmental variables which may account for heteroskedasticity in the inefficiency process. SW did not consider heteroskedasticity in the error term of the second stage regression, but this could be modeled using standard techniques; i.e., \(\sigma_{\varepsilon}^2\) appearing in Assumption A3 of SW could be parameterized in terms of additional covariates. See also Park et al. (2008).

  3. The Meghalaya plateau in northeastern India is considered to be one of the rainiest places on earth (Murata et al. 2007).

  4. On p. 50, in the fourth through seventh lines after equation (2), it is stated that

    The contextual variables are measured such that the weights \(\beta_s,s=1,\ldots,\,S,\) are all nonnegative—i.e., the higher the value of the contextual variables, the higher is the inefficiency of the DMU.

    This is false due to the structure in (8) and the independence of Z and U.

  5. BN write (18) as \(\log\widehat{\widetilde{\theta}}=\widetilde{\beta}_0-\user2{Z}\widetilde{\varvec{\beta}}+\widetilde{\delta}\) in their equation (11), but substitution of the right-hand side of (17) for \(\widetilde{\theta}\) on the left-hand side of (16) does not change the parameters on the right-hand side of (16). Equation (17) appears as equation (A3) in BN2, where it is noted that η ≥ 0.

  6. In addition, if \(V^M\) is not constant, it is equally unclear what is estimated in the first stage.

  7. Erhemjamts and Leverty (2010) are not alone in accepting statements in BN uncritically. Both McDonald (2009, p. 797) and Ramalho et al. (2010, Sect. 2, eighth paragraph) state that the DGP proposed by BN is less restrictive than that considered by SW, without mentioning the various restrictions required by the BN model. This issue is revisited below in Sect. 5.

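  8. In the statement of their Proposition 1, BN correctly define \(\varvec{Q}\) as \(\hbox{Plim}(n^{-1}\user2{Z}^{\prime}\user2{Z})\), but in equation (A4) of the proof appearing in BN2, \(\varvec{Q}\) is implicitly defined as \(n^{-1}\user2{Z}^{\prime}\user2{Z}\). We use the definition \(\varvec{Q}=\hbox{Plim}(n^{-1}\user2{Z}^{\prime}\user2{Z})\) in all that follows.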

  9. In their proof appearing in BN2, BN ignore the role of the intercept β0. Consequently, their expression for the variance of their OLS estimator would be wrong even if the rest of their derivations were correct, which they are not. In addition, in their Monte Carlo experiments, BN considered only the case where p = q = 1 with VRS, and consequently did not notice the errors in their proof of their Proposition 1.

  10. Most, if not all, of the papers that have used OLS to regress DEA efficiency scores on environmental variables while citing BN for justification have more than three dimensions in their first-stage estimation. To give just a few examples, Cummins et al. (2010) use p + q = 8 or 9; Banker et al. (2010a) use p + q = 6; Banker et al. (2010b) use p + q = 5. Each of these relies on the usual OLS standard error estimate to make inference in the second-stage regressions, and consequently the inference in these papers is invalid.


  • Aly HY, Grabowski CPRG, Rangan N (1990) Technical, scale, and allocative efficiencies in US banking: an empirical investigation. Rev Econ Stat 72:211–218

  • Banker RD, Cao Z, Menon N, Natarajan R (2010a) Technological progress and productivity growth in the US mobile telecommunications industry. Ann Oper Res 173:77–87

  • Banker RD, Lee SY, Potter G, Srinivasan D (2010b) The impact of supervisory monitoring on high-end retail sales productivity. Ann Oper Res 173:25–37

  • Banker RD, Morey RC (1986) Efficiency analysis for exogenously fixed inputs and outputs. Oper Res 34:513–521

  • Banker RD, Natarajan R (2008a) Evaluating contextual variables affecting productivity using data envelopment analysis. Oper Res 56:48–58

  • Banker RD, Natarajan R (2008b) Online companion for “evaluating contextual variables affecting productivity using data envelopment analysis”—appendix: proofs of consistency of the second stage estimation. Oper Res online supplement, 1–6. Available at

  • Barkhi R, Kao YC (2010) Evaluating decision making performance in the GDSS environment using data envelopment analysis. Decis Support Syst 49:162–174

  • Bădin L, Daraio C, Simar L (2010) Optimal bandwidth selection for conditional efficiency measures: a data-driven approach. Eur J Oper Res 201:633–664

  • Chang H, Chang WJ, Das S, Li SH (2004) Health care regulation and the operating efficiency of hospitals: evidence from Taiwan. J Account Public Policy 23:483–510

  • Chang H, Choy JL, Cooper WW, Lin MH (2008) The Sarbanes-Oxley act and the production efficiency of public accounting firms in supplying accounting, auditing and consulting services: an application of data envelopment analysis. Int J Serv Sci 1:3–20

  • Cummins JD, Weiss MA, Xie X, Zi H (2010) Economies of scope in financial services: a DEA efficiency analysis of the US insurance industry. J Banking Finance 34:1525–1539

  • Daraio C, Simar L (2005) Introducing environmental variables in nonparametric frontier models: a probabilistic approach. J Prod Anal 24:93–121

  • Daraio C, Simar L (2006) A robust nonparametric approach to evaluate and explain the performance of mutual funds. Eur J Oper Res 175:516–542

  • Daraio C, Simar L, Wilson PW (2010) Testing whether two-stage estimation is meaningful in non-parametric models of production. Discussion paper #1031. Institut de Statistique, Université Catholique de Louvain, Louvain-la-Neuve, Belgium

  • Davutyan N, Demir M, Polat S (2010) Assessing the efficiency of Turkish secondary education: heterogeneity, centralization, and scale diseconomies. Socio-Econ Plan Sci 44:3–44

  • Erhemjamts O, Leverty JT (2010) The demise of the mutual organizational form: an investigation of the life insurance industry. J Money Credit Banking 42:1011–1036

  • Farrell MJ (1957) The measurement of productive efficiency. J Royal Stat Soc A 120:253–281

  • Gstach D (1998) Another approach to data envelopment analysis in noisy environments. J Prod Anal 9:161–176

  • Hoff A (2007) Second stage DEA: comparison of approaches for modelling the DEA score. Eur J Oper Res 181:425–435

  • Jeong SO, Park BU, Simar L (2010) Nonparametric conditional efficiency measures: asymptotic properties. Ann Oper Res 173:105–122

  • Jondrow J, Lovell CAK, Materov IS, Schmidt P (1982) On the estimation of technical inefficiency in the stochastic frontier production model. J Econ 19:233–238

  • Kneip A, Park B, Simar L (1998) A note on the convergence of nonparametric DEA efficiency measures. Econ Theory 14:783–793

  • Kneip A, Simar L, Wilson PW (2008) Asymptotics and consistent bootstraps for DEA estimators in non-parametric frontier models. Econ Theory 24:1663–1697

  • Kneip A, Simar L, Wilson PW (2011a) Central limit theorems for DEA scores: when bias can kill the variance. Discussion paper, Institut de Statistique Biostatistique et Sciences Actuarielles, Université Catholique de Louvain, Louvain-la-Neuve, Belgium

  • Kneip A, Simar L, Wilson PW (2011b) A computationally efficient, consistent bootstrap for inference with non-parametric DEA estimators. Comput Econ. (Forthcoming)

  • Korostelev A, Simar L, Tsybakov AB (1995a) Efficient estimation of monotone boundaries. Ann Stat 23:476–489

  • Korostelev A, Simar L, Tsybakov AB (1995b) On estimation of monotone and convex boundaries. Publications de l’Institut de Statistique de l’Université de Paris XXXIX 1:3–18

  • McDonald J (2009) Using least squares and Tobit in second stage DEA efficiency analyses. Eur J Oper Res 197:792–798

  • Murata F, Hayashi T, Matsumoto J, Asada H (2007) Rainfall on the Meghalaya plateau in northeastern India—one of the rainiest places in the world. Nat Hazards 42:391–399

  • Park BU, Jeong S-O, Simar L (2010) Asymptotic distribution of conical-hull estimators of directional edges. Ann Stat 38:1320–1340

  • Park BU, Simar L, Zelenyuk V (2008) Local likelihood estimation of truncated regression and its partial derivative: theory and application. J Econ 146:185–198

  • Ramalho EA, Ramalho JJS, Henriques PD (2010) Fractional regression models for second stage DEA efficiency analyses. J Prod Anal 34:239–255

  • Shephard RW (1970) Theory of cost and production functions. Princeton, Princeton University Press

  • Simar L, Wilson PW (2000) Statistical inference in nonparametric frontier models: the state of the art. J Prod Anal 13:49–78

  • Simar L, Wilson PW (2007) Estimation and inference in two-stage, semi-parametric models of productive efficiency. J Econ 136:31–64

  • Simar L, Wilson PW (2010) Estimation and inference in cross-sectional, stochastic frontier models. Econ Rev 29:62–98

  • Simar L, Wilson PW (2011) Inference by the m out of n bootstrap in nonparametric frontier models. J Prod Anal. (Forthcoming)

  • Sufian F, Habibullah MS (2009) Asian financial crisis and the evolution of Korean banks efficiency: a DEA approach. Glob Econ Rev 38:335–369


Financial support from the "Inter-university Attraction Pole", Phase VI (No. P6/03) from the Belgian Government (Belgian Science Policy) and from l'Institut National de la Recherche Agronomique (INRA) and Le Groupe de Recherche en Economie Mathématique et Quantitative (GREMAQ), Toulouse School of Economics, Toulouse, France is gratefully acknowledged. Part of this research was done while Wilson was a visiting professor at the Institut de Statistique Biostatistique et Sciences Actuarielles, Université Catholique de Louvain, Louvain-la-Neuve, Belgium. We have benefited from discussions with Valentin Zelenyuk; of course, any remaining errors are solely our responsibility.

Correspondence to Paul W. Wilson.

Appendix: OLS estimation in BN’s second stage

The first stage estimation in BN’s approach provides an estimator \(\widehat{\widetilde{\theta}}_i\le1\) of \(\widetilde{\theta}_i\) for \(i=1, \ldots,\,n\) where i indexes observations. The properties of DEA estimators have been developed by Korostelev et al. (1995a, b), Kneip et al. (1998), Kneip et al. (2008, 2011b), Park et al. (2010) and Simar and Wilson (2011), and depend on assumptions about returns to scale. In particular, if variable returns to scale (VRS) are assumed, then the DEA estimator converges at rate \(n^{2/(p+q+1)}\), which is slower than the usual parametric rate \(n^{1/2}\) for \(p+q>3\). BN ignore this in the proof (appearing in BN2) of their Proposition 1, and this leads to important errors and false statements.

BN suggest re-writing (17) as

$$ \log\widetilde{\theta}=\log\widehat{\widetilde{\theta}}-\eta \tag{21} $$

and using the right-hand side of this to replace \(\log\widetilde{\theta}\) in (16) to obtain (18). Then the error term \(\widetilde{\delta}\) appearing in (18) is equal to δ + η. BN propose estimating (18) by OLS, and claim in their proof of their Proposition 1 that

$$ \sqrt{n}\left(\widehat{\varvec{\beta}}-\varvec{\beta}\right) \xrightarrow{d} N\left({\bf 0},\sigma^2\varvec{Q}^{-1}\right) \tag{22} $$

where \(\varvec{Q}=\hbox{Plim}(n^{-1}\user2{Z}^{\prime}\user2{Z})\) (see footnote 8). As shown below, these claims are false.
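The substitution that produces (18) can be written out step by step. The lines below are a sketch that assumes (16) has the form \(\log\widetilde{\theta}=\beta_0-\user2{Z}\varvec{\beta}+\delta\); that equation is not reproduced in this appendix, so its form is inferred from the version of (18) quoted in footnote 5.

$$ \begin{aligned} \log\widetilde{\theta}&=\beta_0-\user2{Z}\varvec{\beta}+\delta &&\hbox{(assumed form of (16))}\\ \log\widehat{\widetilde{\theta}}-\eta&=\beta_0-\user2{Z}\varvec{\beta}+\delta &&\hbox{(replacing }\log\widetilde{\theta}\hbox{ using the rewritten (17))}\\ \log\widehat{\widetilde{\theta}}&=\beta_0-\user2{Z}\varvec{\beta}+\underbrace{\delta+\eta}_{\widetilde{\delta}} &&\hbox{(moving }\eta\hbox{ to the right-hand side)} \end{aligned} $$

Under this reading, the parameters \(\beta_0\) and \(\varvec{\beta}\) are unchanged by the substitution, and the first-stage estimation error \(\eta\) enters the second stage only through the composite error term.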

Recall that \(\eta_i\ge0\) for all \(i=1, \ldots,\,n,\) with i indexing the sample observations. Simar and Wilson (2011) and Kneip et al. (2011a) prove, under mild regularity conditions,

$$ n^\gamma\eta_i \xrightarrow{{\mathcal{L}}} G(\mu_0,\sigma_0^2), \tag{23} $$

where \(G(\cdot)\) is an unknown, non-degenerate distribution with mean \(\mu_0>0\) and variance \(\sigma_0^2>0\) (both finite and unknown), and \(\gamma=2/(p+q+1)\) for the VRS case (or \(\gamma=2/(p+q)\) for the constant returns to scale (CRS) case). In addition, as shown in Kneip et al. (2008, 2011a, b), the covariance between \(\eta_i\) and \(\eta_j\) is asymptotically non-zero for a number of observations \(j=1,\ldots,\,n, j\ne i,\) which is of order \(O(n^\gamma)\). To summarize, as \(n\to\infty,\)

$$ E(\eta_i)\approx n^{-\gamma}\mu_0, \tag{24} $$

$$ \hbox{VAR}(\eta_i)\approx n^{-2\gamma}\sigma_0^2, \tag{25} $$

$$ \hbox{COV}(\eta_i,\eta_j)\approx\left\{\begin{array}{ll} n^{-2\gamma}\alpha&\hbox{for }O(n^\gamma)\hbox{ observations }j\ne i;\\ 0&\hbox{for the remaining observations}\end{array}\right. \tag{26} $$

for some bounded but unknown constant α.

Recall that the error term \(\widetilde{\delta}\) in (18), i.e., the equation that BN estimate by OLS, equals δ + η as shown above. Consequently, the properties of η play an important role in determining the properties of the OLS estimator \(\widehat{\varvec{\beta}}\) of \(\varvec{\beta}.\) Let \({\varvec{\fancyscript{Z}}}\) be an n × (r + 1) matrix with ith row given by \(\left[\begin{array}{ll} 1&-\user2{Z}_i\end{array}\right], \) and let \({\varvec{\fancyscript{Y}}=\left[\begin{array}{lll} \log\widehat{\widetilde\theta}_1&\cdots& \log\widehat{\widetilde\theta}_n\end{array}\right]^{\prime}.}\) In addition, let \(\varvec{\beta}^*=\left[\begin{array}{ll}\beta_0&\varvec{\beta}^{\prime}\end{array}\right]^{\prime}\) and \(\widehat{\varvec{\beta}}^*=\left[\begin{array}{ll}\widehat{\beta}_0&\widehat{\varvec{\beta}}^{\prime}\end{array}\right]^{\prime}. \) Then OLS estimation on (18) yields

$$ \begin{aligned} \widehat{\varvec{\beta}}^* &=\left(\varvec{{\fancyscript{Z}}}^{\prime}\varvec{{\fancyscript{Z}}}\right)^{-1}\varvec{{\fancyscript{Z}}}^{\prime}\varvec{{\fancyscript{Y}}}\\ &=\left(\varvec{{\fancyscript{Z}}}^{\prime}\varvec{{\fancyscript{Z}}}\right)^{-1}\varvec{{\fancyscript{Z}}}^{\prime} \left(\varvec{{\fancyscript{Z}}}\varvec{\beta}^*+\widetilde{\varvec{\delta}}\right) \end{aligned} \tag{27} $$

where \(\widetilde{\varvec{\delta}}= \left[\begin{array}{lll}\widetilde{\delta}_1&\ldots&\widetilde{\delta}_n\end{array}\right]^{\prime}. \) Taking expectations,

$$ \begin{aligned} E\left(\widehat{\varvec{\beta}}^*\mid\varvec{{\fancyscript{Z}}}\right)&= \varvec{\beta}^*+\left(\varvec{{\fancyscript{Z}}}^{\prime}\varvec{{\fancyscript{Z}}}\right)^{-1}\varvec{{\fancyscript{Z}}}^{\prime}E(\widetilde{\varvec{\delta}}\mid\varvec{{\fancyscript{Z}}})\\ &=\varvec{\beta}^*+\left(\varvec{{\fancyscript{Z}}}^{\prime}\varvec{{\fancyscript{Z}}}\right)^{-1}\varvec{{\fancyscript{Z}}}^{\prime}E(\varvec{\delta}\mid\varvec{{\fancyscript{Z}}})+\left(\varvec{{\fancyscript{Z}}}^{\prime}\varvec{{\fancyscript{Z}}}\right)^{-1}\varvec{{\fancyscript{Z}}}^{\prime}E(\varvec{\eta}\mid\varvec{{\fancyscript{Z}}})\\ &=\varvec{\beta}^*+\left(\varvec{{\fancyscript{Z}}}^{\prime}\varvec{{\fancyscript{Z}}}\right)^{-1}\varvec{{\fancyscript{Z}}}^{\prime}E(\varvec{\eta}\mid\varvec{{\fancyscript{Z}}})\\ &\approx\varvec{\beta}^*+n^{-\gamma}c_1 \end{aligned} \tag{28} $$

as \(n\to\infty,\) where \(c_1\) is a non-zero, bounded constant. This follows from the result in (24) and from BN’s assumptions, under which \({E(\varvec{\eta}\mid\varvec{\fancyscript{Z}})=E(\varvec{\eta})}\) and \({E(\varvec{\delta}\mid\varvec{\fancyscript{Z}})=0.}\) Here \(\varvec{\delta}=\left[\begin{array}{lll}\delta_1&\ldots&\delta_n\end{array}\right]^{\prime}\) and \(\varvec{\eta}=\left[\begin{array}{lll}\eta_1&\ldots&\eta_n\end{array}\right]^{\prime}.\)
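The role of the non-zero mean of η can be illustrated with a small Monte Carlo sketch. It does not compute DEA estimates: as a labeled assumption, η is drawn independently of Z as n^(-γ) times an exponential variate with mean μ0, which mimics only the order of E(η) and ignores the covariance structure of the true first-stage errors. The function names and parameter values are hypothetical.

```python
import random


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def simulate_second_stage(n, gamma, beta0=0.5, beta=1.0, mu0=1.0, sigma=0.1, seed=0):
    """Simulate log(theta_hat) = beta0 - beta*Z + delta + eta and fit it by OLS.

    ASSUMPTION: eta_i = n**(-gamma) * Exponential(mean mu0), independent of Z
    and across i. This is a stand-in for the first-stage DEA error, not DEA.
    """
    rng = random.Random(seed)
    z = [rng.random() for _ in range(n)]
    delta = [rng.gauss(0.0, sigma) for _ in range(n)]
    eta = [n ** (-gamma) * rng.expovariate(1.0 / mu0) for _ in range(n)]
    y = [beta0 - beta * zi + di + ei for zi, di, ei in zip(z, delta, eta)]
    neg_z = [-zi for zi in z]  # regressor is -Z, matching the row [1, -Z_i]
    b0_hat, b_hat = ols_line(neg_z, y)
    return b0_hat - beta0, b_hat - beta  # estimation errors


if __name__ == "__main__":
    n, gamma = 20000, 1 / 3  # gamma = 2/(p+q+1) with p + q = 5 under VRS
    err0, err1 = simulate_second_stage(n, gamma)
    print("intercept error:", err0, "compared with n**(-gamma)*mu0 =", n ** (-gamma))
    print("slope error:", err1)
```

Because the simulated η has the same mean for every observation and is independent of Z, the bias in this sketch loads on the intercept only; that is a property of the simplifying assumption, not a claim about DEA-based first-stage errors.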

From the last line in (28) it is clear that as \(n\to\infty, \)

$$ \sqrt{n}\left(\widehat{\varvec{\beta}}^*-\varvec{\beta}^*\right)\approx O_p\left(n^{\frac{1}{2}-\gamma}\right), \tag{29} $$

which is rather different from what is claimed in the proof of Proposition 1 of BN (as noted earlier, BN claim that (22) holds). Recall that \(\gamma=2/(p+q+1)\) for the VRS case. As shown below, the left-hand side of (29) converges to a non-degenerate random variable with constant variance for p + q ≤ 3, and to a random variable with variance approaching infinity as \(n\to\infty\) for p + q > 3. In the CRS case, \(\gamma=2/(p+q)\), and hence \(\sqrt{n}\left(\widehat{\varvec{\beta}}^*-\varvec{\beta}^*\right)\) converges to a non-degenerate random variable with constant variance for p + q ≤ 4, and to a random variable with variance approaching infinity as \(n\to\infty\) for p + q > 4. For p + q = 3 in the VRS case and for p + q = 4 in the CRS case, the left-hand side of (29) converges to a random variable with constant variance, but which is not normally distributed as shown below. In their Monte Carlo experiments, BN considered only the case where p = q = 1 with VRS, and consequently did not notice the errors in their proof of their Proposition 1.
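The thresholds on p + q follow from comparing γ with 1/2. The short sketch below tabulates that comparison; the function names and the regime labels are illustrative choices, not notation from the paper.

```python
from fractions import Fraction


def dea_rate_exponent(p, q, returns="vrs"):
    """Exponent gamma of the DEA rate n**gamma for p inputs and q outputs."""
    if returns == "vrs":
        return Fraction(2, p + q + 1)
    if returns == "crs":
        return Fraction(2, p + q)
    raise ValueError("returns must be 'vrs' or 'crs'")


def second_stage_regime(p, q, returns="vrs"):
    """Classify the term of order n**(1/2 - gamma) in sqrt(n)*(beta_hat - beta)."""
    gamma = dea_rate_exponent(p, q, returns)
    half = Fraction(1, 2)
    if gamma > half:
        return "bias term vanishes"
    if gamma == half:
        return "bounded, non-normal limit"
    return "diverges"


def sqrt_n_bias_order(n, p, q, returns="vrs"):
    """Order n**(1/2 - gamma) of the scaled bias for a sample of size n."""
    return n ** (0.5 - float(dea_rate_exponent(p, q, returns)))


if __name__ == "__main__":
    for returns in ("vrs", "crs"):
        for dims in range(2, 8):
            p, q = dims - 1, 1
            print(returns, "p+q =", dims,
                  "gamma =", dea_rate_exponent(p, q, returns),
                  "->", second_stage_regime(p, q, returns))
```

For instance, with p + q = 5 under VRS the exponent is 1/3, so the scaled bias is of order n^(1/6) and grows without bound as n increases.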

Combining the results in (24)–(26), and using standard central-limit theorem arguments (see Kneip et al. 2011a for mathematical details), we have

$$ \sqrt{n}\left(\widehat{\varvec{\beta}}^*-\varvec{\beta}^*\right) \xrightarrow{{\mathcal{L}}} N\left({\bf 0},\sigma^2\varvec{Q}^{-1}\right)+\sqrt{n}\zeta_n, \tag{30} $$

where \(\zeta_n\) is a random variable such that \(\sqrt{n}\zeta_n=o_p(1)\) if \(\gamma>1/2\) or \(\sqrt{n}\zeta_n=O_p\left(n^{1/2-\gamma}\right)\) otherwise, and \(\sigma^2=\hbox{VAR}(\delta)=\hbox{VAR}(V)+\hbox{VAR}(U)\) (as in BN; see footnote 9). The result in (30) is very different from (22), which is the result claimed at the end of the proof appearing in BN2 of BN’s Proposition 1. Although the OLS estimator \(\widehat{\varvec{\beta}}^*\) of \(\varvec{\beta}^*\) is consistent, (22) cannot be used for valid (asymptotic) inference. Moreover, even if \(\gamma=1/2\), (30) contains unknown constants and an unknown, bounded random variable. The left-hand side of (30) does not converge to anything that is bounded if \(\gamma<1/2\). Bootstrap methods appear to provide the only feasible avenue toward valid inference or hypothesis testing in the second-stage regression (see footnote 10).

The preceding discussion also illustrates how the numerous restrictive assumptions imposed on the BN model are crucial for consistency of OLS estimation in the second-stage regression. For example, if Z and U (which determines inefficiency) are correlated, then the error terms δ and \(\widetilde{\delta}\) must be correlated with Z, in which case OLS estimation in (18) would yield inconsistent estimates. As another example, if \(V^M\), the bound on the noise process, is not constant, then OLS estimation may be problematic. If \(V^M=\overline{V^M}+\zeta,\) where \(\overline{V^M}\) is constant and \(\zeta\) is random with \(E(\zeta)=0,\) then β0 can be written as \(\beta_0=E(V-U)-\overline{V^M}, \) but δ would have to be written as \(\delta=V-U-E(V-U)-\zeta.\) If \(E(\zeta)\ne0,\) then OLS estimation of β0 will be biased and inconsistent. Worse, regardless of whether \(E(\zeta)=0,\) if \(\zeta\) is not independent of Z, then OLS estimation in (18) would yield inconsistent estimates of both β0 and \(\varvec{\beta}. \) If the environmental variables are related to the size of firms, and if the error bounds vary with firm size, then Z and \(\zeta\) would clearly be correlated; this is likely to be the case in some applications.
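The inconsistency that arises when ζ depends on Z can be seen in a minimal simulation. The sketch below assumes, purely for illustration, a scalar Z that is uniform on (0, 1) and a bound deviation ζ = a(Z − 1/2), which has mean zero but is a deterministic function of Z; all names and values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Return the least-squares slope of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def beta_hat_with_varying_bound(n, a, beta0=0.5, beta=1.0, sigma=0.1, seed=0):
    """OLS estimate of beta when the error term contains -zeta, zeta = a*(Z - 0.5).

    ASSUMPTION: zeta is mean-zero but depends on Z, as when the noise bound
    varies with firm size and Z is related to firm size.
    """
    rng = random.Random(seed)
    z = [rng.random() for _ in range(n)]
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    zeta = [a * (zi - 0.5) for zi in z]
    y = [beta0 - beta * zi + ei - zt for zi, ei, zt in zip(z, noise, zeta)]
    return ols_slope([-zi for zi in z], y)  # regressor is -Z, as in (18)


if __name__ == "__main__":
    print("a = 0.0:", beta_hat_with_varying_bound(20000, 0.0))
    print("a = 0.5:", beta_hat_with_varying_bound(20000, 0.5))
```

In this setup the OLS estimate settles near β + a rather than β, and increasing n does not remove the discrepancy.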

Even more troubling is the assumption that \(V^M\) is finite, which implies that the noise term V is symmetrically truncated at \(-V^M\) and \(V^M\). Suppose, for example, that \(V\sim N(0,\sigma_V^2)\), and suppose the researcher has a sample of n iid draws \(\{V_1,\ldots,\,V_n\}\) from the \(N(0,\sigma_V^2)\) distribution. Of course, one can easily find the sample maximum, and the maximum value in a normal sample of finite size will certainly be less than infinity. But, it is necessarily difficult, and maybe impossible, to test whether the distribution is truncated at a finite value. In situations in econometrics where truncated regression is used, the truncation typically arises from features of the sampling mechanism (e.g., survey design) or model structure (e.g., in SW, truncation arises from the fact that inefficiency has a one-sided distribution; it would make little sense to assume otherwise). Imposing finite bounds on a two-sided noise process, however, is a far more uncertain prospect.
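A brief sketch shows why such truncation is hard to detect from data. It draws one sample from an untruncated normal distribution and one from a normal distribution truncated at ±4 standard deviations by rejection sampling; the sample size, the bound, and the function names are hypothetical choices made only for illustration.

```python
import random


def normal_sample(n, sigma=1.0, seed=0):
    """Draw n iid values from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def truncated_normal_sample(n, bound, sigma=1.0, seed=0):
    """Draw n iid values from N(0, sigma^2) truncated to [-bound, bound].

    Uses simple rejection sampling: draws outside the bounds are discarded.
    """
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        v = rng.gauss(0.0, sigma)
        if abs(v) <= bound:
            sample.append(v)
    return sample


if __name__ == "__main__":
    n, bound = 1000, 4.0  # hypothetical values chosen for illustration
    untruncated = normal_sample(n)
    truncated = truncated_normal_sample(n, bound)
    print("largest |V|, untruncated:", max(abs(v) for v in untruncated))
    print("largest |V|, truncated:  ", max(abs(v) for v in truncated))
```

A standard normal draw exceeds 4 in absolute value with probability of about 0.00006, so in most samples of size 1000 no draw is rejected and the two samples give no indication of which distribution is truncated.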

If \(V^M\) is infinite, then the first-stage estimation using DEA estimators is inconsistent. From (13), it is clear that if \(V^M\) is infinite, then \(\widetilde{\phi}(X)\) must be infinite. Re-arranging terms in (15) indicates that \(\widetilde{\theta}=Y/\widetilde{\phi}(X)\) for the case of a univariate output considered by BN; hence if \(V^M\) is infinite, then \(\widetilde{\theta}\) is undefined, in which case BN’s second-stage regression is an ill-posed problem without meaning.

Simar, L., Wilson, P.W. Two-stage DEA: caveat emptor. J Prod Anal 36, 205–218 (2011).
