
Data envelopment analysis with missing data

  • Theoretical Paper
Journal of the Operational Research Society

Abstract

A first systematic attempt to use data containing missing values in data envelopment analysis (DEA) is presented. It is formally shown that allowing missing values into the data set can only improve estimation of the best-practice frontier. Technically, DEA can automatically exclude the missing data from the analysis if blank data entries are coded by appropriate numerical values.


Author information

Corresponding author

Correspondence to T Kuosmanen.

Appendix. Proofs of theorems

Proof of Theorem 1

  • The proof is organised in three parts:

    1)

      $T^{UB} \subseteq T^{IDEAL}$: From the perspective of identity (1), the only difference between sets $T^{UB}$ and $T^{IDEAL}$ concerns the constraint for output j. For $T^{IDEAL}$ this constraint reads as

      $$y_j \le \sum_{n} \lambda_n Y_{nj}. \tag{A1}$$

      Since output j is missing for DMU k in $T^{UB}$, the constraint for output j reads as

      $$y_j \le \sum_{n \ne k} \lambda_n Y_{nj}. \tag{A2}$$

      Recall that the true but unknown value $Y_{kj}$ must be non-negative in $T^{IDEAL}$. Thus, for any given weights λ, the value of admissible $y_j$ in $T^{UB}$ must always be less than or equal to the corresponding value of $y_j$ in $T^{IDEAL}$. Since the two sets are otherwise identical, we have proved that $T^{UB} \subseteq T^{IDEAL}$.

    2)

      $T^{FIRMk} \subseteq T^{UB}$: Set $T^{FIRMk}$ is obtained by setting $\lambda_k = 0$ in (1). Thus, the output constraints of $T^{FIRMk}$ are of the form

      $$y_s \le \sum_{n \ne k} \lambda_n Y_{ns} \quad \text{for all outputs } s. \tag{A3}$$

      Comparing constraints (A2) and (A3), we observe that the constraints for output j are in effect identical in both $T^{UB}$ and $T^{FIRMk}$ (the first one sets the output of DMU k equal to zero, the latter one the weight of DMU k). However, the constraints for outputs $s \ne j$ are different: set $T^{FIRMk}$ imposes a constraint of type (A3) for all outputs, while the output constraints of $T^{UB}$ are of the form

      $$y_s \le \sum_{n} \lambda_n Y_{ns} \quad \text{for all } s \ne j. \tag{A4}$$

      For any given weights λ, the constraints of type (A3) imply a maximum value of output s that is less than or equal to the maximum value allowed by constraint (A4), because (A3) omits the non-negative term $\lambda_k Y_{ks}$. Since the two sets are identical except for the output constraints, the two sets are nested as $T^{FIRMk} \subseteq T^{UB}$.

    3)

      $T^{Yj} \subseteq T^{UB}$: Set $T^{Yj}$ is obtained by setting $Y_{nj} = 0$ for all DMUs n. Thus, the constraint for output j in $T^{Yj}$ reads as

      $$y_j \le \sum_{n} \lambda_n \cdot 0 = 0. \tag{A5}$$

      Comparing constraints (A2) and (A5), we see that for any given weights λ, the admissible values of output j in set $T^{UB}$ are greater than or equal to the zero value implied by $T^{Yj}$. Since the two sets are otherwise identical, the two sets are nested as $T^{Yj} \subseteq T^{UB}$. □

Proof of Theorem 2

  • Observe that technology $T^{Yj}$ is obtained by imposing constraint $u_j = 0$ in (2). To prove the equivalence, we show that using reference technology $T^{UB}$ the DEA problems of Table 2 always yield optimal solutions where output weights $u^*$ satisfy $u_j^* = 0$.

    Consider the input-oriented problem (the left column of Table 2). The objective function reads as

    $$\sum_{s} u_s Y_{ks}. \tag{A6}$$

    DMU k must have produced a strictly positive amount of at least one output, say output i. Because $Y_{kj}$ is coded as zero in $T^{UB}$ while $Y_{ki} > 0$, the value of sum (A6) will always increase if we increase weight $u_i$ and simultaneously decrease weight $u_j$. Thus, the optimal solution of the input efficiency problem will always satisfy $u_j^* = 0$. The same argument directly applies to the output efficiency problem. □
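The claim can be illustrated on a small instance. The sketch below assumes a constant-returns (CCR-type) multiplier problem with one input and two outputs; Table 2 is not reproduced here, so this formulation, the toy data, and the helper names `solve_lp_2d` and `ccr_input_efficiency` are assumptions rather than the paper's own setup. With a single input the normalisation $v X_k = 1$ fixes the input weight, which leaves a two-variable linear programme that can be solved by enumerating vertices.

```python
from itertools import combinations


def solve_lp_2d(c, cons, tol=1e-9):
    """Maximise c[0]*u1 + c[1]*u2 subject to a1*u1 + a2*u2 <= b for each
    (a1, a2, b) in cons, by enumerating vertices. Assumes a finite optimum."""
    best = None
    for (a1, a2, b), (d1, d2, e) in combinations(cons, 2):
        det = a1 * d2 - a2 * d1
        if abs(det) < tol:
            continue  # parallel constraint lines define no vertex
        u = ((b * d2 - a2 * e) / det, (a1 * e - b * d1) / det)  # Cramer's rule
        if all(p * u[0] + q * u[1] <= r + tol for p, q, r in cons):
            value = c[0] * u[0] + c[1] * u[1]
            if best is None or value > best[0]:
                best = (value, u)
    return best


def ccr_input_efficiency(X, Y, k):
    """Score and optimal output weights of DMU k: maximise u.Y[k] subject to
    u.Y[n] <= X[n] / X[k] for every DMU n, and u >= 0."""
    cons = [(y1, y2, x / X[k]) for x, (y1, y2) in zip(X, Y)]
    cons += [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]  # u1 >= 0, u2 >= 0
    return solve_lp_2d(Y[k], cons)


# Hypothetical data: DMU 1 has no observation for the second output.
X = [2.0, 2.0, 4.0]
Y_ub = [(4.0, 2.0), (3.0, 0.0), (1.0, 6.0)]  # missing cell coded as 0
Y_out = [(y1, 0.0) for y1, _ in Y_ub]         # second output dropped for all DMUs

score_ub, u_ub = ccr_input_efficiency(X, Y_ub, 1)
score_out, _ = ccr_input_efficiency(X, Y_out, 1)
print(score_ub, score_out)  # both equal 0.75 for this toy data
```

In this instance the optimal weight on the missing output is zero, and the score matches the one obtained after dropping that output for every DMU, which is the equivalence the proof asserts.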

Proof of Theorem 3

  • In this case the evaluated input–output vector (ie that of DMU l) is identical for both $\phi^{UB}_l$ and $\phi^{FIRMk}_l$. The only difference concerns the production possibilities frontiers. Thus, the result follows directly from the fact that $T^{FIRMk} \subseteq T^{UB} \subseteq T^{IDEAL}$ (see Theorem 1). □
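The step from nested sets to ordered scores can be written out for a radial input-oriented measure. The orientation and the symbol $\theta$ are assumptions made for illustration, since the definition of the measure is not reproduced here; for an output-oriented expansion factor the inequalities reverse.

```latex
% Assumed measure: radial input contraction of the fixed vector (x_l, y_l).
\theta(T) = \min\{\theta \ge 0 : (\theta x_l, y_l) \in T\}

% If T_A \subseteq T_B, every \theta feasible for T_A is feasible for T_B,
% so the minimum over the larger set cannot be larger:
T_A \subseteq T_B \;\Longrightarrow\; \theta(T_B) \le \theta(T_A)

% Applied to the chain of Theorem 1:
T^{FIRMk} \subseteq T^{UB} \subseteq T^{IDEAL}
\;\Longrightarrow\;
\theta(T^{IDEAL}) \le \theta(T^{UB}) \le \theta(T^{FIRMk})
```

Under this reading, the score computed with the missing value coded in lies between the score based on the unknown true data and the score obtained by discarding DMU k altogether.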

About this article

Cite this article

Kuosmanen, T. Data envelopment analysis with missing data. J Oper Res Soc 60, 1767–1774 (2009). https://doi.org/10.1057/jors.2008.132

  • DOI: https://doi.org/10.1057/jors.2008.132
