Identification in Models with Discrete Variables


This paper provides a novel, simple, and computationally tractable method for determining the identified set in a broad class of economic models with discrete variables. Using this method, we show with a simple example how imperfect instruments affect the size of the identified set when the assumption of strict exogeneity is relaxed. This is valuable because the exogeneity assumption is often controversial, so it is useful to know the extent to which it drives the results. Moreover, the flexibility of the proposed method suggests that determining the identified set need no longer be application specific: the analysis presents a unifying framework that approaches the question of identification algorithmically.


Figs. 1–14


  1.

    With some modification.

  2.

    Therefore, for all open subsets A of , is well defined.

  3.

    This statement is true only under additional assumptions, e.g., rank conditions, support conditions, or completeness conditions.

  4.

    This is Definition 1 in Galichon and Henry (2009).

  5.

    The parameter \(\theta \) may consist of two parts, \(\theta = [\theta _1,\theta _2]\), so we can have \(G_{\theta _1}\) and \(\nu _{\theta _2}\).

  6.

    Definition 2 in Galichon and Henry (2009), where the dependence of the identified set \(\Theta _{I}(p)\) on the distribution of observable variables p is made explicit.

  7.

    The dependence of \(c_{ij}\) and \(\nu _j\) on the parameter \(\theta \) is omitted for brevity.

  8.

    We may be willing to make some assumptions about the distribution of variables in the form of moment equalities or inequalities. It is important to note here that the GH setup can handle moment inequalities \(E(\phi (Y))\le 0\) if \(E(m(U))=0\) is assumed (Ekeland et al. 2010). In this case, the correspondence G is restricted to take a specific form. However, within the GH framework, it is not possible to combine a moment inequality with further information given by G.

  9.

    If the observed variable is multidimensional, we can stack it into a single vector. Summing across some sets of indices then allows us to formulate a restriction for only one dimension. For example, if the observed variables are (Y, X, Z), we can place a restriction on X only, so that X is independent of U.

  10.

    The manner in which the independence restriction is relaxed is discussed in Sect. 4.

  11.

    It is possible to determine the lower and upper bounds of the threshold-crossing function t(X) without this parametric assumption, as in Chesher (2009), by instead assuming the monotonicity of t(X). For simplicity, we present the parametric example.

  12.

    We could also assume that we observe the probability of (Y, X) given Z; for the sake of exposition, the probability of (Y, X, Z) is taken as known.

  13.

    In this case, the parameter \(\theta \) affects the support restrictions (10) only.

  14.

    Note that even though \(\pi \) is four dimensional, the problem still lies within the linear programming framework, as the elements of \(\pi \) can be stacked into a vector of size \(n_Y \cdot n_X \cdot n_Z \cdot n_U\).

  15.

    To avoid confusion with the probabilities \(p_{ijk}\) of the observed variables, the threshold-crossing function is denoted t(.), unlike in Chesher (2009), where it is denoted p(.).

  16.

    The discussion of the second-to-last restriction is omitted: \(\sum _{i,j} \pi _{ijkl} = \sum _{i,j} p_{ijk} \nu _l \ \ \forall k,l\).

  17.

    From Lemma 2, we can see that this interpretation is unaffected by the discretization of the unobserved variables.

  18.

    \(ACE(D \rightarrow Y) = Pr(Y = y_1|D = d_1) - Pr(Y = y_1|D = d_0) = Pr(R_Y=1) + Pr(R_Y=2) -(Pr(R_Y=2)+Pr(R_Y=3)) = Pr(R_Y=1)-Pr(R_Y=3)\).

  19.

    The instrument Z affects Y only via D: \(Pr(Y|D,Z,R_Y,R_D) = Pr(Y|D,R_Y,R_D)\); this equation can be reformulated as \(Pr(Y,D,Z,r_Y,r_D)Pr(D,R_Y,R_D)=Pr(Y,D,r_Y,r_D)Pr(D,Z,R_Y,R_D)\).

  20.

    As with exogenous instruments, the marginal distribution of X does not have any identifying power.

  21.

    If \(X\beta <0\), Eq. (17) is equivalent to \(Pr(U=u_1|X=x) = Pr(U=u_2|X=x) + Pr(U=u_3|X=x)\), and if \(X\beta \ge 0\), Eq. (17) can be rewritten as \(Pr(U=u_1|X=x) + Pr(U=u_2|X=x) = Pr(U=u_3|X=x)\). Note that this restriction can be rewritten as \(\sum _{i,j}\pi _{ij}(1(U \le 0)-0.5)=0\).

  22.

    The observed probabilities \(p_{ijk}\) were obtained using the Matlab function mvtnorm.

  23.

    Excluding the \(0\%\) and \(100\%\) quantiles.

  24.

    As in Example 1, the distribution of the exogenous variables per se does not have any identifying power; it is included purely for simplicity of exposition.

  25.

  26.

  27.

    This is a 4-dimensional array \(\pi _{ijkl}\) stacked into a vector.

  28.

  29.


  1. Andrews, D. W. K., & Shi, X. (2013). Inference based on conditional moment inequalities. Econometrica, 81, 609–666.

  2. Angrist, J., Bettinger, E., Bloom, E., King, E., & Kremer, M. (2002). Vouchers for private schooling in Colombia: Evidence from a randomized natural experiment. The American Economic Review, 92, 1535–1558.

  3. Artstein, Z. (1983). Distributions of random sets and random selections. Israel Journal of Mathematics, 46, 313–324.

  4. Balke, A., & Pearl, J. (1994). Counterfactual probabilities: Computational methods, bounds, and applications. In L. R. de Mantaras & D. Poole (Eds.), Uncertainty in artificial intelligence 10 (pp. 46–54). Burlington: Morgan Kaufmann.

  5. Balke, A., & Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439), 1171–1176.

  6. Beresteanu, A., Molchanov, I., & Molinari, F. (2011). Sharp identification regions in models with convex moment predictions. Econometrica, 79, 1785–1821.

  7. Beresteanu, A., Molchanov, I., & Molinari, F. (2012). Partial identification using random set theory. Journal of Econometrics, 166, 17–32.

  8. Beresteanu, A., & Molinari, F. (2008). Asymptotic properties for a class of partially identified models. Econometrica, 76, 763–814.

  9. Boykov, Y., & Kolmogorov, V. (2001). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 359–374.

  10. Brock, W. A., & Durlauf, S. N. (2001). Discrete choice with social interactions. Review of Economic Studies, 68, 235–260.

  11. Bugni, F. A. (2010). Bootstrap inference in partially identified models defined by moment inequalities: Coverage of the identified set. Econometrica, 78, 735–753.

  12. Chernozhukov, V., Hansen, C., & Jansson, M. (2009). Finite sample inference for quantile regression models. Journal of Econometrics, 152, 93–103.

  13. Chernozhukov, V., Hong, H., & Tamer, E. (2007). Estimation and confidence regions for parameter sets in econometric models. Econometrica, 75, 1243–1284.

  14. Chernozhukov, V., Lee, S., & Rosen, A. M. (2013). Intersection bounds: Estimation and inference. Econometrica, 81, 667–737.

  15. Chesher, A. (2009). Single equation endogenous binary response models. CeMMAP working paper CWP23/09, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.

  16. Chesher, A. (2010). Instrumental variable models for discrete outcomes. Econometrica, 78, 575–601.

  17. Chesher, A., Rosen, A. M., & Smolinski, K. (2013). An instrumental variable model of multiple discrete choice. Quantitative Economics, 4, 157–196.

  18. Chiburis, R. C. (2010). Bounds on treatment effects using many types of monotonicity. Unpublished manuscript.

  19. Conley, T. G., Hansen, C. B., & Rossi, P. E. (2012). Plausibly exogenous. Review of Economics and Statistics, 94, 260–272.

  20. Ekeland, I., Galichon, A., & Henry, M. (2010). Optimal transportation and the falsifiability of incompletely specified economic models. Economic Theory, 42, 355–374.

  21. Freyberger, J., & Horowitz, J. L. (2015). Identification and shape restrictions in nonparametric instrumental variables estimation. Journal of Econometrics, 189, 41–53.

  22. Galichon, A., & Henry, M. (2009). A test of non-identifying restrictions and confidence regions for partially identified parameters. Journal of Econometrics, 152, 186–196.

  23. Galichon, A., & Henry, M. (2011). Set identification in models with multiple equilibria. Review of Economic Studies, 78(4), 1264–1298.

  24. Goldberg, A. V., & Tarjan, R. E. (1986). A new approach to the maximum flow problem. In Proceedings of the eighteenth annual ACM symposium on theory of computing (STOC '86) (pp. 136–146). New York: ACM.

  25. Hahn, J., & Hausman, J. (2005). Estimation with valid and invalid instruments. Annals of Economics and Statistics/Annales d'Économie et de Statistique, 79–80, 25–57.

  26. Henry, M., Meango, R., & Queyranne, M. (2015). Combinatorial approach to inference in partially identified incomplete structural models. Quantitative Economics, 6, 499–529.

  27. Honoré, B. E., & Tamer, E. (2006). Bounds on parameters in panel dynamic discrete choice models. Econometrica, 74, 611–629.

  28. Huber, M., Laffers, L., & Mellace, G. (2017). Sharp IV bounds on average treatment effects on the treated and other populations under endogeneity and noncompliance. Journal of Applied Econometrics, 32, 56–79.

  29. Imbens, G. W., & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62, 467–475.

  30. Imbens, G. W., & Manski, C. F. (2004). Confidence intervals for partially identified parameters. Econometrica, 72, 1845–1857.

  31. Komarova, T. (2013). Binary choice models with discrete regressors: Identification and misspecification. Journal of Econometrics, 177, 14–33.

  32. Laffers, L. (2013). A note on bounding average treatment effects. Economics Letters, 120, 424–428.

  33. Laffers, L. (2015). Bounding average treatment effects using linear programming. Unpublished manuscript.

  34. Lee, D. S. (2009). Training, wages, and sample selection: Estimating sharp bounds on treatment effects. The Review of Economic Studies, 76, 1071–1102.

  35. Manski, C. F. (1990). Nonparametric bounds on treatment effects. American Economic Review, 80, 319–323.

  36. Manski, C. F. (1995). Identification problems in the social sciences. Cambridge: Harvard University Press.

  37. Manski, C. F. (2003). Partial identification of probability distributions. New York: Springer.

  38. Manski, C. F. (2007). Partial identification of counterfactual choice probabilities. International Economic Review, 48, 1393–1410.

  39. Manski, C. F. (2008). Partial identification in econometrics. In S. N. Durlauf & L. E. Blume (Eds.), The new Palgrave dictionary of economics. Basingstoke: Palgrave Macmillan.

  40. Manski, C. F., & Pepper, J. V. (2000). Monotone instrumental variables, with an application to the returns to schooling. Econometrica, 68, 997–1012.

  41. Manski, C. F., & Thompson, T. S. (1986). Operational characteristics of maximum score estimation. Journal of Econometrics, 32, 85–108.

  42. Nevo, A., & Rosen, A. M. (2012). Identification with imperfect instruments. Review of Economics and Statistics, 93, 127–137.

  43. Papadimitriou, C. H., & Steiglitz, K. (1998). Combinatorial optimization: Algorithms and complexity. New York: Dover Publications.

  44. Romano, J. P., & Shaikh, A. M. (2010). Inference for the identified set in partially identified econometric models. Econometrica, 78, 169–211.

  45. Rosen, A. M. (2008). Confidence sets for partially identified parameters that satisfy a finite number of moment inequalities. Journal of Econometrics, 146, 107–117.

  46. Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.

  47. Shaikh, A. M., & Vytlacil, E. J. (2011). Partial identification in triangular systems of equations with binary dependent variables. Econometrica, 79, 949–955.

  48. Tamer, E. T. (2010). Partial identification in econometrics. Annual Review of Economics, 2, 167–195.



This research was supported by VEGA grant 1/0843/17. This paper is a revised chapter from my 2014 dissertation at the Norwegian School of Economics.

Author information



Corresponding author

Correspondence to Lukáš Lafférs.


A Proofs

A.1 Proof of Lemma 1


We need to show that there exists \(\pi _1\) satisfying:


if and only if there exists \(\pi _2\) satisfying:

$$\begin{aligned}&\sum _{i=1}^{n} \sum _{j=1}^{m} \pi _2(y_i,u^*_j)1(y_i \in G(u^*_j)) = 1, \end{aligned}$$
$$\begin{aligned} \forall i = 1,...,n:&\sum _{j=1}^{m} \pi _2(y_i,u^*_j) = p_i, \end{aligned}$$
$$\begin{aligned} \forall j = 1,...,m:&\sum _{i=1}^{n}\pi _2(y_i,u^*_j) = \nu ^*(u^*_j), \end{aligned}$$
$$\begin{aligned} \forall I \in \mathbb {I};\forall j = 1,...,m:&\left| \sum _{i \in I}\pi _2(y_i,u^*_j) - \sum _{i \in I} p_i \nu ^*(u^*_j) \right| \le \alpha \sum _{i \in I} p_i \nu ^*(u^*_j). \end{aligned}$$
$$\begin{aligned} \forall i = 1,...,n;\forall j = 1,...,m&: \pi _2(y_i,u^*_j) \ge 0 \end{aligned}$$

“\((\Rightarrow )\)”: Given \(\pi _1\), we construct \(\pi _2\) according to:


and this will ensure that {(C1), (C2), (C3M), (C4M), (C5)} imply {(D1), (D2), (D3M), (D4M), (D5)} as shown below:


“\((\Leftarrow )\)”: If we know \(\pi _2\), we obtain \(\pi _1\) using:


(note that (\(\Pi _1\)) implies (\(\Pi _2\))) and we now show that {(D1), (D2), (D3M), (D4M), (D5)} imply {(C1), (C2), (C3M), (C4M), (C5)}:


A.2 Proof of Lemma 2


Similarly to the proof of Lemma 1, we need to show that there exists \(\pi _1\) satisfying (C1), (C2), (C5) and:


if and only if there exists \(\pi _2\) satisfying (D1), (D2), (D5) and:


“\((\Rightarrow )\)”: Given \(\pi _1\), we construct \(\pi _2\) according to:


and this will ensure that {(C1), (C2), (C3M), (C4M), (C5)} imply {(D1), (D2), (D3M), (D4M), (D5)}. Because the partitioning of the space under (PartU2) is finer than that under (PartU1), it follows immediately from the proof of Lemma 1 that {(C1), (C2), (C5)} imply {(D1), (D2), (D5)}. It is therefore sufficient to show that {(C3M), (C4M)} imply {(D3M), (D4M)}:


“\((\Leftarrow )\)”: Knowing \(\pi _2\), we obtain \(\pi _1\) using:


where \(\gamma \) is an arbitrary strictly positive probability density function. It is now sufficient to show that {(D3M), (D4M), (D5)} imply {(C3M), (C4M), (C5)}, because the proof of Lemma 1 reveals that {(C1), (C2)} imply {(D1), (D2)} and (PartU2) provides a finer discretization than does (PartU1):


B Technical Details on the Presented Examples

B.1 Example 1

B.1.1 Chesher’s Approach

In order to present the identification result from Chesher (2009), we first introduce the basic definitions. The notation here differs from the GH notation employed elsewhere in the present study.

  • A model is defined as (10) with \(U \sim Unif(0,1)\) and for all .

  • A structure \(S \equiv \{t, F_{UX|Z}\}\) is a pair consisting of a threshold-crossing function t and the cumulative distribution function of the conditional distribution of (U, X) given Z.

  • A structure S is said to be admitted by a model if \(F_{UX|Z}\) respects the independence property, that is \(F_U(u|z)\equiv F_{UX|Z}(u,\bar{x}|z)=u\) for all \(u \in (0,1)\) and all , where \(\bar{x}\) is the upper bound of X.

  • A structure S generates the joint distribution of Y and X given Z if \(F_{YX|Z}(0,x|z)=F_{UX|Z}(t(x),x|z)\).

  • Two structures \(S^* \equiv \{t^*, F^*_{UX|Z}\}\) and \(S^0 \equiv \{t^0, F^0_{UX|Z}\}\) are said to be observationally equivalent if they generate the same distribution of Y and X given Z for all , that is if \(F^*_{YX|Z}(0,x|z) \equiv F^*_{UX|Z}(t^*(x),x|z) = F^0_{YX|Z}(0,x|z) \equiv F^0_{UX|Z}(t^0(x),x|z)\) for all and for all .

Theorem 1 from Chesher (2009) states that, given a structure \(S^0\) admitted by the model that generates the conditional distribution of Y and X given Z with cumulative distribution function \(F^0_{YX|Z}\), if the threshold-crossing function t belongs to a structure S admitted by the model that is observationally equivalent to \(S^0\), then t satisfies:


where \(Pr_0\) indicates that probabilities are calculated under the measure generated by \(S^0\), that is, using \(F^0_{YX|Z}\), and l and u stand for the lower and upper bound, respectively.

Given the continuity of X, the converse is also true. This is equivalent to saying that the set of all functions t satisfying the above inequalities is the sharp identified set. In Chesher (2010), this theorem is proven for an even more general setup. It is important to note that the proof is constructive: for a given threshold-crossing function t, a suitable distribution function \(F_{UX|Z}\) is constructed such that \(\{t,F_{UX|Z}\}\) is admitted by the model and generates the \(F_{YX|Z}\) observed in the data. This highlights the link to the GH setup, as the aim there is to find a joint probability distribution that satisfies the independence restriction, has the correct marginals, and places all probability mass on combinations of variables that are compatible with the data.

B.1.2 Illustration: Discrete Endogenous Variable

Construction of True Data-generating Process

The following example is taken from Chesher (2010). Suppose that both Y and X are binary; \(Y \equiv 1(Y^* \ge 0)\) and \(X \equiv 1(X^* \ge 0)\), where \(Y^*\) and \(X^*\) were generated in the following way:


with parameters:

$$\begin{aligned} (\theta _0,\theta _1,b_0,b_1,r) = (0,0.5,0,1,-0.25). \end{aligned}$$

and the instrument Z takes values in .

However, the econometrician does not know how the data were generated. She only assumes (10) and , \(U \sim Unif(0,1)\), \(t(X) = \Phi (-\theta _0 - \theta _1 X)\), and observes the distribution of the observable variables \(p_{ijk}\) (footnote 22). Even though it is impossible to recover the true value of \(\theta = (0,0.5)\) exactly, it is possible to construct informative bounds for it.

As X is binary, the threshold-crossing function t attains only two values: \(t(0)=\Phi (-\theta _0) = 0.5\) and \(t(1)=\Phi (-\theta _0-\theta _1)=0.308\).
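These two values can be checked numerically; a minimal Python sketch using scipy's standard normal CDF (the paper's own computations are in Matlab):

```python
from scipy.stats import norm

# Threshold-crossing values implied by theta = (theta_0, theta_1) = (0, 0.5).
theta0, theta1 = 0.0, 0.5
t0 = norm.cdf(-theta0)            # t(0) = Phi(-theta_0)
t1 = norm.cdf(-theta0 - theta1)   # t(1) = Phi(-theta_0 - theta_1)

assert abs(t0 - 0.5) < 1e-12
assert abs(t1 - 0.308) < 1e-3     # Phi(-0.5) is approximately 0.3085
```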

B.1.3 Illustration: Continuous Endogenous Variable

Construction of the True Data-Generating Process

Suppose that the economic model is described by (10) and the data-generating process by (21) with the parameters:

$$\begin{aligned} (\theta _0,\theta _1,b_0,b_1,s_{wv},s_{vv}) = (0,-1,0,0.3,0.5,1). \end{aligned}$$

as before, the only difference being that X is no longer binary (\(X=X^*\)).

The distribution of the observable variables \((Y^*,X|Z=z)\) (\(Y^*\) and X given \(Z=z\)) is given by \(N(\mu (z),\Sigma )\), where:

$$\begin{aligned} \mu (z) = \left[ \begin{array}{c} \theta _0 + \theta _1 b_0 + \theta _1 b_1 z\\ b_0 + b_1 z \end{array}\right] \ \ \ \Sigma = \left[ \begin{array}{cc} 1+2\theta _1s_{wv}+\theta _1^2s_{vv} &{} s_{wv}+\theta _1 s_{vv}\\ s_{wv} + \theta _1 s_{vv} &{} s_{vv} \end{array}\right] . \end{aligned}$$

We provide details of the simulations here. Because of the continuity of X, the unobservable U was discretized as equidistant point masses on [0, 1]. The distribution of the observables is given by:

$$\begin{aligned} p_{ijk} = Pr(Y = y_i \cap X = x_j \cap Z=z_k) = Pr(Y = y_i \cap X =x_j | Z=z_k) Pr(Z=z_k). \end{aligned}$$

It is known that \((Y^*,X|Z) \sim N(\mu (z),\Sigma )\) and a suitable discretization of X is needed. It is easy to show that the density of \((Y^*|X=x,Z=z)\) is:

$$\begin{aligned} N\left( \mu (z)_{1}+ \frac{\Sigma _{21}}{\Sigma _{22}}(x-\mu (z)_{2}),\left( 1-\frac{\Sigma _{21}^2}{\Sigma _{11}\Sigma _{22}}\right) \Sigma _{11}\right) . \end{aligned}$$

Integrating the corresponding probability density function over \((-\infty, 0)\) gives \(Pr(Y=0|X=x,Z=z)\). The distribution of X given \(Z=z\) is \(N(b_0+b_1 z,s_{vv})\), but the question is how to discretize the support of X, which is \(\mathbb {R}\). If the number of nodes is \(n_x\), one suggestion is to fix z at its mean value, 0, and set the values of the discretized support of X to \(n_x\) equidistant quantiles (footnote 23). Even though this discretization appears natural, it introduces some degree of arbitrariness into the problem.

Finally, taking all the pieces together yields:

$$\begin{aligned} p_{ijk} = Pr(Y = y_i| X = x_j,Z=z_k)Pr(X = x_j|Z=z_k)Pr(Z=z_k), \end{aligned}$$

where all quantities on the right-hand side are known.
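The assembly of \(p_{ijk}\) described above can be sketched in Python (the paper uses Matlab). The supports of Z and X and the equal-mass assignment of \(Pr(X=x_j|Z=z_k)\) below are simplifying assumptions for illustration, not the paper's exact construction:

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameter values from the continuous-X example.
theta0, theta1, b0, b1, s_wv, s_vv = 0.0, -1.0, 0.0, 0.3, 0.5, 1.0
z_vals = np.array([-1.0, 1.0])                 # assumed instrument support
pz = np.full(z_vals.size, 1.0 / z_vals.size)   # assumed uniform Pr(Z = z_k)

# Covariance of (Y*, X) given Z, from the formula for Sigma above.
S11 = 1 + 2 * theta1 * s_wv + theta1 ** 2 * s_vv
S21 = s_wv + theta1 * s_vv
S22 = s_vv

# Discretized support of X: equidistant quantiles with z fixed at its mean, 0,
# excluding the 0% and 100% quantiles.
n_x = 9
levels = np.linspace(0, 1, n_x + 2)[1:-1]
x_support = norm.ppf(levels, loc=b0, scale=np.sqrt(s_vv))
px_given_z = np.full((n_x, z_vals.size), 1.0 / n_x)  # crude equal-mass assumption

p = np.zeros((2, n_x, z_vals.size))   # p[i, j, k] = Pr(Y = y_i, X = x_j, Z = z_k)
for k, z in enumerate(z_vals):
    mu1 = theta0 + theta1 * b0 + theta1 * b1 * z   # mean of Y* | Z = z
    mu2 = b0 + b1 * z                              # mean of X  | Z = z
    for j, x in enumerate(x_support):
        cond_mean = mu1 + S21 / S22 * (x - mu2)
        cond_var = (1 - S21 ** 2 / (S11 * S22)) * S11
        p_y0 = norm.cdf(0.0, loc=cond_mean, scale=np.sqrt(cond_var))  # Pr(Y* < 0)
        p[0, j, k] = p_y0 * px_given_z[j, k] * pz[k]
        p[1, j, k] = (1 - p_y0) * px_given_z[j, k] * pz[k]

# The p_ijk sum to one by construction.
assert np.isclose(p.sum(), 1.0)
```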

B.2 Example 2

B.2.1 Illustration

True Data-Generating Process

For the illustration, \((\epsilon _1,\epsilon _2)\) are assumed to be \(N(0,I_2)\). This assumption, together with (13) and (14), generates the distribution of Y and D given X and Z. The support of Z is assumed to be \(\{-1,1\}\) and the support of X is either \(\{0\}\) or \(\{-2,-1,0,1,2\}\). (X, Z) are assumed to be uniformly distributed (footnote 24).

B.3 Example 3

B.3.1 Balke and Pearl’s Approach

Balke and Pearl (1997) made use of the fact that these restrictions impose the following decomposition on the joint distribution of (Y, D, Z, U):

$$\begin{aligned} Pr(Y,D,Z,U) = Pr(Y|D,U)Pr(D|Z,U)Pr(Z)Pr(U). \end{aligned}$$

There exist four different functions from Z to D and four different functions from D to Y, hence 16 different types of individuals. One can thus think of U as having a discrete support with 16 points, each point representing a pair of functions: one from Z to D and one from D to Y. For instance, one type u may be a person who always accepts treatment and never displays a positive outcome, irrespective of treatment. The bounds on (15) are found using a linear program that searches through the space of distributions of the types (U), subject to the joint distribution being compatible with the observed data Pr(yd|z). The full setup, together with a discussion, is in Balke and Pearl (1994, 1997).
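The 16 response types can be enumerated directly. In this Python sketch, encoding a binary function by its pair of values is our own illustrative choice:

```python
from itertools import product

# A function {0,1} -> {0,1} is encoded by its value pair (f(0), f(1)),
# so there are exactly 4 such functions.
functions = list(product([0, 1], repeat=2))

# A type of U is a pair (f, g): f maps instrument Z to treatment D,
# and g maps treatment D to outcome Y. Hence 4 * 4 = 16 types.
types = list(product(functions, repeat=2))
assert len(types) == 16

# Example type from the text: always accepts treatment, never a positive outcome.
f_always_treat = (1, 1)
g_never_positive = (0, 0)
assert (f_always_treat, g_never_positive) in types
```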

B.4 Example 4

B.4.1 Komarova’s Approach

Following Manski and Thompson (1986):

$$\begin{aligned} Pr(Y=1|X =x) = 1 - Pr(U <-x\beta |X=x), \end{aligned}$$

together with the zero-median restriction (17), implies:

$$\begin{aligned} Pr(Y=1|X=x) \ge 0.5 \ \Leftrightarrow \ x\beta \ge 0. \end{aligned}$$

Therefore, the bounds on the parameter vector \(\beta \) are obtained as an intersection of linear half-spaces. In Komarova (2013), a recursive procedure is proposed that translates this set of linear inequalities into bounds on the parameters.
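The half-space characterization is easy to check numerically. In the hypothetical Python sketch below, the support points and choice probabilities are invented for illustration; a candidate \(\beta\) is consistent with the data iff, at every support point, the sign of \(x\beta\) agrees with whether \(Pr(Y=1|X=x)\) is at least one half:

```python
import numpy as np

# Hypothetical discrete support of X (first column is an intercept)
# and hypothetical observed choice probabilities Pr(Y = 1 | X = x).
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]])
p_y1 = np.array([0.3, 0.6, 0.9])

def in_identified_set(beta):
    """beta survives iff Pr(Y=1|X=x) >= 0.5  <=>  x @ beta >= 0 at every x."""
    return all((p >= 0.5) == (x @ beta >= 0) for x, p in zip(X, p_y1))

# One beta in the intersection of the half-spaces, one outside it.
assert in_identified_set(np.array([0.5, 1.0]))
assert not in_identified_set(np.array([-1.0, 0.0]))
```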

C Implementation Issues

C.1 Extended GH Framework

The following routines were used and compared in order to solve linear program (12).

  • linprog (footnote 25)—Matlab's built-in function from the Optimization Toolbox. The interior-point method was superior to the simplex method in terms of computational time. Since the objective value is not minimized to exactly zero, a threshold had to be employed; the natural choice is the tolerance level of the optimization routine (\(10^{-8}\) was used for \(n_x = n_u = 40\)). Results for the two approaches were identical.

  • GNU Linear Programming Kit (GLPK)—a modified simplex method, called from Matlab through the MEX interface for the GLPK library (footnote 26). Significantly faster than linprog, with similar results.

Linear programming is an old and well-understood problem; however, if the discretization of X and U is large, the matrix encoding the restrictions on the joint distribution can reach the limits of the largest array that Matlab can create (footnote 27). For instance, if the support sizes are \(n_x = n_u = 50\), together with \(n_y=2\) and \(n_z=30\), then the joint probability \(\pi _{ijkl}\) has 150,000 elements, so the matrix carrying the information about the restrictions on \(\pi _{ijkl}\) will have 150,000 columns. This is not a large problem in itself, but the complexity grows exponentially with additional covariates. The most time-consuming part is creating the matrices of equalities and inequalities that define the linear program. The computational burden is further increased when resampling methods, such as subsampling, are used for statistical inference. For these reasons, we consider only small problems, those with one or two parameters of interest.
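As an illustration of how such a feasibility check can be set up, here is a toy two-by-two sketch in Python with scipy rather than the paper's Matlab code; all numbers are invented, and the same reshape-to-vector stacking extends to the four-dimensional \(\pi _{ijkl}\):

```python
import numpy as np
from scipy.optimize import linprog

# Does a joint pmf pi over (Y, U) exist with given marginals p and nu,
# supported only where y_i is in G(u_j)? (Toy version of the LP feasibility check.)
p = np.array([0.6, 0.4])        # marginal of Y (invented)
nu = np.array([0.5, 0.5])       # marginal of U (invented)
compat = np.array([[1, 1],      # compat[i, j] = 1 iff y_i in G(u_j)
                   [0, 1]])

n, m = compat.shape
# Equality constraints: row sums of pi equal p, column sums equal nu.
A_eq, b_eq = [], []
for i in range(n):
    row = np.zeros((n, m)); row[i, :] = 1
    A_eq.append(row.reshape(-1)); b_eq.append(p[i])
for j in range(m):
    col = np.zeros((n, m)); col[:, j] = 1
    A_eq.append(col.reshape(-1)); b_eq.append(nu[j])

# Force pi = 0 outside the model's support by pinning those entries at zero.
bounds = [(0, None) if compat.reshape(-1)[t] else (0, 0) for t in range(n * m)]

res = linprog(c=np.zeros(n * m), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=bounds, method="highs")
assert res.status == 0   # feasible: this theta would not be rejected
```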

C.2 Original GH Approach

Since the formulation is also a linear program, both linprog and GLPK can be used. In addition, the optimal transportation structure can be exploited and more efficient algorithms applied. Galichon and Henry (2011) refer to the link between optimal transportation and the maximum flow problem studied in Papadimitriou and Steiglitz (1998). If the capacities \(\{\nu _j\},\{p_i\}\) of the arcs are the probabilities of the unobservables and observables, respectively, and arcs with infinite capacity connect the values of the observables and unobservables that satisfy the economic model (\(y_i \in G_\theta (u_j)\), or \(c_{ij}=0\)), then the question of the feasibility of the economic model reduces to checking whether the maximum flow is equal to one. The maximum flow formulation is illustrated in Fig. 15.

Fig. 15

Reproduced from Figures 7–8 of Papadimitriou and Steiglitz (1998), p. 145

Two such algorithms are readily available:

  • maxflow by M. Rubinstein (footnote 28) implements the Boykov–Kolmogorov max-flow/min-cut algorithm (Boykov and Kolmogorov 2001).

  • max_flow from the Matlab Boost Graph Library by D. Gleich (footnote 29) implements the push-relabel maximum flow algorithm of Goldberg and Tarjan (1986).

These combinatorial algorithms are computationally very effective; indeed, the most challenging task is to efficiently implement the creation of the matrix that defines the graph. Since this matrix is smaller than the one from the linear program, finer discretization can be employed if necessary. Even for very fine discretizations, the maximum flow was calculated in a fraction of a second.
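A sketch of the max-flow feasibility check, in Python/scipy for concreteness (the paper uses the Matlab routines above). scipy's solver requires integer capacities, so the probabilities are scaled, which introduces a small approximation; the graph data are the same toy values used for the LP illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

# Source -> u_j with capacity nu_j, u_j -> y_i with (effectively) unbounded
# capacity whenever y_i is in G(u_j), and y_i -> sink with capacity p_i.
SCALE = 1000                     # scipy needs integer capacities
nu = np.array([0.5, 0.5])        # unobservables' marginal (invented)
p = np.array([0.6, 0.4])         # observables' marginal (invented)
compat = np.array([[1, 1],       # compat[i, j] = 1 iff y_i in G(u_j)
                   [0, 1]])

m, n = nu.size, p.size
# Node ids: 0 = source, 1..m = u-nodes, m+1..m+n = y-nodes, m+n+1 = sink.
N = m + n + 2
cap = np.zeros((N, N), dtype=int)
for j in range(m):
    cap[0, 1 + j] = int(round(SCALE * nu[j]))
for i in range(n):
    cap[1 + m + i, N - 1] = int(round(SCALE * p[i]))
    for j in range(m):
        if compat[i, j]:
            cap[1 + j, 1 + m + i] = SCALE   # effectively unbounded

res = maximum_flow(csr_matrix(cap), 0, N - 1)
feasible = (res.flow_value == SCALE)  # model compatible with data iff flow equals one
assert feasible
```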


Cite this article

Lafférs, L. Identification in Models with Discrete Variables. Comput Econ 53, 657–696 (2019).


Keywords

  • Partial identification
  • Discrete variables
  • Linear programming
  • Sensitivity analysis

JEL Classification

  • C10
  • C21
  • C26
  • C61