Bounding average treatment effects using linear programming


This paper presents a method of calculating sharp bounds on the average treatment effect using linear programming under identifying assumptions commonly used in the literature. This new method provides a sensitivity analysis of the identifying assumptions and missing data in two applications. The first application looks at the effect of parents’ schooling on children’s schooling, and the second application studies the effect of mandatory arrest policy on domestic violence recidivism. This paper shows that even a mild departure from identifying assumptions may substantially widen the bounds on average treatment effects. Allowing for a small fraction of the data to be missing also has a large impact on the results.
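To make the linear programming idea concrete, the following is a minimal sketch for the simplest case of a binary treatment and a binary outcome. It is not the implementation used in the paper. The function name `ate_bounds`, the flag `monotone_response` and the observed probabilities in `p_obs` are hypothetical and chosen only for illustration. The unknowns are the probabilities of the eight cells \((Y(0), Y(1), D)\), the equality constraints require compatibility with the observed distribution of \((Y, D)\), and the objective is the average treatment effect. A monotone treatment response restriction in the spirit of Manski (1997) is included as one example of an identifying assumption.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def ate_bounds(p_obs, monotone_response=False):
    """Bounds on E[Y(1) - Y(0)] for binary Y and D, computed with two LPs.

    p_obs[(y, d)] is the observed probability P(Y = y, D = d).
    The unknowns are the 8 probabilities of the cells (y0, y1, d).
    """
    cells = list(itertools.product([0, 1], repeat=3))  # (y0, y1, d)

    # Objective: ATE = sum over cells of pi(cell) * (y1 - y0).
    c = np.array([y1 - y0 for y0, y1, _ in cells], dtype=float)

    # Compatibility with the data: the observed outcome is Y = Y(D).
    A_eq, b_eq = [], []
    for (y, d), prob in p_obs.items():
        A_eq.append([
            1.0 if dd == d and (y1 if dd == 1 else y0) == y else 0.0
            for y0, y1, dd in cells
        ])
        b_eq.append(prob)

    # Optional assumption: Y(1) >= Y(0) rules out cells with y0 > y1.
    bounds = [
        (0.0, 0.0) if monotone_response and y0 > y1 else (0.0, 1.0)
        for y0, y1, _ in cells
    ]

    lower = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    upper = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not (lower.success and upper.success):
        raise ValueError("no distribution satisfies the data and assumptions")
    return lower.fun, -upper.fun


# Hypothetical observed distribution P(Y = y, D = d), keyed by (y, d).
p_obs = {(0, 0): 0.3, (1, 0): 0.2, (0, 1): 0.1, (1, 1): 0.4}

no_assumption = ate_bounds(p_obs)
with_mtr = ate_bounds(p_obs, monotone_response=True)
print("no assumptions:", no_assumption)
print("monotone treatment response:", with_mtr)
```

For these hypothetical inputs the program should reproduce, up to solver tolerance, the closed-form worst-case bounds of Manski (1990), \([-0.3, 0.7]\), and the monotonicity restriction raises the lower bound to 0. Further assumptions, or relaxed versions of them, would enter as additional linear constraints on the same unknowns.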


Notes

  1. Formally, the population forms a probability space \((I,\mathcal {F},P)\), where the population of individuals \(I\) is the sample space, \(\mathcal {F}\) is a set of events and \(P\) is a probability measure. Hence, the only source of randomness is the choice of individual. The individual’s behavior is deterministic.

  2. Available at

  3. The assumption that the outcome is a deterministic function of the treatment is intrinsic to the potential outcome framework of Rubin (1974).

  4. Ordinary least squares regression analysis assumes exogenous treatment selection, \(\forall t, t_1, t_2: E[Y(t)|D = t_1] = E[Y(t)|D = t_2]\), which point identifies the average potential outcome: \(E[Y(t)] = E[Y(t)| D = t]P(D = t) + E[Y(t)| D \ne t]P(D \ne t) = E[Y(t)| D = t]\).

  5. The confidence sets that cover the whole identified set asymptotically are generally larger and may be preferable for a policymaker concerned with robust decisions, as argued in Henry and Onatski (2012).

  6. The bound estimates in the first column of this table are the same as those in Table 1 of Lafférs (2013b).

  7. This analysis was challenged by Antonovics and Goldberger (2005), who claim that the results are driven by a specific data coding. In a reply, Behrman and Rosenzweig (2005) argue that their story is supported by an additional data source.

  8. We used a scalar relaxation parameter \(\alpha _{cMTS}\) for simplicity. An extension to a vector parameter is straightforward.

  9. From the dataset, we get an estimate \(\hat{p}_n\) of the true and unknown \(p\), where \(n\) is the sample size.

  10. The inequalities \(g(\bar{p},p,\alpha _{MISS}) \le 0\) ensure that \(p_{MISS} \in \mathcal {P}\) (that is, \(p_{MISS}\) is a proper probability distribution) in the definition (MISS) of the set \(\mathcal {P}_{MISS}\):

    $$\begin{aligned} \forall i = 1,\dots ,K: \ \ \ g_i(\bar{p},p,\alpha _{MISS})&= \left( (1-\alpha _{MISS})p_i - \bar{p}_i \right) /\alpha _{MISS}, \\ g_{K+1}(\bar{p},p,\alpha _{MISS})&= \bar{p}_1 + \dots + \bar{p}_K - 1,\\ g_{K+2}(\bar{p},p,\alpha _{MISS})&= -(\bar{p}_1 + \dots + \bar{p}_K) + 1. \end{aligned}$$

  11. Figure 5 and Table 4 show that a relaxation of \(\alpha _{cMTS}\) translates into the upper bound one for one.

  12. The MDVE was followed by experiments in six other cities (Atlanta, Charlotte, Colorado Springs, Metro-Dade, Omaha and Milwaukee) with different types of treatment assignments. Sherman et al. (1992) compare the results of five of these experiments. The arrest/non-arrest grouping makes the results comparable with experiments with different research designs.

  13. Given that \(Z\) is randomly assigned, conditioning on \(Z\) gives us the same assumption as the unconditional one. Random assignment makes \(Z\) independent of \((Y(t), V)\), so \(E[Y(t)|Z=z, V=v]\) does not depend on \(z\) and, by the law of total expectation,

    $$\begin{aligned} E[Y(t)|V=v ]&= \sum _z E[Y(t) | Z=z, V = v] P(Z=z|V=v) = E[Y(t) | Z=t, V = v], \quad v \in \{0,1\}, \\ E[Y(t)|Z=t, V=0 ] \le E[Y(t)|Z=t, V=1 ]&\iff E[Y(t)|V=0 ] \le E[Y(t)|V=1 ]. \end{aligned}$$

  14. For more references, see footnote 2 in Lafférs (2013b).

  15. Also, the MIV assumption is made conditional on the value of the treatment assigned (\(Z=z\)), which we highlight by labeling this assumption cMIV (conditional MIV).
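The constraints in Note 10 can be checked numerically. The sketch below is illustrative only and rests on an assumption that is inferred from the form of \(g_i\) rather than stated in the note: that the candidate distribution is the mixture \(\bar{p} = (1-\alpha _{MISS})p + \alpha _{MISS}p_{MISS}\), so that \(g_i \le 0\) is equivalent to \(p_{MISS,i} \ge 0\). The function names and the numbers are hypothetical.

```python
def missing_data_constraints(p_bar, p, alpha_miss):
    """Return [g_1, ..., g_K, g_{K+1}, g_{K+2}] as defined in Note 10."""
    if not 0.0 < alpha_miss <= 1.0:
        raise ValueError("alpha_miss must lie in (0, 1]")
    g = [((1.0 - alpha_miss) * p_i - pb_i) / alpha_miss
         for p_i, pb_i in zip(p, p_bar)]
    total = sum(p_bar)
    g.append(total - 1.0)   # g_{K+1} <= 0: p_bar sums to at most one
    g.append(1.0 - total)   # g_{K+2} <= 0: p_bar sums to at least one
    return g


def implied_missing_distribution(p_bar, p, alpha_miss):
    """Solve the ASSUMED mixture p_bar = (1 - a) * p + a * p_miss for p_miss."""
    return [(pb_i - (1.0 - alpha_miss) * p_i) / alpha_miss
            for p_i, pb_i in zip(p, p_bar)]


def is_admissible(p_bar, p, alpha_miss, tol=1e-9):
    """True if every constraint g_j(p_bar, p, alpha_miss) <= 0 holds."""
    return all(g_j <= tol
               for g_j in missing_data_constraints(p_bar, p, alpha_miss))


# Hypothetical example with K = 3 support points and 10% missing data.
p = [0.5, 0.3, 0.2]
alpha_miss = 0.1

# All missing observations fall on the first support point: admissible.
print(is_admissible([0.55, 0.27, 0.18], p, alpha_miss))

# This candidate would need negative missing mass on the first point.
print(is_admissible([0.40, 0.35, 0.25], p, alpha_miss))
print(implied_missing_distribution([0.40, 0.35, 0.25], p, alpha_miss))
```

The first candidate satisfies all constraints, while the second violates \(g_1 \le 0\) because the implied \(p_{MISS,1}\) is negative.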


References

  1. Antonovics KL, Goldberger AS (2005) Does increasing women’s schooling raise the schooling of the next generation? Comment. Am Econ Rev 95:1738–1744

  2. Balke A, Pearl J (1994) Counterfactual probabilities: computational methods, bounds, and applications. In: de Mantaras LR, Poole D (eds) Uncertainty in artificial intelligence, vol 1. Morgan Kaufmann, pp 46–54

  3. Balke A, Pearl J (1997) Bounds on treatment effects from studies with imperfect compliance. J Am Stat Assoc 439:1172–1176

  4. Behrman JR, Rosenzweig MR (2002) Does increasing women’s schooling raise the schooling of the next generation? Am Econ Rev 92:323–334

  5. Behrman JR, Rosenzweig MR (2005) Does increasing women’s schooling raise the schooling of the next generation? Reply. Am Econ Rev 95:1745–1751

  6. Berk RA, Sherman LW (1988) Police responses to family violence incidents: an analysis of an experimental design with incomplete randomization. J Am Stat Assoc 83:70–76

  7. Carter M (2001) Foundations of mathematical economics. MIT Press, Cambridge

  8. Chiburis RC (2010) Bounds on treatment effects using many types of monotonicity. Working paper, Department of Economics, University of Texas at Austin

  9. de Haan M (2011) The effect of parents’ schooling on child’s schooling: a nonparametric bounds analysis. J Labor Econ 29:859–892

  10. Demuynck T (2015) Bounding average treatment effects: a linear programming approach. Econ Lett 137:75–77

  11. Freyberger J, Horowitz JL (2015) Identification and shape restrictions in nonparametric instrumental variables estimation. J Econ 189:41–53

  12. Galichon A, Henry M (2009) A test of non-identifying restrictions and confidence regions for partially identified parameters. J Econ 152:186–196

  13. Hauser RM (2005) Survey response in the long run: the Wisconsin longitudinal study. Field Methods 17:3–29

  14. Henry M, Onatski A (2012) Set coverage and robust policy. Econ Lett 115:256–257

  15. Hirano K, Porter JR (2012) Impossibility results for nondifferentiable functionals. Econometrica 80:1769–1790

  16. Holmlund H, Lindahl M, Plug E (2011) The causal effect of parents’ schooling on children’s schooling: a comparison of estimation methods. J Econ Literature 49:615–51

  17. Honore BE, Tamer E (2006) Bounds on parameters in panel dynamic discrete choice models. Econometrica 74:611–629

  18. Imbens GW, Manski CF (2004) Confidence intervals for partially identified parameters. Econometrica 72:1845–1857

  19. Kim JH (2014) Identifying the distribution of treatment effects under support restrictions. Available at SSRN

  20. Lafférs L (2013a) Inference in partially identified models with discrete variables. Working paper

  21. Lafférs L (2013b) A note on bounding average treatment effects. Econ Lett 120:424–428

  22. Lafférs L (2017) Identification in models with discrete variables. Comput Econ, forthcoming

  23. Manski CF (1990) Nonparametric bounds on treatment effects. Am Econ Rev 80:319–23

  24. Manski CF (1995) Identification problems in the social sciences. Harvard University Press, Cambridge

  25. Manski CF (1997) Monotone treatment response. Econometrica 65:1311–1334

  26. Manski CF (2003) Partial identification of probability distributions. Springer, New York

  27. Manski CF (2007) Partial identification of counterfactual choice probabilities. Int Econ Rev 48:1393–1410

  28. Manski CF (2008) Partial identification in econometrics. In: Durlauf SN, Blume LE (eds) The new Palgrave dictionary of economics. Palgrave Macmillan, Basingstoke

  29. Manski CF, Pepper JV (2000) Monotone instrumental variables, with an application to the returns to schooling. Econometrica 68:997–1012

  30. Martin D (1975) On the continuity of the maximum in parametric linear programming. J Optim Theory Appl 17:205–210

  31. Munkres JR (2000) Topology, 2nd edn. Prentice Hall, Englewood Cliffs

  32. Romano JP, Shaikh AM (2008) Inference for identifiable parameters in partially identified econometric models. J Stat Plan Inference 138:2786–2807

  33. Romano JP, Shaikh AM (2010) Inference for the identified set in partially identified econometric models. Econometrica 78:169–211

  34. Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66:688–701

  35. Sherman LW, Schmidt JD, Rogan DP (1992) Policing domestic violence: experiments and dilemmas. Free Press, New York

  36. Siddique Z (2013) Partially identified treatment effects under imperfect compliance: the case of domestic violence. J Am Stat Assoc 108:504–513


Acknowledgements

This research was supported by VEGA Grant 1/0843/17. This paper is a revised chapter from my 2014 dissertation at the Norwegian School of Economics. I would like to thank Monique de Haan for generously providing me with the data used in this paper, as well as Christian Brinch, Andrew Chesher, Christian Dahl, Gernot Doppelhofer, Charles Manski, Peter Molnar, Adam Rosen, Erik Sorensen, Ivan Sutoris and Alexey Tetenov for valuable feedback. Special thanks go to the referees and the editor for carefully reading through the manuscript and for suggesting the second application.

Author information



Corresponding author

Correspondence to Lukáš Lafférs.

Cite this article

Lafférs, L. Bounding average treatment effects using linear programming. Empir Econ 57, 727–767 (2019).


Keywords

  • Partial identification
  • Bounds
  • Average treatment effect
  • Sensitivity analysis
  • Linear programming

JEL Classification

  • C4
  • C6
  • I2