An evolutionary algorithm for subset selection in causal inference models



Researchers in all disciplines seek to identify causal relationships. Randomized experimental designs isolate the treatment effect and thus permit causal inferences. However, experiments are often infeasible because resources may be unavailable or the research question may not lend itself to an experimental design. In these cases, a researcher is relegated to analyzing observational data. To make causal inferences from observational data, one must adjust the data so that they resemble data that might have emerged from an experiment. The data adjustment can proceed through a subset selection procedure that identifies treatment and control groups that are statistically indistinguishable. Identifying optimal subsets is a challenging problem, but solving it yields a powerful tool. This paper presents an advance in an operations research solution that is more efficient and identifies empirically better solutions than other proposed algorithms. The computational framework does not replace existing matching algorithms (e.g., propensity score models) but rather enables and augments the ability of all causal inference models to identify more putatively randomized groups.
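
To make the subset selection idea concrete, the following is a minimal sketch of an evolutionary search for a control subset whose covariate means resemble those of the treatment group. It is an illustration under stated assumptions, not the algorithm developed in the paper: the imbalance measure (a sum of absolute differences in covariate means), the single-swap mutation, the elitist survival rule, and all names (`imbalance`, `mutate`, `evolve_subset`) and parameter values are hypothetical choices made for this example.

```python
import random


def imbalance(treated, controls, subset):
    """Sum over covariates of |mean(treated) - mean(selected controls)|.

    Assumed balance measure for illustration only.
    """
    n_cov = len(treated[0])
    total = 0.0
    for j in range(n_cov):
        t_mean = sum(row[j] for row in treated) / len(treated)
        c_mean = sum(controls[i][j] for i in subset) / len(subset)
        total += abs(t_mean - c_mean)
    return total


def mutate(subset, pool_size, rng):
    """Swap one selected control for one currently unselected control."""
    child = set(subset)
    outside = [i for i in range(pool_size) if i not in child]
    child.remove(rng.choice(sorted(child)))
    child.add(rng.choice(outside))
    return frozenset(child)


def evolve_subset(treated, controls, k, pop_size=20, generations=200, seed=0):
    """Search for k control indices with low imbalance (requires k < len(controls))."""
    rng = random.Random(seed)
    pool = range(len(controls))
    population = [frozenset(rng.sample(pool, k)) for _ in range(pop_size)]

    def score(s):
        return imbalance(treated, controls, s)

    for _ in range(generations):
        children = [mutate(s, len(controls), rng) for s in population]
        # Elitist survival: keep the best distinct subsets among parents and children.
        population = sorted(set(population) | set(children), key=score)[:pop_size]
    best = population[0]
    return sorted(best), score(best)


# Toy data: one covariate, three treated units, seven candidate controls.
treated = [[1.0], [2.0], [3.0]]
controls = [[0.0], [10.0], [2.0], [20.0], [1.0], [3.0], [30.0]]
best, score = evolve_subset(treated, controls, k=3)
print(best, score)
```

Because parents survive alongside their children, the best imbalance in the population never worsens from one generation to the next. A realistic application would replace the mean-difference measure with whatever balance criterion the analysis calls for.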


causal inference; subset selection; optimization



Many thanks to Yan Liu for helpful comments and advice.



Copyright information

© The Operational Research Society 2017

Authors and Affiliations

  1. Department of Political Science and Department of Statistics, University of Illinois at Urbana-Champaign, IL, USA
  2. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, IL, USA
