Treatment Effect Decomposition and Bootstrap Hypothesis Testing in Observational Studies

  • Hee Youn Kwon
  • Jason J. Sauppe
  • Sheldon H. Jacobson


Abstract

Causal inference with observational data has drawn attention across various fields. Observational studies typically use matching methods, which pair treated and control units with similar covariate values. However, matching methods may not directly achieve covariate balance, a measure of matching effectiveness. As an alternative, the Balance Optimization Subset Selection (BOSS) framework, which optimizes covariate balance directly, has been proposed. This paper extends BOSS by estimating and decomposing a treatment effect into a combination of heterogeneous treatment effects over a partitioned set. Our method differs from traditional propensity score subclassification in that we find a subset within each subclass using BOSS rather than using the stratum determined by the propensity score. We then conduct a bootstrap hypothesis test on each component to check the statistical significance of these treatment effects. These methods are applied to a dataset from the National Supported Work Demonstration (NSW) program, which was conducted in the 1970s. By examining statistical significance, we show that the program was not significantly effective for a specific subgroup composed of those who were already employed. This differs from the combined estimate, which indicates that the NSW program was effective when all individuals are considered. Lastly, we provide results obtained when these steps are repeated with sub-samples.
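The two core steps described above, testing each subgroup effect with a bootstrap hypothesis test and recombining subgroup effects into an overall estimate, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the null is imposed by resampling from the pooled outcomes, the weights are assumed to be subgroup sizes, and the function names are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_test(treated, control, n_boot=2000, rng=rng):
    """Two-sided bootstrap test of H0: no treatment effect in a subgroup.

    Imposes the null by resampling from the pooled outcomes, then compares
    the observed mean difference to the bootstrap distribution.
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = treated.mean() - control.mean()
    pooled = np.concatenate([treated, control])
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        t = rng.choice(pooled, size=len(treated), replace=True)
        c = rng.choice(pooled, size=len(control), replace=True)
        diffs[b] = t.mean() - c.mean()
    p_value = float(np.mean(np.abs(diffs) >= abs(observed)))
    return observed, p_value

def decompose(subgroup_effects, weights):
    """Overall effect as a weighted combination of subgroup effects
    (weights here are assumed to be subgroup sizes)."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(subgroup_effects, w / w.sum()))
```

In the paper's setting, `bootstrap_test` would be applied to the outcomes of the BOSS-selected subset within each subclass, and `decompose` would recombine the per-subclass estimates into the combined treatment effect.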


Keywords

Data processing · Inference mechanism · Optimization · Causal inference

Mathematics Subject Classification

62-07 · 90-08 · 90C11 · 90C90



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Kellogg School of Management and Northwestern Institute on Complex Systems (NICO), Northwestern University, Evanston, USA
  2. Department of Computer Science, University of Wisconsin–La Crosse, La Crosse, USA
  3. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
