Extending coarsened exact matching to multiple cohorts: an application to longitudinal well-being program evaluation within an employer population

  • J. A. Sidney
  • C. Coberley
  • J. E. Pope
  • A. Wells


Research to date in well-being program evaluation has treated the study population as either having received a treatment or not, and has assumed that matching will yield an unbiased, efficient estimate of the causal treatment effect. As well-being intervention programs become more sophisticated and diverse in their offerings, so too must the methods for assessing program effect. The objective of this research was to extend the traditional binary cohort assignment in quasi-experimental program evaluation in order to quantify the differential effects of a multi-tiered well-being improvement program administered over a 3-year period at a large employer. Data collected over this period included well-being assessments and medical claims from 17,669 employees and spouses. These individuals were assigned to cohorts based on intervention program intensity and matched using coarsened exact matching. On average, the matching process removed 85% of detectable bias across all comparison cohorts. A weighted generalized linear model, using the weights derived from coarsened exact matching, was estimated to quantify the net (difference-in-difference) causal effect of the well-being intervention program. Overall well-being increased, on average, by 1.48 points in the High Intensity cohort and by 1.32 points in the Mild Intensity cohort, whereas the non-intervened cohort showed an increase of only 0.57 points. The methodology reported here provides an expanded and robust approach to matching across multiple cohorts for the purpose of program evaluation.
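The matching and weighting steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the covariates (`age`, `baseline`), the bin widths, the cohort labels, the choice of the High Intensity cohort as reference, and the multi-cohort weight formula (a direct generalization of the usual two-group coarsened exact matching weight) are all assumptions made for illustration. The final lines subtract the reported point changes directly, and the model-based net effects in the paper may differ from this simple arithmetic.

```python
from collections import Counter, defaultdict


def coarsen(age, baseline):
    """Map raw covariates to a coarse stratum key (bin widths are illustrative)."""
    return (age // 10, int(baseline // 10))


def cem_weights(units, reference="high"):
    """Return one weight per unit; units in unmatched strata get weight 0.

    Assumed multi-cohort rule: reference-cohort units get weight 1, and a
    unit of cohort k in stratum s gets (n_ref_s / n_k_s) * (N_k / N_ref),
    where N counts matched units only.
    """
    cohorts = {u["cohort"] for u in units}
    strata = defaultdict(Counter)
    for u in units:
        strata[coarsen(u["age"], u["baseline"])][u["cohort"]] += 1

    # Keep only strata in which every cohort is represented.
    matched = {key: c for key, c in strata.items() if set(c) == cohorts}
    totals = Counter()
    for counts in matched.values():
        totals.update(counts)

    weights = []
    for u in units:
        counts = matched.get(coarsen(u["age"], u["baseline"]))
        if counts is None:
            weights.append(0.0)  # no counterpart in every cohort: pruned
        elif u["cohort"] == reference:
            weights.append(1.0)
        else:
            k = u["cohort"]
            weights.append(
                (counts[reference] / counts[k]) * (totals[k] / totals[reference])
            )
    return weights


def net_effect(cohort_change, comparison_change):
    """Difference-in-difference: cohort change minus comparison change."""
    return round(cohort_change - comparison_change, 2)


# Hypothetical toy data, not drawn from the study.
units = [
    {"cohort": "high", "age": 34, "baseline": 62.0},
    {"cohort": "mild", "age": 38, "baseline": 65.0},
    {"cohort": "none", "age": 31, "baseline": 68.0},
    {"cohort": "none", "age": 36, "baseline": 61.0},
    {"cohort": "high", "age": 45, "baseline": 72.0},
    {"cohort": "mild", "age": 41, "baseline": 75.0},
    {"cohort": "none", "age": 48, "baseline": 79.0},
    {"cohort": "mild", "age": 59, "baseline": 80.0},  # stratum lacks other cohorts
]
weights = cem_weights(units)
print(weights)

# Simple subtraction of the point changes reported in the abstract.
high_net = net_effect(1.48, 0.57)
mild_net = net_effect(1.32, 0.57)
print(high_net, mild_net)
```

In this sketch the weights would then enter a weighted regression of the outcome on cohort indicators, which is where a generalized linear model like the one described above would take over.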


Coarsened exact matching; Treatment effect estimation; Well-being improvement; Quasi-experimental program evaluation



We thank Gary King (Harvard University) and Patrick Lam (Harvard University) for valuable comments and guidance on the coarsened exact matching methodology.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • J. A. Sidney (1)
  • C. Coberley (1)
  • J. E. Pope (1)
  • A. Wells (1)

  1. Center for Health Research, Healthways, Inc., Franklin, USA
