
Matching and Imputation Methods for Risk Adjustment in the Health Insurance Marketplaces


New state-level health insurance markets created under the Affordable Care Act, referred to as Marketplaces, use risk-adjusted plan payment formulas derived from a population ineligible to participate in the Marketplaces. We develop methodology to derive a sample from the target population and to assemble information to generate improved risk-adjusted payment formulas using data from the Medical Expenditure Panel Survey and Truven MarketScan databases. Our approach requires multi-stage data selection and imputation procedures because both data sources have systematically missing data on crucial variables and arise from different populations. We present matching and imputation methods adapted to this setting. The long-term goal is to improve risk adjustment estimation using information found in Truven MarketScan data supplemented with imputed Medical Expenditure Panel Survey values.






The authors acknowledge support from NIH/NIMH 2R01MH094290.

Author information



Corresponding author

Correspondence to Sherri Rose.

Ethics declarations

Conflict of Interest Disclosure Statement

The authors have no conflicts of interest to declare.


About this article


Cite this article

Rose, S., Shi, J., McGuire, T.G. et al. Matching and Imputation Methods for Risk Adjustment in the Health Insurance Marketplaces. Stat Biosci 9, 525–542 (2017).



Keywords

  • Matching
  • Imputation
  • Prediction
  • Risk adjustment