Statistics in Biosciences, Volume 9, Issue 2, pp 525–542

Matching and Imputation Methods for Risk Adjustment in the Health Insurance Marketplaces

  • Sherri Rose
  • Julie Shi
  • Thomas G. McGuire
  • Sharon-Lise T. Normand


New state-level health insurance markets, known as Marketplaces, were created under the Affordable Care Act and use risk-adjusted plan payment formulas derived from a population that is ineligible to participate in the Marketplaces. We develop methodology to derive a sample from the target population and to assemble information to generate improved risk-adjusted payment formulas using data from the Medical Expenditure Panel Survey and Truven MarketScan databases. Our approach requires multi-stage data selection and imputation procedures because both data sources have systematically missing data on crucial variables and arise from different populations. We present matching and imputation methods adapted to this setting. The long-term goal is to improve risk adjustment estimation using information found in Truven MarketScan data supplemented with imputed Medical Expenditure Panel Survey values.


Keywords: Matching, Imputation, Prediction, Risk adjustment



The authors acknowledge support from NIH/NIMH 2R01MH094290.

Compliance with Ethical Standards

Conflict of Interest Disclosure Statement

The authors have no conflicts of interest to declare.



Copyright information

© International Chinese Statistical Association 2015

Authors and Affiliations

  • Sherri Rose (1)
  • Julie Shi (2)
  • Thomas G. McGuire (1)
  • Sharon-Lise T. Normand (1, 3)

  1. Department of Health Care Policy, Harvard Medical School, Boston, USA
  2. School of Economics, Peking University, Haidian District, Beijing, China
  3. Department of Biostatistics, Harvard School of Public Health, Boston, USA
