
Improving the prediction model used in risk equalization: cost and diagnostic information from multiple prior years


Currently-used risk-equalization models do not adequately compensate insurers for predictable differences in individuals’ health care expenses. Consequently, insurers face incentives for risk rating and risk selection, both of which jeopardize affordability of coverage, accessibility to health care, and quality of care. This study explores to what extent the predictive performance of the prediction model used in risk equalization can be improved by using additional administrative information on costs and diagnoses from three prior years. We analyze data from 13.8 million individuals in the Netherlands in the period 2006–2009. First, we show that there is potential for improving models’ predictive performance at both the population and subgroup level by extending them with risk adjusters based on cost and/or diagnostic information from multiple prior years. Second, we show that even these extended models do not adequately compensate insurers. By using these extended models, incentives for risk rating and risk selection can be reduced substantially but not removed completely. The extent to which risk-equalization models can be improved in practice may differ across countries, depending on the availability of data, the method chosen to calculate risk-adjusted payments, the value judgment by the regulator about risk factors for which the model should and should not compensate insurers, and the trade-off between risk selection and efficiency.



  1. Individuals who did not have continuous enrolment over the study period were excluded. Inclusion of deceased individuals is not useful for prediction purposes, but the exclusion of newborns may have moderately affected the generalizability of our results for the Dutch population.

  2. This weight is corrected for duplicate records in the dataset. Duplicate records were generated when merging the administrative data of 4 years due to switching behavior of individuals in prior years. Records of individuals who did not switch in year t, but who switched in 1 or more of the 3 years prior were copied (duplicates) when merging the administrative data of 4 years. These duplicate records were weighted by a value of 0.5 in the estimation of the model. There were no individuals who switched insurer more than once during 1 year (which would mean that more than two records would be generated during the merging process).

  3. “Statistics Netherlands” (“Centraal Bureau voor Statistiek”) is an autonomous Dutch agency that collects and analyzes data.

  4. The administrative data is merged with the health survey data on the individual level according to Dutch privacy protection laws and regulations.

  5. To examine to what extent percentiles of prior expenses and continuous prior expenses are ‘substitutes’, two other models were estimated: one model did not include percentiles for prior expenses and the other did not include continuous variables for prior expenses. These two models yielded adjusted R² values of 35.34 and 31.33 %, respectively. The adjusted R² value of model 6 is 35.98 %. These results indicate that continuous variables for expenses and dummy variables for percentiles of expenses both independently contribute to the predictive power of the model. Therefore, both types of variables were included in model 6.

  6. The described procedure is programmed in statistical software package SAS version 9.2.

  7. Table 3 presents descriptive statistics of the training and validation samples. Descriptive statistics of the total sample are not presented here but can be provided on request (contact the first author).

  8. Based on an empirical analysis of Dutch administrative data from 2007, under-predictions varying from 300 Euro up to 1,400 Euro can be expected on subgroups with a relatively large proportion of institutionalized individuals [39].
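As an illustration of notes 2 and 5, the following Python sketch shows how duplicate records can be down-weighted to 0.5 in a weighted least-squares fit, and how an adjusted R² can be computed from the weighted residuals. It is not the authors' SAS code: the function names, the toy data, and the choice to use the sum of the weights as the sample size in the adjusted R² are assumptions made for this example.

```python
import numpy as np

def weighted_ols(X, y, w):
    """Fit y = b0 + X b by minimising sum_i w_i * residual_i**2."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    sw = np.sqrt(w)                                 # row scaling turns WLS into OLS
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta

def adjusted_r2(X, y, w, beta):
    """Weighted adjusted R-squared for a fit returned by weighted_ols."""
    design = np.column_stack([np.ones(len(y)), X])
    resid = y - design @ beta
    y_mean = np.average(y, weights=w)
    ss_res = np.sum(w * resid ** 2)
    ss_tot = np.sum(w * (y - y_mean) ** 2)
    n = w.sum()      # assumption: sum of weights counts individuals, not records
    p = X.shape[1]   # number of risk adjusters, excluding the intercept
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical toy data: one risk adjuster, five individuals, one record each.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 2.9, 5.2, 6.8, 9.1])
w = np.ones(5)

# Same individuals, but the first one appears twice (a duplicate record);
# both copies receive weight 0.5 so that the individual still counts once.
X_dup = np.vstack([X, X[:1]])
y_dup = np.append(y, y[0])
w_dup = np.array([0.5, 1.0, 1.0, 1.0, 1.0, 0.5])

beta = weighted_ols(X, y, w)
beta_dup = weighted_ols(X_dup, y_dup, w_dup)
```

Under these assumptions the two fits give the same coefficients, which is the point of the 0.5 weight: each individual contributes one unit of weight to the estimation, however many records the merge produced.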


  1. Adams, E.K., Bronstein, J.M., Raskind-Hood, C.: Adjusted clinical groups: predictive accuracy for medicaid enrollees in three states. Health Care Financ. Rev. 24, 43–61 (2002)

  2. Ash, A., Porell, F., Gruenberg, L., Sawitz, E., Beiser, A.: Adjusting medicare capitation payments using prior hospitalization data. Health Care Financ. Rev. 10, 17–29 (1989)

  3. Babyak, M.A.: What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom. Med. 66, 411–421 (2004)

  4. Barry, C.L., Weiner, J.P., Lemke, K., Busch, S.H.: Risk adjustment in health insurance exchanges for individuals with mental illness. Am. J. Psychiatry 169, 704–709 (2012)

  5. Basu, A., Manning, W.G.: Issues for the next generation of health care costs analyses. Med. Care 47, S109–S114 (2009)

  6. Behrend, C., Buchner, F., Happich, M., Holle, R., Reitmeir, P., Wasem, J.: Risk-adjusted capitation payments: how well do principal inpatient diagnosis-based models work in the German situation? Results from a large data set. Eur. J. Health Econ. 8, 31–39 (2007)

  7. Buntin, M.B., Zaslavsky, A.M.: Too much ado about two-part models and transformation? Comparing methods of modeling medicare expenditures. J. Health Econ. 23, 525–542 (2004)

  8. Cumming, R.B., Knutson, D., Cameron, B.A., Derrick, B.: Claims-Based Methods of Health Risk Assessment for Commercial Populations. Research Report, Society of Actuaries, Milliman, Minneapolis (2002)

  9. DeSalvo, K.B., Jones, T.M., Peabody, J., McDonald, J., Fihn, S., Fan, V., He, J., Muntner, P.: Health care expenditure prediction with a single item, self-rated health measure. Med. Care 47(4), 440–447 (2009)

  10. Duan, N., Manning, W.G., Morris, C.N., Newhouse, J.P.: A comparison of alternative models for the demand for medical care. J. Bus. Econ. Stat. 1(2), 115–126 (1983)

  11. Dunn, G., Mirandola, M., Amaddeo, F., Tansella, M.: Describing, explaining or predicting mental health care costs: a guide to regression models: methodological review. Brit. J. Psychiat. 183, 398–404 (2003)

  12. Fishman, P.A., Goodman, M.J., Hornbrook, M.C., Meenan, R.T., Bachman, D.J., O’Keeffe Rosetti, M.C.: Risk adjustment using automated ambulatory pharmacy data: the RxRisk model. Med. Care 41(1), 84–99 (2003)

  13. Fleishman, J.A., Cohen, J.W., Manning, W.G., Kosinski, M.: Using the SF-12 health status measure to improve predictions of medical expenditures. Med. Care 44, 54–63 (2006)

  14. Fox, J.: Applied Regression Analysis and Generalized Linear Models. Sage, Thousand Oaks (2008)

  15. Garber, A.M., Macurdy, T.E., McClellan, M.B.: Persistence of medicare expenditures among elderly beneficiaries. In: Garber, A.M. (ed.) Frontiers in Health Policy, MIT Press, Cambridge, MA, pp. 153–180 (1998)

  16. Gilmer, T., Kronick, R., Fishman, P., Ganiats, T.G.: The medicaid Rx model: pharmacy-based risk adjustment for public programs. Med. Care 39(11), 1188–1202 (2001)

  17. Hughes, J.S., Averill, R.F., Eisenhandler, J., Goldfield, N.I., Muldoon, J., Neff, J.M., Gay, J.C.: Clinical Risk Groups (CRGs): a classification system for risk-adjusted capitation-based payment and health care management. Med. Care 42(1), 81–90 (2004)

  18. Jones, A.M.: Models for health care. Working paper, University of York. (2010). Accessed 13 June 2013

  19. Kronick, R., Gilmer, T., Dreyfus, T., Lee, L.: Improving health-based payment for medicaid beneficiaries: CDPS. Health Care Financ. Rev. 21(3), 29–64 (2000)

  20. Lamers, L.M., van Vliet, R.C.J.A.: Multiyear diagnostic information from prior hospitalizations as a risk-adjuster for capitation payments. Med. Care 34, 549–561 (1996)

  21. Lamers, L. M.: Capitation payments to competing Dutch sickness funds based on diagnostic information from prior hospitalizations. Ph.D. Dissertation, Erasmus University Rotterdam, Rotterdam (1997)

  22. Lamers, L.M.: Health-based risk adjustment: is inpatient and outpatient diagnostic information sufficient? Inquiry 38(4), 423–431 (2001)

  23. Lamers, L.M., van Vliet, R.C.J.A.: Health-based risk adjustment: improving the pharmacy-based cost group model to reduce gaming possibilities. Eur. J. Health Econ. 4, 107–114 (2003)

  24. Lamers, L.M., van Vliet, R.C.J.A.: The pharmacy-based cost group model: validating and adjusting the classification of medications for chronic conditions to the Dutch situation. Health Policy 68, 113–121 (2004)

  25. Manning, W.G., Mullahy, J.: Estimating log models: to transform or not to transform? J. Health Econ. 20, 461–494 (2001)

  26. Manning, W.G., Basu, A., Mullahy, J.: Generalized modelling approaches to risk adjustment of skewed outcomes data. J. Health Econ. 24, 465–488 (2005)

  27. Monheit, A.C.: Persistence in health expenditures in the short run: prevalence and consequences. Med. Care 41(7), 53–64 (2003)

  28. McIntyre, S.H., Montgomery, D.B., Srinivasan, V., Weitz, B.A.: Evaluating the statistical significance of models developed by stepwise regression. J. Mark. Res. 20, 1–11 (1983)

  29. Mihaylova, B., Briggs, A., O’Hagan, A., Thompson, S.G.: Review of statistical methods for analysing healthcare resources and costs. Health Econ. 20, 879–916 (2011)

  30. Newhouse, J.P.: Reimbursing health plans and health providers: efficiency in production versus selection. J. Econ. Lit. XXXIV, 1236–1263 (1996)

  31. Pindyck, R.S., Rubinfeld, D.L.: Econometric models and economic forecasts. McGraw-Hill, New York City (1998)

  32. Pope, G.C., Ellis, R.P., Ash, A.S., Liu, C.F., Ayanian, J.Z., Bates, D.W., Burstin, H., Iezzoni, L.I., Ingber, M.J.: Principal inpatient diagnostic cost group model for medicare risk adjustment. Health Care Financ. Rev. 21(3), 93–118 (2000)

  33. Pope, G.C., Kautter, J., Ellis, R.P., Ash, A.S., Ayanian, J.Z., Iezzoni, L.I., Ingber, M.J., Levy, J.M., Robst, J.: Risk adjustment of medicare capitation payments using the CMS-HCC model. Health Care Financ. Rev. 25(4), 119–141 (2004)

  34. Powers, C.A., Meyer, C.M., Roebuck, M.C., Vaziri, B.: Predictive modeling of total healthcare costs using pharmacy claims data. A comparison of alternative econometric cost modeling techniques. Med. Care 43, 1065–1072 (2005)

  35. Prinsze, F.J., van Vliet, R.C.J.A.: Health-based risk adjustment: improving the pharmacy-based cost group model by adding diagnostic cost groups. Inquiry 44(4), 469–480 (2007)

  36. Schokkaert, E., van de Voorde, C.: Risk selection and the specification of the conventional risk adjustment formula. J. Health Econ. 23, 1237–1259 (2004)

  37. Schokkaert, E., van de Voorde, C.: Incentives for risk selection and omitted variables in the risk adjustment formula. Ann. Econ. Stat. 83(84), 327–351 (2006)

  38. Schokkaert, E., van de Voorde, C.: Direct versus indirect standardization in risk adjustment. J. Health Econ. 28, 361–374 (2009)

  39. Schut, F.T., van de Ven, W.P.M.M.: Uitvoering AWBZ door zorgverzekeraars onverstandig. ESB 95(4591), 486–489 (2010)

  40. Stam, P.J.A.: Testing the effectiveness of risk equalization models in health insurance. Ph.D. Dissertation, Erasmus University Rotterdam, Rotterdam (2007)

  41. Stam, P.J.A., van de Ven, W.P.M.M.: Risicoverevening in de zorgverzekering: Een evaluatie en oplossingsrichtingen voor verbetering. Research Report, iBMG, Erasmus University Rotterdam, Rotterdam (2006)

  42. Stam, P.J.A., van de Ven, W.P.M.M.: De harde kern in risicoverevening. ESB. February, 104-7 (2008)

  43. Stam, P.J.A., van Vliet, R.C.J.A., van de Ven, W.P.M.M.: Diagnostic, pharmacy-based, and self-reported health measures in risk equalization models. Med. Care 48, 448–457 (2010)

  44. Thompson, M.L.: Selection of variables in multiple regression: part 1. a review and evaluation. Int. Stat. Rev. 46, 1–19 (1978)

  45. Veazie, P.J., Manning, W.G., Kane, R.L.: Improving risk adjustment for medicare capitated reimbursement using nonlinear models. Med. Care 41, 741–752 (2003)

  46. van Kleef, R.C., van Vliet, R.C.J.A.: Prior use of durable medical equipment as a risk adjuster for health-based capitation. Inquiry 47, 1–16 (2010)

  47. van Kleef, R.C., van Vliet, R.C.J.A.: Improving risk equalization using multi-year high cost as a health indicator. Med. Care 50, 140–144 (2012)

  48. van Kleef, R.C., van Vliet, R.C.J.A., van de Ven, W.P.M.M.: Risicoverevening tussen zorgverzekeraars: Kwantificering modelverbeteringen 1993–2011. TSG 90, 312–326 (2012)

  49. van Kleef, R.C., van Vliet, R.C.J.A., van de Ven, W.P.M.M.: Risicoverevening 2012. Een analyse van voorspelbare winsten en verliezen op subgroep niveau. Research report, iBMG, Erasmus University Rotterdam, Rotterdam (2012)

  50. van Kleef, R.C., van Vliet, R.C.J.A., van de Ven, W.P.M.M.: Diagnosis-based cost groups in risk adjustment: The effects of including outpatient diagnoses. Research report, iBMG, Erasmus University Rotterdam, Rotterdam (2012)

  51. van de Ven, W.P.M.M., Ellis, R.P.: Risk adjustment in competitive health plan markets. In: Culyer, A.J., Newhouse, J.P. (eds.) Handbook of health economics, pp. 755–845. Elsevier Science B.V., Amsterdam (2000)

  52. van de Ven, W.P.M.M., Schut, F.T.: Guaranteed access to affordable coverage in individual health insurance markets. In: Glied, S., Smith, P. (eds.) The Oxford Handbook of Health Economics, pp. 380–404. Oxford University Press, Oxford (2011)

  53. van Vliet, R.C.J.A., van de Ven, W.P.M.M.: Towards a capitation formula for competing health insurers: an empirical analysis. Soc. Sci. Med. 34, 1035–1048 (1992)

  54. van Vliet, R.C.J.A., van de Ven, W.P.M.M.: Capitation payments based on prior hospitalizations. Health Econ. 2, 177–188 (1993)

  55. Ware Jr, J.E., Kosinski, M., Keller, S.D.: A 12-item short-form health survey: construction of scales and preliminary tests of reliability and validity. Med. Care 34, 220–233 (1996)



The authors gratefully acknowledge the Dutch Ministry of Health, Welfare and Sport and the national association of Dutch health insurers (“Zorgverzekeraars Nederland”) for their permission to use administrative data for this study. In addition, we thank “Statistics Netherlands” (“Centraal Bureau voor Statistiek”) for providing access to the health survey data. For their helpful comments on an earlier draft, we thank the members of the Risk Adjustment Network and the two anonymous referees. The opinions in this article are those of the authors and do not necessarily reflect those of the above-mentioned organisations and individuals.

Author information


Corresponding author

Correspondence to S. H. C. M. van Veen.


Appendix 1

See Table 7.

Table 7 Definition of risk adjusters included in estimated RE-models

Appendix 2

See Table 8.

Table 8 Description of all subgroups based on more than one question and/or more answer categories of the health survey

Appendix 3

See Table 9.

Table 9 Subgroups for which the mean prediction error (MPE) in year t was already not statistically significantly different from zero for models 1, 2, 3, or 4. In this study, the prediction year t is 2009. The column of total expenses presents the corrected total expenses. Total expenses and predicted expenses in the sample with health survey information were corrected in such a way that the average MPE on the total survey sample is zero. This was done to test whether the MPEs differ statistically significantly from zero. As a result, the column with total expenses in year t minus the column with the MPEs of model 1 results in the same number for each group, namely total average expenses in year t (1,689 Euro)
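The correction described in the caption of Table 9 can be read in more than one way. The Python sketch below shows one possible reading, in which actual and predicted expenses are each rescaled to a common mean so that the overall mean prediction error is zero, after which the error is averaged per subgroup. The rescaling method, the function name `subgroup_mpe`, and the toy data are assumptions made for this example, not the authors' procedure. Only the 1,689 Euro average is taken from the caption.

```python
import numpy as np

def subgroup_mpe(actual, predicted, groups, target_mean):
    """Return {group: (mean corrected expenses, mean prediction error)}.

    Assumption: both series are rescaled multiplicatively to target_mean,
    so the mean prediction error over the whole sample is zero.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    groups = np.asarray(groups)
    a = actual * target_mean / actual.mean()
    p = predicted * target_mean / predicted.mean()
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        # MPE is defined here as actual minus predicted, averaged in the group.
        out[g.item()] = (a[mask].mean(), (a[mask] - p[mask]).mean())
    return out

# Hypothetical data: four individuals in two survey subgroups, and a model
# that predicts the same amount for everyone.
actual = [1000.0, 3000.0, 500.0, 2500.0]
flat_prediction = [1500.0] * 4
groups = ["a", "a", "b", "b"]

result = subgroup_mpe(actual, flat_prediction, groups, target_mean=1689.0)
```

With a constant prediction, corrected expenses minus the MPE equals the overall average in every subgroup, which matches the property the caption reports for model 1.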


About this article

Cite this article

van Veen, S.H.C.M., van Kleef, R.C., van de Ven, W.P.M.M. et al. Improving the prediction model used in risk equalization: cost and diagnostic information from multiple prior years. Eur J Health Econ 16, 201–218 (2015).


Keywords

  • Competitive health care schemes
  • Health insurance
  • Risk equalization
  • Predictive performance

JEL Classification

  • I13
  • I18