Estimating transition probabilities between health states using US longitudinal survey data

Empirical Economics

Abstract

We use data from two representative US household surveys, the Medical Expenditure Panel Survey (MEPS) and the Health and Retirement Study (RAND-HRS), to estimate transition probability matrices between health states over the lifecycle from ages 20 to 95. We compare nonparametric counting methods with parametric methods that control for individual characteristics as well as time and cohort effects. We align two-year transition probabilities from the HRS with one-year transition probabilities in MEPS using a stochastic root method that assumes a Markov structure. We find that the nonparametric counting method and the regression specifications based on ordered logit models produce similar results over the lifecycle. However, the counting method overestimates the probabilities of transitioning into bad health states. In addition, we find that young women have worse health prospects than their male counterparts, but at older ages being female is associated with higher probabilities of transitioning into better health states. We do not find significant differences in the conditional health transition probabilities between African Americans and the rest of the population. We also find that the lifecycle patterns are stable over time. Finally, we discuss issues with controlling for time effects, sample attrition, the Markov assumption, and other modeling issues that can arise with categorical outcome variables.
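The nonparametric counting method mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each observation is a pair of health-state indices (current state, next-period state), ignores age, survey weights, and attrition, and treats a state with no observed exits as absorbing. The function name `count_transition_matrix` and the three-state example data are hypothetical and are not taken from the paper.

```python
def count_transition_matrix(pairs, n_states):
    """Estimate a transition matrix by counting observed transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for origin, destination in pairs:
        counts[origin][destination] += 1

    matrix = []
    for state, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a state with no observed exits (e.g. death) is absorbing.
            matrix.append([float(j == state) for j in range(n_states)])
        else:
            # Row-normalize counts into conditional transition probabilities.
            matrix.append([count / total for count in row])
    return matrix


# Hypothetical data: 0 = good health, 1 = poor health, 2 = dead.
pairs = [(0, 0)] * 8 + [(0, 1)] * 2 + [(1, 0)] + [(1, 1)] * 3 + [(1, 2)]
matrix = count_transition_matrix(pairs, n_states=3)
for row in matrix:
    print([round(p, 3) for p in row])
```

Each row of the result is the empirical distribution over next-period states conditional on the current state, so every row sums to one. An age-specific version would apply the same counting within age groups.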

Notes

  1. MEPS is a representative household survey of the US working population, while the HRS is a survey representative of the older population in the USA.

  2. Compare Catillon et al. (2018) and life tables from the CDC at: https://www.cdc.gov/nchs/nvss/life-expectancy.htm.

  3. A related approach by Dalgaard and Strulik (2014) models the dynamics of health as a process of health deficit accumulation that eventually ends in death and that can be measured with a frailty index. An introduction to the frailty index measure can be found in Rockwood and Mitnitski (2007).

  4. Cutler and Richardson (1997) and more specifically Grossman (2000) provide summaries of this empirical literature concerning health capital.

  5. Appendices A–I contain additional results, including additional summary statistics, ordered probit models, multinomial logit and probit models, transition probabilities based on samples from two different time periods, detailed lifecycle transition probabilities by gender, race, and race and time, as well as results from finite mixture and mixed processes models.

  6. Chowdhury et al. (2019) provide details about the MEPS survey designs.

  7. Sect. 5.8 contains a more detailed discussion about attrition bias issues in MEPS and HRS.

  8. The RAND-HRS is developed from the HRS and comprises a cross-wave file with variables derived consistently across waves. The RAND-HRS is maintained by the RAND Center of Aging. More information is available at: https://www.rand.org/well-being/social-and-behavioral-policy/centers/aging/dataprod/hrs-data.html.

  9. Fisher and Ryan (2017) provide a recently published summary of the Health and Retirement Study.

  10. A frequency distribution of the full sample is available in Figure A.1 and Table A.1 in Online Appendix A.

  11. OECD (2018), Inflation (CPI) (indicator). doi: 10.1787/eee82e6e-en (Accessed on 29 June 2018) at https://data.oecd.org/price/inflation-cpi.htm.

  12. It should be noted that the observed frequencies of the respective health categories are much more uneven in MEPS than in the HRS. For instance, only 391 individuals transition to death and 3,172 into poor health states, whereas much larger numbers of individuals transition into the other health states. Uneven counts of observations of different categories in the outcome variable can lead to convergence issues in multinomial models. We discuss some of these issues in Sects. 4.3 and 5.4.

  13. Similar results of very persistent health states have been found in other surveys such as the British Household Panel Survey as shown in Contoyannis et al. (2004).

  14. Increasing the polynomial order of age does not affect our results.

  15. It should be noted that \(t+1\) refers to the next period, which in the econometric implementation means a one-year-ahead variable for MEPS data and a two-year-ahead variable for HRS data.

  16. Other tests for IIA are based on Small and Hsiao (1985). Compare Long and Freese (2014) for further details on testing for IIA.

  17. See Long and Freese (2014) for details.

  18. The number of children in a household is sometimes added to the lifestyle equation as the number of children could affect smoking behavior but not one’s assessment of health.

  19. We will discuss methods to transform two-year transition probabilities into one-year transition probabilities in the next section.

  20. Compare Contoyannis et al. (2004) for a discussion of using initial conditions to control for individual effects in dynamic panel regressions.

  21. Life expectancy numbers are from CDC life tables for the years 2001 and 2016, retrieved in June 2021 from https://www.cdc.gov/nchs/nvss/life-expectancy.htm.

  22. The Python version of the algorithm is available on the author’s website at: https://juejung.github.io/research.htm.

  23. We skip the test for state dependency because consecutive health states are clearly not independent of each other, as shown by the highly significant coefficients of current health states in the marginal effects estimations of Tables 9 and 10.

  24. A joint hypothesis over all initial health types h can easily be implemented by summing the individual \(\alpha _{n}\) over all health states; the sum then follows a Chi-square distribution with \(6\left( 6-1\right) \left( T-1\right) \) degrees of freedom.

  25. The joint distribution over all health states h follows a Chi-square distribution with \(6\times \left( 6-1\right) ^{2}\) degrees of freedom.

  26. Tests for subcategories of individuals for whom we control in the parameterized version of the model are difficult to implement as some transitions between rare health states would not show up in the divided sample and the tests would have diminished statistical power.

  27. The age group of 50–60 year olds has good representation in both surveys as can be seen from Figure A.1 in Online Appendix A.

  28. In order to assess the robustness of our results based on the ordered logit model, we also report estimation results from an ordered probit model in Figures B.1–B.5 in Online Appendix B. The results are almost identical to the ordered logit model.

  29. Marginal effects estimates for both the MLM and MPM using HRS data are available in Online Appendix C. They are very similar to the marginal effects based on the ordered logit model from Sect. 5.1.

  30. For more detailed discussions of age, cohort, and time effects see Fernández-Villaverde and Krueger (2007) and Jung and Tran (2014).

  31. This does not contradict our earlier result that finds some significant differences in the early and late time period dummy variables as differential effects from periods of recessions are potentially driving the results.

  32. Adding additional interaction terms of gender with a higher order age polynomial does not change the resulting graphs in a statistically significant way.

  33. Online Appendix G presents the lifecycle profiles of the differences in the conditional transition probabilities as well as summary statistics by race across the two time periods.

  34. Baulch and Quisumbing (2010) contains detailed descriptions, including Stata code, for these types of tests.

  35. See Heeringa and Connor (1995) and Ofstedal et al. (2011) for more detail about the HRS sample design and sample weights.

  36. Attrition on observables occurs when the dependent variable is independent of the attrition process conditional on the explanatory variables. Attrition on unobservables occurs when this conditional independence does not hold. A sample selection model can account for attrition on unobservables but requires an exclusion restriction for identification, that is, an instrumental variable that affects attrition only but not the dependent variable (Hausman and Wise 1979; Ridder 1992). Fitzgerald et al. (1998) point out that it is almost impossible to find plausible exclusion restrictions.
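Notes 15, 19 and 22 refer to converting two-year HRS transition probabilities into one-year probabilities under a Markov structure. In that setting the two-year matrix is the square of the one-year matrix, so the one-year matrix is a stochastic square root of the two-year matrix. The sketch below is only an illustration of that idea and is not the author's algorithm (see the link in note 22 for that). It approximates the principal square root with a binomial series, which is an assumption made here for simplicity. The function names and the three-state example matrix are hypothetical. In general a stochastic root may not exist, may not be unique, or may contain negative entries (compare Higham and Lin 2011), so the result should always be checked.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def sqrt_by_binomial_series(p_two_year, terms=200):
    """Approximate the principal square root of p_two_year.

    Uses sqrt(I + A) = sum_k binom(1/2, k) A^k with A = P - I, which
    converges when the eigenvalues of A lie inside the unit circle
    (for example when every diagonal entry of P exceeds 0.5).
    """
    n = len(p_two_year)
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    a = [[p_two_year[i][j] - identity[i][j] for j in range(n)]
         for i in range(n)]
    root = [row[:] for row in identity]   # k = 0 term of the series
    power = [row[:] for row in identity]  # holds A^k
    coeff = 1.0                           # holds binom(1/2, k)
    for k in range(1, terms + 1):
        coeff *= (0.5 - (k - 1)) / k
        power = mat_mul(power, a)
        root = [[root[i][j] + coeff * power[i][j] for j in range(n)]
                for i in range(n)]
    return root


def is_stochastic(m, tol=1e-9):
    """True if entries are numerically non-negative and rows sum to one."""
    return all(min(row) >= -tol and abs(sum(row) - 1.0) <= tol for row in m)


# Hypothetical one-year matrix (states: good, poor, dead), used only to
# build a two-year matrix whose root is known in advance.
p_one_year_true = [[0.90, 0.08, 0.02],
                   [0.10, 0.80, 0.10],
                   [0.00, 0.00, 1.00]]
p_two_year = mat_mul(p_one_year_true, p_one_year_true)
p_one_year = sqrt_by_binomial_series(p_two_year)
print(is_stochastic(p_one_year))
```

Because the example two-year matrix is constructed by squaring a known one-year matrix with positive eigenvalues, the series recovers that matrix up to rounding error. With estimated matrices, `is_stochastic` may return `False`, in which case some adjustment or a different root-finding method would be needed.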

References

  • Aiyagari SR (1994) Uninsured idiosyncratic risk and aggregate saving. Q J Econ 109(3):659–684

  • Alderman H, Behrman JR, Kohler H-P, Maluccio JA, Watkins SC (2001) Attrition in longitudinal household survey data. Demogr Res 5:79–124

  • Anderson TW, Goodman LA (1957) Statistical inference about Markov chains. Ann Math Stat 28(1):89–110

  • Balia S (2014) Survival expectations, subjective health and smoking: evidence from SHARE. Empir Econ 47(2):753–780

  • Balia S, Jones AM (2008) Mortality, lifestyle and socio-economic status. J Health Econ 27(1):1–26

  • Baulch B, Quisumbing A (2010) Testing and adjusting for attrition in household panel data. Toolkit Note, Chronic Poverty Research Centre, London, UK 1–12

  • Becketti S, Gould W, Lillard L, Welch F (1988) The panel study of income dynamics after fourteen years: an evaluation. J Labor Econ 6(4):472–492

  • Bewley T (1986) Stationary monetary equilibrium with a continuum of independently fluctuating consumers. In: Hildenbrand W, Mas-Colell A (eds) Contributions to mathematical economics in Honor of Gerard Debreu. North-Holland

  • Billingsley P (1961) Statistical inference for Markov processes, vol 7. University of Chicago Press, Chicago

  • Brant R (1990) Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics 46(4):1171–1178

  • Cao H, Hill DH (2005) Active versus passive sample attrition: the health and retirement study. Econometrics 0505006, University Library of Munich, Germany

  • Catillon M, Cutler D, Getzen T (2018) Two hundred years of health and medical care: the importance of medical care for life expectancy gains. (25330)

  • Chhatwal J, Jayasuriya S, Elbasha EH (2016) Changing cycle lengths in state-transition models: challenges and solutions. Med Decis Making 36(8):952–964

  • Chowdhury SR, Machlin SR, Gwet KL (2019) Sample designs of the Medical Expenditure Panel Survey household component, 1996–2006 and 2007–2016. Methodology Report #33 (January 2019). Agency for Healthcare Research and Quality, Rockville, MD

  • Clarke PM, Ryan C (2006) Self-reported health: reliability and consequences for health inequality measurement. Health Econ 15(6):645–652

  • Cohen SB, Machlin SR, Branscome JM (2000) Patterns of survey attrition and reluctant response in the 1996 medical expenditure panel survey. Health Serv Outcomes Res Method 1(2):131–148

  • Contoyannis P, Jones AM (2004) Socio-economic status, health and lifestyle. J Health Econ 23(5):965–995

  • Contoyannis P, Jones AM, Rice N (2004) The dynamics of health in the British Household Panel Survey. J Appl Economet 19(4):473–503

  • Crossley TF, Kennedy S (2002) The reliability of self-assessed health status. J Health Econ 21:643–658

  • Cutler DM, Richardson E (1997) Measuring the health of the U.S. population. Brookings papers on economic activity: microeconomics pp 217–282

  • Dalgaard C-J, Strulik H (2014) Optimal aging and death: understanding the preston curve. J Eur Econ Assoc 12(3):672–701

  • Deaton AS, Paxson CH (1998) Aging and Inequality in Income and Health. Am Econ Rev Papers Proceed 88(2):248–253

  • Deb P, Trivedi PK (1997) Demand for medical care by the elderly: a finite mixture approach. J Appl Economet 12(3):313–336

  • Diehr P, Patrick DL (2001) Probabilities of transition among health states for older adults. Qual Life Res 10:431–442

  • Diehr P, Patrick DL, Bild DE, Gregory L, Williamson BJD (1998) Predicting future years of healthy life for older adults. J Clin Epidemiol 51(4):343–353

  • Engels JM, Diehr P (2003) Imputation of missing longitudinal data: a comparison of methods. J Clin Epidemiol 56:968–976

  • Fernández-Villaverde J, Krueger D (2007) Consumption over the life-cycle: some facts from consumer expenditure survey data. Rev Econ Stat 89(3):552–565

  • Fisher GG, Ryan LH (2017) Overview of the health and retirement study and introduction to the special issue. Work, aging and retirement 4(1):1–9

  • Fitzgerald J, Gottschalk P, Moffitt R (1998) An analysis of sample attrition in panel data: the Michigan Panel Study of Income Dynamics. J Hum Resour 33(2):251–299

  • Fonseca R, Michaud P-C, Galama T, Kapteyn A (2021) Accounting for the rise of health spending and longevity. J Eur Econ Assoc 19(1):536–579

  • French E (2005) The effects of health, wealth, and wages on labour supply and retirement behaviour. Rev Econ Stud 72(2):395–427

  • French E, Jones JB (2011) The effects of health insurance and self-insurance on retirement behavior. Econometrica 79:693–732

  • Gerdtham UG, Johannesson M, Lundberg L, Isacson D (1999) A note on validating Wagstaff and van Doorslaer’s health measure in the analysis of inequalities in health. J Health Econ 18(1):117–124

  • Grossman M (2000) The human capital model. In: Handbook of health economics, vol 1A. Elsevier North Holland, pp 347–408

  • Grossman M (1972) On the concept of health capital and the demand for health. J Polit Econ 80(2):223–255

  • Halliday TJ, Mazumder B, Wong A (2020) The intergenerational transmission of health in the United States: a latent variables analysis. Health Economics pp 1–15

  • Halliday TJ, Mazumder B, Wong A (2021) Intergenerational mobility in self-reported health status in the US. J Public Econ 193:104307

  • Hausman JA, McFadden D (1984) Specification tests for the multinomial logit model. Econometrica 52:1219–1240

  • Hausman JA, Wise DA (1979) Attrition bias in experimental and panel data: the Gary income maintenance experiment. Econometrica 47(2):455–473

  • Heeringa SG, Connor JH (1995) Technical description of the health and retirement survey sample design. Institute for Social Research University of Michigan Ann Arbor, MI

  • Higham NJ, Lin L (2011) On Pth roots of stochastic matrices. Linear Algebra Appl 435:448–463

  • Huggett M (1993) The risk-free rate in heterogeneous-agent incomplete-insurance economies. J Econ Dyn Control 17(5–6):953–969

  • Idler EL, Benyamini Y (1997) Self-rated health and mortality: a review of twenty-seven community studies. J Health Soc Behav 38(1):21–37

  • Idler EL, Kasl SV (1995) Self-ratings of health: do they also predict change in functional ability? J Gerontol Ser B, Psychol Sci Soc Sci 50(6):S344-353

  • İmrohoroğlu S, Kitao S (2012) Social security reforms: benefit claiming, labor force participation, and long-run sustainability. Am Econ J Macroecon 4(3):96–127

  • İmrohoroğlu A, İmrohoroğlu S, Joines D (1995) A life cycle analysis of social security. Econ Theor 6(1):83–114

  • Israel RB, Rosenthal JS, Wei JZ (2001) Finding generators for Markov chains via empirical transition matrices, with applications to credit ratings. Math Financ 11(2):245–265

  • Juerges H (2007) True health vs response styles: exploring cross-country differences in self-reported health. Health Econ 16(2):163–178

  • Jung J, Tran C (2014) Medical consumption over the life cycle: facts from a U.S. medical expenditure panel survey. Empir Econ 47(3):927–957

  • Jung J, Tran C (2016) Market inefficiency, insurance mandate and welfare: U.S. health care reform 2010. Rev Econ Dyn 20:132–159

  • Jung J, Tran C, Chambers M (2017) Aging and health financing in the U.S.: a general equilibrium analysis. Eur Econ Rev 100:428–462

  • Juster FT, Suzman R (1995) An overview of the health and retirement study. J Human Resour 30(Supplement):S7–S56

  • Kakwani N, Wagstaff A, van Doorslaer E (1997) Socioeconomic inequalities in health: measurement, computation, and statistical inference. J Econ 77(1):87–103

  • Kaplan G (2012) Inequality and the life cycle. Quant Econ 3(3):471–525

  • Kapteyn A, Meijer E (2014) A comparison of different measures of health and their relation to labor force transitions at older ages. In: Discoveries in the economics of aging. NBER Chapters, National Bureau of Economic Research, Inc, pp 115–150

  • Kapteyn A, Michaud PC, Smith JP, Van Soest A (2006) Effects of attrition and non-response in the health and retirement study. RAND Working Paper WR-407

  • Kerkhofs M, Lindeboom M (1995) Subjective health measures and state-dependent reporting errors. Health Econ 4(3):221–235

  • Kropko J (2008) Choosing between multinomial logit and multinomial probit models for analysis of unordered choice data. College of Arts and Sciences, Department of Political Science, Masters Thesis

  • Kullback S, Kupperman M, Ku HH (1962) Tests for contingency tables and Markov chains. Technometrics 4(4):573–608

  • Lillard LA, Panis CWA (1998) Panel attrition from the panel study of income dynamics: household income, marital status, and mortality. J Hum Resour 33(2):437–457

  • Lin L (2011) Roots of stochastic matrices and fractional matrix powers. Ph.D Thesis, Manchester Institute for Mathematical Sciences School of Mathematics

  • Lindeboom M, van Doorslaer E (2004) Cut-point shift and index shift in self-reported health. J Health Econ 23(6):1083–1099

  • Lindeboom M, Kerkhofs M (2009) Health and work of the elderly: subjective health measures, reporting errors and endogeneity in the relationship between health and work. J Appl Economet 24(6):1024–1046

  • Long JS, Freese J (2014) Regression models for categorical dependent variables using Stata, 3rd edn. Stata Press, College Station, TX

  • McLachlan GJ, Basford KE (1988) Mixture models: inference and applications to clustering, vol 38. Dekker, New York

  • Meijer E, Kapteyn A, Andreyeva T (2011) Internationally comparable health indices. Health Econ 20(5):600–619

  • De Nardi M, French E, Jones JB (2010) Why do the elderly save? The role of medical expenses. J Polit Econ 118(1):39–75

  • Ofstedal MB, Weir DR, Kuang-Tsung C, James W (2011) Updates to HRS sample weights. Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI

  • Okun MA, Stock WA, Haring MJ, Witter RA (1984) Health and subjective well-being: a meta-analysis. Int J Aging Human Develop 19(2):111–132

  • Palumbo MG (1999) Uncertain medical expenses and precautionary saving near the end of the life cycle. Rev Econ Stud 66(2):395–421

  • Pashchenko S, Porapakkarm P (2013) Quantitative analysis of health insurance reform: separating regulation from redistribution. Rev Econ Dyn 16(3):383–404

  • Ridder G (1992) An empirical evaluation of some models for non-random attrition in panel data. Struct Chang Econ Dyn 3(2):337–355

  • Rockwood K, Mitnitski A (2007) Frailty in relation to the accumulation of deficits. J Gerontol: Ser A 62(7):722–727

  • Ruhm CJ (2000) Are recessions good for your health? Q J Econ 115(2):617–650

  • Siebert U, Alagoz O, Bayoumi AM, Jahn B, Owens DK, Cohen DJ, Kuntz KM (2012) State-transition modeling: a report of the ISPOR-SMDM modeling good research practices task force-3. Value Health 15(6):812–820

  • Small KA, Hsiao C (1985) Multinomial logit specification tests. Int Econ Rev 26(3):619–627

  • van Doorslaer E, Jones AM (2003) Inequalities in self-reported health: validation of a new approach to measurement. J Health Econ 22(1):61–87

  • Vijverberg WPM (2011) Testing for IIA with the Hausman-McFadden test. IZA Discussion Paper No. 5826

  • Wagstaff A, Van Doorslaer E (1994) Measuring inequalities in health in the presence of multiple-category morbidity indicators. Health Econ 3(4):281–291

  • Wallace RB, Herzog RA (1995) Overview of the health measures in the health and retirement study. J Human Resour 30(Supplement):S84–S107

  • Wilde J (2000) Identification of multiple equation probit models with endogenous dummy regressors. Econ Lett 69(3):309–312

  • Ziebarth N (2010) Measurement of health, health inequality, and reporting heterogeneity. Soc Sci Med 71(1):116–124

Funding

Not applicable. This study is not funded by any Grant.

Author information

Corresponding author

Correspondence to Juergen Jung.

Ethics declarations

Conflict of interest

Juergen Jung declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We appreciate comments from Gerhard Glomm, Vinish Shrestha, Jialu Streeter, Pravin Trivedi, and an anonymous referee. This paper was formerly circulated as “Estimating Markov Transition Probabilities between Health States in the HRS Dataset”.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2226 KB)

About this article

Cite this article

Jung, J. Estimating transition probabilities between health states using US longitudinal survey data. Empir Econ 63, 901–943 (2022). https://doi.org/10.1007/s00181-021-02157-6

  • DOI: https://doi.org/10.1007/s00181-021-02157-6
