Abstract
We use data from two representative US household surveys, the Medical Expenditure Panel Survey (MEPS) and the Health and Retirement Study (RAND-HRS), to estimate transition probability matrices between health states over the lifecycle from ages 20 to 95. We compare nonparametric counting methods with parametric methods that control for individual characteristics as well as time and cohort effects. We align two-year transition probabilities from the HRS with one-year transition probabilities in MEPS using a stochastic root method that assumes a Markov structure. We find that the nonparametric counting method and the regression specifications based on ordered logit models produce similar results over the lifecycle. However, the counting method overestimates the probabilities of transitioning into bad health states. In addition, we find that young women have worse health prospects than their male counterparts, but at older ages women transition into better health states with higher probabilities than men. We do not find significant differences in the conditional health transition probabilities between African Americans and the rest of the population. We also find that the lifecycle patterns are stable over time. Finally, we discuss issues with controlling for time effects, sample attrition, the Markov assumption, and other modeling issues that can arise with categorical outcome variables.
Notes
MEPS is a representative household survey of the US working population, while the HRS is a survey representative of the older population in the USA.
Compare Catillon et al. (2018) and life tables from the CDC at: https://www.cdc.gov/nchs/nvss/life-expectancy.htm.
A related approach by Dalgaard and Strulik (2014) models the dynamics of health as a process of health deficit accumulation that eventually ends in death and that can be measured with a frailty index. An introduction to the frailty index measure can be found in Rockwood and Mitnitski (2007).
Appendices A–I contain additional results, including summary statistics, ordered probit models, multinomial logit and probit models, transition probabilities based on samples from two different time periods, detailed lifecycle transition probabilities by gender, race, and race and time, as well as results from finite mixture and mixed processes models.
Chowdhury et al. (2019) provide details about the MEPS survey designs.
Sect. 5.8 contains a more detailed discussion about attrition bias issues in MEPS and HRS.
The RAND-HRS is developed from the HRS and comprises a cross-wave file with variables derived consistently across waves. The RAND-HRS is maintained by the RAND Center of Aging. More information is available at: https://www.rand.org/well-being/social-and-behavioral-policy/centers/aging/dataprod/hrs-data.html.
Fisher and Ryan (2017) provide a recent summary of the Health and Retirement Study.
A frequency distribution of the full sample is available in Figure A.1 and Table A.1 in Online Appendix A.
OECD (2018), Inflation (CPI) (indicator). doi: 10.1787/eee82e6e-en (Accessed on 29 June 2018) at https://data.oecd.org/price/inflation-cpi.htm.
It should be noted that the observed frequencies of the respective health categories are much more uneven in MEPS than in the HRS. For instance, only 391 individuals transition to death and 3,172 into poor health states, whereas much larger numbers of individuals transition into the other health states. Uneven counts of observations across the categories of the outcome variable can lead to convergence issues in multinomial models. We discuss some of these issues in Sects. 4.3 and 5.4.
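To illustrate why uneven category counts matter, the nonparametric counting estimator can be sketched as follows. This is a minimal illustration and not the author's code: the function name `count_transition_matrix`, the state labels, and the toy observations are all hypothetical. Each transition probability is the number of observed moves from an origin state to a destination state divided by the number of observations that start in the origin state, so rarely observed destinations rest on very few observations.

```python
from collections import Counter

# Hypothetical state labels, ordered from best health to death.
STATES = ["excellent", "good", "poor", "dead"]


def count_transition_matrix(pairs, states):
    """Estimate P(next state | current state) by simple counting.

    pairs: iterable of (state_now, state_next_period) tuples.
    Returns a dict of dicts; origin states with no observations map to None.
    """
    counts = Counter(pairs)
    matrix = {}
    for origin in states:
        row_total = sum(counts[(origin, dest)] for dest in states)
        if row_total == 0:
            matrix[origin] = None  # nobody is observed starting in this state
            continue
        matrix[origin] = {
            dest: counts[(origin, dest)] / row_total for dest in states
        }
    return matrix


# Made-up observations purely for illustration (not survey data).
toy_pairs = (
    [("excellent", "excellent")] * 8
    + [("excellent", "good")] * 2
    + [("good", "good")] * 6
    + [("good", "poor")] * 3
    + [("good", "dead")] * 1
    + [("poor", "poor")] * 3
    + [("poor", "dead")] * 1
)

toy_matrix = count_transition_matrix(toy_pairs, STATES)
print(toy_matrix["good"])
```

In this toy sample the probability of moving from good health to death is identified by a single observation, which is the kind of thin support that can destabilize multinomial estimators.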
Similar results of very persistent health states have been found in other surveys such as the British Household Panel Survey as shown in Contoyannis et al. (2004).
Increasing the polynomial order of age does not affect our results.
It should be noted that \(t+1\) refers to the future period, which in the econometric implementation means a one-year-ahead variable for MEPS data and a two-year-ahead variable for HRS data.
See Long and Freese (2014) for details.
The number of children in a household is sometimes added to the lifestyle equation as the number of children could affect smoking behavior but not one’s assessment of health.
We discuss methods to transform two-year transition probabilities into one-year transition probabilities in the next section.
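The idea of converting a two-year transition matrix into a one-year matrix can be sketched as a matrix square root: under the Markov assumption, the two-year matrix equals the one-year matrix multiplied by itself. The sketch below is not the algorithm used in the paper. It uses a binomial series for \(\sqrt{I + A}\) with \(A = P_{2} - I\), which is an assumption chosen for simplicity and which converges when every diagonal entry of the two-year matrix exceeds 0.5. The final clipping of negative entries and row renormalization is a crude, assumed fix for the well-known problem that a matrix root need not be a valid stochastic matrix (see Higham and Lin 2011). All function names and numbers are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def stochastic_sqrt(p_two, terms=300):
    """Approximate a one-period matrix whose square is the two-period matrix.

    Uses the series sqrt(I + A) = sum_k binom(1/2, k) A^k with A = P2 - I.
    Assumption: diagonal entries of p_two exceed 0.5 so the series converges.
    """
    n = len(p_two)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    a = [[p_two[i][j] - identity[i][j] for j in range(n)] for i in range(n)]
    root = [row[:] for row in identity]   # k = 0 term of the series
    power = [row[:] for row in identity]  # running power A^k
    coef = 1.0                            # running binomial coefficient
    for k in range(1, terms + 1):
        coef *= (0.5 - (k - 1)) / k
        power = mat_mul(power, a)
        for i in range(n):
            for j in range(n):
                root[i][j] += coef * power[i][j]
    # Crude regularization (an assumption of this sketch): clip negative
    # entries to zero and rescale each row so that it sums to one.
    cleaned = []
    for row in root:
        clipped = [max(x, 0.0) for x in row]
        total = sum(clipped)
        cleaned.append([x / total for x in clipped])
    return cleaned


# Made-up one-year matrix over (healthy, sick, dead); death is absorbing.
p_one_year = [
    [0.90, 0.08, 0.02],
    [0.20, 0.70, 0.10],
    [0.00, 0.00, 1.00],
]
p_two_year = mat_mul(p_one_year, p_one_year)
recovered = stochastic_sqrt(p_two_year)
for row in recovered:
    print([round(x, 4) for x in row])
```

In this constructed example the two-year matrix is built from a known one-year matrix, so the recovered root can be compared with the original. With estimated matrices a valid stochastic root may not exist or may not be unique, which is why dedicated stochastic root methods are needed.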
Compare Contoyannis et al. (2004) for a discussion of using initial conditions to control for individual effects in dynamic panel regressions.
Life expectancy numbers are from CDC life tables for the years 2001 and 2016, retrieved in June 2021 from https://www.cdc.gov/nchs/nvss/life-expectancy.htm.
The Python version of the algorithm is available on the author’s website at: https://juejung.github.io/research.htm.
A joint hypothesis over all initial health types h can easily be tested by summing the individual \(\alpha _{n}\) over all health states; the sum then follows a Chi-square distribution with \(6\left( 6-1\right) \left( T-1\right) \) degrees of freedom.
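The degrees of freedom above match a joint test of time homogeneity of the transition matrix across T periods with six states. As a hedged illustration, the sketch below computes a likelihood-ratio statistic in the spirit of Anderson and Goodman (1957). It assumes that the per-state statistics being summed are of this likelihood-ratio form, which is an assumption of the sketch rather than something stated in this note. The function name and the example counts are hypothetical.

```python
import math


def homogeneity_lr_test(counts_by_period):
    """Likelihood-ratio statistic for time-homogeneous transition probabilities.

    counts_by_period[t][i][j] is the number of i -> j transitions in period t.
    Returns (statistic, degrees_of_freedom) with dof = m * (m - 1) * (T - 1),
    where m is the number of states and T the number of periods.
    """
    n_periods = len(counts_by_period)
    m = len(counts_by_period[0])
    # Pool the transition counts over all periods (the restricted model).
    pooled = [
        [sum(counts_by_period[t][i][j] for t in range(n_periods)) for j in range(m)]
        for i in range(m)
    ]
    stat = 0.0
    for i in range(m):
        pooled_row = sum(pooled[i])
        for t in range(n_periods):
            row_total = sum(counts_by_period[t][i])
            if row_total == 0:
                continue  # origin state i is not observed in period t
            for j in range(m):
                n_ij = counts_by_period[t][i][j]
                if n_ij == 0:
                    continue  # zero counts contribute nothing to the sum
                p_period = n_ij / row_total
                p_pooled = pooled[i][j] / pooled_row
                stat += 2.0 * n_ij * math.log(p_period / p_pooled)
    dof = m * (m - 1) * (n_periods - 1)
    return stat, dof


# Made-up two-state counts for two periods (not survey data).
example_counts = [
    [[8, 2], [3, 7]],
    [[5, 5], [3, 7]],
]
example_stat, example_dof = homogeneity_lr_test(example_counts)
print(example_stat, example_dof)
```

With six health states the same function returns \(6\left( 6-1\right) \left( T-1\right) \) degrees of freedom; the statistic would then be compared with the corresponding Chi-square critical value.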
The joint distribution over all health states h follows a Chi-square distribution with \(6\times \left( 6-1\right) ^{2}\) degrees of freedom.
Tests for subcategories of individuals for whom we control in the parameterized version of the model are difficult to implement as some transitions between rare health states would not show up in the divided sample and the tests would have diminished statistical power.
The age group of 50–60 year olds has good representation in both surveys as can be seen from Figure A.1 in Online Appendix A.
In order to assess the robustness of our results based on the ordered logit model, we also report estimation results from an ordered probit model in Figures B.1–B.5 in Online Appendix B. The results are almost identical to the ordered logit model.
Marginal effects estimates for both the MLM and MPM using HRS data are available in Online Appendix C. They are very similar to the marginal effects based on the ordered logit model from Sect. 5.1.
This does not contradict our earlier result that finds some significant differences in the early and late time period dummy variables as differential effects from periods of recessions are potentially driving the results.
Adding additional interaction terms of gender with a higher order age polynomial does not change the resulting graphs in a statistically significant way.
Online Appendix G presents the lifecycle profiles of the differences in the conditional transition probabilities as well as summary statistics by race across the two time periods.
Baulch and Quisumbing (2010) contain detailed descriptions, including Stata code, for these types of tests.
Attrition on observables occurs when the dependent variable is independent of the attrition process conditional on the explanatory variables. Attrition on unobservables occurs when this conditional independence does not hold. A sample selection model can account for attrition on unobservables but requires an exclusion restriction for identification, that is, an instrumental variable that affects attrition only but not the dependent variable (Hausman and Wise 1979; Ridder 1992). Fitzgerald et al. (1998) point out that it is almost impossible to find plausible exclusion restrictions.
References
Aiyagari RS (1994) Uninsured idiosyncratic risk and aggregate saving. Q J Econ 109(3):659–684
Alderman H, Behrman JR, Kohler H-P, Maluccio JA, Watkins SC (2001) Attrition in longitudinal household survey data. Demogr Res 5:79–124
Anderson TW, Goodman LA (1957) Statistical inference about Markov chains. Ann Math Stat 28(1):89–110
Balia S (2014) Survival expectations, subjective health and smoking: evidence from SHARE. Empir Econ 47(2):753–780
Balia S, Jones AM (2008) Mortality, lifestyle and socio-economic status. J Health Econ 27(1):1–26
Baulch B, Quisumbing A (2010) Testing and adjusting for attrition in household panel data. Toolkit Note, Chronic Poverty Research Centre, London, UK 1–12
Becketti S, Gould W, Lillard L, Welch F (1988) The panel study of income dynamics after fourteen years: an evaluation. J Labor Econ 6(4):472–492
Bewley T (1986) Stationary monetary equilibrium with a continuum of independently fluctuating consumers. In: Hildenbrand W, Mas-Colell A (eds) Contributions to mathematical economics in Honor of Gerard Debreu. North-Holland
Billingsley P (1961) Statistical inference for Markov processes, vol 7. University of Chicago Press, Chicago
Brant R (1990) Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics 46(4):1171–1178
Cao H, Hill DH (2005) Active versus passive sample attrition: the health and retirement study. Econometrics 0505006, University Library of Munich, Germany
Catillon M, Cutler D, Getzen T (2018) Two hundred years of health and medical care: the importance of medical care for life expectancy gains. NBER Working Paper 25330
Chhatwal J, Jayasuriya S, Elbasha EH (2016) Changing cycle lengths in state-transition models: challenges and solutions. Med Decis Making 36(8):952–964
Chowdhury SR, Machlin SR, Gwet KL (2019) Sample designs of the Medical Expenditure Panel Survey household component, 1996–2006 and 2007–2016. Methodology Report #33 (January 2019). Agency for Healthcare Research and Quality, Rockville, MD
Clarke PM, Ryan C (2006) Self-reported health: reliability and consequences for health inequality measurement. Health Econ 15(6):645–652
Cohen SB, Machlin SR, Branscome JM (2000) Patterns of survey attrition and reluctant response in the 1996 medical expenditure panel survey. Health Serv Outcomes Res Method 1(2):131–148
Contoyannis P, Jones AM (2004) Socio-economic status, health and lifestyle. J Health Econ 23(5):965–995
Contoyannis P, Jones AM, Rice N (2004) The dynamics of health in the British Household Panel Survey. J Appl Economet 19(4):473–503
Crossley TF, Kennedy S (2002) The reliability of self-assessed health status. J Health Econ 21:643–658
Cutler DM, Richardson E (1997) Measuring the health of the U.S. population. Brookings papers on economic activity: microeconomics pp 217–282
Dalgaard C-J, Strulik H (2014) Optimal aging and death: understanding the Preston curve. J Eur Econ Assoc 12(3):672–701
Deaton AS, Paxson CH (1998) Aging and inequality in income and health. Am Econ Rev Papers Proceed 88(2):248–253
Deb P, Trivedi PK (1997) Demand for medical care by the elderly: a finite mixture approach. J Appl Economet 12(3):313–336
Diehr P, Patrick DL (2001) Probabilities of transition among health states for older adults. Qual Life Res 10:431–442
Diehr P, Patrick DL, Bild DE, Gregory L, Williamson BJD (1998) Predicting future years of healthy life for older adults. J Clin Epidemiol 51(4):343–353
Engels JM, Diehr P (2003) Imputation of missing longitudinal data: a comparison of methods. J Clin Epidemiol 56:968–976
Fernández-Villaverde J, Krueger D (2007) Consumption over the life-cycle: some facts from consumer expenditure survey data. Rev Econ Stat 89(3):552–565
Fisher GG, Ryan LH (2017) Overview of the health and retirement study and introduction to the special issue. Work, aging and retirement 4(1):1–9
Fitzgerald J, Gottschalk P, Moffitt R (1998) An analysis of sample attrition in panel data: the Michigan panel study of income dynamics. J Hum Resour 33(2):251–299
Fonseca R, Michaud P-C, Galama T, Kapteyn A (2021) Accounting for the rise of health spending and longevity. J Eur Econ Assoc 19(1):536–579
French E (2005) The effects of health, wealth, and wages on labour supply and retirement behaviour. Rev Econ Stud 72(2):395–427
French E, Jones JB (2011) The effects of health insurance and self-insurance on retirement behavior. Econometrica 79:693–732
Gerdtham UG, Johannesson M, Lundberg L, Isacson D (1999) A note on validating Wagstaff and van Doorslaer’s health measure in the analysis of inequalities in health. J Health Econ 18(1):117–124
Grossman M (2000) The human capital model. In: Handbook of health economics, vol 1A. Elsevier North Holland, pp 347–408
Grossman M (1972) On the concept of health capital and the demand for health. J Polit Econ 80(2):223–255
Halliday TJ, Mazumder B, Wong A (2020) The intergenerational transmission of health in the United States: a latent variables analysis. Health Economics pp 1–15
Halliday TJ, Mazumder B, Wong A (2021) Intergenerational mobility in self-reported health status in the US. J Public Econ 193:104307
Hausman JA, McFadden D (1984) Specification tests for the multinomial logit model. Econometrica 52:1219–1240
Hausman JA, Wise DA (1979) Attrition bias in experimental and panel data: the Gary income maintenance experiment. Econometrica 47(2):455–473
Heeringa SG, Connor JH (1995) Technical description of the health and retirement survey sample design. Institute for Social Research University of Michigan Ann Arbor, MI
Higham NJ, Lin L (2011) On Pth roots of stochastic matrices. Linear Algebra Appl 435:448–463
Huggett M (1993) The risk-free rate in heterogeneous-agent incomplete-insurance economies. J Econ Dyn Control 17(5–6):953–969
Idler EL, Benyamini Y (1997) Self-rated health and mortality: a review of twenty-seven community studies. J Health Soc Behav 38(1):21–37
Idler EL, Kasl SV (1995) Self-ratings of health: do they also predict change in functional ability? J Gerontol Ser B, Psychol Sci Soc Sci 50(6):S344-353
İmrohoroğlu S, Kitao S (2012) Social security reforms: benefit claiming, labor force participation, and long-run sustainability. Am Econ J Macroecon 4(3):96–127
İmrohoroğlu A, İmrohoroğlu S, Joines D (1995) A life cycle analysis of social security. Econ Theor 6(1):83–114
Israel RB, Rosenthal JS, Wei JZ (2001) Finding generators for Markov chains via empirical transition matrices, with applications to credit ratings. Math Financ 11(2):245–265
Juerges H (2007) True health vs response styles: exploring cross-country differences in self-reported health. Health Econ 16(2):163–178
Jung J, Tran C (2014) Medical consumption over the life cycle: facts from a U.S. medical expenditure panel survey. Empir Econ 47(3):927–957
Jung J, Tran C (2016) Market inefficiency, insurance mandate and welfare: U.S. health care reform 2010. Rev Econ Dyn 20:132–159
Jung J, Tran C, Chambers M (2017) Aging and health financing in the U.S.: a general equilibrium analysis. Eur Econ Rev 100:428–462
Juster FT, Suzman R (1995) An overview of the health and retirement study. J Human Resour 30(Supplement):S7–S56
Kakwani N, Wagstaff A, van Doorslaer E (1997) Socioeconomic inequalities in health: measurement, computation, and statistical inference. J Econ 77(1):87–103
Kaplan G (2012) Inequality and the life cycle. Quant Econ 3(3):471–525
Kapteyn A, Meijer E (2014) A comparison of different measures of health and their relation to labor force transitions at older ages. In: Discoveries in the economics of aging. NBER Chapters, National Bureau of Economic Research, Inc, pp 115–150
Kapteyn A, Michaud PC, Smith JP, Van Soest A (2006) Effects of attrition and non-response in the health and retirement study. RAND Working Paper WR-407
Kerkhofs M, Lindeboom M (1995) Subjective health measures and state-dependent reporting errors. Health Econ 4(3):221–235
Kropko J (2008) Choosing between multinomial logit and multinomial probit models for analysis of unordered choice data. College of Arts and Sciences, Department of Political Science, Masters Thesis
Kullback S, Kupperman M, Ku HH (1962) Tests for contingency tables and Markov chains. Technometrics 4(4):573–608
Lillard LA, Panis CWA (1998) Panel attrition from the panel study of income dynamics: household income, marital status, and mortality. J Hum Resour 33(2):437–457
Lin L (2011) Roots of stochastic matrices and fractional matrix powers. Ph.D Thesis, Manchester Institute for Mathematical Sciences School of Mathematics
Lindeboom M, van Doorslaer E (2004) Cut-point shift and index shift in self-reported health. J Health Econ 23(6):1083–1099
Lindeboom M, Kerkhofs M (2009) Health and work of the elderly: subjective health measures, reporting errors and endogeneity in the relationship between health and work. J Appl Economet 24(6):1024–1046
Long SJ, Freese J (2014) Regression models for categorical dependent variables using Stata, 3rd edn. Stata Press, College Station, TX
McLachlan GJ, Basford KE (1988) Mixture models: inference and applications to clustering, vol 38. Dekker, New York
Meijer E, Kapteyn A, Andreyeva T (2011) Internationally comparable health indices. Health Econ 20(5):600–619
De Nardi M, French E, Jones JB (2010) Why do the elderly save? The role of medical expenses. J Polit Econ 118(1):39–75
Ofstedal MB, Weir DR, Chen K-T, Wagner J (2011) Updates to HRS sample weights. Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI
Okun MA, Stock WA, Haring MJ, Witter RA (1984) Health and subjective well-being: a meta-analysis. Int J Aging Human Develop 19(2):111–132
Palumbo MG (1999) Uncertain medical expenses and precautionary saving near the end of the life cycle. Rev Econ Stud 66(2):395–421
Pashchenko S, Porapakkarm P (2013) Quantitative analysis of health insurance reform: separating regulation from redistribution. Rev Econ Dyn 16(3):383–404
Ridder G (1992) An empirical evaluation of some models for non-random attrition in panel data. Struct Chang Econ Dyn 3(2):337–355
Rockwood K, Mitnitski A (2007) Frailty in relation to the accumulation of deficits. J Gerontol: Ser A 62(7):722–727
Ruhm CJ (2000) Are recessions good for your health? Q J Econ 115(2):617–650
Siebert U, Alagoz O, Bayoumi AM, Jahn B, Owens DK, Cohen DJ, Kuntz KM (2012) State-transition modeling: a report of the ISPOR-SMDM modeling good research practices task force-3. Value Health 15(6):812–820
Small KA, Hsiao C (1985) Multinomial logit specification tests. Int Econ Rev 26(3):619–627
van Doorslaer E, Jones AM (2003) Inequalities in self-reported health: validation of a new approach to measurement. J Health Econ 22(1):61–87
Vijverberg WPM (2011) Testing for IIA with the Hausman-McFadden test. IZA Discussion Paper No. 5826
Wagstaff A, Van Doorslaer E (1994) Measuring inequalities in health in the presence of multiple-category morbidity indicators. Health Econ 3(4):281–291
Wallace RB, Herzog RA (1995) Overview of the health measures in the health and retirement study. J Human Resour 30(Supplement):S84–S107
Wilde J (2000) Identification of multiple equation probit models with endogenous dummy regressors. Econ Lett 69(3):309–312
Ziebarth N (2010) Measurement of health, health inequality, and reporting heterogeneity. Soc Sci Med 71(1):116–124
Funding
Not applicable. This study is not funded by any grant.
Ethics declarations
Conflict of interest
Juergen Jung declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We appreciate comments from Gerhard Glomm, Vinish Shrestha, Jialu Streeter, Pravin Trivedi, and an anonymous referee. This paper was formerly circulated as “Estimating Markov Transition Probabilities between Health States in the HRS Dataset”.
Cite this article
Jung, J. Estimating transition probabilities between health states using US longitudinal survey data. Empir Econ 63, 901–943 (2022). https://doi.org/10.1007/s00181-021-02157-6
DOI: https://doi.org/10.1007/s00181-021-02157-6
Keywords
- Lifecycle profiles of health transition probabilities
- Medical expenditure panel survey (MEPS)
- Health and retirement study (RAND-HRS)
- Health transition matrices
- Conditional health transition probabilities
- Markov property
- Age-time-cohort effects