Reporting Heterogeneity and Health Disparities Across Gender and Education Levels: Evidence From Four Countries


I use anchoring vignettes from Indonesia, the United States, England, and China to study the extent to which differences in self-reported health across gender and education levels can be explained by the use of different response thresholds. To determine whether statistically significant differences between groups remain after adjusting thresholds, I calculate standard errors for the simulated probabilities, which have been largely ignored in the previous literature. Accounting for reporting heterogeneity reduces the gender gap in many health domains across the four countries, but to varying degrees. Health disparities across education levels persist and even widen after equalizing thresholds across the two groups.
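The two computations named in the abstract, equalizing response thresholds across groups and attaching standard errors to the resulting simulated probabilities, can be illustrated with a minimal sketch. It assumes a single-threshold ordered-probit reporting model (higher latent values mean worse health), made-up parameter values, and a Krinsky–Robb style parametric simulation with independent draws. None of the names or numbers come from the article, and the article's own HOPIT specification and standard-error procedure may differ.

```python
import random
from statistics import NormalDist, stdev

PHI = NormalDist().cdf  # standard normal CDF


def category_probs(mu, cuts):
    """Ordered-probit probabilities P(y = k) for k = 0..len(cuts).

    mu   -- mean of the latent severity index (higher = worse health)
    cuts -- increasing response thresholds (cut points)
    """
    bounds = [float("-inf"), *cuts, float("inf")]
    return [PHI(hi - mu) - PHI(lo - mu) for lo, hi in zip(bounds, bounds[1:])]


def gender_gap(p, common_thresholds):
    """Male minus female probability of the healthiest answer (category 0).

    With common_thresholds=True, women are given the men's first threshold,
    so the remaining gap reflects only differences in latent health.
    """
    cut_f = p["cut0_m"] if common_thresholds else p["cut0_f"]
    p_male = category_probs(p["mu_m"], [p["cut0_m"]])[0]
    p_female = category_probs(p["mu_f"], [cut_f])[0]
    return p_male - p_female


def simulated_se(point, se, stat, draws=2000, seed=1):
    """Simulation SE: redraw parameters around the estimates, recompute stat.

    Simplification: parameters are drawn independently (diagonal covariance);
    a real application would draw from the full estimated covariance matrix.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        theta = {k: rng.gauss(v, se[k]) for k, v in point.items()}
        values.append(stat(theta))
    return stdev(values)


# Hypothetical estimates: women have slightly worse latent health and a lower
# first threshold, i.e. they describe a given health state as a problem sooner.
est = {"mu_f": 0.10, "mu_m": 0.00, "cut0_f": 0.20, "cut0_m": 0.50}
ses = {"mu_f": 0.04, "mu_m": 0.04, "cut0_f": 0.05, "cut0_m": 0.05}

for common in (False, True):
    gap = gender_gap(est, common)
    se = simulated_se(est, ses, lambda p: gender_gap(p, common))
    print(f"common thresholds={common}: gap={gap:.3f} (SE {se:.3f})")
```

With these invented numbers the gap in the probability of the healthiest answer falls from about 0.15 to about 0.04 once women are assigned the men's threshold; the simulated standard error is what allows a judgment about whether the remaining gap is distinguishable from zero.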


Fig. 1


  1. Mortality selection is one potential reason for the gender paradox, but Strauss et al. (1993) found that adjusting for it reduces but does not eliminate the gender gap in self-reported health. Case and Paxson (2005) found evidence that men and women face different distributions of chronic conditions and that, for some conditions, severity is worse for men than for women. Together, these two findings help explain why women, who are afflicted with more chronic conditions that are less fatal, may report worse health yet still live longer than men.

  2. See Cutler and Lleras-Muney (2006) and Grossman (2006) for reviews of the theory and empirical evidence and Vogl (2014) for a review specifically for developing countries.

  3. General self-reported health is an answer to the question, “In general, how healthy do you feel?” I use domain-specific and not general self-reported health in this article because the standard vignettes have been designed for domain-specific health.

  4. Idler and Benyamini (1997) reviewed 27 studies conducted in eight countries. With remarkable consistency, these studies showed that the coefficient on general self-rated health in regressions of mortality remains significant even when other covariates and health status indicators are included. A more recent meta-analysis by DeSalvo et al. (2006) found that individuals who reported being in “poor” health had almost double the mortality risk of those who reported being in “excellent” health. This calculation included studies that controlled for various covariates, such as age and socioeconomic status (SES).

  5. A test question exhibits differential item functioning (DIF) if two people of the same ability but from different groups (races or genders, for example) have different probabilities of answering it correctly.

  6. This vignette is from the cognition domain and is used in all four data sets in this article. See Online Resource 1 (section A1) for the complete list of vignettes.

  7. For example, King et al. (2004) used vision tests to validate subjective scale questions about vision impairment, and van Soest et al. (2011) used actual counts of alcoholic drinks to validate subjective questions about the severity of drinking problems.

  8. See the appendix for more detail.

  9. The HRS, ELSA, and CHARLS are all aging data sets focused on the elderly, whereas the IFLS is a household survey that interviews all members of each sampled household. The vignettes in the IFLS, however, were administered only to respondents aged 40 and older.

  10. See Molina (2014).

  11. The vignettes in the IFLS are grouped by domain and within each domain appear to be ordered with the least severe vignettes at the beginning and the most severe at the end. For most domains, the ordering is quite clear, while domains like cognition and sleep are more open to interpretation. However, the data confirm that the relative severity perceived by IFLS respondents is consistent with the ordering of vignettes in the interview.

  12. In these three exceptions, the differences in average ratings are very small in magnitude. Note that my arbitrarily chosen ordering is irrelevant to the estimation of the model because the $\theta_{li}$, which capture the actual ordering of perceived severity, are directly estimated.

  13. In Online Resource 1, I estimate both an ordered probit model and a HOPIT model on the entire IFLS sample to illustrate the importance of accounting for reporting heterogeneity. For pooled analyses of the HRS, ELSA, and CHARLS vignettes, see Dowd and Todd (2011), Bago d’Uva et al. (2011), and Mu (2014), respectively. Dowd and Todd (2011) and Bago d’Uva et al. (2011) used the same data I use here, whereas Mu (2014) used the pilot wave of the CHARLS. I use a slightly different specification from these studies.

  14. A specific example is discussed in more detail in Online Resource 1, section A4.1.

  15. Although hypertension itself may not result in more pain, related conditions, such as obesity or inactivity, might. I thank an anonymous reviewer for making this point.

  16. See Dowd and Todd (2011) for a more detailed discussion.

  17. Although some studies have been able to include vignette gender as a variable in the vignette latent variable equation, I do not have this information for all four data sets.

  18. These heterogeneous reporting styles are likely related to the tendency of women to incorporate a wider range of nonphysical factors into self-reports (Benyamini et al. 2000) or societal expectations that consider males the tougher gender (Courtenay 2000).
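Footnotes 11 and 12 describe checking whether respondents' average vignette ratings follow the order in which the vignettes appear in the interview. A minimal sketch of that check, assuming ratings are stored per vignette as integers where a higher value means a more severe answer; the vignette labels and ratings are hypothetical.

```python
from statistics import mean


def ordering_violations(ratings, assumed_order):
    """Adjacent vignette pairs whose mean ratings contradict assumed_order.

    ratings       -- dict: vignette id -> list of integer ratings
                     (higher = more severe answer)
    assumed_order -- vignette ids listed from least to most severe
    """
    means = {v: mean(r) for v, r in ratings.items()}
    return [
        (less, more)
        for less, more in zip(assumed_order, assumed_order[1:])
        if means[less] > means[more]  # "less severe" vignette rated worse on average
    ]


# Hypothetical ratings for one domain's three vignettes.
sleep = {"S1": [1, 2, 1, 2], "S2": [3, 2, 3, 3], "S3": [4, 5, 4, 3]}
print(ordering_violations(sleep, ["S1", "S2", "S3"]))  # → []
```

Because the model estimates each vignette's location directly, a pair flagged here does not affect estimation; the check only confirms that the assumed ordering matches respondents' perceptions.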


  1. Angelini, V., Cavapozzi, D., & Paccagnella, O. (2011). Dynamics of reporting work disability in Europe. Journal of the Royal Statistical Society: Series A (Statistics in Society), 174, 621–638.


  2. Bago d’Uva, T., Lindeboom, M., O’Donnell, O., & Van Doorslaer, E. (2011). Slipping anchor? Testing the vignettes approach to identification and correction of reporting heterogeneity. Journal of Human Resources, 46, 875–906.


  3. Bago d’Uva, T., O’Donnell, O., & van Doorslaer, E. (2008a). Differential health reporting by education level and its impact on the measurement of health inequalities among older Europeans. International Journal of Epidemiology, 37, 1375–1383.


  4. Bago d’Uva, T., Van Doorslaer, E., Lindeboom, M., & O’Donnell, O. (2008b). Does reporting heterogeneity bias the measurement of health disparities? Health Economics, 17, 351–375.


  5. Benyamini, Y., Leventhal, E. A., & Leventhal, H. (2000). Gender differences in processing information for making self-assessments of health. Psychosomatic Medicine, 62, 354–364.


  6. Case, A., & Paxson, C. (2005). Sex differences in morbidity and mortality. Demography, 42, 189–214.


  7. Courtenay, W. H. (2000). Constructions of masculinity and their influence on men’s well-being: A theory of gender and health. Social Science & Medicine, 50, 1385–1401.


  8. Crimmins, E. M., Kim, J. K., & Solé-Auró, A. (2010). Gender differences in health: Results from SHARE, ELSA and HRS. European Journal of Public Health. doi:10.1093/eurpub/ckq022


  9. Cutler, D. M., & Lleras-Muney, A. (2006). Education and health: Evaluating theories and evidence (NBER Working Paper No. 12352). Cambridge, MA: National Bureau of Economic Research.

  10. DeSalvo, K. B., Bloser, N., Reynolds, K., He, J., & Muntner, P. (2006). Mortality prediction with a single general self-rated health question. Journal of General Internal Medicine, 21, 267–275.


  11. Dow, W. H., Gertler, P., Schoeni, R. F., Strauss, J., & Thomas, D. (1997). Health care prices, health and labor outcomes: Experimental evidence (Labor and Population Program Working Paper Series 97-01 No. DRU-1588-NIA). Santa Monica, CA: RAND.

  12. Dowd, J. B., & Todd, M. (2011). Does self-reported health bias the measurement of health inequalities in U.S. adults? Evidence using anchoring vignettes from the Health and Retirement Study. Journals of Gerontology, Series B: Psychological Sciences and Social Sciences, 66, 478–489.

  13. Finkelstein, A., Taubman, S., Wright, B., Bernstein, M., Gruber, J., Newhouse, J. P., . . . Baicker, K. (2012). The Oregon Health Insurance Experiment: Evidence from the first year. Quarterly Journal of Economics, 127, 1057–1106.

  14. Gertler, P., & Gruber, J. (2002). Insuring consumption against illness. American Economic Review, 92, 51–70.


  15. Grol-Prokopczyk, H. (2014). Age and sex effects in anchoring vignette studies: Methodological and empirical contributions. Survey Research Methods, 8(1), 1–17.


  16. Grol-Prokopczyk, H., Freese, J., & Hauser, R. M. (2011). Using anchoring vignettes to assess group differences in general self-rated health. Journal of Health and Social Behavior, 52, 246–261.


  17. Grol-Prokopczyk, H., Verdes-Tennant, E., McEniry, M., & Ispány, M. (2015). Promises and pitfalls of anchoring vignettes in health survey research. Demography, 52, 1703–1728.


  18. Grossman, M. (2006). Education and nonmarket outcomes. Handbook of the Economics of Education, 1, 577–633.


  19. Health and Retirement Study (HRS). (2014). Public use dataset. Produced and distributed by the University of Michigan with funding from the National Institute on Aging (Grant No. NIA U01AG009740). Ann Arbor: University of Michigan.

  20. Idler, E. L., & Benyamini, Y. (1997). Self-rated health and mortality: A review of twenty-seven community studies. Journal of Health and Social Behavior, 38, 21–37.


  21. Kapteyn, A., Smith, J. P., & van Soest, A. (2007). Vignettes and self-reports of work disability in the United States and the Netherlands. American Economic Review, 97, 461–473.


  22. Kapteyn, A., Smith, J. P., & van Soest, A. (2010). Life satisfaction. In E. Diener, D. Kahneman, & J. Helliwell (Eds.), International differences in well-being (pp. 70–104). New York, NY: Oxford University Press.


  23. King, G., Murray, C. J. L., Salomon, J. A., & Tandon, A. (2004). Enhancing the validity and cross-cultural comparability of measurement in survey research. American Political Science Review, 98, 191–207.

  24. Lei, X., Smith, J. P., Sun, X., & Zhao, Y. (2013). Gender differences in cognition in China and reasons for change over time: Evidence from CHARLS (IZA Discussion Paper No. 7536). Bonn, Germany: Institute for the Study of Labor.

  25. Maccini, S., & Yang, D. (2009). Under the weather: Health, schooling, and economic consequences of early-life rainfall. American Economic Review, 99, 1006–1026.


  26. Macintyre, S., Ford, G., & Hunt, K. (1999). Do women over-report morbidity? Men’s and women’s responses to structured prompting on a standard question on long standing illness. Social Science & Medicine, 48, 89–98.


  27. Manning, W. G., Newhouse, J. P., Duan, N., Keeler, E. B., Leibowitz, A., & Marquis, M. S. (1987). Health insurance and the demand for medical care: Evidence from a randomized experiment. American Economic Review, 77, 251–277.


  28. Marmot, M., Oldfield, Z., Clemens, S., Blake, M., Phelps, A., Nazroo, J., . . . Banks, J. (2014). English Longitudinal Study of Ageing: Waves 0-6, 1998–2013 [UK Data Archive]. Retrieved from

  29. Molina, T. (2014). Adjusting for heterogeneous response thresholds in cross-country comparisons of mid-aged and elderly self-reported health. Unpublished manuscript, Department of Economics, University of Southern California, Los Angeles, CA.

  30. Mu, R. (2014). Regional disparities in self-reported health: Evidence from Chinese older adults. Health Economics, 23, 529–549.


  31. Nathanson, C. A. (1975). Illness and the feminine role: A theoretical review. Social Science & Medicine (1967), 9(2), 57–62.


  32. Strauss, J., Gertler, P. J., Rahman, O., & Fox, K. (1993). Gender and life-cycle differentials in the patterns and determinants of adult health. Journal of Human Resources, 28, 791–837.


  33. Strauss, J., Witoelar, F., Sikoki, B., & Wattie, A. M. (2009). The Fourth Wave of the Indonesian Family Life Survey (IFLS4): Overview and field report (Labor and Population Working Paper No. WR-675/1-NIA/NICHD). Santa Monica, CA: RAND.

  34. van Soest, A., Delaney, L., Harmon, C., Kapteyn, A., & Smith, J. P. (2011). Validating the use of anchoring vignettes for the correction of response scale differences in subjective questions. Journal of the Royal Statistical Society: Series A (Statistics in Society), 174, 575–595.


  35. Verbrugge, L. M. (1989). The twain meet: Empirical explanations of sex differences in health and mortality. Journal of Health and Social Behavior, 30, 282–304.


  36. Vogl, T. S. (2014). Education and health in developing economies. In A. J. Culyer (Ed.), Encyclopedia of health economics (1st ed., pp. 246–249). Boston, MA: Newnes.


  37. Zhao, Y., Hu, Y., Smith, J. P., Strauss, J., & Yang, G. (2012). Cohort profile: The China Health and Retirement Longitudinal Study (CHARLS). International Journal of Epidemiology, 43, 61–68.


  38. Zhao, Y., Strauss, J., Yang, G., Giles, J., Hu, P., Hu, Y., . . . Wang, Y. (2013). China Health and Retirement Longitudinal Study – 2011–2012 National baseline users guide. Beijing, China: National School of Development, Peking University.



I am grateful to John Strauss, Arie Kapteyn, Jeff Nugent, Geert Ridder, and Hashem Pesaran for invaluable feedback and guidance throughout various stages of this project. Many thanks to USC seminar participants and attendees of the 2015 annual meeting of the Population Association of America for helpful comments. I acknowledge funding from the USC Provost’s Ph.D. Fellowship, the USC Dornsife INET graduate student fellowship, and the Oakley Endowed Fellowship.

Author information



Corresponding author

Correspondence to Teresa Molina.

Electronic supplementary material


Online Resource 1

(PDF 340 kb)

Appendix: Description of Data Sets


Indonesian Family Life Survey (IFLS)

I use the 2007 wave of the IFLS (IFLS4), an ongoing longitudinal household survey of individuals in 13 of the 27 Indonesian provinces that is representative of 80 % of the Indonesian population. This article uses information from the individual-level demographic and health status modules. IFLS4 also randomly selected 2,500 households to participate in the health vignette module. In selected households, all adults over 40 were asked the six domain-specific health questions. Crucially, the IFLS included three anchoring vignettes per health domain in addition to these self-reports. Although all vignette households were asked all of the self-reported health questions, time constraints meant that each vignette household was assigned to respond to anchoring vignettes for only two randomly chosen domains of the six, leaving between 1,100 and 1,300 individuals per domain. During the interview, the interviewers read aloud a vignette like the one described in the Anchoring Vignettes section (see Online Resource 1 for a list of all the vignettes). The interviewers then repeated the domain-relevant question from the list of self-report questions in the Data section, replacing the word “you” with the name of the hypothetical vignette person. The gender of the hypothetical individuals, conveyed through their names, was randomized at the household level. Answers to the health status questions and anchoring vignettes form the outcome variables of interest for this analysis.
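The IFLS design randomizes at the household level: each of the 2,500 vignette households answers vignettes for two of the six domains, and the gender of the vignette persons is drawn once per household. The minimal simulation below sketches that assignment with Python's standard library; the function name and seed are hypothetical, and the text does not say how the IFLS implemented the draw. With 2,500 households, each domain is expected to be drawn by about 2,500 × 2 / 6 ≈ 833 households, so the reported 1,100 to 1,300 respondents per domain implies somewhat more than one eligible adult per household on average.

```python
import random
from collections import Counter

# Pain, cognition, affect, and sleep are named in the text; "mobility" and
# "breathing" are assumed labels for the remaining two domains.
DOMAINS = ["mobility", "pain", "cognition", "sleep", "breathing", "affect"]


def assign_vignette_modules(household_ids, seed=2007):
    """Randomly give each household two of the six domains and one vignette gender.

    Every eligible member of a household answers the same two domains, and
    all vignette names in that household share the same gender.
    """
    rng = random.Random(seed)
    return {
        hh: {
            "domains": rng.sample(DOMAINS, 2),  # two distinct domains
            "vignette_gender": rng.choice(["female", "male"]),
        }
        for hh in household_ids
    }


plan = assign_vignette_modules(range(2500))
per_domain = Counter(d for hh in plan.values() for d in hh["domains"])
print(per_domain)  # expected value per domain: 2,500 * 2 / 6, about 833 households
```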

To facilitate comparisons with the other three data sets, I deliberately restrict the explanatory variables to a simple set: gender, age, and education level. Specifically, I create one dummy variable for males, one for high school graduates, and one for those who completed primary but not high school.
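The IFLS covariates described above reduce to a male dummy, age, and two education dummies, with respondents who did not complete primary school as the omitted category. A minimal sketch, assuming education is available as a recoded highest-level label; the level names and the decision to count college graduates as high school graduates are assumptions, not the survey's own coding.

```python
# Hypothetical coding of highest completed level, lowest first.
IFLS_LEVELS = ["less_than_primary", "primary", "junior_high", "high_school", "college"]


def ifls_covariates(sex, age, level):
    """Return [male, age, high_school_grad, primary_not_high_school].

    The omitted education category is "did not complete primary school".
    Treating college graduates as high school graduates is an assumption.
    """
    rank = IFLS_LEVELS.index(level)
    hs_rank = IFLS_LEVELS.index("high_school")
    male = int(sex == "male")
    high_school = int(rank >= hs_rank)
    primary_only = int(IFLS_LEVELS.index("primary") <= rank < hs_rank)
    return [male, age, high_school, primary_only]


print(ifls_covariates("female", 52, "junior_high"))  # → [0, 52, 0, 1]
```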

Health and Retirement Study (HRS)

Since 1992, the HRS has interviewed a representative sample of Americans older than 50, reinterviewing the original sample and adding new cohorts every two years. In 2007, an “off-year” between two main interview years, the Disability Vignette Study (DVS) was sent as a mail survey to a subsample, of whom 81.7 % (more than 4,000) responded. This survey included the same anchoring vignettes for the same six domains found in the IFLS vignette module, except with American rather than Indonesian names. Unlike in the IFLS, two versions of the questionnaire were used; they ordered the questions differently and assigned different genders to the hypothetical individuals.

I combine data from this off-year study with data from the most recent main survey prior to it, which took place in 2006. From the 2006 interviews, I obtain the basic explanatory variables: age, gender, and educational attainment. Because the vast majority of HRS respondents are high school graduates, I use college graduation as my “high-education” group and high school graduates (who have not completed college) as my “medium-education” group.
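Combining the 2007 DVS with the 2006 core interview is a respondent-level inner join. A minimal sketch, assuming both files have already been loaded into Python dictionaries keyed by the HRS household and person identifiers (HHID, PN); the record fields and identifier values below are invented for illustration.

```python
def merge_hrs_files(core_2006, dvs_2007):
    """Attach 2006 covariates to 2007 DVS responses (inner join).

    Both arguments are dicts keyed by (HHID, PN). DVS respondents without a
    2006 record are dropped; field names inside the records are hypothetical.
    """
    merged = {}
    for key, answers in dvs_2007.items():
        covariates = core_2006.get(key)
        if covariates is not None:
            merged[key] = {**covariates, **answers}
    return merged


core = {("010001", "010"): {"age": 67, "male": 1, "college": 0, "high_school": 1}}
dvs = {
    ("010001", "010"): {"pain_self": 2, "pain_v1": 1},
    ("020002", "020"): {"pain_self": 1, "pain_v1": 2},  # no 2006 record
}
print(merge_hrs_files(core, dvs))
```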

English Longitudinal Study of Ageing (ELSA)

Similar to the HRS, the ELSA is a longitudinal panel of individuals aged over 50 living in England (Marmot et al. 2014). Since 2002, the representative sample, initially drawn from the Health Survey for England, has been reinterviewed every two years. The ELSA sample was also refreshed at Waves 3, 4, and 6. I use data from the third wave, collected during 2006 and 2007, which included self-completion vignette questionnaires that were handed out to a randomly selected one-third of the sample (and completed by almost 2,500 individuals). Individuals were asked to rate their own health in the six domains and then to respond to the same vignettes found in the IFLS and HRS. Unlike the other data sets, which randomized the genders of vignette individuals in varying ways, the ELSA had only one version of the questionnaire, which assigned the same names (and thus genders) to the same questions for all respondents. The vignette genders alternated throughout the questionnaire, with half of the vignette individuals given female names and the other half male names.

Along with respondent age and gender, I use educational qualifications as my education variable because precise years of schooling are not recorded in this survey. The “high-education” category includes those who have received their A-levels or higher, while the “medium-education” category includes all qualifications lower than A-levels. This leaves those with no qualifications as the low-education group.

China Health and Retirement Longitudinal Study (CHARLS)

Finally, I also use data from the first wave of the CHARLS, conducted in 2011 (Zhao et al. 2013). Like the HRS and ELSA, the CHARLS has interviewed a representative sample of over 17,000 Chinese residents aged 45 and older and plans to follow up with the respondents every two years. The CHARLS is one of very few Chinese surveys that include domain-specific self-reports and vignette questions, which are asked as part of the full in-person interview for a random subsample of households. As in the IFLS, each vignette household is randomly assigned to two of the six domains, resulting in around 1,100 to 1,300 respondents per domain. The genders of the hypothetical individuals are also randomized at the household level.

Because high school graduation rates for this sample are so low (less than 10 %), I use junior high school completion as my “high-education” cutoff and primary school completion as the boundary between the medium- and low-education groups.
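Because each survey draws its education cutoffs at a different point, harmonizing them amounts to a per-survey lookup. The sketch below encodes the cutoffs described in this appendix; the level labels are hypothetical recodings, not the surveys' actual variable codes.

```python
# Attainment levels per survey, lowest first (hypothetical recoded labels).
LEVELS = {
    "IFLS": ["less_than_primary", "primary", "junior_high", "high_school", "college"],
    "HRS": ["less_than_high_school", "high_school", "some_college", "college"],
    "ELSA": ["no_qualification", "below_a_level", "a_level", "above_a_level"],
    "CHARLS": ["less_than_primary", "primary", "junior_high", "high_school", "college"],
}

# (lowest level counted as "medium", lowest level counted as "high")
CUTOFFS = {
    "IFLS": ("primary", "high_school"),
    "HRS": ("high_school", "college"),
    "ELSA": ("below_a_level", "a_level"),
    "CHARLS": ("primary", "junior_high"),
}


def education_group(survey, level):
    """Map a survey-specific attainment level to "low", "medium", or "high"."""
    order = LEVELS[survey]
    rank = order.index(level)
    medium_start, high_start = (order.index(cut) for cut in CUTOFFS[survey])
    if rank >= high_start:
        return "high"
    if rank >= medium_start:
        return "medium"
    return "low"


print(education_group("CHARLS", "junior_high"))  # → high
```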

Self-report Distributions

Figures 2 and 3 explore within-country differences across gender and education. Figure 2 depicts the distribution of self-report responses by gender for each data set separately. On each domain graph, I report the p value corresponding to the Pearson chi-squared statistic for the test of the null hypothesis that the distribution of the responses is the same for males and females. In the IFLS and CHARLS, for pain, cognition, affect, and sleep, males and females have significantly different self-report distributions, with males disproportionately falling in the healthiest category.
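The p values reported on each domain graph come from Pearson chi-squared tests of whether the response distribution is the same in both groups. A minimal standard-library sketch, assuming a five-category response scale and made-up counts. With two groups and five categories the test has (2 - 1)(5 - 1) = 4 degrees of freedom, and for an even number of degrees of freedom $k$ the upper-tail probability has the closed form $e^{-x/2}\sum_{i=0}^{k/2-1}(x/2)^i/i!$, so no statistics library is needed here; in practice a routine such as SciPy's `chi2_contingency` would be the usual choice.

```python
from math import exp, factorial


def pearson_chi2(table):
    """Pearson chi-squared test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom). Response categories nobody chose
    are dropped first, since their expected counts would be zero; every row
    (group) is assumed to contain at least one response.
    """
    columns = [col for col in zip(*table) if sum(col) > 0]
    rows = [list(r) for r in zip(*columns)]
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in columns]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(columns) - 1)


def chi2_pvalue_even_df(stat, df):
    """Upper-tail p-value via the closed form, valid only for even df."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = stat / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))


# Hypothetical counts over five answers, healthiest first: males, then females.
counts = [[120, 60, 30, 10, 5], [100, 80, 45, 15, 5]]
stat, df = pearson_chi2(counts)
print(f"chi2={stat:.2f}, df={df}, p={chi2_pvalue_even_df(stat, df):.3f}")
```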

Fig. 2

Distribution of self-reports by gender

In the HRS, male and female distributions in the cognition, affect, and sleep domains are significantly different. In the ELSA, the domains that exhibit significant gender differences are pain, sleep, and affect. Like in the IFLS and CHARLS, males fall into the healthiest category more often than females.

Figure 3 shows even starker differences in the distributions of self-reports, this time between high-education and lower-education groups (for which I pool the medium- and low-education categories). In virtually all domains in all four samples (with the exception of cognition and affect in the IFLS), the distributions are significantly different, with the higher-education group disproportionately represented in the healthiest categories.

Fig. 3

Distribution of self-reports by education


Cite this article

Molina, T. Reporting Heterogeneity and Health Disparities Across Gender and Education Levels: Evidence From Four Countries. Demography 53, 295–323 (2016).



  • Self-reported health
  • Anchoring vignettes
  • Gender health disparities
  • SES–health disparities