Biases in health expectancies due to educational differences in survey participation of older Europeans: It’s worth weighting for

Health expectancies are widely used by policymakers and scholars to analyse the number of years a person can expect to live in good health. Their calculation requires life tables in combination with prevalence rates of good or bad health from survey data. The structure of typical survey data, however, rarely resembles the education distribution in the general population. Specifically, low-educated individuals are frequently underrepresented in surveys, which is crucial given the strong positive correlation between educational attainment and good health. This is the first study to evaluate if and how health expectancies for 13 European countries are biased by educational differences in survey participation. To this end, calibrated weights that consider the education structure in the 2011 censuses are applied to measures of activity limitation in the Survey of Health, Ageing and Retirement in Europe. The results show that health expectancies at age 50 are substantially biased by an average of 0.3 years when the education distribution in the general population is ignored. For most countries, health expectancies are overestimated; yet remarkably, the measure underestimates health for many Central and Eastern European countries by up to 0.9 years. These findings highlight the need to adjust for distortion in health expectancies, especially when the measure serves as a base for health-related policy targets or policy changes.


Introduction
Life expectancy continues to increase in Europe. We live longer, but do we live healthier? Answering this question is of utmost importance in the presence of demographic change. How long and how healthy we live is necessary information for public and private healthcare providers to plan health coverage and care services. Furthermore, policymakers are interested in the employability of older generations when adapting pension systems, in particular, when adjusting the retirement age. Whether we spend our additional life years in good or bad health is frequently analysed via health expectancy (HEX), an indicator that captures the number of years a person can expect to live in good health. This concept was developed half a century ago [1,2] and has garnered increasing attention from both scholars and policymakers. For example, the European Commission aims to add 2 years of healthy life for the average European between 2010/2011 and 2020 to improve the sustainability of the European social and healthcare systems [3,4]. Furthermore, many European governments use HEX to set health-related targets and make policy changes based on this measure [5].
HEX usually combines information on mortality with prevalence rates of good or bad health from survey data; therefore, it captures both the quantity and quality of additional life years. A key problem with this approach, however, is that survey participation is often selective and differs by individual characteristics such as gender, age and socio-economic status. A common deviation is that highly educated individuals are more likely to participate in surveys than loweducated individuals, leading to an overrepresentation of the highly educated among the respondents [6][7][8]. This mismatch is crucial given the strong positive correlation between educational attainment and good health [9][10][11][12]. Overrepresenting healthy, well-educated individuals in surveys makes countries appear to have healthier populations than is actually the case.
The aim of this study is to explore if and how HEX differs when the education structure in the general population is considered. For this purpose, prevalence rates of bad health from the Survey of Health, Ageing and Retirement in Europe (SHARE) for 13 European countries are adjusted with calibrated weights based on auxiliary information from censuses. Although there has been vast research on HEX, to the best of my knowledge, no previous work has addressed whether biases in the education composition distort the measure. Given the widespread use of HEX among scholars and health authorities, knowing the reliability of the indicator in the context of flawed survey data is pivotal. Moreover, this study contributes to the literature by illustrating how bias can be adjusted for when auxiliary information on the true population structure is available.

Educational attainment affects health
The positive correlation between educational attainment and good health is well established [9]. For example, the average life expectancy at birth of well-educated Europeans is 7 years higher than that of low-educated individuals [13]. Furthermore, low-educated persons report higher activity limitations [14] and higher levels of bodily pain [12]. This can be partially explained with economic rationales, such as the positive link between education and income or correlations between education and occupational choice [11]. Additionally, differences in health behaviour are potential drivers of the education gradient in health. On one hand, low-educated persons are more likely to smoke, drink heavily, and be obese than highly educated persons. On the other hand, they are less likely to use preventive care, drive safely, and live in safe houses [15]. While the positive relationship between socio-economic advantages and health is found throughout Europe, the magnitude of that correlation varies by gender and country. First, the education gradient is larger for men than for women in life expectancy [16] as well as in HEX [17]. Second, in Central and Eastern European (CEE) countries, highly educated individuals are much healthier than low-educated individuals; whereas the difference is small in, for example, Denmark [18]. While most social health inequalities among older Europeans are driven by current socio-economic conditions, childhood circumstances also add to the health differences between socio-economic groups [19].

Educational attainment affects survey participation
Educational attainment is associated not only with health but also with survey participation. Low-educated persons are frequently underrepresented in health surveys, for example, in Belgium [7,20], Denmark [21], and Finland, where the gap in survey participation between low-and welleducated individuals has substantially widened over time [6]. This violation of the "missing at random" assumption can be attributed to coverage errors, sampling errors, and non-response errors [22]. Coverage errors stem from the mismatch between the survey's target population and its sampling frame, for example, when phone registers serve as sampling frames, although low-educated persons are less likely to own phones than the highly educated. Sampling errors denote the gap between sampling frame and the sample, which emerges because not all individuals in the sampling frame can be surveyed due to time and money constraints. To account for the unequal selection probabilities of sample units, surveys frequently provide sampling weights. Finally, non-response errors stem from differences between the invited sample and the actual respondents.
The strong association between non-response and low education [23] can be explained by three channels [22]. First, low-educated persons are harder to contact due to their socio-demographic and social-environmental attributes. For example, they might have unstable life paths and are consequently more likely to change their address. Second, participation in surveys is usually voluntary and loweducated persons are more likely to refuse to participate than the highly educated. Finally, low-educated individuals may be less likely to provide the requested survey data for reasons such as being too sick to participate or because they are less aware of certain domains such as their health or financial situation.
Education is not the only characteristic corresponding with lower response rates. Gender and age also impact survey participation, which is why these variables are commonly considered in survey weights. Furthermore, characteristics such as race [24] and relationship status [8] are associated with non-response. This study, however, only focuses on education-related biases. First and foremost, education is a common proxy for socio-economic status that is rather stable over lifetime with relatively low measurement error. Furthermore, the education gradient in response behaviour is well established. Finally, register or census data on the education structure in the general population are more readily available than auxiliary information on other socioeconomic characteristics, making it more possible to compare the education distribution in the general population to that in the survey data.

Educational differences in survey participation bias the prevalence of good and bad health
In summary, highly educated individuals are, on average, healthier than low-educated individuals and are more likely to participate in surveys. Thus, both the variable of interest (health) and the likelihood to participate in a survey are influenced by educational attainment. When inferences about the health of the general population are made based on unweighted prevalence rates from such flawed surveys, the general population appears healthier than what is true in reality. For example, Van Der Heyden et al. [20] found that the prevalence of people with diabetes and asthma in Belgium is underestimated when the actual education distribution in the general population is not considered. In the Netherlands, education-related non-response leads to negative biases in the prevalence of low self-assessed health, smoking, alcohol intake, and low physical activity [25].

Prevalence of good or bad health is needed to calculate HEX
Prevalence rates of good or bad health are one of the main components needed when calculating HEX, which makes the education-related bias in survey data a major concern. Similar to life expectancy, HEX varies substantially among European countries and is particularly low in CEE countries [26]. Around 2010, HEX at birth was 70.1 years for Swedish men but only 52.6 years for Slovakian men. For women, HEX at birth ranged from 71.5 years in Malta to 52.7 years in Slovakia [27]. Overall, women live a larger proportion of their life disabled than men [28,29].
While life expectancy has clearly increased throughout Europe, evidence on HEX is less conclusive. The outcome depends on the health dimension that is considered [30] as well as the survey utilised [31]. Analysing 25 European countries between 2005 and 2010, [30] show that years in poor general self-rated health at age 65 decreased by 0.5 (1.1) years for men (women). By contrast, years with chronic morbidity increased at the same time and years without activity limitations remained stable. Analysing the latter separately for different countries, Jagger [32] found that HEX increased in some countries but decreased in others. In addition to differences in health measures, surveys, subpopulations and the relationship between mortality and morbidity, the lack of a consistent time trend in HEX might be partly explained by the small number of observations in the surveys utilised. Analysing prevalence by country, gender, and age requires sufficient numbers of observations in each country-gender-age cell. This is often not the case, especially at older ages. Consequently, prevalence rates based on these small cells are often not reliable and have large confidence intervals: the small cell sizes make it difficult to separate the signal from the noise.
Regardless of the evidence on the inadequate representation of the low-educated persons in surveys, studies typically do not adjust for prevalence rates of HEX. One explanation for this might be that auxiliary information on the actual education distribution in the general population is not readily available. Register data are only accessible for some European countries and censuses are only conducted with long time intervals. Yet whenever available, auxiliary data on the actual education distribution in the general population can be utilised to calibrate weights so that they account for deviations between the true distribution and the survey distribution.

Data
The following sections describe analyses of whether adjusting for the education structure in the general population changes the prevalence of bad health and consequently the HEX for European countries. The analyses rely on three different data sources. Auxiliary information that is expected to capture the actual education distribution in the general population is taken from Eurostat's Census database, which provides Population and Housing Censuses for Europe. These census data are used to generate calibrated weights via iterative proportional fitting (IPF). In addition, life tables from Eurostat [33] along with prevalence of bad health from SHARE are taken to compute HEX with Sullivan's method [2,34]. Analyses and comparisons of HEX in Europe are frequently based on SHARE [26,35,36] as well as on the European Statistics on Income and Living Conditions (EU-SILC) and on the European Health Interview Survey (EHIS). This analysis utilises SHARE, because its sampling and weighting procedure is well documented, thus enabling an exact replication of the calibration approach employed [37,38].

The Survey of Health, Ageing and Retirement in Europe (SHARE)
Prevalence rates of bad health are extracted from the fourth wave of SHARE, which was mainly conducted in 2011, and consequently corresponds with the census data [39][40][41][42]. Although some interviews took place in 2010 and 2012, 94% of all observations stem from 2011. In total, 16 European countries participated in the fourth wave; however, 3 of these countries do not provide reliable census data via Eurostat for the requested year (see "Eurostat data for poststratification weights and life tables"). Therefore, the analysis is restricted to 13 countries including Austria, Belgium,
The target population of SHARE consists of all noninstitutionalised individuals aged 50 and older who regularly live in the respective survey country and speak its language(s). Spouses of target individuals are included in the data regardless of their age; however, for this study, all individuals younger than 50 years old are excluded [42][43][44]. The remaining number of respondents lies between 1615 in Germany and 6754 in Estonia. Some countries only provide small numbers of observations per gender-age-education cell, especially at higher ages. Respondent numbers for Germany, Poland, and Portugal are particularly small: all three countries provide less than 2000 observations. Germany and Poland also have small respondent numbers at ages 50-54, because their panel was not refreshed since Wave 2 in 2007. Details on the number of respondents for each country are summarised in Appendix 1.1. All numbers for SHARE refer to the final set of respondents used for the calculations in this paper.
The survey is based on probability samples with close to full target population coverage for all countries, yet details regarding the sample design, in particular the sampling frame, vary by country (for an overview, see [38,43,44]). Respondents were surveyed in their homes by interviewers using computer-assisted personal interviews. For details on response rates, consult [44].
For the calibration of weights, information on the proportions of respondents by country, gender, age, and educational attainment is required. Educational attainment is split into three groups in accordance with the International Standard Classification of Education [45]. The "low-educated" group includes individuals whose educational attainment is lower secondary education and less. The "medium-educated" group includes individuals with upper secondary or postsecondary non-tertiary education. The "high-educated" group includes all individuals with higher than post-secondary non-tertiary education. A fourth category was added to capture all individuals with missing values in their education variable (2.2%). The education categories are directly comparable to the categories in the census data. By construction, country information has no missing values in SHARE. The gender variable also has no missing values. Age information is available for all observations save four individuals in Czechia, who are subsequently excluded. To calculate proportions in SHARE for IPF, age is grouped into 10-year age groups with 90 + serving as an open-ended category. Details regarding the survey proportions by country, gender, age, and education are presented in Appendix 1.1.
HEX in Europe is most commonly calculated based on the Global Activity Limitation Indicator (GALI) [5,27,46,47], making the health measure the obvious choice for this analysis. Moreover, evaluations show that GALI similarly measures function and disability across European countries [48,49], allowing cross-country comparisons. In particular, GALI is based on the reply to the following survey question: "For the past 6 months at least, to what extent have you been limited because of a health problem in activities people usually do?" The question is answered by each survey participant based on three categories: "severely limited", "limited but not severely", and "not limited". For the purpose of this study, GALI is dichotomised into a binary variable with (1) "severely limited" and (0) "not severely limited". Prevalence of bad health π is calculated by country, gender, and 5-year age group; 85 years of age serves as an open-ended category. In the final set of respondents, GALI has missing values for only 0.58% of the survey participants. Because there is no evidence of an education-related pattern in item nonresponse concerning GALI, this study only focuses on unit non-response.
GALI is a self-assessed health measure, and as such, is likely biased depending on the respondent's individual characteristics [50][51][52][53] and cultural background [54][55][56][57]. Low-educated survey respondents are particularly prone to misreporting their health. Some evidence suggests that loweducated individuals have the tendency to overestimate their physical health; whereas, highly educated persons tend to underestimate their physical health [57]. If that is the case, the bias in HEX that is associated with underrepresentation of low education could appear smaller than it actually is, because low-educated individuals are overstating their physical abilities. Furthermore, self-assessed measures are often upward biased at older ages [57,58], presumably due to peer effects [59]. Thus, as a robustness analysis, the prevalence of bad health is also estimated based on grip strength, a tested measure that is expected to be less biased by systematic misreporting. Despite GALI and grip strength measuring different health domains, additional calculations based on grip strength are expected to reveal if self-reported and tested health measures are equally biased by educational differences in survey participation.
Grip strength is primarily used to measure sarcopenia, the age-related decrease in muscle mass [60]. Furthermore, it is a strong predictor of mortality [61], mobility, and cognition [62]. While GALI only captures activity limitations, grip strength is often considered a proxy for overall health. In SHARE, grip strength is ascertained twice per hand for each participant via a handheld Smedley dynamometer (for details, see Ref. [63]). In accordance with the literature, the maximum of these four measurements is used for robustness analysis [61,63,64]. Grip strength is measured in kilograms, yet the calculation of HEX requires a binary outcome variable. Consequently, thresholds have to be applied, dividing the participants into groups of impaired and unimpaired. The European Working Group on Sarcopenia in Older People (EWGSOP) suggests cut-off values < 20 kg for women and < 30 kg for men to determine the onset of sarcopenia [60]. More recent evidence, however, suggests that such pragmatic thresholds do not fully capture critically weak hand grip [61]. Moreover, grip strength varies by factors such as body height and country of residence [63], implying that thresholds should be adapted accordingly. Because the purpose of this study is not to analyse grip strength as such, the pragmatic approach suggested by EWGSOP is deemed satisfactory. If the thresholds are indeed inaccurate, they would affect both the adjusted and unadjusted prevalence rates and, therefore, would not distort the results.

Eurostat data for post-stratification weights and life tables
The calibration of weights requires auxiliary information on the actual population structure. To this end, it is assumed that the auxiliary information captures the true structure in the population with respect to certain characteristics such as gender, age, and education. For this study, the European Population and Housing Censuses are utilised as auxiliary data [65]. Along with the National Statistical Institutes, Eurostat combined national censuses from 2011 for 32 European countries and structured them in a comparable manner. Sixteen of these countries overlap with the countries from SHARE Wave 4. Because the Netherlands, Sweden, and Switzerland show irregularities in the census data provided by Eurostat, these countries are not included in the current analysis, leaving a sample of 13 countries.
For each country, population totals by gender, age, and education for individuals over 50 years of age are extracted from the censuses. The totals are used as control totals when calibrating weights. Some countries have missing information on educational attainment, which is why four education categories are constructed. The education groups "low educated", "medium educated", and "high educated" are based on the same criterion as adopted in SHARE, which are described in "The Survey of Health, Ageing and Retirement in Europe (SHARE)". In addition, an education category denoted "unknown education" is created. Regarding gender and age, missing values are negligible, which is why this analysis is only based on the known population, and census cells for unknown gender and age are excluded. The census does not differentiate between institutionalised and non-institutionalised persons, which is why it is assumed that both groups are comparable. For details regarding the population proportions by country, gender, age, and education based on the censuses, consult Appendix 1.1.
In addition to prevalence rates, the calculation of HEX with Sullivan's method relies on life tables provided by Eurostat for 2011 [33]. They are prepared to resemble standard abridged period life tables by country, gender, and 5-year age group, with 85 + considered an open-ended category.

Education distribution in SHARE versus that in the censuses
By comparing the education distribution of participants in SHARE with that in the respective censuses, three country groups can be differentiated: countries for which SHARE data fit the education distribution in the population, country data in which highly educated individuals are overrepresented and low-educated individuals are underrepresented, and remarkably, country data in which this trend is reversed. Tables comparing the distributions can be found in Appendix 1.1.
The only two SHARE datasets resembling the education distribution in the population are those for Italy and Spain. The fit for Italy is close to perfect (Table 9). Spain shows slight deviations in the younger age groups, but overall achieves concordance between SHARE and the census (Table 13). Both countries have little variation in education within age groups. For example, the vast majority of the 70 + population is low educated. This pattern might explain the good fit with respect to the education distribution. However, Portugal also has little variation in education within age groups, but the education distribution in SHARE varies strongly from that in the census (Table 11). Hence, non-complex education distributions do not guarantee concordance between the education structure in surveys and the general population.
For most countries, high-educated individuals are overrepresented and low-educated individuals are underrepresented in SHARE. This pattern is in line with the literature discussed in "Background". The countries belonging to that category are Austria, Belgium, Denmark, Germany, Hungary, Portugal, and to a lesser extent France and Slovenia. The deviation is particularly strong in Denmark, where the proportions in SHARE differ from those in the census on average by 51% for men and 52% for women in the age group of 50-89 (Table 4).
Interestingly, three CEE countries show the opposite pattern. In Czechia, Estonia, and Poland, low-educated individuals are overrepresented in the survey. Deviations are minor for Estonia (Table 5) and Poland (Table 10). For Czechia, however, SHARE proportions deviate from the census by 95% for men and 38% for women on average (Table 3). While high-educated individuals are underrepresented in the Estonian and Polish data, medium-educated individuals are underrepresented in the Czech data. Overall, the findings presented in this subsection suggest a need for educationadjusted weights (EW) when making inferences based on survey data.

Method
To determine if distortions in the education distribution of survey data affect HEX, SHARE sampling design weights are adjusted via IPF so that the education structure in SHARE would match the education structure in the respective census. Following that, two sets of prevalence rates of severe activity limitations are computed. The first set π EW is calculated using EW; whereas the control set π RW uses standard weights without adjustment. Finally, Sullivan's method is applied to calculate HEX EW with education-adjusted prevalence rates and HEX RW with the unadjusted rates. Comparing the two sets of HEX reveals if and how the measure is biased by educational differences in survey participation.

Generating calibrated weights via IPF
Frequently, the proportions of certain characteristics in survey data deviate from the proportions of the same characteristics in the general population. Assuming that the distribution in the general population is known, calibrated weights can be generated for each survey respondent to account for these discrepancies. Calibrated weights are usually based on sampling design weights, which compensate for unequal selection probabilities of sample units, and in the case of SHARE, are provided with the survey data. They are defined as the inverse of the probability of being included in the sample. These design weights account for the unequal selection of sample units, but not for unit non-response [43].
A common method for calibrating sampling design weights is IPF, also known as raking. For this approach, marginal totals for each variable on which the weights are calibrated are taken from an auxiliary source that is assumed to capture the true distribution in the general population. When applying IPF, sampling design weights are iteratively modified by a multiplicative factor until convergence is achieved and the marginal totals of the adjusted weights conform to the corresponding marginal totals from the auxiliary source [66,67]. After the adjustment, groups that were formerly underrepresented have relatively larger weights; whereas groups that were formerly overrepresented have relatively smaller weights. Importantly, the original information provided by the sampling design weights is maintained, since the weights within a group increase proportionally. The empirical strategy of this study evolves around three different sets of calibrated weights, which are discussed in more detail below.

SHARE weights (SW)
SHARE provides its own set of calibrated weights to account for differences in response behaviour. However, their weights do not consider the education structure in the general population [38]. For the remainder of this paper, these weights are referred to as SHARE weights (SW). The SW are generated based on a calibration approach by Deville and Särndal [68], which is implemented using Stata's sreweight command by [69]. Control totals for the SW stem from the Eurostat regional database. The weights are calculated separately for each country, considering NUTS 1 regions as well as eight gender-age groups, with cutoffs at 50-59 years, 60-69 years, 70-79 years, and an open-ended category of 80 + years. In some countries, finer partitions are made below age 59 [37,38].

Replicated weights (RW)
In a first step, the SW are replicated; this second set of weights is referred to as replicated weights (RW). Using RW instead of SW ensures that differences between estimates with and without education-adjusted weights do not stem, for example, from methodological differences applied for SW and EW. The goal is for RW to be as close as possible to the SW. However, some amendments in the method are made, so that later, education could be added as an additional control total. First, control totals are used for each calibration variable separately, instead of cross-classification. For example, instead of using age-gender totals, separate totals for age and gender are applied. The rationale behind this modification in the method is that calibrated weights are generally less stable and less likely to converge when observations are thinly spread over the calibration cells [66]. Using separate totals increases the number of observations by calibration cell. As a second amendment, Stata's survwgt rake algorithm by [67] is used to generate the RW because it appears more robust than the sreweight command [70]. Third, control totals for NUTS 1 regions are not considered in this study, again, to increase the weight's stability. The control total was included for a robustness analysis but did not alter the results. Fourth, an additional age category of 80-89 years is included, making 90 + the open-ended category. Finally, the Eurostat regional database does not provide information by education, which is why the 2011 census is used for this paper instead. Although these five changes are made, prevalence rates calculated based on the SW are almost identical to those calculated based on the RW, which confirms the approach.

Education-adjusted weights (EW)
Following the replication of SW, the EW are calculated. They are identical to the RW, except that an additional control total for education is considered for the calibration. Hence, EW vary for each individual observation, depending on the individual's sampling design weight, gender, age, and educational attainment. In addition, the 2.2% of individuals with missing values for education receive a calibrated weight, since both the prevalence rates by education and the control totals include a category for "unknown education".
Weighted prevalence rates of bad health π are calculated based on RW (π RW ) and EW (π EW ). In particular, the prevalence rates for the main analysis are based on the binary GALI measures, and prevalence rates for the robustness analysis are based on dichotomised grip strength. The means are calculated separately by country, gender, and 5-year age group, which follows the most common approach to calculate HEX in Europe. Prevalence rates π RW and π EW based on GALI along with the confidence intervals are presented in Appendix 1.2.

Calculating HEX with Sullivan's method
HEX is computed by applying Sullivan's method [2,34]. According to the standard life table notation (e.g. [71]), let l x = number of survivors at exact age x (beginning of age interval i) L i = number of person-years lived in age interval i π i = prevalence of severe activity limitations in age interval i.
Then HEX at age x is calculated separately by country and gender as follows: where the 5-year age groups range from i = 0 to A. More specifically, prevalence rates π i are used to divide personyears lived according to the Eurostat life tables into years with and without severe activity limitations. Following that, HEX is calculated by dividing the number of individuals surviving to a certain age x by the total years lived healthily from age x onwards. Two sets of HEX are calculated. HEX EW is based on π EW , the prevalence of severe activity limitations in age interval i weighted with EW. HEX RW is based on π RW , the prevalence of severe activity limitations in age interval i weighted with RW. The bias in HEX due to the misrepresentation of educational groups in the survey is computed as the difference between HEX RW and HEX EW and denoted as ∆HEX. Confidence intervals around HEX RW , HEX EW and ∆HEX are approximated using the method suggested by [72].
An alternative to calculating HEX via Sullivan's method is the multistate life table method, which is sometimes said to be more accurate [73,74]; however, Mathers and Robine [75] report that differences between the two methods are small. Furthermore, Sullivan's method is the most common approach to calculate HEX in Europe for both health authorities and scholars, which makes the results of this study comparable.

Prevalence of bad health with and without adjusted weights
The differences between adjusted (π EW ) and unadjusted (π RW ) prevalence rates correspond to the deviation in education structure in SHARE from the census (see tables in Appendix 1.2). For Italy and Spain, π RW and π EW are rather similar. For all country datasets in which high-educated individuals are overrepresented and low-educated individuals are underrepresented, π RW is smaller than π EW , indicating a downward bias in mean activity limitation. This finding is in line with the evidence that education and good health are positively correlated. The size of the bias depends on the deviation between SHARE data and the census. It is minor for countries such as France, where the deviation is small: π RW at age 50 is 0.095 (0.097) for men (women) and π EW at age 50 is 0.105 (0.107) for men (women). Yet the bias is severe for countries such as Denmark, where the deviation is large: π RW at age 50 is 0.074 (0.076) for men (women) and π EW at age 50 is 0.107 (0.110) for men (women). For the three countries in which low-educated individuals are overrepresented, π RW is larger than π EW , indicating an upward-bias in mean activity limitation. Consequently, these countries appear healthier once the education structure in the general population is considered. The countries concerned are Czechia, Estonia, and Poland. The shift is most pronounced for Czechia, which is in line with the finding that the Czech SHARE data are particularly distorted.
Confidence intervals of π EW and π RW are mostly overlapping due to the small numbers of observations in the age-gender-education cells. For example, the male age group 90 + in Germany only consists of five men, and that in Slovenia consists of four men only. In Austria, the male age group 90 + consisted of 20 men, of which 7 are low educated, 6 are medium educated, 6 are high educated, and 1 has unknown education. While the aggregated data show a clear positive link between educational attainment and good health, the direction of the relationship between education and health in these small gender-age cells is sometimes the opposite. For example, the seven low-educated men in the Austrian 90 + group reported on average better health than the six high-educated men. Due to the reversal, prevalence of bad health is slightly lower for that group, once EW are applied. Given the small number of observations in certain cells and the subsequently large confidence intervals, HEX as well as differences in HEX have to be interpreted cautiously, especially for Portugal and Germany, where 1 3 confidence intervals are particularly large and no clear age gradient in severe activity limitations for men is visible.
Comparing prevalence rates based on grip strength measures with those based on GALI leads to similar findings as described above. Yet for most countries, the age gradient in bad health is steeper when measured via grip strength, so the prevalence of bad health at old age is usually higher. This finding could be explained with the evidence that participants rate their health relatively better at old age than at young age (see "The Survey of Health, Ageing and Retirement in Europe (SHARE)"). Most notably, Portuguese and German men show a clear age gradient in education when health is tested with grip strength, while no such age gradient is visible when health is measured with GALI. Figure 1 shows how HEX at age 50 is biased because of educational differences in survey participation. The bias is given in absolute years and the countries are ranked based on the average bias in all age groups. Results for German as well as Polish men are not shown, because small numbers of observations at young ages and subsequent large confidence intervals prevent a meaningful illustration and interpretation of the difference in HEX for those countries at ages 50-54. In addition to Fig. 1, HEX RW and HEX EW are presented in Appendix 1.2 for all age groups, along with the respective bias in absolute years denoted as ∆HEX and the proportional bias denoted as ∆%. Confidence intervals for HEX RW , HEX EW and ∆HEX are also provided in Appendix 1.2.

Bias in HEX
On average, HEX at age 50 is biased by 0.3 years, yet the deviation varies substantially between countries and genders. It is larger for women (0.4 years) than for men (0.2 years), presumably due to the higher life expectancy of women in general. For most parts, the bias resembles the deviations between SHARE and the census, and consequently, the deviation between π RW and π EW . As a result, HEX RW and HEX EW are similar for Italy and Spain, since SHARE mimics the censuses in those countries. At age 50, ∆HEX for Spanish men (women) is only − 0.04 (0.00) years. For Italian men (women), the bias is only − 0.07 (− 0.06) years. Overall, the deviations are even smaller at older ages.
By contrast, HEX at age 50 is upward-biased in countries for which high-educated persons are overrepresented in the SHARE data. This is the case for Belgium, Denmark, Austria, Germany, Hungary, France, and Slovenia. Without EW, these countries appear to have a healthier population than is actually the case. At age 50, the upward bias is largest for women in Belgium, where HEX is overestimated by 0.87 years or 3.5%. The opposite is true for Estonia, Czech Republic, and Poland, where low-educated individuals are overrepresented in the SHARE data. Consequently, these countries appear unhealthier than they actually are. At age 50, the downward bias is largest for Czech women, whose HEX is 0.85 years or 3.2% lower when the education structure in the general population is ignored. Since the bias has different magnitudes, and more importantly, different directions, it affects the country ranking of HEX. For example, Danish men aged 50 appear to have relatively high HEX without the EW (rank 4 of 13) but drop to the lower middle field (rank 7 of 13) when adjustments are made.
∆HEX mostly decreases with age, since life expectancy decreases with age. The proportional bias ∆%, however, remains stable over all age groups or decreases only slightly for the most part. Overall, the country and gender differences described for age 50 also hold for older age groups. Due to uncertainty in the data, however, some age groups in some countries (e.g., male age group 90 + in Austria) do not show the expected sign for ∆HEX. As indicated in the previous sections, the results for Germany and Portugal have to be treated especially carefully due to the small cell sizes. HEX at age 50 for Portuguese men appears to be severely underestimated, although the data clearly show that higheducated men are overrepresented in the Portuguese SHARE data (Table 11).
As a robustness analysis, HEX based on grip strength is also provided (Fig. 2). The overall bias appears smaller when the tested indicator is applied: average ∆HEX at age 50 is reduced to 0.17 years but is still larger for women (0.23 years) than for men (0.11 years). Even though the overall level of the bias is lower when grip strength is utilised, the general findings are confirmed. The bias is still negligible for Italy and Spain. The countries showing an upward bias based on GALI also show an upward bias based on grip strength; the same holds for all countries showing downward biases. Moreover, the inconsistencies in the Portuguese data disappear once grip strength is used. HEX at age 50 for both Portuguese men and women appears to be overestimated without the EW, just as expected when comparing the Portuguese SHARE data with the census. By contrast, the results for German women suggest an unexpected downward bias of HEX, albeit with a large confidence interval, which indicates once again that results based on small numbers of respondents must be handled with care.

Discussion
This study is the first to evaluate if HEX in Europe is biased by educational differences in survey participation. The analysis showed that SHARE data for 11 of the 13 countries analysed did not resemble the education structure in the general population. In most countries, high-educated individuals were overrepresented, leading to an upward bias in HEX by up to 0.87 years, because of the positive correlation between educational attainment and good health. Contrary to what is suggested in the literature, most CEE countries analysed showed the opposite pattern that high-educated individuals were less likely to participate in surveys. As a consequence, HEX was underestimated by up to 0.85 years in those countries. These biases are crucially important, especially since HEX is frequently used by health authorities to assess population health and to make comparisons between countries. Future studies could fruitfully explore this issue further by exploring the non-response related bias in HEX for other surveys such as EHIS and EU-SILC. Investigating EU-SILC is considered particularly relevant since the data are used to monitor the European Commission's aim to add 2 years of healthy life for the average European by 2020.
Related literature suggests that the biases are in fact larger and that the results ascertained in this study constitute a lower bound. First and foremost, this is because the loweducated individuals who participate in surveys are most likely healthier than the low-educated individuals who are not captured. Studies have shown that low-educated respondents have lower mortality [76], better self-reported health [77][78][79], and suffer less from psychosis [80] than low-educated non-respondents. Thus, being included in the survey is a collider that creates an artificial negative correlation between educational attainment and health. Importantly, this collider bias introduces an even larger bias for all countries in which high-educated persons are overrepresented.
In addition, measurement errors in education might increase the biases. For example, [81] found that a substantial proportion of Danish SHARE participants exaggerated their level of education, especially when they were low educated. If unhealthy low-educated individuals exaggerate their level of education, they artificially narrow the health gap between low-and high-educated participants, adding to the bias. Finally, the survival bias might increase the bias in HEX if unhealthier low-educated persons have higher mortality and consequently do not appear in the survey.
An important finding of this study was that, in contrast to common results from the literature, low-educated individuals are not necessarily more likely to be underrepresented in surveys than the highly educated. The education structures in the Italian and Spanish SHARE are almost identical to those in the respective censuses. Consequently, HEX appears to be unbiased for these countries. Potentially, this is because educational attainment hardly varies within age groups in both nations, making it easier to survey the "correct" distribution. However, Portugal has similar education patterns across age but a still highly biased HEX. What could also explain the good fit for Italy and Spain is that the effect of education on health appears to be weaker than that for other countries. Both nations are among the countries with the highest life expectancy in Europe [33], even though their overall level of education is low compared to Western and Northern European countries [65]. Moreover, the education gradient in life expectancy is very pronounced in most of Europe; yet interestingly, Italy was the only country in the sample in which life expectancy at age 50 was slightly lower for the highly educated (34.6 years) than for the medium educated (35.2 years) [13]. Unfortunately, Eurostat does not provide life expectancy by education for Spain, thereby preventing a comparison. [16] found similar results for Italian women during the 1990s, although not for men. The evidence suggests that the association between education and health might be weaker in both countries than in other European countries. If the link between education and survey participation is weaker too, this would be an additional explanation for their unbiased HEX.
The CEE countries Czechia, Estonia, and Poland also did not follow the expected pattern in terms of educational differences in survey participation. Contrary to what is generally found in the literature, high-educated individuals were underrepresented in all three countries, most profoundly so in Czechia. One explanation for this curious finding is that in all three countries, high-educated individuals are much more likely to keep working at older ages, presumably due to low pension replacement rates. This pattern holds for both men and women. For the age group of 65-74, Estonian academics had the highest employment rate in the sample (26.9%), followed by the highly educated in Czechia (20.5%), Italy (19.7%), and Poland (18.6%) [82]. As a result, the highly educated might be less likely to participate in surveys due to time constraints: when an interviewer knocks on their doors, they might simply be at work. A second, somewhat speculative, explanation for the low participation of high-educated individuals in Czechia, Estonia, and Poland could be related to trust or the lack thereof. It is well established that postcommunist societies in Europe have, on average, lower levels of trust in institutions [83] and lower levels of social trust [84]. If the highly educated were more distrustful than loweducated individuals, this could explain the participation pattern in the three countries. What contradicts this speculation is the fact that Slovenia is also a CEE country with a similar history. However, the Slovenian SHARE data follow the common pattern of too few low-educated respondents.
HEX is calculated by combining the prevalence of good and bad health from survey data with life tables. This study analysed how distortion in the education structure of surveys affects HEX via biases in prevalence rates. In addition, one could analyse whether educational differences in life expectancy also add to the bias. Due to data restrictions, it is commonly assumed that all educational groups share the same life expectancies when applying Sullivan's method. However, Eurostat data for a small sample of European countries show that all countries but Italy have a clear education gradient in life expectancy. The educational differences are most pronounced in the CEE countries, save Slovenia, and are weakest in the Nordic countries [13]. If and how these differences bias HEX in the context of distorted surveys cannot be said a priori, as the bias depends on the interactions between the education distribution in the general population and the education-related response behaviour in the respective country. Thus, this study only focused on distortions due to prevalence rates to stay within scope. Furthermore, this study evaluated HEX in its most common form, which is without education-specific mortality. However, future studies should explore how educational differences in life expectancy affect the bias in HEX, especially in countries where the education gradient in mortality is strong.
The main limitation of this paper is data driven. For most countries, SHARE captures non-institutionalised persons only. Since the census does not differentiate between institutionalised and non-institutionalised persons, it was assumed that both groups are comparable. If this assumption is violated due to educational differences between the two groups, prevalence rates based on EW might deviate from the prevalence rates for the general population.
Overall, the findings of this study highlight the need to account for distortions in the education structure of survey data. First and foremost, this can be achieved by preventing the misrepresentation of certain educational groups in the first place, and if prevention does not lead to accurate representation, by adjusting for deviations with survey methods such as calibrated weights. Literature has shown that survey modes [23], recruitment methods [85], interviewer experience, and the number of attempted contacts [22] affect survey participation and consequently might be helpful for counteracting heterogeneities in survey representation. However, past evidence has also revealed that response rates have declined over time [22], and that the gap in response behaviour between high-and low-educated individuals has increased [6]. If this pattern continues, survey methods that adjust for misrepresentation will become even more important in the future. Although auxiliary information on the education structure in the general population is not available for each European country at any given year, censuses might still allow for the calibration of weights since the education structure at old age changes slowly [86], or as Schumacher [87] puts it: "education does not 'jump'".

Conclusion
Survey participation differs substantially among educational groups, which leads to biased health expectancy (HEX) when the discrepancies are not accounted for. This study was the first to explore the magnitude and direction of the bias in HEX for 13 European countries based on the Survey of Health, Ageing and Retirement in Europe (SHARE) for 2011. To this end, calibrated weights were generated so that the education structure in SHARE would resemble that of the respective Population and Housing Census.
The analysis revealed that SHARE did not accurately resemble the education structure in the general population for 11 of the 13 countries investigated, which lead to substantial biases in HEX. In most of the datasets, high-educated individuals were overrepresented. Due to the positive correlation between educational attainment and good health, HEX was upward-biased for these countries by as much as 0.87 years. Remarkably, most CEE countries showed the opposite pattern that high-educated individuals were underrepresented. As a result, HEX was underestimated for these countries by up to 0.85 years.
Understanding the sensitivity of HEX measures is crucial because of their immense scientific and political influence. In the context of ever decreasing survey response rates, it is of utmost importance that the flawed education structure in survey data is prevented and adjusted for. Only then, it is possible to accurately assess policy targets based on HEX. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.