1 Introduction

In an ageing society, the well-being of the elderly is at the top of the public policy agenda. Considering the demographic situation of “Old Europe”, a region characterized by a steadily increasing life expectancy (from 70 to 79 years between 1960 and 2008), an almost null birth rate and the highest proportion of people over the age of 65 (Eurostat 2005), it is easy to understand the growing interest of both policy makers and social scientists in the conditions of older persons. In economics, measures of the well-being of the elderly have focused mainly on observable and objective economic factors, such as income and wealth. While there is no doubt about the importance of these studies, income and wealth may not be accurate proxies for individuals’ overall welfare, which also appears to be significantly influenced by numerous non-economic factors.

A widely recognized way to collect data on subjective well-being and to test the validity of economic models is to ask individuals directly to self-report their level of life satisfaction on an ordered scale. Given the individuals’ answers, social scientists seek to identify the relation between these self-assessed measures and both economic and non-economic factors. As a consequence, economists, psychologists and sociologists have developed a growing interest in the measurement and determinants of life satisfaction in later life (see Frey and Stutzer 2002a, b; van Praag and Ferrer-i-Carbonell 2004; Bruni and Porta 2005; Dolan et al. 2008 for surveys).

A crucial methodological issue that limits the validity of these studies is that, when asked to self-report their life satisfaction, individuals may adopt different benchmarks or scales in evaluating themselves (van Praag 1971; Winkelmann and Winkelmann 1998; Clark and Oswald 2002; Ferrer-i-Carbonell and Frijters 2004; Senik 2004; Clark and Lelkes 2005). This phenomenon is known as differential item functioning (henceforth DIF). In a nutshell, the determinants of the level of life satisfaction reported by respondents might also influence how they perceive the thresholds of the survey question. As a consequence, estimates are biased and do not reflect the genuine relationship between covariates and respondents’ self-assessed life satisfaction. Scale biases may arise from differences in demographic characteristics, socio-economic conditions (Frey and Stutzer 2002b) and cultural connotations (Uchida et al. 2004; Diener and Suh 2000; Inglehart and Klingemann 2000; Jürges 2007). In particular, when focusing on older people’s responses, a reasonable conjecture is that scale biases are associated with their deteriorating health (Rowe and Kahn 1998; Atchley 1999; Hilleras et al. 2001a, b; Easterlin 2003) and physical limitations (Dulin and Pachana 2005). Painful pathologies or mobility limitations might affect self-assessments both directly, as genuine determinants of life satisfaction, and indirectly, by negatively biasing respondents’ mood and their perception of the scale. Similarly, after controlling for health conditions, one might expect age per se to generate scale biases. For instance, the “role” theory of the ageing process highlights the positive effects, on both life satisfaction and respondents’ reporting style, of virtuous traits that increase with age, such as self-integration, insight and positive psychological traits including satisfaction and self-esteem (Gove and Shin 1989).

Conducting the analysis on panel data is a potential solution to the scale bias problem. When individuals’ life satisfaction is followed over time, conventional fixed or random effects (Kapteyn et al. 2007) or latent class techniques (Clark and Lelkes 2005) allow one to filter out time-invariant heterogeneity. Unfortunately, in addition to the limited availability of panel data on individuals’ life satisfaction, other important methodological issues call the validity of this solution into question. For instance, given the psychological nature of life satisfaction, the assumption of time-invariant individual scales required by panel-data studies is debatable. The scale adopted by an individual may vary over time according to her current mood (Kahneman et al. 2004) or her socio-economic conditions. Moreover, when the analysis focuses on older persons, panel data analysis can be strongly affected by attrition, which limits the possibility of following respondents over time.

The vignette methodology makes it possible to correct cross-sectional data for scale biases. Anchoring vignettes were first introduced by King et al. (2004) to analyze ordinal survey responses while taking into account individual differences in the interpretation of survey questions. Under this approach, individuals are asked to self-report their level of life satisfaction and to assess, on the same scale, hypothetical people described in fixed situations or conditions (anchoring vignettes). The idea is that, since the description of the hypothetical person is the same for all respondents, answers to the vignette questions differ only because of individual heterogeneity in reporting styles. The evaluation of the anchoring vignettes therefore provides sufficient information to correct self-reported life satisfaction for individual-specific scale biases.

The vignette methodology has been successfully applied to several domains that are potentially affected by scale biases. In particular, it has been used in cross-sectional studies of political efficacy (King et al. 2004), health (Salomon et al. 2004; Bago d’Uva et al. 2008), work disability (Kapteyn et al. 2007; van Soest et al. 2006) and job satisfaction (Kristensen and Johansson 2008). To our knowledge, Angelini et al. (2008) are the first to apply this methodology to a cross-country study of self-reported life satisfaction. Using data from the 2006 wave of the Survey of Health, Ageing and Retirement in Europe (SHARE), they show that the high variability in self-reported life satisfaction across European countries is mainly caused by differences in the scales and benchmarks individuals adopt to evaluate themselves. In particular, when differences in reporting styles are not taken into account, after controlling for economic, demographic, health and social conditions, Danes and Italians turn out to be the populations with the highest and the lowest levels of life satisfaction, respectively. When the vignette methodology is applied to correct for individual-specific biases, however, the cross-country ranking changes dramatically: the difference in self-reported life satisfaction between Danes and Italians disappears, and the Netherlands and the Czech Republic replace Denmark and Italy, respectively, at the top and the bottom of the ranking.

Exploiting the richness of the SHARE vignette data, in this paper we further investigate the relationship between age, health conditions and life satisfaction among the elderly. We find evidence that age affects self-reported life satisfaction in two opposite ways. Controlling for the effects of all other variables, respondents’ own perceived level of life satisfaction increases with age; at the same time, however, given the same true level of life satisfaction, older respondents are more likely than younger ones to rank themselves as “dissatisfied” with their life. As a result, both in the self-reported data and after controlling for potential differences in individual reporting styles, age variations in life satisfaction are rather small, in particular for respondents younger than 75 years. Moreover, consistent with our initial conjecture of an interplay between age and health conditions, we find that mobility limitations and other pathologies play an important role in explaining scale biases in the reporting style of older individuals.

This paper is organized as follows. In Sect. 2, we specify the econometric model based on the vignettes (the Hopit model) and discuss its main properties and statistical assumptions. In Sect. 3, we describe the dataset used in our analysis and provide descriptive statistics for the variables used in the regressions. Section 4 reports and discusses the estimates of the Hopit model and compares them with those of an ordered probit model in which scales are assumed to be constant. In Sect. 5, we discuss some possible limitations of our approach. Section 6 concludes, summarizing the main findings.

2 The Econometric Model

We adopt the econometric specification that was first introduced by King et al. (2004) and is usually referred to as the hierarchical ordered probit (Hopit) model. It mainly consists of two elements: the self-assessment component and the vignette component.

Let \(Y_{i}^{\ast }\) denote the life satisfaction perceived by individual \(i=1,\ldots,n\), and assume that it is a linear function of the observed variables \(X_{i}\) and a normally distributed error term \(\varepsilon_{i}\):

$$ \begin{aligned} &Y_{i}^{\ast }=X_{i}\beta +\varepsilon _{i};\\ &\varepsilon _{i}|X_{i}\,\sim\,N(0,1), \end{aligned} $$
(1)

with parameter vector β.

As in the standard ordered probit model, we do not observe \(Y_{i}^{\ast }\) directly, but only the answer to a question on life satisfaction, recorded as an ordered categorical variable ranging from 1 (“very dissatisfied”) to 5 (“very satisfied”). In particular,

$$ Y_{i}=j\quad\hbox{ if }\tau _{i}^{j-1}<Y_{i}^{\ast }\leq \tau _{i}^{j},\quad j=1,\ldots,5. $$
(2)

The main difference between the Hopit and the standard ordered probit model is that the thresholds \(\tau_{i}^{j}\) are individual-specific:

$$ \begin{aligned} \tau _{i}^{0}&=-\infty ;\quad \tau _{i}^{5}=\infty\\ \tau _{i}^{1}&=X_{i}\gamma ^{1}; \end{aligned} $$
(3)
$$ \tau _{i}^{j}=\tau _{i}^{j-1}+\exp (X_{i}\gamma ^{j}),\quad j=2,3,4. $$
(4)

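Unfortunately, the self-assessment component alone does not allow us to identify the parameters in β and γ separately: since \(X_{i}\) enters both the latent index \(X_{i}\beta\) and the thresholds, the self-reports identify only the difference \(\gamma^{1}-\beta\). To separate them, we need the information provided by the vignettes.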
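The self-assessment component in Eqs. 1–4 maps covariates into individual thresholds and category probabilities, \(P(Y_i=j)=\Phi(\tau_i^j-X_i\beta)-\Phi(\tau_i^{j-1}-X_i\beta)\). The Python sketch below illustrates that mapping under stated assumptions: it uses NumPy and SciPy, the function names are hypothetical, and the covariates and parameter values are made up for illustration; it is not the authors' gllamm code.

```python
import numpy as np
from scipy.stats import norm

def hopit_thresholds(X, gammas):
    """Individual thresholds of Eqs. 3-4 (hypothetical helper).

    X: (n, k) covariates; gammas: (4, k) array whose rows are gamma^1..gamma^4.
    Returns an (n, 6) array [tau^0 = -inf, tau^1, ..., tau^4, tau^5 = +inf].
    """
    n = X.shape[0]
    tau = np.empty((n, 6))
    tau[:, 0], tau[:, 5] = -np.inf, np.inf
    tau[:, 1] = X @ gammas[0]                          # tau^1 = X gamma^1
    for j in range(2, 5):                              # tau^j = tau^{j-1} + exp(X gamma^j)
        tau[:, j] = tau[:, j - 1] + np.exp(X @ gammas[j - 1])
    return tau

def category_probs(index, tau, sd=1.0):
    """P(Y = j), j = 1..5, for a latent variable index + e with e ~ N(0, sd^2) (Eq. 2)."""
    cdf = norm.cdf((tau - index[:, None]) / sd)        # first column is 0, last column is 1
    return np.diff(cdf, axis=1)                        # (n, 5) category probabilities

# Hypothetical example: a constant plus one dummy covariate
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
beta = np.array([0.0, 0.3])
gammas = np.array([[-1.5, 0.1], [-0.5, 0.0], [-0.2, 0.0], [0.0, 0.0]])
tau = hopit_thresholds(X, gammas)
P = category_probs(X @ beta, tau)                      # each row sums to 1
```

The exponential in Eq. 4 guarantees that the thresholds are strictly increasing for every respondent, so every category probability is positive.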

Let \(Z_{il}^{\ast }\), \(l=1,2\), denote how respondent i perceives the actual level of life satisfaction of the person described in vignette l. We assume that

$$ \begin{aligned} Z_{il}^{\ast } &=\theta _{l}+\nu _{il};\\ \nu _{il} &\sim N(0,\sigma _{v}^{2}), \end{aligned} $$
(5)

where \(\theta_{l}\) is the actual level of life satisfaction described in vignette l and \(\nu_{il}\) is a stochastic component assumed to be independent of \(\varepsilon_{i}\). The requirement that \(\theta_{l}\) does not vary over i is referred to in the literature as the vignette equivalence assumption, according to which the situation described in the vignettes is perceived in the same way by every respondent.

As with the self-assessments, we do not observe \(Z_{il}^{\ast }\) but only the ordered responses to the vignette questions, on the same 5-point scale used for the self-assessments:

$$ Z_{il}=j\quad\hbox{ if }\tau _{i}^{j-1}<\,Z_{il}^{\ast }\leq\, \tau _{i}^{j}, \quad j=1,\ldots,5. $$
(6)

Note that Eq. 6 uses the same thresholds as Eq. 2; that is, we assume that respondents use the same reporting style for self-assessments and vignette evaluations. This hypothesis is known as the response consistency assumption.

In this set-up, the common thresholds in (2) and (6) connect the self-assessment component and the vignette component. Because the vignette means \(\theta_{l}\) do not depend on \(X_{i}\), the vignette responses identify the threshold parameters γ, and the self-assessments then identify β. The sample information relevant to Eqs. 1 and 5 is therefore combined to estimate the common set of parameters in the threshold Eqs. 3 and 4. Following King et al. (2004), the joint estimation is carried out via conditional maximum likelihood and implemented with the Stata module gllamm.
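The joint likelihood can be sketched in Python as the sum of the self-assessment and vignette log-probabilities, sharing one set of thresholds. This is a minimal illustration under assumptions, not the authors' gllamm implementation: the helper names, the coding of a missing vignette answer as 0, the simulated data and all parameter values are hypothetical. A location normalization is needed because adding a constant to the intercepts of β, \(\gamma^{1}\) and every \(\theta_{l}\) leaves the likelihood unchanged; here β is assumed to have no constant.

```python
import numpy as np
from scipy.stats import norm

def thresholds(X, gammas):
    # Eqs. 3-4: tau^1 = X gamma^1, tau^j = tau^{j-1} + exp(X gamma^j); padded with -inf/+inf
    steps = [X @ gammas[0]] + [np.exp(X @ g) for g in gammas[1:]]
    inner = np.cumsum(np.column_stack(steps), axis=1)
    n = X.shape[0]
    return np.column_stack([np.full(n, -np.inf), inner, np.full(n, np.inf)])

def ordered_loglik(y, mean, tau, sd):
    # log[Phi((tau^y - mean)/sd) - Phi((tau^{y-1} - mean)/sd)] for y in 1..5
    rows = np.arange(len(y))
    upper = norm.cdf((tau[rows, y] - mean) / sd)
    lower = norm.cdf((tau[rows, y - 1] - mean) / sd)
    return np.log(np.clip(upper - lower, 1e-300, None))

def hopit_loglik(y_self, Z_vig, X, beta, gammas, theta, sigma_v):
    """Joint Hopit log-likelihood (Eqs. 1-6); Z_vig uses 0 for a missing answer (assumed coding)."""
    tau = thresholds(X, gammas)                      # shared thresholds: response consistency
    ll = ordered_loglik(y_self, X @ beta, tau, 1.0).sum()
    for l, th in enumerate(theta):                   # theta_l common to all i: vignette equivalence
        answered = Z_vig[:, l] > 0
        ll += ordered_loglik(Z_vig[answered, l], np.full(answered.sum(), th),
                             tau[answered], sigma_v).sum()
    return ll

# Simulated (hypothetical) data: a constant plus one dummy in X
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])
beta = np.array([0.0, 0.4])                          # constant of beta fixed at 0 (normalization)
gammas = np.array([[-1.5, 0.2], [-0.3, 0.0], [-0.3, 0.0], [-0.3, 0.0]])
theta, sigma_v = np.array([-0.5, 0.3]), 0.8
tau = thresholds(X, gammas)

def draw(latent):
    # Eqs. 2 and 6: category j if tau^{j-1} < latent <= tau^j
    return np.array([np.searchsorted(t[1:5], v) + 1 for t, v in zip(tau, latent)])

y_self = draw(X @ beta + rng.standard_normal(n))
Z_vig = np.column_stack([draw(th + sigma_v * rng.standard_normal(n)) for th in theta])
ll = hopit_loglik(y_self, Z_vig, X, beta, gammas, theta, sigma_v)
```

To estimate the parameters, one could pack them into a vector and pass the negative of `hopit_loglik` to a numerical optimizer such as `scipy.optimize.minimize`; that step is omitted here.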

3 Data

In this paper we use data from the 2006 wave of the Survey of Health, Ageing and Retirement in Europe (SHARE). SHARE is an interdisciplinary survey on ageing that is run every two years and collects extensive information on the health, socioeconomic status and family interactions of individuals aged 50 and over in a number of European countries. We present evidence for the eleven countries for which vignette data were collected. These countries range from the North (Sweden and Denmark) through Central Europe (Germany, the Netherlands, Belgium, France, Poland and the Czech Republic) to the South (Greece, Italy and Spain). Data are collected through face-to-face computer-assisted personal interviews (CAPI), supplemented by a self-completion paper-and-pencil questionnaire that collects self-assessments and vignette evaluations as part of the COMPARE project. We select only those respondents who answered both the self-assessment question and at least one vignette; our final sample consists of 7,320 individuals. Table 1 summarizes their main characteristics. Respondents are predominantly female (55%) and 64 years old on average. About 78% of them live with a partner, although this percentage falls to 54% among individuals older than 75 years. More than 50% are retired (86% among respondents aged 76+) and only 30% are still working. As expected, poor health is more frequent in older age groups; in particular, the percentage of respondents with one or more limitations in activities of daily living (ADL) or instrumental activities of daily living (IADL) is five times higher among the 76+ than among respondents younger than 55 years.

Table 1 Description of the variables included in the regressions

Our measure of subjective well-being is obtained by the question:

“How satisfied are you with your life in general?”

Respondents’ self-assessments are measured according to the scale “Very dissatisfied”, “Dissatisfied”, “Neither satisfied nor dissatisfied”, “Satisfied”, “Very satisfied”.

To investigate the relationship between age and life satisfaction, we divide respondents into five age classes (younger than 56 years, 56–60, 61–65, 66–75 and older than 75 years). Note, however, that with cross-sectional data we cannot disentangle age, cohort and period effects. Figure 1 reports respondents’ answers to the self-assessment question. Self-evaluations of life satisfaction appear substantially homogeneous across age groups, apart from the very old: the proportion of respondents satisfied with their life is about 80% up to age 75 and falls to about 70% thereafter.

Fig. 1 Life satisfaction self-assessments by age group

In order to implement the methodology introduced in the previous section, this self-evaluation is followed by two anchoring vignettes:

1. John is 63 years old. His wife died 2 years ago and he still spends a lot of time thinking about her. He has 4 children and 10 grandchildren who visit him regularly. John can make ends meet but has no money for extras such as expensive gifts to his grandchildren. He has had to stop working recently due to heart problems. He gets tired easily. Otherwise, he has no serious health conditions. How satisfied with his life do you think John is?

2. Carry is 72 years old and a widow. Her total after-tax income is about €1,100 per month. She owns the house she lives in and has a large circle of friends. She plays bridge twice a week and goes on vacation regularly with some friends. Lately she has been suffering from arthritis, which makes working in the house and garden painful. How satisfied with her life do you think Carry is?

Analysing the distribution of the vignette evaluations at a purely descriptive level helps clarify how they can be used to purge self-assessments of individual heterogeneity in reporting styles. Fig. 2 shows how respondents in each age group rate the level of life satisfaction of the persons described in the two vignettes (John and Carry). While ratings of vignette 1 are roughly invariant across age groups, ratings of vignette 2 decline with age, which suggests that response scales are not invariant across age groups. The youngest respondents in our sample are more likely to consider Carry satisfied or very satisfied with her life, with a share about 10 percent higher than that of the oldest respondents. Note also that responses to the vignette questions are consistent: in each age group, Carry is rated as more satisfied with her life than John.

Fig. 2 Vignette evaluations by age group

Although individuals living in different countries may adopt different reporting styles in life satisfaction self-assessments (Angelini et al. 2008), the descriptive evidence of such differences across age groups is weaker. In the next section, our estimation method exploits the variability in vignette evaluations to assess to what extent the small differences in Fig. 1 are genuine or hide larger differences in the response scales used by respondents.

4 Results

The richness of the information collected in SHARE allows us to include a large number of variables in our model specification: demographics (gender and age), education, employment status (at work, retired or out of work), marital status and physical health status (ADL and IADL limitations), as well as country dummies. Age enters the model through dummy variables for the age classes defined in Sect. 3. We choose this specification because it is more flexible and lets us remain agnostic about the shape of the relationship between age and life satisfaction.

We estimate both the Hopit model described in Sect. 2 and a baseline ordered probit model that does not allow any threshold variation across respondents and thus ignores potential differences in reporting styles. Results are shown in Table 2: column 1 reports the estimates of the baseline model, column 2 those of the Hopit model, and columns 3 to 6 the estimates of the Hopit threshold equations. For both specifications, standard errors are robust to arbitrary correlation at the household level.

Table 2 Hopit model, determinants of life satisfaction

The main result is that age is positively related to self-reported life satisfaction, in line with Yang (2008), who shows that overall levels of life satisfaction increase with age, ceteris paribus. However, the Hopit estimates in Table 2 reveal two counterbalancing effects of age on life satisfaction in later life, as anticipated in the Introduction. On the one hand, respondents’ own perceived level of life satisfaction increases with age; on the other hand, the individual thresholds that determine whether a respondent reports being satisfied with her life shift up with age (that is, given the same true level of life satisfaction, older respondents are more likely than younger ones to rank themselves as “dissatisfied” with their life). This result can be seen by comparing the Hopit estimates for the self-reported level of life satisfaction (column 2) with those for the threshold equations (columns 3–6). In our sample, respondents younger than 56 years are the most dissatisfied, followed by respondents aged 56–60, while there are no statistically significant differences among the age classes above 60. In the threshold equations, only a few variables are significant; they have a negative sign and relate mostly to the younger age classes. This means that the threshold separating dissatisfaction from satisfaction with life is higher for older respondents.
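A small numerical illustration may clarify how the two age effects can offset each other. Being at least “satisfied” means \(Y_i \ge 4\), i.e. \(Y_i^{\ast} > \tau_i^{3}\), so \(P(Y_i \ge 4) = 1-\Phi(\tau_i^{3}-X_i\beta)\). The sketch below uses hypothetical index and threshold values, not the estimates in Table 2: raising the latent index alone increases the share satisfied, while raising the threshold by the same amount cancels the gain.

```python
from scipy.stats import norm

def p_satisfied(index, tau3):
    # P(Y >= 4) = P(Y* > tau^3) = 1 - Phi(tau^3 - X beta), with eps ~ N(0, 1) as in Eq. 1
    return 1.0 - norm.cdf(tau3 - index)

# Hypothetical values chosen for illustration only
young = p_satisfied(index=0.0, tau3=-0.75)
old_index_only = p_satisfied(index=0.25, tau3=-0.75)   # higher latent satisfaction only
old_both = p_satisfied(index=0.25, tau3=-0.5)          # ...plus a threshold shifted up
```

Here `old_both` equals `young`: the older group is genuinely more satisfied, yet it reports satisfaction at the same rate because its threshold is higher.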

The estimates of the country dummies confirm the results of Angelini et al. (2008): after correcting for DIF, Dutch and Swedish respondents are the most satisfied, while Czech respondents are the most dissatisfied. All other individual variables in the self-assessment equation of the Hopit model are significant and consistent with the literature. Women appear to be happier than men, and being married is associated with a higher level of self-reported life satisfaction. Life satisfaction increases with education and, as regards occupation, respondents who are at work are more satisfied than those who are not currently working (whether because they are retired or because they are out of the labour force for other reasons). In other words, people with high socio-economic status are happier than others. Not surprisingly, life satisfaction is strongly and negatively related to the presence of health problems. This can also be seen in Fig. 3, which shows that, ceteris paribus, having limitations with instrumental activities of daily living (IADL) is associated with lower life satisfaction in all age groups. The variables that most affect the thresholds are country of residence, age and health. The other characteristics included in the regression are significantly correlated with individuals’ level of life satisfaction but not with the thresholds.

Fig. 3 Model predictions: life satisfaction and the presence of IADL limitations

The findings of this section on the relationship between age and life satisfaction do not contradict the descriptive statistics of the previous section. In the Hopit model, the parameters on the age variables are estimated conditional on all the other explanatory factors, whereas Figure 1 does not account for these factors. Older respondents are more likely to be in poor health, to live alone or to be less educated than younger respondents. Taking all these conditions together, older respondents report a lower overall level of life satisfaction than younger respondents. This relationship reverses when we analyse the age effect net of all the other explanatory variables included in the model.

It is worth noting that the baseline model (column 1) does not detect such a strong age effect on life satisfaction, and many of its parameter estimates differ in magnitude and significance from those of the Hopit model. Indeed, a formal Wald test strongly rejects the baseline model in favour of the model that corrects for DIF.
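A Wald test of a linear restriction \(H_0: Rb = r\) uses the statistic \(W=(Rb-r)^{\prime}(RVR^{\prime})^{-1}(Rb-r)\), which is asymptotically \(\chi^2_q\) with q restrictions. The Python sketch below assumes that the restriction of interest sets to zero the covariate coefficients in the threshold equations (so that thresholds are constant, as in the ordered probit) and that V is the household-clustered covariance matrix; the coefficient vector, covariance and selection matrix are hypothetical numbers, not the values behind Table 2.

```python
import numpy as np
from scipy.stats import chi2

def wald_test(b, V, R, r=None):
    """Wald test of H0: R b = r. Returns (statistic, degrees of freedom, p-value)."""
    R = np.atleast_2d(R)
    r = np.zeros(R.shape[0]) if r is None else np.asarray(r)
    d = R @ b - r                                    # distance from the null
    W = float(d @ np.linalg.solve(R @ V @ R.T, d))   # quadratic form without explicit inverse
    q = R.shape[0]
    return W, q, chi2.sf(W, q)

# Hypothetical example: an intercept and two covariate slopes in a threshold equation;
# H0 sets both slopes to zero (constant thresholds)
b = np.array([-1.2, 0.3, -0.2])
V = np.diag([0.04, 0.01, 0.01])
R = np.array([[0, 1, 0], [0, 0, 1]])
W, q, p = wald_test(b, V, R)                         # W is about 13 with q = 2
```

With a diagonal V the statistic reduces to the sum of squared t-ratios (here \(3^2 + 2^2 = 13\)); with correlated estimates the full quadratic form matters.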

4.1 Counterfactuals

In this section we simulate counterfactuals to assess the relevance of DIF bias across age groups. The main objective of the vignette methodology is to estimate the DIF for each respondent and correct for it. The procedure is simple: the analyst chooses a benchmark and then computes adjusted distributions of the observed variable for all respondents, using the benchmark scale instead of each respondent’s own scale. The benchmark is usually the scale of one country. However, since we want to compare life satisfaction across socio-economic groups, the flexibility of the vignette approach allows us to define other benchmark scales. In particular, we first use the scale of a specific age class as the benchmark and calculate how many people in age class A would report being satisfied with their life if they used the response scale of respondents in age class B. We then repeat the same exercise using as the benchmark the reporting style of respondents with IADL limitations.
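The counterfactual can be sketched as follows: keep each respondent’s latent index \(X_i\beta\) but compute the thresholds from a modified covariate row in which the group indicator is set to the benchmark group, then average the predicted probability of being satisfied, \(1-\Phi(\tau^{3}-X_i\beta)\), with \(\tau^{3}=X\gamma^{1}+\exp(X\gamma^{2})+\exp(X\gamma^{3})\) from Eqs. 3–4. Substituting the benchmark value of the age dummy in the threshold equations is our assumption about how the benchmark scale is applied; the function name, covariates and parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def share_satisfied(X, beta, gammas, X_thresh=None):
    """Average predicted P(Y >= 4) = P(Y* > tau^3) over respondents (hypothetical helper).

    X drives the latent index X beta; X_thresh (default X) drives the thresholds,
    so passing a modified copy applies a benchmark response scale.
    """
    Xt = X if X_thresh is None else X_thresh
    tau3 = Xt @ gammas[0] + np.exp(Xt @ gammas[1]) + np.exp(Xt @ gammas[2])   # Eqs. 3-4
    return float(np.mean(1.0 - norm.cdf(tau3 - X @ beta)))

# Hypothetical setup: column 0 = constant, column 1 = dummy for "older than 75"
X_old = np.column_stack([np.ones(4), np.ones(4)])      # group A: all respondents older than 75
beta = np.array([0.0, 0.25])
gammas = np.array([[-1.5, 0.2], [-0.5, 0.0], [-0.5, 0.0], [-0.5, 0.0]])
own_scale = share_satisfied(X_old, beta, gammas)
X_bench = X_old.copy()
X_bench[:, 1] = 0.0                                    # benchmark: thresholds of the younger class
young_scale = share_satisfied(X_old, beta, gammas, X_thresh=X_bench)
```

Because the hypothetical age coefficient in \(\gamma^{1}\) is positive, evaluating the older group on the younger group’s scale raises its predicted share of satisfied respondents, which is the pattern reported in Figs. 4 and 5.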

In Figs. 4 and 5 we compare the proportions of men (women) who would be satisfied with their life if all men (women) used, respectively, the response scale of men (women) younger than 56 years and that of men (women) older than 75 years. Using the response scale of the youngest group in our sample, the proportion of satisfied individuals is larger than when we use the response scale of the oldest respondents, particularly for women. This means that, given the same true level of life satisfaction, older respondents are more likely than younger ones to rank themselves low on the 5-point life satisfaction scale. Figure 6 shows that what holds for older respondents also holds for respondents with IADL limitations.

Fig. 4 Counterfactual simulations: age-specific thresholds, men

Fig. 5 Counterfactual simulation: age-specific thresholds, women

Fig. 6 Counterfactual simulation: thresholds varying with the presence of IADL

5 Discussion and Caveats

Our approach crucially relies on the assumptions of response consistency and vignette equivalence.

As regards vignette equivalence, it is important to note that responses to the vignette questions are consistent: in each country, Carry is always rated as more satisfied with her life than John. Of course, “we still need to be careful of question wording, question order, accurate translation of different items, sampling design, interview length, social background of the interviewer and the respondent” (King et al. 2004, p. 199), because unaccounted-for cultural differences across subsets of respondents might threaten this approach. Nonetheless, since both the self-assessments and the vignette evaluations are measured with DIF, the fact that the same vignette is rated differently by people from different cultures is not a problem in itself. A problem would arise if the nature of the DIF differed between the vignettes and the self-assessments, but we can reasonably assume that this difference is close to zero, because the approach requires the same person (with the same biases!) to answer both the self-assessment and the vignette questions.

Testing the response consistency assumption requires the availability of an objective measure of the construct of interest, which is hard to find in the context of life satisfaction. In fact, given the multidimensionality of life satisfaction, we should take into account a variety of aspects of the life of individuals to devise a meaningful indicator. Indeed, self-evaluations are widely used in empirical research because they summarize in a single factor all this information, mostly unobserved to researchers. The existence of such objective indicators would question the necessity of collecting self-assessments as well.

Most importantly, the main concern in the literature about the validity of the response consistency assumption (see also Datta Gupta et al. 2010 and Bago d’Uva et al. 2009) is that, in the case of work-limiting health problems or drinking behaviour, individuals might have clear incentives to misreport their actual condition because of social norms. For example, working-age individuals who are out of the labour force or unemployed might have an incentive to misreport their own disability status to rationalize their labour market condition (justification bias), but no such incentive when rating the disability status of the individuals described in the vignettes.

In the case of life satisfaction, it is not clear which incentives could lead individuals to adopt different reporting styles when rating themselves and when rating the vignettes. The social-psychology finding that assessments of oneself and of others may differ can be explained by the fact that respondents have less information about others. Vignettes, however, provide respondents with all the information needed to evaluate the persons described. In addition, the SHARE questionnaire explicitly asks respondents to evaluate the vignettes according to their own preferences. In other words, our survey instrument is designed to support the validity of the response consistency assumption by requiring the same response scale for self-evaluations and vignette ratings.

6 Conclusions

Ageing is one of the greatest social and economic challenges of the twenty-first century for European societies. Europe is the continent with the highest proportion of people aged 65 or over, and this proportion is steadily rising. Given this ageing process, it is easy to understand the growing interest of policy makers and scientists in good social indicators to assess the quality of life in the overall population and in population subgroups, such as people aged 65 or over.

A limitation of previous studies is that, when asked to self-report their life satisfaction, individuals with similar economic and non-economic conditions may use different benchmarks or scales to evaluate themselves. In this paper, we analyze the relationship between age and life satisfaction in later life. To overcome the scale bias, we apply the vignette methodology to data from the Survey of Health, Ageing and Retirement in Europe (SHARE), a unique and innovative multidisciplinary dataset containing a large amount of information on both the economic and non-economic conditions of individuals aged 50 and over.

Our main results can be summarized as follows. First, by comparing estimates from a standard ordered probit with those from a model in which vignettes are used to correct for DIF bias (the Hopit model), we find that scale biases significantly affect the estimated coefficients. Indeed, a formal Wald test strongly rejects the ordered probit in favour of the more general Hopit model. Moreover, the thresholds depend significantly on the explanatory variables used in the regressions. Second, both in the raw self-reported data and after controlling for potential differences in individual reporting styles, age variations in life satisfaction are rather small; larger variations are observed only for the oldest old (76+ individuals). Third, we gain a deeper understanding of the reasons for this age invariance. On the one hand, controlling for the effects of all other variables, respondents’ own level of life satisfaction increases with age; on the other hand, the individual thresholds that determine whether a respondent reports being “satisfied” with her life shift up with age; that is, given the same true level of life satisfaction, older respondents are more likely than younger ones to rank themselves as “dissatisfied” with their life. The two effects work in opposite directions. Finally, we identify health problems and physical limitations as potential sources of scale bias for older individuals. As one would reasonably expect, poor health affects self-assessments both directly, by reducing life satisfaction, and indirectly, by making respondents’ reporting style more pessimistic.