Conversion of Verbal Response Scales: Robustness Across Demographic Categories

Happiness and life satisfaction have traditionally been measured using verbal response scales, however, these verbal scales have not kept up with the present trend to use numerical response scales. A switch from a verbal scale to a numerical scale, however, causes a severe problem for trend analyses, due to the incomparability of the old and new measurements. The Reference Distribution Method is a method that has been developed recently to deal with this comparison problem. In this method use is made of a reference distribution based on responses to a numerical scale which is used to decide at which point verbally labelled response options transit from one state to another, for example from ‘happy’ to ‘very happy’. Next, for each wave of the time series in which the verbal scale is used, a population mean is estimated for the beta distribution that fits best to these transition points and the responses in this wave. These estimates are on a level that is comparable to that of the mean of the reference distribution and are appropriate for use in an extended time series based on the responses measured using a verbal and a numerical scale. In this paper we address the question of whether the transition points derived for the general population can be used for demographic categories to produce reliable, extended time series to monitor differences in trends among these categories. We conclude that this is possible and that it is not necessary to derive transition points for each demographic category separately.

general population can be used for demographic categories to produce reliable, extended time series to monitor differences in trends among these categories. We conclude that this is possible and that it is not necessary to derive transition points for each demographic category separately.
Keywords Comparability of rating scales Á Beta distribution Á Reference distribution Á Demographic categories Á Happiness Á Satisfaction with life

Introduction
A change from using a verbal to using a numerical scale to measure happiness and life satisfaction disrupts the trends for demographic categories, and these discontinuities due to a change in scales have to be fixed in such a way that trend analyses can be made in an accurate way based on an extended time series which includes responses measured using the old verbal scale and those measured using the new numerical scale.
1.1 The Need for, and Consequence of, a Change from Using a Verbal Scale to Using a Numerical Scale Happiness and life satisfaction have traditionally been measured using verbal response scales, however, these verbal scales have not kept up with the present trend to use numerical response scales. An increasing body of research (e.g. OECD 2013) states that numerical scales are more suitable for measuring such constructs, because use of numerical scale minimises categorisation effects, improves data analysis and produces more reliable data. Furthermore, numerical scales are especially suitable for telephone interviews, because they are time saving and prevent response-order effects (Scherpenzeel 1999). At present, 11-and 10-point scales, with the anchor points 1 defined by verbal labels, are used most frequently to measure subjective well-being (Voorpostel et al. 2009;OECD 2013;Van Beuningen et al. 2014).
Changing from a verbal scale to a numerical scale leads to an instantaneous discontinuity in the existing time series. This causes a severe problem for trend analyses, due to incomparability of the old and new measurements, meaning that, with the introduction of the numerical scale, current trends in well-being across demographic categories cannot be continued for long time series unless the discontinuity caused by the change of the scale type can be overcome in a suitable way.

The Reference Distribution Method and the Research Question
The Reference Distribution Method is a method that has been developed recently to deal with the kind of problem presented above (DeJonge et al. 2014a). This method relies heavily on the Continuum Approach of Kalmijn (2010, Ch. VI) which is a method that is used to estimate a mean and standard deviation in a population based on the beta distribution that fits best to the points, upper boundaries, on a continuum bounded by 0 and 10 at which response options from a verbal or numerical primary scale transit from one state to another, for example from 'happy' to 'very happy', combined with the frequency distribution of this primary scale.
In the Reference Distribution Method the Continuum Approach is applied to estimate a reference distribution based on the response on a discrete numerical scale. This reference distribution is then used to determine at which transition points the upper boundaries of the response options from a verbal scale are positioned. Next, the Continuum Approach is used to estimate a population mean for the beta distribution that fits best to the upper boundaries determined in the previous step. This is done for each wave of a time series. These estimates are comparable to the mean of the reference distribution and can be used to extend a time series and to combine the responses measured using the verbal and numerical scales. In the remainder of this paper we will refer to the transition points derived from the reference distribution as 'reference boundaries'.
To date, in the development of the Reference Distribution Method the focus has been placed on the general population in nations and no distinction has been made between demographic groups within a nation. The research question we addressed in this paper is: Can the reference boundaries derived for the general population be used for demographic categories to produce reliable extended time series to monitor differences in trends among these categories?

Plan of this Paper
We had data sets from Statistics Netherlands and the Eurobarometer at our disposal for different demographic categories which we could use to answer the research question. These data are described in Sect. 2. The Reference Distribution Method, which we explain in Sect. 3, provides the basis for all the results presented in this paper. We applied the Reference Distribution Method to estimate reference distributions and derive reference boundaries from these for different demographic categories and different verbal scales. We will elaborate on this in Sect. 4. Using the reference boundaries we found, we estimated population means and the means for all the demographic categories for each wave of a time series on a continuum from 0 to 10. We use these estimates to answer the research question in Sect. 5. We end with a discussion in Sect. 6 and the main conclusions in Sect. 7.

Data
An application of the Reference Distribution Method for trend analyses in different demographic categories requires time series of the responses to each of these categories on a verbal scale and for one wave of measuring the same topic on a numerical scale. A set of time series based on survey items 2 taken from Statistics Netherlands data and the Eurobarometer provided the data for our application. In 2012 Statistics Netherlands included the questions on happiness and life satisfaction in an experiment with a split-half design for the Social Cohesion Survey ( Van Beuningen et al. 2014). For this survey respondents were at random assigned to a group that had to rate the traditional 5-point verbal scales and a group that had to rate the same questions on a numerical 11-point response scales of which only the anchor points had verbal labels. The question on happiness with the 11-point numerical scale was phrased as: ''On a scale from 0 to 10 can you indicate to what extent you consider yourself to be a happy person? A score of 0 refers to being completely unhappy and a score of 10 refers to being completely happy''. The question on life satisfaction with a numerical scale was worded as: ''On a scale from 0 to 10 can you indicate to what extent you are satisfied with the life you currently lead? A score of 0 refers to being completely dissatisfied and a score of 10 to being completely satisfied''.
Data in the experiment were collected using a sequential mixed mode design. People were sent an invitation and two reminder letters asking them to fill out the questionnaire online. Those who did not respond to this invitation were interviewed by phone if a telephone number was available. When no telephone number was available people were interviewed face-to-face at their home.
The survey for the split-half experiment was distributed among respondents of 15 years and older, but for this paper only respondents 18 years and older were included in the analyses, which gave a total number of 7,641 respondents. Distinctions were made between gender, age, {18-25, 25-35, 35-45, 45-55, 55-65 and 65 and older}, and education, {low, middle and high}, to differentiate demographic categories of respondents. The frequency distributions for the two 5-point scales and the two 11-point scales for each of these categories are included in Tables 3, 4, 5 and 6 in Appendix 1.

Satisfaction with Life: The Eurobarometer
The Eurobarometer is a series of public opinion surveys that have been conducted regularly in the member states of the European Union on behalf of the European Commission since 1973. The standard version of the Eurobarometer contains an item on life satisfaction with the leading question 'On the whole how satisfied are you with the life you lead?' and a 4-point verbal scale with response options 'Not at all satisfied', 'Not very satisfied', 'Fairly satisfied' and 'Very satisfied'. The standard Eurobarometer has a spring wave and an autumn wave and each survey consists of approximately 1,000 face-to-face interviews per participating country. In the autumn of 2011, prior to the standard survey, version 76.2 of the Eurobarometer was launched, in which the item for life satisfaction had to be rated on a 10-point numerical scale, with the anchor points labelled 'Very dissatisfied' and 'Very satisfied' (European Commission 2012b).
Since we are interested in time series for life satisfaction, we selected only countries which had participated in the European Union for more than 10 years. In the fifteen European countries that joined the European Union before 2004, life satisfaction was measured for the entire period between 1997 and 2012 and also the item with the 10-point numerical scale was fielded in all these countries (Schmitt et al. 2008;European Commission, 2012a, b, 2013. The number of respondents to the Eurobarometer per country was too low to distinguish all the demographic categories we distinguished in the split-half experiment of Statistics Netherlands. Therefore, we divided the countries into two groups of countries: • Northern Europe: Denmark, Sweden, Finland, Germany, Great Britain, The Netherlands, Belgium, France, Ireland, Luxemburg, Austria • Southern Europe: Portugal, Spain, Italy and Greece Then, for these two groups, we distinguished similar demographic categories to those used for the items of Statistics Netherlands. Only the three categories indicating the education level of a respondent are not fully comparable between the two datasets. We labelled these categories as 'low', 'middle' and 'high'. In the Eurobarometer the education level classification is: in education until the age of 15, until the age between 16 and 19 years old and until an age of 20 years or over. The frequency distributions for the 5-point scale and the 10-point scale for each of these categories and the European regions can be found in Tables 7, 8, 9 and 10 of Appendix 2. It is interesting to compare the frequencies with which the upper anchor points have been ticked in The Netherlands on the numerical scale of Statistics Netherlands scale and in Northern Europe on the numerical Eurobarometer scale. This frequency is more than three times higher in Northern Europe than in The Netherlands. This difference is rather due to the labelling of each of the anchor points than to a real difference in satisfaction: the wording 'Completely satisfied' used by Statistics Netherlands is likely to be interpreted differently from the wording 'Very satisfied' used in the Eurobarometer. Moreover the Northern Europeans ticked the anchor point more frequently than the preceding option labelled '9'. This phenomenon according to which the frequency with which respondents tick a '10' stands out and is sometimes higher than the frequency by which a '9' is ticked by respondents in the same sample is not unique as has been described by Brulé and Veenhoven (2014), who named this the '10-excess phenomenon'.

The Reference Distribution Method
The idea that underlies the Reference Distribution Method is that the estimated means for equivalent questions about the same topic asked in different representative surveys in 1 year should be approximately the same when compared irrespective of the primary response scales used.

Application of the Continuum Approach to Derive a Reference Distribution
In this section we will demonstrate an application of the Continuum Approach to derive a reference distribution. An outline of the Continuum Approach is given in Appendix 3. We assume that a discrete numerical scale with 10 or 11 points is available to derive a reference distribution. Furthermore, we assume that the upper boundaries of the response options of this scale are equidistant distributed over the 0-10 continuum. This assumption is a methodological choice (Kalmijn 2013), which provides a very useful basis for the Reference Distribution Method.
How we applied the Continuum Approach to derive a reference distribution from the life satisfaction ratings on the numerical scale of the split-half experiment of Statistics Netherlands is shown in Fig. 1. In this figure the cumulative frequencies for life satisfaction are plotted as vertical bars at equal distances, starting at 0.91 for the response option at the lower end of the scale and ending with 10.0 for the option at the upper end of the scale. The best fitting beta distribution to the frequency distribution obtained by application of the Continuum Approach is depicted as a curve. It is clear that the beta distribution does not perfectly fit to the cumulative frequencies on the discrete numerical scale, but it is an approximation which underestimates the cumulative frequency at some points and overestimates it at other points.
The parameters of the beta distribution in Fig. 1 are a = 8.31 and b = 3.05, which, according to Eq. 3 in Appendix 3, corresponds to a mean of 7.31.

Illustration of the Application of the Reference Distribution Method
An application of the Reference Distribution Method to derive reference boundaries for the response options of the verbal scale is explained in this section. The method itself is described in detail in DeJonge et al. (2014a). We applied the Reference Distribution Method to the reference distribution we found in the previous section for the verbal scale of the life satisfaction item of the split-half experiment and the ratings of the respondents of 18 years and older on this scale. The frequency distribution of these ratings was The corresponding cumulative frequency distribution is shown as a stack diagram on the left side of Fig. 2. The reference distribution is depicted to the right side of this cumulative frequency distribution.
The procedure to determine the reference boundaries between the response options of the primary verbal scale on the continuum from 0 to 10 is as follows, see also Fig. 2. For each response option, a horizontal line is drawn from the cumulative frequency displayed for the verbal scale to the point where it meets the reference distribution. At this point the value of the reference distribution is equal to the cumulative distribution on the verbal scale. A vertical line is drawn to the horizontal axis from this point downwards. The value at which the vertical line meets the horizontal axis is the position of the reference boundary of the corresponding response option. Following this procedure, the reference boundaries for the response options on the 0 to 10 continuum are consecutively 5.15, 6.09, 7.83, 9.00 and 10.00.
The reference boundaries and the cumulative frequencies measured on the verbal scale can in turn be used as input for the application of the Continuum Approach.

Reference Boundaries in Different Demographic Categories
People will interpret verbal response options differently for all kinds of reasons such as personality, cultural context, demographics characteristics or the context of the scale. This makes it unlikely that the reference boundaries for the response options of a given scale for the general population are equal to those for subcategories with specific characteristics.

Reference Boundaries and Estimated Means for Demographic Categories
To apply the Reference Distribution Method in an appropriate way, we had to adjust the verbal scale for happiness from Statistics Netherlands, because only a few of the respondents had ticked the option 'unhappy' in the split-half experiment. For the application of the reference distribution method such response options have to be combined with the preceding or succeeding option to obtain proper reference boundaries. Except for the age category from 55 to 64 years, in all demographic categories \1 % of the respondents selected this option as can be seen from Table 5 of Appendix 1. As a consequence, the application of the Reference Distribution Method will return a reference boundary equal to zero for this option for respondents aged from 35 to 44 years. Since the frequency with which the option 'Not very happy' was chosen was also small in all demographic categories, we combined the options 'Unhappy' and 'Not very happy' for the application of the Reference Distribution Method. We applied the Reference Distribution Method to obtain reference boundaries for the verbal scales for life satisfaction of Statistics Netherlands and the Eurobarometer and the adjusted verbal scale for happiness of Statistics Netherlands. For each demographic category and each topic, either happiness or life satisfaction, we first determined the best fitting beta distribution to the cumulative frequencies measured on the corresponding numerical scale from the split-half experiment of Statistics Netherlands or Eurobarometer version 76.2 as we illustrated in Fig. 1. Next, we used these best fitting beta distributions to derive the reference boundaries between the response options of the verbal scales for each demographic category in the way we demonstrated in Fig. 2 based on the related The results are displayed in Fig. 3 in which the response scales are visualized as vertical bars in which each interval represents the degree of appreciation expressed by a response option and in which the reference boundaries separate two adjacent response options. The estimated mean for a demographic category d in Fig. 3 is, in accordance with formula (3) in Appendix 3 equal to 10*a d /(a d ? b d ), with a d and b d-the shape parameters of the beta distribution that fits best to the reference boundaries and the cumulative frequency distribution for the given demographic category.
From Fig. 3 it is obvious that differences in reference boundaries between demographic categories and the general population are most prominent for age categories and for categories based on the education level of the respondents. Apparently, gender does not make much of a difference, not in the split-half experiment or in the Eurobarometer.
The levels of the estimated means for Eurobarometer version 76.3 cannot be compared to those for life satisfaction in the split-half experiment, because the reference distribution in the split-half experiment was based on an 11-point numerical scale, whereas in the Eurobarometer a 10-point numerical scale was used. 6 For this reason, the estimated means for the split-half experiment are visibly higher than those for the Eurobarometer. In addition, the anchor points of both numerical scales for life satisfaction were labelled differently.
The patterns of estimated means for the Eurobarometer are similar for Northern and Southern Europe, except that there is a substantial difference in the level: life satisfaction in Southern Europe is, in all demographic categories, lower than in Northern Europe. This comes with noticeably differences in reference boundaries between the two clusters of European countries.
As we stated above, the differences in boundaries can be attributed to a number of reasons. For the differences between age categories, it is imaginable that for example the response time of younger people is shorter than that of older people, whereas persons belonging to the latter category have a longer life experience than those who are younger. Both aspects may, among others, have an effect on the response patterns. The reasons for differences in boundaries however, fall outside the scope of this paper and have been addressed by others, see for example Hazelrigg and Hardy (2000), Storm et al. (1996), the National Research Council (2013), and DeJongeet al (2014b). We recall that the research question addressed in this paper is whether the reference boundaries derived for the general population can be used for demographic categories to produce reliable extended time series to monitor differences in trends between these categories.

Differences in Estimated Means
A first step to answer the research question is to look at the differences in estimated means for demographic categories between the situation in which the reference boundaries for the general population are used and the situation in which category-specific boundaries are used. In Table 1 the estimated means as presented in Fig. 3 are given in the columns headed by 'Reference boundaries category'. The estimated means based on the best fitting beta distributions to the reference boundaries for the general population and the cumulative 6 The boundaries between the response options of a 10-point numerical scale are positioned at different values than those between the response options of an 11-point numerical scale. frequencies measured in each demographic category are given in the columns headed 'Reference boundaries population'.
What is clear from Table 1 is that the differences between the estimated means obtained using the category-specific reference boundaries or by using the reference boundaries for It would be premature to conclude on basis of Table 1 that category-specific reference boundaries should be used to estimate a mean for the demographic category. It very much depends on the purpose of the transition to the 0-10 continuum. If one is interested in the absolute difference in happiness or life satisfaction between demographic categories on a continuum from 0 to 10, than it seems evident to use category-specific reference boundaries. This is not obvious though, if one is interested, as we are given our research question, in whether or not trends in happiness and life satisfaction evolve in the same way in different demographic categories. In this case the absolute difference in happiness or life satisfaction is of less importance as long as the development of the trend in each category shows a reliable pattern after transition to the 0-10 continuum. From this perspective, to be able to answer the research question requires insight into the stability over time of the differences between the estimated means on the continuum for each demographic category obtained using the category-specific boundaries or the boundaries for the general population.

Trends in Estimated Means in Different Demographic Categories
The evolution of happiness and life satisfaction over time may differ across demographic category. These differences in evolution have to be preserved after estimation of the corresponding means on a continuum from 0 to 10, independent of whether reference boundaries for the population of a whole are used or category-specific boundaries. If the latter is the case, then it suffices to apply the reference boundaries for the general population to obtain estimated means on a 0-10 continuum for monitoring differences in trends in extended time series.

Trends in Estimated Means over Time for the General Population
We The application of the Continuum Approach resulted in a best fitting beta distribution for each item and each wave in the corresponding period. We used the estimated parameters of each of these beta distributions to estimate a population mean, the results of which are depicted in Fig. 4.
It is clear from Fig. 4 that there has been little variation in the levels of happiness and life satisfaction over the years in The Netherlands. The average level of life satisfaction in The Netherlands is somewhat lower than that for happiness. In 2012 the levels for both items of Statistics Netherlands dropped. Since no data were collected in 2010 and 2011, it cannot be said whether this is due to a different measurement techniques, or it is an effect of the economic crisis or it has some other cause.
Compared to the flat lines for happiness and life satisfaction based on the surveys of Statistics Netherlands, the time series of estimated Eurobarometer means show a much more fluctuating pattern. Moreover, the time series for Northern Europe shows a slightly upward trend, whereas in Southern Europe the trend is heading downwards.

Differences in Estimated Means over Time Depending on the Boundaries Used
Just as we did for the general population, we applied the Continuum Approach to the responses for each demographic category for each wave of each item. For each wave we applied the Continuum Approach twice, one time making use of the reference boundaries for the general population and one time making use of the category-specific reference boundaries which are shown in Fig. 3.
As a result we found two best fitting beta distributions for each item and each wave, for which we, just as before, could use the parameters to estimate means on a 0-10 continuum for the corresponding demographic category. As was to be expected and in line with what we presented in Table 1, the estimated means based on the category-specific reference boundaries are not equal to the estimated means based on the reference boundaries for the general population. The differences between the estimated means based on the category-specific boundaries and those based on the boundaries for the entire population however, turn out to be very stable over time. We have summarized this in Table 2, which contains two columns for each item. The first of these columns contains the average difference over the number of waves in the period of observation between the estimated means. The second column contains the standard deviation of these differences from the average difference.
The averages in Table 2 are similar to those presented in Table 1. The standard deviations are very small for each demographic category and each item indicating that the differences are rather stable over time. The, in the absolute sense, largest average difference is once more found in the age category 25-35 years for the item of Statistics Netherlands on life satisfaction. A possible explanation for finding the largest difference for this particular category may be that at this stage in their lives individuals in this category must deal with numerous changes, many of which may affect their response to surveys. We have depicted the differences for this demographic category in Fig. 5. From Fig. 5 it can be seen that for the two items of Statistics Netherlands, the estimated means based on the boundaries for the general population are higher than the estimated means based on the category-specific boundaries, and that this is the opposite for the Eurobarometer item in both Northern Europe and Southern Europe. The patterns of the time series are, however, consistent from which we conclude that if the main interest of a piece of research, as here, is to determine how trends develop then which reference boundaries are used for the conversion becomes less important.
The results shown in Table 2, supported by the visualization in Fig. 5, lead us to conclude that the reference boundaries define the levels of the estimated population means, but do not influence the evolution of the trends. This conclusion makes sense for the flat patterns of the time series for happiness and life satisfaction of Statistics Netherlands, and for the fluctuation patterns for life satisfaction of the Eurobarometer the trend for which go in a different direction in Northern Europe and Southern Europe.

Discussion
Using three different populations, The Netherlands, Northern Europe and Southern Europe, and items from two surveys, we showed that, from the perspective of monitoring trends in happiness and life satisfaction among demographic categories, it is not necessary to use category-specific reference boundaries to obtain estimated means on a 0-10 continuum if the Continuum Approach is applied. Although we covered some variants for which this conclusion is valid, it would be worthwhile applying the Reference Distribution Method presented in this paper to other topics than happiness and life satisfaction, for example to social cohesion or self-reported health. This would allow us to validate the conclusions presented here more generally. In Sect. 4.1 we remarked that the levels of the estimated means for the Eurobarometer cannot be compared to those for life satisfaction from the split-half experiment of Statistics Netherlands, due to the different number of response options on the numerical scales used in the surveys. Since we concluded that it is not necessary to use category-specific boundaries to make the transition to the estimated means on a 0-10 continuum, the numerical scale for life satisfaction from the split-half experiment could be used to derive reference boundaries between the response options of the verbal scale of the Eurobarometer item based on the response of the general population measured in 2012. These boundaries in turn could be used to apply the Continuum Approach to the Eurobarometer time series of the different social categories in Northern Europe and Southern Europe. This would enable comparison of trends in life satisfaction among demographic categories in The Netherlands with equivalent categories in Northern Europe and Southern Europe.
The numerical scale from the Eurobarometer was only used in 2011, and as there was no wave of the survey of Statistics Netherlands in this time, this numerical scale from the Eurobarometer cannot be used to apply the Reference Distribution Method to derive reference boundaries for the verbal scale of the life satisfaction item of Statistics Netherlands. There are however, other surveys, such as the European Social Survey, that contain an item on life satisfaction with a numerical scale and that have been fielded over a number of years. The response to such a survey item in a year when both the Eurobarometer and the survey of Statistics Netherlands were fielded could be used substituted to estimate a reference distribution. This reference distribution could then be used accordingly to derive reference boundaries for the general population for the verbal scales of the Eurobarometer item and the Statistics Netherlands item.
In this paper we distinguished between gender, age, and education to differentiate demographic categories of respondents. Another distinction that might have been of interest is that of ethnicity. The conversion of verbal terms into numerical scales is likely to vary between people from a different cultural and linguistic background. The data used for this paper, however, fell short to make this distinction, at least for The Netherlands. We will leave this therefore as a potential direction for future research.
For the trend analyses in this paper we have only focused on mean happiness within demographic categories. We are aware, however, that in practice scholars are often not interested in just the mean, but want to know what the interrelations of the means in subgroups of these demographic groups are, for example from the perspective of inequality or marginalization. Although this falls outside the scope of this paper, we have spent some words on the technique to determine means for subgroups based on a given beta distribution in Appendix 4.

Main Conclusion
In this paper we addressed the question of whether the transition points derived for the general population can be used for demographic categories to produce reliable extended time series to monitor differences in trends among these categories. We conclude that this is possible and that it is not necessary to derive transition points for each demographic category separately.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix 1: Frequency Distributions Split-Half Experiment Statistics Netherlands
See Tables 3, 4, 5 and 6.      the larger the values of a and b, the more peaked the density curve and the steeper the cumulative distribution curve are. A starting point for the Continuum Approach to happiness is provided by the cumulative frequencies of measured happiness on a discrete primary scale and the values on the continuum from 0 to 10 at which respondents change their judgment from one to the adjacent response option on this primary scale, for example from 'happy' to 'very happy'. On basis of the cumulative frequencies and the values on the continuum of the boundaries between the response options of the primary scale, the shape parameters a and b of the best fitting beta distribution are estimated in the Continuum Approach as maximum likelihood estimators. This estimation procedure is described into more detail in Kalmijn (2010;pp. 160-162). There is always a perfect fit in the case of a primary scale with three response options. If the number of response options is restricted to only two, then there is no single solution: the number of perfectly fitting beta distributions is infinite and the application of the Continuum Approach is useless. In the case of at least four response options, then in general there will be no perfectly fitting beta distribution and the best fitting solution should be taken. Those who are interested in the methodological considerations of the Continuum Approach can find more information about it in Kalmijn (2010, Ch. VI) and Kalmijn et al. (2011).

Appendix 4: Means for Subgroups Derived from a Beta Distribution
Suppose that the parameters a and b of the estimated beta distribution for happiness or life satisfaction in a population are given. The mean happiness or life satisfaction within subgroups of this population can accordingly be estimated by making use of the inverse beta distribution. 7 In Fig. 1 we have depicted the beta distribution for life satisfaction in The Netherlands in 2012. The parameters of this distribution are: a = 8.31 and b = 3.05. The value of the inverse of this beta distribution for the 25th percentile point is equal to 6.5, meaning that the estimated value of life satisfaction is 6.5 or lower for the subgroup of the 25 % least satisfied people in the population. In a similar way the inverse of the beta distribution at the 75th percentile point is equal to 8.3, meaning that the value of life satisfaction for the subgroup of the 25 % most satisfied people is at least 8.3. The inverse beta distributions of the reference distributions derived from the numerical scales of the split-half experiment of Statistics Netherlands and the Eurobarometer are shown in Fig. 7. In each graph of this figure we have also depicted four quartile groups. The most left of the quartile groups represents the least happy or satisfied people in the population and the most right quartile group represents the most happy or satisfied people. It can, for example, be seen in Fig. 7 that the degree of life satisfaction in 2012 within the second quartile group in The Netherlands varies from 6.5 to 7.5. In the corresponding subgroup in Southern Europe the degree of life satisfaction varies from 5.2 to 6.6.
We calculated the value of the inverse beta distribution for 25.000 equidistant points in each quartile group. The average of this large number of values for a quartile group is an estimate of the mean happiness or life satisfaction in this group. The estimates of the mean life satisfaction in The Netherlands calculated in this way for the four quartile groups are equal to respectively 5.6, 7.0, 7.9 and 8.8. We have estimated the means for the quartile groups of all the demographic categories we have distinguished in this paper, based on both the beta distribution estimated using category-specific reference boundaries and by using Fig. 7 The inverse beta distribution and quartile groups Fig. 8 Means of quartile groups of the population aged from 25 to 35 years the reference boundaries for the general population. We did not find noteworthy differences in the evolution of the trends depending on the reference boundaries used. This is illustrated for the age group from 25 to 35 years old in Fig. 8. We used the abbreviation 'cs' to denote the use of the category-specific boundaries to estimate the means and the abbreviation 'tot' to denote the use of the boundaries for the general population to estimate the means for the boundaries.