The Data: EU-SILC 2018
To assess earnings differentials between social classes and overall earnings dispersion across a broad range of countries, we make use of the EU Statistics on Income and Living Conditions (EU-SILC) microdata. EU-SILC is the main source for comparative research into earnings and income inequality in Europe. The 2018 wave (release of Spring 2020) contains information on 30 European countries including all EU Member States, plus Norway, Serbia, Switzerland and the United Kingdom. Slovakia has been dropped from this study as the occupations variable is missing from the dataset for this country. EU-SILC follows a format of ‘guided output-harmonisation’, which implies that there is a predetermined list of commonly defined target variables, while there is considerable variation across countries in sample design, mode of data collection (especially the use of survey data vs. register data), and questionnaire design (Atkinson et al., 2017; Goedemé & Zardo Trindade, 2020). In most countries, all household members aged 16 and over are interviewed, while in Denmark, Finland, the Netherlands, Norway, Sweden and Slovenia a part of the questionnaire is only completed by selected respondents. We follow the procedures proposed in Goedemé (2013) to take EU-SILC’s complex sample design into account as far as possible when estimating standard errors and confidence intervals.
In this study we focus on the population in paid employment, aged 18–64 and with earnings above zero in the income reference year. The income reference year is the calendar year before the survey year (i.e. 2017). Exceptions are Ireland (the 12 months preceding the interview) and the United Kingdom (the current year). The size of our subsample of interest, for which we have information on both social class and earnings, varies between 2,500 (Denmark and Sweden) and 17,000 individuals (Italy).
The measurement of earnings and social class
In what follows we discuss in some detail the variables included in our analysis. The dependent variable is gross earnings in the income reference year, which includes cash, near-cash and non-cash employee income as well as profits and losses from self-employment.Footnote 1 Observations with total gross earnings of zero or below are excluded, while at the top of the distribution we winsorize at the 999th permille (i.e. the 99.9th percentile). The earnings variable reflects both the number of hours worked and pay per hour, so both part-time working and time spent not in work during the year will affect total earnings. This measure of earnings must be distinguished, on the one hand, from the measures of household income (including other income sources and net of tax) that would be used in analysing household income inequality and, on the other, from the hourly earnings measure that would usually be employed in estimating human capital models. Hourly earnings in the income reference year cannot be robustly constructed from the information available in EU-SILC. However, the annual earnings variable has advantages for current purposes. Differences in pay per hour, in hours worked per week, and in weeks worked in the year are all likely to be highly structured in social class terms, so being able to capture them in this earnings measure is valuable in analysing earnings gaps between the classes. Gross earnings are a major component of household income, but the latter is also affected by how individuals group together in households as well as by the redistributive impact of social protection transfers and direct taxes. Unpacking class gaps in disposable household income is a highly worthwhile exercise, but it is even more complex than the analysis of individual gross earnings on which we concentrate here, and on which it could build.
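The sample restriction and top-coding described above can be illustrated with a short sketch. It is only an illustration under simplifying assumptions: the function name `prepare_earnings` is hypothetical, survey weights are ignored, and the 999th permille is computed with the standard library's unweighted `statistics.quantiles`, which need not match the estimator used in the study.

```python
from statistics import quantiles


def prepare_earnings(raw_earnings):
    """Drop earnings of zero or below, then winsorize the top at the 999th permille.

    Unweighted illustration; a survey-weighted quantile would be needed
    to reproduce estimates from a complex sample such as EU-SILC.
    """
    positive = [y for y in raw_earnings if y > 0]
    # n=1000 yields 999 cut points; the last one is the 999th permille.
    # The 'inclusive' method interpolates within the observed range.
    cap = quantiles(positive, n=1000, method="inclusive")[-1]
    # Values above the cap are set to the cap rather than dropped.
    return [min(y, cap) for y in positive]
```

Winsorizing keeps the top observations in the sample while limiting their influence on the mean, which matters for an index such as the MLD that depends on average earnings.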
The conceptualisation of social class we employ is based on occupations, as reflected in the EGP class schema developed by Erikson et al. (1979) for comparative research. Occupational classifications define social classes by looking at attributes of a position in the labour market that are independent of the person holding the position (rather than, for example, seeking to group together people sharing identities, interests, social and cultural resources, and lifestyles). The central focus is on employment relations in the labour market (as distinct from measures aiming to capture the social status or prestige associated with different occupations). Employers face contractual hazards in the labour market, especially with regard to two main problems: work monitoring and human asset specificity. The former arises when the employer cannot assess whether the employee is working and acting in the employer’s interest, while the latter refers to the extent to which a job requires job-specific skills. These in combination motivate the broad differentiation of employment relations between the situation of employers, the self-employed and employees, and the further distinction between those in a service relationship (the service class or professionals) and labour contracts (see also Erikson & Goldthorpe, 1992; Goldthorpe, 2007, 2010). Given our focus here on class and current earnings, it is important to note that while this theoretical framework predicts a marked relationship between class and employment security, pension rights and the steepness of age-earnings profiles, the expectations with respect to variation in current earnings are by no means as clear (see for example Goldthorpe & McKnight, 2006).
This theoretical framework as reflected in the EGP class schema provided the basis for the European Socio-Economic Classification (ESeC) subsequently developed for Eurostat (Rose & Harrison, 2007, 2010). Here we operationalise social class using ESeC as it reflects the most influential theoretical base informing occupation-based class analysis in Europe, is specifically designed for such comparative analysis, and is by far the measure most commonly employed for that purpose. The related occupation-based class schema proposed by Oesch (2006a, b, 2013) is intended to reflect the transformation of employment structures over recent decades, on the basis that with the decline of manufacturing and growth of services previously homogeneous groups (such as professionals or the ‘middle class’) have become more internally differentiated. This schema is thus intended especially to allow horizontal differentiation to be studied, while as Oesch (2006b) notes vertical/hierarchical differences, on which our analysis is centrally focused, are captured by varying degrees of advantage attaching to the employment relationship as in Erikson and Goldthorpe (1992). Empirical evidence comparing the predictive power of the Oesch schema with more conventional schemas is also still scant, as Barbieri et al. (2020) point out (though see Lambert & Bihagen, 2014). Such a comparison in terms of earnings patterns would be valuable but is beyond the scope of the present paper, which concentrates on the ESeC measure for the reasons set out. In that context it is also worth noting the arguments put forward by Maloutas (2007) that ESeC is less than satisfactory for Southern European countries because a relatively large proportion of the workforce are not employees or operate within small firms where internal hierarchies are very limited. These are issues that certainly complicate the application and interpretation of class schemas in a comparative context and need to be kept in mind.
The key ingredients of social class as measured by ESeC are employment status, size of the firm (in the case of self-employed), supervisory status (in the case of employees), and occupation. Given that in EU-SILC 2018 the International Standard Classification of Occupations (ISCO 2008) is available at the two-digit level, we use a simplified version of the original ESeC based on this two-digit ISCO code.Footnote 2 While in most countries this allows between 40 and 43 occupational groups to be distinguished, occupation is only available at a much more aggregated level in the case of Germany (9 groups), Malta (10 groups) and Slovenia (10 groups). This leads to an over-estimation of the share of the Higher white collar and Higher salariat classes, at the expense of the share of the Lower salariat class in these countries (see Goedemé & Zardo Trindade, 2020 for more details). Since we only consider observations with earnings above zero, we exclude respondents who never worked or are in long-term unemployment.
Although our main focus is on the non-hierarchical nine-class version of ESeC, for presentational purposes we also use the hierarchical three-class version.Footnote 3 As shown in Table 1, we follow Rose and Harrison (2010) in collapsing the nine-class version of ESeC into a hierarchical three-class schema, labelling these the ‘Salariat’, ‘Intermediate’ and ‘Working’ classes.
Table 1 Collapsing ESeC from 9 to 3 classes
Using this nine-class schema derived from ESeC, Fig. 1 depicts the share of each class in the working population across the thirty European countries we are covering. The size of the working class is relatively small in Western Europe and largest in Eastern Europe, ranging from around 20% in the Netherlands to over half of the active population in Bulgaria. In contrast, the size of the salariat class is 30% or below in Greece, Serbia, Bulgaria and Romania and close to or above 50% in the Nordic countries, Austria, Belgium, Luxembourg, the Netherlands and Switzerland. The intermediate class is largest in Greece (and Germany), where it accounts for about 35% of the active population, and smallest in Norway, Latvia and Lithuania where it is about 15%. The relative shares of the nine more detailed classes that make up these three classes also vary considerably across countries.
The (Un)adjusted Mean Log Deviation
We make use of the mean logarithmic deviation (MLD) to assess the overall size of between-class inequality and the contribution that between-class differentials make to overall earnings inequality between individuals in different countries. This approach is widely employed for decomposition analyses of income inequality because it is additively decomposable into between-group and within-group components, has various theoretically attractive properties, and is relatively sensitive to differences across groups/countries in the tails of the distribution. Furthermore, as we will explain below, it has the additional helpful property that one can use it to re-estimate the level of between-group inequality controlling for various factors in order to gain more insight into the degree to which between-class earnings differences are a function of other observable factors, such as the composition of social classes and differences in returns to education and other socio-economic variables. The supplementary material presents evidence that country rankings when using other inequality indices, such as the Theil index and the Gini coefficient, both for overall earnings inequality and between-class inequality are very consistent with the MLD-based results.
The overall MLD can be computed as follows:
$$MLD = \frac{1}{N}\mathop \sum \limits_{{i = 1}}^{N} \ln \left( {\frac{{\bar{y}}}{{y_{i} }}} \right).$$
(1)
In other words, it is equal to the average logarithm of the ratio of average earnings ($\bar{y}$) to the earnings of each member of the target population ($y_i$). The higher the value, the higher the level of inequality. In our data, the MLD of earnings ranges between 0.13 and 0.37. The MLD is additively decomposable into between-group and within-group inequality. With nine classes $c$ in the population, and $s_c$ denoting the share of class $c$ in the population, the MLD can be decomposed as follows:
$$MLD = \left[ {\ln \left( {\bar{y}} \right) - \mathop \sum \limits_{{c = 1}}^{9} s_{c} *\ln \left( {\bar{y}_{c} } \right)} \right] + \left[ {\mathop \sum \limits_{{c = 1}}^{9} s_{c} *\ln \left( {\bar{y}_{c} } \right) - \mathop \sum \limits_{{c = 1}}^{9} s_{c} \overline{{\ln \left( y \right)}} _{c} } \right].$$
(2)
The first two terms represent between-group inequality, whereas the second two terms represent the weighted average of the MLD within each group, i.e. the contribution of within-group inequality to overall inequality.
In addition, we estimate the counterfactual between-group MLD in which we ‘adjust’ the MLD for observable factors that contribute to between-class differences in earnings.Footnote 4 These observable factors can be subdivided into two groups: (1) the differences in composition of the nine classes in terms of measured variables associated with higher versus lower earningsFootnote 5; (2) differences between classes in the “returns” to those variables, using that term in the sense we explained early on and elaborate on now. Both factors may contribute to an increase or a decrease of between-class earnings inequality. To filter out the contribution of these factors, insofar as they can be identified with the available variables, we fit two OLS regressions which allow us to tease out the contribution of factor (1) versus factor (2) (the number of observations, design degrees of freedom and R² of these regressions can be found in the supplementary material). In the first regression, we include a dummy for each social class, a term for each covariate, as well as an interaction term for each class with each covariate. With ‘class 1’ as the reference category, this can be written as:
$$\begin{aligned} earnings & = \beta _{0} + \beta _{{12}} class_{2} + \cdots + \beta _{{19}} class_{9} + \beta _{2} x_{2} + \cdots \\ & \quad + \beta _{z} x_{z} + \beta _{{i22}} x_{2} class_{{2 }} + \cdots + \beta _{{i29}} x_{2} class_{{9 }} + \cdots \\ & \quad + \beta _{{iz2}} x_{z} class_{{2 }} + \cdots + \beta _{{iz9}} x_{z} class_{9} + u \\ \end{aligned}$$
(3)
with $class_2 \ldots class_9$ being dummy variables for the social classes, $x_2 \ldots x_z$ representing a list of covariates, $\beta_2 \ldots \beta_{iz9}$ the accompanying list of regression coefficients, and the $i$ subscript indicating the regression coefficients for the interaction terms between each social class and the covariates. In addition, we estimate the same regression model, but now excluding the interaction terms between social class and the covariates:
$$earnings = \beta _{0} + \beta _{{12}} class_{2} + \cdots + \beta _{{19}} class_{9} + \beta _{2} x_{2} + \cdots + \beta _{z} x_{z} + u$$
(4)
By not including interaction terms of the covariates and the social class dummies, we estimate the ‘average’ association between earnings and the covariates, taking all classes together. Subsequently, we create two counterfactual estimates of the MLD of between-class earnings inequality. We do so by first making use of the estimated regression coefficients to ‘predict’ overall average earnings and average earnings in each class, under the assumption that the average value for each of the covariates is equal to the average in the target population of each country, i.e. under the assumption that the average composition of each class is the same as the average in the population, and subsequently plugging these predicted average earnings into the first two terms of Eq. 2.Footnote 6 In other words:
$$\bar{y}_{{ac}} = \beta _{0} + \beta _{{1c}} class_{c} + \beta _{2} \bar{x}_{2} + \cdots + \beta _{z} \bar{x}_{z} + \beta _{{i2c}} \bar{x}_{2} class_{c} + \cdots + \beta _{{izc}} \bar{x}_{z} class_{c}$$
(5)
where $\beta_0 \ldots \beta_{izc}$ are estimated on the basis of Eq. 3 and $\bar{y}_{ac}$ is the adjusted or counterfactual average earnings of class $c$, in which we adjust only for differences in the average composition of each social class. Similarly, making use of the regression coefficients estimated with Eq. 4, and dropping the interaction terms from Eq. 5, results in a counterfactual estimate of average earnings in each class, in which we additionally ‘equalize’ returns to the observed compositional variables across social classes. Thus, we can estimate the adjusted average earnings of each class in both scenarios, while the weighted average of all classes corresponds to the counterfactual overall average earnings. These values are subsequently used to estimate an adjusted measure of between-class earnings inequality in accordance with the first two terms of Eq. 2, generating two counterfactual estimates of between-class earnings inequality. Comparing these values with the original MLD provides insight into the relative contribution to between-class earnings inequality of differences between the classes in average composition in observed variables versus differences in returns to those compositional variables, and more generally into the extent to which these factors taken together allow us to account for between-class earnings differentials.
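The counterfactual procedure can be sketched in a few lines of NumPy. This is a minimal, unweighted illustration under stated assumptions, not the estimation code used in the study: all function names are hypothetical, classes are coded 0 to C-1 with class 0 as the reference category, every covariate is interacted with class when `interactions=True`, and predicted class means are assumed to be positive so that their logarithm exists.

```python
import numpy as np


def between_mld(class_means, shares):
    """First two terms of Eq. 2: ln(overall mean) - sum_c s_c * ln(class mean)."""
    overall = np.sum(shares * class_means)
    return np.log(overall) - np.sum(shares * np.log(class_means))


def design_matrix(cls, X, n_classes, interactions):
    """Intercept, class dummies, covariates and, optionally, class-by-covariate
    interactions (Eq. 3 with interactions, Eq. 4 without)."""
    dummies = (cls[:, None] == np.arange(1, n_classes)).astype(float)
    cols = [np.ones((len(cls), 1)), dummies, X]
    if interactions:
        cols += [dummies * X[:, [k]] for k in range(X.shape[1])]
    return np.hstack(cols)


def adjusted_between_mld(y, cls, X, n_classes, interactions):
    """Counterfactual between-class MLD with covariates set to population means."""
    A = design_matrix(cls, X, n_classes, interactions)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS coefficients
    shares = np.bincount(cls, minlength=n_classes) / len(cls)
    # One prediction row per class, each evaluated at the population-average
    # covariate values (Eq. 5; interaction columns drop out for Eq. 4).
    x_bar = np.tile(X.mean(axis=0), (n_classes, 1))
    rows = design_matrix(np.arange(n_classes), x_bar, n_classes, interactions)
    return between_mld(rows @ beta, shares)
```

Calling `adjusted_between_mld` with `interactions=True` corresponds to adjusting for composition only, while `interactions=False` additionally equalizes returns across classes; comparing both with the unadjusted `between_mld` of observed class means gives the two reductions discussed above.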
To estimate these adjusted measures of between-class inequality, we include the following variables that are typically associated with earnings:
Hours worked. Estimated proportion of full-year full-time hours worked (FYFTE). Each month for which the respondent reports having worked full-time (FT) is counted as 1/12, with months working part-time counted proportionately based on reported typical hours worked per week at the time of the interview, the only hours measure collected in the survey.
Education. Highest level of education is measured in three categories, which are added as dummy variables: (1) lower secondary and below; (2) higher secondary and post-secondary, non-tertiary; (3) tertiary education.Footnote 7
Potential work experience. Number of years since the start of the first regular job.Footnote 8 Because this variable is included, we do not include age in our models, as the two are highly correlated.
Gender. Approximated by sex, in two categories (female/male).
Health status. Whether or not the person reported feeling (very) limited in the activities they usually do because of health problems for at least the past six months.
Immigration status. This is measured by whether someone was born outside the country.Footnote 9 This variable is not included in Bulgaria, Hungary, Poland and Romania due to a low overall prevalence and zero prevalence of immigrants in some social classes in these countries.
Household type. We include three continuous variables (without interaction): the number of children below the age of 18, the number of dependent adults (earning less than 5% of national median earnings in the income reference year), and the number of adults with earnings in the household.
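To make the hours-worked measure (FYFTE) in the list above more concrete, a possible calculation is sketched below. The function name and the `full_time_hours=40.0` reference value are assumptions introduced for illustration only; the text does not state which full-time benchmark is used to scale part-time months.

```python
def fyfte(months_full_time, months_part_time, typical_weekly_hours,
          full_time_hours=40.0):
    """Estimated proportion of full-year full-time hours worked.

    Each full-time month counts as 1/12. Part-time months are scaled by
    typical weekly hours relative to an ASSUMED full-time benchmark.
    """
    # Cap the part-time fraction at 1 so a part-time month never counts
    # for more than a full-time month.
    part_time_fraction = min(typical_weekly_hours / full_time_hours, 1.0)
    return (months_full_time + months_part_time * part_time_fraction) / 12
```

For example, under these assumptions a respondent with six full-time months and six part-time months at 20 hours per week would have an FYFTE of 0.75.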
In some countries the number of missing cases in our target sample on these variables is relatively high, including in Denmark (50%), Sweden (17%), and the United Kingdom (37%), and to a lesser extent Belgium (5%), the Netherlands (4%) and Finland (4%), with some variation by social class, especially in Finland, Norway and the Netherlands. Furthermore, given that we can only estimate the counterfactual between-class mean log deviation controlling for compositional effects (but not between-class differences in returns) when there is some variation on each (category of each) variable within each social class in the sample, social classes that account for less than 1.5% of the population at working age in paid employment have been excluded from the analysis of counterfactual between-class earnings. This includes Small farmers in all countries except for Austria, Bulgaria, Croatia, Finland, Greece, Hungary, Ireland, Latvia, Lithuania, Poland, Romania, Sweden and Spain; as well as the Petit bourgeoisie in Denmark and the Higher blue collar class in Romania. To assess the potential impact of both limitations on the findings of our study, in the results section below, we show both the MLD of between-class earnings in the total sample (Table 2) and in the restricted sample that is used for the counterfactual scenario (Fig. 5). This shows that, overall, the impact of these restrictions is very small in the great majority of countries, with the exception of Sweden, Denmark and Finland, where between-class earnings inequality is about 10% lower in the restricted sample (i.e. a reduction in the between-class MLD of between 0.002 and 0.004), while the cross-country rank correlation coefficient between the between-class MLD in the total sample and in the restricted sample is 99.5. 
Given these results, we believe it is rather unlikely that there would be a substantial bias in the estimated size of the reduction in between-class inequality in the restricted sample as compared to the size of the reduction that we would observe in the complete sample, and especially in the broader cross-national pattern that we observe. Due to the small sample size of Denmark, it is excluded from the counterfactual analysis by gender.
Table 2 Earnings inequality between social classes and overall, mean log deviation, EU-SILC 2018