As discussed in the previous chapter, there are many reasons for including the welfare state as an independent variable. It is an inseparable part of modern democracies, and its functions are far-reaching and consequential for individuals and societies alike. However, the very premise of this book is that there is a distinct lack of feasible recommendations and guidelines for scholars trying to include the welfare state as an independent variable in multilevel analyses. This chapter outlines the reasons for this.

So why is there a deficit at all? As mentioned in the preceding chapter, comparative research on the welfare state has always included a strong emphasis on modelling welfare policies empirically. There are comprehensive discussions about appropriate indicators, for instance surrounding the debate on spending vs. social rights (section 2.1.1) and the dependent variable problem (section 2.1.3). Equally extensive discussions address empirical classifications of welfare regime typologies (section 2.1.2). Thus, there is actually a great number of contributions dealing with methodological issues and proposing specific ways of operationalising different welfare state arrangements. However, as I will show in this chapter, it is frequently overlooked that these operationalisations aim at researching the welfare state, its premises, and its functions per se and as a dependent variable. As a result, measures are adapted under the implicit assumptions that (1) what is eligible as a dependent variable should be suitable as an explanatory variable as well and (2) differences between operational approaches are negligible because they capture similar or at least strongly related elements of the same construct: the welfare state. I will argue that both assumptions are flawed and that they impair the comparability of results.

This chapter critically examines different empirical approaches conceptually as well as empirically. Of the operational strategies outlined briefly in the previous chapter, I discuss in more detail those that are especially popular as independent variables in the current state of research. More specifically, several examples of single indicators, typologies, and composite indices are inspected more closely in order to illustrate inherent problems. First, all three approaches are discussed conceptually with an emphasis on sources of dissent within each approach. In the next step, popular operationalisations are compared in empirical analyses of cross-national survey data from the International Social Survey Programme (ISSP) and the European Social Survey (ESS) in order to explore the consequences of different conceptual choices. This is followed by a discussion of possible points of departure for the development of more suitable and standardised operationalisations for the specific use as explanatory variables.Footnote 1

3.1 Approaches and Debates

As the previous chapter revealed, there are many ways of approaching the welfare state and its functions. These different approaches are mirrored in the existing empirical operationalisations. While early research mainly focussed on welfare state effort—in most cases operationalised through social expenditure—contemporary literature agrees that social policy arrangements are captured more adequately by focussing on social rights of citizenship (e.g. Esping-Andersen 1999; Stephens 2010). Both conceptualisations can still be found in empirical operationalisations of welfare stateness. Moreover, additional approaches and respective empirical measurements were introduced (such as social investment and benefit receipt). Regardless of their content, the operationalisation can result in three types of indicators, which will be discussed in more detail in the following sections: single indicators, typologies, and composite indices.

3.1.1 The Single Indicator Approach

A very popular way of operationalising different welfare policies is to utilise single indicators highlighting specific elements of the welfare state. They are used in literature on classifying regimes,Footnote 2 as well as in studies that treat characteristics of the welfare state as independent variables (e.g. Jæger 2006; Jordan 2013; Eger & Breznau 2017).

By far the most popular are expenditure-based indicators (Kvist 2011). Usually, this means including a variable on social spending as a percentage of GDP in one specific policy area (e.g. in the labour market, Schneider & Makszin 2014) or as an overarching measure (e.g. Steele 2015) of welfare effort. Such indicators receive much criticism, as discussed in the preceding chapter (cf. section 2.1). The main points of criticism are that other areas of social policy-making, for instance entitlement criteria, are more important, and that a focus on spending postulates a linearity of welfare efforts which is not given in reality (Esping-Andersen 1990) and disregards the multifaceted nature of welfare states (Bonoli 1997). Furthermore, high spending can be a signal of a generous system, but also a consequence of a higher number of people depending on social benefits (Bergqvist et al. 2013). From a comparative perspective, we are therefore confronted with two issues: equally high spending does not necessarily mean that two countries actually provide similar benefits (Kvist 2011), and we cannot determine whether higher or lower income groups profit more from redistribution (as already noted by Titmuss 1974). Such criticism led to a widespread consensus that social expenditure is a problematic operationalisation of welfare stateness (for a more differentiated discussion see Jensen 2011). Still, it has not been completely discarded, as some scholars point out that despite justified criticism, social expenditure is a good indicator of a country’s commitment to social transfers and services (Reinprecht et al. 2018: 785).
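The ambiguity of spending figures can be illustrated with a small numeric sketch. The two countries, all figures, and the function name are hypothetical and serve only to show that identical spending shares can result from very different combinations of benefit levels and numbers of recipients.

```python
def spending_share(recipients, benefit_per_recipient, gdp):
    """Social spending as a percentage of GDP (simplified: one benefit only)."""
    return 100 * recipients * benefit_per_recipient / gdp


GDP = 1_000_000_000_000  # hypothetical, identical for both countries

# Hypothetical country A: few recipients, generous benefits
share_a = spending_share(recipients=1_000_000, benefit_per_recipient=20_000, gdp=GDP)

# Hypothetical country B: twice as many recipients, half the benefit level
share_b = spending_share(recipients=2_000_000, benefit_per_recipient=10_000, gdp=GDP)

print(share_a, share_b)  # both countries spend the same share of GDP
```

In this constructed case, an expenditure-based indicator would treat both countries as equal although benefit levels differ by a factor of two.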

An alternative is to use net replacement rates (NRR) for individuals in particular risk positions. In many cases, such replacement rates are seen as indicators of welfare generosity, which is an important part of the social rights perspective (cf. section 2.1). However, how to calculate replacement rates is still controversial and they vary depending on the source. This is discussed among others by Scruggs (2013), Wenzelburger and colleagues (2013) and Ferrarini and colleagues (2013), who explore differences between replacement rates calculated in the Comparative Welfare Entitlement Dataset (CWED2, Scruggs et al. 2014) and the Social Citizenship Program (SCIP, later included in the Social Insurance Entitlement Dataset (SIED)).Footnote 3 More recently, Bolukbasi and Öktem (2018) add that other non-replacement indicators, such as waiting days and qualification periods, are affected by the same problem and also differ depending on the data source because similar indicators are operationalised based on varying conceptual premises. Further ways to operationalise welfare state generosity using single indicators include the share of income that comes from welfare transfers (e.g. Brady et al. 2017).

The third perspective on the welfare state, which can be modelled in single indicators, focusses on benefit receipt (cf. section 2.1.3). Here, actual cash benefits are aggregated from survey data (e.g. Otto 2018a). Since this take on the operationalisation of welfare stateness is only starting to receive more attention (van Oorschot 2013) and is not particularly present in literature using the welfare state as independent variable, it is only partly relevant at this point. A similar argument can be made for social investment as an additional fourth perspective, which is only relevant for very specific research questions and, therefore, has not been used often so far.

Using single indicators for the two most common perspectives on welfare stateness—effort and generosity—as independent variables in comparative research has advantages and disadvantages, both of which are visible in the existing literature. The two main disadvantages address their limited informative value on the one hand and the above-mentioned deviations in the calculations on the other hand. In empirical studies, these disadvantages are often outweighed by the main advantage of this operationalisation: since a variety of international organisations such as the OECD and Eurostat offer extensive and regularly updated information on key indicators, data are easily accessible and available for a great number of countries.

A way to overcome the problem of limited informative value is to use more than one indicator. There are many studies that refer to a theoretically well-grounded selection of several single indicators representing relevant areas of the welfare state (e.g. Jæger 2006), explain in detail why they prefer a single indicator to other operationalisations (e.g. Jakobsen 2010; Visser et al. 2018), or examine single indicators together with other operationalisations (e.g. Jakobsen 2011). However, there are also studies that only briefly elaborate on their selection. This is problematic as there is an obvious conceptual difference between using, for example, replacement rates and social expenditure. Still, studies often refrain from justifying their selection, instead just arguing that they would have liked to use an alternative (e.g. a composite measure) that was not available for their sample of countries or time periods (e.g. Angel & Heitzmann 2015; Kulin & Meuleman 2015).

As for the second disadvantage, to my knowledge there is no study that analyses the consequences of deviations between data sources when using single indicators as independent variables. I therefore recommend that further research not only justifies the selection of each specific indicator, but also discusses the sources of the macro-level data in more detail and compares the selection with the referenced literature.

In conclusion, it can be stated that the use of single indicators as operationalisations of welfare stateness in multilevel frameworks has its limits. As there are no recommendations on which indicator to choose when modelling specific causal assumptions, the choice requires a well-founded justification. Given the different operationalisations, failing to do so can have consequences for the results and their comparability with other studies that use other measures or data sources.

3.1.2 The Regime Typology Approach

Using a regime typology is another popular approach to operationalising the welfare state. As outlined in the previous chapter, this approach is also surrounded by a broad debate. While the introduction of new typological approaches seems to have slowed down (cf. section 2.1), this is not the case when it comes to their application as independent variables.

When exploring in more detail the potentials and pitfalls of using regime typologies as independent variables, the first observation is that the popularity of Esping-Andersen’s (1990) Three Worlds of Welfare Capitalism (TWWC) is present here as well. As outlined before, his threefold classification, identifying a generous Social-Democratic, a status-oriented Conservative, and a market-oriented Liberal regime, has turned into a true classic. Research following Esping-Andersen’s initial typology has introduced a great number of varying classifications, and it inspired a remarkable body of literature and a critical and ongoing discussion regarding the number, composition, and scope of regimes (comprehensive discussions are provided by Arts & Gelissen 2002; Ferragina & Seeleib-Kaiser 2011; Rice 2013; van Kersbergen & Vis 2015; Powell et al. 2019). Before discussing the applicability of typologies as independent variables, it is important to look closely at methodological sources of dissent between different classifications, which address conceptual as well as operational details.

In classifications of typical arrangements of social policies, scholars have focussed on different elements of the welfare state. While some focussed on how much a welfare state spends, others classified how social policies are organised and financed (Bambra 2007 and Bonoli 1997 discuss and combine both perspectives). Another lively debate surrounds the question of how many welfare regimes exist. Popular additions to Esping-Andersen’s typology include a Mediterranean (e.g. Ferrera 1996) and a post-socialist welfare regime (e.g. Castles & Obinger 2008).

As noted before (cf. section 2.1.2), such classifications are often referred to as ideal types. However, in empirical studies (and eventually most of the typologies are tested empirically) this bears potential for confusion. As soon as empirical evidence is interpreted and countries are clustered based on actual indicators of policy-making, the resulting classification actually captures real types. Two problematic issues are frequently raised in this context. First, the practice of historical argumentation is criticised. Rice (2013), for instance, argues that in order to provide a sound groundwork, the idea of a historical deduction of types should be abandoned and replaced with a purely ideal–typical one which is detached from empirical evidence and focusses solely on overarching dimensions (she proposes welfare culture, welfare institutions, and socio-structural effects). Second, the misuse of the term ideal type is denounced. As van Kersbergen and Vis (2015) point out, most literature mainly offers typologies of real cases instead of an actual deduction based on ideal types. Even Esping-Andersen’s final classification of countries bears more resemblance to an empirical typology than to an explicit operationalisation of approximation. Thus, the fact that the term is very present in literature describing how countries cluster into ‘ideal–typical’ groups gives a false sense of theoretical justification and distorts what ideal types should accomplish by definition.Footnote 4 Instead of being used as a point of reference, ideal types often serve as a template to be reproduced in reality in studies on welfare state regimes. An actual empirical attribution of real cases to ideal types would model the proximity between a case and an archetype instead of a deterministic assignment. In practice, this would mean, for instance, that rather than using hierarchical cluster analysis to achieve a classification that is later labelled according to ideal types, one could look at how far each country departs from an ideal score on the examined variables. This of course raises the issue of what the ‘ideal’ would actually look like in terms of an empirically measurable point of reference. The role of ideal–typical welfare regimes, the insights they offer, and their informative value are subject to an ongoing debate (Aspalter 2011; van Kersbergen & Vis 2015), and the fact that real types only approach ideal types is frequently voiced (Arts & Gelissen 2002; Kääriäinen & Lehtonen 2006) but rarely modelled. Furthermore, the debate mainly addresses the value of ideal and real types as classifications alone and not as concepts that could serve as explanatory variables.
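The difference between a deterministic assignment and a proximity-based attribution can be sketched in a few lines. The ideal-typical profiles, the two indicators, and the country values below are hypothetical placeholders; as noted above, what an empirically measurable ‘ideal’ would look like is itself an open question. Euclidean distance is likewise only one of several possible proximity measures.

```python
import math

# Hypothetical ideal-typical profiles on two indicators scaled to 0-1:
# (generosity, stratification). These values are illustrative assumptions.
IDEAL_TYPES = {
    "Social-Democratic": (1.0, 0.0),
    "Conservative": (0.5, 1.0),
    "Liberal": (0.0, 0.5),
}


def distances_to_ideal_types(profile):
    """Euclidean distance between a country's profile and each ideal type."""
    return {name: math.dist(profile, ideal) for name, ideal in IDEAL_TYPES.items()}


def nearest_type(profile):
    """Deterministic assignment: keep only the label of the closest ideal type."""
    distances = distances_to_ideal_types(profile)
    return min(distances, key=distances.get)


country = (0.8, 0.3)  # hypothetical country profile
dists = distances_to_ideal_types(country)
label = nearest_type(country)

for name, d in dists.items():
    print(f"{name}: {d:.2f}")
print("deterministic label:", label)
```

The label discards the information that the hypothetical country is not identical to its closest archetype, whereas the distances retain the degree of approximation and could be entered as metric country-level variables.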

There are numerous indicators and methods for the empirical operationalisation of such types that reflect the different conceptual considerations. While some studies base their classifications on expenditure (Kuitto 2011), others focus on benefit coverage and replacement rates (Ferrera 1996), or on a two-dimensional approach combining spending and funding of welfare provision (Bonoli 1997; Bambra 2007). Moreover, there are those who add measures of economic insecurity (Menahem 2007) or stratification (Esping-Andersen 1990). These indicators are empirically merged into typologies through different analytical techniques, and each methodological approach claims to shed light on aspects that have been disregarded so far (e.g. certain indicators or countries).

Lastly, the country sample constitutes a considerable source of variation. Any classification is highly dependent on the sample upon which it is drawn. This insight, albeit having been voiced more prominently in the years following the boom of typologies, is not new and predates the TWWC (Uusitalo 1984). However, it is still common practice to develop typologies based on a sample that is neither random nor systematic. Even though consequences of case selection have been addressed sporadically (Ebbinghaus 2011; Kim 2015), the most prevalent criterion appears to be data availability. Thus, countries are often chosen because they belong to an organisation, such as the OECD, which publishes a comprehensive amount of data on the social policies of its members. In addition, belonging to any of these organisations means that at least a minimal amount of similarity in economic and political development is guaranteed, which in turn often serves as a justification for comparability (Ebbinghaus 2011). Still, most studies only cover a selection of those countries, and the Central and Eastern European (CEE) countries in particular are highly underrepresented, even though a meaningful complete survey could be achieved, for instance, by examining all member states of the European Union. Bearing in mind possible applications as an independent variable, the latter procedure would be particularly fruitful since EU citizens are a commonly chosen population on the micro-level due to a multitude of research questions, which include attitudes, behaviour, and living conditions in light of European integration.

Apart from the omission of countries, different samples may affect the classification itself because most approaches determine types based on proximity between cases. For instance, Esping-Andersen’s (1990) classification is based on composite indices of decommodification and stratification where countries receive a score based on their deviation from the overall mean. However, mean and deviation vary depending on the included countries and are sensitive to slight changes or miscalculations. Ironically, Esping-Andersen himself serves as an example for this.Footnote 5 A similar argument applies to cluster analysis (e.g. Castles & Obinger 2008; Kuitto 2011), which groups countries based on the proximity between them. In light of these differences in conceptualisations and operationalisations, it is not surprising that the number, title, and composition of regimes differ remarkably between typologies.
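The sample dependence of mean-based scores can be made concrete with a minimal sketch. It assumes a scoring rule in which a country receives 1, 2, or 3 points depending on whether its value lies more than one standard deviation below, within, or above the sample mean; this cut-off, the country labels, and all values are illustrative assumptions rather than a reproduction of any published index.

```python
from statistics import mean, pstdev


def mean_based_scores(values):
    """Score each country 1, 2 or 3 relative to the sample mean.

    1 = more than one SD below the mean, 3 = more than one SD above,
    2 = in between. The one-SD cut-off is an assumption for illustration.
    """
    m = mean(values.values())
    sd = pstdev(values.values())
    scores = {}
    for country, value in values.items():
        if value < m - sd:
            scores[country] = 1
        elif value > m + sd:
            scores[country] = 3
        else:
            scores[country] = 2
    return scores


# Hypothetical replacement rates (in %) for five countries
small_sample = {"A": 40, "B": 50, "C": 60, "D": 70, "E": 80}
# The same countries plus two hypothetical low-generosity countries
large_sample = {**small_sample, "F": 20, "G": 25}

scores_small = mean_based_scores(small_sample)
scores_large = mean_based_scores(large_sample)

print(scores_small["A"], scores_large["A"])
```

Country A's own value is unchanged, yet its score moves from the lowest to the middle category once two further countries enter the sample and shift the mean and the standard deviation.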

The lack of agreement on which typology suits best and which theoretical perspective is preferable is acknowledged in many studies using them as independent variables. Nonetheless, many of them still rely heavily on the regime approach, sometimes even with an apologetic reference to the need to circumvent a more detailed discussion of the scientific debate (e.g. Motel-Klingebiel et al. 2009: 70). While regime typologies bear the advantage that they are easily operationalised as dummy variables, their main disadvantage is a practical one: the selection of countries in survey data (like the ESS) usually deviates from the countries covered by a typology. Hence, authors face a difficult conceptual choice, having to either exclude unclassified countries or include them by combining or extending classifications. Since cross-cultural analyses often aim at examining as many countries as possible, the second option is preferred. Such combination or extension often relies on instinct since the literature lacks consensus on what to do in this situation and there is a plethora of different typologies. As a result, a buffet strategy evolved in which authors pick a combination “from the vast array of welfare state typologies” (Arts & Gelissen 2001: 285) that seems helpful for the envisioned purpose. There are many examples of such buffet-approaches (more recently Deeming & Jones 2015; dem Knesebeck et al. 2016; Arundel & Lennartz 2017; Schuck & Steiber 2017). The procedure often seems inspired more by practical considerations than by theoretical ones. As a result, we see many modifications adding countries that were not classified in whatever typology serves as a starting point, as well as uncommented reclassifications. In light of the existing debate on welfare state change and new risks (cf. section 2.1.3), it furthermore seems problematic that many of the buffet-type studies still rely heavily on typologies from the 1990s and assume that those classifications (very prominent are Esping-Andersen 1990 and Ferrera 1996) are still valid and only require some additions or slight modifications. A last and very general problem associated with regime typologies is that they represent a strong reduction of complexity, which marginalises variation between countries by reducing it to a handful of types (Kvist et al. 2013: 331).

It has rarely been tested how different typologies affect results if treated as independent variables. Bergqvist and colleagues (2013) provide one of the few overviews, using the example of health inequality as dependent variable. In their re-analysis of 34 studies employing regime typologies as independent variable, they found considerable differences not only in the kind of typology used and the amendments made to classifications but also in the results. Since different associations with health were even found within identical typologies, they conclude that the main problem is not the theoretical and empirical conception but the general use of welfare regimes as an explanation for health inequality. However, they examined studies that draw on different data sources and apply different methods of analysis. Thus, it should be tested whether their finding holds when these aspects are kept constant.

In summary, regime typologies can be an excellent tool for classifying different policy arrangements. However, they rarely fit the country sample in cross-national survey data, prompting scholars to resort to combinations and reclassifications. Given the strong conceptual and operational variations underlying different typologies, such an approach appears highly problematic. It is therefore important to examine the consequences of different classifications more closely.

3.1.3 The Composite Index Approach

Composite indices and scores to measure welfare stateness represent a comparatively rare approach. Nevertheless, attempts to develop such indicators exist throughout the literature (e.g. Castles & McKinlay 1979). In particular, the two indices underlying Esping-Andersen’s (1990) TWWC typology have had a major impact on more contemporary approaches. Especially his decommodification index has been replicated, updated and revised numerous times (e.g. Bambra 2005; Scruggs & Allan 2006; Scruggs 2014; Kuitto 2018). Noteworthy are furthermore works of Segura-Ubiergo (2007) and Cruz-Martinez (2014), who develop multidimensional measures of welfare state arrangements for Latin American countries. However, their proposals have not been adapted for European samples so far. Other composite measures in the literature either take a more specific perspective (e.g. on defamilialisation, Lohmann & Zagel 2016) or a more general one which goes beyond characteristics of social policies and includes overall features of governance (e.g. the Social Policy IndexFootnote 6). The main sources of dissent within the index approach include the operationalisation and country sample.

Some examples illustrate the differing operationalisations. Castles and McKinlay (1979) devise an index of welfare commitment based on educational expenditure, transfer payments, and infant mortality. Esping-Andersen’s (1990) decommodification index includes replacement rates, extent, and duration of individual contribution, waiting periods, and insurance coverage. Menahem (2007) combines insurance coverage and replacement rates with disposable income. Besides these obvious differences in the choice of indicators, there are also differences with regard to weighting procedures and modes of standardisation. The Benefit Generosity Index in the Comparative Welfare Entitlement Dataset, an updated and slightly modified version of Esping-Andersen’s decommodification index, z-standardises the underlying variables (Scruggs 2014). In contrast, Esping-Andersen’s original version, using data from the Social Citizenship Program, gives countries a value between one and three for each underlying indicator representing levels of generosity and adds them up. Furthermore, Esping-Andersen only superficially justifies why some indicators are given more weight than others (discussed among others by Bambra 2006). However, as Wenzelburger and colleagues (2013) point out, not only do the modes of combining indicators vary; the underlying indicators themselves may differ as well depending on the data source (as discussed in the preceding section on single indicators).

The second source of variation within the approach is closely linked to the first. The measures introduced above all rely on mean values and deviations from that mean and are thus very sensitive to the underlying country sample. If the composition of countries changes, these values will most likely change as well (as discussed in the case of typologies). This compromises the comparability of results and hampers the extension of composite measures to further countries. A way to overcome this problem, which I rarely encountered in the literature, is to refrain from standardisations based on mean and deviation. An alternative is a benchmark approach, which standardises based on the highest existing occurrence of a given indicator in a meaningful population (as used by Sjöberg 2017). Such a population could for instance consist of the entire European Union or all OECD member states. In this case, the standardised score indicates how close a country is to an existing frontrunner (for instance the highest existing replacement rate), and it could be used independently of the country sample.
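The contrast between the two modes of standardisation can be sketched as follows. Dividing by the highest value in a fixed reference population is only one simple reading of a benchmark approach and is not taken from the cited study; the reference maximum, the country labels, and all values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise on the mean and standard deviation of the analysed sample."""
    m = mean(values.values())
    sd = pstdev(values.values())
    return {country: (value - m) / sd for country, value in values.items()}


def benchmark_scores(values, benchmark):
    """Standardise on a fixed benchmark, e.g. the highest value in a
    reference population such as all EU or OECD member states."""
    return {country: value / benchmark for country, value in values.items()}


# Hypothetical highest replacement rate (in %) in the reference population
REFERENCE_MAX = 80.0

# Two hypothetical analysis samples that share countries A and B
sample_1 = {"A": 40.0, "B": 60.0, "C": 80.0}
sample_2 = {"A": 40.0, "B": 60.0}

z_1, z_2 = z_scores(sample_1), z_scores(sample_2)
b_1 = benchmark_scores(sample_1, REFERENCE_MAX)
b_2 = benchmark_scores(sample_2, REFERENCE_MAX)

print(round(z_1["A"], 2), round(z_2["A"], 2))  # differs between samples
print(b_1["A"], b_2["A"])                      # identical in both samples
```

As long as the reference population is held fixed, the benchmark score of a country does not depend on which other countries happen to be part of a given survey.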

Composite indices are perhaps the most desired but least implemented independent variables (exceptions include e.g. Rothstein et al. 2012). They promise the multidimensionality of typologies while maintaining the metric scale and variation of single indicators. However, the number of existing measures is very limited and the most popular ones are only available for a limited selection of countries and points in time. This shortcoming is often stated as a reason for having to resort to a less desirable alternative (e.g. Angel & Heitzmann 2015; Kulin & Meuleman 2015).

Overall, composite measures represent very promising tools for capturing welfare stateness. However, since the most comprehensive ones cover only a small number of countries, their usefulness as independent variables is very limited at this point.

3.2 An Empirical Confrontation

In the following section, the discussed operationalisations are tested empirically with an emphasis on illustrating the advantages and disadvantages mentioned before. In this empirical test, welfare attitudes serve as an exemplary dependent variable on the individual level to illustrate the consequences of differing operationalisations. Welfare attitudes represent a very popular dependent variable in the relevant literature. In very broad terms, it is believed that attitudes towards social policies are shaped by the institutional context in which individuals are embedded, in this case the welfare state (Svallfors 1997; Arts & Gelissen 2001). A prominent hypothesis holds that generous and universal social policies, following social-democratic principles, generate political support and positive attitudes towards the welfare state (Jaime-Castillo 2013; Roosma et al. 2014), while redistribution-based and targeted policies increase conflicts between beneficiaries and contributors and lead to disapproval (Jordan 2013). However, the empirical tests of this policy feedback hypothesis produce mixed results and several studies cannot confirm such a linear relationship between generosity and support (Jæger 2009; Jakobsen 2011). One reason for this may be that different operationalisations of welfare policies were tested, including different typologies and single indicators. While typologies may fail to grasp subtle differences between welfare states (Jordan 2013), single indicators could be correlated with other macroeconomic indicators and thus may have no independent effect once other variables are controlled (Jæger 2013 suspects this in the case of social expenditure). Due to these divergent findings and the ongoing discussion, welfare attitudes present a good example of a micro-level outcome that may be explained differently depending on the conceptualisation of welfare stateness in an analysis. In this chapter, the focus rests on determining how sensitive results are to such different operationalisations, while the results themselves and their relation to hypotheses in the literature are of secondary importance.

3.2.1 Data, Operationalisation and Method

The following analyses are based on data from the fourth wave of the European Social Survey (ESS 2008) and the International Social Survey Programme (ISSP Research Group 2017). These two data sources are chosen for several reasons. First, they both include similar questions addressing attitudes towards the welfare state. Second, the data were collected over a similar period of time (mainly 2008 and 2009), which means that the same macro-level indicators can be used in both analyses. Third, both datasets are frequently used in comparative research on how welfare attitudes are shaped by different welfare state arrangements (more recently Kulin & Meuleman 2015; Steele 2015; Eger & Breznau 2017). Fourth, using ESS and ISSP data represents a common situation in which the researcher has no influence on the country selection. Lastly, the comparison between the two datasets will make it possible to determine, at least partly, the reliability of findings.

To ensure the examined population is suitable for the proposed analysis and covers comparable units of analysis, the sample is reduced to respondents from countries that are member states of, or have strong ties to, the European Union.Footnote 7 As a result, 21 countries covered by both datasets are included.Footnote 8

The dependent variable is an item that measures attitudes towards government responsibility for supporting the unemployed. This particular aspect of social policy attitudes is covered in both datasets in a comparable, if not identical, manner. The ESS includes the question “how much responsibility do you think governments should have to ensure a reasonable standard of living for the unemployed?” on an eleven-point scale ranging from “should not be governments’ responsibility at all” to “should be entirely governments’ responsibility”. In the ISSP, respondents indicated on a five-point scale to what extent they agreed with the statement “the government should provide a decent standard of living for the unemployed”.

The analyses focus on independent variables on the country-level. Since the main surveying period for both datasets was late 2008 and early 2009, these indicators are primarily based on 2008 data. The only exception is SCIP/SIED data, which is available at five-year intervals and is therefore from 2005. Furthermore, since the dependent variable addresses attitudes towards generosity in the field of unemployment, macro-level indicators that relate to unemployment policies are chosen whenever possible.

Four single indicators are tested as independent variables: overall social expenditure as percentage of GDP (Eurostat 2018a), social expenditure in the field of unemployment policies (Eurostat 2018b), and two versions of net replacement rates for unemployed average production workers, which stem from different data sources and are based on slightly varying operationalisations (CWED2 and SCIP/SIED). These two types of single indicators were chosen because they are especially popular in the relevant literature.

Since there are no typologies covering all analysed countries, two different buffet-typologies are included. The first version uses Esping-Andersen’s classification as a starting point and adds a Southern type following Ferrera (1996). The CEE countries were all joined in an Eastern-European group by applying classifications used, amongst others, in analyses by Roosma and colleagues (2014) and Bambra and colleagues (2014). This leaves Cyprus (only included in additional analyses), which was classified as Southern following Castles and Obinger (2008). The second buffet-typology differs from the first in the classification of two countries, which represent ambiguous cases: Switzerland is classified as Liberal (instead of Conservative) following Obinger and Wagschal (1998) and Ferragina and colleagues (2013) and Austria is assigned to the Social-Democratic type instead of the Conservative one, which is supported by Arts and Gelissen (2001).
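The construction of the two buffet-typologies can be sketched as a simple lookup table. The following is a minimal illustration rather than the coding actually used for the analyses: only the assignments for Switzerland and Austria are taken from the text, the other rows are assumed examples of each regime type, and the names `TYPOLOGY_1`, `TYPOLOGY_2`, `dummy_code`, and `reclassified` are hypothetical. The Social-Democratic type is used as the reference category because the results below are reported relative to it.

```python
# Illustrative subset only. The rows for Switzerland and Austria follow the
# text; all other rows are ASSUMED examples and not the chapter's country list.
TYPOLOGY_1 = {
    "Sweden": "Social-Democratic",   # assumed example
    "Germany": "Conservative",       # assumed example
    "United Kingdom": "Liberal",     # assumed example
    "Spain": "Southern",             # assumed example
    "Poland": "Eastern",             # assumed example
    "Switzerland": "Conservative",   # first buffet-typology
    "Austria": "Conservative",       # first buffet-typology
}

# Second version: only the two ambiguous cases are reclassified.
TYPOLOGY_2 = {
    **TYPOLOGY_1,
    "Switzerland": "Liberal",
    "Austria": "Social-Democratic",
}

REFERENCE = "Social-Democratic"


def dummy_code(typology, reference=REFERENCE):
    """Return {country: {regime: 0 or 1}}, omitting the reference category."""
    regimes = sorted(set(typology.values()) - {reference})
    return {
        country: {regime: int(assigned == regime) for regime in regimes}
        for country, assigned in typology.items()
    }


def reclassified(first, second):
    """List the countries whose regime assignment differs between versions."""
    return sorted(c for c in first if first[c] != second[c])


if __name__ == "__main__":
    print(reclassified(TYPOLOGY_1, TYPOLOGY_2))
    print(dummy_code(TYPOLOGY_2)["Austria"])
```

Keeping both versions in one place makes it explicit that the two typologies differ in exactly two assignments, so any difference in results between them can be traced to those cases.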

The Welfare Generosity Score provided in the CWED2 dataset is included as a composite measure. Since it only covers a small sample of countries and none of the CEE states, I added some missing indicatorsFootnote 9 and updated the index following instructions by Scruggs (2014) so that it now covers all 21 countries in the main analysis. The correlation of my version with the unemployment generosity score already provided in the dataset is very high (0.98) for the 12 countries shared by CWED2, ISSP, and ESS.
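A check of this kind, comparing an extended index with the original on the countries both cover, can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the scores passed in are placeholders supplied by the user, and a plain Pearson correlation is assumed, since the text does not state which coefficient was used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_on_overlap(original, extended):
    """Correlate two {country: score} mappings on their shared countries.

    Returns the coefficient and the sorted list of shared countries, so the
    size of the overlap can be reported alongside the correlation.
    """
    shared = sorted(set(original) & set(extended))
    r = pearson_r([original[c] for c in shared],
                  [extended[c] for c in shared])
    return r, shared
```

Reporting the list of shared countries together with the coefficient matters here, because the correlation says nothing about the countries that only the extended version covers.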

Furthermore, the unemployment rate is included as a control variable in all models, as is often done in analyses of welfare attitudes (Jæger 2013; Arikan & Ben-Nun Bloom 2015; Eger & Breznau 2017).Footnote 10 In this specific case, it can also be seen as a proxy for benefit receipt, which is relevant because the dependent variable focusses on attitudes towards government responsibility for meeting the needs of the unemployed. Testing these different operationalisations within each of the two surveys should help illustrate differences while reducing potential bias stemming from varying survey periods and country samples.

The following empirical tests are based on multilevel linear regression analyses (MLA). Over recent decades, this method has become increasingly popular in comparative research because it takes into account the hierarchical structure of cross-cultural data in which individuals are nested in national contexts. Multilevel analysis is able to estimate variance components on the level of individuals and contexts (in this case countries) simultaneously. This leads to a more accurate estimation of standard errors and reduces the risk of fallacies, which can arise when results on either level are translated to the other (cf. section 1.2). Moreover, it allows the effects of independent variables on the micro- and macro-level to be estimated in the same analysis (for a more detailed description see among others Hox 2010; Snijders & Bosker 2012; Marx et al. 2013). In order to estimate effects between countries in a regression framework, a sufficiently large country sample is required. With 21 countries, my analyses are at the lower end of what is recommended, but should produce reliable results as long as the model specification is not too complex (Stegmueller 2013).

In essence, the idea of applying multilevel linear modelling in cross-national comparative analyses is to extend regular regression analyses such as ordinary least squares (OLS) by explaining variance of a dependent variable on two levels: within and between countries. The resulting regression equation for the full model predicting attitudes among individuals \(i\) in country \(j\) and including all variables at the individual (\(x_{ij}\)) and country level (\(z_{0j}\)), is:

$$y(\text{attitude})_{ij} = \gamma_{00} + \sum_{h = 1}^{r} \gamma_{h} x_{hij} + \sum_{l = 1}^{r} \gamma_{l} z_{l0j} + u_{0j} + e_{ij}$$

Here, the intercept (\(\gamma_{00}\)) represents a general mean and, in contrast to non-hierarchical regression models, two residual components are distinguished. One of them represents the residual effect on the level of countries (\(u_{0j}\)) and the other on the level of individuals (\(e_{ij}\)). This formulation of a multilevel regression model is a simple version of a hierarchical linear model, in which only the intercepts and the residual terms are assumed to vary randomly, while the slopes are fixed (Random-Intercept-Fixed-Slope-model). This means that differences between countries are assumed when it comes to the general level of attitudes, while we do not expect the strength and direction of the effects caused by the independent variables to vary between countries. The latter (a random slope) is usually included if cross-level interaction effects are assumed because there is reason to believe that the slope of individual-level determinants varies between countries (cf. Snijders & Bosker 2012: 74–87). Since this is not the case and the limited number of countries calls for a slim model, random slopes are not examined in the following analysis.
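The combined equation can also be read as two stacked equations, a notation that is common in the multilevel literature. The following is a sketch under the assumption that \(\beta_{0j}\), a symbol not used elsewhere in this chapter, denotes the country-specific intercept. At the individual level:

$$y_{ij} = \beta_{0j} + \sum_{h = 1}^{r} \gamma_{h} x_{hij} + e_{ij}$$

and at the country level:

$$\beta_{0j} = \gamma_{00} + \sum_{l = 1}^{r} \gamma_{l} z_{l0j} + u_{0j}$$

Substituting the second equation into the first reproduces the combined equation above. Only the intercept \(\beta_{0j}\) carries a country-level residual, whereas the coefficients \(\gamma_{h}\) carry no subscript \(j\); this is what the label "fixed slope" refers to.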

In addition, information on the models and their explanatory power is included. In this analysis, changes in variance are especially informative. They are obtained by calculating the intraclass correlation coefficient (ICC), which represents the share of variance that can be attributed to differences between countries. A high enough value of the ICC can be considered a pragmatic prerequisite for a multilevel analysis as there should be variance in the first place in order to explain it. In addition, Bryk and Raudenbush’s (2012) R-squared is included in order to evaluate the explanatory power (especially on the level of countries).Footnote 11
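Both quantities are simple functions of the estimated variance components, which the following minimal sketch illustrates. It assumes the usual definitions: the ICC as the country-level variance divided by the total variance, and the country-level R-squared as the proportional reduction of the country-level variance relative to the intercept-only model. The function names and the numbers in the usage example are hypothetical and are not taken from Tables 3.1 or 3.2.

```python
def icc(var_between, var_within):
    """Intraclass correlation: share of total variance located between countries."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


def explained_share(var_null, var_model):
    """Proportional reduction in a variance component relative to the null model.

    var_null  -- variance component in the intercept-only model (M0)
    var_model -- the same component after adding explanatory variables
    """
    if var_null <= 0:
        raise ValueError("null-model variance must be positive")
    return (var_null - var_model) / var_null


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    null_between, null_within = 0.5, 4.5
    model_between = 0.3
    print(round(icc(null_between, null_within), 3))
    print(round(explained_share(null_between, model_between), 3))
```

Because the country-level variance can occasionally increase when individual-level predictors are added, `explained_share` may return a negative value; in that case the figure should be read as a diagnostic rather than as a share of explained variance.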

3.2.2 Results and Interpretation

The following two tables (Table 3.1 and Table 3.2) present the results of the multilevel analyses for each data source (ESS and ISSP). Both versions show very similar intraclass correlation coefficients (ICC) in the random-intercept-only model (M0): in both datasets, about 10 percent of the variation in attitudes towards the role of government can be attributed to the country-level.

Table 3.1 Government responsibility for providing standard of living for unemployed (ESS 2008)
Table 3.2 Government responsibility for providing standard of living for unemployed (ISSP 2009)

Looking at the coefficients, many similarities can be found in the ESS (Table 3.1) and ISSP (Table 3.2) data, which indicates a certain robustness of the findings. In both analyses, overall social expenditure is negatively associated with supporting a strong role of government in the field of unemployment policies and explains a considerable amount of variation between countries (M1). Social expenditure in the field of unemployment policies (M2) points in the same direction, although this effect is only significant in the ISSP analysis. Respondents from countries with higher social expenditure therefore want less government responsibility for ensuring a decent standard of living for the unemployed.

The two different unemployment replacement rates (models 3 and 4) lead to slightly different results. In the ESS analysis, only the version provided by the SIED data produces a significant and positive effect, while the CWED2 version is insignificant. In the ISSP analysis, neither of the rates exhibits significant effects. Still, this shows that varying data sources should at least be discussed, especially if results are compared with studies using indicators from a different data source. In this analysis, generous benefits in case of unemployment tend to raise support for government responsibility in the field, but this effect does not appear to be very robust. Apart from this, the opposed directions of the effects compared to the spending indicators correspond to the expectation that welfare effort and welfare generosity represent different parts of the welfare state (cf. section 2.1).

The two buffet-typologies (models 5 and 6) consistently show that people living in Liberal welfare states, which are assumed to be the least generous, are significantly less in favour of government responsibility than those in inclusive Social-Democratic welfare states. Furthermore, the first typology (model 5) also reveals a significantly lower preference for state responsibility in countries belonging to the Conservative type. This effect disappears in the second buffet-typology (model 6) with the different classification of Austria and Switzerland, which indicates that a potential bias due to slightly differing combinations and extensions of existing typologies should be taken seriously.

In interpreting these results, the two typologies seem to point in the direction of the policy feedback hypothesis: living in a Social-Democratic welfare state seems to increase support for government action – at least compared to Liberal regimes. On the other hand, the insignificant effect of the Generosity Index (model 7) undermines this finding. Since this index is based on many of the indicators Esping-Andersen used to construct his initial typology, it should at least roughly indicate patterns that correspond to the TWWC typology or one of the succeeding classifications. However, this is hardly the case (Figure 3.1). Instead, a ranking of generosity scores does not reveal clusters of countries that fit the typologies I used in the analyses, the TWWC, or in fact any other typology.

Figure 3.1 Unemployment generosity index by country (data: CWED2; bar shading indicates regime membership according to buffet-typology 1: horizontal stripes = Social-Democratic, white = Conservative, diagonal stripes = Southern, grey = Eastern, black = Liberal)

In addition to these findings, further analysis (Table 3.3) shows that when the same two buffet typologies are tested on a slightly larger sample of countries,Footnote 12 the results turn out quite different. Suddenly, Liberal countries no longer differ significantly from Social-Democratic ones. Instead, Conservative welfare states now consistently show significantly less support for government action than the latter, while respondents from countries belonging to the Southern type show significantly more support for state responsibility; however, this effect is only found for the second typology.

Table 3.3 Comparison of regime typologies: Government responsibility for providing standard of living for unemployed (ESS 2008)

This finding is quite problematic because, although it may seem obvious that different country samples may produce different results, samples in secondary analyses of survey data like the ESS will always vary from wave to wave. Thus, even if scholars use the same typologies, the differing samples will still hinder the comparability of results with previous research. Of course, the same argument holds true for every kind of indicator and analysis. Still, typologies suggest a sense of homogeneity among the members of a category, which may tempt researchers to underestimate the problem. This problem of the country sample in connection with the large number of different classificationsFootnote 13 makes the application of welfare typologies very volatile.

Overall, the additional analysis again indicates that people in Conservative and Liberal welfare states prefer less government responsibility than in Social-Democratic regimes. However, as soon as a Southern and an Eastern type are included, they relegate the Social-Democratic countries to an intermediary position. These results advise caution. The unexpected intermediate position of Scandinavian countries could be explained by the fact that citizens affected by crises or transitions in the South and East may wish for more social security. To postulate, however, that this is actually due to similar properties of the welfare state and not just to a geographical or developmental vicinity could be premature. If the welfare state is at work, it works in different ways: welfare generosity may inspire more confidence in Social-Democratic regimes than in Liberal ones, but if wanting more governmentally regulated welfare provision is due to less generous or malfunctioning welfare states in Southern and Eastern states, those countries follow a different logic.

Summarising all results, the negative effect of social expenditure (overall and in the field of unemployment) on attitudes opposes the policy feedback hypothesis at first glance, while net replacement rates and typologies show a tendency to support it. However, the indicators produce very unstable results and small modifications influence the significance of effects severely. Moreover, the explanatory contribution of the different approaches varies considerably, ranging from close to zero (benefit generosity index and NRR) to moderate (social expenditure) and even very high values (regime typologies). Considering that the different operationalisations should at least be somewhat related, this is problematic.

Based on the findings in this analysis, it would be very difficult to answer why attitudes differ. Regardless—and in line with the aim of this project—the analysis reveals important sources of bias, such as sample selection and data source. Discussing these issues and finding ways to avoid them can help standardise the process.

3.3 Summary: an Independent Variable Problem

In this chapter, sources of dissent within each approach were identified and each of these issues was visible and consequential in the empirical test that followed.

Within the single indicator approach, limited informative value and differing data sources are critical issues. Although it may seem trivial to say that replacement rates and social expenditure address distinct and very different aspects of the welfare state, both are used as independent variables in analyses of welfare attitudes. The literature does not offer any guidelines that recommend a standardised selection of suitable indicators, sensible combinations, or data sources. This leads to the second issue: the analyses reveal variations in the results and their significance depending on the data source. This indicates a potential bias, which should be examined in more detail and, at the very least, should encourage taking sources into account when comparing results.

The regime approach is characterised by differences in the underlying conceptual and operational premises. As this chapter shows, different classifications can affect the results—and there are many other classifications in the literature that have not been tested in this contribution that could lead to even more different results. Furthermore, the differing country samples in survey data prove to be highly problematic. More research is needed to test how much combination and extension a typology can endure before the results are no longer comparable.

Lastly, it is very difficult to assess the composite index approach. Since comprehensive examples of this approach are only available for a small number of countries, they need to be extended to bigger country samples. However, the inclusion of other countries—especially CEE countries—proves to be quite difficult. There are many issues that are critical when attempting to include CEE states in existing measurements. For instance, de jure and de facto benefit generosity in these countries may not entirely match, labour market participation may differ systematically from older welfare states, atypical employment may be more common, and much more. A comprehensive discussion is given by Kuitto (2018) who extends Esping-Andersen’s version of the decommodification index to CEE countries and raises these and more important issues.

Given the problems identified, several practical recommendations can be made at this point. First, different operationalisations should never be treated as interchangeable options – neither within nor between approaches. They have different conceptual premises and allow different interpretations. The selection of an indicator should, whenever possible, be based on the greatest possible comparability and should not be based solely on a lack of alternatives. Second, data sources should receive more attention. This directly applies to single indicators and indirectly to typologies and composite measures, because they are based on such single indicators. Third, combining and extending typologies should be avoided or follow clear theoretical justifications, as arbitrarily picking and blending classifications from the literature can severely affect the comparability of results. Fourth, the frequent exclusion of Central and Eastern European countries is dated and obstructive to comparative analyses of social phenomena in Europe and beyond. If the existing indicators do not fit the character of the welfare state in these nations, more attention should be paid to finding proxies that work equally for old and new welfare states. Finally, and this may be the most controversial finding, the disadvantages of using welfare regime typologies as independent variables far outweigh the advantages. Based on the conceptual and the empirical assessment in this chapter, I can only advise against using such typologies as independent variables until the problems discussed are solved.

Despite the many issues discussed, the differences between welfare states reflect very important features of modern democracies. They fulfil important functions (cf. section 2.2) and strongly influence individuals and their living conditions. The lack of a reliable, easily available and applicable instrument should lead neither to making unsatisfactory compromises nor to excluding the welfare state from the analysis. Thus, it is important to explore what kind of instrument is actually needed by scholars looking for an independent variable. Based on the previous discussion, I recommend two objectives that I believe can serve as starting points for a fruitful discussion. First, the requirements for a measurement that is to serve as an explanatory instrument must be examined in detail. Second, there is a need for a detailed theoretical and conceptual discussion of the mechanisms assumed when exploring the outcome of different welfare state arrangements.

The problems identified in this contribution already help to substantiate the first objective because they show obstacles that can be circumvented. Following the preceding discussion, the main problematic issues are lack of clarity, availability, and comparability. Those three aspects can be translated into criteria that can contribute to the development of a more standardised approach: it should be clear what information an indicator is based on, the indicator should be available for a sufficiently large sample to allow replications, and it must be comparable with other studies.

Strictly speaking, neither existing typologies nor composite measures fit these premises. In both cases, the lack of availability for a big enough population is rather obvious. Moreover, they also lack clarity because their operationalisations aim at capturing the multidimensionality of welfare states and are thus based on a variety of indicators. In the case of composite measures, this combination may level out and thereby mask important outliers (Kvist 2011), while the broad categories of typologies may represent much more than just welfare state policies (like political cultures, economic and democratic development et cetera). As a result, neither of the two operationalisations allows determining which specific part of the operationalisation is at work if an effect is observed. This leaves single indicators as perhaps the most fruitful way to operationalise welfare policies as independent variables. Still, while availability is much better in this case, clarity and comparability are not a given. Social expenditure, for example, is anything but a precise indicator. As argued in section 2.1, high social spending can represent very different things. Furthermore, data sources have to be addressed. Still, single indicators bear one great advantage: by highlighting one specific aspect of welfare stateness, their interpretation is most unequivocal. Perhaps the best way to include the welfare state as an independent variable is such a focus on single issues instead of broad and multidimensional conceptualisations of this very complex institution.

Regarding the second objective, I suggest a closer look at potential dependent variables in order to get a clearer picture of the hypothesised mechanisms. It is not enough to assume that ‘the welfare state’ influences an outcome. A key question is why this should be the case and how the mechanism may work. The answer to both questions does not come from the independent variable, but from the dependent one. This suggests that different dependent variables may require different operationalisations of welfare stateness. Returning to the example of welfare attitudes helps to illustrate this argument. Here, the hypothesis highlighted attitude formation as a result of policy feedback. The underlying mechanism implies a process of evaluation. To test this assumed effect, we would thus require social policy indicators that contribute to opinion-formation because individuals are likely aware of them. Indicators like waiting days and contribution periods, which are integral parts of composite measures and many typologies, do not fall under that category because only a small part of the population will know these details. However, respondents have at least a basic understanding of the generosity of benefits (e.g. replacement rates), potentially making this a much better indicator.

However, if another exemplary topic is chosen, the argumentation can be very different. When it comes to explaining the risk of poverty, for example, the individual perception and evaluation of social policy plays no role. Rather, the organisational principle and the effectiveness appear to be more important – regardless of whether or not the majority of individuals are actually aware of them (e.g. waiting periods or benefit duration). The aim should therefore be to collect such mechanisms, to systematise them and to offer suitable indicators for their testing that meet comprehensible criteria.

While this chapter has painted a very critical picture of approaches to operationalise the welfare state, the conclusions refer only to a very specific problem: the operationalisation as an independent variable. If the way in which the welfare state is included in cross-national analyses is so inconsistent in this case, how can we achieve more transparency in the future? I have already hinted at a possible approach: it might be worth asking whether the impact welfare states are hypothesised to have is adequately captured in an operationalisation. If this is not the case, how can the actually relevant characteristics of welfare stateness be identified? In the following, I am going to explore this issue in more detail. Based on the findings of this chapter, a catch-all approach to operationalising the welfare state as an independent variable is discarded. Instead, the focus rests on conceptualisations of welfare stateness that are embedded in the theoretical arguments that warrant its inclusion as independent variable in the first place. This means that the main objective going forward is to pinpoint why the welfare state is assumed to shape individual-level outcomes and which conceptual perspectives on the welfare state exist. Focussing on the dependent variables and hypothesised macro–micro mechanisms should be a good point of departure for determining what kinds of measures are actually needed by scholars who want to treat features of the welfare state as independent variables in multilevel analyses.