1 Introduction

Much of our knowledge about the determinants of subjective well-being is based on survey research. However, surveys differ in their data collection mode. For instance, in volume 22 (issue 1) of Journal of Happiness Studies (Jan 2021), 10 papers used data based on face-to-face interviews (F2F), 3 used data from mixed sources and 9 used data collected by internet-based methods (WEB). Although F2F has been considered the ‘gold standard’, WEB has become increasingly common in recent years. The response rates of F2F surveys have shown a clear, though not uniform, decrease over time (Koen et al., 2018). Simultaneously, the costs of F2F have surged and act as a clear impediment to continued use, particularly in countries where labour costs are high. Concomitantly, internet penetration across the globe has risen sharply (Pandita, 2017; Daikeler et al., 2020), making the use of WEB surveys increasingly attractive, as people can answer questions whenever they feel ready to and with minimal interference. As a result, WEB is now widely used, either in combination with F2F (e.g. to boost response rates), leading to a kind of mixed-mode survey design, or as a replacement for F2F (Manfreda et al., 2008; Garbarski et al., 2019).

It is well known, though, that the mode of data collection can influence the univariate distribution of variables (Epstein et al., 2001; Kreuter et al., 2008; Dillman et al., 2014; Toepoel, 2015). This is particularly true for many of the variables of key interest to subjective well-being researchers (Dolan & Kavetsos, 2016; Sarracino et al., 2017; Sanchez Tome, 2018). However, subjective well-being research is not only interested in univariate distributions of single variables, but also in testing theoretical models about the relationship between subjective well-being indicators and their antecedents. It is therefore surprising that only a few studies have examined what consequences the use of different modes of data collection has for the association between variables in multivariate models (see Martin & Lynn, 2011; Dolan & Kavetsos, 2016; Sarracino et al., 2017). Whether our substantive conclusions about determinants of subjective well-being indicators depend on the survey mode with which the data have been collected is an issue we rarely reflect upon, let alone examine. This paper aims to start filling this gap in our knowledge by exploring the existence and magnitude of mode effects on the multivariate associations between subjective well-being indicators on the one hand, and a set of objective and subjective indicators on the other.

To examine mode effects on multivariate analyses, we use data from a pilot study conducted in 2018 in Germany and Croatia within the Generations and Gender Programme (GGP), a cross-national social science data infrastructure which collects survey data in more than 20 countries (Gauthier et al., 2018). Respondents were randomly assigned to one of two modes: (a) F2F; or (b) WEB. We examine the extent to which multivariate models of the relationship between subjective well-being indicators and a set of determinants are affected by the mode that respondents were assigned to.

2 F2F Versus WEB: What Do We Know?

A rich literature exists identifying differences between on-line survey responses and those collected with other modes (telephone, face-to-face, self-administered) (Epstein et al., 2001; Currivan et al., 2004). Such differences are defined as mode effects, a concept which refers to systematic dissimilarity between data collected with different survey modes. Generally, it has been found that people who answer subjective well-being related questions in WEB mode report less positively on their well-being than people who answer in other modes, like F2F and telephone (Sarracino et al., 2017; Sanchez Tome, 2018). However, this is not universally true (Hendriks et al., 2018). For instance, Martin and Lynn (2011) did not find differences in questions on life satisfaction between F2F and a mixed-mode approach including WEB.

One important mechanism that could drive the mode effect is social desirability bias, by which respondents over-report ‘desirable’ attitudes or behaviours and under-report ‘undesirable’ attitudes or behaviours, especially on questions related to sensitive topics (Edwards, 1957; Tourangeau & Yan, 2007). Social desirability bias rests on the idea that there are social norms governing a range of attitudes and behaviours, and that people may consciously or unconsciously misrepresent themselves to comply with these norms (Krumpal, 2013). This mechanism is assumed to be very important in F2F interviews, with the physical presence of an interviewer, while it is assumed to be less present in WEB surveys due to their perceived high level of anonymity and confidentiality (Aquilino, 1994; Kreuter et al., 2008; Schork et al., 2021). As a result of the differential susceptibility to social desirability bias of WEB and F2F, it can be expected that the univariate distributions of subjective well-being related items differ between WEB and F2F, with people generally giving more socially desirable responses in the latter than in the former.

Indeed, such mode effects on univariate distributions have been examined extensively in the literature. Focusing on the comparison between WEB and F2F, it has for instance been found that respondents in WEB mode score lower on a ‘happy-healthy index’ than respondents in F2F mode (Heerwegh, 2009). Zhang et al. (2017) reported much lower scores on subjective well-being indicators, like life satisfaction, happiness and positive mental health, among respondents in WEB mode than in F2F mode. Generally, these studies are in line with the suggestion made above that WEB leads to less social desirability in the answering patterns for indicators than F2F.

However, we identified only three studies and one meta-analysis that explicitly examined the effect of survey mode on associations between subjective well-being indicators and their potential correlates, although none of them directly compared WEB and F2F modes. Sarracino et al. (2017) compared WEB and telephone mode in data from Luxembourg and examined whether the association between life satisfaction on the one hand and gender, age, education, employment status and household income on the other hand differed between both modes. The only statistically significant difference was that income effects on life satisfaction were stronger in WEB than in telephone mode. Sanchez Tome (2018) compared WEB with mail and telephone mode. She examined the relationship between happiness, social trust and job satisfaction on the one hand and a set of demographic and socio-economic determinants on the other. She did find differences in the strength of a number of associations, although no formal statistical tests were performed. Dolan and Kavetsos (2016) also examined the determinants of subjective well-being indicators, but compared F2F and telephone mode using data from the UK Annual Population Survey. They examined how different well-being indicators correlated with gender, age, partner status, level of education, employment status and disability status. They concluded that mode effects in associations exist, but generally are small. However, they did not explicitly test whether differences were statistically significant. Finally, Walter et al. (2019) conducted a meta-analysis in which they compared the association between several work-related subjective well-being variables and independent variables like leadership and personality that were observed in online panels to the same associations in conventionally sourced data, like F2F interviews. They concluded that most associations are more or less comparable in both designs. 
A strong feature of this meta-analysis is that it was able to compare many studies. However, the comparison is between F2F and online panels rather than between F2F and data collected from random WEB samples. More importantly, the authors were not able to make the data collection for the WEB and F2F modes completely comparable. Thus, they did not succeed in disentangling the ‘real’ mode effect from the effect of individual self-selection into the survey mode.

Two other studies examined mode effects on associations between indicators not linked to subjective well-being. Martin and Lynn (2011) analysed data from an experiment conducted within the European Social Survey in which respondents who had taken part in a F2F interview were compared to respondents who either had a free choice between F2F, WEB and telephone interview or were pushed to WEB first, and if not responding were asked to conduct the interview by telephone, and if they were not willing to do so, by F2F. They compared a series of associations between F2F mode and the two mixed-modes combined. In modelling determinants of political interest, only two out of sixteen determinants showed a statistically significant mode effect in bivariate analyses and only one of them showed a statistically significant mode effect in a multivariate analysis. In modelling determinants of political activism (voter turnout, political involvement) about one in five determinants showed mode effects. Martin and Lynn (2011) concluded that there is evidence for mode effects on the associations of interest, but that it is not overwhelming. In the majority of examined associations no mode effects were detected. Jackle et al. (2010) compared associations between attitudes towards immigration and a set of nine other subjective indicators in a European Social Survey experiment comparing F2F and telephone mode and found only one statistically significant mode effect on associations.

This brief literature review shows that very few studies have examined mode effects on associations within multivariate models. No studies that exclusively compare F2F and WEB (the two modes most likely to be used at present) have been identified at all. This would be unproblematic if mode did not influence the univariate distributions of variables. In that case, it would be reasonable to assume that no mode effect on the strength of the association between two variables is present either. But, as mentioned earlier, we know that many subjective well-being indicators are subject to mode effects. This makes it highly relevant to consider whether there is a mode effect on the strength of the association if a mode effect exists in one or both of the univariate distributions of the relevant variables.

3 Choosing a Set of Well-being Indicators and their Determinants

To explore whether associations between well-being and its determinants are subject to mode effects, we decided to focus on multivariate models that correlate a number of well-being indicators with a set of objective and subjective determinants, on the basis both of theoretical considerations and the availability of suitable indicators in the survey.

3.1 Subjective Well-being Indicators

Regarding the well-being indicators, we focus on a number of indicators related to well-being in key life domains. The four domains are: (i) work and employment; (ii) family; (iii) social relationships; (iv) health. Within the domain of work and employment, we examine satisfaction with the current job. In the family domain, we consider satisfaction with the partner relationship. In the social relationships domain, we use loneliness as our key indicator. Finally, in the health domain, we choose people’s subjective health assessment.

3.2 Objective Determinants of Well-being

We select a number of individual characteristics that are key predictors of well-being, namely age, gender and education; interestingly, the extent to which people are prone to provide socially desirable answers to questions on well-being may depend on these characteristics.

Men and women may differ in their tendency to provide socially desirable answers (Fisher & Dube, 2005). How this works out for well-being indicators is not clear in advance, and may depend on whether satisfaction is measured in life domains that are traditionally more male-oriented or in life domains that are traditionally more female-oriented.

Age is another interesting correlate to consider, as the tendency to provide socially desirable answers has often been found to increase with age (Hitchcott et al., 2020; Soubelet & Salthouse, 2011), which could lead to an inflated positive correlation between age and well-being factors. Given that WEB could reduce social desirability effects compared to F2F, using WEB might attenuate the association between age and well-being indicators.

As for education, Heerwig and McCabe (2009) stated that, in general, the higher educated have been found to be more prone to reporting socially desirable behaviours. In their own research on US presidential candidates, however, they did not observe educational differentials in social desirability in the preference for a black president.

3.3 Subjective Determinants of Well-being

In addition, we decided to include work-life balance (WLB from here onwards) as a potential subjective correlate of well-being. Given that this indicator is included less frequently in models of subjective well-being, some additional reflections on this choice are in order. As Pichler (2009) noted, WLB ‘is experienced when demands from the domain of (paid) work are compatible with demands from other domains’. Poor WLB can have negative consequences for different aspects of people’s subjective well-being (Noda, 2020; Schnettler et al., 2021; Sirgy et al., 2020). In their review, Sirgy and Lee (2018) suggested that WLB is associated with work-related outcomes, such as job satisfaction, absenteeism and burn-out, with non-work-related outcomes, such as marital satisfaction, leisure satisfaction and family conflict, and with stress-related outcomes, such as depression, illness symptoms and alcohol consumption.

First, there are clear links between WLB and work-related outcomes. For instance, Haar et al. (2014) showed that WLB is positively related to job satisfaction across a range of different countries (France, Spain, Italy, Malaysia, China and New Zealand), while Uglanova and Dettmers (2018) found a positive association between flexible working time arrangements and job satisfaction both for men and women in Germany.

Second, WLB is related to family and social network outcomes. Muurlink et al. (2014) showed that work-life imbalance is not only associated with lower relationship satisfaction for people experiencing the imbalance, but also with lower relationship satisfaction reported by their spouse. Ferguson et al. (2012) showed in a longitudinal study that WLB is positively associated with both job satisfaction and relationship satisfaction. Given that WLB positively influences the time and energy people have available for catering to their personal relationships, it can be expected that WLB is also negatively related to loneliness. Loneliness can be viewed as an imbalance between the quality and quantity of the relationships that one has and the quality and quantity of the relationships one wants (De Jong Gierveld & van Tilburg, 2006).

Third, WLB is related to health and other stress-related outcomes. It has been linked to mental health outcomes (Haar et al., 2014; Hagqvist et al., 2017; Lunau et al., 2014) and to physical health outcomes (Lunau et al., 2014). This potential link between WLB and well-being in different life domains makes it an ideal candidate to include in an exploration of mode effects on well-being associations.

Thus, this paper will empirically examine the effect of survey mode on the association between (i) indicators of subjective job-related, family-related, social relationship-related and health-related well-being and (ii) gender, age, education and WLB. This will be done in a multivariate framework.

4 Data and Methods

4.1 Data

Our data derive from an experiment in the context of the Generations and Gender Programme (GGP), which is a cross-national social science data infrastructure that collects micro- and macro-level data on demographically relevant topics in over 20 countries (Gauthier et al., 2018). In the past, the Generations and Gender Survey (GGS), the focal survey instrument of the GGP, has been conducted primarily in F2F (CAPI) mode. In 2018, an experimental pilot study was conducted in three countries (Germany, Croatia, Portugal) to study the possibilities of implementing a push-to-web design (Lugtig et al., 2022). Respondents were aged 18 to 49 at the time of the survey. In Portugal, the overall experimental design of the pilot turned out to be problematic, as the fieldwork window was too short to realize an adequate sample and a listing of individuals was not available. Therefore, we restricted our analysis to data from Germany and Croatia, where a listing of individuals was available for sampling.

In both countries, respondents were randomly assigned to either WEB or F2F mode. A small selection of non-responding WEB invitees was followed up with F2F in both Croatia and Germany. In Croatia, this selection was based on the regional availability of interviewers. In Germany, it was based on whether the respondents had received an unconditional incentive when invited to participate in the study. As this selection was not random, we excluded WEB invitees that responded only after having been contacted by an interviewer. This allowed us to make a ‘pure’ comparison between WEB and F2F modes and to avoid bias due to self-selection of participants into the F2F mode. In Germany, response rates were somewhat lower for WEB than for F2F (23.7% versus 29.5%), whereas the reverse was true for Croatia (49.5% versus 27.7%). Due to the design of the experiment, only a small minority of respondents were assigned to the F2F condition in both countries. In Germany, 1492 (88.6%) respondents reported in WEB mode and 192 (11.4%) in F2F mode. In Croatia, 1296 (89.7%) respondents reported in WEB mode and 149 (10.3%) in F2F mode.

4.2 Measures

For our substantive example, we selected as dependent variables a set of GGS survey items and scales related to subjective well-being. The choice of indicators was driven by substantive arguments (we wanted to include indicators that tap into the important life domains of work, family, social relationship and health) and by the practical availability of subjective well-being indicators in the survey.

First, respondents rated their satisfaction with their job by answering the question “How satisfied are you with your current job?” (0 “Not at all satisfied”, 10 “Completely satisfied”). Second, respondents rated their satisfaction with their relationship by answering the question “How satisfied are you with your relationship with your partner/spouse?”. Again, the possible answers ranged from 0 “Not at all satisfied” to 10 “Completely satisfied”. Third, we used a loneliness scale based on six items (De Jong Gierveld & van Tilburg, 2006). The items used were: (1) “There are plenty of people that I can lean on in case of trouble”; (2) “I experience a general sense of emptiness”; (3) “I miss having people around”; (4) “There are many people that I can count on completely”; (5) “Often, I feel rejected”; (6) “There are enough people that I feel close to”. Answer categories were “Yes”, “More or less” and “No”. We followed the instructions of the authors in producing scale scores, by counting the number of items on which respondents chose either the middle answer or the answer indicating loneliness. For instance, if people answered “Yes” or “More or less” to the item “Often, I feel rejected”, this would add a point to their overall loneliness score, resulting in a scale ranging from 0 to 6 (0 “Lowest loneliness”, 6 “Highest”). This scale has been shown to be reliable and comparable across contexts and modes (Van Tilburg & de Leeuw, 1991; De Jong Gierveld & van Tilburg, 2010). Our fourth and final measure of well-being was subjective health, based on the question “How is your health in general?” (range from 1 “Very bad” to 5 “Very good”).
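The counting rule for the loneliness scale can be illustrated with a minimal Python sketch. It assumes, based on the item wording above, that items 2, 3 and 5 are negatively worded (so “Yes” or “More or less” signals loneliness) and that items 1, 4 and 6 are positively worded (so “No” or “More or less” signals loneliness). The function name and the example respondent are hypothetical and are not part of the GGS data files.

```python
# Sketch of the six-item loneliness score described in the text.
# Assumption: items 2, 3, 5 are negatively worded; items 1, 4, 6 positively worded.
NEGATIVE_ITEMS = {2, 3, 5}
POSITIVE_ITEMS = {1, 4, 6}
VALID_ANSWERS = {"Yes", "More or less", "No"}


def loneliness_score(answers):
    """Return a 0-6 loneliness score from a dict {item number: answer}."""
    if set(answers) != NEGATIVE_ITEMS | POSITIVE_ITEMS:
        raise ValueError("answers must cover items 1 to 6 exactly")
    score = 0
    for item, answer in answers.items():
        if answer not in VALID_ANSWERS:
            raise ValueError(f"unexpected answer for item {item}: {answer!r}")
        if item in NEGATIVE_ITEMS:
            # Agreeing (fully or partly) with a negative statement counts.
            lonely = answer in {"Yes", "More or less"}
        else:
            # Disagreeing (fully or partly) with a positive statement counts.
            lonely = answer in {"No", "More or less"}
        score += int(lonely)
    return score


# Hypothetical respondent, for illustration only.
example = {1: "Yes", 2: "No", 3: "More or less", 4: "No", 5: "Yes", 6: "Yes"}
print(loneliness_score(example))  # prints 3 (items 3, 4 and 5 each add a point)
```

Treating the middle category as an indication of loneliness on every item is what makes the score a simple count from 0 to 6.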

Then, to focus on the potential heterogeneity of the mode effect on subjective well-being across individual characteristics, we explore the moderating role of the three objective determinants we selected: gender (“Female”, “Male”); age classes (“18–25”, “26–35”, “36–45”, “46 or +”; see Footnote 1); and education (“Low”, corresponding to ISCED 0-2; “Medium”, ISCED 3-4; “High”, ISCED 5-8). To broaden our view, we also check the moderating role of additional socio-demographic variables, namely being married; having children younger than 6 years in the household; total number of children; and being a citizen of the surveyed country.

Finally, we built the measure of subjective WLB following the International Social Survey Programme (ISSP). The GGS uses the same items and answering scale as the ISSP (Breyer & Bluemke, 2016), but with answer categories reversed. The items capture the individual’s feeling of not succeeding in combining work tasks and family duties. The measure is built by combining the scores (from 1 “Several times a week” to 4 “Never”) that individuals indicated for the following items: (1) “I have come home from work too tired to do the chores that need to be done”; (2) “It has been difficult for me to fulfil my family responsibilities because of the amount of time I spent on my job”; (3) “I have arrived at work too tired to function well because of the household work I had done”; (4) “I have found it difficult to concentrate at work because of my family responsibilities”. These items show good internal consistency with an ordinal \(\alpha\) = 0.83 (Zumbo et al., 2007) (see Table A1 in the Online Appendix for the polychoric correlation matrix).

Thus, we calculated factor scores with an iterated principal factors method (Fabrigar et al., 1999; see Footnote 2) in order to create a WLB index, where a higher score indicates a higher level of WLB. Additionally, we examined measurement invariance of WLB across modes and countries by running a multi-group confirmatory factor analysis (MGCFA) in which we imposed between-group constraints on factor loadings and used an asymptotically distribution free (ADF) estimator (Bollen, 1989). This check suggests that the association between our four indicators of WLB and the latent concept of WLB is invariant and does not differ by country or mode (see Table A2 in the Online Appendix for details) (Mellenbergh, 1989).

4.3 Analytical Strategy

Our analytical strategy is divided into two stages. In the first stage, we examined whether the dependent variables of subjective well-being were affected by a mode effect per se. We accomplished this by performing a number of statistical analyses. First, we tested whether the means of our variables differed between modes. Next, we tested whether the complete distribution of the dependent variables differed by mode using the Wilcoxon rank-sum test. Third, we used the Blinder-Oaxaca decomposition to examine which part of the difference between modes in the mean scores of the dependent variables could be explained by differences in the demographic characteristics of respondents across modes. This method, which was originally developed to study gender wage discrimination in the labour market (Blinder, 1973; Oaxaca, 1973), has recently been applied in the literature on subjective well-being (Helliwell & Barrington-Leigh, 2010; Sarracino, 2013).

The Blinder-Oaxaca decomposition splits the mode gap in the well-being indicators into two components. The first component, the explained part of the gap, comes from the composition of the sample and is attributable to differences in the observable characteristics of respondents. The second component, the unexplained part of the gap, is due to differences in the estimated coefficients, that is, in how people interact with a specific survey mode.

Formally, the Blinder-Oaxaca decomposition is:

$$\begin{aligned}\Delta Y = \underbrace{\left( \overline{X}_a - \overline{X}_b \right) \cdot \beta '}_{\text{explained}} + \underbrace{\overline{X}_a \cdot \left( \beta _a - \beta ' \right) + \overline{X}_b \cdot \left( \beta ' - \beta _b \right)}_{\text{unexplained}}\end{aligned}$$

\(\Delta Y\) is the gap in the well-being indicators; \(\overline{X}_a\) and \(\overline{X}_b\) are the vectors of mean values of the explanatory variables for the two groups of respondents (F2F and WEB, respectively); \(\beta _a\) and \(\beta _b\) are the coefficients estimated for the two groups of respondents; \(\beta '\) is a vector of non-discriminatory coefficients used to evaluate to what extent the socio-demographic characteristics of individuals explain the overall difference in the responses by survey mode. This vector comes from the parameter \(\Omega\), estimated with a pooled model (Neumark, 1988; Oaxaca & Ransom, 1994).
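The decomposition can be checked numerically with a short Python sketch. It is a simplified illustration, not the authors' estimation code: it assumes a single explanatory variable plus a constant, obtains the reference coefficients \(\beta '\) from a pooled regression over both groups, and uses made-up toy data. All function and variable names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """OLS of y on a constant and one covariate x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def oaxaca_twofold(x_a, y_a, x_b, y_b):
    """Split mean(y_a) - mean(y_b) into explained and unexplained parts."""
    int_a, slope_a = ols(x_a, y_a)          # group-specific coefficients beta_a
    int_b, slope_b = ols(x_b, y_b)          # group-specific coefficients beta_b
    int_p, slope_p = ols(list(x_a) + list(x_b), list(y_a) + list(y_b))  # beta'
    xbar_a, xbar_b = mean(x_a), mean(x_b)
    # Explained part: difference in mean characteristics valued at beta'.
    # The constant drops out because its mean is 1 in both groups.
    explained = (xbar_a - xbar_b) * slope_p
    # Unexplained part: deviations of each group's coefficients from beta'.
    unexplained = ((int_a - int_p) + xbar_a * (slope_a - slope_p)
                   + (int_p - int_b) + xbar_b * (slope_p - slope_b))
    return explained, unexplained


# Toy data (hypothetical): one covariate and one outcome per group.
x_a = [1.0, 2.0, 3.0, 4.0]
y_a = [7.0, 7.5, 8.5, 9.0]
x_b = [2.0, 3.0, 4.0, 5.0, 6.0]
y_b = [6.0, 6.5, 6.0, 7.5, 8.0]

explained, unexplained = oaxaca_twofold(x_a, y_a, x_b, y_b)
gap = mean(y_a) - mean(y_b)
print(f"gap={gap:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
```

Because each group regression includes a constant, the fitted line passes through the group means, so the two parts always add up to the raw gap in means whatever reference coefficients are used.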

We checked the randomization of the two groups controlling for a number of relevant socio-demographic characteristics, included in the vectors \(\overline{X}_a\) (for F2F respondents) and \(\overline{X}_b\) (for WEB respondents): (a) gender; (b) age; (c) level of education; (d) being married; (e) having children younger than 6 years in the household; (f) total number of children; (g) being a citizen of the surveyed country; (h) country of survey. If the differences in responses to the well-being indicators by mode are determined by a “genuine” mode effect and not by a “compositional” effect, we would expect the explained part of the gap to be small and the unexplained part of the gap to be large.

In the second stage, we used OLS regression models to explore whether the subjective well-being determinants differ by survey mode (Ferrer-i Carbonell & Frijters, 2004; see Footnote 3). We first included in the model the main effects of our socio-demographic variables of interest and of the WLB indicator (Model 1), and then we added interaction terms between mode of survey administration and our subjective well-being determinants, to estimate their moderating role on the association between mode and subjective well-being indicators (Model 2). In order to ease the comparability of coefficients, in this stage we standardized all the dependent variables and the \(WLB_s\) indicator. To this aim, we estimated two nested models. The first step (Model 1) is:

$$\begin{aligned} {Y}_i = {\beta }_0 + {\beta }_1({F2F}_i) + {\beta }_2({DEMO}_i) + {\beta }_3({WLB}_{i}) + {\gamma }_c + {\epsilon }_i \end{aligned}$$

where Y represents the score for each of the subjective well-being indicators considered; F2F is a dummy variable denoting the survey mode (“WEB”; “F2F”); DEMO is a vector of relevant socio-demographic characteristics (the same as included in the Blinder-Oaxaca decomposition: gender, age, level of education, being married, children younger than 6 years in the household, total number of children, being a citizen of the surveyed country); WLB is the measure of work-life balance; \({\gamma _c}\) denotes the country fixed effects; and \({\epsilon }_i\) is the error term.

$$\begin{aligned} {Y}_i = {}&{\beta }_0 + {\beta }_1({F2F}_i) + {\beta }_2({DEMO}_i) + {\beta }_3({WLB}_{i}) \\ &+ {\beta }_4({F2F}_i \times {DEMO}_i) + {\beta }_5({F2F}_i \times {WLB}_{i}) + {\gamma }_c + {\epsilon }_i \end{aligned}$$

In the second step (Model 2) we added the interaction between the mode of survey administration and both the socio-demographic variables and the WLB indicator to estimate to what extent mode moderates the effect of these variables. For the sake of parsimony, we will discuss in the main text only the results for the main predictors of subjective well-being potentially affected by social desirability bias (gender, age, education, WLB). Results for the other covariates included in the model are available in Table A3 in the Online Appendix.
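To make the role of the interaction terms concrete, the following Python sketch considers a deliberately stripped-down version of Model 2 with only the mode dummy, one binary covariate (female) and their interaction. This reduction is an assumption made for illustration: it leaves out the other covariates, WLB and the country fixed effects. In such a saturated model, OLS reproduces the four cell means, so the coefficients can be read directly from them. The toy data and all names are hypothetical.

```python
def cell_mean(rows, f2f, female):
    """Mean outcome among rows with the given mode and gender dummies."""
    values = [y for (m, g, y) in rows if m == f2f and g == female]
    return sum(values) / len(values)


def saturated_coefficients(rows):
    """OLS coefficients of y = b0 + b_mode*F2F + b_female*female
    + b_inter*F2F*female, obtained from the four cell means."""
    web_male = cell_mean(rows, 0, 0)
    web_female = cell_mean(rows, 0, 1)
    f2f_male = cell_mean(rows, 1, 0)
    f2f_female = cell_mean(rows, 1, 1)
    b0 = web_male                              # reference cell: WEB, male
    b_mode = f2f_male - web_male               # mode effect among men
    b_female = web_female - web_male           # gender gap in WEB mode
    # Interaction: how much the gender gap changes when moving to F2F.
    b_inter = (f2f_female - f2f_male) - (web_female - web_male)
    return b0, b_mode, b_female, b_inter


# Toy rows (hypothetical): (F2F dummy, female dummy, outcome score).
rows = [
    (0, 0, 5.0), (0, 0, 7.0),
    (0, 1, 6.0), (0, 1, 8.0),
    (1, 0, 7.0), (1, 0, 7.0),
    (1, 1, 7.0), (1, 1, 8.0),
]
b0, b_mode, b_female, b_inter = saturated_coefficients(rows)
print(b0, b_mode, b_female, b_inter)  # prints 6.0 1.0 1.0 -0.5
```

In this toy example the gender gap is 1 point in WEB mode and 0.5 points in F2F mode, so the interaction coefficient is −0.5. A non-zero interaction of this kind is what Model 2 is designed to detect for each determinant.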

5 Results

5.1 Mode Differences in Univariate Distributions

Table 1 shows descriptive information on the distribution of our covariates of interest across the two modes. The two samples (WEB and F2F respondents) are similar in terms of some socio-demographic characteristics. At the same time, some differences emerge with respect to (1) level of education, with WEB respondents having slightly higher education than F2F ones; (2) presence of children younger than 6 years in the household and total number of children, with both conditions more often observed among F2F than WEB respondents; (3) being a citizen of the surveyed country, a characteristic somewhat more present among WEB respondents. Thus, the two samples in this study are not fully balanced. This emphasizes the need to include these variables as covariates in our multivariate models.

In this light, to rule out the possibility that our results may be driven not by a “genuine” mode effect but by a “compositional” effect (people with different characteristics may be differently selected into the mode of response, and this may lead us to report a spurious association between subjective well-being indicators and our independent variables), we applied the Blinder-Oaxaca decomposition (see the analytical strategy section for a description). By doing so, we were able to examine whether the differential distribution of respondents across modes influenced the mode effects observed in the dependent variables. The results for our four well-being variables are graphically presented in Fig. 1. It shows that only a very small proportion of the detected mode effect is due to differences in socio-demographics (explained part), while the largest part is unexplained. Thus, the Blinder-Oaxaca decomposition suggests that any mode effects we observe are unlikely to be driven by selection bias, which increases the likelihood that they are “genuine” mode effects.

Table 1 Summary statistics for independent variables by survey mode
Fig. 1 Blinder-Oaxaca decomposition of subjective well-being indicators gap by survey mode

Hence, we move on to examining mode differences in the univariate distributions of our subjective well-being variables. In order to accomplish this analytical goal, we examine the dependent variables’ cumulative distribution functions (cdf) by survey mode and test the hypothesis that the two samples (WEB and F2F respondents) are from populations with the same distribution running the Wilcoxon rank-sum test (Wilcoxon, 1945; Mann & Whitney, 1947).
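For readers less familiar with the test, the Python sketch below shows one common way to compute the Wilcoxon rank-sum statistic for ordinal scores with many ties. It is an illustrative implementation under stated assumptions, not the authors' code: it uses midranks for ties, a normal approximation with tie correction and no continuity correction, and hypothetical toy scores. It also assumes that not all observations are identical, since the variance would then be zero.

```python
import math


def midranks(values):
    """Rank values from 1 to n, giving tied observations their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(sample_a, sample_b):
    """Return (rank sum of sample_a, z statistic, two-sided p-value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    combined = list(sample_a) + list(sample_b)
    ranks = midranks(combined)
    w = sum(ranks[:n_a])
    mean_w = n_a * (n + 1) / 2
    # Tie correction: sum of t^3 - t over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w, z, p


# Toy satisfaction scores on a 0-10 scale (hypothetical).
web_scores = [5, 6, 6, 7, 7, 8, 4, 6]
f2f_scores = [7, 8, 8, 9, 6, 9]
w, z, p = rank_sum_test(web_scores, f2f_scores)
print(f"W={w:.1f} z={z:.2f} p={p:.4f}")
```

A negative z here means that the first sample tends to have lower ranks than the second, which corresponds to the pattern of lower reported well-being in one mode.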

Results of the Wilcoxon rank-sum test, shown in Fig. 2, indicate that all the subjective well-being indicators have a different distribution of scores by survey mode, underscoring the existence of a mode effect in our subjective well-being indicators. In WEB mode, respondents are more likely to give a more negative assessment of their subjective well-being than in F2F mode. The average satisfaction with the current job and with the relationship is lower in WEB than in F2F (by 4.9% and 5.0%, respectively). The largest difference between WEB and F2F is observed for loneliness: its level is almost 10% higher in WEB than in F2F (0.69 on a scale from 0 to 6). Finally, the mode effect for subjective health is equal to −0.11 points on a scale from 1 to 5 (less than 2% of the total).

Fig. 2 Cumulative distribution function of subjective well-being indicators by survey mode

5.2 Mode Differences in Multivariate Distributions

Next, we examined whether any evidence of mode effects on subjective well-being could be found using an OLS framework. In Model 1, we examined whether there were mode effects on the mean score of well-being, controlling for a set of socio-demographic variables and WLB. In Model 2, we focused on whether the effects of the socio-demographic variables and WLB on subjective well-being varied by survey mode by including interaction terms between survey mode and these variables in the model. Regression coefficients of these models are presented in Table 2.

Table 2 Association between mode of survey, socio-demographic variables, WLB and subjective well-being indicators. Models 1 and 2. OLS regression.

Fig. 3 OLS regression estimates with 95% CI of the effect of socio-demographic characteristics, survey mode and WLB on subjective well-being indicators (based on Model 1 in Table 2)

Results of Model 1 confirm the existence of a mode effect on our subjective well-being variables. In Fig. 3, dots represent the point estimates of the linear regression coefficients and the lines represent the width of the 95% confidence intervals. Respondents in F2F mode report higher satisfaction with their job, with their relationship and subjective health, and lower levels of loneliness. Regarding the individual characteristics, a negative gradient between age and satisfaction with relationship and subjective health is evident, with older individuals reporting lower scores on these two outcomes. Job satisfaction shows no significant association with any individual characteristic.

With respect to the additional socio-demographic variables (see Table A3 in the Online Appendix) it turns out that those who are married are more satisfied with their relationship than people who are not married. Furthermore, the married report lower loneliness than the non-married. Respondents who have children under the age of 6 report lower satisfaction with their relationship and slightly higher loneliness levels than respondents without children under the age of 6. Finally, being a citizen of the surveyed country is negatively associated with loneliness.

Fig. 4 Average predicted values of subjective well-being indicators for gender by survey mode (based on Model 2 in Table 2)

Fig. 5 Average predicted values of subjective well-being indicators for different age groups by survey mode (based on Model 2 in Table 2)

Fig. 6 Average predicted values of subjective well-being indicators for different educational levels by survey mode (based on Model 2 in Table 2)

To analyse if the effect of substantive variables on individual well-being is moderated by survey mode, we then interacted the mode of survey administration with the determinants of subjective well-being (Model 2). To complement the results presented in Table 2, we plotted the average predicted values (AP), so as to illustrate the relationship between our socio-demographic variables of interest and the indicators of subjective well-being over survey mode (Figs. 4, 5, 6 and 7). In these Figures, dots are the average predicted values and vertical lines are the width of the 95% confidence intervals. Hence, black dots show, for each of our three selected socio-demographic characteristics, the predicted values of our standardised subjective well-being indicators in WEB mode; analogously, grey dots show the predicted values in F2F mode.
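One common way to obtain average predicted values from a model with interactions is to fix the variables of interest (here survey mode and one characteristic) at chosen values for every respondent, predict the outcome, and average over the observed values of the remaining covariates. The sketch below illustrates this under stated assumptions: the coefficients, variable names and sample are hypothetical, and the authors' software may compute APs and their confidence intervals differently.

```python
# Hypothetical coefficients of a Model 2-style regression (illustration only).
coef = {
    "const": 0.0,
    "f2f": 0.3,              # 1 = F2F, 0 = WEB (assumed coding)
    "high_edu": 0.2,         # 1 = high education, 0 = otherwise
    "f2f_x_high_edu": -0.1,  # interaction between mode and education
    "wlb": 0.4,              # work-life balance score
}

# Hypothetical respondents; only the covariate that is averaged over is needed.
sample = [{"wlb": -1.0}, {"wlb": 0.0}, {"wlb": 1.0}, {"wlb": 2.0}]


def predict(coef, f2f, high_edu, wlb):
    """Linear prediction for one respondent."""
    return (coef["const"]
            + coef["f2f"] * f2f
            + coef["high_edu"] * high_edu
            + coef["f2f_x_high_edu"] * f2f * high_edu
            + coef["wlb"] * wlb)


def average_prediction(coef, sample, f2f, high_edu):
    """Set mode and education for everyone, then average the predictions."""
    preds = [predict(coef, f2f, high_edu, r["wlb"]) for r in sample]
    return sum(preds) / len(preds)


for f2f, mode_label in ((0, "WEB"), (1, "F2F")):
    for high_edu, edu_label in ((0, "not high"), (1, "high")):
        ap = average_prediction(coef, sample, f2f, high_edu)
        print(f"{mode_label}, education {edu_label}: AP = {ap:.2f}")
```

Comparing the four averages shows how an interaction term translates into group differences that vary by mode, which is the kind of pattern the plots of average predicted values are meant to display.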

Results for gender in Table 2 do not reveal any statistically significant interaction with the mode of survey administration for any of the four subjective well-being indicators considered. This suggests that gender differences in well-being are equally large in WEB and F2F mode.

When considering age, Table 2 shows that the interaction term for the oldest group (aged 46 or more) on loneliness is statistically significant at \(\alpha\) = 10% (\(\beta\) = 0.64), meaning that the loneliness difference between individuals aged 18–25 and those aged 46 or more varies by survey mode. Indeed, as shown by Fig. 5, while in WEB mode individuals aged 18–25 have an AP on loneliness of \(-\,\)0.04 (CI: \(-\,\)0.28, 0.19) and those aged 46 or more of 0.05 (CI: \(-\,\)0.07, 0.18), the same AP in F2F mode is equal to \(-\,\)0.86 (CI: \(-\,\)1.47, \(-\,\)0.26) for people aged 18–25 and to \(-\,\)0.12 (CI: \(-\,\)0.43, 0.19) for those aged 46 or more. More generally, Fig. 5 shows that in WEB mode no age gradient is visible in the loneliness score, whereas in F2F mode loneliness scores are higher for the oldest age groups. Similarly, while there is no age gradient in WEB mode for subjective health, a negative gradient emerges in F2F, with individuals aged 46 or more reporting significantly lower health (AP = \(-\,\)0.17; CI: \(-\,\)0.46, 0.12) than those aged 26–35 (AP = 0.38; CI: 0.13, 0.62). As a robustness check, the model was also run with age as either a continuous or a dichotomous variable (see Tables A6 and A7 in the Online Appendix). Results were consistent with those reported in the main text. The results for additional socio-demographic variables in Table A3 in the Online Appendix show that there is also a positive interaction between citizenship and survey mode, implying that citizens report lower loneliness only when they respond to a WEB survey, not when they respond to an F2F survey.

When looking at the relationship between education and job satisfaction, Table 2 shows that the interaction between medium education and mode is negative (\(\beta\) = −0.41) and statistically significant at \(\alpha\) = 5%. The results in Fig. 6 suggest that, in F2F mode, those with medium education report lower job satisfaction (AP = −0.05; CI: −0.27, 0.17) than those with low (AP = 0.35; CI: −0.00, 0.70) or high (AP = 0.27; CI: 0.04, 0.50) education, whereas no educational differences in job satisfaction are visible in WEB mode. Table 2 also shows that, for loneliness, the interaction between high education and mode is positive (\(\beta\) = 0.30) and statistically significant at \(\alpha\) = 10%. Fig. 6 indicates that among the highly educated no statistically significant difference in loneliness emerges across survey modes, with an AP of −0.14 (CI: −0.22, −0.07) in WEB mode and of −0.30 (CI: −0.52, −0.08) in F2F mode. Both the low- and the mid-educated individuals report higher loneliness in WEB than in F2F mode: the low-educated have an AP of 0.16 (CI: −0.02, 0.34) in WEB mode and of −0.30 (CI: −0.65, 0.05) in F2F mode, and the mid-educated an AP of 0.04 (CI: −0.05, 0.12) in WEB mode and of −0.47 (CI: −0.68, −0.26) in F2F mode. As a robustness test, the model was re-estimated with education specified as a dichotomous variable and the results, presented in Table A8 in the Online Appendix, were consistent with the main results.

Fig. 7 Average predicted values of subjective well-being indicators for different WLB levels by survey mode (based on Model 2 in Table 2)

As our last variable of interest, we focused on the relation between WLB and each of the four well-being indicators. Table 2, which shows the relationship between WLB and our dependent variables by survey mode, indicates that the only interaction that is significant at \(\alpha\) = 10% is between WLB and subjective health (\(\beta\) = \(-\,\)0.14). Fig. 7 shows that for this indicator the WLB gradient is less steep for F2F than for WEB mode, implying that the relationship between WLB and health is stronger in WEB than in F2F. Fig. 7 also shows that respondents who experience high WLB report a similar level of health irrespective of whether they answered in WEB or F2F mode: for instance, individuals with a value of 1.3 in our WLB indicator have an AP of 0.42 (CI: 0.33, 0.50) if they responded in WEB mode, and of 0.34 (CI: 0.15, 0.54) if they responded in F2F mode. Conversely, respondents who experience low WLB are more likely to report low subjective health in WEB than in F2F: as an example, people with a value of \(-\,\)2.7 in the WLB indicator have an AP of \(-\,\)0.74 (CI: \(-\,\)0.88, \(-\,\)0.59) in WEB mode and of \(-\,\)0.07 (CI: \(-\,\)0.46, 0.33) in F2F mode. For the three other subjective well-being indicators, no statistically significant interaction is observed in Table 2.
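The logic of this comparison can be written out for a simplified version of Model 2. The notation below is an assumption introduced for illustration: it omits the other covariates and their interactions and codes \(F_i\) as 1 for F2F and 0 for WEB, with \(W_i\) denoting the WLB score.

\[
y_i = \beta_0 + \beta_1 F_i + \beta_2 W_i + \beta_3 (F_i \times W_i) + \varepsilon_i
\]

The WLB gradient then differs by mode,

\[
\frac{\partial E[y_i \mid F_i = 0]}{\partial W_i} = \beta_2, \qquad \frac{\partial E[y_i \mid F_i = 1]}{\partial W_i} = \beta_2 + \beta_3,
\]

and the expected gap between modes at a given WLB value \(w\) is linear in \(w\):

\[
E[y_i \mid F_i = 1, W_i = w] - E[y_i \mid F_i = 0, W_i = w] = \beta_1 + \beta_3 w.
\]

Under this coding, a positive \(\beta_2\) combined with a negative \(\beta_3\) gives a flatter gradient in F2F than in WEB. Because the gap between modes changes linearly with \(w\), the two modes can yield similar predictions at one end of the WLB range and clearly different predictions at the other.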

6 Discussion

Different types of survey data are used to test models about subjective well-being indicators and their determinants. It is well-known that survey mode (whether a survey is conducted in face-to-face mode, by telephone, via the web, via mail etc.) strongly affects the average scores on subjective well-being variables. However, little is known about whether survey mode affects the association between subjective well-being indicators and their determinants. This is unfortunate, as web surveys are rapidly gaining in popularity and starting to replace face-to-face interviews as the most common mode of data collection. Therefore, the aim of this study is to examine whether survey mode affects the statistical relationships within subjective well-being models, by comparing the relationship between a set of objective and subjective measures and subjective well-being indicators in WEB and F2F modes. We use representative data from a sophisticated mixed-mode design in a pilot survey of the Generations and Gender Programme conducted in Germany and Croatia.

As expected, we observe large differences in the mean scores on subjective well-being indicators. In WEB mode, respondents report a much lower satisfaction with their job, satisfaction with their partner relationship and subjective health, and higher levels of loneliness. These findings stress that, if one is interested in monitoring trends in subjective well-being, potential changes in modes over time or cross-national differences should be considered. For instance, if one moves from F2F to WEB for cost-related reasons, one should try to gauge the size of the mode effects in subjective well-being indicators in order to be able to separate ‘genuine’ change in subjective well-being over time from change resulting from a different mode of survey administration.

We observe little evidence of a mode effect in the association between well-being indicators and basic socio-demographic variables. Where differences are shown to be statistically significant, the effects suggest a higher degree of social desirability amongst younger and more educated respondents. Such results call for a degree of caution when describing socio-economic differentials in well-being through data collected using multiple modes.

The relationship between work-life balance (WLB) and subjective well-being indicators is strong. Those respondents who report a better WLB are more satisfied with their job and their partner relationship, less lonely and report better health. The good news is that these relationships are generally found irrespective of the mode in which the data are collected. Only in one of four comparisons do we observe a slight difference by mode: the relationship between WLB and subjective health is somewhat stronger in WEB than in F2F mode. This leads us to conclude that generally mode does not affect the substantive conclusions we draw about how work-life balance and subjective well-being are interrelated, even if a weak effect seems to emerge when considering self-assessed health. Further research is needed to shed light on this finding.

In general, a change from F2F to WEB mode will not lead to a need to rethink our causal or associational models. However, associations generally were slightly larger in the WEB version. A potential reason for this difference could be that the variation in the answering patterns is larger in WEB, as answering patterns are less influenced by social desirability bias than in F2F mode. This might warrant further investigation.

Clearly, this study also has some limitations. First, we only studied a subset of all potential associations that could be analyzed. We opted to focus on a substantively interesting example rather than to embark on a ‘fishing expedition’ into all potential associations, as science generally develops by focusing on relationships and models of theoretical interest. But it could be, of course, that a focus on other indicators could lead to different results. However, we think that the fact that WEB generally leads to answers that are less susceptible to social desirability bias than F2F will make the variation in answering patterns generally larger in WEB, and make it easier to observe statistically significant effects.

Second, our sample sizes were not particularly large, making it hard to detect mode differences. With larger sample sizes, differences could have turned out to be statistically significant. Finally, we focused on just two country contexts. It could be that country context matters. Although we did not focus on country differences in mode effects in this study, it could be that mode effects are stronger in countries where people are more susceptible to social desirability bias.

Overall, our conclusion is that mode effects on mean scores of subjective well-being indicators are much larger than on associations within models in a multivariate framework. Thus, these results suggest that our conclusions about substantive models studying subjective well-being are relatively robust against a change in survey mode. Still, further research is warranted to corroborate these findings and their application to specific contexts.