Introduction

A multidimensional, multilevel and multidirectional perspective on vulnerability across the life course, capable of shedding light on the dynamic interplay among stressors, resources, and reserves for different subgroups at different moments (Spini & Widmer, this volume), carries with it a number of methodological imperatives. First, it implies the need for empirical research based on rich longitudinal data and methods sensitive to processes, which, in turn, make it possible to draw inferences to wider populations of interest. This capability is key to deepening understanding of the common processes that govern vulnerability dynamics and ensuring the possibility of supporting positive changes in people’s lives. Second, a consideration of the quality of the data analysed and what can compromise comparability across different temporal and geographic contexts and population subgroups is paramount for ensuring the validity and utility of research findings. In this endeavour, survey research plays a central role, and research in the field of survey methodology is indispensable to those concerned with the substantive objectives of life course research (Oris et al., 2016).

One of the determining factors affecting the quality of survey data is the method by which participants respond to questionnaires—referred to as the mode of data collection. Mode can affect data quality by determining who is able and willing to respond to a survey and how they respond (i.e., the answers they give). Important developments in internet-based data collection technologies in recent years, combined with pressure to reduce fieldwork costs, have resulted in the growing use of ‘mixed mode’ designs, which use a combination of methods for administering questionnaires to members of the same sample. The use of multiple modes can affect the comparability of data gathered in different groups and in a longitudinal setting at different points in time. As an increasing number of longitudinal surveys begin to combine modes in the production of life course data (Voorpostel, Lipps, & Roberts, 2021), accounting for mode(s) in the analysis and interpretation of that data is becoming increasingly necessary in quantitative research on vulnerability.

In this chapter, we discuss some of the key challenges that mixed mode designs entail, particularly in longitudinal surveys, and illustrate them with a synthesis of findings from our own research relating to (1) the effects of combining modes on response rates and the representativeness of survey samples and (2) effects for measurement comparability. First, we describe some of the motivations for mixing survey modes and specific issues relevant in longitudinal studies.

Motivations for Combining Data Collection Modes

In response to growing challenges associated with interviewer modes—notably, a deterioration of data quality and rising (often prohibitive) costs due to declining response rates (Groves, 2011)—researchers have been increasingly turning to mixed mode data collection protocols (De Leeuw, 2018). While web-based surveys have begun to predominate, they often suffer from even lower response rates than surveys conducted in interviewer modes and still exclude the part of the general population without internet access. As such, they remain, on their own, inadequate for achieving good representation in general population surveys. Mixed mode surveys provide a solution by using alternative modes of participation for different sample subgroups (Smyth et al., 2014).

From a survey methodological perspective, data quality is a function of two factors: (1) how well units of observation represent a population of interest and (2) how well variables measure attributes of the units of observation. With respect to coverage and representation, attention must be paid to how different modes are combined and with what goal in mind (De Leeuw, 2018; Gravem et al., 2014). For example, ‘push-to-web’ designs aim to maximise the proportion of the sample responding in the lowest-cost mode and hence the cost-saving benefits (Dillman, 2017). Alternative (more costly) modes are then offered sequentially to follow up nonrespondents. Modes can also be combined concurrently by offering respondents a choice (although evidence suggests this may be counterproductive (Tourangeau, 2017)) or by allocating preidentified sample subgroups to different modes (e.g., people with telephone numbers are interviewed by telephone). This approach offers the potential to optimise the survey design for harder-to-survey (often more vulnerable) subgroups, such as older adults (Oris et al., 2016) and national minorities (Herzing et al., 2019). The feasibility of such designs is greater in a longitudinal setting, in which data available from earlier survey waves can inform the fieldwork protocol used in later waves (Kaminska & Lynn, 2017).
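As an illustration of how such allocation rules can be operationalised, the following is a minimal sketch in Python. The frame variables (has_phone, age), the age threshold and the mode sequences are illustrative assumptions only, not the protocol of any survey discussed in this chapter.

```python
# Minimal sketch of a mode-allocation rule for a sequential mixed mode design.
# Frame variables, thresholds and mode sequences are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SampleUnit:
    unit_id: int
    has_phone: bool
    age: int

def assign_mode_sequence(unit: SampleUnit) -> list[str]:
    """Return the ordered sequence of modes offered to a sampled unit."""
    if not unit.has_phone:
        # No listed number: start on the web, follow up by mail, escalate to face-to-face.
        return ["web", "mail", "face-to-face"]
    if unit.age >= 75:
        # Harder-to-survey subgroup: offer the interviewer mode first.
        return ["telephone", "mail"]
    # Default push-to-web sequence: cheapest mode first, follow-ups for nonrespondents.
    return ["web", "mail", "telephone"]

frame = [SampleUnit(1, True, 42), SampleUnit(2, False, 29), SampleUnit(3, True, 81)]
for unit in frame:
    print(unit.unit_id, assign_mode_sequence(unit))
```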

Mixing modes has consequences for measurement because the answers people give to survey questions are influenced by mode characteristics such as the degree of perceived anonymity they offer and the amount of cognitive burden they place on respondents (Couper, 2011). These aspects can affect how respondents interpret questions and their ability and motivation to provide accurate and honest answers (Tourangeau et al., 2013). Therefore, mode choice should consider not only the characteristics of the target population but also the measurement goals of a survey (Dillman et al., 2014), especially where the goal is to measure sensitive subjective phenomena, as is often the case in vulnerability research.

Mixed mode designs can be effective in reducing representation errors in one mode (e.g., by increasing response rates or reducing coverage error). However, the decision to combine modes entails blending their different measurement properties, which can affect the accuracy of aggregate estimates, observed relationships between variables, and the equivalence of composite measures and can hinder comparisons between groups of interest that are more or less likely to respond in one mode rather than another (Tourangeau, 2017). Before proceeding with substantive analyses of mixed mode data, therefore, researchers should first assess the extent of differential measurement errors across modes and, ideally, adjust for them where needed (Hox et al., 2017).

Challenges of Combining Modes in Longitudinal Surveys

The decision to combine modes in a longitudinal survey setting involves additional complications (Jäckle et al., 2017). As with cross-sectional studies, the potential impact on data quality depends on which modes are combined and how. However, in cohort and panel studies, an additional consideration is when modes are combined. New longitudinal surveys may consider mixing modes from the outset, using a combination of modes for recruitment and/or interviewing at subsequent survey waves. Existing longitudinal studies may opt to switch modes from one wave to another for all sample members or transition from a single to a mixed mode design (e.g., to reduce fieldwork costs and mitigate nonresponse). A mode switch between waves could result in selective attrition, thereby affecting the representativeness of the remaining sample and comparisons over time. Timing is again likely to be key. For example, attrition is typically greatest between waves 1 and 2 of a longitudinal survey (De Leeuw & Lugtig, 2015), and particular subgroups—often those considered more vulnerable—are at greater risk of dropping out (e.g., young men, those with lower education, lower income or the unemployed) (e.g., Rothenbühler & Voorpostel, 2016; Lugtig, 2014). It is, therefore, of empirical interest to know whether a mode switch in the second wave will exert a more detrimental impact on sample composition than a switch later in the lifetime of the panel and, if so, which population subgroups will be most affected (Voorpostel, Roberts, & Goordhin, 2021).

Longitudinal surveys aim to produce data that are comparable over time to allow the measurement of change. Changing modes between waves could disrupt the continuity of measurement, leading to bias in such measures (Dillman, 2009). This bias is a particular concern in the study of vulnerability, which takes a special interest in evaluating the impact of life events that occur between survey waves. Switching to self-administered modes from interviewer modes, for example, can reduce errors resulting from social desirability bias (SDB). While measurement accuracy may improve, however, it can be difficult to determine whether changes over time are due to the impact of critical life events, transitions or evolving trajectories or are simply a result of the mode switch. One advantage of mixed mode longitudinal studies is that questionnaire data collected in earlier waves can be used for the purposes of isolating and adjusting differential mode effects on selection and measurement within and across survey waves (Tourangeau, 2017). Similarly, a subsample that continues to respond in the initial mode can be ringfenced and used for adjustment purposes (see Burton & Jäckle, 2020).

As a result of the theoretical risks posed to the quality of longitudinal survey data by combining modes, until recently, relatively few major panel studies had implemented (and documented the effects of) such changes in design. However, many panel studies have started to implement mixed mode designs in recent years, yielding methodological insights into the consequences for response rates and sample composition, as well as apparent effects on measurement equivalence over time (for recent reviews, see Jäckle et al., 2017; Voorpostel, Lipps, & Roberts, 2021). Far less attention has been given, however, to the substantive implications of mixing modes in longitudinal surveys for the study of vulnerability processes. In the following, we draw some tentative conclusions based on a synthesis of our own research findings on this topic.

Effects of Mixing Modes on Representation and Selection Error

In a longitudinal setting, the main outcomes of interest when comparing single and mixed mode survey designs with respect to representation and selection error are (1) initial response rates and the representativeness of the responding sample at recruitment (because the sample recruited in the first wave of a panel must be sufficiently large to withstand the effects of attrition over time) and (2) the extent and impact of subsequent attrition. The assessment of these outcomes is often hindered by a lack of data about the characteristics of nonrespondents. In our research, however, such an assessment was possible thanks to auxiliary sociodemographic data from population registers supplied with the samples by the Swiss Federal Statistical Office.

In 2012–2013, we conducted a (cross-sectional) mode experiment in the French-speaking region of Switzerland designed to investigate the impact of mode on the measurement of subjective well-being and the experience of critical life events. Sampled individuals were randomly assigned to either telephone, web or mail mode, and alternative modes were used to follow up nonrespondents. The results provided insight into how modes affect response rates and the representation of different subgroups in survey data. Response rates varied significantly across the three modes: they were lowest in the web group (44.5%) compared with the mail (65.4%) and telephone groups (60.7% when considering only those with telephone numbers; 35.7% when also considering those without), demonstrating the modes’ likely varying suitability for panel recruitment purposes.

Using the sampling data, we assessed the extent of noncoverage error resulting from the exclusion of individuals/households without listed numbers in the telephone group. People without listed numbers differed from those with listed numbers on several sociodemographic variables: they were more likely to be younger, foreign, unmarried, and living in single-person households in urban areas (Roberts et al., 2016). Their underrepresentation in telephone surveys could introduce bias in measures correlated with these variables (e.g., experiences of vulnerability specific to these subgroups). Combining modes concurrently and sequentially was effective at increasing response rates and improving overall representativeness. For example, complementing the telephone survey with a concurrent mail survey for those without telephone numbers increased the response rate to 56.8%. Adding the mail mode sequentially as a follow-up for nonrespondents further increased the response rate to 66.2% (Roberts et al., 2016; Roberts & Vandenplas, 2017). We also evaluated a sequential ‘push-to-web’ design involving web plus mail plus telephone (for people with a listed telephone number) or web plus mail plus an in-person interview (for people without a listed number). Again, the sequencing of modes helped to increase response rates: to 64.4% for web plus mail, and further to 70.6% with the addition of telephone and face-to-face interviews for nonrespondents.
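To make the relationship between the two telephone-group rates reported above explicit, the following toy calculation shows how the same number of completed interviews yields one rate conditional on having a listed number and a lower rate over the full assigned sample. The counts are invented, not the study's actual sample sizes; they are chosen only so that the two rates reproduce the reported percentages.

```python
# Toy calculation mirroring the two telephone-group response rates in the text.
n_assigned = 1000        # units randomly assigned to the telephone group (invented count)
n_with_number = 588      # of these, units with a listed telephone number (invented count)
n_respondents = 357      # completed telephone interviews (invented count)

rr_covered = n_respondents / n_with_number   # rate among those reachable by telephone
rr_overall = n_respondents / n_assigned      # treats units without a number as nonrespondents

print(f"conditional on having a listed number: {rr_covered:.1%}")   # ~60.7%
print(f"over the full assigned telephone group: {rr_overall:.1%}")  # 35.7%
```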

Nationality and migration background are often included in vulnerability research and deserve special mention here. We found different participation patterns by nationality for sample members with and without a telephone number. Among the non-Swiss sample members without a telephone number, mailed questionnaires worked better, while starting the survey with a request to respond by web resulted in a substantial bias on the nationality variable. We did not find the same pattern for foreigners with listed telephone numbers. Here, web mode appeared to work well for non-Swiss sample members from bordering countries (there being no language barrier for French nationals at least), and the use of the mail follow-up corrected the initial underrepresentation of non-Swiss sample members from nonbordering countries. These findings underline the value of combining modes for reducing selection errors due to noncoverage and nonresponse in telephone or web-only surveys. They also point to the potential value of using a tailored fieldwork strategy for certain subpopulations to mitigate bias in the representation of foreigners.

In 2017–2018, we conducted a different mode experiment in the context of the Swiss Household Panel (SHP) to investigate the benefits of combining telephone interviews (the primary mode used in the SHP since its launch in 1999) with web-based questionnaires for recruiting and re-interviewing households in a panel setting (Voorpostel et al., 2020). In addition to being a longitudinal survey, the SHP has the added complication of having to administer both a household-level questionnaire (to a respondent referred to as the household reference person (HRP)) and then individual-level questionnaires to all other household members aged 14 and over. In preparation for the third refreshment sample, launched in 2020, the two-wave pilot study compared two alternatives to the standard design. The first was a mixed mode design in which the telephone (or face-to-face if no number was available, as in the standard design) mode was used to administer the household questionnaire to the HRP, and the individual questionnaires were administered by web (with a follow-up by telephone for nonrespondents where numbers were available). The second was a web-only design in which the HRP and the eligible household members were invited by mail to complete the questionnaires online, with a nonresponse follow-up by telephone if a number was available. In wave 2, all the groups were subject to the same protocols, except that half of the mixed mode group was randomly assigned to the web-only design.

The standard SHP (predominantly telephone-based) design generated the highest response rates in the first wave (53% of approached households completed the household questionnaire), compared with 52% in the mixed mode design and 47% in the web-only design (significantly lower than the standard design). In the second wave, the differences in household response rates between the groups were smaller (between 74% and 77%) and not statistically significant. To assess the extent to which the sample that participated in the study represented the overall population, we compared the composition of the sample of household members reported (by the HRPs) to be living in participating households in wave 1 with the gross sample of individuals documented as living in sampled households according to the register data from the sampling frame. All designs overrepresented married individuals and Swiss nationals and underrepresented individuals under the age of 30, highlighting generic nonresponse biases frequently observed in panel surveys (e.g., Lugtig, 2014). The telephone group also underrepresented household members who had never married and overrepresented older household members. Analysis of linked administrative data on income found that lower-income households were underrepresented in the web group, while no such difference was observed in the telephone group. Overall, attrition rates were comparable across designs, and most of the nonresponse biases observed were the result of selection in the first wave, with effects decreasing in the second wave (Voorpostel et al., 2020).
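A minimal sketch of this kind of representativeness check, under invented data, is shown below: the distribution of a frame variable among responding sample members is compared with its distribution in the gross sample drawn from the register-based frame. The variable, categories and counts are purely illustrative.

```python
# Compare the distribution of a (hypothetical) frame variable among respondents
# with its distribution in the gross sample; the counts are invented.
import pandas as pd

gross = pd.DataFrame({
    "civil_status": ["married"] * 520 + ["never_married"] * 360 + ["other"] * 120,
})
respondents = pd.DataFrame({
    "civil_status": ["married"] * 340 + ["never_married"] * 150 + ["other"] * 60,
})

frame_share = gross["civil_status"].value_counts(normalize=True)
resp_share = respondents["civil_status"].value_counts(normalize=True)

# Positive values indicate overrepresentation among respondents, negative underrepresentation.
print((resp_share - frame_share).round(3))
```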

The preceding examples illustrate the effects of mixing modes when recruiting to a panel and on attrition between the first two waves. It is also important to consider the effects of switching modes in an existing panel, as this may be harmful to panel loyalty and have damaging effects on sample composition. In 2018, we conducted research into this issue in the context of the LIVES-FORS Cohort Study, a panel of young adults born between 1988 and 1997 who grew up in Switzerland (including an oversample of second-generation immigrants), which is conducted alongside the SHP (see Spini et al., 2019). The aim was to assess the likely impact of a mode switch to web on attrition, as well as the proportion of the sample that might be restricted to participating on a mobile device, which may be suboptimal for administering the SHP/Cohort Study questionnaires originally designed for the telephone (see Johnson, 2020). Most of the remaining cohort sample had access to and used the internet on multiple devices, both fixed and mobile, and, not surprisingly given their age, used smartphones for numerous activities on a daily basis, implying that a switch to the web would, in principle, not prevent them from continuing to participate in the study. Indeed, a majority (close to 60%) of the sample said they would prefer to respond by web rather than by telephone interview. Nevertheless, 24% said they would be more willing to participate in the next wave via telephone than by web, placing approximately one quarter of the sample at increased risk of dropping out. The youngest (under 25 years), unmarried, male and Swiss participants still in education were significantly more likely to belong to this attrition risk group, meaning that existing sample selectivity resulting from attrition would likely be compounded by the mode switch.

Effects of Combining Modes on Measurement Error

Mixing modes to reduce costs and improve the representation of target populations entails a trade-off: the introduction of differential measurement errors (Roberts & Vandenplas, 2017). Two types of measurement error are particularly sensitive to mode characteristics: (1) SDB and (2) response effects associated with satisficing (Krosnick, 1991). SDB results from respondents’ tendency to try to portray themselves to researchers in a more favourable light (e.g., by selecting the more socially desirable or acceptable response option or deciding not to report particularly personal experiences) and manifests as overestimates of socially desirable characteristics and underestimates of undesirable ones (Tourangeau & Yan, 2007). In research into vulnerability and resilience, key measures of interest—e.g., exposure to particular stressors or the presence or lack of certain resources or reserves (Widmer and Spini, Chap. 2, this volume)—are likely to be considered sensitive by some respondents and may, therefore, be particularly susceptible to SDB. Because of the presence of an interviewer (and sometimes of other household members) in telephone or face-to-face surveys, respondents are more likely to give socially desirable answers in interviewer modes (Couper, 2011), thereby posing problems for comparisons with data gathered in self-administered modes.

The term ‘satisficing’ in surveys refers to respondents reducing the effort needed to provide accurate and thoughtful answers to survey questions (Krosnick, 1991), resulting in a variety of response effects such as selecting the same scale point in lengthy batteries of items; providing ‘acquiescent’ responses to agree-disagree scale items; selecting the ‘don’t know’ response; and providing shorter, less detailed answers to open-ended questions. In vulnerability research, such response effects are likely to be particularly damaging for long, multi-item scales measuring latent constructs, which are often favoured by psychologists. However, satisficing can also affect the accuracy of objective measures if reduced effort in responding affects recall and estimation accuracy, resulting in issues for retrospective measurement in general (Tourangeau et al., 2000). Depending on the nature of the effect, satisficing can introduce bias into estimates and increase measurement variance (Roberts, 2016). Different mode characteristics can affect response task difficulty (e.g., the faster pace of telephone interviews; the need to read self-administered questionnaires), meaning that the extent of measurement error due to satisficing also varies as a function of mode. Similar to mode differences in SDB, this issue can confound comparisons across samples surveyed in different modes (Hox et al., 2017; Tourangeau, 2017).
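To illustrate how such satisficing-related response effects can be flagged in practice, the following is a minimal sketch computing two simple indicators, straightlining in an item battery and the share of ‘don’t know’ answers, and comparing their prevalence by mode. The column names, response codes and data are hypothetical.

```python
# Two simple satisficing indicators: straightlining (identical answers across an
# item battery) and the share of "don't know" answers, compared by mode.
# Column names, the DK code and the data are hypothetical.
import pandas as pd

battery = ["q1", "q2", "q3", "q4", "q5"]   # a 5-item agree-disagree battery
DK = -9                                    # hypothetical "don't know" code

df = pd.DataFrame({
    "mode": ["telephone", "web", "web", "mail"],
    "q1": [4, 3, 3, DK],
    "q2": [4, 3, 5, 2],
    "q3": [4, 3, 2, 2],
    "q4": [4, 3, 4, DK],
    "q5": [4, 3, 1, 2],
})

df["straightlined"] = df[battery].nunique(axis=1).eq(1)   # same scale point on every item
df["dk_share"] = df[battery].eq(DK).mean(axis=1)          # share of "don't know" answers

# Prevalence of each indicator by mode of data collection.
print(df.groupby("mode")[["straightlined", "dk_share"]].mean())
```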

A fundamental difficulty when trying to establish the effect of mode on measurement is that the selection effect induced by the mode can render the sample incomparable with samples surveyed in other modes. To detect differential measurement errors across modes, adjustments to balance the samples are needed (Hox et al., 2017). To complicate matters, mode characteristics can interact in complex and not entirely predictable ways with features of the questionnaire design and respondent characteristics to affect the answers given. This complexity means that mode effects can take different forms and may need to be investigated using different statistical methods depending on the type of survey question (Jäckle et al., 2010).
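One standard way to balance the samples on observed characteristics before comparing their answers is inverse-propensity weighting. The sketch below, using simulated data, shows the basic logic; the variables, model and outcome are hypothetical, and this is not the specific adjustment applied in the studies cited.

```python
# Minimal inverse-propensity weighting sketch on simulated data: balance web and
# telephone respondents on observed covariates, then compare a weighted outcome.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
age = rng.integers(18, 85, n)
female = rng.integers(0, 2, n)
web = rng.binomial(1, 1 / (1 + np.exp(0.04 * (age - 50))))  # younger people respond by web more often
life_sat = rng.normal(7.5, 1.5, n)                           # hypothetical outcome compared across modes

df = pd.DataFrame({"age": age, "female": female, "web": web, "life_sat": life_sat})

# Model the probability of responding by web given observed covariates.
X = sm.add_constant(df[["age", "female"]])
p_web = sm.Logit(df["web"], X).fit(disp=False).predict(X)

# Weight each respondent by the inverse probability of the mode they actually used.
df["w"] = np.where(df["web"] == 1, 1 / p_web, 1 / (1 - p_web))

# Weighted difference in mean life satisfaction between web and telephone after balancing.
web_mean = np.average(df.loc[df.web == 1, "life_sat"], weights=df.loc[df.web == 1, "w"])
tel_mean = np.average(df.loc[df.web == 0, "life_sat"], weights=df.loc[df.web == 0, "w"])
print(web_mean - tel_mean)
```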

In our 2012–2013 mode experiment, we assessed mode effects on the measurement of subjective well-being, which is known to be susceptible to differential SDB by mode (Dolan & Kavetsos, 2016). We used a variety of sample adjustment techniques to control for differences in observed characteristics across samples responding in different modes and a mix of methods to assess measurement differences between modes (see Sánchez Tomé, 2018, pp. 86–90). A consistent pattern of results emerged that supported previous research findings: telephone interviewing produced systematically higher mean estimates of well-being than mail and web questionnaires, even after controlling for sample differences (Sánchez Tomé, 2018, p. 94). The SHP pilot study produced similar results with respect to SDB: web respondents reported more health problems, more negative feelings, fewer positive feelings and lower satisfaction in almost all domains compared with telephone respondents (Voorpostel et al., 2020).

The 2012–2013 mode experiment asked respondents three open-ended questions about the critical life events that had most impacted their lives. Measurement quality for open-ended questions is often affected by item nonresponse due to the additional effort required to answer them, and where responses are recorded, they may vary by mode as a function of the length and richness of the answers given and the interpretability of the content. Comparing the answers given in different modes, we found that item nonresponse was indeed lower for telephone respondents than for self-completion respondents on the first of the three items but increased for the subsequent items, and on average, the answers recorded verbatim by interviewers contained fewer words than responses in the self-completion modes. Furthermore, the impact of certain events (giving birth and illness) was reported more positively by telephone respondents, presumably also due to SDB pressures when sharing answers with an interviewer (Sánchez Tomé, 2018, p. 150).

Conclusions

In this chapter, we have described some of the benefits of combining modes of data collection, particularly in longitudinal surveys. However, we have also highlighted and illustrated some of the challenges involved, notably those associated with differential selection and measurement error across combined modes. The findings of these various Swiss studies contribute to a substantial international literature supporting the conclusion that the choice of survey mode has consequences for the response propensity of different population subgroups, with important implications for the overall representativeness of samples. The development and testing of ‘adaptive survey designs’ (Schouten et al., 2017), which tailor modes to the characteristics of sample members, is therefore recommended. Such designs help identify the most efficient ways of mixing modes longitudinally to achieve a balanced sample that better represents the general population, particularly the potentially more vulnerable subgroups of interest (Kaminska & Lynn, 2017). This balance could be achieved by assigning modes based on predicted response propensities for different subgroups in different modes, using available sample data (such as those analysed in our own research) or data from prior survey waves (as in Johnson, 2020).
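As a hedged sketch of this propensity-based tailoring idea, the example below predicts each sample member's probability of responding by web from hypothetical frame and prior-wave variables and offers an interviewer mode first to those with a low predicted propensity. The covariates, model and 0.5 threshold are illustrative assumptions only, not the design of any survey discussed here.

```python
# Illustrative propensity-based mode assignment: predict the probability of
# responding by web and tailor the first mode offered. All inputs are simulated.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
frame = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "urban": rng.integers(0, 2, n),
    "responded_web_last_wave": rng.integers(0, 2, n),   # prior-wave behaviour, if available
})
# Hypothetical training outcome: whether the unit responded by web in a prior wave or pilot.
y = rng.binomial(1, 1 / (1 + np.exp(0.03 * (frame["age"] - 45))))

model = LogisticRegression().fit(frame, y)
frame["p_web"] = model.predict_proba(frame)[:, 1]

# Offer the cheapest mode first to likely web respondents and an interviewer
# mode first to the rest (the 0.5 threshold is an arbitrary illustrative choice).
frame["first_mode"] = np.where(frame["p_web"] >= 0.5, "web", "telephone")
print(frame["first_mode"].value_counts())
```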

Combining modes offers a way to improve response rates and representation. However, if mode affects measurement, comparisons across groups responding in different modes may be compromised. Survey and statistical methodologists are still grappling with the practical implications of this challenge and developing ways for analysts to address it. From a purely statistical point of view, optimal strategies for adjusting for mode effects entail a mix of complex experimental designs and model-based approaches to causal inference. Nonetheless, these approaches often rely on problematic assumptions and may shift some of the costs saved by switching to cheaper modes onto data analysts. Current and future research must, therefore, continue to tackle the thorny issue of how to evaluate the severity of mode effects and offer pragmatic recommendations for adjustment solutions that researchers can easily incorporate into their analyses of mixed mode data.

Finally, mode differences in measurement sometimes appear because of suboptimal or variant question formulations and formats used in different modes. Efforts to evaluate and improve questionnaire designs must be prioritised, therefore, with a view to harmonising the stimulus offered across modes and optimising the design for all modes (and response devices) in use (Dillman et al., 2014; Lugtig & Toepoel, 2016). Nevertheless, some measures are likely to remain especially sensitive to mode effects when combining interviewer- and self-administered modes. Life-course researchers interested in comparing groups across contexts and time should remain mindful of this risk in their analyses of mixed mode data and when interpreting their findings. Improving the dialogue between substantive experts in vulnerability across the life course and methodological experts in longitudinal research design, measurement and analysis is key in this endeavour.