Journal of Well-Being Assessment, Volume 2, Issue 1, pp 41–55

Incentivised Online Panel Recruitment and Subjective Wellbeing: Caveat Emptor

  • Melissa K. Weinberg
  • Robert A. Cummins
  • David A. Webb
  • Wencke Gwozdz
Original Research

Abstract

It is generally assumed that if a sample represents its broader population on key demographic variables, the data it yields will also be representative. Here we present evidence to suggest that this is not necessarily so when subjective wellbeing is measured from participants recruited through online panels. Using data from six countries (Australia, Germany, Sweden, the Netherlands, the UK and the USA), we reveal significant differences in subjective wellbeing between online panel data and nationally representative data, even though the two are demographically comparable. These findings indicate that the online panels comprised an abnormally high proportion of people with low subjective wellbeing, thus rendering their data non-representative. Given the widespread use of online panels for data collection, we issue a caveat emptor.

Keywords

Online panels · Data collection · Subjective wellbeing · Sampling

1 Introduction

The measurement and monitoring of subjective wellbeing (SWB) at national and international levels has gained popularity in recent years and is increasingly being used as a subjective social indicator to inform national policy (Cummins 2016; Maggino 2016). The integrity of the data that are collected is therefore paramount.

A common problem for survey researchers is how to obtain their data. Since SWB data can only be validly supplied through self-report, this has traditionally involved mailed paper questionnaires, or more personal face-to-face or telephone interviewing. However, such methods are expensive and time-consuming. As a consequence, samples are often recruited through non-systematic or snowball sampling methods, both of which yield non-representative samples. A more acceptable alternative is now provided by the Internet.

The proliferation of Internet access, coupled with advanced survey design software, has meant that researchers with modest budgets can collect their data online. In economically developed countries such as Australia (ABS 2016) and Sweden (World Bank 2016), where about 90% of the population are internet users, large samples can be recruited with little cost and effort.

However, despite the attraction of this methodology in providing access to a large and heterogeneous participant pool, concerns about sample representativeness remain. For example, even though access to the Internet may be near universal, differences in both the amount of time spent online and the reasons for going online (ABS 2016) mean that some people are more exposed than others to recruitment requests involving online questionnaires. Moreover, respondents are able to self-select into surveys that particularly interest them, thereby creating abnormally homogeneous samples.

With these concerns in mind, online panels offer a way to recruit demographically representative samples. In contrast to panels assembled for longitudinal studies, in which participants agree to contribute to ongoing research commissioned by the same research team, opt-in Internet panels are typically maintained by private companies. They can be formed when people who were initially recruited into cross-sectional surveys agree to further contact by the recruiting agency, or via invitation based on residential address to achieve geographic representation. Special-purpose online panels can then be tailor-made by commercial contractors to meet particular demographic specifications, including being nationally representative. So, are online panels the answer to social researchers’ recruiting dreams?

Unfortunately, a number of concerns attend this methodology, beginning with the problems of online panel recruitment. Potential recruits are generally offered some form of incentive for completing subsequent surveys, which raises the possibility of systematic bias arising from members’ motivation for continuing. While willingness to participate in an initial survey may be motivated by curiosity, a desire to help the researcher, or some other intrinsic motive, continued involvement is likely motivated, at least in part, by the tangible reward. Either way, there is reason to suspect that the motivations of panel members differ from those of people who have been randomly recruited.

The respondents also differ from new recruits chosen at random because they are involved in a continuing process of interaction with the panel organizers. In sum, online panel samples are likely to differ from random general population samples in ways that may systematically influence their responses to questionnaires. Additional features specific to online panel maintenance may also threaten data integrity.

First is the problem of dropout, caused by sample members either deliberately withdrawing their involvement in the study, or by becoming otherwise uncontactable. While attrition is practically unavoidable in any longitudinal study, it disrupts the demographic balance that is critical to any panel’s appeal. Panel managers may address this issue via a range of calibration and propensity weighting approaches (see Barber et al. 2013; Williams 2012), or by replacing lost members with demographically similar people. However, these techniques only deal with the evident statistical issue of maintaining sample sizes and demographic proportions. They neglect the possibility that there may be systematic reasons for dropout, such as being time-poor, which are not apparent from demographic measures alone. The final sample may thus comprise a group that appears similar on demographic characteristics but does not represent the general population on other key features.

Second, in order to maintain the ‘integrity’ of online panels, some agencies remove members who respond to survey items in a manner that might suggest a tendency for careless or acquiescent behaviour. A number of screening techniques can be applied to detect such responding, ranging from direct indicators of inattention, like admitting a lack of effort, to statistical determination that a participant is responding atypically (DeSimone et al. 2015; Meade and Craig 2011).

One such technique, referred to as ‘Longstring’, is based on the assumption that, if participants are responding truthfully, their answers should vary such that they do not provide identical responses to too many consecutive items (DeSimone et al. 2015). Thus, participants who show no variation in their responses are dropped from the panel.
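As a concrete illustration, here is a minimal sketch of a Longstring check in Python. The function and variable names, and the cutoff of eight consecutive identical responses, are our own assumptions for illustration; DeSimone et al. (2015) discuss how such cutoffs should be calibrated to the instrument.

```python
def longstring(responses):
    """Return the length of the longest run of identical consecutive answers."""
    if not responses:
        return 0
    longest = run = 1
    for prev, curr in zip(responses, responses[1:]):
        run = run + 1 if curr == prev else 1
        longest = max(longest, run)
    return longest

# A respondent is flagged when the longest run meets a survey-specific
# cutoff; the value 8 here is purely illustrative.
answers = [7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 7]
flagged = longstring(answers) >= 8  # True for this response pattern
```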

This screening procedure will effectively eliminate participants who affirm every item, perhaps in an attempt to complete the survey as fast as possible to receive their reward. However, the practice is particularly troublesome for psychological surveys (including wellbeing studies), which rely on consistency of responses to multi-item measures in order to achieve strong psychometric properties. Of course, some variance in responses is to be expected, but if panel members know they risk removal from the panel by responding too consistently, they may consciously control their responses to reduce suspicion of acquiescence, providing answers that vary in a random and unpredictable fashion. This has the effect of compromising the reliability of both the respondent and the scale.

The third problem might be the most insidious for data integrity. There may be a predictable psychological difference between people who volunteer to join online panel studies and those who do not. Authors have noted the existence of such systematic differences between off- and online samples, especially with regard to attitudes and behaviours (Bethlehem and Stoop 2007; Couper et al. 2007). Such differences may be exaggerated when an incentive encourages participation of people with a personality trait that is relevant to the research hypothesis.

One way to detect these subtle differences is to compare online panel-derived groups on a psychological measure that has established normative values at the population level. Subjective wellbeing is one such measure.

1.1 Subjective Wellbeing

Subjective wellbeing (SWB) is frequently measured using a single item derived from Andrews and Withey (1976, p. 66) that asks, “How satisfied do you feel about your life as a whole?” This is referred to as Global Life Satisfaction (GLS). However, due to the psychometric limitations of single-item measures, many multi-item scales have been produced (for a review see Cummins and Weinberg 2015). One of these is the Personal Wellbeing Index (PWI; International Wellbeing Group 2013), which has been recommended by both the World Health Organization (WHO Regional Office for Europe 2012) and the Organisation for Economic Co-operation and Development (OECD 2013) as a preferred domain-specific measure.

Of particular relevance to the current paper is that both GLS and the PWI have yielded data that have been used to establish a normal range at the population level in Australia (Cummins et al. 2013). Both measures have been included in all 33 surveys of the Australian Unity Wellbeing Index, conducted over the past 16 years (see http://www.acqol.com.au/projects#reports/auwbi). Each survey recruits 2000 new participants, stratified to represent the population based on gender and geographic distribution. All data are transformed to lie on a 0 to 100 percentage point scale.

Using the mean score from each survey as data, the mean PWI across the 33 surveys is 75.27, with a standard deviation of 0.72. This yields a normal range of 73.83–76.71 points within which the mean scores of surveys can be predicted to lie with 95% certainty. This narrow range relative to the possible range of scale scores attests to the remarkable stability of population-level SWB over time in Australia.
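The reported bounds follow directly from treating the 33 survey means as data points and taking two standard deviations either side of their mean:

```latex
\text{normal range} = \bar{x} \pm 2s = 75.27 \pm 2 \times 0.72 = [73.83,\; 76.71]
```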

One explanation for such consistency is provided by the theory of subjective wellbeing homeostasis (Cummins 2016). This proposes that SWB is actively controlled and maintained by external (e.g., money, close relationships) and internal (e.g., self-esteem, perceived control) resources, termed ‘buffers’, which normally operate to restore SWB to its genetically predetermined set-point (Capic et al. 2017) in the face of challenge. Under conditions of prolonged adversity, however, the buffers may be overwhelmed, and SWB will fall beyond homeostatic control. Homeostatic defeat is indicated when the mean level of SWB drops below the normal range. At the population level, SWB scores below the normal range suggest that a larger than normal proportion of individuals within the sample are experiencing a challenge to their SWB, and are therefore at risk of depression (Cummins 2010).

The stability of subjective wellbeing has also been demonstrated at the individual level, with US research revealing stability even in the face of challenging life events (Suh et al. 1996). Further, stability has been established at the population level across countries: in cross-country comparisons, the rankings of countries by life satisfaction in the World Happiness Report 2017 (Helliwell et al. 2017) vary only by non-significant differences from one year to the next.

1.2 The Current Paper

The impetus for this paper emerged after analyzing subjective wellbeing (SWB) data from the 31st survey of the Australian Unity Wellbeing Index. While the usual modus operandi for data collection was to survey people identified at random from the telephone directory, on this occasion data were also collected from participants who belonged to an online panel maintained by a commercial firm. Of the sample, 1000 participants were recruited through the usual Computer Assisted Telephone Interview (CATI), and a further 1203 through an online panel. The online-panel participants completed the survey via a link hosted on an online survey platform (described further in the Method section).

Data analysis revealed that while the CATI results were consistent with previous surveys, the online panel data were, on average, about 7–8 percentage points lower. Given that the two samples were demographically similar, this discrepancy was curious, and cast doubt on the reliability of the online panel data. This finding led us to consider whether a similar difference had also been observed in other studies. So, we set about the task of locating online panel and non-panel survey data across a range of countries, with a single hypothesis: that online panel data from each country will yield a SWB mean score significantly lower than that produced by data collected through traditional means.

2 Method

Data allowing a comparison of online panel and random recruitment were located from six countries: Australia, Germany, The Netherlands, Sweden, UK, and USA. All online panel data were collected between April and July 2014. Normative data from the USA were collected in 2007, and for all other countries between 2012 and 2014.

The first analyses, shown in Tables 1 and 2, are performed on the full available datasets from Australia. Then, to maintain consistency across surveys, only data for respondents aged 18–35 were included in the subsequent analyses. The sources of data for each country are described further below.
Table 1 Comparison of CATI and panel samples based on demographic variables

|                          | CATI N        | CATI N% | Online panel N | Online panel N% |
|--------------------------|---------------|---------|----------------|-----------------|
| Gender                   |               |         |                |                 |
| Male                     | 500           | 50.0    | 585            | 48.6            |
| Female                   | 500           | 50.0    | 605            | 50.3            |
| Prefer not to answer     | -             | -       | 13             | 1.1             |
| Total                    | 1000          | 100.0   | 1203           | 100.0           |
| Age, M (SD)              | 56.45 (16.95) |         | 51.37 (14.87)  |                 |
| 18–25                    | 62            | 6.2     | 55             | 4.6             |
| 26–35                    | 64            | 6.4     | 176            | 14.6            |
| 36–45                    | 137           | 13.7    | 183            | 15.2            |
| 46–55                    | 161           | 16.1    | 242            | 20.1            |
| 56–65                    | 214           | 21.4    | 308            | 25.6            |
| 66–75                    | 226           | 22.6    | 190            | 15.8            |
| 76+                      | 117           | 11.7    | 35             | 2.9             |
| Total                    | 981           | 98.1    | 1189           | 98.8            |
| Income/year              |               |         |                |                 |
| Less than $15,000        | 46            | 4.6     | 64             | 5.3             |
| $15,000–$30,000          | 149           | 14.9    | 222            | 18.5            |
| $31,000–$60,000          | 195           | 19.5    | 289            | 24.0            |
| $61,000–$100,000         | 168           | 16.8    | 315            | 26.2            |
| $101,000–$150,000        | 138           | 13.8    | 196            | 16.3            |
| $151,000–$250,000        | 89            | 8.9     | 68             | 5.7             |
| $251,000–$500,000        | 33            | 3.3     | 14             | 1.2             |
| More than $500,000       | 5             | 0.5     | 2              | 0.2             |
| Total                    | 823           | 82.3    | 1170           | 97.3            |
| Work status              |               |         |                |                 |
| FT employed              | 314           | 31.4    | 422            | 35.1            |
| FT retired               | 339           | 33.9    | 321            | 26.7            |
| FT volunteer             | 14            | 1.4     | 13             | 1.1             |
| FT home or family care   | 67            | 6.7     | 116            | 9.6             |
| FT study                 | 43            | 4.3     | 30             | 2.5             |
| None of these            | 215           | 21.5    | 285            | 23.7            |
| Total                    | 992           | 99.2    | 1187           | 98.7            |

Table 2 Comparison of the samples in terms of their wellbeing

| Measure                   | CATI N | CATI M | CATI SD | Panel N | Panel M | Panel SD | t        | df   | Δ   |
|---------------------------|--------|--------|---------|---------|---------|----------|----------|------|-----|
| Global life satisfaction  | 1000   | 78.25  | 15.60   | 1203    | 70.24   | 18.42    | 11.05*** | 2200 | .51 |
| Personal Wellbeing Index  | 952    | 76.32  | 12.50   | 1203    | 69.01   | 15.95    | 11.93*** | 2153 | .58 |
| Standard of living        | 999    | 79.77  | 15.53   | 1203    | 71.42   | 18.71    | 11.44*** | 2200 | .54 |
| Health                    | 1000   | 74.63  | 19.08   | 1203    | 66.23   | 20.42    | 9.96***  | 2171 | .44 |
| Achieving                 | 991    | 73.67  | 18.55   | 1203    | 66.23   | 20.82    | 8.84***  | 2179 | .40 |
| Relationships             | 985    | 80.59  | 20.03   | 1203    | 70.89   | 24.58    | 10.17*** | 2186 | .48 |
| Safety                    | 993    | 79.99  | 17.12   | 1203    | 76.26   | 17.88    | 4.98***  | 2146 | .22 |
| Community                 | 988    | 73.55  | 18.83   | 1203    | 67.27   | 19.96    | 7.56***  | 2147 | .33 |
| Future security           | 988    | 71.94  | 19.60   | 1203    | 64.77   | 21.58    | 7.17***  | 2167 | .37 |

*** p < .001; Δ = Glass’s delta (effect size)

2.1 Australia

Normative Data

The Australian normative data were obtained from the 31st survey of the Australian Unity Wellbeing Index (see http://www.acqol.com.au/projects#reports). The sample includes 1000 participants recruited via telephone, stratified according to gender and geographic location. Global Life Satisfaction (GLS) is phrased as ‘Thinking about your life and personal circumstances, how satisfied are you with your life as a whole?’ Respondents record their answer on an 11-point, end-defined scale anchored by 0 ‘not at all satisfied’ and 10 ‘completely satisfied’.

The Personal Wellbeing Index (PWI) comprises seven items concerning different domains of life, answered on the same response scale as the GLS. The data were cleaned according to the PWI manual (IWBG 2013) and the total score calculated as the average of the seven domains. Results are adjusted to the 0–100 scale by shifting the decimal point one place to the right. A total of 126 participants were aged between 18 and 35.
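For clarity, the scoring step can be sketched as follows. This is a sketch only: the full cleaning rules (e.g., screening out invalid response patterns) are specified in the PWI manual and are not reproduced here, and the function name is ours.

```python
def pwi_score(domains):
    """Mean of the seven 0-10 domain ratings, rescaled to 0-100.

    `domains` holds one 0-10 response per PWI domain (standard of living,
    health, achieving, relationships, safety, community, future security).
    """
    if len(domains) != 7:
        raise ValueError("the PWI expects seven domain scores")
    # Multiplying by 10 is the 'decimal point shift' onto the 0-100 scale.
    return 10 * sum(domains) / 7

# e.g. pwi_score([8, 7, 7, 8, 8, 7, 6]) ~= 72.9
```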

Panel Data

The Australian online panel data were collected through a social market research agency. The survey was hosted online, and participants were sent a link to complete the questionnaire, which contained the GLS and the PWI, using the same wording and response scale as for the normative group. The online panel sample comprised 1203 Australians stratified by gender and geographic location, drawn from a larger panel pool of over 180,000 Australians. A total of 231 participants were aged 18–35 and were included in the second set of analyses.

2.2 Germany, Sweden, the Netherlands, and UK

Normative Data

For each European country we drew on the European Social Survey (ESS) database (ESS 2012). All ESS samples are considered demographically representative of their country due to the adoption of random probability methods of sample selection. Respondents are interviewed face-to-face and GLS is phrased ‘All things considered, how satisfied are you with your life as a whole nowadays?’ (ESS 2012). Participants respond on an 11-point, end-defined scale, anchored by 0 ‘extremely dissatisfied’ and 10 ‘extremely satisfied’. Sample sizes of 679, 500, 366 and 483 participants aged 18–35 were available for Germany, Sweden, the Netherlands, and the UK, respectively.

Panel Data

Online panel data for the four European countries were collected by an independent market research organization. GLS is phrased “How satisfied are you with your life in general?” with a response scale as above. The PWI was also included in these surveys, presented in the primary language of each country. The samples were stratified by gender, region, age and education, and the final sample sizes for the current study were 1336, 1352, 1429 and 1373 for Germany, Sweden, the Netherlands, and UK, respectively.

2.3 USA

Normative Data

Normative data for the USA were drawn from the Gallup World Poll (2009). These data were collected in 2007 from a nationally representative sample. Participants were interviewed face-to-face, and GLS was phrased, “All things considered, how satisfied are you with your life as a whole these days?” The response scale was anchored by 0 ‘dissatisfied’ and 10 ‘satisfied’. A sample of 1222 respondents aged 18–35 years was available for analysis.

Panel Data

The online panel data for USA were collected by an independent market research organization that recruits participants using address-based sampling. Following the same procedures as for the European countries, the sample was stratified by gender, region, age and education. The sample size was 898. The PWI was delivered in its original English version, preceded by the phrase “How satisfied are you with your life as a whole?”

2.4 Summary of Data Sources

The online panel data were collected online for all countries, while the normative data were collected face-to-face for all countries except Australia, which used CATI.

Although the phrasing of the single GLS item varies subtly between countries, the variant phrasings are recommended by the OECD as appropriate for capturing subjective wellbeing, and there is no evidence that they yield different outcome scores.

2.5 Incentives

In order to maximise response rates and reduce attrition over time for follow-up purposes, online panel respondents typically receive an incentive for completing each survey. These incentives may include a small financial remuneration, the accumulation of points which can be redeemed for various goods (e.g., music downloads, airline miles), sweepstakes, drawings, or instant-win games (Baker et al. 2010). In the case of the data collected for this study, US online panel respondents received financial compensation, and respondents from the European countries and Australia were awarded points to be redeemed against products. In each case the value of the incentive was determined by the online panel provider and was not disclosed.

2.6 Translations

The original questionnaire was developed in English. For the Swedish, German and Dutch versions, items were professionally translated into the language of each country, and checked using the backward and forward translation convention (Douglas and Craig 1983) by the online panel provider. The translations are provided in Appendix 1. The resulting surveys were then assessed for completeness and linguistic accuracy by researchers involved in the broader project from each of the countries of interest.

3 Results

First, we present data from the Australian samples. The normative data are drawn from the 31st survey of the Australian Unity Wellbeing Index conducted by Computer Assisted Telephone Interviewing (CATI), and the online panel was recruited through a social market research agency. One thousand participants were recruited through the CATI method, and the online panel sample included 1203 respondents. Table 1 compares the samples on their demographic criteria.

Comparison of the samples using chi-square analyses revealed that the online panel group reported slightly lower annual household income on average, with a greater proportion of participants in the CATI sample earning above $150,000. The online panel sample was also slightly younger than the CATI sample, and more participants in the online panel sample reported being in full-time employment, while the CATI sample included more retirees. These demographic comparisons reveal that although online panel samples may be selected to represent the population on some characteristics (here, gender and geographic location), there remains some variation in the composition of the samples on other demographic factors.

Table 2 compares the scores of the two samples on the wellbeing measures.

Table 2 shows that the online panel sample recorded significantly lower scores on every wellbeing indicator than the CATI sample. Further, examination of Levene’s test for equality of variances revealed that there were also statistically significant differences between the variances of the groups on each measure. Given the heterogeneity of variance, effect sizes were calculated using Glass’s delta (Δ), which uses only the standard deviation of the control group. Effect sizes for these comparisons ranged from .22 to .58. These findings suggest that the two samples behave quite differently in terms of their wellbeing profiles.
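A minimal sketch of this comparison in Python follows, assuming `cati` and `panel` are arrays of individual 0–100 scores (the names and the .05 variance-test threshold are our assumptions, not the authors’ code):

```python
import numpy as np
from scipy import stats

def compare_groups(cati, panel):
    # Levene's test for equality of variances
    _, levene_p = stats.levene(cati, panel)
    # Independent-samples t-test; Welch's correction if variances differ
    t, p = stats.ttest_ind(cati, panel, equal_var=levene_p >= .05)
    # Glass's delta: mean difference scaled by the control-group (CATI) SD,
    # an appropriate effect size when group variances are unequal
    delta = (np.mean(cati) - np.mean(panel)) / np.std(cati, ddof=1)
    return t, p, delta
```

Applied to the GLS means in Table 2, (78.25 − 70.24) / 15.60 ≈ .51, reproducing the tabled Δ.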

Given that the samples differed in terms of their age and income (see Table 1), we considered whether these might account for some of the difference in wellbeing between the online panel and CATI samples. There was a significant difference in wellbeing across every income group (except those earning above $500,000, as this comparison did not achieve sufficient power with only 7 participants), and across every age group except those aged over 76 (where group sizes differed markedly). When age and income were entered as covariates into the model to predict wellbeing, there remained a significant difference in wellbeing scores between the online panel and CATI samples (original model: F(1, 2153) = 134.562, p < .001; covariate model: F(1, 1946) = 69.544, p < .001). The inclusion of age and income as covariates did account for some of the variance, reducing the effect size of the sample from η2 = .059 to η2 = .035.
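The covariate analysis corresponds to a standard ANCOVA. A sketch using statsmodels follows, with synthetic stand-in data since the raw file is not public; the column names and data-generating values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Synthetic stand-in: one row per respondent with a PWI score, sample
# membership, age, and income bracket (all names/values are assumed).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "sample": np.repeat(["cati", "panel"], n),
    "age": rng.integers(18, 90, 2 * n),
    "income": rng.integers(1, 9, 2 * n),
})
df["pwi"] = 76 - 7 * (df["sample"] == "panel") + rng.normal(0, 14, 2 * n)

# ANCOVA: does the sample effect survive age and income as covariates?
model = smf.ols("pwi ~ C(sample) + age + income", data=df).fit()
print(anova_lm(model, typ=2))
```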

3.1 Comparisons within Other Countries

In seeking to replicate these findings within other countries, we were able to obtain suitable GLS and PWI data for five additional countries: Germany, Sweden, the Netherlands, the UK, and the US. Since some countries only had data available for participants aged 18–35, and since the Australian samples differed in the age distribution of their participants (see Table 1), the age range for subsequent analyses was restricted to 18–35. Table 3 compares the age and gender distributions of the samples in each country.
Table 3 Age and gender comparisons across data sources for each country (18–35 only)

| Country     | Source       | N    | Gender (% female) | Age, M (SD)  | Year of data collection |
|-------------|--------------|------|-------------------|--------------|-------------------------|
| Australia   | Normative    | 126  | 45.24             | 26.13 (5.85) | 2014                    |
| Australia   | Online panel | 231  | 59.31             | 28.91 (4.40) | 2014                    |
| Germany     | Normative    | 679  | 47.94             | 26.30 (5.21) | 2012                    |
| Germany     | Online panel | 1336 | 50.71             | 26.04 (5.64) | 2014                    |
| Netherlands | Normative    | 366  | 59.02             | 27.40 (4.92) | 2012                    |
| Netherlands | Online panel | 1429 | 50.66             | 25.75 (5.63) | 2014                    |
| Sweden      | Normative    | 500  | 47.11             | 26.19 (5.23) | 2012                    |
| Sweden      | Online panel | 1352 | 53.25             | 26.94 (5.47) | 2014                    |
| UK          | Normative    | 483  | 60.04             | 27.22 (5.18) | 2012                    |
| UK          | Online panel | 1373 | 50.33             | 25.98 (5.66) | 2014                    |
| USA         | Normative    | 1222 | n/a               | n/a          | 2007                    |
| USA         | Online panel | 898  | 60.58             | 26.59 (4.76) | 2014                    |

There was a significant difference in the gender distribution across the normative and online panel samples in Australia (χ² = 6.50, p < .05), the Netherlands (χ² = 8.15, p < .05), Sweden (χ² = 5.36, p < .05) and the UK (χ² = 13.53, p < .001). Since males and females do not differ significantly in their wellbeing scores (e.g., Australian normative data: females M SWB = 74.44, SD = 13.36; males M SWB = 73.80, SD = 13.90; t(120) = −.256, ns; Australian online panel data: females M SWB = 67.70, SD = 15.45; males M SWB = 65.29, SD = 14.28, t(229) = −1.199, ns), the unequal gender distribution in the current sample should not affect group comparisons. The normative samples for Australia and Sweden were slightly younger than their online panel peers (t(355) = 5.06, p < .001 and t(1850) = 2.65, p < .01, respectively), while the normative samples for the Netherlands and the UK were slightly older than their online panel peers (t(1793) = 5.13, p < .001 and t(1854) = 4.23, p < .001). Although some of these differences reached statistical significance, they translated to a maximum difference of less than 4 years between samples, and there is no reason to suspect that this level of difference would have any meaningful effect on wellbeing scores.
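The gender comparisons can be reproduced from Table 3 alone; a sketch for Australia follows, with the cell counts reconstructed from the reported percentages (Pearson χ² without Yates’ correction, which is needed to reproduce the reported value):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Australia, 18-35: 45.24% of 126 normative and 59.31% of 231 panel
# respondents were female, giving the 2x2 table below.
table = np.array([[57, 69],     # normative: female, male
                  [137, 94]])   # panel:     female, male
chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(round(chi2, 2), round(p, 3))  # 6.5 0.011 -- matches chi2 = 6.50, p < .05
```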

Table 4 provides descriptive statistics for both GLS and PWI results. Considering first the six normative GLS results, using each survey mean score as data, the grand mean is 75.63 with a standard deviation of 3.52. By contrast, the six GLS online panel values have a mean of 68.33 with a standard deviation of 2.99. These two groups of mean scores are significantly different, t(10) = 3.534, p < .01. These differences are confirmed by t-test values within each country, which are all significant at p < .001.
Table 4 Comparison of online and normative data for wellbeing across countries

| Country     | GLS normative, M (SD) | GLS online panel, M (SD) | t        | d   | PWI online panel, M (SD) |
|-------------|-----------------------|--------------------------|----------|-----|--------------------------|
| Australia   | 74.0 (17.0)           | 67.4 (18.5)              | 16.15*** | .37 | 66.7 (15.0)              |
| Germany     | 76.4 (18.6)           | 68.3 (19.0)              | 15.50*** | .43 | 65.0 (18.7)              |
| Netherlands | 78.3 (14.5)           | 72.9 (12.2)              | 26.05*** | .40 | 70.7 (12.3)              |
| Sweden      | 76.7 (17.0)           | 68.2 (19.4)              | 37.48*** | .47 | 64.7 (19.7)              |
| UK          | 71.7 (20.6)           | 63.7 (22.6)              | 32.40*** | .37 | 59.4 (22.6)              |
| USA         | 78.5 (17.9)           | 69.5 (20.2)              | 45.77*** | .47 | 65.7 (20.0)              |

*** p < .001

Using the PWI results to perform the same set of analyses, the six mean scores for panel PWI have a mean of 65.37 and a standard deviation of 3.65. These do not differ from the online panel GLS, t(10) = 1.414, ns. However, the comparison of normative GLS with the online panel PWI shows the normative GLS to be higher, t(10) = 4.524, p < .01.
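These grand-mean comparisons treat each country’s survey mean as a single observation. A sketch from the rounded Table 4 entries follows; because the authors presumably used unrounded survey means, it reproduces the reported statistics only approximately (e.g., it yields t ≈ 1.54 rather than the reported 1.414, with the same non-significant conclusion).

```python
import numpy as np
from scipy import stats

# Country mean scores taken from Table 4 (rounded to one decimal)
gls_panel = [67.4, 68.3, 72.9, 68.2, 63.7, 69.5]
pwi_panel = [66.7, 65.0, 70.7, 64.7, 59.4, 65.7]

print(np.mean(gls_panel), np.std(gls_panel, ddof=1))  # ~68.33, ~2.99
print(np.mean(pwi_panel), np.std(pwi_panel, ddof=1))  # ~65.37, ~3.65
# Two-sample t-test across the six country means (df = 6 + 6 - 2 = 10)
t, p = stats.ttest_ind(gls_panel, pwi_panel)
print(round(t, 2), round(p, 3))
```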

In summary, these results from six different countries are highly consistent. Data obtained from online panels are, on average, about 7 to 10 percentage points lower than from non-panel samples. This difference applies equally to online panel data gathered through the single item GLS and the multi-item PWI.

The reliability of these findings is supported by the fact that the normative mean scores lie within the normal range established for Australian survey means (73.83–76.71 points; Cummins et al. 2013), and within the 70–80 point normative range for the USA and European countries (Cummins 1995, 1998). The online panel mean scores, by contrast, lie consistently below these normative ranges.

4 Discussion

The findings of this study reveal that data obtained from incentivized online panels yield subjective wellbeing (SWB) scores that are more than seven points lower, on a 100-point scale, than normative data. This difference is statistically significant in each of six countries.

Researchers would typically look to demographic differences between the samples, such as age or income status, to try to explain such discrepancies. However, the samples in this study were specifically selected to have comparable demographic compositions, which means that there remains some unexplained difference between online panel-recruited and other-recruited samples that is consistent across the 6 countries.

Since the samples do not differ in terms of the way they look, we propose that they differ in terms of how the participants feel. That is, the groups differ in their psychological composition. Of particular interest are online panel scores below 70 points on the PWI. Such levels are typical of groups that contain a higher than normal proportion of people experiencing compromised wellbeing, and are considered ‘at-risk’ or vulnerable groups (Tomyn et al. 2015). It is especially notable that five of the six online panel samples evidenced a PWI mean below 70 points, the exception being the Netherlands (70.7 points). All of these online panel mean scores are at least seven percentage points lower than their matching non-panel means (Table 4).

An implication of these findings may be derived from the theory of subjective wellbeing homeostasis introduced earlier in this paper. Such low levels of SWB are indicative of frequent homeostatic failure within each online panel and, consequently, a higher than normal incidence of depression (Cummins 2010).

Explanations for the lower SWB of online panel recruits have been considered but found wanting. One is incentivization: while the incentives offered to online panel members may encourage them to respond faster so as to complete the survey and receive their reward as efficiently as possible, it is not clear why this would produce systematically lower levels of measured SWB.

A second possible explanation may be linked to sabotage by bots (Prince et al. 2012). This is a clever strategy employed by computer coders who can effectively ‘hack’ an online panel by writing code that will automatically respond to surveys for incentives. Where online panels do not have the appropriate protection against such security violations, bots can be devised to respond en masse to surveys, with all the incentives directed toward the coder’s bank account. Again, however, it is not clear why such contamination would produce a systematic reduction in measured SWB.

It is important to acknowledge that the different modes of data collection (online vs face-to-face vs CATI) may account for some of the variance between the groups in this study. With mixed-mode methods of data collection becoming more frequent, researchers have suggested that different modes may compromise data integrity and if so, could contribute to the findings reported here.

Unsurprisingly, self-administered modes of data collection (such as online surveys) perform better than face-to-face or interviewer-administered surveys when collecting sensitive information (De Leeuw 2005). It could therefore be argued that the online panel data provide a more accurate measure of the population than has previously been obtained, as respondents may be more likely to disclose their true feelings with the anonymity afforded by an online survey. However, sensitive information refers to items that are intrusive or that may yield socially undesirable responses (Tourangeau and Yan 2007), typically questions about drug use, sexual behaviour, or voting preferences, which are markedly distinct from the questions posed in the present study. Questions about wellbeing and life satisfaction do not carry the level of sensitivity that would typically yield discordant results.

In support of this view, a recent study (Shawver et al. 2016) compared results on the WHOQOL-BREF for participants who completed the scale via four different modes: MTurk, Craigslist.org, college students who completed an online survey hosted by SurveyMonkey.com, and face-to-face. These authors demonstrated internal consistency and valid psychometric properties of the WHOQOL-BREF (WHOQOL Group 1998) for each mode, showing that the wellbeing scales worked as intended across varying data collection modes.

Few other studies have specifically explored the effect of data collection mode on the measurement of wellbeing, and those that have tend to call for further investigation into the matter. One study found more negative reports on the self-administered questionnaire compared to a telephone or face-to-face administration using a measure of psychological wellbeing, but effect sizes were not reported (Pruchno and Hayden 2000). An earlier study considered loneliness and subjective wellbeing, and found that differences were magnified in multivariate analyses (De Leeuw et al. 1996). It is thus plausible that the differing mode of data collection contributed to the findings reported here, though further research is required to disentangle the true sources of variance.

Future studies should recognise the potential for different survey collection modes to introduce an alternate source of variance, and work through ways to account for such error within the analyses. One way to offset the concerns regarding data quality in future mixed-mode data collection studies is to include a special subset of questions, presented in a more private mode, to ensure greater self-disclosure and less potential for social desirability bias (De Leeuw 2005).

A further systematic difference between the samples is the time elapsed between collection of the normative and the online panel data. However, given the well-established stability of subjective wellbeing over time in Australia and the European countries (OECD 2017), this difference of approximately two years is unlikely to affect the analyses. A possible exception is the evidence of a general decline in subjective wellbeing in the USA (OECD 2017; Blanchflower and Oswald 2004), where the normative data were collected about seven years before the online panel data. However, the magnitude of that decline, as reported by the OECD (2017), is from 73 points to 69 points, a considerably smaller difference than that revealed by our analyses. It seems unlikely that the general decline in SWB in the USA can account for our results.

In summary, our findings are statistically and theoretically consistent with the suggestion that online panels comprise a select group of people who exhibit lower subjective wellbeing than their non-panel counterparts. If this explanation holds, then there is danger in the use of data from online panels to inform strategy and policy across scientific, industry and government agendas. Such data are not reflective of the general population. It is concluded that online panel data come with a caveat emptor for unwary researchers or policy makers.

Notes

Compliance with Ethical Standards

Conflict of Interest Statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

  1. Andrews, F. M., & Withey, S. B. (1976). Social indicators of well-being: Americans' perceptions of life quality. New York: Plenum Press.
  2. Australian Bureau of Statistics (2016). Household use of information technology, Australia, 2014–2015, Category no. 8146.0. Canberra: Government of Australia.
  3. Baker, R., Blumberg, S. J., Brick, J. M., Couper, M. P., Courtright, M., Dennis, J. M., Dillman, D., Frankel, M. R., Garland, P., Groves, R. M., Kennedy, C., Krosnick, J., Lavrakas, P. J., Lee, S., Link, M., Piekarski, L., Rao, K., Thomas, R. K., & Zahs, D. (2010). Research synthesis: AAPOR report on online panels. Public Opinion Quarterly, 74(4), 711–781.
  4. Barber, T., Chilvers, D., & Kaul, S. (2013). Moving an established survey online - or not? International Journal of Market Research, 55(2), 2–11.
  5. Bethlehem, J., & Stoop, I. (2007). Online panels - a paradigm theft. In The challenges of a changing world: Proceedings of the fifth international conference of the Association for Survey Computing, September, 113–137.
  6. Blanchflower, D. G., & Oswald, A. J. (2004). Well-being over time in Britain and the USA. Journal of Public Economics, 88, 1359–1386. https://doi.org/10.1016/S0047-2727(02)00168-8
  7. Capic, T., Li, N., & Cummins, R. A. (2017). Set-points for subjective wellbeing: A replication and extension. Social Indicators Research (in press).
  8. Couper, M. P., Kapteyn, A., Schonlau, M., & Winter, J. (2007). Noncoverage and nonresponse in an Internet survey. Social Science Research, 36, 131–148.
  9. Cummins, R. A. (1995). On the trail of the gold standard for life satisfaction. Social Indicators Research, 35(2), 179–200. https://doi.org/10.1007/BF01079026
  10. Cummins, R. A. (1998). The second approximation to an international standard of life satisfaction. Social Indicators Research, 43(3), 307–334. https://doi.org/10.1023/A:1006831107052
  11. Cummins, R. A. (2010). Subjective wellbeing, homeostatically protected mood and depression: A synthesis. Journal of Happiness Studies, 11, 1–17. https://doi.org/10.1007/s10902-009-9167-0
  12. Cummins, R. A. (2016). The theory of subjective wellbeing homeostasis: A contribution to understanding life quality. In F. Maggino (Ed.), A life devoted to quality of life – Festschrift in honor of Alex C. Michalos (Vol. 60, pp. 61–79). Dordrecht: Springer.
  13. Cummins, R. A., & Weinberg, M. K. (2015). Multi-item measurement of subjective wellbeing: Subjective approaches. In W. Glatzer, L. Camfield, V. Møller, & M. Rojas (Eds.), Global handbook of quality of life: Exploration of well-being of nations and continents (pp. 239–268). Dordrecht: Springer.
  14. Cummins, R. A., Woerner, J., Weinberg, M., Collard, J., Hartley-Clark, L., & Horfiniak, K. (2013). Australian Unity Wellbeing Index: Report 30.0 - The wellbeing of Australians: Social media, personal achievement, and work. Melbourne: Australian Centre on Quality of Life, School of Psychology, Deakin University. http://www.acqol.com.au/uploads/surveys/survey-030-report-part-a.pdf
  15. De Leeuw, E. D. (2005). To mix or not to mix data collection modes in surveys. Journal of Official Statistics, 21, 233–255.
  16. De Leeuw, E. D., Mellenbergh, G. J., & Hox, J. J. (1996). The influence of data collection method on structural models: A comparison of a mail, a telephone, and a face-to-face survey. Sociological Methods & Research, 24, 442–472.
  17. DeSimone, J. A., Harms, P. D., & DeSimone, A. J. (2015). Best practice recommendations for data screening. Journal of Organizational Behavior, 36, 171–181.
  18. Douglas, S. P., & Craig, C. S. (1983). International marketing research. Englewood Cliffs: Prentice-Hall.
  19. European Social Survey. (2012). ESS round 6 (2012) project instructions (PAPI). London: Centre for Comparative Social Surveys, City University London. http://www.europeansocialsurvey.org/docs/round6/fieldwork/source/ESS6_source_project_instructions.pdf. Last accessed 13 April 2015.
  20. Gallup. (2009). Gallup World Poll. Washington, DC: Gallup.
  21. Helliwell, J., Layard, R., & Sachs, J. (2017). World Happiness Report 2017. New York: Sustainable Development Solutions Network.
  22. International Wellbeing Group. (2013). Personal Wellbeing Index manual. Melbourne: Deakin University. http://www.acqol.com.au/instruments
  23. Maggino, F. (2016). Challenges, needs and risks in defining wellbeing indicators. In F. Maggino (Ed.), A life devoted to quality of life. Social indicators research series, Vol. 60. Cham: Springer. https://doi.org/10.1007/978-3-319-20568-7_13
  24. Meade, A. W., & Craig, S. B. (2011). Identifying careless responses in survey data. Paper presented at the 26th Annual Meeting of the Society for Industrial and Organizational Psychology, Chicago, IL.
  25. OECD. (2013). OECD guidelines on measuring subjective well-being. Paris: OECD Publishing. https://doi.org/10.1787/9789264191655-en
  26. OECD. (2017). How's life? 2017: Measuring well-being. Paris: OECD Publishing. https://doi.org/10.1787/how_life-2017-en
  27. Prince, K. R., Litovsky, A. R., & Friedman-Wheeler, D. G. (2012). Internet-mediated research: Beware of bots. The Behavior Therapist, 35, 85–88.
  28. Pruchno, R. A., & Hayden, J. M. (2000). Interview modality: Effects on costs and data quality in a sample of older women. Journal of Aging & Health, 12, 3–24.
  29. Shawver, Z., Griffith, J. D., Adams, L. T., Evans, J. V., Benchoff, B., & Sargent, R. (2016). An examination of the WHOQOL-BREF using four popular data collection methods. Computers in Human Behavior, 55, 446–454.
  30. Suh, E., Diener, E., & Fujita, F. (1996). Events and subjective well-being: Only recent events matter. Journal of Personality and Social Psychology, 70, 1091–1102.
  31. The World Bank (2016). Individuals using the Internet (% of population). Accessed 25 January 2017. https://data.worldbank.org/indicator/it.net.user.zs
  32. Tomyn, A. J., Weinberg, M. K., & Cummins, R. A. (2015). Intervention efficacy among 'at-risk' adolescents: A test of subjective wellbeing homeostasis theory. Social Indicators Research, 120, 883–895. https://doi.org/10.1007/s11205-014-0619-5
  33. Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133, 859–883.
  34. WHO Regional Office for Europe. (2012). Measurement of and target-setting for well-being: Second meeting of the expert group, Paris, 25–26 June 2012. Copenhagen: WHO Regional Office for Europe.
  35. WHOQOL Group. (1998). Development of the World Health Organization WHOQOL-BREF quality of life assessment. Psychological Medicine, 28, 551–558.
  36. Williams, J. (2012). Survey methods in an age of austerity: Driving value in survey design. International Journal of Market Research, 54(1), 35–48.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Psychology, Faculty of Health, Deakin University, Melbourne, Australia
  2. University of Western Australia, Crawley, Australia
  3. Copenhagen Business School, Frederiksberg, Denmark
  4. Justus-Liebig-Universität Gießen, Gießen, Germany
