Journal of Risk and Uncertainty, Volume 44, Issue 3, pp 261–293

Allais for all: Revisiting the paradox in a large representative sample

Open Access Article

DOI: 10.1007/s11166-012-9142-8

Cite this article as:
Huck, S. & Müller, W. J Risk Uncertain (2012) 44: 261. doi:10.1007/s11166-012-9142-8

Abstract

We administer the Allais paradox questions to both a representative sample of the Dutch population and to student subjects. Three treatments are implemented: one with the original high hypothetical payoffs, one with low hypothetical payoffs and a third with low real payoffs. Our key findings are: (i) violations in the non-lab sample are systematic and a large bulk of violations is likely to stem from non-familiarity with large payoffs, (ii) we can identify groups of the general population that have much higher than average violation rates; this concerns mainly the lowly educated and unemployed, and (iii) the relative treatment differences in the population at large are accurately predicted by the lab sample, but violation rates in all lab treatments are about 15 percentage points lower than in the corresponding non-lab treatments.

Keywords

Expected utility theory · Allais paradox · Common consequence effect · Field experiments · Representative sample

JEL Classification

C93 · D81

This paper presents evidence on the consistency of risk preferences with expected utility theory in a representative population sample. We find that consistency increases with task familiarity and is linked to several personal characteristics such as education, income and asset holdings. Moreover, we investigate the external validity of a laboratory experiment with a student population that implemented the same choice problems as our household panel study. We find that, in line with studies on other biases, deviations from rationality observed in the lab provide a lower bound for deviations in the population at large.

Recently, several studies have made significant progress in understanding risk preferences in populations, making use of innovative survey methods and field experiments (Harrison and List 2004) including game shows with large stakes (Post et al. 2008; Andersen et al. 2008). From the perspective of these studies, the present paper takes one step back by focussing on consistency of risk preferences with expected utility theory in a representative subject pool—well over 1,400 members of the CentER Panel, a representative sample of the Dutch population. We do this by falling back on the oldest consistency test of all—the Allais paradox (Allais 1953). Our results help to understand the reliability and robustness of investigations into the actual distribution of risk preferences in populations.

Our research strategy is threefold. First, we implement three different treatments in the main experiment with the panel. We analyze the original Allais question with payoffs of millions of Euros that, just as when Allais asked Savage, were purely hypothetical. In our second treatment we scaled the payments down but kept them hypothetical. Our third treatment used the same downscaled payoffs but paid them out for real. This enables us to examine to what extent violations are driven by lack of monetary incentives, on the one hand, and non-familiarity with large sums of money on the other.

Second, we are able to exploit the wide range of background information that is available for our subjects in order to study the roots of violations.1 Which personal characteristics are correlated with violations? Are violations a matter of insufficient education or limited experience with financial decision making? Can we identify ‘problem groups’ that are, perhaps, more likely to suffer (in particular late in life) from erroneous financial decision making?

Third, we conduct a laboratory experiment with the usual laboratory subject population (students) employing the same design that we used in the panel experiment. Thus, we are able to examine the external validity of a laboratory experiment in a clear and detailed manner. In particular, we can compare whether and how a lab study can tell us something about the population at large.

Pursuing our threefold research strategy we are, thus, able to present very detailed and comprehensive evidence on the Allais paradox. Our results are useful for several practical issues: (1) Our results point to a number of conditions that make standard theoretical predictions more likely to hold, (2) Our results identify certain parts of the population that, due to inconsistencies, may have difficulties in making sound financial decisions, and (3) Our results contribute to a better understanding of what can be reliably learned from laboratory experiments.

Along the first dimension of our research strategy we find that violations in the original paradox are likely to be driven by very high payoffs with which, in real life, virtually nobody has any practical experience. Violations in the original Allais problem are twice as high as in both downscaled versions. This effect has been observed before with student samples (Conlisk 1989); we show that the pattern extends to the general population and across socioeconomic characteristics. Perhaps this result is not surprising as it simply stresses that economic theory can be expected to work much better in environments with which agents have experience and are, thus, well-adapted. On the other hand, we find no substantial difference between the two downscaled versions. Whether subjects are incentivized or not, violations are much lower in both cases.2

Along the second dimension, we are able to identify a whole array of personal characteristics that correlate with inconsistent decision making. Education, occupation, income and asset holdings do all correlate with inconsistent decision making and in each case the direction of effects is as one would guess. The better educated are more consistent and so are those in employment, those who earn more and those who hold financial assets.

Finally, our methodological contribution reveals that the laboratory results are rather useful in predicting behavior in a general population. First, the relative treatment differences are precisely the same for both populations, panel and lab. Second, as demonstrated in a number of other studies (see Gächter et al. (2008) for a survey) the violations of standard theory observed in the lab provide a lower bound for violations observed in the population at large.3

The remainder of the paper is organized as follows. In Section 1, we describe the main characteristics of the CentERpanel and introduce the experimental design. In Section 2 we present our results obtained with the panel. We first give a quick overview of the results and then present a more detailed analysis, based on regression results, that also accounts for the effect of sociodemographic characteristics. In Section 3 we introduce our lab results and compare them to those obtained in the panel. Section 4 concludes.

1 Design and data collection

We administer the original “Allais questions,” which consist of two pairwise lottery choices. Consider the following two choice problems. First, a subject is asked to choose between lotteries A and A ∗  where
$$ A = \text{Certainty of € 1 Million} \quad\text{and}\quad A^{\ast} = \left\{ \begin{array}{l} \text{1/100 Chance of € 0} \\ \text{89/100 Chance of € 1 Million} \\ \text{10/100 Chance of € 5 Million} \end{array} \right. $$
Second, a subject is asked to choose between lotteries B and B ∗  where
$$ \begin{array}{rcl} B &=& \left\{ \begin{array}{l} \text{89/100 Chance of € 0} \\ \text{11/100 Chance of € 1 Million} \end{array} \right. \quad\text{and} \\ B^{\ast} &=& \left\{ \begin{array}{l} \text{90/100 Chance of € 0} \\ \text{10/100 Chance of € 5 Million} \end{array} \right. \end{array} $$
Of the four possible answers AB, A ∗ B ∗ , AB ∗ , and A ∗ B only the first two are consistent with expected utility theory (henceforth, EUT) whereas the last two are not.4 Many laboratory experiments have shown that violations of EUT are frequent and that a larger share of subjects violating EUT chooses AB ∗  instead of A ∗ B.5
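To see why only AB and A ∗ B ∗ are consistent with EUT, a short derivation may help. It is a sketch under the assumption of an arbitrary utility function u over the three monetary outcomes (the symbol u and the shorthand 1M and 5M for € 1 million and € 5 million are introduced here for illustration only):
$$ \begin{aligned} EU(A) - EU(A^{\ast}) &= u(1\text{M}) - \left[ 0.01\,u(0) + 0.89\,u(1\text{M}) + 0.10\,u(5\text{M}) \right] \\ &= 0.11\,u(1\text{M}) - 0.01\,u(0) - 0.10\,u(5\text{M}), \\ EU(B) - EU(B^{\ast}) &= \left[ 0.89\,u(0) + 0.11\,u(1\text{M}) \right] - \left[ 0.90\,u(0) + 0.10\,u(5\text{M}) \right] \\ &= 0.11\,u(1\text{M}) - 0.01\,u(0) - 0.10\,u(5\text{M}). \end{aligned} $$
The two differences coincide for any u, so an expected utility maximizer who prefers A to A ∗ must also prefer B to B ∗, and vice versa. The mixed patterns AB ∗ and A ∗ B would require the same expression to be positive and negative at once.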
We have three simple treatments using a between-subjects design. To introduce these treatments, consider the following lotteries over three outcomes of monetary payoffs with probabilities as above, i.e., A = (0,1,0), A ∗ = (.01,.89,.10), B = (.89,.11,0), B ∗ = (.90,0,.10). Our three treatments were then as follows:
  • Treatment HighHyp: Original Allais questions with high hypothetical payoffs of € 0, € 1 million, and € 5 million.

  • Treatment LowHyp: Allais questions with low hypothetical payoffs of € 0, € 5, and € 25.

  • Treatment LowReal: Allais questions with low real payoffs of € 0, € 5, and € 25.

Note that the amounts of money we use in these treatments are the same as in Conlisk (1989) with the sole difference that he used dollars instead of euros. For all three treatments we had two sub treatments reversing the order of decisions. As we do not find any order effects in the data we pool the data throughout.
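As a rough illustration of the payoff structure, the following Python sketch computes the expected monetary value of each lottery under the high and the low payoff scale, using the probability vectors and amounts stated above. The names `LOTTERIES`, `PAYOFFS` and `expected_value` are illustrative assumptions and not part of the original study.

```python
from fractions import Fraction

# Probabilities over the three outcomes (lowest, middle, highest payoff),
# taken from the lottery definitions in the text.
LOTTERIES = {
    "A": (Fraction(0), Fraction(1), Fraction(0)),
    "A*": (Fraction(1, 100), Fraction(89, 100), Fraction(10, 100)),
    "B": (Fraction(89, 100), Fraction(11, 100), Fraction(0)),
    "B*": (Fraction(90, 100), Fraction(0), Fraction(10, 100)),
}

# Payoffs in euros for each treatment, as listed in the treatment descriptions.
PAYOFFS = {
    "HighHyp": (0, 1_000_000, 5_000_000),
    "LowHyp": (0, 5, 25),
    "LowReal": (0, 5, 25),
}


def expected_value(lottery, payoffs):
    """Expected monetary value of a named lottery for a given payoff triple."""
    return sum(p * x for p, x in zip(LOTTERIES[lottery], payoffs))


if __name__ == "__main__":
    for treatment, payoffs in PAYOFFS.items():
        values = {name: float(expected_value(name, payoffs)) for name in LOTTERIES}
        print(treatment, values)
```

Under both payoff scales A ∗ and B ∗ carry the higher expected value (for example € 6.95 against € 5 and € 2.50 against € 0.55 with low payoffs), so a subject who maximizes expected value would answer A ∗ B ∗.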

We collected data from a representative sample of the Dutch population. The experiments were conducted by CentERdata—an institute for applied economic and survey research for the social sciences—that is affiliated with Tilburg University in the Netherlands. CentERdata carries out its survey research mainly by using its own panel called CentERpanel. This panel is Internet based and consists of some 2000 households in the Netherlands which form a representative sample of the Dutch population.6 One of the advantages of the CentERpanel is that the researcher has access to background information for each panel member such as demographic and financial data. Every weekend, the panel members complete a questionnaire on the Internet from their home.

After logging on to our experiment, panel members were randomly assigned to one of the six treatment variants introduced above (three treatments, each with two orders of the decisions). After being informed about the nature of the experiment, subjects decided whether or not to participate, as is common with many modules of the panel. For participating subjects, the next screen introduced an example of a pair of lotteries (which were referred to as “Options”). Subjects were told that their task would be to express a preference for one of the two lotteries and, additionally, how the preferred lottery would be executed.7 When subjects indicated that they were ready to start the experiment, they were, in two consecutive screens, presented with their two Allais questions. Only after both Allais questions had been answered were the two preferred lotteries played out (by the computer) and subjects informed about the outcome of their two preferred lotteries. In the treatments with real monetary payments, subjects were paid according to the outcomes in both of their preferred lotteries.8

In total 1676 members of the CentERpanel logged on to our experiment. Of the subjects logging on, 1426 (85.1%) subjects decided to participate in our experiment while 250 (14.9%) subjects decided not to participate. Table 1 shows descriptive statistics of our sample. The column labeled “Participation” in Table 1 shows descriptive statistics of participating subjects in each of the three main treatments as well as statistics of subjects who chose not to participate in the experiment. The data in Table 1 is grouped according to gender, age, education, occupation and income. (The column labeled “Violation” shows statistics for participating subjects violating or not violating EUT, respectively, which we will analyze further below. It also contains tests on the role of socioeconomic characteristics for EUT violation which will also be discussed later.)
Table 1

Descriptive statistics of the samples

| Category | Characteristic | Participation YES: HighHyp | Participation YES: LowHyp | Participation YES: LowReal | Participation NO | Violation NO | Violation YES | p-value, χ2 |
|---|---|---|---|---|---|---|---|---|
| Gender | Female | 48.9 | 46.5 | 47.0 | 49.2 | 46.2 | 50.0 | 0.180 |
| Age | Age 16–24 | 6.2 | 8.0 | 6.9 | 2.8 | 6.5 | 8.4 | 0.156 |
| | Age 25–34 | 24.4 | 19.2 | 19.7 | 8.0 | 20.6 | 21.5 | |
| | Age 35–44 | 18.7 | 19.4 | 16.6 | 15.6 | 19.8 | 14.4 | |
| | Age 45–54 | 17.5 | 19.2 | 23.3 | 24.8 | 20.2 | 20.1 | |
| | Age 55–64 | 18.2 | 16.2 | 17.2 | 20.4 | 16.3 | 19.0 | |
| | Age 65+ | 15.0 | 18.2 | 16.4 | 28.4 | 16.6 | 16.7 | |
| Education | Primary education | 5.0 | 6.0 | 5.7 | 7.6 | 5.0 | 7.1 | 0.008 |
| | Lower secondary education | 28.5 | 26.6 | 25.2 | 32.0 | 25.0 | 30.2 | |
| | Higher secondary education | 12.0 | 13.4 | 13.7 | 11.6 | 13.9 | 11.4 | |
| | Intermediate vocational training | 20.0 | 20.2 | 20.2 | 18.8 | 19.7 | 21.3 | |
| | Higher vocational training | 24.2 | 23.2 | 23.5 | 21.2 | 23.9 | 22.9 | |
| | University degree | 10.2 | 10.6 | 11.6 | 8.8 | 12.6 | 7.1 | |
| Occupation | Employed (contract) | 54.4 | 49.1 | 52.3 | 41.6 | 54.7 | 45.2 | 0.003 |
| | Freelance or self-employed | 3.7 | 3.2 | 3.8 | 4.8 | 4.0 | 2.5 | |
| | Unemployed | 1.5 | 2.4 | 1.9 | 1.2 | 1.5 | 3.0 | |
| | Student | 5.5 | 6.6 | 6.7 | 2.0 | 5.9 | 7.3 | |
| | Works in own household | 10.7 | 13.0 | 12.0 | 15.6 | 11.4 | 13.2 | |
| | Retired | 17.2 | 17.8 | 16.8 | 28.8 | 16.5 | 19.0 | |
| | Other occupation | 7.0 | 8.0 | 6.5 | 6.0 | 6.0 | 9.8 | |
| Household income | HH gross income ≤ € 2250 | 22.7 | 26.0 | 27.9 | 28.8 | 22.7 | 32.6 | <0.001 |
| | HH gross income € 2251–€ 3130 | 28.4 | 21.8 | 23.5 | 22.0 | 24.0 | 24.9 | |
| | HH gross income € 3131–€ 4350 | 25.2 | 24.4 | 25.8 | 24.0 | 26.8 | 21.2 | |
| | HH gross income ≥ € 4351 | 23.7 | 28.0 | 22.9 | 25.2 | 26.5 | 21.2 | |
| Assets | Holds assets | 18.4 | 16.2 | 15.5 | 12.0 | 18.0 | 13.2 | 0.025 |
| Savings acct. | Has savings account | 59.4 | 53.5 | 50.6 | 50.0 | 52.4 | 57.8 | 0.062 |
| Maximum no. of observations | | 401 | 501 | 524 | 250 | 988 | 438 | |

Notes: HighHyp stands for high hypothetical payoffs, LowHyp for low hypothetical payoffs, and LowReal for low real payoffs. Except for the number of observations, numbers indicate column percentages. Since some of the members of the CentERpanel did not complete the Dutch Household Survey, some observations are missing. The column labeled “p-value, χ2” shows p-levels of χ2 tests for differences between proportions of violating and non-violating subjects in the category listed in column 1.

Concentrating on descriptive statistics for participating subjects in Table 1, we note that by and large most variables are relatively identically distributed across treatments. However, in some of the age and income brackets as well as in the category savings account, there is some more variation. A comparison of the descriptive statistics in the columns describing participating subjects with those of non-participating shows that there are no big differences except for the age categories. Basically, older people appear to be a little more reluctant to participate.

Since this raises concerns about sample selection, we ran Heckman (1976) selection models for all regressions reported below, using the variable “Ratio” as one of the exclusion variables. The variable “Ratio” measures the proportion of questionnaires completed by panel members in the three months preceding our experiment. This variable can be assumed to affect the participation decision but not the decisions taken in the experiment. In none of the regressions did we find evidence of a selection bias.9

2 Results

2.1 Descriptives

A summary of the experimental results is given in Table 2. The table shows both the absolute frequency of choices (left part) and the relative frequency of choices (right part). As mentioned in the introduction, we will concentrate our analysis on the incidence of subjects’ EUT violation in all treatments. However, we will also briefly address the question of whether violations, once they occur, are systematic.
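Table 2

Summary of experimental results in the panel

| Treatment | AB (abs.) | A ∗ B ∗ (abs.) | AB ∗ (abs.) | A ∗ B (abs.) | Σ | AB (rel., %) | A ∗ B ∗ (rel., %) | AB ∗ (rel., %) | A ∗ B (rel., %) | Violations (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| HighHyp | 82 | 121 | 136 | 62 | 401 | 20.4 | 30.2 | 33.9 | 15.5 | 49.4 |
| LowHyp | 22 | 373 | 77 | 29 | 501 | 4.4 | 74.4 | 15.4 | 5.8 | 21.2 |
| LowReal | 22 | 368 | 97 | 37 | 524 | 4.2 | 70.2 | 18.5 | 7.1 | 25.6 |
| Σ | 126 | 862 | 310 | 128 | 1426 | 8.8 | 60.4 | 21.7 | 9.0 | 30.7 |

Note: Columns marked “abs.” give the absolute frequency of choices and columns marked “rel.” the relative frequency of choices. HighHyp stands for high hypothetical payoffs, LowHyp for low hypothetical payoffs, and LowReal for low real payoffs.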

Violation of EUT

Note that the right-most column in Table 2 indicates that violations of EUT are observed in all treatments. In fact, we observe 49.4%, 21.2% and 25.6% violations of EUT in treatments HighHyp, LowHyp, and LowReal, respectively. Furthermore, in all treatments we observe that the fraction of EUT-violating AB ∗ answers is higher than the fraction of EUT-violating A ∗ B answers. The Z-statistic proposed in Conlisk (1989) indicates that the first fraction is significantly higher than the latter fraction at p < 0.001 in all treatments. An interesting question we can answer with our data is whether the differences we report here for the aggregate data are “general” in the sense of applying across socioeconomic attributes or whether they are driven by only some of those attributes. The answer is provided in Tables 5, 6, 7 and 8 in Appendix B, which are structured as Table 1 and provide, for all data and for the three treatments separately, the relative frequency of choices for subjects with various socioeconomic attributes. We observe that EUT violations occur across all socioeconomic attributes and that the “Allais” pattern of more AB ∗ violations than A ∗ B violations is significant for most socioeconomic attributes in all treatments (see the column labeled “Sign. of Conlisk’s Z-statistic” in Tables 5–8 in Appendix B). We conclude that, as in earlier studies, violations of EUT are observed and that they are systematic in the sense that AB ∗ is chosen more often than A ∗ B, mostly independent of socioeconomic background characteristics. To facilitate comparison, note that Conlisk (1989), using a student sample for his “Basic Version” (which is comparable to our treatment HighHyp), reports the following relative frequencies of AB, A ∗ B ∗, AB ∗, and A ∗ B choices: 7.6%, 41.9%, 43.6%, and 6.8%. Thus, he observes EUT violation in 50.4% of the cases, which compares to 49.4% in our panel treatment HighHyp.
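The violation shares in the “Violations” column of Table 2 follow directly from the absolute frequencies, as the Python sketch below illustrates. It also runs a one-sided exact binomial sign test on the violators in each treatment. That test is a simple stand-in chosen here for illustration, not a reimplementation of Conlisk’s Z-statistic, and the names `PANEL_COUNTS`, `violation_rate` and `sign_test_p` are assumptions of this sketch.

```python
from math import comb

# Absolute choice frequencies from Table 2, ordered as (AB, A*B*, AB*, A*B).
PANEL_COUNTS = {
    "HighHyp": (82, 121, 136, 62),
    "LowHyp": (22, 373, 77, 29),
    "LowReal": (22, 368, 97, 37),
}


def violation_rate(counts):
    """Share of subjects choosing AB* or A*B, the two EUT-violating patterns."""
    _, _, ab_star, a_star_b = counts
    return (ab_star + a_star_b) / sum(counts)


def sign_test_p(counts):
    """One-sided exact binomial p-value for the null hypothesis that AB* is
    no more likely than A*B among violators (stand-in for Conlisk's Z)."""
    _, _, ab_star, a_star_b = counts
    n = ab_star + a_star_b
    # P(X >= ab_star) for X ~ Binomial(n, 1/2).
    tail = sum(comb(n, k) for k in range(ab_star, n + 1))
    return tail / 2 ** n


if __name__ == "__main__":
    for treatment, counts in PANEL_COUNTS.items():
        share = round(100 * violation_rate(counts), 1)
        print(treatment, share, sign_test_p(counts))
```

The computed shares round to 49.4%, 21.2% and 25.6%, and the sign test rejects a symmetric split of violations at p < 0.001 in every treatment, in line with the conclusion drawn from Conlisk’s Z-statistic.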

The effect of high versus small hypothetical payoffs

Next consider the effect of high versus small hypothetical payoffs on the extent of EUT violation. For this purpose we compare the rates of EUT violations in treatments HighHyp and LowHyp. Table 2 shows that the rate of EUT violations drops from 49.4% in treatment HighHyp to 21.2% in treatment LowHyp. The D-statistic proposed in Conlisk (1989) indicates that this difference is highly significant at p < 0.0001 (D = 9.115). Inspecting the relative frequencies of choices in Table 2 shows that moving from HighHyp to LowHyp sharply increases the fraction of choices consistent with expected value maximization (A ∗ B ∗) at the expense of all three other possible responses. In particular, many more subjects prefer the payoff-maximizing choice A ∗ over A when (hypothetical) payoffs become small. A possible explanation of this result is that subjects in treatment LowHyp can be expected to be more familiar with the lower amounts of money, leading them to make fewer mistakes.10 Again, with our data we can check whether the result regarding the effect of varying the (hypothetical) stake size just shown for the aggregate data also applies when the data is broken down by various socioeconomic characteristics. Column 3 labeled “Significance of Conlisk’s D-statistic HighHyp vs LowHyp” in Table 9 in Appendix B shows that the answer to this question is, with a few exceptions, yes.

The effect of (small) real versus (small) hypothetical payoffs

Finally, consider the effect of (small) real versus (small) hypothetical payoffs on the extent of EUT violation. To analyze this, compare the rates of EUT violation in treatments LowHyp and LowReal. Table 2 shows that the rate of EUT violations is 21.2% in LowHyp whereas it is 25.6% in treatment LowReal. Thus, we see a slight increase in the share of EUT violations when we move from (small) hypothetical to (small) real payoffs. The D-statistic in Conlisk (1989) indicates that this difference is significant (D = −1.6716, p = 0.047). In contrast, Harrison (1994) and Burke et al. (1996) report that the use of low real instead of low hypothetical payoffs reduces the extent of EUT violation. For a broader overview on how incentives affect behavior in decisions under risk, see Camerer (1995, p. 634f). Note that the result regarding the switch from (small) hypothetical to (small) real payoffs on the extent of EUT violation is usually not significant when one zooms in on socioeconomic characteristics, as shown in column 4 labeled “Significance of Conlisk’s D-statistic LowHyp vs LowReal” in Table 9 in Appendix B.
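The treatment comparisons can be retraced from the counts in Table 2 with the Python sketch below. It assumes that the D-statistic is the difference between two violation shares divided by the square root of the sum of p(1 − p)/(n − 1) over both samples. This form is inferred here from the reported values and is not quoted from Conlisk (1989); the names `VIOLATIONS`, `d_statistic` and `one_sided_p` are likewise illustrative.

```python
from math import erfc, sqrt

# (number of EUT violations, number of subjects) per panel treatment, from Table 2.
VIOLATIONS = {
    "HighHyp": (136 + 62, 401),
    "LowHyp": (77 + 29, 501),
    "LowReal": (97 + 37, 524),
}


def d_statistic(first, second):
    """Difference in violation shares over its estimated standard error.

    Assumption of this sketch: each variance term uses n - 1 in the denominator.
    """
    (v1, n1), (v2, n2) = first, second
    p1, p2 = v1 / n1, v2 / n2
    variance = p1 * (1 - p1) / (n1 - 1) + p2 * (1 - p2) / (n2 - 1)
    return (p1 - p2) / sqrt(variance)


def one_sided_p(z):
    """One-sided p-value of a standard normal test statistic."""
    return 0.5 * erfc(abs(z) / sqrt(2))


if __name__ == "__main__":
    high_vs_low = d_statistic(VIOLATIONS["HighHyp"], VIOLATIONS["LowHyp"])
    hyp_vs_real = d_statistic(VIOLATIONS["LowHyp"], VIOLATIONS["LowReal"])
    print(high_vs_low, one_sided_p(high_vs_low))
    print(hyp_vs_real, one_sided_p(hyp_vs_real))
```

Under this assumption the sketch yields roughly 9.11 for HighHyp versus LowHyp and roughly −1.67 for LowHyp versus LowReal, in line with the values reported in the text.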

Note that our results concerning the extent of EUT violation and the effect of high versus small hypothetical payoffs are not entirely new. We show, however, that they extend to a general population and across socioeconomic characteristics. This should be of interest due to the current discussion about the relationship between results obtained in the lab and those obtained in other settings (see, e.g., Levitt and List 2007).

Let us now turn to providing answers to the first of the two new and main dimensions of our research strategy by inspecting the role of socioeconomic background variables in subjects’ behavioral responses to the Allais questions. Refer to Table 1, which under the heading “Violation” shows descriptive statistics of the subsamples violating and not violating EUT as well as p-levels of χ2 tests. (For the latter, see the notes below Table 1.) Regarding gender, Table 1 reveals that women are slightly more likely to violate EUT than men. With respect to age, Table 1 does not suggest a clear effect, although we note that the relative share of the age bracket [35–44] is higher in the panel’s subpopulation not violating EUT. Regarding education levels, those with lower secondary education and those subjects with a university degree stand out somewhat in the panel: the former because they violate EUT more often and the latter because they violate EUT less often. The most noticeable effect regarding occupation is that those employed on a contractual basis have a higher relative share in the subsample not violating EUT. Finally, with respect to household income, Table 1 shows that the lowest income bracket has a markedly higher relative share in the subsample violating EUT (32.6% versus 22.7%).

Moreover, refer to the rightmost column labeled “p-value, χ2” in Table 1 that shows p-levels of χ2 tests for differences between proportions of violating and non-violating subjects in the category listed in column 1.11 The χ2 tests indicate the strongest differences in violation behavior in the categories of education, occupation and household income.

2.2 Econometrics and the role of socioeconomic characteristics

To test for across-treatment differences controlling for subjects’ sociodemographic characteristics and to check whether any of these characteristics are correlated with behavior, we ran probit regressions with the variable “Violate” as the dependent variable. “Violate” is equal to 1 if a subject’s answer to the Allais questions violates EUT (i.e., answers A ∗ B or AB ∗), and is equal to 0 otherwise (i.e., answers AB or A ∗ B ∗). The background variables we include in the regression are the ones shown in Table 1 above. The results are shown in Table 3 which reports marginal effects. Regression (1) includes all data whereas regressions (2) to (4) show results for each of the three treatments separately. Recall from the end of Section 1 that we did not find evidence for a selection bias due to non-response.
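The coding of the dependent variable and of the treatment dummies can be sketched in a few lines of Python. The function names `violate` and `treatment_dummies` and the string labels for the choices are hypothetical conveniences of this sketch; only the coding rule itself, with LowHyp as the omitted treatment category, is taken from the text and the notes to Table 3.

```python
# Answer pairs that are consistent with expected utility theory.
EUT_CONSISTENT = {("A", "B"), ("A*", "B*")}
TREATMENTS = ("HighHyp", "LowHyp", "LowReal")


def violate(first_choice, second_choice):
    """Return 1 if the pair of answers violates EUT (AB* or A*B), else 0."""
    if first_choice not in ("A", "A*") or second_choice not in ("B", "B*"):
        raise ValueError("choices must be one of A/A* and one of B/B*")
    return 0 if (first_choice, second_choice) in EUT_CONSISTENT else 1


def treatment_dummies(treatment):
    """Dummy coding with LowHyp as the omitted (reference) treatment."""
    if treatment not in TREATMENTS:
        raise ValueError(f"unknown treatment: {treatment}")
    return {
        "HighHyp": int(treatment == "HighHyp"),
        "LowReal": int(treatment == "LowReal"),
    }


if __name__ == "__main__":
    print(violate("A", "B*"), treatment_dummies("HighHyp"))
    print(violate("A*", "B*"), treatment_dummies("LowHyp"))
```

With this coding, the coefficients on the two treatment dummies in regression (1) are read relative to treatment LowHyp.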
Table 3

Results of probit regressions on violation of EUT

| Variable | (1) All data | (2) Only HighHyp | (3) Only LowHyp | (4) Only LowReal |
|---|---|---|---|---|
| HighHyp | 0.307∗∗∗ (9.22) | | | |
| LowReal | 0.058∗ (1.87) | | | |
| Female | 0.007 (0.25) | −0.074 (1.25) | 0.056 (1.33) | −0.002 (0.05) |
| Age 25–34 | −0.062 (0.77) | 0.005 (0.03) | 0.017 (0.15) | −0.200∗ (1.77) |
| Age 35–44 | −0.115 (1.45) | −0.186 (1.07) | −0.027 (0.24) | −0.175 (1.46) |
| Age 45–54 | −0.050 (0.61) | −0.082 (0.46) | −0.011 (0.09) | −0.125 (0.96) |
| Age 55–64 | −0.072 (0.87) | −0.162 (0.89) | 0.031 (0.25) | −0.152 (1.21) |
| Age 65+ | −0.145 (1.64) | −0.157 (0.75) | −0.134 (1.21) | −0.125 (0.86) |
| Lower second. edu. | −0.035 (0.63) | 0.177 (1.35) | −0.147∗∗ (2.23) | 0.045 (0.51) |
| Higher second. edu. | −0.086 (1.50) | 0.150 (1.07) | −0.152∗∗ (2.40) | −0.027 (0.30) |
| Intermed. voc. training | −0.025 (0.43) | 0.172 (1.26) | −0.134∗∗ (1.97) | 0.116 (1.19) |
| Higher voc. training | −0.034 (0.59) | 0.288∗∗ (2.18) | −0.139∗∗ (1.98) | −0.010 (0.11) |
| University degree | −0.134∗∗ (2.21) | 0.186 (1.28) | −0.188∗∗∗ (2.85) | −0.152∗ (1.69) |
| Employed (contract) | −0.130∗∗ (2.53) | −0.234∗∗ (2.04) | −0.127∗ (1.82) | −0.118 (1.50) |
| Freelance or self-empl. | −0.149∗∗ (2.11) | −0.435∗∗∗ (3.08) | −0.081 (0.74) | −0.025 (0.21) |
| Unemployed | 0.065 (0.65) | −0.023 (0.09) | 0.043 (0.34) | 0.046 (0.30) |
| Student | −0.101 (1.16) | −0.156 (0.79) | −0.033 (0.26) | −0.169 (1.43) |
| Works in own household | −0.061 (1.10) | −0.224 (1.78) | −0.015 (0.19) | −0.068 (0.82) |
| Retired | 0.013 (0.19) | −0.063 (0.43) | 0.145 (1.45) | −0.145 (1.56) |
| HH gr. inc. € 2251–€ 3130 | −0.077∗∗ (2.26) | −0.042 (0.54) | −0.047 (0.95) | −0.129∗∗∗ (2.64) |
| HH gr. inc. € 3131–€ 4350 | −0.110∗∗∗ (3.23) | −0.167∗∗ (2.10) | −0.033 (0.67) | −0.138∗∗∗ (2.84) |
| HH gr. inc. ≥ € 4351 | −0.083∗∗ (2.28) | −0.073 (0.88) | −0.076 (1.49) | −0.089∗ (1.69) |
| Assets | −0.081∗∗ (2.29) | −0.118 (1.59) | 0.001 (0.02) | −0.112∗∗ (2.02) |
| Savings account | 0.054∗∗ (2.05) | 0.093 (1.64) | 0.043 (1.13) | 0.021 (0.53) |
| No. of observations | 1424 | 400 | 500 | 524 |

Notes: Marginal effects are reported. HighHyp stands for high hypothetical payoffs, LowHyp for low hypothetical payoffs, and LowReal for low real payoffs. Absolute value of z-statistics in parentheses. ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01. Omitted categories are treatment LowHyp; age interval [16–24]; primary education; “other” occupation; household gross income smaller or equal to € 2250. Two occupation observations missing.

Let us first briefly reconsider across-treatment differences. For this purpose, refer to regression (1) in Table 3 which includes all data and controls for background variables. Importantly, note that in regression (1) the omitted treatment dummy is the one for LowHyp. Inspecting the treatment coefficients, we note that the coefficient for HighHyp is positive and big (0.307) and highly statistically significant whereas the coefficient of LowReal is also positive (0.058) but rather small and only borderline significant.

To analyze the effect of socioeconomic background variables econometrically, we examine regression (1) in Table 3. We make the following observations.
  • Controlling for other characteristics, gender and age have no significant influence on the extent of EUT violation.12

  • Regarding education, we find a strong tendency for violations to be reduced with further education.13 Overall, there is a strong effect of higher education that also shows in the separate specifications for both treatments with low payoffs. In LowHyp everything that improves on primary education goes hand in hand with reduced violations. Only in HighHyp is there no such effect of education. This suggests an interesting interaction effect of experience with a decision domain and education. In the absence of any experience (as in HighHyp) education on its own does little to improve performance. Only if coupled with experience is education aligned with consistency.

  • Of the various occupational affiliations listed in Table 3, we find that the unemployed and ‘others’ do much worse than the employed, self-employed and freelancers.14 This is more pronounced in treatments with hypothetical payoffs.

  • Regarding income, we notice that having a higher gross monthly household income (vis-à-vis the control group with the lowest gross monthly household income) goes along with reduced EUT violations.15 Interestingly, this is particularly pronounced in the treatment LowReal when actual money is at stake. (One could have conjectured that it would be the other way round as the marginal utility of making some money and, hence, the incentive to think a little harder might be higher for those on low incomes. Alas, it does not work this way.)

  • Finally, subjects holding assets have significantly lower EUT violations (by about 8%) whereas subjects with a savings account have significantly higher EUT violations (by about 5%). Maybe not surprisingly, subjects holding assets tend to be expected value maximizers (mainly choosing A ∗ B ∗ ) while subjects who only have a savings account display “Allais” behavior tending toward the choice of AB ∗ .16

In all, a picture emerges that is reminiscent of recent studies by Benjamin et al. (2006), Burks et al. (2009) and Dohmen et al. (2010) who show that a range of behavioral biases are correlated with (or may even stem from) cognitive limitations and low IQ. We find that violations are more prevalent in those who are lowly educated, unemployed, on low income, and who have no significant asset holdings. This is, of course, particularly worrying as imprudent financial decision making and bad planning for retirement have the worst consequences in that group.

In Appendix C we complement the above analysis by running multinomial logit regressions using all four answers AB, A ∗ B ∗ , AB ∗ , and A ∗ B, and choosing the answer representing expected value maximization, A ∗ B ∗ , as the base outcome. The results (whose interpretation is less straightforward) are shown in Tables 10, 11, 12 and 13.

3 The lab experiment

As mentioned in the introduction, the third dimension of our research strategy is concerned with the external validity of laboratory experiments that are typically carried out with rather homogenous subject pools. Of course, the preceding section has shown that there are important sources of heterogeneity in the population at large that simply cannot be detected when the subject pool is restricted to students. The same is, of course, true for any highly selected convenience sample. But what about the questions we analyzed first—the effects of different treatments, the differences between high and low and real and hypothetical payoffs? Would a lab experiment give us reliable results to analyze such questions (as it has been implicitly assumed for a long time in the experimental community, perhaps negligently without much testing)? To shed more light on these issues we conducted an additional lab experiment in the laboratory of Tilburg University using Dutch speaking student subjects drawn from the normal subject pool.

The lab experiment was conducted in the same way as the experiment using the CentERpanel. That is, student subjects did the experiment using a web browser in the lab and using the same screens as the subjects in the panel. However, there were two small exceptions. First, lab subjects received a 10 Euro show-up fee. (Potential participants were informed about this in the invitation E-mail.) But of course, mirroring the panel design again, only subjects assigned to treatments with real payment had the chance to earn additional money during the experiment. This was not announced prior to the experiment. Second, lab subjects were not offered the choice of not participating in the experiment once they had reported to the lab and the experiment was started. This was done in an effort to mimic the normal procedures in lab experiments where, by reporting to the lab, a subject usually confirms his or her decision to participate. Note that when we move from the panel to the lab sample, both the subject pool and the environment change. We deliberately accepted these two simultaneous changes as our aim was to contrast the results obtained in the panel with those obtained in a normal lab experiment.17

After the experiment we asked subjects to fill in a questionnaire in which we elicited some basic background information. Naturally, the information we collected from lab subjects is very limited and cannot be compared in scope and quality to the background information available from members of CentERpanel. The lab experiments were conducted in December 2006 using 223 subjects in total.

As in the panel experiment we did not observe any order effects of presenting the Allais questions, so we present only pooled data in Table 4, which shows the same information for the lab data that Table 2 showed for the panel. We make the following observations. First, as in the panel experiments, we observe EUT violations in all treatments, although to a much lesser degree.18 This mirrors the main result in Gächter et al.’s (2008) meta-study: Violations of orthodox theoretical predictions and biases observed in the lab form a lower bound for violations and biases observed in the population at large. Second, as in the panel, moving from high hypothetical payoffs to low hypothetical payoffs reduces the extent of EUT violation significantly (p < 0.001, D = 4.881). Third, moving from low hypothetical payoffs to low real payoffs increases the extent of EUT violation slightly but insignificantly (p < 0.226, D = −0.7525). The similarities between the observations in the panel and in the lab are evident.
Table 4  Summary of experimental results in the lab

             Absolute frequency of choices       Relative frequency of choices (%)
Treatment    AB    A∗B∗   AB∗   A∗B   Σ          AB    A∗B∗   AB∗    A∗B   Violations
HighHyp      4     41     20    5     70         5.7   58.6   28.6   7.1   35.7
LowHyp       0     75     4     0     79         0     94.9   5.1    0     5.1
LowReal      1     67     5     1     74         1.4   90.5   6.8    1.4   8.2
Σ            5     183    29    6     223        2.2   82.1   13.0   2.7   15.7

Notes: HighHyp stands for high hypothetical payoffs, LowHyp for low hypothetical payoffs, and LowReal for low real payoffs
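
The violation shares and the two D statistics quoted for the lab treatments can be recomputed from the counts in Table 4. The sketch below is illustrative only. This section does not restate how D is computed, so the code assumes a difference-in-proportions statistic with an unpooled variance and n − 1 denominators; that form is an assumption, chosen because it reproduces the reported values. The one-sided normal p-value and all function names are likewise assumptions of this sketch.

```python
import math

# Violation counts (AB* plus A*B) and sample sizes, taken from Table 4.
LAB = {
    "HighHyp": (20 + 5, 70),
    "LowHyp": (4 + 0, 79),
    "LowReal": (5 + 1, 74),
}


def violation_share(violations, n):
    """Share of subjects whose pair of choices violates EUT."""
    return violations / n


def d_statistic(v1, n1, v2, n2):
    """Assumed form of D: difference in violation shares divided by an
    unpooled standard error that uses n - 1 denominators."""
    p1, p2 = v1 / n1, v2 / n2
    se = math.sqrt(p1 * (1 - p1) / (n1 - 1) + p2 * (1 - p2) / (n2 - 1))
    return (p1 - p2) / se


def one_sided_p(d):
    """One-sided p-value under a standard normal approximation (assumption)."""
    return 0.5 * math.erfc(abs(d) / math.sqrt(2))


d_high_low = d_statistic(*LAB["HighHyp"], *LAB["LowHyp"])
d_low_real = d_statistic(*LAB["LowHyp"], *LAB["LowReal"])

if __name__ == "__main__":
    for name, (v, n) in LAB.items():
        print(f"{name}: {100 * violation_share(v, n):.1f}% violations")
    print(f"HighHyp vs LowHyp:  D = {d_high_low:.3f}, p = {one_sided_p(d_high_low):.4f}")
    print(f"LowHyp vs LowReal: D = {d_low_real:.4f}, p = {one_sided_p(d_low_real):.4f}")
```

Under these assumptions the two statistics come out at roughly 4.88 and −0.75, in line with the values reported in the text. A pooled-variance z-test would give somewhat different numbers, so the choice of standard error matters when comparing against the reported figures.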

Figure 1 shows the shares of choices violating EUT in the two subsamples. It appears that the graph indicating the share of EUT violation in the panel can quite accurately be obtained by shifting the graph indicating the share of EUT violation in the lab upwards by about 15 percentage points.19 This means that although the share of EUT violations is consistently higher in the panel than in the lab, the comparative statics results of moving from one treatment to another could have been reliably predicted by the lab experiments.
Fig. 1

The share of choices violating EUT in the panel and the lab. Note: HighHyp stands for high hypothetical payoffs, LowHyp for low hypothetical payoffs, and LowReal for low real payoffs

4 Conclusions

Using a representative sample of the Dutch population we revisit the Allais paradox. Our main results are threefold. First, as in previous lab samples, the violations of EUT are systematic in the population at large and much less frequent when stakes are low. Second, there is considerable heterogeneity in the population and violations are particularly prevalent among the lowly educated, those poor in income and asset holdings, and the unemployed. Third, comparing the panel results with a laboratory experiment we find that the relative treatment differences are virtually identical in the panel and the lab but violation rates in all lab treatments are about 15 percentage points lower than in the corresponding non-lab treatments.

Our findings appear to imply two general messages. First, laboratory experiments with convenience samples of students might be more useful for studying relative effects than absolute levels (see also Levitt and List (2007) who make a similar point in the context of social preferences). When it comes to the absolute measurement of behavior, it appears that lab results will draw too optimistic a picture. The population at large, it turns out, is less consistent with EUT than student samples are. Second, our results suggest that the predictive power of EUT in a general population is correlated with socioeconomic characteristics. In particular, parts of the population that are more likely to experience economic hardship are less consistent.

Of course, there exists a large literature on non-expected utility theories such as Kahneman and Tversky’s (1979) prospect theory or Machina’s (1982) fanning-out theory (both of which can explain the Allais paradox) or Viscusi’s (1989) prospective reference theory which predicts the paradox. Earlier laboratory experiments (see Camerer (1995) or Starmer (2000) for surveys) have documented the Allais paradox in student samples. Our paper highlights that, if anything, these studies underestimate the true prevalence of the paradox in general populations and indicates how violations are correlated with observable characteristics.

Footnotes
1

Several other studies have also used the CentER panel as a subject pool. Let us briefly mention some of these studies. Hey (2002) and Carbone (2005) analyze more complicated and sequential individual decision making tasks and do not find any background variable systematically influencing behavior. Bellemare and Kröger (2007) study a trust game and find “that heterogeneity in behavior is characterized by several asymmetries—men, the young and elderly, and low educated individuals invest relatively less, but reward significantly more investments.” (p. 183) von Gaudecker et al. (2011a) elicit risk preferences and report that older people, women, the relatively uneducated, and those with lower income are more risk averse. For another study on individual risk attitudes using a large and representative German sample, see Dohmen et al. (2011).

 
2

For early studies of the Allais paradox see, e.g., MacCrimmon (1968), Slovic and Tversky (1974), Allais and Hagen (1979) and Kahneman and Tversky (1979). For the effect of downscaled payoffs see Conlisk (1989), Starmer and Sugden (1991), Harrison (1994), Burke et al. (1996), Fan (2002), and van de Kuilen and Wakker (2006).

 
3

Almost all of the experiments on the Allais paradox conducted so far have used students as their subjects. There are two notable exceptions. List and Haigh (2005) test the Allais paradox both with students and professional traders from the Chicago Board of Trade. They report that both students and professional traders show Allais paradox behavior, but find that traders do so to a smaller extent. Fatas et al. (2007) use students and politicians and report similar results with students being more prone to Allais paradox behavior.

 
4

To see this, note that adding 0.89u(€0) − 0.89u(€1M) to both sides of the inequality u(A) = u(€1M) > 0.01u(€0) + 0.89u(€1M) + 0.1u(€5M) = u(A∗) yields u(B) = 0.89u(€0) + 0.11u(€1M) > 0.9u(€0) + 0.1u(€5M) = u(B∗).

 
5

See, e.g., MacCrimmon (1968), Slovic and Tversky (1974), Allais and Hagen (1979), Kahneman and Tversky (1979), Conlisk (1989), Starmer and Sugden (1991), Harrison (1994), Burke et al. (1996) and Fan (2002).

 
6

For more information about the CentERpanel and the way it is administered see http://www.uvt.nl/centerdata/en/whatwedo/thecenterpanel/.

 
7

For more details see Appendix A which contains a translation of the screens used in the treatments with low payoffs. Note that the experiment was administered in Dutch.

 
8

Note the following about payments in treatment LowReal. CentERdata reimburses the telephone costs for filling in questionnaires by exchanging “CentERpoints” (1 CentERpoint = 0.01 Euro) to panel members’ private bank accounts four times a year. Although lotteries were described in Euro amounts, subjects in the treatments with real monetary earnings were informed that: “In this experiment you can earn real money that will be paid in the form of CentERpoints.”

 
9

See Eckel and Grossman (2000), Bellemare and Kröger (2007), von Gaudecker et al. (2011b) and Harrison et al. (2009) for more evidence on selection issues.

 
10

Conlisk (1989) points out that this effect is in line with (a) Machina’s (1982) fanning out model that predicts Allais behavior for large payoffs and (b) the observation that EUT converges to expected payoff maximization for small payoffs. Notice, however, that this consistency argument is not an explanation—for it leaves open why fanning occurs and is more dramatic in its consequences with high payoffs. Non-familiarity with high payoffs is such an explanation and may, in fact, be adequately captured by fanning out of indifference curves.

 
11

Note that for the multinomial categories in the leftmost column in Table 1, the χ2-tests check for the joint hypothesis that the violation rates are identical across all categories.

 
12

In light of recent findings about sharply declining numeracy skills in the (British) population above 55 (Banks 2006) this is perhaps slightly surprising.

 
13

Wald tests indicate, however, that the effects of the education levels below university degree listed in Table 3 are not statistically different.

 
14

A Wald test indicates that the effect of these two occupations is not statistically different.

 
15

Wald tests indicate that the effects of the three income variables listed in Table 3 are not statistically different. Furthermore, controlling for household size leaves the regression results reported in Table 3 virtually unchanged.

 
16

To look at the effect of holding assets or a savings account more closely, we defined the variable “only assets” which equals 1 if a subject holds assets but has no savings account (otherwise it equals 0), the variable “only savings account” which equals 1 if a subject has a savings account but holds no assets (otherwise it equals 0), and the variable “assets & savings account” which equals 1 if a subject holds assets and has a savings account (otherwise it equals 0). Hence, the reference group consists of those subjects who neither hold assets nor have a savings account. Replacing the variables “assets” and “savings account” in regression (1) in Table 3 by the new variables “only assets,” “only savings account,” and “assets & savings account,” leaves the other variables of regression (1) almost unchanged (including significance levels) and shows that while the coefficients of the variables “only assets” and “assets & savings account” are negative (−0.073 and −0.032) but insignificant, the coefficient of the variable “only savings account” is positive (0.055) and significant at the 5% level. So it is not only the financially savvy who hold assets who do comparatively well but also people without any savings, perhaps because, having no financial cushion, they cannot afford to make many mistakes.

 
17

von Gaudecker et al. (2011b) offer an analysis of the individual effects of implementation mode and of subject pool selection in a risk preference elicitation study and find that differences in behavior are due to selection and not implementation mode.

 
18

Again, we observe that the fraction of EUT-violating AB∗ answers is significantly higher than the fraction of EUT-violating A∗B answers in all lab treatments (p < 0.001, Conlisk’s (1989) Z-statistic).

 
19

The difference in the extent of EUT violation between the panel and the lab is significant for all three treatments (HighHyp: p = 0.014, D = −2.1732; LowHyp: p < 0.001, D = −5.2220; LowReal: p < 0.001, D = −4.6935).
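
The algebra in footnote 4 can also be checked numerically. The following sketch encodes the four Allais lotteries as implied by that footnote and confirms that, for arbitrary utility values, the expected-utility gap between A and A∗ equals the gap between B and B∗, so an expected-utility maximizer who prefers A must also prefer B. The dictionary encoding, the function names and the randomly drawn utilities are illustrative assumptions, not part of the original study.

```python
import math
import random

# Lotteries as {payoff in euros: probability}, read off footnote 4.
A = {1_000_000: 1.0}
A_STAR = {0: 0.01, 1_000_000: 0.89, 5_000_000: 0.10}
B = {0: 0.89, 1_000_000: 0.11}
B_STAR = {0: 0.90, 5_000_000: 0.10}


def expected_utility(lottery, u):
    """Expected utility of a lottery given a payoff -> utility mapping."""
    return sum(prob * u[payoff] for payoff, prob in lottery.items())


def eu_gaps(u):
    """Return (EU(A) - EU(A*), EU(B) - EU(B*)) for utility mapping u."""
    gap_a = expected_utility(A, u) - expected_utility(A_STAR, u)
    gap_b = expected_utility(B, u) - expected_utility(B_STAR, u)
    return gap_a, gap_b


def violates_eut(first_choice, second_choice):
    """True for the mixed patterns AB* and A*B, which no utility function can rationalize."""
    return first_choice.endswith("*") != second_choice.endswith("*")


def gaps_agree(trials=1000, seed=0):
    """Check the identity for randomly drawn (hypothetical) utility values."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = {0: rng.random(), 1_000_000: rng.random(), 5_000_000: rng.random()}
        gap_a, gap_b = eu_gaps(u)
        if not math.isclose(gap_a, gap_b, abs_tol=1e-9):
            return False
    return True


if __name__ == "__main__":
    print("gaps agree for all draws:", gaps_agree())
    print("AB* is a violation:", violates_eut("A", "B*"))
```

Because the two gaps are identical term by term, their signs always agree, which is why only the patterns AB and A∗B∗ are consistent with expected utility theory.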

 

Acknowledgements

We thank Marcel Das and Marika Puumala of CentERdata (Tilburg University) for their most efficient support in collecting the data. Furthermore, we thank W. Kip Viscusi, anonymous referees, Johannes Binswanger, Oliver Kirchkamp, Tobias Klein, Sabine Kröger, Gijs van de Kuilen, Imran Rasul, Jan van Ours, Stefan Trautmann, Anthony Ziegelmeyer and participants of the 3rd International Meeting on Experimental and Behavioral Economics and the IMPRS Uncertainty Summer School as well as seminar participants at Tilburg University, University of Frankfurt (Main), Humboldt University Berlin, and the University of Amsterdam for helpful comments. We gratefully acknowledge financial help from the UK’s Economic and Social Research Council via ELSE and a grant on ‘Behavioral Mechanism Design’. The second author acknowledges financial help from the Netherlands Organisation for Scientific Research (NWO) through a VIDI grant.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Copyright information

© The Author(s) 2012

Authors and Affiliations

  1. Department of Economics, University College London, London, UK
  2. WZB, Berlin, Germany
  3. Department of Economics and VCEE, University of Vienna, Vienna, Austria
  4. CentER, TILEC and Tilburg University, Tilburg, Netherlands