Prediction task performance
The proportions of participants’ choices of the high-probability option across each block of 100 trials and for each cognitive load condition (see Fig. 1) were subjected to a 3 (condition) × 5 (block) mixed-model ANOVA. The main effect of learning across trial blocks was significant, F(3.105, 493.617) = 87.886, p < .001, ηp² = .356, BFInclusion > 100,000, and is illustrated by the upward trajectory of all group lines in Fig. 1. Neither the main effect of condition, F(2, 159) = 0.692, p = .502, ηp² = .009, BFInclusion = 0.153, nor the condition by block interaction, F(6.209, 493.617) = 0.518, p = .801, ηp² = .006, BFInclusion = 0.002, was significant; in fact, the Bayesian analysis provided evidence against both effects. The critical follow-up t test indicated that the proportion of choices of the high-probability option in the final trial block did not differ between the n-back dual-task and single-task control groups, t(109) = −0.562, p = .576, d = −0.107, 95% CI for d = [−0.479, 0.266], BF10 = 0.232. In fact, all evaluation criteria we applied for the critical test point to a failure to replicate the original result. Specifically, using frequentist estimation, we found that the focal effect was not statistically significant (and not in the same direction as in the original study) and that the 95% CI of the effect size estimate we obtained in the replication excluded both the original effect size (estimated at d = 1.09) and the small effect size that would give only 33% power to the original study (d33% = .72; see Simonsohn, 2015), suggesting that the original study could not have meaningfully examined an effect that small (see Footnote 5). Using Bayesian inference, we found that the data are 4.31 times more likely to have occurred under the null than the alternative hypothesis.
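The arithmetic behind these replication criteria can be checked directly. The following is a minimal sketch that uses only the values reported above; the variable names and the helper `ci_excludes` are hypothetical and not part of the reported analysis. It inverts BF10 to express the evidence in favor of the null hypothesis and checks whether each benchmark effect size lies outside the replication's 95% CI.

```python
# Values reported in the text for the critical comparison
# (n-back dual-task vs. single-task control, final trial block).
bf10 = 0.232                     # Bayes factor for the alternative over the null
ci_low, ci_high = -0.479, 0.266  # 95% CI for Cohen's d in the replication
d_original = 1.09                # estimated effect size of the original study
d_33 = 0.72                      # effect size giving the original study 33% power


def ci_excludes(value, low, high):
    """Return True if `value` lies outside the closed interval [low, high]."""
    return value < low or value > high


# BF01 is the reciprocal of BF10: how much more likely the data are
# under the null than under the alternative hypothesis.
bf01 = 1 / bf10

print(f"BF01 = {bf01:.2f}")
print("CI excludes original d:", ci_excludes(d_original, ci_low, ci_high))
print("CI excludes d33%:", ci_excludes(d_33, ci_low, ci_high))
```

With the reported values, the reciprocal of BF10 = 0.232 rounds to 4.31, and both benchmark effect sizes fall above the upper CI limit of 0.266.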
Comparing the proportion of choices of the high-probability option in the final trial block between the n-back dual-task and polygon dual-task groups, we again found no significant difference, t(103) = −0.604, p = .547, d = −0.118, 95% CI for d = [−0.501, 0.265], BF10 = 0.243. Finally, for the penultimate trial block, the effect sizes were smaller for both the n-back dual-task versus single-task control comparison (d = −0.037; 95% CI [−0.409, 0.335]) and the n-back dual-task versus polygon dual-task comparison (d = −0.105; 95% CI [−0.488, 0.278]).
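The reported effect sizes can be cross-checked against the t statistics. The sketch below rests on assumptions that are not stated in this section: that the t tests are pooled-variance independent-samples tests (df = n1 + n2 − 2), that the ANOVA error degrees of freedom equal N − 3, and that the group sizes can therefore be inferred from the reported degrees of freedom. The function name `cohens_d_from_t` is hypothetical.

```python
import math

# Assumption: group sizes are inferred from the reported degrees of freedom,
# not taken from the text.
total_n = 159 + 3             # from F(2, 159): N - 3 error df
nback_plus_single = 109 + 2   # from t(109): n1 + n2 - 2
nback_plus_polygon = 103 + 2  # from t(103): n1 + n2 - 2

n_polygon = total_n - nback_plus_single
n_single = total_n - nback_plus_polygon
n_nback = nback_plus_single - n_single


def cohens_d_from_t(t, n1, n2):
    """Convert an independent-samples (pooled-variance) t statistic to Cohen's d."""
    return t * math.sqrt(1 / n1 + 1 / n2)


d_vs_single = cohens_d_from_t(-0.562, n_nback, n_single)
d_vs_polygon = cohens_d_from_t(-0.604, n_nback, n_polygon)

print("n-back, single-task, polygon:", n_nback, n_single, n_polygon)
print("d (n-back vs. single-task):", round(d_vs_single, 3))
print("d (n-back vs. polygon):", round(d_vs_polygon, 3))
```

Under these assumptions, the inferred group sizes are 54 (n-back), 57 (single-task), and 51 (polygon), and the converted values agree with the reported d = −0.107 and d = −0.118 to three decimals.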
Figure 2 displays the full range of individual participants’ choice proportions for all trial blocks and conditions. To assess strategy selection in individual participants toward the end of learning, Wolford et al. (2004) defined maximizing as choosing the high-probability option on no less than 95% of trials in each of the last two blocks. In our study, the choice proportions of 18 participants in the polygon dual-task condition, 15 participants in the n-back dual-task condition, and 10 participants in the single-task control condition met this definition of maximizing; the association between probability maximizing and choice condition was not significant, χ²(2) = 4.413, p = .110, BF10 = 0.344. Defining maximizing as choosing the high-probability option on no less than 95% of trials in the final block (e.g., Newell & Rakow, 2007), we found that 22 participants in the polygon dual-task condition, 17 participants in the n-back dual-task condition, and 18 participants in the single-task control condition were classified as maximizers, χ²(2) = 2.064, p = .356, BF10 = 0.121.
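The two chi-squared tests can be illustrated with a short calculation. This is a sketch under a stated assumption: the group sizes (51 polygon, 54 n-back, 57 single-task) do not appear in this section and are inferred here from the degrees of freedom of the tests reported above. The function names are hypothetical, and the p value uses the closed form that holds only for 2 degrees of freedom.

```python
import math

# Assumption: group sizes inferred from reported degrees of freedom,
# not stated explicitly in the text.
group_sizes = {"polygon": 51, "n-back": 54, "single-task": 57}


def chi_square_by_condition(maximizers, sizes):
    """Pearson chi-square for a condition x (maximizer, non-maximizer) table."""
    total = sum(sizes.values())
    total_max = sum(maximizers.values())
    chi2 = 0.0
    for cond, n in sizes.items():
        cells = (
            (maximizers[cond], total_max),              # maximizers
            (n - maximizers[cond], total - total_max),  # non-maximizers
        )
        for observed, column_total in cells:
            expected = n * column_total / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df2(chi2):
    """Upper-tail p value; exp(-x / 2) is exact for 2 degrees of freedom only."""
    return math.exp(-chi2 / 2)


# Maximizer counts reported in the text for the two definitions.
last_two_blocks = {"polygon": 18, "n-back": 15, "single-task": 10}
final_block = {"polygon": 22, "n-back": 17, "single-task": 18}

for label, counts in (("last two blocks", last_two_blocks),
                      ("final block", final_block)):
    chi2 = chi_square_by_condition(counts, group_sizes)
    print(f"{label}: chi2(2) = {chi2:.3f}, p = {p_value_df2(chi2):.3f}")
```

With the inferred group sizes, the calculation gives χ²(2) = 4.413, p = .110 for the two-block definition and χ²(2) = 2.064, p = .356 for the final-block definition, matching the reported frequentist results. The Bayes factors are not reproduced here.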
Secondary task performance
Table 1 shows participants’ performance on the secondary tasks for each block and each of the two dual-task conditions. Specifically, the table summarizes participants’ mean proportion of correctly recalled digits in each block of the verbal working-memory (n-back) task and the mean proportion of correct judgments about whether the polygon shown was the same as or different from the one shown in the previous trial for each block of the visual-spatial (polygon) task. Across all trials, participants were significantly more accurate on the n-back task (M = .849) than on the polygon task (M = .651), t(103) = 7.823, p < .001, d = 1.528, 95% CI for d = [1.089, 1.960], BF10 > 100,000. Accuracy on the secondary tasks was not significantly correlated with the proportion of choices of the high-probability option averaged across the last two blocks, r(103) = .054, p = .582, BF10 = 0.142.
The analyses reported above directly reproduce the experimental protocol of the original study conducted by Wolford et al. (2004) and utilize data from all participants we tested. These analyses failed to replicate an effect of cognitive load on strategy selection in repeated binary choice. We went to considerable lengths to conduct a well-powered, close replication of the original study; nevertheless, slight methodological differences between the original study and our replication may have contributed to our failure to replicate the original result. In this section, we therefore report supplementary analyses aimed at establishing the robustness of our conclusions. These analyses are purely exploratory, and we had no theory-based, a priori reason to anticipate systematic effects.
Figure 2 shows that some participants strictly maximized probability from the very first block of 100 trials and continued to do so throughout the entire experiment; that is, they never once selected the low-probability event. Because we would have expected at least some period of probability learning before the adoption of a maximizing strategy, this finding might indicate that some participants had prior knowledge about the purpose and design of the task. Moreover, our final sample size slightly overshot the planned sample size, resulting in a somewhat unbalanced design. To address these issues, we reran all data analyses under three restrictions: limiting the sample to the planned 150 participants (n = 50 per condition, 75 per location); excluding participants who maximized in the first block (i.e., chose the high-probability option on at least 95% of trials; resulting N = 148); or applying both criteria (resulting N = 136). None of these restrictions changed the conclusions reported above. That is, the ordinal patterns of choices across conditions remained largely the same (or collapsed virtually on top of each other), significant p values remained significant and nonsignificant p values remained nonsignificant, and, where BFs indicated positive evidence for or against the presence of an effect, this continued to be the case. The one exception was the chi-squared test assessing the association between probability maximizing, as defined in Wolford et al. (2004), and choice condition, for which previously inconclusive Bayesian evidence in favor of the null hypothesis became more substantial when we restricted the analysis to the 150 participants originally planned.
Finally, we included experiment location (Australia or Canada) as an additional factor in the mixed-model ANOVA on participants’ choices of the high-probability option, allowing us to explore potential differences between nationalities. This analysis revealed a significant main effect of location, F(1, 156) = 6.209, p = .014, ηp² = .038, BFInclusion = 3.994, and a location by condition interaction, F(2, 156) = 6.178, p = .003, ηp² = .073, BFInclusion = 4.716. Neither the main effect of condition nor any of the other interactions including the condition factor reached statistical significance (all ps ≥ .296; all BFInclusion ≤ 1.079). Moreover, analyzing the data obtained from each location separately (see Footnote 6), we found that neither of the two data sets provided evidence for the effect we attempted to replicate. When we focused on Australian participants only, neither the main effect of condition, F(2, 72) = 1.148, p = .323, ηp² = .031, BFInclusion = 0.326, nor the condition by block interaction, F(5.796, 208.670) = 0.707, p = .639, ηp² = .019, BFInclusion = 0.027, was significant, and choice proportions in the final trial block did not differ between the n-back dual-task and single-task control groups, t(48) = 1.748, p = .087, d = 0.494, 95% CI for d = [−0.071, 1.055], BF10 = 0.973, although the ordinal pattern of choice proportions approximated that of the original study (n-back dual-task > single-task control ≈ polygon dual-task).
By contrast, for the Canadian sample, the main effect of condition in the mixed-model ANOVA on participants’ choices of the high-probability option was significant, F(2, 84) = 7.941, p < .001, ηp² = .159, BFInclusion = 30.854, but participants in the n-back dual-task condition maximized significantly less during the final trial block than participants in the single-task control condition, t(59) = −2.615, p = .011, d = −0.670, 95% CI for d = [−1.185, −0.151], BF10 = 4.266, or participants in the polygon dual-task condition, t(53) = −2.732, p = .009, d = −0.738, 95% CI for d = [−1.282, −0.187], BF10 = 5.408. The condition by block interaction was not significant, F(6.130, 257.463) = 1.078, p = .376, ηp² = .025, BFInclusion = 0.165.