Asymmetric coupling of action and outcome valence in active and observational feedback learning

Peterburs, Jutta; Frieling, Alena; Bellebaum, Christian

doi:10.1007/s00426-020-01340-1

Original Article
Open access
Published: 22 April 2020

Volume 85, pages 1553–1566, (2021)

Psychological Research

Jutta Peterburs¹,
Alena Frieling¹ &
Christian Bellebaum¹

Abstract

Learning to execute a response to obtain a reward or to inhibit a response to avoid punishment is much easier than learning the reverse, which has been referred to as “Pavlovian” biases. Despite a growing body of research into similarities and differences between active and observational learning, it is as yet unclear if Pavlovian learning biases are specific for active task performance, i.e., learning from feedback provided for one’s own actions, or if they persist also when learning by observing another person’s actions and subsequent outcomes. The present study, therefore, investigated the influence of action and outcome valence in active and observational feedback learning. Healthy adult volunteers completed a go/nogo task that decoupled outcome valence (win/loss) and action (execution/inhibition) either actively or by observing a virtual co-player’s responses and subsequent feedback. Moreover, in a more naturalistic follow-up experiment, pairs of subjects were tested with the same task, with one subject as active learner and the other as observational learner. The results revealed Pavlovian learning biases both in active and in observational learning, with learning of go responses facilitated in the context of reward obtainment, and learning of nogo responses facilitated in the context of loss avoidance. Although the neural correlates of active and observational feedback learning have been shown to differ to some extent, these findings suggest similar mechanisms to underlie both types of learning with respect to the influence of Pavlovian biases. Moreover, performance levels and result patterns were similar in those observational learners who had observed a virtual co-player and those who had completed the task together with an active learner, suggesting that inclusion of a virtual co-player in a computerized task provides an effective manipulation of agency.

Introduction

The ability to adjust behavior based on action consequences is critical in dynamic or novel environments. For instance, the familiar phrase “once bitten, twice shy” refers to the reluctance to repeat an action that has previously led to an unpleasant experience. The Law of Effect put forward by Edward Thorndike states that responses which produce a satisfying or pleasing effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect or fail to elicit pleasure become less likely to occur again in that situation (Thorndike, 1927). However, not all contingencies between actions (or decisions) and their outcomes are learned equally well. Guitart-Masip et al. (2011) devised a task that decoupled action and outcome valence, the orthogonalized go/nogo task. Adding to previous evidence for a particular coupling between reward and go responses and between punishment and nogo responses (Gray & MacNaughton, 2003), they found that in this task, learning to execute a response to obtain a reward (go to win) or to inhibit a response to avoid punishment (nogo to avoid losing) was easier than learning the reverse (Guitart-Masip, Huys et al., 2012a, b), which has been referred to as “Pavlovian” biases. Interestingly, learning success depended on concerted recruitment of bilateral inferior frontal cortex in addition to midbrain regions belonging to the “reward system”, possibly indicating that brain regions implicated in response inhibition are needed to overcome Pavlovian control (Guitart-Masip, Huys et al., 2012a, b). In line with this, midfrontal theta power as an electrophysiological index of prefrontal control has been directly linked to the ability to overcome Pavlovian biases (Cavanagh, Eisenberg, Guitart-Masip, Huys, & Frank, 2013).

More recent research has highlighted the importance of the specific context in which learning occurs. Millner et al. (2018) showed that an aversive context can facilitate action depending on whether the aversive stimulus is present or impending. In their study, Pavlovian processes interfered with feedback-based learning by promoting action to escape when an ongoing aversive auditory stimulus was present, and by promoting behavioral inhibition (i.e., withholding of responses) when the same aversive stimulus could be avoided.

Findings on the role of the neurotransmitter dopamine for learning and representing action–outcome contingencies further corroborate the notion that the factors action and outcome valence interact. Results of a functional imaging study in subjects highly trained in the orthogonalized go/nogo task suggested that levodopa enhanced striatal and substantia nigra/ventral tegmental representations of actions associated with obtaining a reward, while neither representations of actions associated with avoiding punishment nor neural responses to reward as such were enhanced, thus underlining the role of dopamine for appetitively motivated behavior (Guitart-Masip, Chowdhury et al., 2012a, b).

For learning to occur it is not necessary to perform an action and bear the consequences oneself. In everyday life, we often observe other individuals’ actions and ensuing consequences. For instance, we may watch someone operate a ticket machine at a train station and obtain a ticket before repeating the observed actions ourselves to buy our own tickets. On the other hand, if we observed that a ticket was not obtained or that the machine returned too little change due to a malfunction or inappropriate use, we would not be likely to act in the same way. This simple example illustrates that Thorndike’s Law of Effect applies to both active and observational learning. However, previous findings are somewhat inconsistent with regard to whether the two learning types are similarly effective, and it is as yet unclear if the above described Pavlovian learning biases also apply to observational learning. Bellebaum et al. (e.g., 2010, 2012) reported that learning from positive or negative feedback in a probabilistic learning task was similarly effective in an active and observational context. In contrast, Nicolle, Symmonds, and Dolan (2011) found observational learning to be associated with impaired accuracy when choosing between two low-value options, which was related to (subjective) over-estimation of the likelihood of winning in case of the lowest-value stimulus, i.e., an optimistic bias. Aside from learning, decision making has also been reported to differ between active subjects and observers. While both groups made risky choices beyond pure rationality, actors were riskier than observers (Fernandez-Duque & Wifall, 2007).

It has been proposed that active and observational learning may differ in attentional allocation during learning (Cohn et al., 1994), or in the nature of the involved knowledge representation, with observational learning possibly requiring more explicit, declarative representations, and active learning relying more on procedural and non-declarative representations (Kelly et al., 2003). Related to the latter, another key difference may lie within a reduced necessity to integrate own actions and outcome-related information in the observational context. This notion is supported by electrophysiological studies showing that variations of agency (here: own vs. observed choices) modulate aspects of action outcome processing that are related to the integration of action and outcome information. For example, the magnitude of the feedback-related negativity (FRN), an event-related potential component that has been related to outcome processing and coding of reward prediction errors (Gehring & Willoughby, 2002; Holroyd & Coles, 2002; Miltner et al., 1997; Nieuwenhuis et al., 2004), has been shown to be reduced in observational compared to active learning (Bellebaum et al., 2010; Bellebaum & Colosio, 2014; Fukushima & Hiraki, 2009; Koban et al., 2012; Kobza et al., 2011; Yu & Zhou, 2006).

In general, previous research, therefore, points to at least partially distinct mechanisms underlying active and observational learning from feedback. This notion is corroborated by clinical studies in patients with Parkinson’s Disease (PD) in whom degeneration of dopaminergic neurons in the substantia nigra results in reduced dopaminergic input to the striatum (Kish et al., 1988). In the OFF medication stage, these patients exhibit a bias towards learning from negative feedback, likely due to facilitated disinhibition of striatal “nogo” neurons in response to negative feedback which then hampers action selection in the frontal cortex (Frank et al., 2004; Frank, 2005). Interestingly, this bias was not found for learning by observation, suggesting that dopaminergic input to the striatum may play a less prominent role in observational than in active learning (Kobza et al., 2012). It has been proposed that the integration of information about (own) actions and their outcomes takes place in the dorsal striatum, where prediction errors have been shown to be more strongly represented in active than observational learning (Bellebaum et al., 2012) and in instrumental than in classical conditioning (O'Doherty et al., 2004; Valentin & O'Doherty, 2009).

If active and observational learning indeed differ with respect to striatal involvement and the coding of action–outcome contingencies, it is conceivable that the asymmetric coupling of action and outcome valence is attenuated in observational learning. In line with this, recent findings suggest that ventral striatal involvement in processing monetary feedback gradually decreases from own actions, a friend’s actions, to a stranger’s actions (Morelli et al., 2018). On the other hand, the dorsal striatum has been suggested to play a key role in linking instrumental actions and outcomes during both active and observational learning (Cooper et al., 2012), which might entail a similar Pavlovian bias in active and observational learning.

The present study aimed to clarify, in a series of behavioral experiments with an orthogonalized go/nogo task, whether action and outcome valence also interact in observational outcome-based learning, and, if so, whether the Pavlovian bias is similarly pronounced in active and observational learning. In Experiment 1, one group of healthy adult subjects completed the task as active learners, while participants in a second group were observational learners yoked to the active subjects. Importantly, the task was fully computerized so that for observers, the active subject’s responses were presented on the computer screen and marked by a picture of a hand. Experiment 2 was conducted to see to what extent the observers’ performance depended on the response pattern of the observed subject and thus possibly reflected mere imitation of the responses they had watched. To this end, healthy adults observed a virtual active learner’s chance performance in the orthogonalized go/nogo task. Experiment 3, which again included two groups of subjects (active and observational learners), used a more naturalistic setting: pairs of subjects completed the task simultaneously, with one subject as active and the other as observational learner. In general, it was hypothesized that a Pavlovian bias would also occur in observational learning, with enhanced learning of go (vs. nogo) to win and nogo (vs. go) to avoid associations. Consistent with reduced striatal involvement in observational learning, however, the coupling of action and outcome valence was expected to be less strong in observational as compared to active learning.

Experiment 1

Subjects

Forty adult volunteers (33 females, 7 males) were recruited for participation at Heinrich-Heine-University Düsseldorf, Germany, by public advertisement and/or on social media. All had normal or corrected-to-normal vision. Mean age was 22.7 years (SD = 3.8; age range 18–37 years). None of the subjects had any history of neurological or psychiatric illnesses or was currently treated with neurotropic medication. All subjects were naïve to the study’s intent. IQ estimates were obtained with a multiple choice vocabulary test (Mehrfachwahl-Wortschatz-Intelligenztest B, MWT-B; Lehrl et al., 1995), a German test to measure crystallized intelligence in which subjects are presented with 37 items in each of which one real German word has to be correctly identified among 4 non-words. Points are awarded for each correct answer, and total test scores are translated into IQ estimates by means of norm tables. IQ estimates obtained with the MWT-B have been shown to correlate reasonably well with global IQ scores (Lehrl et al., 1995). Mean IQ was 113.57 (SD = 10.25) in the present sample. Written informed consent was obtained from all participants prior to participation. Subjects received course credit for participation. The study conforms to the Declaration of Helsinki and received ethical clearance by the Ethics Board of the Faculty of Mathematics and Natural Sciences at Heinrich-Heine-University Düsseldorf, Germany.

Experimental task

The experimental task was a variant of a go/nogo task specifically designed to decouple outcome valence and action (Guitart-Masip et al., 2011). In this game-like task, participants can choose between different behavioral options in order to receive or avoid losing points. Four combinations of action and outcome valence were balanced throughout the task: go to win points, go to avoid losing points, nogo to win points, and nogo to avoid losing points. Four abstract fractal images (Mathôt et al., 2015; obtained from https://github.com/smathot/materials_for_P0010.5) were used as imperative stimuli and randomly assigned to these combinations at the beginning of each test session. Separate subsamples of N = 20 subjects completed the task as active learners or observational learners, with each observational learner yoked to one actively learning subject. In order to allow for a comparable assessment of learning performance in both active and observational learners, the task comprised not only four (active or observational) learning blocks with feedback, but also four test blocks without feedback which required active responding by both active and observer participants. The types of blocks alternated, beginning with a learning block. Individual learning performance for both groups of participants was assessed based on test block performance (see below).

Figure 1 illustrates the time course and sequence of stimulus presentation in trials in the learning block. In the task version for active learners (Fig. 1a), each trial started with a fractal image which was presented for 1000 ms, followed by a fixation cross for 250–2000 ms. Afterwards, an open circle was presented on the left or right side of the screen for 1500 ms. Subjects were instructed to decide between responding and not responding, and in case of responding to press the response button (left or right Ctrl key, labeled STRG on German keyboards, on a standard USB keyboard) corresponding to the side the circle had been presented on (e.g., left button for circle on left side). Responses were required to occur within 1000 ms of stimulus onset. If participants chose not to respond, they had to let the response period pass. If they accidentally pressed the wrong button on the opposite side of the circle, they were explicitly informed about this and the trial was aborted. Following presentation of the circle cue, a fixation cross was displayed for 750–1000 ms, before symbolic feedback about the choice (response/no response) was provided. An upward pointing arrow indicated that 10 points had been gained (win), a downward pointing arrow indicated that 10 points had been lost (loss), and a horizontal bar indicated that no points had been gained or lost (draw). Throughout the task participants could learn which fractal stimulus was associated with which kind of outcome (win/draw/loss) for which kind of choice (go or nogo). For two stimuli, the “good” outcome was to avoid losing points (draw) and the alternative was a loss of points. For two others, a win was the favorable outcome and a draw the non-favorable outcome. For one stimulus per outcome combination, the good outcome could be obtained with a go or a nogo choice, respectively. Correct choices led to the more favorable outcome in 80% of the trials, while the non-favorable outcome was received in the other 20% of the trials.
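The feedback contingencies described above can be illustrated with a minimal Python sketch. It is not the authors' Presentation script: the condition labels and the function name `feedback` are hypothetical, and the 20% chance of a favorable outcome after an incorrect choice is an assumption, since the text only specifies the probabilities for correct choices.

```python
import random

# Hypothetical condition labels; "required" is the choice that counts as correct.
CONDITIONS = {
    "go_to_win": {"required": "go", "valence": "win"},
    "go_to_avoid_losing": {"required": "go", "valence": "loss"},
    "nogo_to_win": {"required": "nogo", "valence": "win"},
    "nogo_to_avoid_losing": {"required": "nogo", "valence": "loss"},
}

# From the text: correct choices lead to the favorable outcome in 80% of trials.
P_FAVORABLE_IF_CORRECT = 0.8


def feedback(condition, choice, rng):
    """Return the point change (+10, 0, or -10) for one learning trial."""
    spec = CONDITIONS[condition]
    correct = choice == spec["required"]
    # Assumption: incorrect choices yield the favorable outcome in 20% of trials.
    p_favorable = P_FAVORABLE_IF_CORRECT if correct else 1 - P_FAVORABLE_IF_CORRECT
    favorable = rng.random() < p_favorable
    if spec["valence"] == "win":
        return 10 if favorable else 0  # win vs. draw
    return 0 if favorable else -10  # draw vs. loss
```

Passing in a seeded `random.Random` instance keeps a simulated session reproducible, which is convenient when the same feedback sequence has to be replayed to a yoked observer.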

Sequence and time course of stimulus presentation in the task version for observational learners were as similar to the active version as possible (Fig. 1b). As mentioned above, observational learners completed the task in a yoked design in which an observing subject was shown a previous “active” subject’s choices. Participants were explicitly informed about this and advised that the previous subject’s choices were illustrated on the screen by a hand that was displayed hovering over the circle if the previous subject had responded. All subjects were asked to pay close attention to the observed choices and ensuing feedback.

For both active and observational learners, each learning block was followed by a test block in which no feedback was provided for the subject’s decision to respond or not respond; otherwise test trials were identical to active participants’ learning trials. Importantly, both active and observational learners were explicitly instructed to decide between responding and not responding on each trial, and to optimize their performance based on the feedback that had been provided in the learning blocks.

In total, the task comprised four learning and four test blocks with 40 trials (10 per combination) each. Trial order was randomized within each block. Subjects could take short breaks between blocks and were informed about their current score at the end of each block. In order to keep the subjects motivated and to prevent negative scores especially early on in the task, the starting score was set to 400 points. Observing subjects were instructed that they would receive both the points won by the active subject as displayed after each learning block, and the points they themselves won in the test trials. Of note, subjects were also informed that their final scores would not translate to a financial reward after testing because they would receive standardized course credit for participation. Task completion took approximately 50 min. Stimulus presentation and timing were controlled by Presentation software (Version 17.2, Neurobehavioral Systems, Inc., Berkeley, CA, USA).
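As a rough illustration of this block structure, the following Python sketch builds a session of alternating learning and test blocks with 10 trials per action-by-valence combination, shuffled within each block. The names `build_session` and `CONDITIONS` are hypothetical, and the sketch covers only trial ordering, not stimulus timing or scoring.

```python
import random

# Hypothetical labels for the four action-by-valence combinations.
CONDITIONS = [
    "go_to_win",
    "go_to_avoid_losing",
    "nogo_to_win",
    "nogo_to_avoid_losing",
]


def build_session(rng, n_block_pairs=4, trials_per_condition=10):
    """Return a list of blocks, alternating learning and test, learning first."""
    session = []
    for _ in range(n_block_pairs):
        for block_type in ("learning", "test"):
            # 10 trials per combination gives 40 trials per block.
            trials = CONDITIONS * trials_per_condition
            rng.shuffle(trials)  # randomize trial order within the block
            session.append({"type": block_type, "trials": trials})
    return session
```

With the default arguments this yields eight blocks and 320 trials in total, matching the four learning and four test blocks of 40 trials described above.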

Procedure

Subjects were informed that the study investigated active and observational outcome-based learning. After written informed consent had been obtained, demographic information was collected and participants completed the MWT-B. Subsequently, subjects were seated in front of a computer screen at a viewing distance of approximately 50 cm. Before the experimental task was started, on-screen instructions and five learning and five test trials for practice were presented to the subjects. Of note, these practice trials contained colored geometric shapes instead of fractal images as imperative stimuli, as they were intended to familiarize the subjects with sequence and time course of stimulus presentation in the task without inducing learning just yet. The entire test session took approximately 60 min.

Statistical analyses

Mean IQ estimates were compared between active and observational learners by means of an independent samples t test. This was done to ensure that potential effects of learning condition could not be attributed to group differences in intellectual abilities. In order to check for outliers with regard to task performance, accuracy rates (i.e., the percentages of correct responses in test blocks) according to action (go/nogo) and outcome valence (win/loss) were checked for subjects with scores that were more than 2 standard deviations (SDs) below (or above) the sample mean in more than two conditions. No outliers were identified, so all data from all subjects could be used for analysis.
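The outlier criterion can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function name `find_outliers` and the nested-dictionary input format are hypothetical, the sample SD (n − 1 denominator) is assumed, and the sample mean and SD are assumed to include the subject being checked.

```python
from statistics import mean, stdev


def find_outliers(accuracy, sd_cutoff=2.0, max_conditions=2):
    """Flag subjects deviating > sd_cutoff SDs in more than max_conditions conditions.

    accuracy maps subject id -> {condition label: accuracy in percent}.
    """
    conditions = sorted(next(iter(accuracy.values())))
    # Sample mean and sample SD per condition (assumption: n - 1 denominator).
    stats = {}
    for cond in conditions:
        values = [accuracy[subj][cond] for subj in accuracy]
        stats[cond] = (mean(values), stdev(values))

    outliers = []
    for subj, scores in accuracy.items():
        n_deviant = 0
        for cond in conditions:
            m, sd = stats[cond]
            # Skip conditions without variance; otherwise test both tails.
            if sd > 0 and abs(scores[cond] - m) > sd_cutoff * sd:
                n_deviant += 1
        if n_deviant > max_conditions:
            outliers.append(subj)
    return outliers
```

A subject is flagged only when the deviation criterion is met in at least three of the four conditions, mirroring the "more than two conditions" rule in the text.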

Accuracy rates were then analyzed with a repeated-measures analysis of variance (ANOVA) with the between-subjects factor learning condition (active/observational) and the within-subjects factors block (1–4), action (go/nogo), and outcome valence (win/loss). Greenhouse–Geisser correction was applied when the assumption of sphericity was violated. Significant main effects of block were resolved by means of linear trend analysis. Interactions were resolved by subordinate ANOVAs or post-hoc paired-sample t tests where appropriate. Bonferroni correction was applied to account for multiple testing when necessary.

In case the ANOVA yielded no significant main effects or interactions of the factor learning condition, we planned to perform complementary Bayesian hypothesis testing in order to confirm that this factor did not improve the predictive adequacy of the statistical model. To this end, a Bayesian repeated-measures ANOVA with the between-subject factor learning condition (active/observational) and the within-subjects factors block (1–4), action (go/nogo), and outcome valence (win/loss) was performed using JASP (Version 0.9.2; JASP Team, 2017; Wagenmakers, Love et al., 2018a; Wagenmakers, Marsman et al., 2018b). In the Bayesian ANOVA, the null model was compared against all other statistical models, i.e., models containing the main effects for the factors learning condition, action, outcome valence, and block as well as models containing any combination of these effects or respective interaction effects. Bayes factors (BFs) for each model were computed as the ratio of the predictive adequacy (i.e., the change from prior to posterior odds based on the present data) of each statistical model and the null model. Thus, the higher the BF, the more the evidence is in favor of the respective statistical model (Wagenmakers, Love et al., 2018a; Wagenmakers, Marsman et al., 2018b). BFs were classified as suggested by Lee and Wagenmakers (2014) (adapted from Jeffreys, 1998; see also Wagenmakers, Marsman et al., 2018b), with values between 1 and 3 indicating anecdotal, values between 3 and 10 indicating moderate, values between 10 and 30 indicating strong, values between 30 and 100 indicating very strong, and values larger than 100 indicating extreme evidence for a specific model against the null model. The priors were set to p(m) = 0.006 for all 167 conceivable models, thus reflecting a uniform distribution of prior model probabilities.

Since the complex 4 × 2 × 2 × 2 design of the present study resulted in a very large number of models, we applied Bayesian model averaging in order to quantify how much the data supported the inclusion of each effect. This procedure yields the change from prior to posterior odds (BF_Inclusion) for each effect, taking into account each candidate model’s conclusions (Wagenmakers, Marsman et al., 2018b).
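The evidence categories and the uniform model prior can be written down as a short Python sketch. The function names are hypothetical, and the treatment of the category boundaries (lower bound inclusive, with exactly 100 counted as very strong) is an assumption, because the text does not say to which category a boundary value belongs. The label for BFs below 1 is likewise only a descriptive placeholder.

```python
def classify_bayes_factor(bf):
    """Map a Bayes factor (model vs. null) onto the evidence labels in the text."""
    if bf <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf < 1:
        return "favors the null model"  # placeholder label, not from the text
    if bf < 3:
        return "anecdotal"
    if bf < 10:
        return "moderate"
    if bf < 30:
        return "strong"
    if bf <= 100:
        return "very strong"
    return "extreme"


def uniform_model_prior(n_models):
    """Prior probability of each model when all models are equally likely a priori."""
    return 1.0 / n_models
```

For the 167 models mentioned above, `uniform_model_prior(167)` is roughly 0.006, which agrees with the reported p(m).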

Results

Mean IQ scores did not differ between active learners (mean = 112.29, SD = 7.94) and yoked observers (mean = 114.78, SD = 12.15; p = 0.482).

Standard repeated-measures analysis of variance

Figure 2 shows mean performance accuracy according to action and outcome valence collapsed across blocks for active learners and yoked observers. The respective means according to block are provided as supplementary material. The ANOVA yielded a significant main effect of block (F_{[2, 87]} = 7.838, p < 0.001, ƞ_p² = 0.171). Linear trend analysis revealed that accuracy rates increased linearly across blocks (F_{[1, 38]} = 11.466, p = 0.002, ƞ_p² = 0.232). The main effect of action was also significant (F_{[1, 38]} = 26.172, p < 0.001, ƞ_p² = 0.408), with better performance on go (mean = 76.31% ± 2.69) compared to nogo trials (mean = 52.06% ± 3.89). These effects were further qualified by a significant block by action interaction (F_{[2, 93]} = 3.052, p = 0.042, ƞ_p² = 0.074). To resolve this interaction, separate univariate ANOVAs with the within-subjects factor block (1–4) were performed for go and nogo trials. For go trials, the main effect of block was significant (F_{[3, 117]} = 2.841, p = 0.042, ƞ_p² = 0.067) and did not reflect a linear (p = 0.221) but a cubic trend in accuracy rates across blocks (F_{[1, 39]} = 7.185, p = 0.011, ƞ_p² = 0.156). For nogo trials, the main effect of block was also significant (F_{[2, 91]} = 6.797, p < 0.001, ƞ_p² = 0.148) and reflected a linear increase in accuracy across blocks (F_{[1, 39]} = 11.683, p = 0.001, ƞ_p² = 0.231).

Furthermore, the action by outcome valence interaction was significant (F_{[1, 38]} = 37.396, p < 0.001, ƞ_p² = 0.496). Post-hoc paired-sample t tests revealed that performance accuracy was higher for go to win (mean = 88.81% ± 2.60) than for go to avoid losing (mean = 63.8% ± 4.01; t₃₉ = 6.012, p < 0.001), and for nogo to avoid losing (mean = 59.88% ± 4.88) than for nogo to win (mean = 44.25% ± 4.59; t₃₉ = -3.051, p = 0.004), thus confirming the asymmetric coupling of action and outcome valence.
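The reported effect sizes can be cross-checked against the F statistics with the standard identity ƞ_p² = (F × df_effect) / (F × df_effect + df_error). The Python helper below is a hypothetical convenience function, not part of the authors' analysis pipeline. For effects reported with Greenhouse–Geisser-corrected, rounded degrees of freedom, the recomputed value may deviate slightly from the reported one.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Action by outcome valence interaction reported above: F[1, 38] = 37.396
interaction = partial_eta_squared(37.396, 1, 38)

# Linear trend across blocks reported above: F[1, 38] = 11.466
linear_trend = partial_eta_squared(11.466, 1, 38)
```

Rounded to three decimals, these two calls give 0.496 and 0.232, in line with the values reported for the interaction and the linear trend.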

Last, the block by outcome valence by learning condition interaction was significant (F_{[2, 91]} = 5.450, p = 0.003, ƞ_p² = 0.125). To resolve this interaction, subordinate ANOVAs with the within-subjects factors block (1–4) and outcome valence (win/loss) were performed separately for active and observational learners. For active learners, the main effect of block (F_{[2, 39]} = 3.024, p = 0.059, ƞ_p² = 0.137) and the block by outcome valence interaction (F_{[3, 57]} = 2.597, p = 0.061, ƞ_p² = 0.120) merely approached significance. For yoked observational learners, the analysis yielded a significant main effect of block (F_{[3, 57]} = 5.684, p = 0.002, ƞ_p² = 0.230), reflecting a linear increase in performance (F_{[1, 19]} = 7.176, p = 0.015, ƞ_p² = 0.274), and a significant block by outcome valence interaction (F_{[3, 57]} = 2.869, p = 0.044, ƞ_p² = 0.131). In order to resolve the interaction, separate univariate ANOVAs were performed for win and loss trials. The main effect of block was significant for win trials (F_{[2, 43]} = 7.460, p = 0.001, ƞ_p² = 0.282), reflecting a linear increase in accuracy across blocks (F_{[1, 19]} = 14.037, p < 0.001, ƞ_p² = 0.425), but not for loss trials (p = 0.127). These results indicate that a linear increase in accuracy over the course of the task was more pronounced in observational learners, particularly for win trials.

All other effects failed to reach significance (all p > 0.163).

Bayesian repeated-measures analysis of variance

Table 1 shows the results of the Bayesian analysis of effects. Note that this analysis averaged across all models that contained a specific factor (Bayesian model averaging): while the prior inclusion probability for a specific factor (P(incl)) is the summed prior probability of all models that include this factor, the posterior inclusion probability of a specific factor (P(incl|data)) is the summed posterior probability of all models that include this factor. The change from prior to posterior inclusion odds is expressed as BF_Inclusion. The results show that the data strongly supported the inclusion of the main effects for the factors action and outcome valence, as well as the action by outcome valence interaction. Effects involving the factor learning condition received very weak support (all BFs_Inclusion < 1), as did all remaining effects.
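The relation between inclusion probabilities and BF_Inclusion can be made concrete with a small Python sketch. The function names and the toy four-model example are hypothetical and serve only to illustrate the arithmetic of model averaging; they are not the JASP implementation and do not reproduce the values in Table 1.

```python
def inclusion_probability(model_probs, model_terms, effect):
    """Sum the probabilities of all models whose term list contains the effect."""
    return sum(p for p, terms in zip(model_probs, model_terms) if effect in terms)


def inclusion_bayes_factor(prior_incl, posterior_incl):
    """Change from prior to posterior inclusion odds for one effect."""
    prior_odds = prior_incl / (1 - prior_incl)
    posterior_odds = posterior_incl / (1 - posterior_incl)
    return posterior_odds / prior_odds


# Toy example with made-up numbers: null model, A only, B only, A + B.
toy_terms = [[], ["A"], ["B"], ["A", "B"]]
toy_prior = [0.25, 0.25, 0.25, 0.25]  # uniform prior over models
toy_posterior = [0.1, 0.6, 0.1, 0.2]  # hypothetical posterior model probabilities

p_incl = inclusion_probability(toy_prior, toy_terms, "A")
p_incl_data = inclusion_probability(toy_posterior, toy_terms, "A")
bf_inclusion_a = inclusion_bayes_factor(p_incl, p_incl_data)
```

In this toy case the prior inclusion probability of A is 0.5 and the posterior inclusion probability is 0.8, so the inclusion odds change by a factor of about 4. A BF_Inclusion below 1, as reported above for effects involving learning condition, means the data lowered the inclusion odds for that effect.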

Table 1 Results of the analysis of effects for data from active learners and yoked observers (Experiment 1)

Discussion

In Experiment 1, individual subjects performed the orthogonalized go/nogo task either as active learners or based on observing a previous subject’s responses and subsequent feedback (yoked design) on the computer screen. In accordance with previous findings (Guitart-Masip, Economides et al., 2014a, b), results revealed a linear increase in performance irrespective of learning condition, indicating that active learners and observers were able to learn the stimulus-(non)response-outcome associations. Moreover, learning performance was generally better for go relative to nogo trials, which has also been observed in different variants of go/nogo tasks, including the orthogonalized version (Guitart-Masip et al., 2011; Guitart-Masip, Economides et al., 2014a, b; Ocklenburg et al., 2017), and may reflect generally increased task difficulty when response inhibition is required or could result from a general propensity to respond in experimental tasks.

Crucially, prior studies (Guitart-Masip et al., 2011; Guitart-Masip, Economides et al., 2014a, b) have yielded robust evidence for an asymmetric coupling of action and outcome valence in feedback-based learning. Learning to execute a response to obtain a reward (go to win) or to inhibit a response to avoid punishment (nogo to avoid losing) was easier than learning to inhibit a response to obtain a reward (nogo to win) or learning to execute a response to avoid losing (go to avoid) (Guitart-Masip, Duzel et al., 2014a, b). This result pattern was interpreted to reflect a conflict between Pavlovian control of behavior, which promotes active approach when rewards are anticipated, and inhibition or withdrawal when punishment is anticipated (Gray & MacNaughton, 2003), and the more flexible instrumental control that is driven by outcome valence. The present results replicate these findings.

Learning condition did not affect overall performance, which is consistent with findings from studies that applied other probabilistic learning tasks (Bellebaum et al., 2012; Bellebaum & Colosio, 2014; Rak et al., 2013). Importantly, the interaction of action and outcome valence was found to be comparable in active and observational learning, indicating that Pavlovian biases affected both learning types alike. Learning condition did, however, interact with outcome valence as a function of block: the linear increase in performance accuracy was more pronounced in observational learners, and particularly in the context of wins. Nevertheless, Bayesian analysis of effects confirmed inclusion of the main effects for the factors action and outcome valence, and of the action by outcome valence interaction in the model, while providing very weak support for inclusion of the factors learning condition or block.

Taken together, the results from Experiment 1 appear to suggest that active and observational learning are similarly affected by Pavlovian biases, thus adding to evidence for similarities between processing of personal and vicarious rewards (Morelli et al., 2015). However, the possibility that observers merely imitated the responses of the active subjects rather than actually learning the stimulus-(non)response-outcome contingencies cannot be excluded. Therefore, a follow-up experiment (Experiment 2) was performed in which observational learners were presented with chance performance in order to test whether (a) their performance accuracy would still increase over the course of the task, thus reflecting true observational learning, and (b) a Pavlovian bias would still persist when chance performance was observed.