A novel paradigm for observational learning in rats

The ability to learn by observing the behavior of others is energy efficient and brings high survival value, making it an important learning tool that has been documented in a myriad of species in the animal kingdom. In the laboratory, rodents have proven useful models for studying different forms of observational learning; however, the most robust learning paradigms typically rely on aversive stimuli, like foot shocks, to drive the social acquisition of fear. Non-fear-based tasks have also been used, but they rarely succeed in having observer animals perform a new behavior de novo. Consequently, little is known regarding the cellular mechanisms supporting non-aversive types of learning, such as visuomotor skill acquisition. To address this, we developed a reward-based observational learning paradigm in adult rats, in which observer animals learn to tap lit spheres in a specific sequence by watching skilled demonstrators, with successful trials leading to rewarding intracranial stimulation in both observers and performers. Following three days of observation and a 24-hour delay, observer animals outperformed control animals on several metrics of task performance and efficiency, with a subset of observers demonstrating correct performance immediately when tested. This paradigm thus introduces a novel tool to investigate the neural circuits supporting observational learning and memory for visuomotor behavior, a phenomenon about which little is understood, particularly in rodents.


Introduction
Throughout life, animals learn new associations and skills through direct experience. However, learning through firsthand interaction demands energy, involves trial, error, and uncertainty, and can in some cases be dangerous or life-threatening. Observational learning, on the other hand, permits individuals to gain knowledge by interacting with or (Bruchey et al. 2010; Jeon et al. 2010; Twining et al. 2017; Allsop et al. 2018). These and other paradigms often use food restriction, fear or other negative experiences as motivators since they provide strong survival incentives to rapidly acquire conditioned-unconditioned stimulus (CS-US) relationships. Though effective, these approaches also bring limitations. With food deprivation, demonstrators eventually become sated while performing a task, which restricts the number of trials they perform before losing motivation. Fear-based paradigms, though highly effective, rely chiefly on circuits within the amygdala, limbic system and other structures related to affective empathy (Jeon et al. 2010; Kim et al. 2012; Meyza et al. 2017; Allsop et al. 2018; Smith et al. 2021), and are ill-suited for investigating mechanisms of other types of learning, like perceptual-motor translations underlying observational skill learning. Thus, additional approaches are warranted to understand how different forms of observational learning occur in the brain in differing contexts and states.
Here, we present a novel sensory-motor observational learning paradigm that relies neither on food deprivation nor punishment, but instead uses rewarding stimulation to the medial forebrain bundle (MFB) of both demonstrator and observer animals as reinforcement. The task was designed as a form of instrumental conditioning in which an observer animal views a demonstrator tap two internally lit spheres in a particular sequence and, whenever the demonstrator performs a trial correctly (the CS), both animals receive MFB stimulation (the US). Observers viewed performers in this manner for one session per day over three consecutive days, and learning was assessed the following day. The majority of observers acquired the task and significantly outperformed control animals who viewed a similar version of the task with naïve demonstrators. The absence of learning in control animals indicated that observers integrated the actions of skilled demonstrators with contiguous reward to adaptively modify their own behavior, as would be predicted of a successful instrumental conditioning paradigm. These results demonstrate that rats can learn to perform novel behavioral sequences by visual observation, and we discuss ways in which the paradigm can be further refined for future investigations into the neural substrates supporting non-aversive observational learning.

Animal subjects
All experiments were performed in accordance with the Norwegian Animal Welfare Act, the European Convention for the Protection of Vertebrate Animals used for Experimental and Other Scientific Purposes, and the local veterinary authority at the Norwegian University of Science and Technology. All experiments were approved by the Norwegian Food Safety Authority (Mattilsynet; protocol ID # 25094). The study contained no randomization to experimental treatments and no blinding. Sample size (number of animals) was set a priori to at least 7 per condition to perform unbiased statistical analyses based on Mead's resource equation. A total of 16 male Long-Evans rats (age: >12 weeks, weight: 393-525 g at the start of experimental sessions) were used in this study. Animals were handled daily from the age of 7 weeks until the start of the experiments. All rats were housed in groups in enriched cages prior to surgery, separated prior to surgery, and housed individually in plexiglass cages (45 × 44 × 30 cm) after surgery to avoid damage to implants. Animals had ad libitum access to food and water throughout the entire study and were housed in a temperature- and humidity-controlled environment on a reversed 12 h light/12 h dark cycle. Training and experimental sessions all took place during the dark cycle.
Animals from the same litter were paired for behavioral experiments whenever possible, though well-trained performers were re-used for multiple observers in some cases. The roles of "observer" and "performer" were designated randomly if the animals were siblings. In cases where animals were not from the same litter, the older animal was designated "performer". However, we observed no apparent differences in learning effects between sibling and non-sibling pairs.
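As a rough illustration of the sample-size reasoning mentioned above, Mead's resource equation is commonly written as E = N - B - T, where N is the total degrees of freedom (number of animals minus one), B the blocking degrees of freedom and T the treatment degrees of freedom, with values of E between roughly 10 and 20 usually regarded as adequate. The sketch below assumes two treatment groups and no blocking; the function name and these assumptions are ours for illustration and are not taken from the study's analysis.

```python
def mead_error_df(n_animals, n_groups, n_blocks=1):
    """Error degrees of freedom E = N - B - T (Mead's resource equation).

    Assumption for this sketch: a single block (no blocking) unless stated.
    """
    total_df = n_animals - 1      # N
    block_df = n_blocks - 1       # B
    treatment_df = n_groups - 1   # T
    return total_df - block_df - treatment_df


# Two conditions (observers vs. controls) with 7 or 8 animals each.
e_minimum = mead_error_df(n_animals=14, n_groups=2)
e_actual = mead_error_df(n_animals=16, n_groups=2)
print(e_minimum, e_actual)
```

Under these assumptions, both the a priori minimum of 7 animals per condition and the 8 per condition actually used fall inside the commonly cited 10 to 20 range.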

Surgery
Animals were anesthetized initially with 5% isoflurane vapor mixed with oxygen and maintained at 1-3% throughout the surgical procedure, typically lasting 1-2 h. While anesthetized, they were placed in a stereotaxic frame and injected subcutaneously with Metacam (2.5 mg/kg) and Temgesic (0.05 mg/kg) as general analgesics. A local anesthetic (Marcain 0.5%) was injected under the scalp before the first surgical incision was made, followed by clearing the skull of skin and fascia. Using a high-speed dental drill, multiple holes were drilled in the skull and jeweler's screws were inserted to later anchor dental acrylic (Kulzer GmbH, Germany) at the end of the surgery. A craniotomy was opened over the right hemisphere at AP: -2.8, ML: 1.7, and a single bipolar stimulating electrode targeting the MFB was gradually lowered in the brain to the level of the lateral hypothalamus (DV: 7.8-8.0). Stimulating electrodes were 13 mm long twisted stainless-steel wires coated with polyimide (125 μm diameter, 150 μm with insulation; MS303/3-B/SPC, Plastics One, Canada). Prior to insertion, each electrode was suspended in 70% ethanol for 45-60 min, dried and rinsed with saline. Once the electrode was in place, the craniotomy was sealed with silicone elastomer (Kwik-Sil, World Precision Instruments Inc., USA), and a thin layer of Super-Bond (Sun Medical Co., Ltd., Japan) was applied to the skull to increase bonding strength between the skull surface and dental acrylic. Once the acrylic was applied and cured, sharp edges were removed with the dental drill at the end of surgery. Following surgery, animals were placed in a heated (32 °C) chamber to recover. Once awake and moving, they were returned to their home cage. If animals showed signs of pain or discomfort post-surgery, additional medication (Metacam (2.5 mg/kg) and Temgesic (0.05 mg/kg)) was given, and antibiotics (Baytril, 25 mg/kg) were administered if wounds showed signs of infection. The testing of stimulating electrodes prior to behavioral training was postponed while animals underwent medical treatment. We noted that the age and weight of the animals were critical factors to consider both in terms of successfully targeting the MFB during surgery, and in the stability of electrode efficacy over time. We used a minimum age of 12 weeks and minimum weight of 300 g as benchmarks, and had the highest rate of surgical success with animals in the weight range of 350 g, +/-20 g.

Behavioral arena and MFB stimulation set-up
Behavioral experiments were performed in a 93.5 cm × 42 cm × 50 cm clear plexiglass box divided in the middle by a perforated transparent barrier, creating two compartments (Fig. 1). The holes in the barrier allowed for visual, auditory and olfactory cues but were not large enough to allow physical contact between the animals. Two hollow spheres (ping pong balls; STIGA sports AB, Sweden) containing remotely triggered LEDs were mounted on top of hollow metal rods and positioned at the perimeter of the performer's side of the box (Fig. 1a and b). The LEDs in each sphere were connected via insulated wires threaded through the metal rods to a Raspberry Pi 3 (Raspberry Pi Foundation, UK). The Raspberry Pi 3 powered the LEDs and controlled when they were illuminated, which cued the animals to tap the spheres.
The cue and stimulation schedules in the observational learning experiments were controlled using the Raspberry Pi 3 and a standard personal Dell PC running Windows 10. Custom software written in Python 2 controlled both the LED signaling cues and delivery of electrical brain stimulation during the experiments without experimenter intervention. The Python script also allowed for manual override of the stimulation schedule, if needed, during training of performers (detailed below in Behavioral and Experimental Procedures). All training and experimental sessions lasted 30 min. A Raspberry NoIR Camera V2 (Raspberry Pi Foundation, UK) was turned on each time the script was activated and recorded each session from an overhead view. The experimental room was dimly lit, and 850 nm near-infrared lighting was used to illuminate the video recordings.
Intracranial stimulation of the MFB was delivered via a pulse stimulator (Master 9, Microprobes, USA) and two stimulus isolator units (SIU) (ISO-Flex, Microprobes, USA), each connected to a stimulating cable (305-305 (C)/SPC, Plastics One, Canada) connected to the head of the animal. The size of the implants was kept as small as possible so as not to interfere with the behavior of the animals, and attaching the cable to the implanted electrode had to be done delicately while the animal sat still. The cord from each SIU was attached to a 2-channel commutator (Plastics One, Canada) to prevent the cords from excessive twisting while the animals moved freely in the box. Each stimulation consisted of a 500 ms train of square-wave pulses, each lasting 400 µs, with 200 µs on and 200 µs off per cycle, delivered at 150 Hz.
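The stimulation parameters above imply some simple arithmetic, sketched here. The function and variable names are hypothetical, and the reading of "200 µs on and 200 µs off" as one 400 µs cycle occurring once per 150 Hz period is our assumption rather than a detail confirmed by the stimulator settings.

```python
def train_summary(train_s=0.5, rate_hz=150.0, on_us=200.0, off_us=200.0):
    """Derive basic quantities for one stimulation train (illustrative only)."""
    n_pulses = round(train_s * rate_hz)   # pulses delivered in one train
    period_us = 1e6 / rate_hz             # time between pulse onsets
    cycle_us = on_us + off_us             # assumed on + off duration of one pulse
    return {
        "n_pulses": n_pulses,
        "period_us": period_us,
        "cycle_us": cycle_us,
        "idle_us_per_period": period_us - cycle_us,
        "total_on_ms": n_pulses * on_us / 1000.0,
    }


summary = train_summary()
for name, value in summary.items():
    print(name, value)
```

Under these assumptions, one 500 ms train at 150 Hz contains 75 pulses, and the current is on for only a small fraction of each train.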

Habituation
Prior to surgery, experimental animals were handled until calm and comfortable with the experimenter and habituated to the observer-side of the apparatus, where electrodes would eventually be tested. After surgery, but before any intracranial stimulation was given, all animals were habituated to the performer-side of the box on two occasions: once without the stimulating cable attached and a second session with the cable attached. After the efficacy of the electrode was tested in the observer-side of the box (see Electrode testing, below), no further habituation or exposure to the behavioral apparatus was given until the start of the experiments. Prior to the start of any subsequent session, regardless of the stage of training or testing, each animal was allowed to habituate to their compartment until they were calm, either sitting still or grooming. After each session, the box and manipulanda were cleaned thoroughly with detergent (Zalo Ultra, Lilleborg, Norway) and wiped dry to remove trace odors or cues that might influence subsequent experimental or training sessions.

Electrode testing
After a minimum of 5 days of post-operative recovery, stimulating electrodes were tested for efficacy, and the strength of stimulation required to reinforce behavior was determined for performers and observers. Efficacy was tested by reinforcing the animal's preference for a neutral object (e.g. a pen) in the observation chamber of the experimental apparatus. During these tests, the animal was placed in the observation-side of the box and allowed to settle (1-5 min), after which single stimulations were delivered when the animal oriented toward or interacted with the object. All animals started at a stimulation intensity of 20 µA. If they were non-responsive, the current was increased incrementally by 2 µA until behavioral effects were observed, such as increased investigation or physical interaction with the object. Current was then further increased in 2 µA increments until side-effects were observed (such as motor artifacts or aversive reactions), or until there had been a cumulative increase of 10 µA from the identified effective stimulation intensity without any observed side-effects. If movement artifacts were elicited by the stimulation, current strength was incrementally lowered by 2 µA until a stimulation intensity that did not elicit artifacts was identified. The final current strength was chosen from the upper range that elicited apparent reward without side-effects (ranging from 18 µA to 60 µA across animals). If the optimal range was too narrow or unclear, the animal was re-tested and the optimum was determined on a subsequent day. Electrode testing for performers took place at the start of their training (described below in Training of performer rats), and for observers within 24 h before beginning the first observational session.

Training of performer rats
The overall procedure for training demonstrator rats consisted of three phases: (i) an initial shaping phase, followed by (ii) a continuous reinforcement schedule (i.e. stimulation delivered for every sphere-tap), which then changed to (iii) a partial reinforcement phase, in which reward was delivered only when the second sphere was tapped after the first sphere had been tapped without reward. At the start of the first shaping phase, MFB stimulation was given manually whenever the animals oriented toward or approached the spheres to encourage exploration. If no behavioral changes were observed after prolonged bouts of stimulation, or if the animals showed signs of aversion, stimulation current intensity was adjusted up or down, respectively. If the stimulation was still aversive or had no noticeable effect, training was discontinued and the rat was excluded from the study. After the animals showed sustained interest in either of the spheres, current intensity was again fine-tuned to the lowest current strength which yielded consistent behavioral responses.
Following initial shaping, subsequent training steps were structured as follows: (1) rewarding stimulation was given when the animal started to physically interact with either sphere, (2) they were rewarded only when starting to tap each sphere alternatingly (partial reinforcement of tapping behavior), (3) rewarded when tapping the spheres as in step 2, but only when the spheres were lit (with lighting controlled manually by the experimenter), (4) withholding reward when tapping the first sphere, but delivering reward when tapping the second sphere when cued by the light, (5) fully automatic training sessions until the animal performed > 75% successful trials per session. Step 4 and step 5 both relied on the Raspberry Pi controlled script to run the task and deliver instantaneous reward after the second tap, but differed in that step 4 allowed for extra motivational stimulation from the experimenter when the performer struggled with the task. Step 5 was initiated only after the animal toggled consistently back and forth between manually lit spheres without additional stimulation from the experimenter. Training was considered complete and stable once a performer exceeded 75% correct trials for three consecutive 30-minute sessions. Learning rates varied from animal to animal, and reaching criterion performance took anywhere from 2 to 15 training sessions. Naïve animals showed no initial spontaneous task-performance and typically required repeated stimulation to acquire the entire task-sequence. Performer rats used for subsequent experimental sessions were given a minimum of one day of rest after reaching criterion. The automatic training sessions utilized the same script as the subsequent experimental sessions.

Experimental task-structure
During the experiments, observer and demonstrator rats were placed in their respective sides of the box and allowed to settle. Once the animals were calm, the experimenter initiated the task and an automated script turned on the LED in the first sphere after a random time interval of 3 to 30 s. The first light remained on for a maximum of 30 s if the demonstrator did not tap the sphere. If > 30 s elapsed, the LED turned off and the trial was scored as a "missed trial", followed by a 3-30 s random time interval (inter-trial interval (ITI)) before the LED turned on again and a new trial started. If the animal tapped the first sphere within the 30 s when the light was on, the first LED turned off and the second LED in the other sphere turned on. The second LED also had a 30 s permissive time window. If the animal tapped the second sphere within this time window, the trial was scored as "successful", and both the performer and observer received concurrent MFB stimulation as a reward. If the demonstrator animal did not tap the second sphere within 30 s, the trial was scored as "failed". After each trial, whether successful or failed, another random ITI of 3-30 s was initiated before the next trial started and followed the same sequence as above. Each session lasted 30 min.

Fig. 1 (caption, in part) The two-step sequence performed to attain rewarding stimulation. The picture on the left shows a performer tapping the first lit sphere, which triggers the second sphere to light up. The picture on the right shows the moment the animal taps the second sphere and receives intracranial MFB stimulation. (d) Total timeline for the experiment, starting from before surgery to after the completion of testing

Histology
After experiments were completed, animals were deeply anesthetized with isoflurane and injected intraperitoneally with an overdose of pentobarbital (Exagon vet., 400 mg/ml, Richter Pharma Ag, Austria), after which they were perfused intracardially with 0.9% saline and 4% paraformaldehyde. The animals were then decapitated, and skin and muscle were removed from the skull before leaving it to post-fix overnight in 4% paraformaldehyde. The following day the electrode implants were removed from the skull and the brains were extracted and stored in dimethyl sulfoxide (DMSO) at 4 °C. On the day of sectioning, the brains were removed from DMSO, frozen with dry ice and sectioned with a sliding microtome at 40 μm in the coronal plane in three series. The first series was mounted immediately on glass slides, Nissl-stained and cover slipped, and the other two series were kept for long term storage in DMSO at -25 °C. Electrode placement was confirmed using the Nissl-stained series (Fig. 2). Of 23 total rats implanted for the final version of this paradigm, two were excluded due to electrodes being off the intended target, and 5 animals had correctly placed electrodes but were excluded due to disruptive behavior (e.g. excessive jumping) on the day of testing.

Results
Observer animals (n = 8) watched skilled demonstrators perform the sphere-tapping task in single 30-minute sessions each day for three days, with demonstrators on average performing 73.7 ± 3.0 (mean ± SEM unless otherwise indicated) trials per session, corresponding to 2.5 ± 0.1 trials per minute, and an overall success rate of 95.8%. A separate group of animals (n = 8) observed three days of the control version of the task, which consisted of naïve demonstrators, the spheres turning on and off in the correct order independently of the demonstrator's behavior, and both observers and demonstrators receiving MFB stimulation whenever the second sphere turned off (average of 87.3 ± 0.8 automated trials per session). Importantly, the total number of MFB stimulations received during the training phase did not differ between experimental and control groups (bootstrapped independent samples t-test, p = 0.60). Observers and control animals were tested in the task 24 h after the final training session, with observers performing a significantly higher proportion of correct trials (23.3 ± 8.3%) than controls (5.8 ± 3.1%; Mann-Whitney U = 14.5, p = 0.0325; Fig. 3). The rate of successful trials varied considerably across observer animals, ranging from just under 60% correct performance in the best animal to 0% in one observer who failed to learn the task (Fig. 3

Training and testing of observers
Observer animals observed either a well-trained performer or a naïve control performer (described below in Control group) for one 30-minute session per day over three consecutive days and were tested 24 h after the final observation session. The 24-hour delay was introduced to preclude spontaneous imitation (Zentall 2006). During observational sessions, observers were able to see the performer for the entirety of each 30-minute session while allowed to move freely in their side of the box. On the day of testing, observers were placed in the performer-side of the box, and the same pre-programmed LED and MFB stimulation script was run as when demonstrators performed the task. No pre-training or priming with stimulations was given to the observers before testing.
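The trial logic of the pre-programmed script, as described under Experimental task-structure, can be sketched as a small simulation. This is not the authors' code: the function names, the `tap_latency` callback standing in for the animal, and the simplified timekeeping are assumptions made purely for illustration, and no hardware (LEDs, stimulator) is driven.

```python
import random

SESSION_S = 30 * 60          # each session lasted 30 min
WINDOW_S = 30.0              # permissive window for each sphere
ITI_RANGE_S = (3.0, 30.0)    # random inter-trial interval


def run_session(tap_latency, rng):
    """Simulate one session and return a list of trial outcomes.

    tap_latency(sphere, rng) is a hypothetical stand-in for the animal: it
    returns the seconds taken to tap the lit sphere, or None for no tap.
    """
    t = 0.0
    outcomes = []
    while True:
        t += rng.uniform(*ITI_RANGE_S)       # wait before lighting sphere 1
        if t >= SESSION_S:
            break
        first = tap_latency(1, rng)
        if first is None or first > WINDOW_S:
            t += WINDOW_S                    # LED 1 times out
            outcomes.append("missed")
            continue
        t += first                           # LED 1 off, LED 2 on
        second = tap_latency(2, rng)
        if second is None or second > WINDOW_S:
            t += WINDOW_S                    # LED 2 times out
            outcomes.append("failed")
            continue
        t += second
        outcomes.append("successful")        # both animals would be stimulated here
    return outcomes


# Example: a hypothetical animal that taps each sphere after 1-40 s.
outcomes = run_session(lambda sphere, r: r.uniform(1.0, 40.0), random.Random(0))
print(len(outcomes), outcomes.count("successful"))
```

In this sketch a trial lasts at most 60 s (two 30 s windows), consistent with the maximal trial time reported for the test sessions.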

Control group
The control condition was run identically to the real experiments, but with performer animals that were completely naïve to the task. During each of the three control observation sessions, the naïve performer was allowed to move freely on the demonstrator side of the box while a custom script drove the spheres to light up and extinguish in the correct sequence, with reward stimulation delivered to both animals when the LED in the second sphere turned off. The lighting of the spheres and MFB stimulation progressed automatically, irrespective of the demonstrator's actions, and trials were started at random and followed the same randomized 3 to 30 s ITIs as with the experimental trials. This way, reward delivery was dissociated from the behavior of the performer but maintained the sequential structure of correct trials. Critically, this condition also provided the social cue of a would-be performer but lacked the demonstration of task-specific behavior in conjunction with the reward.

4e), suggesting that the saliency of the lighting of the first sphere was similar at the group level. There were, however, individual differences between observers, with some waiting near the first sphere, tapping it immediately at the start of the trial and turning directly to tap the second sphere. Other observers had longer latencies between the first and second tap and appeared less motivated, as they would tap the first sphere but then explore the chamber or groom before approaching and tapping the second sphere. Finally, we note that several comparisons had the same statistical result (i.e. total successful trials, trials per minute, and mean latency between 1st and 2nd sphere tap) because these aspects of performance were highly correlated, resulting in similar rankings and, consequently, the same U- and p-values.
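The point about identical U-values can be illustrated with a minimal pure-Python sketch of the Mann-Whitney U statistic, which depends only on how the two groups rank relative to one another. The data below are hypothetical numbers invented for the example, not the study's measurements, and the helper name is ours.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: count of (x_i, y_j) pairs with x_i > y_j.

    Tied pairs contribute 0.5, the usual convention.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical counts of successful trials in a 30-minute test session.
observers = [0, 3, 5, 8, 10, 12, 15, 21]
controls = [0, 0, 0, 0, 0, 2, 6, 10]

u_counts = mann_whitney_u(observers, controls)

# Dividing by session length gives trials per minute; the ranking of animals
# is unchanged, so the U statistic is unchanged as well.
u_rates = mann_whitney_u([c / 30 for c in observers], [c / 30 for c in controls])
print(u_counts, u_rates)
```

Because any transformation that preserves the ordering of animals leaves U untouched, metrics that rank the animals in the same way necessarily yield the same test statistic and p-value.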

between that of first-person trained demonstrators, who achieved 95-100% correct performance after extensive training, and naïve animals, who exhibited 0% performance prior to shaping. Compared to observers, the range of performance for control animals was substantially lower, with the best animal performing 23% correct trials and five with 0% (Fig. 3, right).
Observer animals exceeded controls in other aspects of task performance that reflected other features of learning. These included a higher total number of correct trials executed in the test session (mean of 9.3 ± 3.3 for observers vs. 2.3 ± 1.2 for controls; Mann-Whitney U = 14.5, p = 0.0325; Fig. 4a) and a higher number of trials performed per minute (0.30 ± 0.11 vs. 0.07 ± 0.04 for controls; Mann-Whitney U = 14.5, p = 0.0325; Fig. 4b), demonstrating higher levels of efficiency and vigor of task execution in trained observers. The faster performance rate of observer animals was also reflected by the shorter average trial length (45.6 ± 3.8 s for observers vs. 54.1 ± 2.6 s for controls; one-tailed Mann-Whitney U = 50, p = 0.0325; Fig. 4c), and by the fact that the distribution of trial latencies for control animals tended to cluster at the maximal allowable time of 60 s per trial. Perhaps most critically, the mean and median latencies between tapping the first and second sphere were shorter for observers than controls (mean of 20.3 s for observers vs. 25.5 s for controls; Mann-Whitney U = 14.5, p = 0.0325; median 18.9 s vs. 30.0 s (the maximum), respectively; Fig. 4d), indicating that observers had learned the alternation response between the two spheres rather than approaching the 2nd sphere directly, as would be expected with

Fig. 3 Observer animals performed a significantly higher fraction of successful trials than controls when tested 24 h after the final observation session. Whisker plots show the variability and overall success rate (out of all trials) for observers (left) and controls (right). Success rates for each animal are shown as individual dots; median performance in each group is indicated by the black line (16.7% for observers; 5.2% for controls); the 25th percentile of the distribution is shown below; 75th percentile is shown above; whiskers show the full distribution of data. Statistical significance between groups (p < 0.05; one-tailed Mann-Whitney U test) is noted with a star

Fig. 4 (caption, in part) The average latency to tap the first sphere after it was lit was lower in observer animals, but the difference was not statistically significant. Significant differences (p < 0.05, one-tailed Mann-Whitney U test) are noted with a single star, non-significant differences are noted with n.s.

Discussion
In this study, we demonstrate a new observational learning paradigm in which rats learn to perform a novel sequence of actions by observing the behavior of conspecifics. The task was designed as a means of investigating the neural substrates of observational learning without the use of fear, hunger or other negative experiences as motivators. The task relied on instrumental conditioning principles, where the demonstration of a successful trial by the performer served as the CS for observers, and rewarding intracranial stimulation served as the US. By undergoing CS-US pairing over three days, observers learned to associate task-specific actions with reward, in effect learning to perform the task by observation. The observer animals performed a higher percentage of correct trials when tested, and they outperformed controls in several temporal metrics indicating a higher efficiency in task performance. Key design features included the use of MFB stimulation to maintain motivation and task engagement for both observers and demonstrators, as well as the use of large spherical cues that were visible from any angle. Crucially, an unrewarded trigger sphere needed to be manipulated first in order to activate the second, rewarded sphere, which required observers to learn the relevant sequence of actions rather than going directly for the rewarded sphere or tapping either sphere indiscriminately.
The use of a two-part action sequence was incorporated to reduce the potential contribution of more automatic forms of associative learning, including stimulus enhancement and autoshaping. We argue that stimulus enhancement was not likely the basis for learning since success at the task required the animals to perform a sequence of actions to attain the reward, rather than going straight for the rewarded sphere. Critically, after tapping the first sphere, observer animals tapped the reward sphere more quickly and more often than controls, whereas the latency to tap the unrewarded sphere did not differ, indicating that the cueing stimuli were salient to both groups. Another factor to control for was autoshaping, in which smaller instinctive reactions to a stimulus happen to coincide with a reward and, after some repetition, this leads to the reinforcement of those actions even though they are independent of the reward (Brown and Jenkins 1968). Here, although we cannot rule out a species-specific tendency of the rats to manipulate an illuminated object, autoshaping would more likely come into play if there were only one sphere which needed to be tapped for the reward, rather than first tapping the unreinforced "trigger" sphere. Autoshaping may have been more involved in the initial shaping stages of first-person training, when demonstrators learned that tapping a sphere per se caused rewarding MFB stimulation.
paradigms (e.g. Carlier and Jamon (2006) reported a total of 70 learning trials summed over 14 daily observational sessions; Jurado-Parras et al. (2012) used a lever-pressing paradigm eliciting ~ 20 trials per 20-minute session) (Valsecchi et al. 1989b). We had previously piloted a version of the present paradigm that depended on food reward and found a steep drop in performance by demonstrators after ~ 10 min, presumably due to the animals reaching satiety. A third advantage of MFB stimulation was the consistent salience of the reward, which may have helped sustain task engagement for observers as well as performers. Previous studies also sought to enhance the attention of observers by using performers of the opposite sex (Collins 1988; Carlier and Jamon 2006), or by tapping into the natural inclination of weanling animals to observe their mothers or older adults (Valsecchi et al. 1989a; Prato Previde and Poli 1996). The use of weanling animals in particular proved effective, though using pups would present challenges for recording or manipulating the brain due to their small size. A final advantage of using MFB stimulation was that its delivery, as well as the other timed features of the task, were automated, giving precise programmatic control over the timing of events within a trial. This reduced training variability within and between animals and allowed us to systematically adjust timing intervals (e.g. upper and lower bounds of inter-trial intervals) while developing the task.
As can be seen from the wide distribution of performance rates among observers, a majority of trained observers learned the task while others apparently did not, so there is still room for improving the paradigm. Based on earlier in-house observations, one way to improve the performance of observers would be to increase the total number of observation trials by increasing from three to five days of training. For example, while developing the task, two pilot animals with five days of training gave > 91% successful performance during testing. However, a caveat was that the animals were tested immediately after the final observational session, so their high rate of performance could have been explained as a result of spontaneous imitation (Zentall 2006). This prompted the inclusion of a 24-hour delay to remove such effects, though we note that reducing the delay to 8 or 12 h might produce more robust performance while still testing genuine associative learning. Another way to increase the total number of observation trials would be to use session lengths longer than 30 min, but careful attention should be given not to fatigue the demonstrator animals. We found that well-trained performers were in constant motion for over 30 min, and most animals' performance began to decline beyond this point.
In summary, although the present study had a limited sample size and the learning effects varied across observers, it demonstrates the proof of principle for an observational Our broader motivation to create this task stemmed from the fact that much of our current knowledge of the neurobiological mechanisms underlying observational learning derives from fear-based learning paradigms in rodents.Even though fear and pain are powerful stimuli (e.g.Carrillo et al. 2019) which can drive robust learning in different species (John et al. 1968;Olsson et al. 2007;Jeon et al. 2010), observational learning encompasses more than just acquisition of fearful associations, and includes other forms of learning that depend on visual, motor or spatial cognition (Heyes 1994;Galef 2005;Gariepy et al. 2014;Carcea and Froemke 2019).Naturalistic examples have been reported in a wide range of vertebrates and invertebrates, and include acquiring tool usage (Whiten et al. 1999;Biro et al. 2003;Sanz and Morgan 2007;Holzhaider et al. 2010;Loukola et al. 2017), vocal learning (Konishi 2004;Mooney 2014), learning where to shoal or forage for food (Valsecchi et al. 1989a;Laland and Plotkin 1990;Laland and Williams 1997;Emery et al. 2008), or how to solve a specific task to gain access to food (Palameta and Lefebvre 1985;Zohar 1991;Prato Previde and Poli 1996;Carlier and Jamon 2006;Gruber et al. 2009).While remarkable neurobiological insights have also been made, for example, in the songbird learning system (Prather et al. 2008;Vallentin et al. 2016), progress in uncovering the neural bases of non-aversive observational learning has been impeded by a dearth of paradigms which reliably yield de novo skill learning in the lab.In rodents, it is more common that observational learning tasks impart a general strategy or prime subjects to perform behaviors (e.g.(Leggio et al. 2003;Jurado-Parras et al. 
2012) rather than produce learners who execute a novel behavior from scratch (e.g.(Carlier and Jamon 2006).It was therefore our goal to design a task for rodents that depended on visuomotor cognition in which at least a subset of animals performed the task correctly when first tested.
Another pivotal design aspect of the paradigm was the use of intracranial MFB stimulation as the reward, which brought at least four advantages.One was that we did not need to rely on food or water deprivation, which avoided potential deprivation-related detriments in performance, as well as the loss of motivation from demonstrators once sated.In addition, it avoided the use of traditional aversive motivators, like foot shocks, which was better for the animals' stress and general welfare.The second main advantage, related to the first, was that MFB stimulation was a powerful source of positive reinforcement delivered directly to the brain, bypassing the animals' need for calories or water.It was therefore possible to sustain a high number of trials demonstrated by performers, with an average of more than 70 correct demonstrations in a 30-minute session, and in some sessions > 85.This exceeded the rates of performance for demonstrators in food-motivated operant learning paradigm in rats which does not require aversive stimulation or food deprivation.With further refinement, the task could serve as a powerful tool for studies seeking to elucidate the neural substrates supporting the acquisition and long-term (> 24 h) memory of non-aversive observational learning.More generally, the establishment of such a paradigm opens the door for a broader comparison of pathway-specific processes supporting different kinds of observational learning, such as those depending on visuomotor cognition versus fear-motivated associative learning.

Fig. 1 Apparatus and timeline for behavioral experiments. (a) Schematic of the two-sided behavioral training arena, with a demonstrator animal and cue-spheres in the left chamber and an observer in the right. Schematic created with BioRender; illustration not to scale. (b) Overhead view of an ongoing experiment, with the performer investigating an unlit cue-sphere (circled), and the observer watching. (c)

Fig. 2 Histological verification of electrode placement. A Nissl-stained tissue section showing the termination point of a bipolar stimulating electrode targeting the medial forebrain bundle. Scale bar = 2000 μm

Fig. 4 Observer animals surpassed controls on additional task performance metrics. (a) Observers had a higher total number of successful trials than controls during testing; each dot indicates individual animal performance. (b) Observers performed significantly more trials per minute than controls, reflecting a higher and more consistent pace of task performance. (c) Observers completed trials significantly faster than controls, measured as the time interval from when the first sphere was lit to when the second sphere turned off; maximum possible time for