Abstract
In a variety of laboratory preparations, several animal species prefer signaled over unsignaled outcomes. Here we examine whether pigeons prefer options that signal the delay to reward over options that do not and how this preference changes with the ratio of the delays. We offered pigeons repeated choices between two alternatives leading to a short or a long delay to reward. For one alternative (informative), the short and long delays were reliably signaled by different stimuli (e.g., SS for short delays, SL for long delays). For the other (non-informative), the delays were not reliably signaled by the stimuli presented (S1 and S2). Across conditions, we varied the durations of the short and long delays, hence their ratio, while keeping the average delay to reward constant. Pigeons preferred the informative over the non-informative option and this preference became stronger as the ratio of the long to the short delay increased. A modified version of the Δ–Σ hypothesis (González et al., J Exp Anal Behav 113(3):591–608. https://doi.org/10.1002/jeab.595, 2020a) incorporating a contrast-like process between the immediacies to reward signaled by each stimulus accounted well for our findings. Functionally, we argue that a preference for signaled delays hinges on the potential instrumental advantage typically conveyed by information.
Introduction
Under some conditions, animals engage in an unprofitable bargain: While hungry, they trade unusable information for food by preferring an informative but suboptimal option to a non-informative but optimal one. The typical procedure, known as the ‘suboptimal choice task’, is depicted in the top panel of Fig. 1. The animal faces repeated choices between the two options. If it chooses the informative option (cross in the figure), on 20% of the occasions an S+ stimulus is presented, and 10 s later food is delivered; on the remaining 80% of the occasions an S− stimulus is presented, and 10 s later no food is delivered. If the animal chooses the non-informative option instead (circle in the figure), stimulus S1 or S2 follows for 10 s and then food occurs on 50% of the occasions regardless of the stimulus shown. In other words, choice is between an option that signals the trial outcome, but the outcome consists mostly of no food and rarely of a bit of food, and another option that does not signal the outcome, but the outcome consists of significantly more food. Although the non-informative option usually yields more reward (2.5 times more in the figure), human gamblers (Molet et al. 2012), pigeons (e.g., Fortes et al. 2018; Gipson et al. 2009; Smith et al. 2016; Stagner & Zentall 2010), starlings (Vasconcelos et al. 2015), rhesus macaques (Blanchard et al. 2015), and rats (e.g., Ajuwon et al. 2023; Chow et al. 2017; Cunningham and Shahan 2019, 2020) prefer the informative option.
This seemingly paradoxical preference generated an intense empirical and theoretical effort to understand its causes (e.g., Cunningham and Shahan 2018; Daniels and Sanabria 2018; Dunn et al. 2023; González et al. 2020a; McDevitt et al. 2016; Vasconcelos et al. 2018; Zentall 2016). Whatever the mechanism involved, this preference for informative signals when they do not yield any tangible instrumental benefit is reminiscent of the information-seeking hypothesis first suggested in the ‘observing response’ literature. In the original preparation, Wyckoff (1969) presented pigeons with a white key where a mixture of two equiprobable, non-signaled reinforcement schedules alternated unpredictably. One offered food every 30 s (FI 30 s) and the other never produced food (extinction). Importantly, if the pigeon stepped on a pedal, then the white key turned green when the FI 30 s schedule was in effect and red when extinction was in effect. In other words, stepping on the pedal provided information about the state of the world and was labeled the ‘observing response.’ Even though the information could not be used to change the outcome (food or no food), pigeons readily learned to press the pedal (see also Browne & Dinsmoor 1974; Dinsmoor et al. 1972; Mulvaney et al. 1974; Prokasy 1956; for a review, see Dinsmoor 1983). In a related paradigm, rats prefer alternatives wherein forewarning information about impending unavoidable shock is given over alternatives with no such information (e.g., Lockard 1963; Perkins et al. 1963).
The preceding examples suggest that animals may be biased to perform responses that reduce the uncertainty about future events, despite being unable to change the occurrence of these events. If this is a somewhat general proclivity, then it should extend to other biologically relevant situations beyond the presence or absence of food or shock. This bias should emerge in any preparation in which a response produces reliable information about an upcoming biologically relevant event, whether it regards its quantity, delay, probability, or quality.
Bromberg-Martin and Hikosaka (2009; see also Bromberg-Martin and Hikosaka 2011) tackled the issue of magnitude. They modified the suboptimal choice task (cf. top panel of Fig. 1) to study the effect of signaling the magnitude of impending rewards. Using a preparation where both the informative and non-informative options led to a small or large reward with equal probability (i.e., the choices did not affect the rate of reward), they found that, all else equal, macaque monkeys prefer options that signal the size of the upcoming reward over options that do not (see also Laude et al. 2014; Zentall and Stagner 2011).
This study focuses on another dimension of reward: the delay to collect it. Suppose that animals face repeated choices between two options, one informative, the other non-informative. Both options lead to reward, half of the trials after a short delay and the other half after a long delay (see bottom panel of Fig. 1). The difference between the options is that, in the informative option, the delays are perfectly correlated with the terminal stimuli (e.g., Green during the short delay, SS, and Red during the long delay, SL), whereas in the non-informative option, the delays are not correlated with such stimuli (e.g., Yellow, S1, and Blue, S2, each equally likely during short and long delays). The two options yield the same rate of reward, and choices do not affect the obtained rate. Will animals prefer signaled over unsignaled delays to reward under these conditions?
A modified version of the Δ–Σ hypothesis provides a prediction: animals should not only prefer signaled to unsignaled delays, but the strength of that preference should also depend on the disparity between the possible delays. To see why, consider first the suboptimal choice task (top panel of Fig. 1), whose standard result the hypothesis was developed to explain (see González et al. 2020a, b for further details). According to the model, the value of an alternative depends on two higher-order variables: (1) Delta (∆), the difference between the reward probabilities associated with the terminal-link stimuli within each option (a contrast-like effect), and (2) Sigma (∑), the overall probability of reward associated with each option. These two variables determine the overall value of an option according to the equation
with the scaling parameters c and β both > 0, and i standing for the “info” or “noninfo” option. Preference for the informative option can be estimated by Luce’s ratio (1959), Vinfo/(Vinfo + Vnoninfo), which simplifies to
Equation 2 predicts indifference between the options in the delay-based version of the task (cf. bottom panel of Fig. 1). This is because the probability of reward with each terminal stimulus is 1.0 (thus, both Δinfo and Δnoninfo are zero) and ∑ is 1.0 for both options. However, even though the contrast between the terminal probabilities of reward is indeed null, the delay-based task introduces a new source of contrast, that between the reward immediacies experienced with each terminal stimulus of each option. To illustrate, suppose that the short delay is 5 s and the long delay is 20 s. Then the ∆ of immediacies in the informative option would be 0.15 = [(1/5) − (1/20)] and the ∆ of immediacies in the non-informative option would be 0 because the average time to food is 12.5 s in the presence of both S1 and S2.
Thus, given this new source of contrast and given that ∑ remains the same for both options, Eq. 2 reduces to a one-parameter equation,
where ds and dl correspond to the short and long delays, respectively. If we let dl/ds = r and ds + dl = S, Eq. 3 becomes
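The displayed equations referred to as Eqs. 1 to 4 do not appear in this text. The block below is a reconstruction rather than a copy of the original equations. It assumes that contrast enters value exponentially and that Σ enters as a power function, which is one form consistent with the definitions and predictions stated in the surrounding paragraphs (two positive scaling parameters c and β, Luce's ratio, indifference when both Δ values are zero and both Σ values are equal, and a single remaining parameter once Σ cancels). The symbol P_info for the predicted preference is introduced here for illustration.

```latex
% Reconstruction under an assumed functional form; not copied from the source.
\begin{align}
V_i &= \Sigma_i^{\,c}\, e^{\beta \Delta_i}, \qquad c,\ \beta > 0 \tag{1}\\[4pt]
\frac{V_{\text{info}}}{V_{\text{info}} + V_{\text{noninfo}}}
  &= \frac{1}{1 + \left(\dfrac{\Sigma_{\text{noninfo}}}{\Sigma_{\text{info}}}\right)^{c}
     e^{-\beta\,(\Delta_{\text{info}} - \Delta_{\text{noninfo}})}} \tag{2}\\[4pt]
P_{\text{info}} &= \frac{1}{1 + \exp\!\left[-\beta\left(\dfrac{1}{d_s} - \dfrac{1}{d_l}\right)\right]} \tag{3}\\[4pt]
P_{\text{info}} &= \frac{1}{1 + \exp\!\left[-\dfrac{\beta}{S}\cdot\dfrac{r^{2} - 1}{r}\right]} \tag{4}
\end{align}
```

Whatever the exact form of Eq. 3, the step to Eq. 4 rests on the substitutions d_s = S/(1 + r) and d_l = rS/(1 + r), which give 1/d_s − 1/d_l = (r² − 1)/(rS). For the 5-s and 20-s example (r = 4, S = 25), this yields 15/100 = 0.15, the Δ of immediacies computed above.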
Figure 2 shows preference for the informative option as predicted by Eq. 4 when β/S = 0.5, 1.0, and 2.0. Two predictions are noteworthy: (1) When the two delays are equal (dl/ds = 1), the model predicts strict indifference between the two options regardless of β/S, and (2) as the delays become more dissimilar (i.e., as dl/ds increases while the sum remains constant), preference for the informative option should increase in a logistic-like way modulated by the specific β/S.
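The two predictions can be checked numerically with a short sketch. It assumes a logistic link between the Δ of immediacies and preference, which is an assumption about the form of Eq. 4 suggested by the "logistic-like" description and not a confirmed transcription of it. The function names are hypothetical.

```python
import math


def immediacy_contrast(d_short, d_long):
    """Delta of immediacies in the informative option: 1/d_short - 1/d_long."""
    return 1.0 / d_short - 1.0 / d_long


def predicted_preference(r, beta_over_s):
    """Predicted preference for the informative option.

    ASSUMED form (logistic in the immediacy contrast):
        P = 1 / (1 + exp(-(beta/S) * (r**2 - 1) / r)),
    where r = d_long / d_short and S = d_short + d_long.
    """
    return 1.0 / (1.0 + math.exp(-beta_over_s * (r * r - 1.0) / r))


# Tabulate the assumed curve for the three beta/S values mentioned for Fig. 2.
for beta_over_s in (0.5, 1.0, 2.0):
    row = [round(predicted_preference(r, beta_over_s), 3) for r in (1.0, 1.5, 2.0, 4.0)]
    print(f"beta/S = {beta_over_s}: {row}")
```

Under this assumed form, the exponent is zero at r = 1, so preference is exactly 0.5 for any β/S, and the exponent grows with r, so preference rises toward 1.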
A functional ecologically-based account allows for similar predictions. Because, under natural circumstances, information typically comes with instrumental value, we hypothesize that a bias for information may have been sculpted by natural selection across generations. After all, foraging animals continuously face cues associated with different and often uncertain delays to food. Being able to anticipate the time to food in such a temporally fluctuating environment may afford an instrumental advantage that could be used to exploit the often time-limited foraging opportunities available. For example, an animal may be willing to accept a ‘short’ delay to capture a given prey but reject the opportunity to pursue a prey when the expected delay exceeds a given threshold of acceptability. By using this decision rule, the animal would concentrate efforts on the most profitable prey and thus improve the overall rate of return. Thus, if this information bias is indeed built-in, one can anticipate a preference for signaled over unsignaled delays. Furthermore, one can also argue that such a preference should be modulated by how different the delays are. To exemplify, imagine a forager facing an option yielding food, sometimes after a short delay, sometimes after a long one. When the two possible delays are considerably different (e.g., 5 s and 20 s) compared to similar (e.g., 10 s and 15 s), information regarding the length of the current delay should be more valuable because the benefits of rejecting long delays would have a larger impact on the average rate of capture. In other words, because the putative value of information should increase as the possible delays become more dissimilar, so should the preference for signaled delays.
Some have also argued that information is reinforcing per se. The claim of what could be labeled as the intrinsic value hypothesis is that animals attach inherent value to information about future outcomes independently of any instrumental purpose (e.g., Bennett et al. 2016; Eliaz and Schotter 2007; Grant et al. 1998; Kreps and Porteus 1978). In fact, some findings suggest that the value of non-instrumental information shares with the value of primary reward a common neural code (Bromberg-Martin and Hikosaka 2009; Kobayashi and Hsu 2019). Nonetheless, the question of why and how information became intrinsically valuable remains unanswered. Most important, this proposal predicts a preference for signaled delays but not its modulation by the similarity between possible delays because the value of information is not driven by any instrumental edge.
The evidence to date suggests that animals may prefer signaled to unsignaled delays, at least when the possible delays are very different. Bower et al. (1966) used a procedure similar to the one depicted in the bottom panel of Fig. 1 and found a preference for signaled delays when the long delay was four times longer than the short delay. Frankel and Vom Saal (1976) reported even stronger preferences with a threefold difference between delays. Similar findings have also been reported as a preference for multiple over mixed schedules of reinforcement (e.g., Alsop and Davison 1986; Davison 1972; Fantino 1969; Hursh and Fantino 1974; Richards 1981).
To our knowledge, the effect of the similarity between delays has not been previously studied. Both the modified Δ–Σ hypothesis and an ecologically-driven analysis suggest that preference for signaled delays should be modulated by the similarity between delays—the more dissimilar the delays, the stronger the preference. On the other hand, the intrinsic value hypothesis predicts no modulation of preference by delay similarity. The experiment that follows puts the issue to the test. The procedure followed closely the one depicted in the bottom panel of Fig. 1 with options differing only in that one had distinctive cues signaling the short and the long delays, while the other had ambiguous cues (short and long delays were equally likely in their presence). Across conditions, we varied how different the delays were (r = dl/ds) while keeping their total duration (S = ds + dl) constant.
Method
Subjects
Seven pigeons (Columba livia), between 80 and 85% of their free-feeding weights, were used in this experiment. They were individually housed in a temperature-controlled room (around 21 °C) on a 13:11 h light/dark cycle (lights on at 8:00). Pigeons had previous experience with the suboptimal choice procedure described in Fortes et al. (2016), including the stimuli used in this experiment. We implemented a corrective procedure during Pretraining (see below) to reduce potential biases. Grit and water were always available in the home cage. The pigeons were cared for according to the animal care guidelines of the Directorate-General for Food and Veterinary (DGAV), the Portuguese national authority for animal health, and the University of Minho. All experimental procedures were conducted in agreement with European (Directive 2010/63/EU) and Portuguese law (Ordinance 1005/92 of October 23), and were approved by DGAV (Authorization #024946).
Apparatus
Three Med Associates operant boxes for pigeons were used. The boxes were 28.5-cm high, 24-cm long, and 30-cm wide. Each box was enclosed in a sound-attenuating chamber, equipped with a fan that circulated air and masked extraneous noises. The response panel had three circular keys, each 2.5 cm in diameter, and placed 6 cm apart (center-to-center), with the lowest edge 21 cm above the floor grid. The response panel also included a 6-cm wide × 5-cm high opening, centered 4 cm above the floor grid. The pigeon had access to food when the opening was illuminated by a 1.1-W light and the food hopper was raised. In the panel opposite to the response panel, a houselight (2.8 W) centrally located 23 cm above the floor illuminated the whole box. A personal computer with ABET II software (Lafayette Instruments) controlled the events and recorded data. Communications with the experimental chambers used the Whisker interface (Cardinal and Aitken 2010).
Procedure
Pretraining. Pigeons were initially trained on different Fixed-Ratio (FR) schedules to reduce any carryover effects from their previous experiments. Each color (red, green, yellow, and blue) and symbol (cross and circle) was presented eight and four times per session, respectively. Color stimuli were always shown on the center key and the symbol stimuli were shown equally often on each side key. Once the FR schedule was completed, the pecked key was turned off and the feeder was raised for 3–5 s adjusted for each pigeon to maintain its body weight. After food, a 10 s Inter-Trial Interval (ITI) followed with the house light on. Pigeons were trained for two sessions with a FR1 schedule and for one further session with a FR5 schedule in effect during the first half of the session and a FR10 in effect during the second half.
Experimental task
After pretraining, pigeons were exposed to the procedure depicted in the bottom panel of Fig. 1. Each session comprised 96 trials: 32 choice trials and 64 forced trials. During the initial link of a choice trial, the left and right keys were illuminated with symbols (Cross and Circle). A single peck on either side key turned off both keys and gave way to the corresponding terminal stimuli. The terminal delay to food in both alternatives was either short or long, each occurring on 50% of the trials. When the informative option was chosen, short delays elapsed with the center key illuminated with one color stimulus (SS), whereas long delays elapsed with the center key illuminated with a different color (SL). When the non-informative option was chosen, short and long delays to food elapsed with the center key illuminated with one of two stimuli (S1 or S2), each occurring randomly on 50% of the occasions. Forced trials had the same structure as choice trials, except that only one of the alternatives was presented at trial onset. They were pseudo-randomly distributed such that, of the 32 trials with each option, 16 occurred on the left-side key and 16 on the right-side key. Choice trials were also pseudo-randomly arranged such that the left vs. right location of the options was balanced. The initial-link symbols and the terminal-link colors were counterbalanced across pigeons. The time between the presentation of one (forced trials) or two (choice trials) options and the peck at one of the side keys defined the latency to respond on each trial.
Each pigeon went through two baseline and three experimental conditions differing in the long/short delay ratio. In baseline, all the delays to food were set at 12.5 s (long/short ratio = 1.0). In the other conditions, the short and long delays became increasingly different, but the average delay remained at 12.5 s. The delays used were 10 and 15 s (long/short ratio = 1.5), 7.5 and 17.5 s (long/short ratio = 2.3), and 5 and 20 s (long/short ratio = 4.0). All pigeons started and finished with the 1.0 ratio (baseline), with the order of the remaining conditions counterbalanced across pigeons. Table 1 shows the order experienced by each pigeon. Each condition lasted for a minimum of 10 sessions and remained in effect until stability was reached. Stability was assumed when during the last three sessions (a) there was no strictly increasing or strictly decreasing trend in preference and (b) the range of choice proportions was at most 15%.
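The delay pairs and the stability rule described above can be expressed as a minimal sketch. It assumes that the 15% range criterion means 0.15 in choice-proportion units (15 percentage points), which the text does not state explicitly. The function names are hypothetical and this is not the authors' software.

```python
def delays_from_ratio(r, total=25.0):
    """Return (short, long) delays in seconds for a long/short ratio r.

    Solves d_long / d_short = r and d_short + d_long = total.
    """
    d_short = total / (1.0 + r)
    return d_short, total - d_short


def is_stable(last_three, max_range=0.15):
    """Apply both stability criteria to three choice proportions.

    ASSUMPTION: the 15% criterion is read as a range of 0.15 in proportion units.
    """
    a, b, c = last_three
    strictly_monotonic = (a < b < c) or (a > b > c)
    within_range = max(last_three) - min(last_three) <= max_range
    return (not strictly_monotonic) and within_range


def condition_complete(choice_props, min_sessions=10):
    """True once the minimum number of sessions has run and the last three are stable."""
    return len(choice_props) >= min_sessions and is_stable(choice_props[-3:])


# Delay pairs for the four ratios, holding the sum of the delays at 25 s.
for r in (1.0, 1.5, 7.0 / 3.0, 4.0):
    d_short, d_long = delays_from_ratio(r)
    print(f"ratio {r:.2f}: short {d_short:.1f} s, long {d_long:.1f} s")
```

Holding the sum at 25 s is equivalent to holding the average delay at 12.5 s, and the ratio reported as 2.3 corresponds to 17.5/7.5 = 7/3.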
Data analysis
Our primary dependent measures were the proportion of choices for the informative option, the latencies to respond to each option, and the response rate during the four terminal stimuli, all during the last three sessions of each condition. Prior to analysis, latency data were successfully normalized using a natural log transformation. Violations of sphericity were corrected by the Greenhouse–Geisser method. A Type I error rate of 0.05 was adopted for all statistical comparisons.
Results
On average, pigeons took 16 sessions to complete each long/short ratio (range 15–25). Table 1 shows the number of sessions per ratio for each bird. A repeated-measures ANOVA revealed that the amount of training to reach stability did not vary significantly across conditions [F(4,24) = 2.09, p = 0.114].
The symbols in Fig. 3 show preference for the informative option as a function of the delay ratio for each subject. The lower right panel shows the average preference across birds. As expected, pigeons were indifferent between the options when the long/short ratio = 1.0, both in the first and second baseline conditions (filled and unfilled dots, respectively): the average preferences for the informative option (± SEM) were 0.47 (± 0.047) and 0.53 (± 0.038), respectively. However, with the 1.5, 2.3, and 4.0 long/short ratios, the average preference increased to 0.74 (± 0.061), 0.78 (± 0.068), and 0.88 (± 0.044), respectively. These preferences did not differ significantly from chance when the long/short ratio = 1.0 [largest t(6) = 0.70, p = 0.513], but were significantly above chance in the remaining long/short ratios [smallest t(6) = 3.97, p = 0.007, d = 1.498]. Despite some variability between and within pigeons, preference for the informative option increased with the long/short ratio. A repeated-measures ANOVA confirmed that this trend was statistically significant [F(3,18) = 16.51, p < 0.001, η² = 0.733].
The solid lines on each bird’s panel of Fig. 3 show the best-fitting predictions of the modified Δ–Σ hypothesis obtained by the least-squares method. The best-fitting parameter, β, for each pigeon is shown in Table 2. The solid line in the average panel of Fig. 3 (bottom right) shows the average of the individual fits. Overall, the model captures well the general increasing trends and the average fit accurately describes the average preference.
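A one-parameter least-squares fit of this kind can be sketched as follows. The sketch assumes the logistic form for the model's prediction, uses a golden-section search as the minimizer (the text does not say which optimizer was used), and checks itself on synthetic choice proportions generated from the assumed model, not on the birds' data. All function names are hypothetical.

```python
import math


def predicted_preference(r, beta, total=25.0):
    """ASSUMED model form: logistic in (beta / S) * (r**2 - 1) / r, with S = total."""
    return 1.0 / (1.0 + math.exp(-(beta / total) * (r * r - 1.0) / r))


def sse(beta, ratios, prefs, total=25.0):
    """Sum of squared errors between observed and predicted preferences."""
    return sum((p - predicted_preference(r, beta, total)) ** 2
               for r, p in zip(ratios, prefs))


def fit_beta(ratios, prefs, total=25.0, lo=0.0, hi=100.0, tol=1e-6):
    """Golden-section search for the beta in [lo, hi] that minimizes the SSE.

    Assumes the SSE has a single minimum inside the bracket.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c, ratios, prefs, total) < sse(d, ratios, prefs, total):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


# Self-check on SYNTHETIC data produced by the assumed model with beta = 20.
ratios = [1.0, 1.5, 7.0 / 3.0, 4.0]
synthetic = [predicted_preference(r, 20.0) for r in ratios]
print(round(fit_beta(ratios, synthetic), 3))
```

Because S was fixed at 25 s in this experiment, fitting β and fitting β/S are equivalent up to a constant factor. The baseline ratio of 1.0 does not constrain β, since the assumed model predicts 0.5 there for any value of the parameter.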
Previous research has shown that latencies to respond can be used as a sensitive metric of value and preference: Organisms usually respond faster to preferred than to non-preferred alternatives when presented individually (e.g., Aw et al. 2012; Lagorio and Hackenberg 2012; Macías et al. 2021; Monteiro et al. 2020; Reboreda and Kacelnik 1991; Shull et al. 1990; Vasconcelos et al. 2013; for reviews see Kacelnik et al. 2011, 2023). We analyzed the latencies to respond to the initial link during forced trials in each condition in search of converging evidence regarding the effect of the long/short ratio on preference. As expected from choice data, latencies to accept each option were similar in both baselines (long/short ratio = 1.0) and were thus averaged into a single condition [largest t(6) = 1.58, p = 0.191]. Figure 4 shows the average of the median latencies to respond to each option on each long/short ratio. As the long/short ratio increased, latencies to accept the options diverged, with animals expressing longer latencies to accept the non-preferred, non-informative option than to accept the preferred, informative option. These visual impressions were confirmed by a two-way repeated-measures ANOVA with option (informative vs. non-informative) and the long/short ratio as factors, run on log median latencies. Both the main effects of option and ratio were significant [F(1,6) = 11.09, p = 0.016, ηp² = 0.649; Greenhouse–Geisser corrected F(1.78,10.66) = 10.53, p = 0.004, ηp² = 0.637, respectively], as was their interaction [F(3,18) = 7.02, p = 0.003, ηp² = 0.539]. The interaction indicates that latencies to accept each option progressed differently as the dl/ds ratio increased: while latencies to accept the non-informative option remained roughly constant (the decrease is non-significant), latencies to accept the informative option decreased significantly, as expected, because the larger the ratio, the more attractive this option becomes.
Overall, the observed differences in latencies to accept each option are consistent with the preference data.
Lastly, we looked at response rates at the four terminal stimuli (SS, SL, S1, and S2). Four of the seven pigeons rarely pecked at any of the terminal stimuli (on average, fewer than 2 pecks per trial across the whole delay). For the remaining three pigeons, when the delays were unequal (dl/ds > 1), response rate at SS was consistently higher than at SL. The response rates to S1 and S2 were roughly similar and, when averaged, tended to fall between the rates for SS and SL. These findings suggest that these pigeons learned the delay (or mixture of delays) signaled by each terminal stimulus and modulated their response rate accordingly.
Discussion
This experiment analyzed the effect of signaling different delays to food on choice. Pigeons chose between two alternatives, each leading to food after an equally likely short or long delay. The alternatives differed in that, for one of them, the short and long delays were associated with distinctive cues (informative option) and, for the other one, the delays were not associated with distinctive cues (non-informative option). We found that pigeons reliably preferred the informative alternative when the signaled delays differed, even though both alternatives led to the same overall rate of reward. An examination of response latencies as a measure of value buttressed our interpretation, as pigeons expressed significantly shorter latencies to respond at the informative than at the non-informative option when the long/short ratio > 1.0. An analysis of the response rates at the terminal stimuli, although limited to three subjects, confirmed that the cues allowed pigeons to identify the ongoing delay in the informative but not in the non-informative option. Importantly, we also found that the preference for signaled delays increased in an orderly fashion with the ratio of the delays.
These findings are consistent with those reported by Bower et al. (1966) and Frankel and Vom Saal (1976). The bottom right panel of Fig. 3 shows their average findings. Using a long/short ratio of 4 (FI 40 vs. FI 10), Bower et al. (1966) reported an average preference very similar to the one we found (cf. unfilled diamond). Frankel and Vom Saal, on the other hand, reported an even stronger preference for the signaled option (cf. filled diamond) with a smaller long/short ratio of 3 (FI 45 vs. FI 15). The reasons for this somewhat higher preference relative to the ones reported here and by Bower et al. remain unclear. Perhaps the contrast between the successful manipulation and the previous five conditions where no preference emerged inflated preference to values that would not have been observed otherwise. This reasoning is nonetheless speculative and complicated by cross-experiment comparisons.
The observed modulation of preference by the long/short ratio is inconsistent with the intrinsic value hypothesis (cf. Bennett et al. 2016; Eliaz and Schotter 2007; Grant et al. 1998; Kreps and Porteus 1978). Were the value of information intrinsic, one would expect similar preferences for the informative option whenever the long/short ratio > 1. The increase in preference for the informative option as the long/short ratio increased suggests, on the contrary, that the value of information about future outcomes is not intrinsic; it depends on the potential instrumental advantage it conveys.
On the other hand, our findings are consistent with the predictions of the modified Δ–Σ hypothesis and the ecologically-based account. Regarding the Δ–Σ hypothesis, despite some variability within and between subjects, the average of the individual fits captured well our average findings (cf. bottom right panel of Fig. 3). The new source of contrast—that between the immediacy of reward experienced with each terminal stimulus within each option—proved pivotal both in the design of the experiment and the interpretation of results. Our findings thus buttress the model’s assumption that, all else equal, the value of an alternative varies directly with the Δ of immediacies: Greater ratios between the two terminal stimuli immediacies, for the same delay average, mean greater value of the alternative (see Fig. 2). The model also predicts that, for a constant ratio, preference for the informative option should decrease with the average (or sum), a novel prediction that remains to be tested. Furthermore, our results encourage the extension of the model to other dimensions of reward besides probability and delay, dimensions that may be additional sources of contrast.
Ecologically, an increased preference for signaled delays as the long/short ratio increases tracks the putative increase in information utility: the benefit of avoiding long delays increases with larger long/short ratios. Under natural circumstances, animals can use information to adjust their behavior and thus abandon patches when the rate of return falls below some threshold. In practice, when foraging, an animal can pursue a prey if the available cues signal a short waiting time for food or continue searching if the available cues signal delays beyond a threshold of acceptability (e.g., Charnov 1976; Parker and Stuart 1976; for a review, see Vasconcelos et al. 2017). In the present study, pigeons learned the contingencies associated with each stimulus, used the information so conveyed to guide preference, but did not collect the benefit of avoiding long delays: they had to endure all the waiting times in both alternatives (i.e., pay the opportunity cost). In other words, even though the foraging mechanisms may have been sculpted to gather and use information, the information gathered in the artificial laboratory preparation cannot be used; the domain of selection mismatched the domain of testing (Fortes et al. 2016; Stevens and Stephens 2010). Thus, pigeons behaved in the experimental preparation as if information were usable; they seemingly deployed decision mechanisms evolved to deal with the statistical properties of natural environments, not with the artificial preparations of unavoidable opportunity costs.
More generally, a preference for options that signal what event will occur and when may impart somewhat unappreciated benefits, particularly in the form of adaptive anticipatory responses. Evidence for this claim has been accumulating in a variety of domains, including aggressive behavior (e.g., Hollis 1984, 1990; Hollis et al. 1995), aversive conditioning (e.g., Blustein et al. 1997; Fanselow and Baackes 1982; Mongeluzi et al. 1996), drug tolerance (e.g., Grisel et al. 1994; Larson and Siegel 1998; Siegel 1975; Siegel et al. 2000), feeding and digestion (e.g., Woods 1991; Woods and Ramsay 2000; Woods and Strubbe 1994; Zamble 1973), maternal behavior (e.g., Tancin et al. 2001), and sexual behavior (e.g., Domjan et al. 1986; Hollis et al. 1989, 1997; Zamble et al. 1985), among others. In a particularly illustrative example, Hollis et al. (1997) allowed male blue gourami to copulate and tend eggs either after being exposed to a sexually conditioned stimulus or without such prior exposure. Males with signaled sexual encounters showed reduced aggressive behavior toward the female, shorter spawning latencies, and increased nest-building behavior, among others. Notably, compared to unsignaled encounters, the signaled sexual encounters resulted in more than 10 times as many offspring.
We argue that the preference for informative options, or a general bias for information, is consistent with the premise that the function of learning mechanisms is to enhance the organisms’ interaction with biologically relevant events rather than to acquire arbitrary relations (e.g., Domjan 2005; Hollis 1997; Shettleworth 1994), as an excessive focus on responses to antecedents of biologically relevant events sometimes suggests.
To conclude, in this experiment, we showed that the preference for informative over non-informative options extends naturally to a situation where information is about delays, not probabilities of reward. We found that preference for signaled delays varied with the ratio between the long and short delays within alternatives. From a normative standpoint, this preference and trend may occur because animals use evolved mechanisms shaped to deal with usable information: they use information to maximize the rate of food intake by avoiding, for example, relatively delayed opportunities (in comparison with the background) and by deploying anticipatory responses that improve their interaction with biologically relevant events. Although this strategy cannot be implemented in the experimental condition, we believe it reflects the conditions where it evolved and where most, if not all, information can be instrumentally used. Although this line of reasoning permits a better understanding of the selective pressures shaping behavior and the derivation of optimal policies, it cannot pinpoint the specific behavior-generator. To that end, we resorted to a modified version of the Δ–Σ hypothesis incorporating a new type of contrast: that between reward immediacies. The modified version accounted well for our data. Together, these approaches permit a first glimpse at the ultimate and proximate causes of preference for signaled delays.
Availability of data and materials
Available on request.
References
Ajuwon V, Ojeda A, Murphy RA, Monteiro T, Kacelnik A (2023) Paradoxical choice and the reinforcing value of information. Anim Cogn 26(2):623–637. https://doi.org/10.1007/S10071-022-01698-2
Alsop B, Davison M (1986) Preference for multiple versus mixed schedules of reinforcement. J Exp Anal Behav 45(1):33–45. https://doi.org/10.1901/jeab.1986.45-33
Aw J, Monteiro T, Vasconcelos M, Kacelnik A (2012) Cognitive mechanisms of risky choice: is there an evaluation cost? Behav Proc 89(2):95–103. https://doi.org/10.1016/j.beproc.2011.09.007
Bennett D, Bode S, Brydevall M, Warren H, Murawski C (2016) Intrinsic valuation of information in decision making under uncertainty. PLoS Comput Biol 12(7):e1005020. https://doi.org/10.1371/journal.pcbi.1005020
Blanchard TC, Hayden BY, Bromberg-Martin ES (2015) Orbitofrontal cortex uses distinct codes for different choice attributes in decisions motivated by curiosity. Neuron 85(3):602–614. https://doi.org/10.1016/j.neuron.2014.12.050
Blustein JE, Ciccolone L, Bersh PJ (1997) Evidence that adaptation to cold water swim-induced analgesia is a learned response. Physiol Behav 63(1):147–150. https://doi.org/10.1016/S0031-9384(97)00382-X
Bower G, McLean J, Meacham J (1966) Value of knowing when reinforcement is due. J Comp Physiol Psychol 62(2):184–192. https://doi.org/10.1037/h0023682
Bromberg-Martin ES, Hikosaka O (2009) Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron 63(1):119–126. https://doi.org/10.1016/j.neuron.2009.06.009
Bromberg-Martin ES, Hikosaka O (2011) Lateral habenula neurons signal errors in the prediction of reward information. Nat Neurosci 14(9):1209–1216. https://doi.org/10.1038/nn.2902
Browne MP, Dinsmoor JA (1974) Wyckoff’s observing response: pigeons learn to observe stimuli for free food but not stimuli for extinction. Learn Motivat 5(2):165–173. https://doi.org/10.1016/0023-9690(74)90023-X
Cardinal RN, Aitken MRF (2010) Whisker: a client-server high-performance multimedia research control system. Behav Res Methods 42(4):1059–1071. https://doi.org/10.3758/BRM.42.4.1059
Charnov EL (1976) Optimal foraging: the marginal value theorem. Theor Popul Biol 9(2):129–136. https://doi.org/10.1016/0040-5809(76)90040-x
Chow JJ, Smith AP, Wilson AG, Zentall TR, Beckmann JS (2017) Suboptimal choice in rats: incentive salience attribution promotes maladaptive decision-making. Behav Brain Res 320:244–254. https://doi.org/10.1016/j.bbr.2016.12.013
Cunningham PJ, Shahan TA (2018) Suboptimal choice, reward-predictive signals, and temporal information. J Exp Psychol: Anim Learn Cogn 44(1):1–22. https://doi.org/10.1037/xan0000160
Cunningham PJ, Shahan TA (2019) Rats engage in suboptimal choice when the delay to food is sufficiently long. J Exp Psychol: Anim Learn Cogn 45(3):301–310. https://doi.org/10.1037/xan0000211
Cunningham PJ, Shahan TA (2020) Delays to food-predictive stimuli do not affect suboptimal choice in rats. J Exp Psychol: Anim Learn Cogn 46:385–397. https://doi.org/10.1037/xan0000245
Daniels CW, Sanabria F (2018) An associability decay model of paradoxical choice. J Exp Psychol: Anim Learn Cogn 44(3):258–271. https://doi.org/10.1037/xan0000179
Davison MC (1972) Preference for mixed-interval versus fixed-interval schedules: number of component intervals. J Exp Anal Behav 17:169–176. https://doi.org/10.1901/jeab.1972.17-169
Dinsmoor JA (1983) Observing and conditioned reinforcement. Behav Brain Sci 6(4):693–704. https://doi.org/10.1017/S0140525X00017969
Dinsmoor JA, Browne MP, Lawrence CE (1972) A test of the negative discriminative stimulus as a reinforcer of observing. J Exp Anal Behav 18(1):79–85. https://doi.org/10.1901/jeab.1972.18-79
Domjan M (2005) Pavlovian conditioning: a functional perspective. Annu Rev Psychol 56(1):179–206. https://doi.org/10.1146/annurev.psych.55.090902.141409
Domjan M, Lyons R, North NC, Bruell J (1986) Sexual Pavlovian conditioned approach behavior in male Japanese quail (Coturnix coturnix japonica). J Comp Psychol 100:413–421. https://doi.org/10.1037/0735-7036.100.4.413
Dunn RM, Pisklak JM, McDevitt MA, Spetch ML (2023) Suboptimal choice: a review and quantification of the signal for good news (SiGN) model. Psychol Rev. https://doi.org/10.1037/rev0000416
Eliaz K, Schotter A (2007) Experimental testing of intrinsic preferences for noninstrumental information. Am Econ Rev 97(2):166–169. https://doi.org/10.1257/aer.97.2.166
Fanselow MS, Baackes MP (1982) Conditioned fear-induced opiate analgesia on the Formalin test: evidence for two aversive motivational systems. Learn Motivat 13(2):200–221. https://doi.org/10.1016/0023-9690(82)90021-2
Fantino E (1969) Conditioned reinforcement, choice, and the psychological distance to reward. In: Hendry DP (ed) Conditioned reinforcement. The Dorsey Press, pp 163–191
Fortes I, Vasconcelos M, Machado A (2016) Testing the boundaries of “paradoxical” predictions: pigeons do disregard bad news. J Exp Psychol Anim Learn Cogn 42(4):336–346. https://doi.org/10.1037/xan0000114
Fortes I, Pinto C, Machado A, Vasconcelos M (2018) The paradoxical effect of low reward probabilities in suboptimal choice. J Exp Psychol Anim Learn Cogn 44(2):180–193. https://doi.org/10.1037/xan0000165
Frankel PW, Vom Saal W (1976) Preference between fixed-interval and variable-interval schedules of reinforcement: separate roles of temporal scaling and predictability. Anim Learn Behav 4(1):71–76. https://doi.org/10.3758/BF03211990
Gipson CD, Alessandri JJD, Miller HC, Zentall TR (2009) Preference for 50% reinforcement over 75% reinforcement by pigeons. Learn Behav 37(4):289–298. https://doi.org/10.3758/lb.37.4.289
González VV, Macías A, Machado A, Vasconcelos M (2020a) The Δ–∑ hypothesis: How contrast and reinforcement rate combine to generate suboptimal choice. J Exp Anal Behav 113(3):591–608. https://doi.org/10.1002/jeab.595
González VV, Macías A, Machado A, Vasconcelos M (2020b) Testing the Δ-∑ hypothesis in the suboptimal choice task: same delta with different probabilities of reinforcement. J Exp Anal Behav 114(2):233–247. https://doi.org/10.1002/jeab.621
Grant S, Kajii A, Polak B (1998) Intrinsic preference for information. J Econ Theory 83(2):233–259. https://doi.org/10.1006/jeth.1996.2458
Grisel JE, Wiertelak EP, Watkins LR, Maier SF (1994) Route of morphine administration modulates conditioned analgesic tolerance and hyperalgesia. Pharmacol Biochem Behav 49(4):1029–1035. https://doi.org/10.1016/0091-3057(94)90260-7
Hollis KL (1984) The biological function of Pavlovian conditioning: the best defense is a good offense. J Exp Psychol Anim Behav Process 10:413–425. https://doi.org/10.1037/0097-7403.10.4.413
Hollis KL (1990) The role of Pavlovian conditioning in territorial aggression and reproduction. In: Contemporary issues in comparative psychology. Sinauer Associates, pp. 197–219. https://doi.org/10.1037/11525-009
Hollis KL (1997) Contemporary research on Pavlovian conditioning: a “new” functional analysis. Am Psychol 52:956–965. https://doi.org/10.1037/0003-066X.52.9.956
Hollis KL, Cadieux EL, Colbert MM (1989) The biological function of Pavlovian conditioning: a mechanism for mating success in the blue gourami (Trichogaster trichopterus). J Comp Psychol 103:115–121. https://doi.org/10.1037/0735-7036.103.2.115
Hollis KL, Dumas MJ, Singh P, Fackelman P (1995) Pavlovian conditioning of aggressive behavior in blue gourami fish (Trichogaster trichopterus): winners become winners and losers stay losers. J Comp Psychol 109:123–133. https://doi.org/10.1037/0735-7036.109.2.123
Hollis KL, Pharr VL, Dumas MJ, Britton GB, Field J (1997) Classical conditioning provides paternity advantage for territorial male blue gouramis (Trichogaster trichopterus). J Comp Psychol 111:219–225. https://doi.org/10.1037/0735-7036.111.3.219
Hursh SR, Fantino E (1974) An appraisal of preference for multiple versus mixed schedules. J Exp Anal Behav 22(1):31–38. https://doi.org/10.1901/jeab.1974.22-31
Kacelnik A, Vasconcelos M, Monteiro T, Aw J (2011) Darwin’s “tug-of-war” vs. starlings’ “horse-racing”: how adaptations for sequential encounters drive simultaneous choice. Behav Ecol Sociobiol 65(3):547–558. https://doi.org/10.1007/S00265-010-1101-2
Kacelnik A, Vasconcelos M, Monteiro T (2023) Testing cognitive models of decision-making: selected studies with starlings. Anim Cogn 26(1):117–127. https://doi.org/10.1007/S10071-022-01723-4
Kobayashi K, Hsu M (2019) Common neural code for reward and information value. Proc Natl Acad Sci USA 116(26):13061–13066. https://doi.org/10.1073/pnas.1820145116
Kreps DM, Porteus EL (1978) Temporal resolution of uncertainty and dynamic choice theory. Econometrica 46(1):185–200. https://doi.org/10.2307/1913656
Lagorio CH, Hackenberg TD (2012) Risky choice in pigeons: preference for amount variability using a token-reinforcement system. J Exp Anal Behav 98(2):139–154. https://doi.org/10.1901/jeab.2012.98-139
Larson SJ, Siegel S (1998) Learning and tolerance to the ataxic effect of ethanol. Pharmacol Biochem Behav 61:131–142. https://doi.org/10.1016/S0091-3057(98)00072-0
Laude JR, Stagner JP, Zentall TR (2014) Suboptimal choice by pigeons may result from the diminishing effect of nonreinforcement. J Exp Psychol Anim Learn Cogn 40(1):12–21. https://doi.org/10.1037/xan0000010
Lockard JS (1963) Choice of a warning signal or no warning signal in an unavoidable shock situation. J Comp Physiol Psychol 56:526–530. https://doi.org/10.1037/h0041552
Luce RD (1959) Individual choice behavior. Wiley, New York
Macías A, González VV, Machado A, Vasconcelos M (2021) The functional equivalence of two variants of the suboptimal choice task: choice proportion and response latency as measures of value. Anim Cogn 24(1):85–98. https://doi.org/10.1007/S10071-020-01418-8
McDevitt MA, Dunn RM, Spetch ML, Ludvig EA (2016) When good news leads to bad choices. J Exp Anal Behav 105(1):23–40. https://doi.org/10.1002/jeab.192
Molet M, Miller HC, Laude JR, Kirk C, Manning B, Zentall TR (2012) Decision making by humans in a behavioral task: do humans, like pigeons, show suboptimal choice? Learn Behav 40(4):439–447. https://doi.org/10.3758/S13420-012-0065-7
Mongeluzi DL, Rosellini RA, Caldarone BJ, Stock HS, Abrahamsen GC (1996) Pavlovian aversive context conditioning using carbon dioxide as the unconditional stimulus. J Exp Psychol Anim Behav Process 22:244–257. https://doi.org/10.1037/0097-7403.22.3.244
Monteiro T, Vasconcelos M, Kacelnik A (2020) Choosing fast and simply: construction of preferences by starlings through parallel option valuation. PLoS Biol 18(8):e3000841. https://doi.org/10.1371/journal.pbio.3000841
Mulvaney DE, Dinsmoor JA, Jwaideh AR, Hughes LH (1974) Punishment of observing by the negative discriminative stimulus. J Exp Anal Behav 21(1):37–44. https://doi.org/10.1901/jeab.1974.21-37
Parker GA, Stuart RA (1976) Animal behavior as a strategy optimizer: evolution of resource assessment strategies and optimal emigration thresholds. Am Nat 110(976):1055–1076. https://doi.org/10.2307/2460030
Perkins CC, Levis DJ, Seymann R (1963) Preference for signal-shock vs shock-signal. Psychol Rep 13(3):735–738. https://doi.org/10.2466/pr0.1963.13.3.735
Prokasy WF Jr (1956) The acquisition of observing responses in the absence of differential external reinforcement. J Comp Physiol Psychol 49:131–134. https://doi.org/10.1037/h0046740
Reboreda JC, Kacelnik A (1991) Risk sensitivity in starlings: variability in food amount and food delay. Behav Ecol 2(4):301–308. https://doi.org/10.1093/beheco/2.4.301
Richards RW (1981) A comparison of signaled and unsignaled delay of reinforcement. J Exp Anal Behav 35(2):145–152. https://doi.org/10.1901/jeab.1981.35-145
Shettleworth SJ (1994) Biological approaches to the study of learning. In: Mackintosh NJ (ed) Animal learning and cognition. Academic Press, pp 185–219
Shull RL, Mellon RC, Sharp JA (1990) Delay and number of food reinforcers: effects on choice and latencies. J Exp Anal Behav 53(2):235–246. https://doi.org/10.1901/jeab.1990.53-235
Siegel S (1975) Evidence from rats that morphine tolerance is a learned response. J Comp Physiol Psychol 89:498–506. https://doi.org/10.1037/h0077058
Siegel S, Baptista MAS, Kim JA, McDonald RV, Weise-Kelly L (2000) Pavlovian psychopharmacology: the associative basis of tolerance. Exp Clin Psychopharmacol 8:276–293. https://doi.org/10.1037/1064-1297.8.3.276
Smith AP, Bailey AR, Chow JJ, Beckmann JS, Zentall TR (2016) Suboptimal choice in pigeons: stimulus value predicts choice over frequencies. PLoS ONE 11(7):e0159336. https://doi.org/10.1371/journal.pone.0159336
Stagner JP, Zentall TR (2010) Suboptimal choice behavior by pigeons. Psychon Bull Rev 17(3):412–416. https://doi.org/10.3758/PBR.17.3.412
Stevens JR, Stephens DW (2010) The adaptive nature of impulsivity. In: Impulsivity: The behavioral and neurological science of discounting (pp. 361–387). American Psychological Association. https://doi.org/10.1037/12069-013
Tancin V, Kraetzl W-D, Schams D, Bruckmaier RM (2001) The effects of conditioning to suckling, milking and of calf presence on the release of oxytocin in dairy cows. Appl Anim Behav Sci 72(3):235–246. https://doi.org/10.1016/S0168-1591(01)00113-7
Vasconcelos M, Monteiro T, Kacelnik A (2013) Context-dependent preferences in starlings: linking ecology, foraging and choice. PLoS ONE. https://doi.org/10.1371/journal.pone.0064934
Vasconcelos M, Monteiro T, Kacelnik A (2015) Irrational choice and the value of information. Sci Rep 5:13874. https://doi.org/10.1038/Srep13874
Vasconcelos M, Carvalho MP, Machado A (2017) Timing in animals: from the natural environment to the laboratory, from data to models. In: Call J, Burghardt GM, Pepperberg IM, Snowdon CT, Zentall T (eds) APA handbook of comparative psychology: perception, learning, and cognition, Vol. 2 (pp. 509–534). American Psychological Association. https://doi.org/10.1037/0000012-023
Vasconcelos M, Machado A, Pandeirada JNS (2018) Ultimate explanations and suboptimal choice. Behav Proc 152:63–72. https://doi.org/10.1016/j.beproc.2018.03.023
Woods SC (1991) The eating paradox: how we tolerate food. Psychol Rev 98:488–505. https://doi.org/10.1037/0033-295X.98.4.488
Woods SC, Ramsay DS (2000) Pavlovian influences over food and drug intake. Behav Brain Res 110(1):175–182. https://doi.org/10.1016/S0166-4328(99)00194-1
Woods SC, Strubbe JH (1994) The psychobiology of meals. Psychon Bull Rev 1(2):141–155. https://doi.org/10.3758/BF03200770
Wyckoff LB (1969) The role of observing responses in discrimination learning. In: Wright GD (ed) Conditioned reinforcement. Dorsey Press, pp 237–260
Zamble E (1973) Augmentation of eating following a signal for feeding in rats. Learn Motivat 4:138–147. https://doi.org/10.1016/0023-9690(73)90026-X
Zamble E, Hadad GM, Mitchell JB, Cutmore TRH (1985) Pavlovian conditioning of sexual arousal: First- and second-order effects. J Exp Psychol Anim Behav Process 11:598–610. https://doi.org/10.1037/0097-7403.11.4.598
Zentall TR (2016) Resolving the paradox of suboptimal choice. J Exp Psychol: Anim Learn Cogn 42(1):1–14. https://doi.org/10.1037/xan0000085
Zentall TR, Stagner J (2011) Maladaptive choice behaviour by pigeons: an animal analogue and possible mechanism for gambling (sub-optimal human decision-making behaviour). Proc R Soc B: Biol Sci 278(1709):1203–1208. https://doi.org/10.1098/rspb.2010.1607
Funding
This study was supported by the Portuguese Foundation for Science and Technology (UID/PSI/01662/2013 and UIDB/04810/2020). Alejandro Macías was supported by the Mexican National Council for Science and Technology (239027/438354); Armando Machado was supported by the Brazilian Instituto Nacional de Ciência e Tecnologia sobre Comportamento, Cognição e Ensino (INCT-ECCE / CNPq Grant #465686/2014–1); and Marco Vasconcelos was supported by a Fulbright Grant for Scholars and Researchers, AY2022/2023, with the support of the Luso-American Development Foundation. The authors declare they have no other known conflicts of interest.
Author information
Contributions
AM collected the data. AM and MV prepared the figures. AM, AM, and MV analyzed the data, wrote, and reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethical approval
The pigeons were cared for according to the animal care guidelines of the Directorate-General for Food and Veterinary (DGAV), the Portuguese national authority for animal health, and the University of Minho. All experimental procedures were conducted in agreement with European (Directive 2010/63/EU) and Portuguese law (Ordinance 1005/92 of October 23), and were approved by DGAV (Authorization #024946).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This study was conducted at the Psychology Research Centre (UID/PSI/01662/2013) of the University of Minho.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Macías, A., Machado, A. & Vasconcelos, M. On the value of advanced information about delayed rewards. Anim Cogn 27, 10 (2024). https://doi.org/10.1007/s10071-024-01856-8
DOI: https://doi.org/10.1007/s10071-024-01856-8