Introduction

Impulsivity is a multifaceted construct that can be summarized as engaging in actions without foresight (Winstanley et al., 2006). Researchers have operationalized impulsivity into three dimensions—impulsive choice, impulsive responding, and impulsive personality. Impulsive choice is characterized by the preference for a smaller reinforcer available sooner (SS, “impulsive option”) compared with a larger reinforcer available later (LL, “self-controlled option”). The LL is the self-controlled option because it typically provides more reinforcement over time (i.e., optimal choice). Impulsive responding refers to the failure to suppress or withhold an action in the presence of certain stimuli. Impulsive personality, or trait impulsivity, measures persistent and stable aspects of personality primarily through self-report assessments (Reynolds et al., 2006). Although points of overlap exist, these dimensions are largely considered to be mechanistically distinct from one another (MacKillop et al., 2016). This review discusses learning, motivational, and cognitive mechanisms of impulsive decision-making in both humans and nonhuman animals. We also review a wide array of mathematical models developed to predict impulsive choice and discuss the areas of disconnect between empirical research and theory development with the goal of motivating future research.

An essential facet of impulsive choice is that individuals devalue temporally distant reinforcers (Madden & Bickel, 2010), a phenomenon known as delay discounting. Delay discounting reflects the level of impatience, or unwillingness to wait for larger delayed outcomes, and has been proposed as a key underlying mechanism that drives impulsive choice (Mazur, 1987). Impulsive choice has received considerable attention because of its relationship with important health outcomes. For example, individuals with higher discounting rates have a higher prevalence of substance use (Amlung et al., 2017; Bickel et al., 1999; MacKillop et al., 2011), obesity (Jarmolowicz et al., 2014; Weller et al., 2008), attention-deficit/hyperactivity disorder (ADHD; Marx et al., 2021; Patros et al., 2016), and gambling disorders (Grecucci et al., 2014). As a result of the breadth of application, impulsive choice has been proposed as a trans-disease process (Bickel et al., 2019; Bickel & Mueller, 2009).
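Because several of the findings below are interpreted through it, it is worth previewing the hyperbolic discounting model that Mazur (1987) proposed, in which the subjective value V of a reinforcer of amount A delayed by D declines according to a discount-rate parameter k:

\[ V = \frac{A}{1 + kD} \]

Larger fitted values of k indicate steeper devaluation of delayed outcomes and, therefore, more impulsive choice.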

Laboratory procedures have been developed to assess impulsive choice in both humans and animals. The SS and LL options are usually presented as a concurrent discrete choice so that once individuals choose one option, the remaining option is removed. The key feature of impulsive choice tasks is that individuals must weigh both the reinforcer amount and delay to reinforcement when determining the value of each option. The tasks are typically constructed so that LL choices maximize reinforcement earning over time. In these circumstances, individuals that prefer the SS are considered impulsive.

In tasks with animals, different pairings of amounts and delays are offered, and the animal can choose an option by making a specific response. The outcomes (delay and amount) are learned through experience. For example, rats may choose between pressing two levers, with one lever providing one food pellet after 10 s and the other providing two food pellets after 30 s. Most impulsive choice tasks manipulate the amount or delay associated with the SS or LL. Systematic procedures employ choice parameters where the delay or magnitude of reinforcer(s) for the SS or LL option changes systematically between sessions (e.g., Green & Estle, 2003). Alternatively, the delay or magnitude of the SS or LL may change systematically within each session (e.g., Evenden & Ryan, 1996). For systematic procedures, the proportion of LL choices is often the index of self-control (or impulsive choice). Finally, adjusting procedures change the delay or magnitude of the SS or LL based on recent previous choices in a titrating fashion. For example, repeated choices of the LL may lead to an adjustment to make that option less attractive (e.g., LL delay increases or magnitude decreases). Alternatively, preference for the SS may lead to an adjustment to make the alternative LL option more attractive (e.g., LL delay decreases, or magnitude increases). Adjustments of LL (or SS) delay or magnitude continue until an indifference point is reached, where either option is selected equally often (Mazur, 1987, 1988). In the adjusting procedure, the duration or magnitude of the adjusting option associated with the indifference point is an index of self-control.
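To make the titration logic concrete, the following minimal sketch simulates an adjusting-delay procedure; the step size, delays, and the hyperbolic chooser standing in for the subject are illustrative assumptions rather than any specific published protocol.

```python
import random

def hyperbolic_value(amount, delay, k=0.2):
    """Subjective value under hyperbolic discounting, V = A / (1 + k*D)."""
    return amount / (1 + k * delay)

def simulate_adjusting_delay(trials=500, step=0.5, noise=0.05):
    """Titrate the LL delay until choices hover around indifference.

    SS: 1 pellet after 5 s (fixed). LL: 2 pellets after an adjusting delay.
    Choosing the LL makes it less attractive (delay increases); choosing
    the SS makes the LL more attractive (delay decreases).
    """
    ll_delay = 5.0
    for _ in range(trials):
        v_ss = hyperbolic_value(1, 5.0) + random.gauss(0, noise)
        v_ll = hyperbolic_value(2, ll_delay) + random.gauss(0, noise)
        if v_ll > v_ss:                       # LL chosen -> lengthen LL delay
            ll_delay += step
        else:                                 # SS chosen -> shorten LL delay
            ll_delay = max(0.0, ll_delay - step)
    # The adjusting delay now oscillates around the indifference point
    # (about 15 s with these assumed parameter values)
    return ll_delay

print(f"Estimated indifference delay: {simulate_adjusting_delay():.1f} s")
```

With these assumed values, a longer indifference delay indexes greater self-control, because it means the simulated subject continued choosing the LL even as its delay grew.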

Assessments of impulsive choice in humans utilize similar methods but often rely on hypothetical reinforcers and delays. For example, the Monetary Choice Questionnaire (MCQ; Kirby et al., 1999) delivers a fixed set of questions with specific amount/delay pairings. Adjusting tasks are delivered similarly to animal tasks but with hypothetical delays and outcomes (Frye et al., 2016). Some studies make outcomes quasi-experiential by giving participants a randomly selected single outcome from their choices during the study (e.g., Rotella et al., 2019). In experiential discounting tasks, participants experience actual delays and/or magnitudes (Reynolds & Schiffbauer, 2004; Smits et al., 2013; Steele et al., 2019) that may better approximate tasks used in animals.

Steele et al. (2019) investigated whether experiencing real versus hypothetical delays and reinforcers influenced preferences. The SS and LL options delivered 1–5 mini M&Ms after 5–30 s with the LL always involving a longer delay and larger amount. The delays and amounts could be hypothetical or real. In the real conditions, the participants received the M&Ms and had to wait for the delay. Steele et al. found that there was no difference in performance with real versus hypothetical M&Ms. However, in a condition where real delays were experienced initially followed by hypothetical delays (both paired with hypothetical M&Ms), participants increased their sensitivity to delays in the hypothetical delay task as a result of experience with the real delays. In addition, choices from the MCQ did not significantly correlate with performance on the experiential task, consistent with other reports (Reynolds et al., 2006; Reynolds & Schiffbauer, 2004). One factor that may explain the poor correlation across hypothetical and experiential tasks is that different combinations of factors and behaviors can produce similar choice patterns. It is possible that hypothetical discounting better reflects choice intentions, whereas experiential discounting may better reflect actual choices. This suggests the importance of measuring specific choice mechanisms rather than simply measuring choice behavior, an issue that is discussed in the following section.

An alternative approach for measuring experiential choices in humans is the delayed gratification procedure. Delayed gratification tasks present two options (SS or LL) in succession so that choosing the SS during an initial delay forfeits access to the LL reinforcer that would otherwise be available later. This contrasts with the standard impulsive choice task in that there is no upfront commitment. One notable study that measured individuals’ ability to delay gratification is the “marshmallow task,” where preschool-aged children were told they could have one small but immediate reinforcer now, or two small reinforcers if they chose to wait for a specified time (Mischel et al., 1972).

The delayed gratification procedure has also been used in nonhuman animals by incorporating a defection response into an impulsive choice task that allows for switching to SS following an initial choice of the LL (Haynes & Odum, 2022). Reynolds et al. (2002) compared impulsive choice and delayed gratification procedures in rats using an adjusting-amount procedure that added a defection response opportunity. In both groups, if the rats chose the SS (by making a nose-poke response), then they received an immediate small reward, and if they chose the LL, they received a larger delayed reward. During the LL delay, the SS nose-poke response remained available. For the rats in the impulsive choice task, SS responses during the LL delay were recorded but had no consequence. In contrast, rats in the delayed gratification task could defect by making an SS nose-poke response at any time during the LL delay to receive the immediate smaller reward. There were no significant differences in the discounting functions between the tasks, suggesting that the two experiential procedures may measure similar processes in rats. Göllner et al. (2018) also found a correlation between impulsive choices and delayed gratification in humans, further supporting the claim that the tasks are measuring similar processes.

The above tasks can be used for measuring choice mechanisms and for modeling choices. The impulsive choice function generated by these procedures is likely a product of multiple empirical relationships that can be accounted for by a combination of several processes. Impulsive choice researchers have achieved an emerging understanding of the components that explain impulsive choice, but research on how these factors interact is lacking. Further experimentation is necessary to fill in the gaps. In addition, models designed to encapsulate the relevant factors are necessary. Beyond their predictive power, models can provide an organizational structure for interpreting research findings. In the absence of such a structure to explain impulsive decision-making, and the myriad of factors underlying it, researchers are only able to contribute to an ever-growing catalog of effects. Given the importance of both empirical and theoretical contributions to a unified understanding of impulsive choice, we review both elements in the current paper.

This review is broad but not comprehensive. The impulsive choice literature is vast, with an “impulsive choice” search in Google Scholar producing 9,460 results and “delay discounting” producing 26,600 results (as of September 2022). Instead, this paper focuses on exemplar studies (i.e., recent, seminal, and/or highly cited) that cover the mechanisms that are relevant to both human and animal research on impulsive choice. The primary goal is to synthesize the cognitive, motivational, and attentional processes involved in impulsive choice, review the contemporary theoretical models, and highlight future directions for research that can propel the field forward.

Learning, motivational, and cognitive factors underlying impulsive decision-making

This section highlights experimental research investigating the mechanisms underlying impulsive decision-making. The consideration of learning factors will focus on learning history and impulsive-choice training procedures to provide clues about the mechanisms of impulsive choice. The discussion of cognitive and motivational factors is grouped because influential frameworks of substance use disorders often explain impulsive decision-making as an imbalance between dysregulation of the executive control system and a hyperactive motivational system (Bechara et al., 2019). Experimentation testing these frameworks can reveal the underlying mechanisms involved in impulsive decision-making. The section on motivational factors will focus on how reinforcer quantity, quality, and incentive motivation affect impulsive choice. Finally, we focus on the underlying cognitive processes that are recruited to affect impulsive decision-making, including working memory, attention, and perceptual processes (especially timing processes). This section will highlight the empirical foundation that influences models of impulsive choice, which will be discussed subsequently.

Learning factors

Learning factors relate to an organism’s behavioral adaptation to the environment based on experience. In this section, we focus on how impulsive decision-making is affected by learning the local choice contingencies (i.e., what the SS and LL options provide in terms of reinforcers and delays), the context and framing of the choice options (i.e., how future choices are affected by prior choices/outcomes and how presenting the choices successively or concurrently affects behavior), and the broader choice contingencies (i.e., whether preference tracks reinforcement maximization).

Learning the local contingencies: Amounts, delays, and immediacy

The most elemental learning process in an impulsive choice environment involves understanding the delays and amounts associated with each choice. Correlational research indicates that rats and humans who time delays better also show greater self-control (Baumann & Odum, 2012; Brocas et al., 2018; Darcheville et al., 1992; Marshall et al., 2014; McClure et al., 2014; Moreira et al., 2016; Navarick, 1998; Paasche et al., 2019; Smith et al., 2015; Stam et al., 2020; van den Broek et al., 1992; Wittmann & Paulus, 2008). In addition, rats that display superior amount discrimination also show better self-control (Marshall & Kirkpatrick, 2016; Experiment 1). These results support a basic assumption that more self-controlled choices should follow from a better understanding of the choice outcomes.

Given the observation of correlations in timing ability and amount discrimination with impulsive choice, it is reasonable to assume that training those abilities could improve self-control. Training designed to improve self-control has been useful in determining what learning experiences, and associated processes, underlie impulsive choice. Smith et al. (2015) trained rats on different schedules of reinforcement designed to promote timing of the SS and LL delays followed by impulsive choice assessment. Training occurred on the same levers as the choice assessment to promote transfer across tasks. The DRL-Delay group received training on a differential-reinforcement-of-low-rate schedule, in which the rats had to withhold responding for a fixed delay. The FI-Delay group was required to respond after a fixed interval elapsed. The VI-Delay group was required to respond after a variable interval, with the delay varying across trials according to a uniform distribution. All three procedures increased LL choices coupled with improved timing precision of the SS and LL delays. Timing precision was indexed by reduced variability in responding on a peak interval procedure. In this procedure, rats received nonreinforced trials that extended beyond the usual time of reinforcement, and response rates peaked at the anticipated time of reinforcer delivery. The location of the peak is an index of timing accuracy, and the variability of the peak distribution indexes timing precision. Previous research has shown that rats can time delays in DRL, FI, and uniformly distributed VI schedules (Church et al., 1998; Pizzo et al., 2009). It appears that temporal learning during the training with the three schedules may have transferred to the choice task.
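As an illustration of how these two indices can be extracted, the sketch below fits a Gaussian to a simulated peak-trial response-rate curve; the data, parameter values, and the Gaussian form are assumptions for illustration, not the specific analysis used by Smith et al. (2015).

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(t, amplitude, peak_time, spread):
    """Gaussian response-rate curve for peak-interval trials."""
    return amplitude * np.exp(-0.5 * ((t - peak_time) / spread) ** 2)

# Simulated mean response rates on nonreinforced peak trials (30-s target)
t = np.arange(0, 90, 1.0)
rng = np.random.default_rng(0)
rates = gaussian(t, amplitude=50, peak_time=31, spread=8) + rng.normal(0, 1.5, t.size)

# Fit the curve: the peak location indexes accuracy, the spread indexes precision
(amplitude, peak_time, spread), _ = curve_fit(gaussian, t, rates, p0=[40, 30, 10])
print(f"Timing accuracy (peak location): {peak_time:.1f} s")
print(f"Timing precision (spread of peak): {spread:.1f} s")
```

In this framing, training-induced improvements in precision would appear as a narrowing of the fitted spread around an unchanged peak location.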

Similarly, Peterson and Kirkpatrick (2016) trained rats on a VI-Delay procedure with uniformly distributed variable delays compared with a No-Delay control group. The VI-Delay group showed greater LL choices. In addition, an individual differences analysis revealed that the rats in the VI-Delay group (but not the control) with the highest self-control showed the best timing in a temporal discrimination task. Overall, these studies suggest that training-induced improvements in self-control are linked to improved timing in peak interval (Smith et al., 2015) and temporal discrimination (Peterson & Kirkpatrick, 2016) procedures. However, Rung et al. (2018) failed to find a relationship between temporal discrimination and LL choices and Fox et al. (2019) did not find a relationship between peak interval timing and LL choices; these inconsistencies are discussed later.

An alternative training approach involved training rats to discriminate the SS and LL amounts, which increased LL choices along with increasing amount discrimination ability (Marshall & Kirkpatrick, 2016; Experiment 2). Specifically, rats were trained to discriminate 1 versus 2 pellets and 1 versus 4 pellets, whereas the control group only chose between 2 versus 2 pellets. The rats in both groups were then tested to determine their ability to discriminate between 1 versus 2, 2 versus 3, 3 versus 4, and 4 versus 5 pellets. Rats in the training, but not the control, group showed improved discrimination performance coupled with a numerical distance effect (i.e., 1 vs. 2 was easier to discriminate than 4 vs. 5), which suggests that numerical processing was selectively improved for the training group. In addition, across rats, there was a positive correlation between improvement in self-control (pre- vs. post-training) and amount discrimination accuracy for the training group but not the control (i.e., reminiscent of what Peterson & Kirkpatrick, 2016, observed with the discrimination of delays). Collectively, these studies revealed that refined training to learn the larger/smaller and later/sooner contingencies involved in the choice procedure resulted in more self-controlled choices.

Although research in rodents using delay and amount training techniques appears promising, the generality of these studies to humans needs further investigation. Self-control is increased in humans who are exposed to delays (Binder et al., 2000; Dixon et al., 1998; Dixon et al., 2003; Fisher et al., 2000; Neef et al., 2001; Schweitzer & Sulzer-Azaroff, 1988; Vessells et al., 2018; Young et al., 2011; Young et al., 2013), but it is unclear whether these benefits are the result of learning about the delays and/or amounts or occur through other mechanisms (see delay tolerance below).

Comparisons, contrasts, and carryover effects

Learning the amounts and delays of the options is important, but the question remains: what is learned through experiences with SS and LL outcomes? This section covers generalization of different experiences with SS and LL outcomes that can produce subsequent effects on choice. When given two options, the most basic assumption is that the value of one amount–delay trade-off (e.g., LL option) is compared with the value of the alternative (e.g., SS option). Given that there are moderate positive cross-task correlations between impulsive choice methods in rodents (Craig et al., 2014; Peterson et al., 2015), it may be assumed that a common learning process occurs during experience with any impulsive choice procedure (i.e., trait effects; Odum, 2011). Be that as it may, procedural decisions for training and assessing impulsive choice may bias choices, which underscores the importance of learning- and context-based effects.

Training procedures designed to influence choices sometimes reveal that rats do not always simply learn the smaller/larger and sooner/later contingencies of the impulsive choice task. In comparison to FI-Delay and VI-Delay training, experiments that utilized a No-Delay control group trained the rats to differentiate SS and LL amounts associated with each lever (e.g., FR 2 for 1 SS or 2 LL pellets). In several studies, the No-Delay experience has not produced any significant changes in LL choices (Bailey et al., 2018; Panfil et al., 2020; Stuebing et al., 2018). This outcome is somewhat counterintuitive because the No-Delay procedure effectively trains the rats to expect 2 pellets from the LL lever and 1 pellet from the SS lever, but that experience does not promote the choice of 2 pellets over 1 pellet. If the rats learn simple associations between the SS and LL levers and their respective amounts during training, then the No-Delay training should increase LL choices. The absence of this outcome suggests that preference for the LL and SS options as a result of training does not reflect some composite associative value that generalizes to the choice procedure.

Training procedures have shown that different ways of introducing SS and LL options may lead to different preferences in choice. Marshall and Kirkpatrick (2016) found that amount discrimination training improved self-control, so one might expect the No-Delay training to do the same. The No-Delay training procedures typically expose rats to SS and LL options successively, and this may impede the ability to discriminate amounts. Consistent with this idea, Marshall et al. (2014) failed to find a relationship between impulsive choice and amount discrimination accuracy with a procedure that trained amount–lever associations across successive blocks. However, Marshall and Kirkpatrick (2016) observed a relationship between amount discrimination and impulsive choice after training with concurrently presented amounts. Concurrent discrimination training might be a necessary condition to observe reinforcer-amount training effects on LL choices.

Although concurrent training may promote amount discrimination, it is unknown whether it is a necessary condition for delay-based training. Outside of the impulsive choice paradigm, it has been shown that pigeons only learn the temporal value of different delay-correlated cues when they are trained simultaneously with other delays in the same session, rather than successively across blocks of sessions, using the concurrent chains procedure (Grace & Hucks, 2013). The effects of delay-correlated cues were described in terms of their conditioned reinforcement value in concurrent chains research, but these effects can easily be interpreted in terms of how the context modifies the way that delayed reinforcement is learned. The impulsive choice tasks described previously are functionally concurrent chains schedules with a choice initial link (fixed ratio, FR 1) chained to a delay-to-reinforcement terminal link (FI or fixed-time, FT). The concurrent chains schedules described in this section utilize a VI initial link choice schedule where responses can be allocated to both options and the first response that completes the VI schedule on either of the options registers as the selected choice. The significance of the VI initial link choice period (vs. FR 1) is that it provides an added delay context that precedes a signaled terminal-link delay.

Grace and Savastano (2000) trained pigeons simultaneously on two different concurrent chains components that differed on initial-link schedules (VI 20 s for the “short” initial-link component and VI 100 s for the “long” initial-link component). Those components had the same terminal-link delays (VI 10-s delay, VI 20-s delay), but different terminal-link key light colors depending on the component (e.g., VI 10-s for the short component might be green, whereas the VI 10-s for the long component might be red). Then the pigeons’ preferences were probed with the green VI 10 from the short component compared with the red VI 10 from the long component. The pigeons had equivalent preference for the VI 10-s schedules. This outcome would be expected if the pigeons learned that the average terminal-link delays were the same and ignored the length of the initial-link context. However, O’Daly et al. (2005) trained pigeons successively on two different multiple chains components that differed on initial-link schedules (VI 10 s for the “short” initial-link component and VI 100 s for the “long” initial-link component), but shared the same FT 30-s terminal link that differed in terms of key light color based upon the associated component (e.g., red FT 30-s schedule following the VI 10-s, green FT 30-s schedule following the VI 100-s). The pigeons’ preference between red versus green FT 30-s terminal links was probed, and the pigeons favored the terminal link signal associated with the VI 100-s initial link (e.g., green). This outcome would be expected if the pigeons learned the duration of the initial-link and terminal-link components and the signaled FT delay represented the time remaining within the component (consistent with the delay reduction hypothesis; e.g., Fantino et al., 1993). The 30-s delay following the (average) 100-s delay would indicate “most of the waiting for food has passed” in the context of the component, and the 30-s delay following a 10-s delay would indicate “most of the waiting for food is still to come.”
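A rough delay-reduction calculation clarifies this interpretation. In one simple form of the hypothesis, used here only for illustration, the conditioned value of a terminal-link stimulus is proportional to T − t, the reduction in expected time to food that its onset signals (where T is the average time to food from trial onset and t is the signaled terminal-link delay). Treating the initial links as contributing their scheduled means:

\[ \text{long component: } T - t = (100 + 30) - 30 = 100\ \text{s}; \qquad \text{short component: } T - t = (10 + 30) - 30 = 10\ \text{s} \]

The terminal-link signal from the long component thus marks a tenfold greater reduction in the remaining wait for food, consistent with the pigeons’ preference for it.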

Overall, the results of Marshall et al. (2014) and Marshall and Kirkpatrick (2016) show that concurrently training different reinforcer amounts may be necessary to produce accurate amount discrimination. On the other hand, the results of Grace and Savastano (2000) in comparison with O’Daly et al. (2005) show that concurrent training of different delays leads them to be compared based upon their absolute times, whereas successive training of delays leads them to be compared based upon whether they signal that food is temporally nearer or farther in the overall context. Previous FI-Delay training has used successive procedures to produce improvements in timing and self-control. Future FI-Delay training using concurrent procedures might produce relatively greater improvements in timing and self-control.

Learning-based procedures often increase preference for the choice option that provides more reinforcers, but procedures can also affect choice by increasing preference for the option that provides reinforcers sooner. T. R. Smith et al. (2022; Experiment 2) compared FI-Delay training between groups where the SS option involved either a short delay (5 s) or a long delay (10 s), with 30-s LL delay training delivered to both groups. The group experiencing the 5-s SS delay showed greater delay discounting in comparison to the group experiencing the 10-s SS delay training. These results were interpreted in terms of the rats experiencing the 10-s SS delay being trained to tolerate the aversiveness of waiting (i.e., delay tolerance, a mechanism discussed later), but both groups experienced 30-s LL delay training that has been shown to increase LL choices (Fox et al., 2019, trained only LL delays). The attractive dimension of the SS delay is that it is “short”; the 5-s SS delay group perhaps learned to attend to that dimension because of training, whereas the 10-s SS delay group attended to the attractive “larger” reward dimension of the LL.

Bailey et al. (2021) also reported training effects indicating that exposure to short delays can increase preference for that option, in this case the LL option with short VI delays. A VI-Delay group received training with Weibull distributions of delays in which the mean and shape parameters were varied to produce increasing, decreasing, or constant hazard functions, with mean VI durations for the SS and LL levers of 10 s and 30 s, respectively. The decreasing hazard function delivered many short delays offset by a few very long delays, whereas the constant hazard function had an exponential distribution, and the increasing hazard function closely approximated a uniform distribution. Compared with the increasing and constant hazard functions, the decreasing hazard function VI produced the greatest improvement in self-control. The frequent short delays to the LL outcome during training might have increased the LL value (similar to what the 5-s SS delay did in T. R. Smith et al., 2022), but the occasional long delay might have increased LL delay tolerance.
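For readers unfamiliar with hazard functions, the minimal sketch below shows how the Weibull shape parameter produces the three hazard profiles just described; the shape values are illustrative assumptions, with the scale solved so that each distribution has a 30-s mean like the LL lever.

```python
import math

def weibull_hazard(t, shape, mean):
    """Weibull hazard h(t) = (k/lam) * (t/lam)**(k - 1), with the scale
    lam chosen so the distribution has the requested mean."""
    lam = mean / math.gamma(1 + 1 / shape)
    return (shape / lam) * (t / lam) ** (shape - 1)

# shape < 1: decreasing hazard; shape = 1: constant (exponential);
# shape > 1: increasing hazard
for shape, label in [(0.5, "decreasing"), (1.0, "constant"), (2.0, "increasing")]:
    hazards = ", ".join(f"h({t})={weibull_hazard(t, shape, 30):.3f}"
                        for t in (5, 15, 45))
    print(f"shape={shape} ({label} hazard): {hazards}")
```

Under a decreasing hazard, most delays end quickly (many short delays), but a delay that has already run long is likely to run longer still (a few very long delays), matching the distribution Bailey et al. (2021) described.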

To summarize, what animals learn about delay and amount contingencies in a choice procedure depends on how the animals are introduced to those contingencies. These dynamics, however, have been studied using animals that learned the contingencies in a highly controlled environment. It is unclear whether these dynamics would apply to human decision-making. We will return to this issue below (see the Attention section) when discussing framing effects on impulsive decision-making, including the extent to which the rat and pigeon studies in this section may bias attention in ways similar to framing effects. This provides a possible connection across species.

Learning the global contingencies: Optimal preference for reinforcement maximization

In an impulsive choice procedure, the LL option is the “optimal choice” defined in terms of maximizing reinforcer outcomes over time, thus making it the self-controlled choice. Humans can adopt an optimal strategy under the constraints of a delay discounting task (Schweighofer et al., 2006) and animals’ behavior often is adaptive to the environment and sometimes approximates the optimal solution in a variety of experimental preparations (Fantino & Abarca, 1985; Stevens & Stephens, 2010). For example, Schuweiler et al. (2021) found that rats could optimally delay gratification in a choice and diminishing returns procedure, which offered a choice between a progressive interval (PI) and an FI. The PI delay started at 0 s and increased by 1 s for each successive PI choice, and the FI option was always a 10-s delay. Choosing the FI option reset the PI delay for the next trial to 0 s. The optimal response pattern to maximize reinforcers was to choose the PI until reaching 4 s and then switch to the FI option to reset the PI to 0 s. This would keep the rat routinely encountering low PI delays. The rats in Schuweiler et al. (2021) on average switched at the 4-s delay—demonstrating precision in learning the optimal strategy.
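The optimality of the 4-s switch point can be verified by enumeration. The sketch below counts one reinforcer per choice and, as a simplifying assumption, ignores handling times and ITIs (including them would shift the exact optimum):

```python
def reinforcer_rate(switch_at, fi=10):
    """Reinforcers per second for a cycle of PI choices followed by one
    FI choice that resets the PI delay to 0 s.

    switch_at: the PI delay (s) at which the subject switches to the FI,
    so the PI is chosen at delays 0, 1, ..., switch_at - 1.
    """
    pi_time = sum(range(switch_at))      # 0 + 1 + ... + (switch_at - 1)
    reinforcers = switch_at + 1          # each PI choice plus the FI choice
    return reinforcers / (pi_time + fi)

rates = {s: reinforcer_rate(s) for s in range(1, 9)}
for s, r in rates.items():
    print(f"switch at {s} s: {r:.4f} pellets/s")
print(f"Optimal switch point: {max(rates, key=rates.get)} s")  # 4 s
```

Under these assumptions, switching when the PI reaches 4 s yields 5 pellets per 16 s (about 0.31 pellets/s), and every earlier or later switch point yields a lower rate, matching the pattern Schuweiler et al. (2021) reported.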

Given that rats can optimize in the diminishing returns task, one might anticipate that they would adopt the optimal strategy in impulsive choice procedures. Typically, in impulsive choice procedures, the LL results in greater reinforcement maximization because the intertrial interval (ITI), the time between food delivery and the next choice trial, represents an opportunity cost in a limited-time session. For example, even if the SS offers 1 pellet after 10 s and the LL offers 3 pellets after 30 s, a common 60-s ITI following both choices ensures the LL option will have a greater payoff when considering trial and ITI time together. Self-control is often touted as the optimal choice, but humans and animals almost never exclusively prefer the LL option, which is why impulsive decision-making is such a ubiquitous problem. If animals are insensitive to the ITI and fail to see the “big picture” in terms of the LL option maximizing reinforcement in the long run, then that might partly explain why optimal self-controlled solutions are uncommonly observed.
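The opportunity cost in this example is straightforward to quantify as an overall reinforcement rate, counting the reinforcer delay plus the shared ITI:

\[ \text{SS: } \frac{1\ \text{pellet}}{10\ \text{s} + 60\ \text{s}} \approx 0.014\ \text{pellets/s}, \qquad \text{LL: } \frac{3\ \text{pellets}}{30\ \text{s} + 60\ \text{s}} \approx 0.033\ \text{pellets/s} \]

Exclusive LL choice therefore more than doubles the session-wide earning rate, which is what makes the LL the optimal option here.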

Studies examining sensitivity to the ITI have found mixed results. Smethells and Reilly (2015) demonstrated that rats in a condition with a 6-s LL delay coupled with a 10-s ITI chose the SS option, which provided immediate reinforcement, more often than did a group with a 45-s ITI. The 10-s ITI imposed a low opportunity cost for choosing the SS and the rats appeared to be sensitive to that contingency. However, sensitivity to the ITI may be the exception rather than the rule. Blanchard et al. (2013) demonstrated that rhesus monkeys showed poor sensitivity to ITI durations unless the ITI was made salient with signaling. Sjoberg et al. (2021) reported that increasing the length of the ITI for the LL choice had no effect on impulsive choices in rats, and attempts to make the ITI more salient with audio cues did not improve sensitivity. Pigeons also generally showed poor sensitivity to ITIs (Logue et al., 1985). It is curious that rats (Schuweiler et al., 2021) and pigeons (Hackenberg & Hineline, 1992) generally show optimal response patterns in a diminishing returns contingency, and yet when similar delays are packaged into an impulsive choice procedure, preference often deviates from optimality.

The ITI is not the only extraneous variable that could affect impulsive choices. Low opportunity costs induced by using reinforcer postponement, rather than reinforcer waiting, can also affect choice behavior (Paglieri, 2013). In illustrating the effects of reinforcer postponement, Addessi et al. (2021) showed that when a new trial occurred immediately after a choice, but before delayed reinforcers from the previous trial were delivered, capuchin monkeys made more LL choices. This is because the LL delay imposed a smaller opportunity cost if a new choice trial could present itself concurrently with the LL delay from the previous trial. In other words, the monkeys were not waiting for the LL reinforcer; they were postponing the LL reinforcer to be delivered later while they engaged in other reinforcing activities. Although animals may often ignore the ITI, the SS and LL delays might represent salient opportunity costs that animals do not ignore. On the other hand, the hypothetical impulsive choice tasks that humans receive inherently imply reinforcement postponement because participants will not assume that a “$27 in 2 weeks” choice commits them to captively wait in a laboratory room for those weeks (Paglieri, 2013). Thus, choosing the LL does not necessitate an opportunity cost for other reinforcing activities in the hypothetical choice paradigm.

Reinforcer bundling is another extraneous contingency that affects choice outside of delay/amount contingencies (Ashe & Wilson, 2020). In bundling, a single choice results in a series of delayed outcomes that occur in succession, and this has the effect of increasing LL choices. For example, the SS option may offer $100 immediately and another $100 after two weeks, and the LL option may offer $200 after 2 weeks and another $200 after 4 weeks. Bundling effects have been observed in humans (e.g., Kirby & Guastello, 2001) and rats (e.g., Stein et al., 2013b). Stein et al. (2013b) demonstrated greater LL choices in a group of rats that experienced a bundle of 9 SS or LL delayed reinforcer events (followed by a single ITI), compared with a no-bundle group with an ITI length chosen to equate the rate of reinforcement between the two groups. Additionally, rats in the bundled group later made more LL choices in a standard impulsive choice task. Thus, the bundling experience resulted in learning that transferred across tasks. Stein and Madden (2021) proposed that bundling increases LL choices by allowing the sum of the values of discounted LL reinforcers in the bundle to be compared against the sum of the values of discounted SS reinforcers in the bundle (which includes delays between each subsequent SS reinforcer delivery). The first SS reinforcer in the bundle might have greater value than the LL reinforcer, but each successive SS reinforcer may have lower value than the LL; thus, the sum of these reinforcer values will drive preference for the LL. The process underlying bundling effects might explain why the optimal response strategy is observed in the diminishing returns procedure. The PI option is often selected several times consecutively before switching to the FI option to reset the PI delays. If each discrete choice for the PI option is framed as a “bundle,” then the discounted value of each PI reinforcer in the series of choices might summate to compare against the value of the delayed FI reward. Future research is necessary to explore this hypothesis.
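Stein and Madden’s (2021) summation account can be illustrated with the hyperbolic model previewed earlier; in this sketch the weekly discount rate k = 0.6 is an arbitrary value chosen so that the single-outcome comparison favors the SS while the bundled comparison favors the LL:

```python
def value(amount, delay_weeks, k=0.6):
    """Hyperbolic discounted value, V = A / (1 + k * D)."""
    return amount / (1 + k * delay_weeks)

# Single-outcome comparison: $100 now vs. $200 in 2 weeks
print(f"single SS: {value(100, 0):.1f}  vs  single LL: {value(200, 2):.1f}")

# Bundled comparison from the example in the text:
# SS bundle = $100 now + $100 in 2 weeks
# LL bundle = $200 in 2 weeks + $200 in 4 weeks
ss_bundle = value(100, 0) + value(100, 2)
ll_bundle = value(200, 2) + value(200, 4)
print(f"bundle SS: {ss_bundle:.1f}  vs  bundle LL: {ll_bundle:.1f}")
```

With this assumed k, the single comparison favors the SS (100.0 vs. about 90.9), but summing across the bundle reverses preference (about 145.5 vs. 149.7), because each later SS installment is itself discounted.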

To summarize, animals can adapt their choices to find an optimal response solution to maximize reinforcers. However, the standard impulsive choice contingencies often lead to suboptimal choices. This might in part be because the prototypical impulsive choice situation is not designed to highlight the global reinforcement context. The learning effects on choice behavior highlight many possible mechanisms that may determine impulsive choice, and those mechanisms may affect choices independently and in complex ways. The following section will discuss mechanisms of impulsive choice from a more conceptual theory-driven perspective.

Motivational and cognitive factors

Learning implicitly includes motivational and cognitive factors that may interact with impulsive choice. Motivational factors relate to influences of reinforcers and aversive stimuli/punishers in affecting choices. Within the framework of impulsive decision-making between larger/later and smaller/sooner options, the reinforcers often refer to the quantity or quality of the larger/smaller outcomes, whereas the aversive stimuli refer to the duration of the later/sooner delays. Motivation also relates to the conditions that modify the subjective value of reinforcers across time (reinforcer deprivation being the prototypical example). Collectively, the value of a consequence in a given deprivation state could be considered the “utility” of that consequence. Learning does not occur uniformly between individuals or within the same individual across all situations. Cognitive factors that functionally describe how experiences translate into adaptations in behavior are also important. The key cognitive processes that are relevant to impulsive decision-making are delay and amount perception, attention, and working memory. Learning requires perceptual contact with, and attention to, the relevant environmental features for the organism to encounter the prevailing reinforcement contingencies. Learning also requires adequate working memory to associate current environmental conditions with past outcomes to accurately plan prospectively and optimize future outcomes.

Impulsive versus self-controlled decision-making is often framed as a competition between a motivational system and a cognitive (executive) system, respectively (Bechara et al., 2019). The motivational system is driven by the short-sighted pursuit of immediate attractive reinforcers at the expense of long-term outcomes. The cognitive system strives to achieve optimal outcomes that take long-term reinforcers into consideration, foregoing any immediate reinforcer that is in competition with the long-term goals. The competing neurobehavioral decision systems (CNDS; Koffarnus et al., 2013) theory posits two neurobehavioral processes responsible for impulsive decisions and executive decisions. Brain regions associated with the (bottom-up) impulsive system include the amygdala and striatum (i.e., the dopamine-mediated wanting system associated with habitual reinforcer seeking and cravings). Brain regions associated with (top-down) executive control systems include the orbitofrontal cortex, prefrontal cortex, and anterior cingulate cortex (i.e., anticipation, foresight/planning, value processing in decision-making, etc.). The insular system (i.e., sensory processing, salience detection, information integration, etc.; Gogolla, 2017) is proposed to modulate how deprivation and stress can produce an imbalance allowing hyperactivity in impulsive systems to override hypoactivity in executive systems (Bechara et al., 2019). The motivational and cognitive factors discussed in this section may fit within the CNDS framework.

Aversion to delays and temptation by immediacy

Delay aversion is a common explanatory framework that describes impulsive choices as avoidance of the aversive properties involved in waiting for a delayed reinforcer. The delayed gratification paradigm can evaluate delay aversion where waiting for the LL is challenging in the face of temptation from the immediate SS (Mischel et al., 1989; Watts et al., 2018). For example, individuals with ADHD are often considered to be delay averse (Solanto et al., 2001; Sonuga-Barke et al., 1992) and often fail to wait for the LL (Rapport et al., 1986). They also show greater avoidance of a cue that was associated with the delay, which is interpreted as an aversion to the delay that the cue represents (Van Dessel et al., 2018). Van Dessel et al. also reported increased activity in the amygdala and dorsolateral prefrontal cortex, which are linked to negative affect and avoidance behavior, in delay-avoidant individuals. These results are further supported by Mies et al. (2018), who found that higher amygdala activity was correlated with self-reports of delay aversion and more impulsive choices. As in humans, research with rodents has also observed that impulsive rats will respond to turn off cues associated with longer delays (Peck et al., 2019). Delay aversion as an explanation of impulsive choice is a construct that is supported by behavioral data (cue avoidance responses), subjective reports, and brain measures (activity in regions associated with aversive affect).

Preclinical research has further suggested that there might be a causal link between delay aversion and impulsive choice as exposure to delays increases self-controlled choices in humans and animals (e.g., Rung et al., 2019; Smith et al., 2019). Stein et al. (2013a) required rats to repeatedly respond for delayed reinforcement (using a fixed-time, FT, schedule) on a training lever and subsequently tested those rats’ impulsive choices using a separate pair of choice levers. Compared with rats that were trained to respond for immediate reinforcement (FR 1), the delay exposure rats made more LL choices. This effect has been replicated repeatedly using the procedure described above (Peck et al., 2019; Renda & Madden, 2016; Renda et al., 2018, 2021; Rung et al., 2018; Stein et al., 2015). Peck et al. (2019) found that exposure to delays improved self-control and decreased avoidance of a delay-associated cue. This finding suggests that the delay exposure training may have improved self-control by reducing aversion to the LL delay evidenced by reduced responses to escape from delay-correlated stimuli. Improved delay tolerance through training was reported by Fox et al. (2019; Experiment 2), who exposed rats to FI-Delay training and assessed them on impulsive choice and peak interval procedures. Unlike A. P. Smith et al. (2015), where training involved SS and LL forced-choice trials, the training in Fox et al. delivered LL and SS amounts from both of the choice levers (randomly) after an LL delay. Fox et al. (2019) reported that their version of the FI-Delay training improved self-control, but they did not observe any improvements in peak interval timing. The different training procedures in Fox et al. (2019) allowed experience with the LL delay to occasionally lead to SS reinforcer amounts on the SS lever, and this might have led to poor generalization of timing information acquired in training to the choice task. These results suggest that the improvement in self-control was driven by improvements in delay tolerance.

Delay tolerance suggests that LL delays resulting from self-controlled choices are not aversive. Alternatively, an immediacy preference suggests that reinforcer promptness associated with the SS choice is attractive. Fox et al. (2019; Experiment 2) found that the FI-Delay training improved delay tolerance, but No-Delay training increased SS preferences. The No-Delay training involved immediate access to pellets using an FR 2 contingency with the SS and LL reinforcer amounts presented randomly on each lever (like the FI-Delay training). Fox (2021) replicated this effect of the No-Delay training. Collectively, these studies support the conclusion that FI-Delay exposure may promote self-control by increasing delay tolerance, whereas a No-Delay exposure may increase impulsive choice by increasing an immediacy preference. The No-Delay training in these studies differs procedurally from previous No-Delay tasks (Bailey et al., 2018; Panfil et al., 2020; Stuebing et al., 2018) in the delivery of random SS and LL reinforcer amounts. Those previous studies did not report any change in LL choices. It is possible that the random reinforcer amount deliveries may have increased attention to the immediacy of the delays, a possibility that necessitates further research.

Preference for immediacy has also been studied by increasing the SS delay, rather than the LL delay, resulting in an increased preference for the LL (Bailey et al., 2018; Mazur & Biondi, 2009; Rodriguez & Logue, 1988). Experiments that eliminate immediacy by using a precommitment contingency can also increase LL choices (Rachlin & Green, 1972). The precommitment opportunity was presented as an initial link choice between a later terminal link offering a free choice between SS and LL options or a terminal link only offering a forced-choice LL option. When the pigeons chose to proceed to the free-choice terminal link, they favored the SS option, but overall, the pigeons preferred to commit to the forced-choice LL option in the initial link. Thus, if the SS option was unavailable to tempt immediate reinforcer delivery in the initial link, then the pigeons showed more self-controlled choices by favoring a terminal link that led to a forced-choice LL option.

Jackson and Hackenberg (1996) also demonstrated that eliminating reinforcer immediacy increased self-control by using token-reinforcement procedures. In this procedure, responding did not directly lead to food, but rather illuminated LED lights that were accumulated across trials and later exchanged for food. Pigeons chose between an LL option of 3 LEDs after 6 s or an SS option of 1 LED available immediately. Under conditions where LEDs were traded for food immediately, the pigeons showed an SS preference (similar to when they responded for food directly). However, if the opportunity to trade accumulated LEDs for food was delayed for LL and SS choices, then the pigeons were more likely to choose the LL option. Similar to what was observed in Rachlin and Green (1972), once the temptation of an immediately consumable reinforcer was removed, the pigeons demonstrated better self-control.

Collectively, self-control can be increased by reducing LL delays, increasing tolerance to LL delays, increasing SS delays, or preventing immediate access to SS outcomes. Preference for immediacy and aversion to delays are both significant factors in modulating impulsive choice.

Reinforcer valuation

Reinforcer valuation refers to the motivation to obtain a given reinforcer based on how intrinsically valuable an outcome is to an individual (i.e., its utility). In addition, motivational states can modulate the value of a reinforcer across situations and over time. For example, food deprivation will increase food value until the hunger state is satisfied. The quantity and quality of the “larger” and “smaller” reinforcers determine how reinforcer value affects impulsive decision-making—this section focuses on qualitative reinforcer differences.

Madden et al. (1997) found that opioid-dependent individuals were more impulsive when choosing between monetary SS and LL rewards in a hypothetical choice task, but they were even more impulsive when the reinforcer was hypothetical heroin—a commodity that is highly valued by individuals with an opioid dependency. Odum et al. (2020) reviewed the effects of qualitatively different reinforcers on impulsive choice and explored why some reinforcers are discounted at higher rates. For example, nonmonetary outcomes lose value with delays more steeply than monetary outcomes. They concluded that the discounting of qualitatively different reinforcers was determined by the perceived future preference for a reinforcer (i.e., does the individual anticipate that they would want it later) and the utility of a future reinforcer (i.e., would the future value of the reinforcer be lost). But those conclusions are limited to human participants. In animals, preference between the SS and LL options does not appear to vary between conditions where quantitatively (e.g., 10 vs. 30 pellets) and qualitatively different reinforcers were offered (e.g., sucrose vs. cellulose pellets; Calvert et al., 2010). It therefore seems that any consumable reinforcer may produce the same rate of delay discounting. However, this conclusion needs to be considered with caution because reinforcer quality was identical on the SS and LL options (same-reinforcer tasks) in Calvert et al. (2010). Using qualitatively different reinforcers (e.g., sucrose on SS, cellulose on LL) might have produced different results.

Evaluating impulsive choice using cross-reinforcer tasks where the SS and LL offer different types of reinforcers rather than different quantities is insightful, but is uncommon in impulsive choice assessments (Pritschmann et al., 2021). Bickel et al. (2011) examined cross-reinforcer impulsive choice between hypothetical money and cocaine in participants meeting clinical criteria for stimulant use disorder. The cocaine-SS and money-LL group showed high degrees of self-control favoring the delayed money option, suggesting that immediate cocaine did not compete strongly against delayed money. The money-SS and cocaine-LL group showed the strongest rate of discounting, presumably because a delayed consumable reinforcer loses much of its anticipated value (Odum et al., 2020). They included same-reinforcer groups and reported that the LL choices were highest with money, and greater SS choices were found with cocaine, consistent with steeper discounting of nonmonetary reinforcers. The use of hypothetical cocaine as a reinforcer might partially explain these effects because the individuals may have lacked motivation for that reinforcer while completing that task. However, providing a real cocaine reinforcer is not feasible in most human research studies.

Animal studies have assessed choices with different types of consumable reinforcers (Huskinson et al., 2016; Huskinson et al., 2015). Huskinson et al. (2015) reported that when rhesus monkeys were offered a food-LL and a cocaine-SS, they discounted the food-LL more steeply compared with a condition when food was available for both options. This suggests that impulsive choices may increase when a highly valued consumable reinforcer is attached to the SS option. However, Huskinson et al. (2016) presented the opposite cross-reinforcer options with cocaine-LL and food-SS. They reported that monkeys preferred the cocaine-LL with stronger preferences when higher doses of cocaine were available. Thus, with real outcomes the higher-valued cocaine reinforcer dominated preferences. The difference between the results in monkeys and humans might reflect the use of experiential versus hypothetical outcomes. Regardless, the use of cross-reinforcers in impulsive choice tasks reveals novel dynamics that invite further research. For instance, the utility of qualitatively different reinforcers can interact in ways that affect the value of each reinforcer. Some cross-reinforcers interact in a way where consumption of one reinforcer increases the value of the alternative—these are known as complementary cross-reinforcer interactions (e.g., consumption of salt may increase thirst and increase the consumption of a beverage). Other cross-reinforcers interact in a way where consumption of one reinforcer decreases the value of the alternative—these are known as substitutable cross-reinforcer interactions (e.g., consumption of water may decrease thirst and decrease the consumption of alternative beverages). Cross-reinforcer relationships are “independent” when they do not interact.

The economic demand procedure (Hursh & Roma, 2016) is a useful method to assess the value of a reinforcer and evaluate how cross-reinforcers interact. Procedurally, reinforcer value is evaluated by having individuals pay some cost (e.g., effort, hypothetical money) to obtain the reinforcer where the unit price (cost per amount) of the reinforcer is varied across conditions and the amount of the reinforcer obtained is measured. The reinforcer value is indexed by the elasticity of demand, the degree to which reinforcer consumption drops when the unit price is increased. Shallow decreases in consumption are termed inelastic demand, and this is an index of high reinforcer value relative to elastic demand, where consumption drops steeply with increases in price. In everyday terms, inelastic demand is associated with necessities that individuals would pay almost any price to obtain (e.g., food, water) and elastic demand is associated with luxuries that can be forgone if the costs are too high or reinforcer value offered is too low (e.g., entertainment). Cross-reinforcer interactions can be evaluated when the unit price of one target reinforcer is varied while the cost of a qualitatively different alternative is held constant. For substitutable relationships, the consumption of the alternative would increase with increased price for the target reinforcer, whereas for complementary reinforcers, the consumption of the alternative would decrease with the increase in price of the target reinforcer. These cross-reinforcer relationships underscore the point that the utility of a reinforcer in a context is determined by what other reinforcers are present. Understanding these interactions is useful in substance abuse research where reinforcer overevaluation can become maladaptive.
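One widely used quantitative form for such demand data is the exponential demand equation (Hursh & Silberberg, 2008). The sketch below uses it to contrast inelastic and elastic demand; the parameter values are illustrative assumptions only:

```python
import math

def demand(price, q0=100.0, alpha=0.01, k=2.0):
    """Exponential demand: log10(Q) = log10(Q0) + k * (exp(-alpha*Q0*C) - 1).

    q0: consumption at zero price; alpha: rate of decline in consumption
    (elasticity parameter); k: span of the demand curve in log10 units.
    """
    log_q = math.log10(q0) + k * (math.exp(-alpha * q0 * price) - 1)
    return 10 ** log_q

# Smaller alpha -> shallower decline (inelastic, necessity-like demand);
# larger alpha -> steeper decline (elastic, luxury-like demand)
for alpha, label in [(0.001, "inelastic"), (0.02, "elastic")]:
    curve = ", ".join(f"Q(C={p})={demand(p, alpha=alpha):.1f}"
                      for p in (0.1, 1, 5, 10))
    print(f"alpha={alpha} ({label}): {curve}")
```

In this parameterization, alpha is the single index of elasticity: the inelastic curve retains most of its consumption as price rises, whereas the elastic curve collapses, mirroring the necessity/luxury distinction described above.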

In the extreme, overevaluation occurs when a reinforcer is excessively consumed or sought out at the expense of other outcomes. As previously discussed, conditions like substance use disorders are associated with impulsive choice, but prediction of these conditions is improved when jointly factoring impulsive choice and reinforcer valuation assessments. The observation that high impulsive choice and high reinforcer valuation (assessed using the demand procedures) are associated with maladaptive substance use behaviors has been termed reinforcer pathology (Bickel et al., 2014, 2020). Reinforcer pathology is linked to unhealthy behaviors associated with alcohol (Lemley et al., 2016; Stancato et al., 2020), cannabis (Aston et al., 2016; reinforcer value and discounting were associated with different THC use outcomes), body mass index (Epstein et al., 2014), caloric intake (Rollins et al., 2010), unsafe sexual behaviors (Harsin et al., 2021), and relapse from smoking cessation (García-Pérez et al., 2022). The observation that impulsive choice and reinforcer value often covary in individuals suggests that they may share a common underlying motivational mechanism. Interventions designed to treat maladaptive reinforcer-driven behavior should therefore be assessed for their ability to reduce impulsive choice and the value of the problem reinforcer—be it drugs, food, gambling, or sex.

Interventions such as episodic future thinking (EFT) can affect both impulsive choice and reinforcer value. EFT requires individuals to vividly imagine a future event, and this has been shown to reduce impulsive choice (Peters & Büchel, 2010) along with excessive valuation of alcohol (Bulley & Gullo, 2017), nicotine (Stein et al., 2018), and palatable foods (Sze et al., 2017). Alternatively, participants who receive scenarios of stressful situations, such as income constraints (Mellis et al., 2018) or hurricane losses (Snider et al., 2020), are more likely to show increased impulsivity and (food) reinforcer valuation. Collectively, this supports the reinforcer pathology framework and demonstrates that impulsive choice and reinforcer valuation are mechanistically associated. These results also support the CNDS proposal that impulsive behaviors emerge from the motivational system and are associated with heightened reinforcer valuation (Bechara et al., 2019).

Motivating operations refer to conditions that temporarily modulate the value of a reinforcer across time (Edwards et al., 2019; Michael, 1993). The most basic example of a motivating operation is deprivation. For example, food-deprived animals experience hunger, water-deprived animals experience thirst, and drug-dependent animals in abstinence experience withdrawal. However, other conditions can serve as motivating operations. For example, salt intake results in thirst, advertisements for palatable food and drugs can trigger cravings, and stressful environments motivate avoidance behavior. Downey et al. (2022) provide a thorough review of the effects of deprivation on impulsive decision-making that does not need to be fully recounted here. However, there are several key proposals that highlight how motivational processes may affect impulsive choice. In humans, it has been generally concluded from the results of a variety of studies that sleep or nicotine deprivation have no effect on impulsive choice, whereas deprivation of opioids or financial resources increases impulsive choice (Downey et al., 2022). In animals, mild deprivation does not seem to affect impulsive choice in food-deprived pigeons (Oliveira et al., 2013) or water-deprived rats (Richards et al., 1997). However, opioid-dependent rats show greater impulsive choices during deprivation-induced withdrawal (Harvey-Lewis & Franklin, 2014). Collectively, there are mixed results about whether deprivation impacts impulsive decision-making. However, judging from prevailing trends, it is possible that mild deprivation (or other low-stress conditions) would have negligible effects, while stressful deprivation (e.g., withdrawal) may increase impulsive choice.

Attention

Attention is a perceptual and cognitive concept that is broad and difficult to define precisely (Hommel et al., 2019). For the present purposes, we operationalize attention as the degree to which reinforcement contingencies and stimulus cues in the environment affect impulsive choice behavior.

The importance of attention can be observed in studies where conditions lead to poor learning. As discussed earlier, delay training increased self-control corresponding with improvements in timing in some studies (Peterson & Kirkpatrick, 2016; Smith et al., 2015), but not others (Rung et al., 2018). A prominent procedural difference that may explain the discrepancy is the contingency employed at the end of the delay. Smith et al. (2015) used response-initiated FI schedules requiring the rats to make a response after the delay to collect the reinforcer during training and impulsive choice tasks, whereas Rung et al. (2018) used response-initiated FT schedules in which reinforcement was delivered automatically after the delay. The FI response requirement promotes active attention to the delay; the FT schedules, by delivering reinforcers automatically, might not require sufficient attention to delays, and this might have led to the absence of an effect in the timing task. In another, previously discussed example, Marshall and Kirkpatrick (2016) reported that rats learned to discriminate reinforcer amounts with a concurrent training procedure (e.g., choosing between different pellet amounts in a trial), but not in a successive training procedure (e.g., responding for different pellet amounts on a lever across blocks of sessions; Marshall et al., 2014). This benefit of concurrent training might be mediated by the procedure requiring attention to, and comparison between, the two options; the successive procedure does not easily permit such comparisons. Poor attention may explain why some procedural differences produce limited effects on learning.

Refocusing attention is a proposed method to help individuals with impulsive decision-making and associated maladaptive behavior, such as substance use disorder (Ashe et al., 2015). Mischel and Ebbesen (1970) and Mischel et al. (1972) investigated the impact of attention in the delayed gratification task in children. Conditions that encouraged attention to the outcome, such as thinking of the reinforcer or making the reinforcer visible, reduced delay of gratification. On the other hand, conditions that distracted the children from the reinforcer, such as thinking of something fun, improved delay of gratification for the LL option. Evans and Beran (2007) reported that chimpanzees engaging in self-distraction activities were better able to wait for a larger accumulation of reinforcers in a modified delay of gratification task. This finding demonstrates that attentional focus can be a relevant mechanism for self-control in animals that are not simply following verbal instructions or obeying potential demand characteristics, as may be the case with human participants. Overall, these studies demonstrate that shifting attention away from a tempting SS option can increase the ability to wait for an LL option.

Just as distraction can shift attention away from the choice situation, attention can also be shifted toward different aspects of the choice situation by using stimulus cues. As discussed above, ITIs often do not affect impulsive decision-making in rats (e.g., Sjoberg et al., 2021), but Pearson et al. (2010) reported that signaling the ITI increased LL choices in rhesus macaques. This signaling effectively drew attention to the ITI and increased reinforcement maximization. The results from Peck et al. (2019), where rats avoided delay-correlated cues, indicate that cue lights associated with a choice appear to represent the aversive dimension of the delay rather than the reinforcing dimension of the food. Attention to the delay-associated cue may condition the delays to represent the aversive aspects of waiting.

Studies have also shown that cues associated with delays in the terminal link within concurrent chains can represent the delay to food (Grace & Savastano, 2000) or the time left waiting for food (from the transition between initial and terminal links; O'Daly et al., 2006), based on concurrent or successive value training, respectively. These contrasting results might be best understood in terms of how the training focuses attention on learning what the delay represents. Using short SS delays in training increased impulsive choices (Smith et al., 2022), and this might be due to biasing attention toward short SS delays (i.e., rats learned the appeal of a short SS delay). The way animals experience the SS and LL options (outside of a choice procedure) may bias learning by training the animals to attend to different aspects of the impulsive choice contingencies. Overall, whether delays represent the aversive aspects of waiting, the value of the outcome, the time that has already passed during a delay, or the time left waiting may depend upon how these aspects were learned. The relevant mechanism underlying this learning might be attention to the contingencies.

The contingencies of reinforcement or cues in an initial link of an impulsive choice procedure can affect choices, and these effects relate to attentional framing. Calvert et al. (2011) demonstrated that cues can affect impulsive choices by signaling parts of the delay. They assessed impulsive choice with a common delay added to the SS and LL options so that the SS option did not produce immediate reinforcer delivery. In comparison to a control condition with no common delay, unsignaled common delays increased impulsivity and signaled common delays decreased impulsivity. The reason for this signaling effect is unclear, but it may have manipulated attention to the contingencies: the unsignaled common delay may have been experienced as a long, aversive delay (e.g., Peck et al., 2019), whereas the signaled condition may have reframed the contingencies so that the experience with the LL option was not as subjectively long in comparison to the SS option. For comparison, Green et al. (2005) added a common delay to hypothetical SS and LL choices in humans and found more LL choices as a result. Thus, pigeons showed sensitivity to the contingencies similar to humans when the task was framed in a way that highlighted the common delay between the SS and LL.

Attention can also be manipulated in experiments without the explicit use of stimulus cues. As described previously, Rachlin and Green (1972) demonstrated that pigeons committed to the LL option if given the opportunity in an initial link. In a similar experiment, Siegel and Rachlin (1995) found that an FR 31 response requirement on the SS or LL initial link leading up to the choice increased LL choices compared with an FR 1 response requirement. The pigeons could switch options during the FR 31, and only the last response counted as the choice, which led to the SS or LL delay and subsequent reinforcer delivery. Under this contingency, pigeons tended to respond on the LL option early and rarely switched to the SS option. Monterosso and Ainslie (1999) suggested that the pigeons' attention was focused on the LL at the start of the trial in the FR 31 condition: the initial link's distance from the outcome produced an LL bias, and the response contingency helped maintain attention on the LL for the remaining 30 responses. Thus, the self-control promoting effects of precommitment in Rachlin and Green (1972) can emerge when the initial link involves a response contingency that captures attention and guides the pigeon to the terminal choice associated with that option, often the LL because of the LL preference at the precommitment stage of decision-making.

Framing effects in the impulsive choice literature with humans might also be understood in terms of attentional control. Instructions to human participants can reframe tasks to shift attention, as observed with outcome framing and date framing. Hypothetical choice tasks typically ask participants, “Would you prefer $9 now or $18 in 14 days?” The explicit-zero framing instead asks, “Would you prefer $9 now and $0 in 14 days or $0 now and $18 in 14 days?” Radu et al. (2011) conducted a series of experiments comparing explicit-zero and implicit-zero (i.e., standard question) formats. They tested preferences for future outcomes in one group and satisfaction from past outcomes (e.g., “$9 an hour ago” or “$18 fourteen days ago”) in another group. They reported fewer impulsive choices with explicit zeroes for both the future and past outcome groups. They explained these results in terms of temporal attention, where myopic temporal horizons (for future and past outcomes) can at least partially account for impulsive choice without needing to appeal to temptation. Interestingly, the focus on temporal attention in Radu et al. (2011) as a causal mechanism parallels animal research in which FI-Delay training improves the timing of FI delays and leads to more LL choices (Smith et al., 2015), which may expand the temporal horizon.

Another form of framing is explicit date framing, in which choice questions indicate the date when the outcome would be delivered (e.g., $15 on 7/22/22 instead of $15 in 7 days). Read et al. (2005) found that date framing increased LL choices. This may have occurred because date framing does not explicitly highlight the delay dimension and may promote attention to the amount dimension. Naudé et al. (2018) found that improvements in self-control from date framing were more likely to be observed in highly impulsive individuals. These results suggest that date framing shifts attention away from the delay dimension. If impulsivity is driven by delay aversion, then this attentional shift would disproportionally affect the choices of impulsive individuals.

Attention is also implicated in studies designed to bias time perspectives. Future time perspective (FTP) measures the degree to which individuals think about the future and consider future consequences. Greater degrees of FTP are associated with greater self-control and healthful behaviors (Daugherty & Brase, 2010). Göllner et al. (2018) assessed the relationship between FTP and both impulsive choice and delayed gratification measures and found that LL choices and success in delaying gratification were correlated with a longer FTP time horizon. Thus, temporal attention is a mechanism that may explain impulsive choice and, subsequently, why EFT is successful in promoting self-control.

More recent versions of reinforcer pathology theory include a temporal window or time horizon as the target for interventions (Bickel et al., 2020). An individual's short time horizon may be linked to poor time perception, inattention to the future, and/or an inability to make well-informed temporal choices. Short time horizons could thus be a cause of impulsive choice. If so, then training procedures designed to expand temporal horizons should improve self-control. As previously mentioned, EFT reduces impulsive choice and lowers reinforcer valuation in humans by having them vividly imagine future events (Peters & Büchel, 2010). The act of vividly imagining the future is correlated with activity in the anterior cingulate cortex (ACC, related to attention; Davis et al., 2000) and the hippocampus (which processes temporal information relating to episodic memories; Umbach et al., 2020). Attention is implicated in EFT in two main ways. First, EFT increases self-control when implemented during a choice trial, when attention would most likely influence decision-making. Second, focusing attention on recent episodic events can increase impulsive choices (Rung & Madden, 2019), demonstrating that the attentional mechanism can work in both directions. Reinforcer pathology links high impulsive choice, high reinforcer valuation, and short time horizons as key aspects predictive of maladaptive behavior. The ability of EFT to expand time horizons and reduce impulsive choice and reinforcer valuation suggests that temporal attention is a target mechanism for positive behavioral change.

Working memory

Working memory is the ability to maintain goal-relevant information despite interference from competing or irrelevant information. Shamosh et al. (2008) reported that working memory and intelligence are negatively correlated with impulsive choice (replicated in Bobova et al., 2009), a relationship partly explained by activity in the anterior prefrontal cortex (functionally associated with prospective planning; Ramnani & Owen, 2004). This correlation implies that working memory may be another target mechanism mediating impulsive choice.

To test this hypothesis, Bickel et al. (2011) gave stimulant users working memory training using digit-span and word recall tasks and reported improved self-control. Jimura et al. (2018) used fMRI while participants made impulsive choices and completed a working memory exercise. They reported that activation in the anterior prefrontal cortex and the dorsolateral prefrontal cortex was correlated with difficult working memory trials (i.e., high cognitive loads) and with difficult impulsive choice trials (i.e., where both options have similar subjective values) when the self-controlled option was chosen. This suggests that the neurobiological networks associated with challenging cognitive tasks also participate in difficult choices. Snider et al. (2018) combined working memory training with EFT in individuals with alcohol use disorder and high baseline impulsive choice, and the combined intervention improved self-control. Individuals also showed improvements in a working memory transfer task that was procedurally distinct from the training task, confirming that the training improved working memory in general. Working memory training may support EFT by improving the generation of the vividly imagined stimuli that improve self-control. Overall, working memory training has had some promising results in individuals high in impulsivity, but some working memory training studies have failed to improve self-control (Hendershot et al., 2018; Wanmaker et al., 2018). Future research is needed to better understand the mechanisms that working memory training targets when improving self-control.

In preclinical models, working memory has also been assessed in rodents as a possible process underlying impulsive choice. Renda et al. (2014) found that rats with better working memory accuracy, assessed using a delayed match-to-position task, made more LL choices. However, training working memory using the same procedure did not increase LL choices in impulsive rats (Renda et al., 2015). It is possible that alternative working memory training procedures may produce effects on self-control, but this remains to be tested. Overall, in both rats and humans, there is evidence indicating a relationship between working memory and self-control, but more research is needed to determine whether working memory training can reliably improve self-control.

Perception, discrimination, and timing

Impulsive decision-making involves discrimination between the LL and SS contingencies. Perceptual processes are involved in translating the objective stimulus properties (delays and reinforcer amounts) into subjective representations. Meck and Church (1983) demonstrated that amount discrimination (i.e., counting) and time discrimination (i.e., timing) both displayed a psychophysical function in which the point of subjective equivalence (judged to be the midpoint between two stimulus values) was located at the geometric mean. This is consistent with the psychophysical principles of perception explained by Weber's law. As already discussed, self-control is correlated with accurate discrimination between reinforcer amounts (Marshall & Kirkpatrick, 2016), accurate discrimination between delays (Baumann & Odum, 2012; McClure et al., 2014), and accurate expectation of delayed reinforcer delivery (Marshall et al., 2014). Collectively, this demonstrates that variance in self-controlled choices can be explained by variance in the accuracy of the representation of the choice options.

Distorted timing processes, in particular, have received considerable attention in explaining impulsive choice (Bailey et al., 2018; Baumann & Odum, 2012; Berlin et al., 2004; Kim & Zauberman, 2009; Marshall et al., 2014; McGuire & Kable, 2012, 2013; Noreika et al., 2013; Reynolds & Schiffbauer, 2004; Rubia et al., 2009; Smith et al., 2015; Wittmann & Paulus, 2008; Zauberman et al., 2009). Additionally, impulsive choice and timing dysfunctions are associated with substance use disorders, with distorted timing proposed as a mechanistic link mediating the relationship between self-control and substance use (Paasche et al., 2019). Timing distortions may affect impulsive choice in two ways: overestimating delays makes the LL option less attractive, whereas imprecisely estimating delays may lead to uncertainty in temporal anticipation and difficulty predicting events in time. Baumann and Odum (2012) demonstrated that human impulsive choice was associated with overestimation of delays, and McGuire and Kable (2012, 2013) reported that individuals who overestimated delays were also less likely to wait for delayed reinforcers. Temporal imprecision has also been associated with impulsive choice in rats (Marshall et al., 2014; McClure et al., 2014; Peterson & Kirkpatrick, 2016; Smith et al., 2015). To summarize, self-controlled choices are associated with accurate psychophysical representations of choice amounts and delays, and accurate perception of delays might be particularly important in driving self-control.

Much of the research reviewed in this section included both humans and animals and shows overlapping learning, motivational, and cognitive mechanisms related to impulsive choice. Species differences are sometimes obvious—for example, rats and pigeons cannot read a series of questionnaire items and indicate their preference for a hypothetical offer, nor can they be tasked to vividly imagine a future outcome. Future comparative research needs to explore the ability of animal models to inform the mechanisms of impulsive choice broadly, and the extent to which those mechanisms translate across species. Each factor discussed above generates numerous hypotheses and predictions. The foundational theoretical models that have historically described delay discounting have often failed to account for these additional factors, and the following section on models of impulsive decision-making highlights this point. Researchers should consider how cognition, motivation, attention, and working memory intersect when investigating the mechanisms of impulsive decision-making.

Models of impulsive decision-making

A wide range of mathematical formulations have been proposed to explain behavior in impulsive choice tasks, but the breadth of models has not necessarily led to new insights into the mechanisms of impulsive choice. Here, we focus on the models that are most pertinent to the cognitive processes discussed in the previous section and point to the strengths and weaknesses of the models in shedding light on empirical data. We present an analysis of 16 models grouped into four families. All models presented have a fundamental focus on predicting subjective value as a function of reinforcer amount and delay. Throughout this section, we differentiate the models based on whether they are better suited to predict human and/or animal data. We do not include a discussion of models that are designed to predict decision heuristics in hypothetical choice situations only (Marzilli Ericson et al., 2015), nor do we include drift diffusion models that are designed to predict reaction time distributions and/or describe evidence accumulation in place of subjective value (Amasino et al., 2019; Peters & D’Esposito, 2020). Finally, the models here assume that choice behavior follows reinforcer value in a straightforward way; we do not discuss separate decision rules (e.g., softmax) that could affect choice behavior (e.g., Rodriguez et al., 2014). Following a discussion of individual models, we evaluate the models overall and then discuss their relation to the empirical results described in the previous section.

Foundational models

Exponential versus hyperbolic discounting

The original discounting equation, derived for economics applications, is the exponential (EXP; Samuelson, 1937; Table 1, Eq. 1), which assumes a constant rate of discounting over time. Because the discounting rate is constant, the EXP predicts rational decision-making in the sense that preferences should not change over time. The sole free parameter is the discounting rate, k, which determines the rate at which subjective value (VD) decays as a function of the delay (D) until future reinforcer receipt. Subjective value is also a function of the reinforcer amount (A). The delay and amount entered into the model are the actual values rather than perceived values.
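Although Table 1 is not reproduced here, the EXP function is conventionally written as:

$$V_D = A e^{-kD}$$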

Table 1 Foundational discounting models including the exponential (EXP), hyperbolic (HYP), Rachlin hyperboloid (RACH), and Myerson–Green hyperboloid (MG)

There are challenges to EXP discounting, with two prominent objections. The first is that there are systematic deviations in the fit of the EXP to data from impulsive choice tasks in humans and animals (e.g., Frederick et al., 2002; Laibson, 1997). The second, and perhaps more important, objection is that the EXP discounting function predicts that choices should be constant over time. For example, if an individual prefers $20 now (the SS) over $50 in 3 months (the LL), they should also prefer $20 in 1 week over $50 in 3 months and 1 week. Instead, preference reversals may occur in which individuals prefer the LL in the latter example. Preference reversals have been observed in both humans and animals (e.g., Frederick et al., 2002; Green & Estle, 2003). Thus, adding a constant amount of time (1 week in this case) can change preferences, which should not occur under EXP discounting. Preference reversals are thought to occur because discounting rates are higher for shorter delays than for longer delays (Thaler, 1981). Another problem, discussed below, is that EXP discounting assumes that individuals have perfect knowledge of the amounts and delays. Although this may be a feasible assumption in hypothetical discounting tasks (at least in healthy adults), experiential delay and amount tasks could be associated with errors in judging delays and/or amounts. This is a particular issue when judging real delays, as the timing system naturally includes imprecision and inaccuracy in estimates, and timing errors increase as the estimated intervals increase (Gibbon, 1977). Timing errors may result in time contraction—the observation that longer delays are often underestimated and shorter delays are often overestimated.

A widely accepted alternative to the EXP function is the hyperbolic discounting equation (HYP; Mazur, 1987; Table 1, Eq. 2), whose parameters carry the same meanings as in the EXP function. The HYP equation typically provides a better fit to impulsive choice functions and correctly predicts preference reversals (Frederick et al., 2002). Note that the HYP equation predicts that decision-making may be irrational in the sense that preferences can shift over time, depending on the discounting rate. However, like the EXP model, the HYP model assumes perfect knowledge of amounts and delays. In addition, hyperbolic fits to individual subjects can show systematic deviations from the data, suggesting the need for alternative models. Specifically, both the HYP and EXP tend to overestimate value at shorter delays and underestimate value at longer delays relative to data (McKerchar et al., 2009).
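The HYP function is conventionally written as:

$$V_D = \frac{A}{1 + kD}$$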

Hyperboloid models

Just as individuals differ in their rate of discounting, not all individuals share the same sensitivity to delays. Two primary models incorporate delay sensitivity parameters into the HYP equation in an attempt to improve fits to data and/or add meaningful parameters to account for amount and/or delay perceptual effects on subjective value. The Rachlin hyperboloid (RACH; Rachlin, 1989; Table 1, Eq. 3) adds a sensitivity-to-delay parameter (s) as an exponent on the actual delay within the HYP equation. In this model, the s parameter moderates the effects of delay on subjective value, independent of any effect of reward amount. The s parameter is theoretically meaningful because it reflects the observation that perceived time is a nonlinear function of actual time, in this case a power function relationship (Stevens, 1957), which results in time contraction. To capture this effect, s must be between 0 and 1; if s = 1, the model is equivalent to the hyperbolic.

Alternatively, Myerson and Green (MG; Myerson & Green, 1995; Table 1, Eq. 4) proposed a hyperboloid equation that includes an s parameter as an exponent on the entire denominator of the HYP equation. The s parameter is the ratio of amount and delay sensitivities, so that s can capture nonlinearities in the perception of delay and/or amount. As a result, the value of s could potentially be greater or less than 1 and be theoretically meaningful (but must be greater than or equal to 0). If s is less than 1, then the function is steeper at shorter delays but shallower at longer delays (reflecting time contraction). If s is greater than 1, this produces a steeper function than predicted by the base HYP function; observations of fits with s > 1 in applications of this model are relatively uncommon (Mitchell et al., 2015). Because delay and amount sensitivity are captured in the same parameter, the MG model does not provide the specificity of parameter interpretation that the RACH hyperboloid does. As with RACH, the MG model is equivalent to the HYP when s = 1.
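The two hyperboloids are conventionally written as:

$$\text{RACH: } V_D = \frac{A}{1 + kD^s} \qquad \text{MG: } V_D = \frac{A}{(1 + kD)^s}$$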

Figure 1 compares the EXP, HYP, RACH, and MG models for a relatively shallow and a relatively steep set of discounted value curves. The parameters used to generate the curves in the figure are given in Table 2. Because the RACH and MG models are equivalent to the HYP when s = 1, the functions in Fig. 1 show the predictions for the models when k is set to the same value as in the HYP model but with s = .5 (McKerchar et al., 2009). Note that the EXP function predicts that shorter delays lose value less quickly, but longer delays lose value more quickly, in comparison to the predictions of the HYP. The two models with the s parameter can further moderate the loss in value with delay by setting s less than 1. For further reference, McKerchar et al. (2010) provide an excellent discussion of the interpretation of the s parameter in these models.
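For readers who prefer code, the following minimal Python sketch computes curves of the kind shown in Fig. 1. It assumes the conventional equation forms given above; the parameter values are illustrative rather than those in Table 2.

```python
import numpy as np

A = 1.0                            # reinforcer amount (normalized)
delays = np.linspace(0, 60, 121)   # delays in seconds

def exp_model(d, k):
    """Exponential discounting: constant decay rate over time."""
    return A * np.exp(-k * d)

def hyp_model(d, k):
    """Hyperbolic discounting."""
    return A / (1 + k * d)

def rach_model(d, k, s):
    """Rachlin hyperboloid: power-function exponent on delay."""
    return A / (1 + k * d**s)

def mg_model(d, k, s):
    """Myerson-Green hyperboloid: exponent on the whole denominator."""
    return A / (1 + k * d)**s

# Illustrative steep discounting rate with s = .5 for the hyperboloids
k, s = 0.8, 0.5
curves = {"EXP": exp_model(delays, k),
          "HYP": hyp_model(delays, k),
          "RACH": rach_model(delays, k, s),
          "MG": mg_model(delays, k, s)}
```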

Fig. 1 Top: Subjective value (VD) as a function of delay to reward in seconds for a relatively low discounting rate as predicted by the Exponential (EXP), Hyperbolic (HYP), Rachlin hyperboloid (RACH), and Myerson-Green hyperboloid (MG) models. Bottom: VD as a function of delay for a relatively high discounting rate as predicted by the same models

Table 2 Parameter values that were used to calculate the functions shown in Fig. 1 for the Exponential (EXP), Hyperbolic (HYP), Rachlin hyperboloid (RACH), and Myerson–Green hyperboloid (MG)

Comparisons of the HYP against the RACH and MG hyperboloid models have yielded some insights—namely, that the models with an s parameter fit data from hypothetical choice tasks in humans better than the base HYP model (McKerchar et al., 2009; Mitchell et al., 2015; Peters et al., 2012), suggesting that sensitivity to delay may be an important contributor to subjective value computations. (Note that comparable assessments of model fits to animal data have not been conducted to our knowledge.) However, comparisons of the RACH and MG fits to data have yielded mixed results. McKerchar et al. (2009) reported that both models fit data better than the HYP and that the s parameters were significantly less than 1 for most participants in both models. However, the hyperboloid models were indistinguishable in their fit to the data (see also Rachlin, 2006, for similar findings). It is worth noting that the assessment of the model fits did not control for model complexity, which can produce overfitting, or better approximations of the data at the cost of generalizability to new data sets (Babyak, 2004). Mitchell et al. (2015) indicated that the RACH and MG hyperboloids fit the data better than the HYP when controlling for model complexity. However, a comparison of the RACH and MG models yielded somewhat mixed results, with the RACH model fitting the data better in some, but not all, cases. Finally, Peters et al. (2012) found that the two hyperboloid models fit individual participants' data better than the HYP model but were less sensitive at detecting group differences; this was also noted by Mitchell et al. (2015). This is likely because k and s in both hyperboloid models can be strongly negatively correlated (Mitchell et al., 2015; Peters et al., 2012), so that the two parameters compete to explain the same variance (multicollinearity). This reduces the sensitivity of both parameters for distinguishing group differences. In addition, the s parameter values may not distinguish group-level effects very well, regardless of multicollinearity, creating challenges for interpretation (Mitchell et al., 2015).

Despite their issues, the hyperboloid models are important in their recognition of the nonlinear nature of the psychophysical relationship between actual and perceived delays and amounts (McKerchar et al., 2010), which may take the form of a power function (Stevens, 1957). The RACH model can be directly interpreted in the framework of Stevens's power law. The MG model is more complicated, as s is the ratio of sensitivities to amount and delay; nevertheless, the MG model still proposes a psychophysical scaling of amount and delay.

Models derived from properties of the timing system

The dual parameter delay sensitivity models incorporate psychophysical principles with the s parameter added onto the base HYP model. Another approach has been to derive models directly from the known properties of the timing system. Although temporal discounting almost certainly necessitates timing processes, models of discounting and timing have been developed largely independently of one another. The models in this section aim to bridge the gap between timing processes and discounted value computations and yield interesting insights into potential mechanisms of impulsive choice behavior.

One early attempt to explain hyperbolic discounting within the timing system was by Gibbon (1977), who showed that hyperbolic discounting emerged directly from the scalar property of time perception. The scalar property is derived from Weber's law and reflects the relative nature of time. This is seen in timing errors, where the standard deviation of timing estimates increases linearly with the mean estimated delay. In addition, delay and amount judgments take a relative form in which the ability to discriminate two delays follows a ratio rule (discriminating 2 vs. 4 s is as difficult as discriminating 20 vs. 40 s). Cui (2011) later revisited this issue, proposing the scalar timing model (ST; Table 3, Eq. 5) to explain hyperbolic discounting. This model incorporates two parameters reflecting the Weber fractions for amount and delay, which measure the just-noticeable differences that participants can reliably detect. Unlike the MG hyperboloid, the ST model provides separate parameters for delay and amount sensitivity (see Young, 2018, for further discussion of this issue). This can provide an advantage for parsing specific cognitive mechanisms, as delay and amount processes appear to exert different influences on human choice behavior in hypothetical tasks (Amasino et al., 2019). Interestingly, the ST model does not include a discounting rate parameter. Instead, hyperbolic discounting emerges directly from the psychophysical properties of delay and amount perception. This raises a key consideration as to whether predictions of impulsive choice require the inclusion of a discounting parameter, or whether discounting may be explained by other established cognitive and/or perceptual processes.

Table 3 Models derived from properties of the timing system including the scalar timing (ST), constant sensitivity (CS), logarithmic timing (LOG), Modified Generalized Hyperbolic (MGH), Kim–Zauberman (KZ), and Training-Integrated Maximized Estimate of Reward (TIMERR)

The other models in this family have taken the approach of integrating timing system functions into the EXP model coupled with nonlinear perception of delays. Note that there has been extensive discussion of whether timing is linear or nonlinear in humans and animals, and nonlinear time perception has not always been supported (Crystal, 2001; Gibbon & Church, 1981; Wearden & Jones, 2007; Yi, 2009). The timing system approach is related to the RACH hyperboloid in that it assumes discounting (in this case an exponential function) coupled with nonlinear delay perception. The constant sensitivity model (CS; Ebert & Prelec, 2007; Table 3, Eq. 6), which was implemented to fit data from human hypothetical choice tasks, separates the impatience component of delay discounting (k) from the delay sensitivity component (s), like the hyperboloid models. Here, s is the exponent of a power function relating perceived and actual time, as in the RACH model. Thus, when s = 1 (linear timing), exponential discounting is observed, and when s is significantly less than 1, discounting is approximately hyperbolic. This model was shown in one study to provide a superior fit to data from a hypothetical choice task in humans compared with the HYP, EXP, RACH, and MG models (Peters et al., 2012), suggesting that exponential discounting coupled with nonlinear (power function) time perception may be a better modeling approach.
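As commonly presented, the CS model applies EXP discounting to a power-function transformation of delay:

$$V_D = A e^{-(kD)^s}$$

so that s = 1 recovers the EXP function and s < 1 yields approximately hyperbolic discounting.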

In addition, the CS model is supported by the observation of subadditive discounting in hypothetical choice tasks (Read, 2001). This phenomenon occurs when a time interval is divided into subintervals. Because the subintervals tend to be overestimated (due to their shorter delays) relative to the full interval, the subjective value of the subintervals decreases. This leads to the observation that the total subjective value over the subintervals is lower than the subjective value when the interval is judged as a whole. Subadditive discounting is explained by the power function relationship between perceived and actual delay. When coupled with EXP discounting, the effects of subadditive discounting on perceived delay translate into alterations in subjective value. Note that the HYP function does not explain subadditive discounting—instead the HYP is additive over any subdivision of delays (Read, 2001), so the prediction of subadditive discounting is an advantage of the CS model. Although the CS model has some promising features, further comparisons across a broader range of data sets and empirical phenomena (including application to human experiential and animal choice tasks) are needed before drawing strong conclusions.

Alternatively, Takahashi (2005) developed the logarithmic timing model (LOG; Table 3, Eq. 7), which couples an exponential discounting model with logarithmic timing of delays instead of a power function. The LOG model was based on the observation that logarithmic timing, derived from Weber's law, produces a hyperbolic discounting function. A logarithmic timing function is predicted by Weber's law and is the preferred framework for interpreting timing data in humans and animals (Gibbon, 1977), as opposed to the power function predicted by Stevens's law (Stevens, 1957). In addition, Takahashi et al. (2008) compared the HYP, LOG, and CS models in their fit to hypothetical choice data in humans and found that the LOG outperformed the CS, followed by the HYP. Thus, the observation that temporal psychophysics tends to follow logarithmic timing (although not always—see above) coupled with the better performance of the LOG model argues in favor of a logarithmic nonlinear representation of time coupled with EXP discounting.

Interestingly, the LOG model is mathematically equivalent to the MG hyperboloid (Takahashi, 2005). In the transformation, the β parameter from the LOG model serves in place of the discounting rate (k) in the MG hyperboloid, and the α and k parameters in the LOG model combine multiplicatively to serve in place of the s parameter. The observation that the same mathematical function can be produced by adding an exponent (the ratio of two power functions for delay and amount sensitivity) to a HYP or by adding logarithmic time perception to an EXP cuts to the heart of the core debate about whether choices are rational (EXP) or irrational (HYP). If choices are rational but perception is flawed, that is a different conclusion about the underlying system than if choices are irrational (and perception may also be flawed; see Nakahara & Kaveri, 2010, for further discussion of this issue). This could lead to very different approaches for treating impulsive choice: the LOG model suggests targeting faulty perception, whereas the MG suggests targeting impatience. Yet the two models' predictions of choice behavior are identical.
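Assuming the standard form of the LOG model (exponential discounting of a logarithmically perceived delay), the equivalence follows in two lines:

$$V_D = A e^{-k\tau(D)}, \qquad \tau(D) = \alpha \ln(1 + \beta D)$$

$$V_D = A e^{-k\alpha \ln(1 + \beta D)} = \frac{A}{(1 + \beta D)^{k\alpha}}$$

which is the MG hyperboloid with β in place of the MG discounting rate and kα in place of s.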

A few studies have attempted to address this issue by measuring perceived time (which may be logarithmically related to actual time) and then entering those measurements into an EXP function to predict choice behavior in hypothetical tasks (Agostino et al., 2021; Zauberman et al., 2009). These studies have supported the idea that accounting for time perception leads to exponential rather than hyperbolic discounting in human participants. Chen and Zhao (2021) examined this issue by studying time perception (in a production task) and impulsive choice (in hypothetical tasks) under cognitive load and found that time perception causally mediated impulsive choice, suggesting a causal relationship between time perception and impulsive choice. However, these conclusions are based on a small number of studies and have only been applied to hypothetical choice tasks in humans; they need to be verified by future research in nonhuman animals and with experiential choice tasks in humans. In addition, not all individuals conform to exponential discounting when their time perception is accounted for.

As an extension of the LOG model, Agostino et al. (2021) proposed a modification of the generalized hyperbolic equation suggested earlier by Loewenstein and Prelec (1992), with an added parameter (s) representing the exponent of the power function relating perceived and actual delays. The modified generalized hyperbolic (MGH; Table 3, Eq. 8) is an EXP discounting model with a free parameter (h) that represents the magnitude of deviation from exponential discounting, so that as h approaches 0 the function yields EXP discounting, and when h = k, HYP discounting is observed. Agostino et al. (2021) measured participants' perceived delays and used the measurements to calculate sensitivity to delay (s). When they subsequently calculated subjective value for participants' hypothetical choice behavior (using the observed s values), they found that the exponents were significantly less than 1, indicating time contraction. In addition, most (but not all) participants had h parameters near 0, consistent with EXP discounting. Thus, when accounting for time perception, discounting appears to be more exponential (rational) than hyperbolic, but with some individual differences. A nice feature of this model is that the discounting rate (k), deviation from constant discounting (h), and delay sensitivity (s) are separate parameters, thus presenting opportunities for modeling specific mechanisms. Future applications should confirm whether these parameters are truly independent when modeling individual differences (i.e., the absence of multicollinearity).
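One possible rendering of the MGH consistent with these constraints (our assumption; the published parameterization in Table 3 may differ) is:

$$V_D = \frac{A}{(1 + hD^s)^{k/h}}$$

As h approaches 0 this converges to $A e^{-kD^s}$ (EXP discounting of a nonlinearly perceived delay), and when h = k it reduces to $A/(1 + kD^s)$, a hyperboloid of the RACH form.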

Figure 2 compares the HYP, ST, CS, LOG, and MGH predictions for a relatively shallow and a relatively steep discounting rate. Because the models in this set are independent of the HYP, we calculated the set of parameters for each model that best fit the HYP function using the Solver Tool in Microsoft Excel 2016. The HYP was used as the comparison model here (and in subsequent model assessments) because it is the dominant foundational model of impulsive choice behavior. The resulting parameters are given in Table 4. Although the ST model achieved a general HYP form, it deviated from the HYP at the shallower discounting rate. This could represent a weakness of the model when fitting discounting functions with shallower slopes, an issue that should be assessed in future research. At the steeper discounting rate, the ST model closely approximated the HYP. For the ST model, the Weber fraction for amount (a) was significantly lower than the Weber fraction for time (b) in both curve fits; that is, the fits implied that delays were less discriminable than amounts. This suggests that errors in time perception may serve as a stronger predictor of impulsive choices than errors in amount perception. We are not aware of any formal comparisons assessing this issue, so this could be a fruitful avenue for future research.
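The Solver-based procedure can be approximated in Python. As an example, the sketch below fits the CS model (in the conventional form assumed above) to values generated from the HYP function with a steep discounting rate:

```python
import numpy as np
from scipy.optimize import curve_fit

A = 1.0
delays = np.linspace(0.1, 60, 120)       # delays in seconds

def hyp(d, k):
    """Hyperbolic target function."""
    return A / (1 + k * d)

def cs(d, k, s):
    """Constant sensitivity model (conventional form)."""
    return A * np.exp(-(k * d) ** s)

# Generate "data" from the HYP with a steep discounting rate
target = hyp(delays, k=0.8)

# Find the CS parameters that best approximate the HYP curve
(k_fit, s_fit), _ = curve_fit(cs, delays, target, p0=[0.5, 0.5],
                              bounds=([1e-6, 1e-6], [10.0, 1.0]))
print(f"Best-fitting CS parameters: k = {k_fit:.3f}, s = {s_fit:.3f}")
```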

Fig. 2 Top: Subjective value (VD) as a function of delay to reward in seconds for a relatively low discounting rate as predicted by the Hyperbolic (HYP), Scalar Timing (ST), Constant Sensitivity (CS), Logarithmic Timing (LOG), and Modified General Hyperbolic (MGH) models. Bottom: VD as a function of delay for a relatively high discounting rate as predicted by the same models

Table 4 Parameter values used to calculate the functions shown in Figs. 2, 3 and 4 for the hyperbolic (HYP), scalar timing (ST), constant sensitivity (CS), logarithmic timing (LOG), modified general hyperbolic (MGH), Kim–Zauberman (KZ), and training-integrated maximized estimate of reward rate (TIMERR) models

Applying Solver to the MGH fits for both curves with no constraints converged on h = 0, which resulted in an error. Because forcing h = k would result in a fit identical to the hyperbolic [h = k = .10 (or .80) and s = 1], we next set h to .000001 (an EXP shape) and fit the model by allowing k and s to vary. With h set to an EXP shape, the MGH k and s parameters matched those of the CS model, as expected given that the MGH is the same as the CS model under these conditions. Thus, the model fits were not especially illuminating in this example, but we include them for illustration purposes. Because the MGH model allows for other forms of discounting functions, and thus additional flexibility, it may be useful when fitting individual participant functions. Whether such flexibility is necessary to account for data needs further testing.

The LOG model provided an excellent fit to the HYP function (Fig. 2) for both steep and shallow curves. The α, β, and k values were clustered, with a lower α but higher k and β values associated with the steeper discounting function. Note that the β values in the LOG model converged on the same values as k in the HYP (.10 and .80, respectively); as a reminder, β in the LOG model is equivalent to k in the MG hyperboloid. The main advantage of the LOG model (in addition to its links with the psychophysical properties of time perception) may be that the α and k parameters represent different psychological constructs. However, whether they add unique predictive capability remains to be confirmed.

Kim and Zauberman (2009) proposed an EXP model incorporating a power function relating perceived and actual delays (KZ model; Table 3, Eq. 9), like the CS model, and applied it to fit hypothetical impulsive choices in humans. One advantage of the KZ model over the LOG model is the inclusion of a separate equation for computing perceived delay (d), which has the potential to link psychologically meaningful parameters from the model to different facets of time perception. The α parameter represents timing accuracy, with values greater than 1 indicating overestimation, equal to 1 indicating accurate estimation, and less than 1 indicating underestimation. In addition, the model has a sensitivity-to-delay parameter, s, which reflects the degree of nonlinearity in timing (or degree of time contraction, as in previous models) and must be between 0 and 1 to achieve HYP discounting. Given the previous discussion of linear versus nonlinear timing, building in flexibility for different representations of time may be an advantage of this model. Note that when α and s both equal 1, discounted value follows an EXP function. Thus, the discounting function is a simple EXP but with perceived delay in place of actual delay. As noted previously, when timing error is accounted for empirically, HYP discounting of actual delays emerges from an EXP function of perceived delays (Agostino et al., 2021; Zauberman et al., 2009). This is because participants tend to overestimate short delays and underestimate long delays. When perceived times are used to calculate value, the value of short delays decreases and the value of long delays increases, relative to the base EXP function, thus producing a HYP function.
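Assuming the standard power-function form of perceived delay, the KZ model can be written as:

$$d = \alpha D^s, \qquad V_d = A e^{-kd}$$

where d is the perceived delay entered into the EXP discounting function.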

Figure 3 shows the subjective value from the KZ model as a function of perceived time (Vd; top panel) and the perceived time used to calculate subjective value (d; bottom panel) as a function of actual delay for the steep and shallow discounting parameters in comparison to the HYP. The best-fitting parameters to the HYP were determined and are reported in Table 4. This model predicts that perceived delay should directly affect discounting, consistent with previous literature. Note that the fitted parameters for both HYP discounting functions indicated modest overestimation of delays (α > 1) coupled with large nonlinearities in time, so that longer delays were substantially underestimated; k values also were higher for the steeper function. The correlations among the fitted parameters should be assessed when fitting individual discounting functions in future work. The predicted level of time contraction is more extreme than what is likely to be observed in data. For example, the predicted perceived delay for 30 s is severely underestimated in both the steep and shallow functions. In addition, it seems unlikely that time contraction (underestimation of longer delays) would co-occur with overestimation of delays. These issues may present weaknesses for the model in application to data.

Fig. 3 Top: Subjective value as a function of actual delay to reward (VD) or perceived delay to reward (Vd) in seconds for relatively low (1) and high (2) discounting rates as predicted by the hyperbolic (HYP) and Kim-Zauberman (KZ) models. Bottom: Perceived delay (d) as a function of actual delay for the KZ models associated with relatively low (1) and high (2) discounting rates

Another weakness of all the preceding timing models is that they do not include parameters to model the specific constituents of the timing system that are common elements in timing models (e.g., clock, attention, memory, decision). Taking a more process-driven approach, the biological clock model (BIO; Ray & Bossaerts, 2011) proposes that hyperbolic discounting emerges from a biological clock whose speed varies, with the variations in clock speed positively correlated over time (e.g., the most recent clock speed is similar to the current clock speed). The BIO model proposes EXP discounting coupled with a stochastic biological clock. The BIO model is interesting in that it connects naturally with the noisy nature of neural timing systems that has been proposed to account for timing phenomena in humans and animals (Oprisan & Buhusi, 2013, 2014). In addition, clock speeds can be context and stimulus dependent in humans and animals, thus providing an opportunity to account for state versus trait effects on timing (Wearden, 2007; Wearden et al., 1998, 1999; Wearden & Penton-Voak, 1995) that could be extended to account for effects on discounted value computations. The simulations reported by Ray and Bossaerts (2011) yielded an important insight—as the clock in the BIO model became noisier and clock speeds were more highly correlated over time, discounting functions became increasingly hyperbolic, even though the discounting equation is an EXP function. This suggests that neural noise in the timing system could be a key component that differentiates HYP and EXP discounting. In addition, the correlation of clock speeds over time suggests that individual differences in timing should correlate with individual differences in impulsive choice, which has been reported in both humans and animals (Baumann & Odum, 2012; Brocas et al., 2018; Darcheville et al., 1992; Marshall et al., 2014; McClure et al., 2014; Moreira et al., 2016; Navarick, 1998; Paasche et al., 2019; Smith et al., 2015; Stam et al., 2020; van den Broek et al., 1992; Wittmann & Paulus, 2008). It has also been suggested that individuals who are impulsive may have a faster clock (Barratt, 1983), although this has not been directly assessed to our knowledge. Finally, although timing and impulsive choice are often correlated, this is not always the case, particularly in the studies examining delay effects on impulsive choice and timing reported in the previous section. Moreover, some reports have observed correlations between impulsive choice and timing accuracy, others between impulsive choice and timing precision, so the picture of the timing-choice relationship is not entirely clear. Further research is needed to pinpoint the specific timing and choice mechanisms that may drive this relationship. As a final note, the BIO model was developed for simulation rather than explicit solution, so we do not include any equations here, but the model can be implemented to converge on a solution highly similar to the MGH model if clock speeds increase systematically with delays farther in the future, thus yielding time compression.

The BIO model suggests that incorporating timing processes into discounted value equations can produce high-quality fits to data while modeling specific cognitive mechanisms stemming from the timing system. Taking this a step further, the training-integrated maximized estimate of reward rate model (TIMERR; Namboodiri et al., 2014; Table 3, Eq. 10) incorporates timing processes along with reinforcer maximization processes (see the next section for further details). TIMERR is based on the fundamental assumption that animals (and humans) estimate the rate of past reinforcers earned (aest) over a finite window of time (Time), drawing on past experiences with delays to reinforcement (the integration window). The TIMERR model does not include a discounting rate parameter; instead, the steepness of the discounting function is governed by the reciprocal of the Time over which reinforcement rates are computed. As with the two previous models, TIMERR assumes that subjective value is a function of the neural representation of delay, d′. This representation is affected by the duration of the integration window, Time, leading to the prediction that individuals who are more impulsive should also show impaired time perception (Baumann & Odum, 2012; Brocas et al., 2018; Darcheville et al., 1992; Marshall et al., 2014; McClure et al., 2014; Moreira et al., 2016; Navarick, 1998; Paasche et al., 2019; Smith et al., 2015; Stam et al., 2020; van den Broek et al., 1992; Wittmann & Paulus, 2008). Specifically, TIMERR predicts that impulsive individuals should overestimate delays and show inconsistent timing of delays (i.e., poor timing precision), consistent with empirical studies. The TIMERR model predicts 19 independent phenomena relating to impulsive choice and time perception observed in humans and animals, including foundational phenomena such as the hyperbolic nature of discounting and the scalar property of time perception (Namboodiri et al., 2014). TIMERR also posits that the Weber fractions for delay and amount perception should predict discounted value (Namboodiri et al., 2014), thus bringing this model into line with the ST model. In addition, TIMERR proposes that the Weber fractions should depend on reinforcement history, an interesting prediction that deserves further testing.
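As we understand the published form, the TIMERR value function can be written as:

$$V_D = \frac{A - a_{est}\,D}{1 + D/Time}$$

Consistent with the description above, there is no free discounting rate: when the estimated past reward rate $a_{est}$ is near 0, the function reduces to a hyperbola whose steepness is governed by 1/Time.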

Figure 4 (top panel) shows the fit of the TIMERR model to the HYP with the estimated reinforcement rate (aest) set to 0.01 and Time allowed to vary as a free parameter, against the HYP with the steep and shallow discounting rates (see Table 4 for parameter values). TIMERR closely approximates the HYP function, although it predicts a steeper decline in value compared with the HYP at the shallower discounting rate. TIMERR achieves steeper discounting by decreasing the length of the integration interval. Because this model integrates reinforcement computations over time, it can incorporate learning of the delays and amounts of reinforcers. An interesting prediction of this model is that more impulsive individuals (who have a shorter Time window) should learn to adjust their behavior more quickly when the delay or amount changes, a proposition that deserves further testing. Impulsive individuals should also show more volatile choice and timing behavior when confronted with an environment that contains variable amounts and/or delays. The bottom panel of Fig. 4 shows the predicted neural representation of delay as a function of actual delay. The d′ value would need to be translated into perceived delay to predict behavioral outputs. The Time parameter produces a large effect on represented delay, thus leading to a predicted correlation between timing and choice behaviors, as in the KZ model. Because d′ is a neural representation, it is unclear whether the model's quantitative prediction of perceived delay would be accurate. The predicted d′ for a 30-s delay in Fig. 4 (bottom) is approximately 9 times longer for the shallower function, which may be unrealistic. This deserves further attention in future applications of the model.

Fig. 4 Top: Subjective value as a function of actual delay to reward (VD) or neural representation of the delay to reward (Vd) in seconds for relatively low (1) and high (2) discounting rates as predicted by the hyperbolic (HYP) and Training-Integrated Maximized Estimate of Reward Rate (TIMERR) models. Bottom: Neural representation of delay (d′) as a function of actual delay for the TIMERR models associated with relatively low (1) and high (2) discounting rates

Overall, the models discussed in this section provide alternative frameworks for conceptualizing impulsive choice and subjective reward valuation. The CS, LOG, MGH, and KZ models assume EXP discounting with nonlinear timing, although the MGH model does allow for the possibility of HYP discounting with nonlinear timing (via the h parameter). These models were developed to fit data from human hypothetical choice tasks and have not been assessed in their application to human experiential or animal choice data. The ST, BIO, and TIMERR models propose alternatives to discounting. In the ST model, discounting emerges from the Weber fractions for time and amount. In the BIO model, discounting emerges from a stochastic clock coupled with an autocorrelation in clock speeds. Finally, the TIMERR model assumes that subjective value is a direct function of the Time window for integrating estimated reinforcement rates, which also governs the neural representation of delays. These models were developed based on similar principles observed in humans and animals. Although the models have different conceptualizations of the underlying mechanisms of impulsive choice, all adequately fit the HYP function in the above simulations. This creates a major challenge for differentiating the models and indicates a strong need for more sensitive and comprehensive methodologies to test them. Rigorous testing of the models in their fit to data from different species and choice tasks, assessment of the ability of parameters to differentiate groups while adequately fitting individuals, evaluation of unique links to psychologically meaningful variables, and assessment of the multicollinearity of parameters are all much needed.

Reinforcement maximization and reinforcer valuation models

An alternative set of models has been developed to explain the reinforcement maximization and reinforcer valuation processes that have been linked with impulsive choice. The reinforcement maximization models have their origins in studies of foraging behaviors in nonhuman animals. Impulsive choice tasks can be conceptualized as a form of foraging task in which individuals decide between two patches with differing reinforcement rates (Amount/Delay). Optimal foraging theory (Stephens & Krebs, 1986), as applied to impulsive choice tasks, proposes that animals aim to maximize their future reinforcement rates over long spans of time. In this formulation, the reinforcement rate is computed over both the trial and the ITI. In addition, the reinforcement rate computation has no free parameters; thus, amounts and delays are assumed to be accurately perceived (as in the EXP and HYP models). On the other hand, ecological rationality theory proposes that individuals seek to maximize reinforcement rates during the trial time only (Bateson & Kacelnik, 1996). This model likewise contains no free parameters. Both theories can generate hyperbolic functions, and there are mixed results regarding attention to post-reinforcer delays (or ITIs), as discussed above. A worked example follows.
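
The contrast between the two theories can be made concrete with a small worked example; the amounts, delays, and ITI below are hypothetical.

# Hypothetical choice parameters: SS = 1 pellet after 5 s, LL = 3 pellets
# after 20 s, with a 40-s ITI following either outcome.
amount_ss, delay_ss = 1.0, 5.0
amount_ll, delay_ll = 3.0, 20.0
iti = 40.0

# Optimal foraging theory: rate computed over the trial plus the ITI.
rate_ss_global = amount_ss / (delay_ss + iti)  # ~0.022 pellets/s
rate_ll_global = amount_ll / (delay_ll + iti)  # 0.050 pellets/s

# Ecological rationality theory: rate computed over trial time only.
rate_ss_local = amount_ss / delay_ss           # 0.200 pellets/s
rate_ll_local = amount_ll / delay_ll           # 0.150 pellets/s

print(rate_ll_global > rate_ss_global)  # True: the global rate favors the LL
print(rate_ll_local > rate_ss_local)    # False: the trial-only rate favors the SS

The same contingencies thus favor the LL under a global-rate computation but the SS under a trial-only computation, illustrating how the two accounts can generate opposite predictions.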

More recently, the bounded rationality model (BR; Blanchard et al., 2013; Table 5, Eq. 11) was developed to reconcile these two extremes and explain discrepancies in results. The BR model proposes that discounting is determined by the estimated post-reinforcer delay, thus allowing individuals to give varying weight to post-reinforcer delays in determining reinforcement maximization. The estimated post-reinforcer delay (ω) produces a hyperbolic value function without a discounting parameter. A scale parameter (X) translates the model fit to subjective value; in the most straightforward application, X can simply be set equal to ω for a direct translation of the estimated delay to subjective value. Figure 5 shows the model fit to the relatively shallow and steep HYP functions, with the associated parameters given in Table 6 (showing that the difference between the discounting functions is driven by ω).
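
On a hedged reading of Eq. 11 implied by this description (a reconstruction, not the equation as printed), the BR value function can be written as

VD = X · A / (D + ω),

so that setting X = ω yields VD = A / (1 + D/ω). This is a hyperbolic form in which 1/ω plays the role of the discounting rate: the shorter the estimated post-reinforcer delay, the steeper the discounting.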

Table 5 Reinforcement maximization models and reinforcer valuation models including the bounded rationality (BR) model, additive utility (UTIL) model, multiplicative (MULT) model, and reward sensitivity (REW) model
Fig. 5

Subjective value as a function of delay to reward (VD) in seconds for relatively low (1) and high (2) discounting rates as predicted by the Hyperbolic (HYP), Bounded Rationality (BR), and Additive Utility (UTIL) models. Note that the functions for the BR model are jittered on the x-axis for presentation purposes

Table 6 Parameter values used to calculate the functions shown in Figs. 5, 6 and 7 for the bounded rationality (BR), additive utility (UTIL), multiplicative (MULT), and reward sensitivity (REW) models in comparison to the hyperbolic (HYP) model

The TIMERR model is also applicable here. TIMERR bridges the timing-based models and the reinforcement maximization models through its focus on estimated reinforcement rates over a moving time window. TIMERR further assumes that animals only include amounts and delays associated with active trial time; thus, post-reinforcer delays are not included in the estimated average reinforcement rate computation. This may be a weakness of the model, as the emerging picture in the field is that individuals may underestimate ITI durations rather than ignore them, and that the ITI may be attended to under certain circumstances, such as when the ITI is cued (Blanchard et al., 2013). These findings favor including the perceived post-reinforcer delay as a free parameter, as in the BR model.

To assess the generality of reward maximization and choice models, Carter and Redish (2016) developed highly comparable tasks for measuring impulsive choice and foraging decisions in rats. They fit optimal foraging theory, ecological rationality theory, HYP, EXP, BR, and TIMERR models to both tasks. They found that all six models fit the data from both tasks to a similar standard; however, they did not control for model complexity in assessing the fits. In addition, none of the models were able to fit the data from the two tasks using similar parameters. This is problematic given that the tasks were developed to be highly comparable and thus the models should fit the tasks in a comparable way. This suggests that the models are not sufficiently general to account for choice behavior in tasks with different structures, which may be a general challenge to all these models.

Although the previous models in this section focus on maximization of reinforcer amount, other models have instead focused on elements of reinforcer quality. The additive-utility model (UTIL; Killeen, 2009; Table 5, Eq. 12) proposes that discounting applies to utility rather than to the amount of reinforcement. Although the UTIL model was primarily developed to fit data sets from human hypothetical impulsive choice tasks, it was designed to fit data from nonhuman animals (and presumably human experiential choice tasks) as well. The UTIL model has a utility parameter (α) in addition to parameters for discount rate (k) and sensitivity to delay (s). Utility refers to the usefulness of a reinforcer and is not necessarily the same as the amount of a reinforcer. In general, the usefulness of money tends to relate directly to amount, but consumables may follow an inverted U-shaped function in which larger amounts are more valuable only up to a point and value decreases thereafter. For example, satiety effects could lead to very large amounts of food having lower value than moderate amounts of food. When α = 1, discounting is linear, and as α approaches 0, the utility model becomes the same as the CS model (Ebert & Prelec, 2007). In addition, if s = 1, discounting is an EXP function, and if s is small, discounting approximates a HYP function.
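
A minimal sketch of a value function consistent with this description follows. It assumes the additive-utility form V = (A^α − k·D^s)^(1/α), in which the utilities of amount and delay combine additively on a power-transformed scale; this form is a reconstruction from the parameter descriptions above, not the equation as printed in Table 5, and the parameter values are illustrative.

def util_value(delay, amount=1.0, alpha=0.5, k=0.1, s=0.8):
    # Hedged sketch (assumed form): V = (A**alpha - k * D**s) ** (1/alpha).
    # alpha transforms the utility of amount; k and s scale the cost of delay.
    utility = amount**alpha - k * delay**s
    return max(utility, 0.0) ** (1.0 / alpha)  # floor at zero utility

print(round(util_value(10.0), 3))  # ~0.14 for a 10-s delay with these values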

As seen in Fig. 5, the UTIL model provides a good approximation to the HYP function. The fitted parameters in Table 6 indicate that the steeper discounting function was associated with a higher k value and a lower sensitivity to delay (s); it is possible that this model could suffer from a similar issue to the hyperboloid models in that k and s may be multicollinear. This is an issue that should be examined when fitting the model to data sets. Note that the utility parameter did not differ appreciably between the relatively shallow and steep functions. This is expected because the reinforcer was the same in those conditions. The main benefit of this model may lie in its potential to explain discounting in impulsive choice tasks with consumable reinforcers that are experienced during the task or in cross-commodity assessments where qualitatively different reinforcers (e.g., cocaine and food) could be directly compared based on their utility.

Another model that is designed to capture reinforcer quality and incentive features in choice data from nonhuman animals is the multiplicative model (MULT; Ho et al., 1999; Table 5, Eq. 13). This model derives its name from the fundamental idea that each task variable can have an associated discounting equation, and these equations combine multiplicatively. For example, a task could include variations in delay, amount, and probability, each of which would have a separate discounting equation; these would then multiply together to form an overall subjective value computation. The application of MULT to impulsive choice tasks includes a standard HYP equation for the delay component multiplied by a discounting equation for amount, which includes a separate discounting parameter/sensitivity (Q) for reinforcer amount. Although most impulsive choice tasks focus on presenting the same type of reinforcer in different amounts, real-world decisions often involve outcomes that differ in quality. The Q parameter can operate like a utility parameter by accounting for the motivational value of the reinforcer.

Figure 6 shows the effect of Q on the HYP function. Here, because the best fit of the model would converge on Q = 0 (identical to the HYP equation), instead of fitting the model we simply demonstrate the impact of reducing reinforcer incentive and quality (see Table 6 for parameter values). In this example, the Q parameter reduces overall value (the intercept of the value function) without changing the slope, all other things being equal. Unlike the previous models, which assume that the subjective value at a 0-s delay is equal to the amount of the reward, this model can modulate value at a 0-s delay. In the example in Fig. 6, the value at 0 s is less than 1, which could be used to explain phenomena that affect the incentive value of a reward independent of delay.
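
A sketch consistent with this pattern follows. It assumes a multiplicative form in which a hyperbolic delay term is multiplied by a hyperbolic amount/quality term, V = A · [1/(1 + k·D)] · [1/(1 + Q/A)]; this is a hedged reconstruction from the description above rather than the equation as printed, and the values are illustrative.

def mult_value(delay, amount=1.0, k=0.05, q=0.25):
    # Hedged sketch (assumed form): a hyperbolic delay term multiplied by a
    # hyperbolic amount/quality term. With q > 0, the value at D = 0 falls
    # below the amount (a lower intercept) while the shape over D is unchanged.
    return amount * (1.0 / (1.0 + k * delay)) * (1.0 / (1.0 + q / amount))

print(mult_value(0.0))  # 0.8: the value at a 0-s delay is below the amount of 1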

Fig. 6

Subjective value as a function of delay to reward (VD) in seconds for relatively low (1) and high (2) discounting rates as predicted by the Hyperbolic (HYP) and Multiplicative (MULT) models

An alternative model in this set is the reward sensitivity model (REW; Locey & Dallery, 2009; Table 5, Eq. 14), which was developed to explain amount sensitivity effects in animal impulsive choice data. This model adds an exponent to the amount in the hyperbolic equation that represents amount sensitivity (z). Because the best-fitting model would converge on z = 1 (identical to the HYP equation), instead of fitting the HYP function we show the impact of z on the value function in Fig. 7. The effect of setting z to less than 1 is to alter the intercept of the value function without changing the slope, with larger effects associated with shallower rates. The impact of z is similar to the effect of Q in the MULT model. (Note that the z parameter has alternatively been proposed to multiply onto A for an alternative formulation that also alters the intercept of the function; Reynolds et al., 2002.) It is possible that these models may be difficult to distinguish in their fits to data because of their similar nature. One noteworthy issue with the REW model is that the z parameter does not moderate reinforcer sensitivity if the amount is equal to 1, which is a frequently used smaller-sooner amount in animal choice tasks. For this reason, the curves in Fig. 7 were generated for an amount of 2. The alternative formulation with z multiplying onto A would be another way to solve this problem.
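
Because the exponent-on-amount form is stated directly above, a short sketch can illustrate the inertness problem at an amount of 1; the parameter values are illustrative.

def rew_value(delay, amount=2.0, k=0.05, z=0.7):
    # Reward-sensitivity (REW) sketch: V = A**z / (1 + k*D). Because 1**z == 1
    # for any z, the parameter has no effect when the amount equals 1.
    return amount**z / (1.0 + k * delay)

print(rew_value(0.0, amount=1.0))  # 1.0 regardless of z: the inertness problem
print(rew_value(0.0, amount=2.0))  # ~1.62: z < 1 lowers the intercept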

Fig. 7

Subjective value as a function of delay to reward (VD) in seconds for relatively low (1) and high (2) discounting rates as predicted by the Hyperbolic (HYP) and Reward Sensitivity (REW) models

Overall, the models in this section complement the previous models by emphasizing reinforcement processes. The reinforcement maximization models (e.g., the BR model) supply an alternative, discounting-free modeling approach. The UTIL and MULT models are designed to incorporate aspects of reinforcer quality, motivational value, and/or usefulness of the reinforcer. Additionally, the MULT and REW models can account for individual differences in reinforcer sensitivity, aligning with psychophysical principles of amount perception. The MULT and REW models are focused on predicting results from animal choice tasks, whereas the UTIL model could potentially fit data from humans and animals. As with many of the previous models, the reinforcement maximization and reinforcer valuation models need to be rigorously tested to assess their comparative fits to data sets.

Present-focused models

The final group of models under consideration posits a balance of dual processes relating to immediate versus delayed outcomes. Both theories in this section are related to the CNDS theory described earlier, but with the added opportunity to predict and understand data quantitatively.

The quasi-hyperbolic model (QUASI; Laibson, 1997; Table 7, Eq. 15) was developed to explain behavior on delay gratification tasks in humans and includes a bias parameter for immediate or present outcomes (k1) and a discounting parameter for delayed outcomes (k2). Note that when k1 = 1, discounting follows a standard EXP model, whereas k1 < 1 reflects a present bias and k1 > 1 reflects a future bias. Present bias has been linked to “hot” cognition, or impulsive processes, whereas the devaluation of delayed outcomes is thought to reflect “cool” cognition emerging from executive processes related to self-control. Note that this theory has a discontinuity: if the delay is zero, then the value is set equal to the amount. Fundamentally, the QUASI model is an EXP curve with two discounting parameters. As with several previous theories, the QUASI model assumes perfect perception of amounts and delays.
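
These properties can be written compactly. A minimal sketch, assuming a continuous-time version of the model (V = A at D = 0; V = A·k1·e^(−k2·D) for D > 0), illustrates both the discontinuity and the present bias; the parameter values are illustrative.

import math

def quasi_value(delay, amount=1.0, k1=0.8, k2=0.02):
    # Hedged continuous-time sketch of the QUASI model: k1 < 1 creates a
    # present bias through the discontinuity at a 0-s delay.
    if delay == 0:
        return amount
    return amount * k1 * math.exp(-k2 * delay)

print(quasi_value(0.0))    # 1.0: an immediate reward is not discounted
print(quasi_value(0.001))  # ~0.8: value drops as soon as any delay is imposed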

Table 7 Present-focused models that include dual processes for valuation of immediate versus delayed reinforcers including the quasi-hyperbolic (QUASI) and double exponential (DBEXP) model

Figure 8 shows the fit of the QUASI model to the shallow and steep HYP functions. The discontinuity in the function creates a deviation in the fit to the HYP at the shallower discounting rate (top panel). The parameter values for the QUASI fits are given in Table 8. The k2 values were similar for the two discounting curves, but the k1 value was much lower for the steeper discounting curve. Both functions were associated with substantial present bias (k1 < 1). This indicates that, in this model, steeper discounting may emerge predominantly from present bias rather than from discounting of delayed reinforcers, at least for these two examples.

Fig. 8

Top: Subjective value (VD) as a function of delay to reward in seconds for a relatively low discounting rate as predicted by the Hyperbolic (HYP), Quasi-Hyperbolic (QUASI), and Double Exponential (DBEXP) models. Bottom: VD as a function of delay for a relatively high discounting rate as predicted by the same models

Table 8 Parameter values used to calculate the functions shown in Fig. 8 for the quasi-hyperbolic (QUASI) and double-exponential (DBEXP) models in comparison to the hyperbolic (HYP)

To test whether time perception would add predictive value to the QUASI model, Brocas et al. (2018) measured performance on a time estimation task and a hypothetical impulsive choice task in humans. Consistent with previous literature, they found that overestimation of delays greater than 1 hr was associated with more impulsive choices. They fit the QUASI function to the impulsive choice data using perceived time estimates versus actual time. Both models fit the data well, suggesting that the QUASI model with actual delays may be a reasonable approximation when perceived delays are not measured. This does not dismiss the importance of perceived time, but this factor may be less impactful in the QUASI model, where two discounting parameters are available to account for more variance in the functions. Note that if the QUASI model were tested with k1 = 1 coupled with perceived delay, then this model would be the same as the KZ model.

A closely related theory, designed to predict hypothetical impulsive choices in humans, is the double-exponential model (DBEXP; Van den Bos & McClure, 2013; Table 7, Eq. 16), which assumes separate discounting parameters for the impulsive valuation system (k1) and the executive control system (k2). The relative weight of the two systems (ω) is an additional free parameter. This model has the advantage of representing the balance of the two systems, but at the cost of an additional free parameter compared with the QUASI model. One advantage of the DBEXP is that it does not have a discontinuity in the function. As a result, the fits of the DBEXP to the HYP model were excellent (Fig. 8). As seen in Table 8, the DBEXP fit the steeper curve with a lower k2 value and a lower ω but a similar k1. Thus, the steeper hyperbolic curve was modeled by lowering both the weight assigned to the control system and the discount rate for the control system. This is an interesting contrast to the QUASI model, where the main parameter change was in the present bias parameter.
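
A minimal sketch follows, assuming the weighted-mixture form V = A·[(1 − ω)·e^(−k1·D) + ω·e^(−k2·D)], with ω as the weight on the executive system; this is a reconstruction from the description above, not the equation as printed, and the parameter values are illustrative.

import math

def dbexp_value(delay, amount=1.0, k1=0.2, k2=0.01, w=0.6):
    # Hedged sketch of DBEXP: a mixture of a steep impulsive exponential (k1)
    # and a shallow executive exponential (k2), with weight w on the executive
    # system. Lowering w shifts the balance toward the impulsive system.
    return amount * ((1.0 - w) * math.exp(-k1 * delay)
                     + w * math.exp(-k2 * delay))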

The difference in parameter settings between the two models in this group should be explored further with rigorous comparisons of the models to relevant data sets. Ideally, such modeling should be accompanied by additional cognitive testing of the impulsive and executive control systems to confirm which of the two systems is most likely to differ between steeper and shallower discounters. For example, the impulsive choice task could include tests in which both rewards are delayed versus tests with a 0-s SS to assess present bias (Mitchell & Wilson, 2012).

Model evaluation

The models across the four sets in this section can be categorized in different ways. First, most models explicitly assume a discounting process, including the EXP, HYP, RACH, MG, CS, LOG, MGH, KZ, UTIL, MULT, REW, QUASI, and DBEXP models. These models differ in whether the base equation is exponential (EXP, CS, LOG, KZ, UTIL, QUASI, and DBEXP) or hyperbolic (HYP, RACH, MG, MULT, and REW). The MGH model can take either an EXP or a HYP form. Because the empirical literature is more consistent with a HYP function, these models produce hyperbolic or hyperbolic-like shapes by assuming nonlinearities in delay sensitivity (RACH, MG, CS, LOG, MGH, KZ, UTIL) and/or amount sensitivity (MG, UTIL, MULT, REW). A final set of discounting-free models proposes alternative mechanisms, including Weber fractions for delay and amount (ST), a noisy biological clock (BIO), a moving time window for reinforcement rate computation (TIMERR), or variations in the estimated post-reinforcer delay (BR). These models can achieve a hyperbolic or hyperbolic-like form. The fact that so many models can account for the same data is a serious challenge for the field that needs to be rectified. One way forward would be to test the models' assumptions regarding the underlying mechanisms that drive choice behavior using other cognitive and impulsive choice tasks. So far, only limited tests of this sort have been conducted. One example includes the assessments using time perception measurements embedded within exponential models to predict impulsive choices. However, these tests were only applied to predict behavior on human hypothetical choice tasks and have not yet been assessed in human experiential or animal choice tasks. The models are discussed further in the next section in light of the learning, cognitive, and motivational factors from the previous section as a further way to delineate their potential efficacy.
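
As a concrete instance of the first strategy, the CS model retains an exponential base but applies a power function to delay, V = A·e^(−(k·D)^s) (Ebert & Prelec, 2007). With s < 1, value falls rapidly at short delays and flattens at long delays, approximating the HYP form V = A/(1 + k·D) even though the base equation remains exponential.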

Another key issue is that the models are not often directly compared in their fits to data, and when comparisons have been made, they have involved only a limited set of models (e.g., Carter & Redish, 2016; McKerchar et al., 2009; Mitchell et al., 2015; Peters et al., 2012). Model fit indices often fail to account for differences in model complexity. Using the Akaike information criterion and/or the Bayesian information criterion (Burnham & Anderson, 2004), as opposed to variance-accounted-for indices, can provide a better indication of model fit because these criteria penalize free parameters. This issue is especially pertinent when comparing models with different numbers of free parameters. Models can also be assessed using nonlinear multilevel modeling, a regression technique that fits data to individuals and groups within a single model (Young, 2018), as opposed to the more common approach of fitting individuals and assessing group-level model parameters separately. Finally, the multicollinearity of model parameters needs to be examined more thoroughly. This issue was raised with the RACH and MG hyperboloid models, where it affected the sensitivity of these models in predicting group-level effects. Most models in this section have not been assessed for multicollinearity across multiple data sets, at least not to our knowledge.
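
As an illustration of complexity-penalized comparison, the following sketch fits one- and two-parameter discounting functions to hypothetical indifference-point data and computes the least-squares form of the AIC; the data and starting values are invented for demonstration.

import numpy as np
from scipy.optimize import curve_fit

delays = np.array([1.0, 5.0, 10.0, 20.0, 40.0, 80.0])
values = np.array([0.95, 0.80, 0.65, 0.48, 0.32, 0.18])  # hypothetical data

def hyp(d, k):             # hyperbolic: one free parameter
    return 1.0 / (1.0 + k * d)

def hyperboloid(d, k, s):  # Myerson-Green hyperboloid: two free parameters
    return 1.0 / (1.0 + k * d) ** s

def aic(y, yhat, n_params):
    n, rss = len(y), np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * n_params  # least-squares form of the AIC

for name, f, p0 in [("HYP", hyp, [0.05]), ("MG", hyperboloid, [0.05, 1.0])]:
    popt, _ = curve_fit(f, delays, values, p0=p0)
    print(name, "AIC =", round(aic(values, f(delays, *popt), len(popt)), 2))

Here the two-parameter hyperboloid pays a penalty of two AIC units for its extra parameter and is preferred only if it improves the fit enough to offset that cost.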

Model assessment in relation to empirical findings

In addition to the practical issues with model assessments in the fit to data, models can be assessed in their predictions relating to specific mechanisms of choice behavior. With respect to the learning, motivational, and cognitive factors discussed previously, different models can predict specific outcomes.

The review of learning research on impulsive choice focused on training conditions that affect impulsive decision-making. The models above do not have parameters that account for training histories. The only model that could readily accommodate training effects is TIMERR, which has a moving integration window that can capture recent reinforcement history. This is an important direction for future research, as training history clearly affects behavior on impulsive choice tasks. An understanding of learning processes is additionally valuable in highlighting which mechanisms models should incorporate.

For example, the research linking impulsive choice to timing abilities (Baumann & Odum, 2012; Brocas et al., 2018; Darcheville et al., 1992; Marshall et al., 2014; McClure et al., 2014; Moreira et al., 2016; Navarick, 1998; Paasche et al., 2019; Smith et al., 2015; Stam et al., 2020; van den Broek et al., 1992; Wittmann & Paulus, 2008) supports the need for models that recognize that psychological time may not scale the same as physical time. Similarly, research linking impulsive choice to magnitude discrimination abilities (Marshall & Kirkpatrick, 2016) suggests that models should recognize that numerical discrimination is scaled by psychophysical principles. Most of the above models included sensitivity to reinforcer delay and/or amount to account for these relationships.

The observation that impulsive choice appears to be affected by experience with concurrent or successive exposures to contingencies (Grace & Hucks, 2013; Marshall & Kirkpatrick, 2016) might be explained by other processes such as attention (as we have defined it here). Models might account for broader contextual factors by modifying subjective value computations based on the presence of stimulus cues representing delay or magnitude learning history (e.g., conditioned reinforcement effects) and modifiers that account for the control exerted by such cues in a context. For example, no model easily addresses token reinforcement effects in increasing self-control (Jackson & Hackenberg, 1996). However, the hyperbolic value-added model (Mazur, 2001), which assumes that temporal discounting determines the subjective value of delayed outcomes, is a step in this direction. This model was not featured in the modeling section because of its focus on conditioned reinforcement for concurrent chains data as opposed to impulsive choice data. Finally, an animal’s tendency to choose the LL when the SS outcome is delayed by adding a common delay to both options (Calvert et al., 2011; Green et al., 2005; Rachlin & Green, 1972; Siegel & Rachlin, 1995) might simply be accounted for by the fact that the subjective value of the SS option is discounted by its own delay to receipt.

Reinforcement maximization has received scant attention in impulsive choice models, and only the BR model formally addresses ITI effects on impulsive choice. Bundling effects could be accounted for by the models discussed above by allowing the values of all the delayed reinforcers to be summed into one subjective value, as sketched below. For example, Stein and Madden (2021) adapted the HYP equation into an additive hyperbolic model and found that it accounted for bundling effects. In a similar way, the increase in self-control observed when the LL reinforcer can be postponed (rather than waited for) might be accounted for if the discounted value of a previous LL outcome is allowed to summate with that of the subsequent, concurrently occurring LL outcome (Addessi et al., 2021). The TIMERR model also summates reinforcers over time and might be a good candidate to account for reinforcer influences that extend beyond a single trial.
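
On one natural reading of that adaptation (a hedged reconstruction rather than the published equation), the values of the n reinforcers in a bundle sum as

Vtotal = Σi Ai / (1 + k·Di), for i = 1 … n,

so that each additional delayed reinforcer adds its own hyperbolically discounted value to the LL option, increasing the bundle’s total subjective value.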

Reinforcer value effects are accounted for in several models. Qualitative differences in reinforcers and influences of motivating operations could be accounted for as an outcome's utility or quality (UTIL and MULT, respectively). By accounting for quality, these models have the added explanatory power to address incentive sensitization (i.e., the hyper-valuation of reward observed in substance use disorders; Berridge & Robinson, 2016), inelastic reinforcer effects on quality, motivating operations, and cross-reinforcer effects where the SS and LL are qualitatively different (e.g., food vs. cocaine). Because “quality” could vary in terms of sensory experience or deprivation state, it might need to be parametrically expanded to account for meaningful differences. Quality will also need to be kept separate from quantity, for which the psychophysical properties of numerical discrimination are uniquely involved (i.e., models cannot conflate quality and quantity as contributors to subjective value). The REW model's treatment of sensitivity to reinforcement (z) may need to be elaborated to separate those dimensions of reinforcer value.

The DBEXP model is a quantitative formalization of the qualitative CNDS model, which frames impulsive choice as emerging from a competition between an impulsive/motivational system and an executive system (Koffarnus et al., 2013). DBEXP has separate discounting parameters for the impulsive system (k1) and the executive system (k2, lower in more impulsive individuals), and a parameter that accounts for the balance between the two systems (ω, with a lower weight on the executive system in more impulsive individuals). These parameters might be predictive of other psychological and neurobiological processes. For example, the impulsive parameter may correlate with the level and elasticity of demand and with striatal dopamine sensitivity. The executive parameter could correlate with working memory, longer temporal horizons, and prefrontal cortex activity. The weighting parameter might relate to how the insular system balances the two neurobiological systems, a balance that is affected by psychological factors like stress, which can lead to increased impulsive choices (e.g., Torregrossa et al., 2012; stress hormones increase SS choices). These areas could be a focus of future research in both human and animal experimentation. Effects of delay intolerance and preference for reinforcer immediacy can be addressed by the QUASI model, whose present bias parameter may be well suited to capture preference for immediacy (Fox, 2021; Fox et al., 2019). To the degree that delay aversion and immediacy preference are reflected in the motivational system, the DBEXP model might account for their influence via the k1 and ω parameters.

Attentional effects could potentially be explained by multiple models. The effects of cues on attention could be accounted for by models that include a parameter for present bias, assuming that the bias is a proxy for attention. For example, attentional manipulations could alter the k1 parameter in the QUASI model. The DBEXP may account for cues modulating attention with ω. For example, powerful reinforcer cues may bias attention and trigger cravings (Ashe et al., 2015), and this may lead to a shift in balance favoring the motivational/impulsive system that drives discounting.

Generalization of learning from training tasks to the impulsive choice task (e.g., successive/concurrent exposure, short/long SS delays) or within the choice contingencies themselves (e.g., FI or FT delays) may bias attention and modulate impulsive choices. How impulsive options are presented to humans (e.g., explicit-zero or date framing) may likewise bias attention in the choice procedure and affect impulsive choices. Reinforcer-focused models (UTIL, MULT, REW) may account for attention effects on amount processing, and the TIMERR model may account for attention to delay. If date-framing effects are due to diminished attention to delay and explicit-zero effects are due to greater attention to amounts, then models with parameters for both could capture those effects. For example, in the UTIL model, attention to magnitudes may affect utility (α) and attention to delays may affect delay sensitivity (s).

The TIMERR model’s Time parameter accounts for the time horizon, which is driven by how attention is focused. For example, individuals with FTP may be biased towards the future. EFT procedures may take people with a present time perspective (i.e., impulsive individuals) and refocus their attention towards the future, leading to greater self-control. Attention may be shifted to focus on future reinforcers that are worth waiting for, but longer past-oriented time horizons also predict self-control (Radu et al., 2011). The time horizon in TIMERR is backward focused, with a window that captures and integrates past reinforcement history from prior trials. Thus, the temporal window of the horizon may matter more than the horizon’s direction. Finally, formal models of timing have incorporated attention as a subordinate process (Burle & Casini, 2001). Thus, to the degree that models can account for timing processes, they should potentially also account for attention, at least as it relates to delay tracking. For example, in the KZ model, the perceived delay (d) parameter may account for effects of cueing attention toward or away from the delay.

The current models cannot account for the relationship between working memory and impulsive choice without further assumptions. It is possible that working memory reflects a healthy top-down executive control system, in which case the DBEXP parameters (k2, ω) might account for working memory effects. Working memory is intrinsic to timing models (Meck et al., 1984, 2013), and accurate and precise timing necessitates functional working memory (Gu et al., 2015; Lustig et al., 2005). The effects of working memory on impulsivity might operate through its effects on timing (and/or amount discriminations). For example, working memory is necessary for maintaining a current representation of the passage of time during a delay, and faulty working memory can result in impairments in timing accuracy and/or precision (e.g., Gibbon et al., 1984). Experimental research needs to flesh out the significance of working memory and its relationship with impulsive choice to guide future model development.

Overall conclusion

The empirical findings and models presented here are intended to provide an organizational structure for understanding potential mechanisms of impulsive choice. Several key cognitive processes were highlighted that are likely to contribute to impulsive choice, at least in some contexts and tasks; these include learning processes, reward maximization, incentive motivation, attention, timing, and working memory. Different models have been proposed, with some models directly motivated by empirical findings. Models have been developed to explain the psychophysical properties of delay and amount perception, reinforcement maximization principles, and competition between two systems. Because all models (except the basic EXP) correctly predict hyperbolic or hyperbolic-like discounting, simply fitting the models to choice data is unlikely to yield significant new insights. Instead, model fits to impulsive choice data should be accompanied by testing on other cognitive and behavioral tasks. In addition, developing sensitive methods that move beyond correlations across tasks is a key challenge for the field in identifying specific mechanisms.

Ultimately, the models cut to core questions that remain unanswered in the empirical data. For example, one issue of central importance is whether discounting is rational (exponential) but coupled with a faulty perceptual system, or whether discounting is irrational (hyperbolic). Only limited studies examining the direct relationship between time perception and discounting (where perceived time is used to predict choice through a subjective value computation) have attempted to address this issue, and those studies do not definitively differentiate between the two possibilities. Finally, there is the question of whether we discount at all: several viable models propose alternative, non-discounting mechanisms that require more thorough testing.

A problem with the modeling approach is that there has perhaps been too much focus on the search for a perfect equation that can account for the hyperbolic function, rather than on developing process-driven models. As a result, important processes such as learning, attention, timing, and working memory are not commonly featured in models, and in some cases (e.g., attention and working memory) can be accounted for only by making indirect assumptions. Ultimately, the field of impulsive choice may require a recalibration to focus more broadly on cognitive mechanisms and the development of models of those mechanisms. Only then may we fully understand impulsive choices.