No one likes to wait, but the willingness to do so certainly depends on the length of the wait and what will be received once the wait is over. In empirical studies of the willingness to wait, participants are given a series of choices involving the trade-off between a smaller immediate reward and a larger delayed reward (for a collection of articles on the topic, see Madden & Bickel, 2010). As the delay to the larger reward increases, people are more likely to choose the smaller sooner reward over the larger later one. Studies of the effect of reward magnitude have revealed that people demonstrate a greater willingness to wait when both the sooner and later outcomes have greater value (Green, Myerson, & McFadden, 1997; Jimura, Myerson, Hilgard, Braver, & Green, 2009). For example, most people would rather have $10 now than $20 in 6 months (the $20 is discounted to less than half its immediate value, due to the delay), but people are much more likely to wait if the trade-off is between $1 million now and $2 million in 6 months. This magnitude effect has been robust in studies of humans but difficult to replicate in nonhuman species (e.g., Calvert, Green, & Myerson, 2010; Richards, Mitchell, de Wit, & Seiden, 1997).

It would not be surprising if human and nonhuman species show different behaviors and tactics in response to a particular environmental challenge like choosing between smaller sooner and larger later outcomes. Their ecological niches are quite different, their neural machinery varies, and their basic biological differences produce dissimilar demands on the importance of immediacy (e.g., metabolism, locomotive speed, and prey vulnerability). Interpreting this species difference, however, has been complicated by the fact that human studies typically involve hypothetical rewards and imagined delays on the order of days to years (e.g., Estle, Green, Myerson, & Holt, 2007; Hinvest & Anderson, 2010; Madden, Begotka, Raiff, & Kastern, 2003; Rachlin, Raineri, & Cross, 1991), whereas nonhuman studies always involve consumable rewards with very short delays on the order of seconds to minutes (e.g., Green, Myerson, Holt, Slevin, & Estle, 2004; Mazur, 1987; Richards et al., 1997). In two studies involving humans and consumable rewards, however, people showed a magnitude effect even when the delays were experienced, not hypothetical, and on the order of seconds (Jimura et al., 2009; Logue & King, 1991). The use of consumables, however, provides a natural limit on the number of choices before participants are satiated. Jimura et al.’s and Logue and King’s designs involved fewer than 20 trials in a session and maximum delays of 60 s.

The need to use consumable rewards in order to observe discounting across short delays seems odd in light of emerging technologies that may condition people to expect immediate outcomes. Small delays in a computer’s response to a command, the sending of a text message, or the loading of a Web page prompt common complaints among technology users. This apparent intolerance for small delays in interactions with technology was leveraged by Young, Webb, and Jacobs (2011), who recently pioneered a novel video game procedure for studying the trade-off between smaller sooner and larger later rewards. In the game, people may fire either sooner and do little damage or later and do more damage. The amount of damage increased as time passed and reached a maximum after 10 s, using a paradigm that they called an escalating interest task. The key independent variable in their method is a parameter that dictates how the weapon damage potential (its charge) approaches this maximum. The left side of Fig. 1 shows the precise variations in recharge acceleration across three values of a parameter, power, that alter the recharge behavior, and the right side of Fig. 1 shows the resulting rate of damage for a consistent waiting time between shots for each of these values. For power values less than 1.0, the maximal damage rate is achieved by waiting the full 10 s; for power values greater than 1.0, the maximal damage rate is achieved by firing as rapidly as possible.

Fig. 1

Left: Three examples of the superellipsoid functions generated by Eq. 1 of Young, Webb, and Jacobs (2011), using a power value of 1.5 (top curve), 1.0, and 0.5 (bottom curve). Right: The damage rates produced by these power values for a consistent delay between responses
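The relationship between the power parameter and the optimal firing strategy can be illustrated with a short sketch. The text does not reproduce Eq. 1 of Young et al. (2011), so the superellipse form used below is an assumption chosen only because it matches the description (a linear recharge at power = 1.0, the top curve at 1.5, and the bottom curve at 0.5); the function names are hypothetical, and 43.33 is the standard maximum damage reported for the 10-s delay.

```python
def charge_fraction(t, delay_to_max=10.0, power=1.0):
    """Fraction of maximum damage available t seconds after the last shot.

    ASSUMPTION: superellipse form (1 - (1 - x)**p)**(1/p) with x = t / delay_to_max.
    The exact Eq. 1 of Young et al. (2011) is not given in this text.
    """
    x = min(max(t / delay_to_max, 0.0), 1.0)  # clamp to the 0..1 recharge range
    return (1.0 - (1.0 - x) ** power) ** (1.0 / power)


def damage_rate(irt, delay_to_max=10.0, power=1.0, max_damage=43.33):
    """Damage per second when every shot is fired after the same wait (irt > 0)."""
    return max_damage * charge_fraction(irt, delay_to_max, power) / irt


# Compare rapid firing (1 s), a middle wait (5 s), and the full wait (10 s).
for power in (0.5, 1.0, 1.5):
    rates = {irt: round(damage_rate(irt, power=power), 2) for irt in (1, 5, 10)}
    print(power, rates)
```

Under this assumed form, the damage rate rises with waiting time when the power is below 1.0, is constant at 1.0, and falls with waiting time above 1.0, which is the pattern described for the right side of Fig. 1.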

The advantage of a video game preparation is that it allows the study of delay discounting in humans under short delays without a trial limit created by satiation. This ability to examine behavior across many choices allows better study of how choice evolves with time. The drawback to the preparation is that there is only one published study using it (Young et al., 2011), which raises questions as to whether behavior in the video game responds to manipulations, such as a change in magnitude, in a way that is similar to that observed in published studies involving hypothetical choice or consumables.

For the purposes of the present study, we wished to determine whether there would be a magnitude effect similar to that found in other studies involving humans or whether the use of short delays and immediate rewards would result in the absence of a magnitude effect like that observed in nonhuman species. Furthermore, we wished to determine whether the length of the delay to the maximum amount would systematically affect performance. In Young et al.’s (2011) previous study, the maximal reward was achieved after 10 s. Although published studies vary quite dramatically in the length of the delay to the larger later outcome, we are unaware of any research that has systematically varied this variable between subjects to determine whether discount behavior changes as a function of the range of delays used (a form of context effect; cf. Percoco & Nijkamp, 2009).

The examination of both the effects of magnitude and the length of the delay would help determine whether waiting in the escalating interest task shows the same sensitivity to these variables as do hypothetical money choice tasks. Given that recent research has revealed that experience-based decisions can sometimes produce results opposite to those obtained with described hypothetical choices (e.g., Hertwig & Erev, 2009; Rakow & Newell, 2010; Shafir, Reich, Tsur, Erev, & Lotem, 2008), it is imperative that magnitude and delay effects be revisited using this paradigm for studying impulsivity.

In order to assess the magnitude effect, we adapted the Young et al. (2011) video game task to assign participants to either the standard magnitude used in their earlier study or to a condition in which the maximum weapon damage was doubled. If participants show a magnitude effect consistent with the results of prior reports involving hypothetical choice, their behavior should reveal longer interresponse times (IRTs) in the double-magnitude condition regardless of the other variables being manipulated. In order to examine the effect of the length of the delay until maximum value is achieved, participants were assigned a randomly chosen delay between 5 and 20 s in order to map out the functional relationship (see Young, Cole, & Sutherland, 2012). The manipulation of delay could be accomplished by having the same maximum achieved in less time, but doing so produces a situation in which the density of reinforcement necessarily increases. Thus, the maximum damage amount decreased as the delay decreased to ensure that the density of reinforcement would be identical across different delays to maximum reward if the participant waited the full delay between shots. This relationship is depicted in Fig. 2, in which we show the change in damage across four 5-s periods, two 10-s periods, or one 20-s period when 5, 10, and 20 s is the assigned delay to maximum damage. By changing the damage potential as a function of the delay to maximum, we ensure that the same total damage is done for each of these delay conditions over a 20-s period for participants who wait until the maximum is available.

Fig. 2

Examples of the growth in damage for short, intermediate, and long delays (5, 10, and 20 s) across 20 s if the participant waited until full charge for each choice. The left graph is for positive acceleration (power = 0.5), the middle graph for linear acceleration (power = 1.0), and the right graph for negative acceleration (power = 1.5). These curves are those produced by the standard magnitude condition with damage scaled to 100 % of the standard (43.33 points)

Method

Participants

A total of 80 undergraduate students enrolled in an introduction to psychology course at Southern Illinois University at Carbondale received course credit for their voluntary participation. There were 51 women and 27 men; 2 did not report their sex.

Procedure

The game world included four levels, each containing seven separate regions, with each region populated by two orcs, as described by Young et al. (2011; see http://www.k-state.edu/psych/research/young/suppmaterial.html for a video clip of game play). The player’s task was to destroy all of the orcs within each game level. Thus, a game level was isomorphic to a block of training in a conventional study of choice.

The player’s weapon reached its maximum charge after an assigned delay. Each participant experienced the same assigned delay for the entire game. We used a random sampling design in which the delay to maximum charge for each participant was randomly chosen from the 5- to 20-s range (uniformly). The amount of damage a fully charged weapon could inflict was varied as a function of the programmed delay in order to hold constant the reinforcement density (i.e., average delay) that could be achieved for players waiting for the maximum damage. For the standard magnitude condition (43.33 points per 10 s; Young et al., 2011), equality was achieved by multiplying this density by the assigned delay to maximum charge (5–20 s) divided by 10 s. Thus, for a programmed delay of 5 s, a fully charged shot inflicted 21.66 points on the enemy orc (43.33 * 5/10), and for a programmed delay of 20 s, a fully charged shot inflicted 86.66 damage points on the enemy orc (43.33 * 20/10). In the double-magnitude condition, all damage was doubled. Half of the participants were assigned to the standard-magnitude condition, and the other half were assigned to the double-magnitude condition. Note that it takes 100 points of damage to destroy a target (2.308 fully charged shots in the standard-magnitude condition).
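The scaling of maximum damage with the assigned delay can be checked with a few lines of arithmetic. The sketch below uses only the values stated above (43.33 points per 10 s, doubled in the double-magnitude condition); the function names are hypothetical and are not taken from the game software.

```python
STANDARD_DAMAGE_PER_10_S = 43.33  # points, standard-magnitude condition


def full_charge_damage(delay_s, double_magnitude=False):
    """Damage of a fully charged shot for a given delay to maximum charge (s)."""
    damage = STANDARD_DAMAGE_PER_10_S * delay_s / 10.0
    return 2.0 * damage if double_magnitude else damage


def damage_over_window(delay_s, window_s=20.0, double_magnitude=False):
    """Total damage in window_s seconds if every shot is fired at full charge."""
    shots = window_s / delay_s
    return shots * full_charge_damage(delay_s, double_magnitude)


# Four 5-s shots, two 10-s shots, or one 20-s shot over the same 20-s window.
for delay in (5, 10, 20):
    print(delay, round(full_charge_damage(delay), 3), round(damage_over_window(delay), 2))
```

Because the per-shot damage grows in proportion to the delay, the total damage over any fixed window is the same for every assigned delay when the participant always waits for the full charge.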

The amount of damage a weapon inflicted during the delay to maximum charge was determined by the power value of that weapon (see Eq. 1 of Young et al., 2011). The power value of the weapon changed each time the participant destroyed two orcs. We used a random sampling design in which the power value was randomly chosen from the 0.5 to 1.5 range (uniformly). The change in a weapon’s power was accompanied by a 1,250-ms three-tone sequence with a pitch that was correlated with the new power level, with participants told that the pitch of the tone indicated the way that their weapon would recharge (higher pitch for higher power value). Because there were 14 orcs in each level and the power value changed when 2 orcs were destroyed, each player experienced seven power values in each game level.

After completing the game, participants completed a demographics program that asked their sex and self-reported amount of video game play for each of nine types (see Cole, 2011, Appendix D). Amounts of play were solicited on a scale of 0 (indicating none) to 6 (indicating daily), with the middle of the scale indicating monthly play.

Results

All but 1 of the participants completed all four levels in the allotted time of 1 h; 1 participant’s session was terminated in the middle of the second level. Of the 80 participants, 78 completed all of the demographics questions.

Most participants produced a bimodal IRT distribution, either firing as quickly as possible or waiting until the maximum damage was available (which differed between participants). To avoid statistical issues with the bimodal nature of the data, we dichotomized participant behavior into short responses (IRTs less than half of the programmed delay to maximum damage) and long responses (IRTs greater than or equal to half of the programmed delay to maximum damage) and used a logit link function in a generalized linear mixed effects analysis, thus assuming a logistic relationship between our predictors and the likelihood of a long response. All responses greater than twice the delay to maximum damage, however, were dropped due to their high likelihood of being contaminated by inattention; this criterion resulted in dropping 1.4 % of the responses. Because behavior in the first level was distinctly different from that in the subsequent levels, we treated level 1 as categorically different from levels 2–4 in our analyses.
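The dichotomization rule can be stated compactly in code. The following is a minimal sketch of the rule as described above, not the analysis script used in the study; the function names and the example IRTs are hypothetical.

```python
def classify_irt(irt_s, delay_to_max_s):
    """Label one interresponse time as 'long', 'short', or None (dropped)."""
    if irt_s > 2.0 * delay_to_max_s:
        return None  # dropped: more than twice the delay, likely inattention
    if irt_s >= 0.5 * delay_to_max_s:
        return "long"  # at least half the programmed delay
    return "short"


def proportion_long(irts_s, delay_to_max_s):
    """Proportion of retained responses that are 'long' (None if none are retained)."""
    labels = [classify_irt(irt, delay_to_max_s) for irt in irts_s]
    kept = [label for label in labels if label is not None]
    if not kept:
        return None
    return sum(label == "long" for label in kept) / len(kept)


# Hypothetical IRTs (in seconds) for a participant assigned a 10-s delay.
example_irts = [0.4, 0.6, 5.0, 9.8, 10.2, 25.0]
print(proportion_long(example_irts, delay_to_max_s=10.0))
```

In this hypothetical example, the 25-s response is dropped and three of the five retained responses are classified as long. The binary labels, rather than the raw IRTs, are what enter a logistic mixed effects model of the kind described above.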

Figure 3 shows the mean likelihood of waiting more than half the programmed delay for each participant at each experienced power value for the two magnitude conditions; level 1 performance is excluded. The height of the smooth spline fit corresponds to a participant’s overall likelihood of waiting (higher lines indicate more waiting), whereas the slope of the line signifies whether a participant’s waiting is sensitive to changes in the power value (as it should be, with higher power values making short IRTs more efficient, thus producing a negative slope). There were clear individual differences in the overall likelihood of waiting and in sensitivity to the manipulation of power. Furthermore, there is some visual evidence that there was a higher proportion of participants who waited for the maximum reward in the double-magnitude condition than in the standard-magnitude condition.

Fig. 3

Mean likelihood of waiting more than half the programmed delay to maximum reward for each participant in the standard and double magnitude conditions. Smoothed spline fits are included to ease readability; higher lines indicate a higher probability of waiting for that participant, and a more negative slope indicates a greater sensitivity to the changing contingencies of waiting

One of the challenges of analyzing data from free operant procedures with continuous predictors is the visual presentation of the results. Given that some situations (e.g., high values of the power parameter) produce many more responses, it is essential that this be considered in describing the results, to ensure that the graphs are not biased by this oversampling of behavior in some conditions. This issue is complicated by the fact that some participants produce much shorter IRTs than do others, which can bias graphs by producing means that overrepresent the behavior of these individuals. As a result, we statistically sampled from our participants’ behavior at each power value and in each level to approximate a balanced design for our figures of raw data (Figs. 3, 4, and 5). Our analytical approach involved linear mixed effects modeling of the complete data set (Pinheiro & Bates, 2004), which appropriately incorporates the sample size imbalances that produce significant problems for traditional general linear modeling approaches like ANOVA and regression.

Fig. 4

Probability density curves of a sample of scaled interresponse times (IRTs; 1.0 = delay to maximum charge) for the standard magnitude condition and the double magnitude condition in level 1 and across levels 2–4. Note that these plots are functionally equivalent to smoothed frequency histograms; thus, the plots go below zero due to smoothing

Fig. 5

Probability density curves of a sample of scaled interresponse times (IRTs; 1.0 = delay to maximum charge) for the 5- to 8-s delays (short dashes), the 9- to 12-s delays (short–long dashes), the 13- to 16-s delays (long dashes), and the 17- to 20-s delays (solid line) in level 1 and across levels 2–4. Note that these plots are functionally equivalent to smoothed frequency histograms; thus, the plots go below zero due to smoothing

In our examination of the distribution of a sample of scaled IRTs as a function of magnitude, Fig. 4 reveals that participants produced bimodal IRTs in both the standard-magnitude and the double-magnitude conditions, with no significant difference between the two conditions in level 1. In levels 2–4, the lower peak was shifted toward longer IRTs (about 30 % of the delay to maximum charge) for the double-magnitude condition, thus producing longer waiting times.

In the distribution of a sample of scaled IRTs as a function of programmed delay to maximum charge (here, grouped into four categories only for illustration), Fig. 5 shows that the shortest delays (5–8 s) produced a greater likelihood of waiting the full duration in both level 1 and levels 2–4. In contrast, the longest delays (17–20 s) produced a greater likelihood of rapid firing for both level 1 and levels 2–4. In level 1, the intermediate delays (9–12 and 13–16 s) showed IRT distributions very similar to those for the longest delays. In levels 2–4, the distribution for the 9- to 12-s delays was generally shifted toward longer IRTs for the lower mode, and the distribution for the 13- to 16-s delays was similar to that for the longest delays, except with a less pronounced bimodal distribution.

In the linear mixed effects analysis, we specified a binomial distribution (thus producing zs, not ts, in the later analyses) and centered the continuous predictors in the interactions to avoid multicollinearity. The mixed effects analysis allowed intercepts, level slope effects, and power slope effects to vary across participants. Only those interactions that produced a better model (lower AIC) were retained. A logarithmic relation for delay was used because doing so produced a better fit. The results are shown in Table 1.

Table 1 Regression weights for the best-fitting generalized linear mixed effects model of dichotomized interresponse times

First, participants showed a strong sensitivity to the power manipulation, thus replicating Young et al. (2011). Sensitivity was roughly twice as strong in levels 2–4 (b = −2.04) as in level 1 (b = −1.13). In level 1, the modeled likelihood of a “long” response (i.e., waiting more than half the delay to maximum damage before firing) was 55 % for the lowest power value of 0.5 and 31 % for the highest power value of 1.5, z = 5.01, p < .01. This difference was much larger in levels 2–4 (73 % vs. 28 %), z = 10.45, p < .01.

In the standard-magnitude condition (that used by Young et al., 2011), the best-fitting model shown in Table 1 predicted 40 % “long” responses for levels 2–4, whereas in the double-magnitude condition, the model predicted 61 % “long” responses, z = 2.71, p < .01 (see Table 1). The model revealed no statistically significant difference as a function of overall magnitude in level 1 (41 % vs. 44 %), z = −0.06, p > .25. These modeled estimates are consistent with those shown in Fig. 4.

The overall effects of the length of the delay to maximal damage appeared large (Fig. 5) but were undermined by inconsistency across participants and game level. In level 1, the model predicted 52 % “long” responses for the shortest delay and 32 % for the longest delay, z = −1.98, p < .05 (see Table 1). The effect was not sustained in levels 2–4, where the predicted preferences for “long” responses were 49 % and 53 % for the shortest and longest delays, respectively, z = 0.66, p > .05.

Importantly, there was a statistically significant effect of the length of the delay to maximal damage on sensitivity to the power parameter (i.e., a power × log(delay) interaction), z = 2.02, p < .05 (see Table 1). For example, in levels 2–4, participants were more sensitive to the current power value when the delay to maximum was only 5 s (predicted “long” response Ms = 77 % for a power of 0.5 vs. 21 % for a power of 1.5) than when it was 20 s (predicted “long” response Ms = 67 % for a power of 0.5 vs. 38 % for a power of 1.5).

The individual-difference variables (sex and self-reported video game play) had little effect on performance. Only the reported amount of third-person video game play was retained as a predictor of the likelihood of producing a long IRT (greater than half the delay to maximum), with those who reported the most (a score of 6) having a 22 % chance of producing a “long” response and those who reported the least (a score of 0) having a 54 % chance of producing a “long” response, z = −3.11, p < .01.

Discussion

The observation of a magnitude effect in this very different preparation for the study of impulsivity both broadens the generality of the original reports and further validates the escalating interest task as a measure of impulsive responding. The effect of the length of the delay on sensitivity to the consequences of shorter versus longer IRTs (via the power parameter) was small but somewhat surprising. Our participants’ behavior showed greater differentiation as a function of the power value for shorter delays than for longer delays.

The discounting of reward magnitude is often well-described by Mazur’s (1987) hyperbolic model:

$$ V=\frac{\mathit{Damage}_{IRT}}{1+k\times IRT} \qquad (1) $$

where V is the discounted value of the programmed amount of damage delivered (Damage_IRT; see Fig. 1) for a given IRT and k is a free parameter that varies directly with sensitivity to delay; k is often estimated independently for each individual. Using Eq. 1, the left side of Fig. 6 depicts hyperbolic discounting across a 20-s span with a k of 0.25. An outcome that reaches its maximum value at 5 s will have been significantly discounted, relative to its value if it had been available immediately. However, an outcome that reaches its maximum value at 20 s would have been discounted even further. The relative amount of discounting for these two delays is more clearly evident on the right side of Fig. 6, where each curve is scaled by dividing by its maximum (5 vs. 20 s). The latter figure suggests that, in general, participants should wait a greater proportion of the delay for short delays than for long delays. As Table 1 and Fig. 5 show, this result was observed only in level 1, suggesting that participants assigned to conditions with longer delays adapted to them. Indeed, Fig. 5 reveals that the principal change in behavior was a shift toward longer IRTs in later game levels for the participants assigned to the longer delay conditions, consistent with their learning to exhibit less impulsive responding in the presence of longer delays.
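The degree of discounting implied by Eq. 1 at the two extreme delays can be worked out directly. The sketch below applies the hyperbolic form with the illustrative k of 0.25 to a unit reward; the function names are hypothetical, and the calculation shows only the fraction of value retained, not a fit to the present data.

```python
def discounted_value(damage, irt_s, k=0.25):
    """Hyperbolic discounting (Eq. 1): V = damage / (1 + k * IRT)."""
    return damage / (1.0 + k * irt_s)


def retained_fraction(irt_s, k=0.25):
    """Fraction of the undiscounted value that remains after waiting irt_s seconds."""
    return discounted_value(1.0, irt_s, k)


# Value retained when the full charge is reached at 5 s versus 20 s.
for delay in (5, 20):
    print(delay, round(retained_fraction(delay), 3))
```

With k = 0.25, a reward available at 5 s retains 1/2.25 (about 44 %) of its immediate value, whereas a reward available at 20 s retains 1/6 (about 17 %), which is the sense in which the longer delay is discounted further.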

Fig. 6

Plot of hyperbolic discounting (left) and a replot of these discounting effects when the delay to maximum value is plotted on a relative scale (right). See the text for a more complete description

With regard to the weaker sensitivity to the power contingency for longer delays to maximum, initially this behavior may have been driven by the generally low likelihood of waiting in the presence of long delays (see Fig. 5). However, when the delay to maximum was short, the dynamics of the reward magnitude (i.e., how quickly the charge bar changed) may be more evident than when the delay to maximum was long. As Fig. 2 shows, the difference that a few seconds of waiting makes to the outcome magnitude is much greater for short delays than for long delays when the power is less than 1.00. The opposite is true for power greater than 1.00, but this difference is less dramatic due to the asymmetric nature of the superellipsoid function that was governing the recharge rate.

Our findings suggest that both magnitude effects and an increased sensitivity to discounting contingencies for short delays occur within the context of an escalating interest video game task. The first result suggests that the difficulty with observing magnitude effects in nonhuman species is not merely a by-product of the shorter experienced delays in the tasks used in nonhuman studies or the use of hypothetical monetary rewards in human studies. Furthermore, the magnitude effect did not interact with our delay contingencies (via the power parameter) or the delay to maximum damage, thus establishing its independent effect on behavior. Finally, our manipulation of the delay to maximum reward served to generalize the original report of Young et al. (2011) by establishing control by the power parameter at delays to maximum reward that were both shorter and longer than those originally reported.

Although we have placed the present results within the context of research on delay discounting, the escalating interest task more closely resembles a delay of gratification (DG) task (Mischel, 1966; Mischel & Ebbesen, 1970) in which the smaller sooner reward is available throughout the delay, thus allowing for the possibility of defection while waiting. Given that we have not uncovered any research on magnitude effects using a DG task, our results extend demonstrations of participants’ waiting longer for larger magnitudes into related paradigms involving a delay to reward. Furthermore, the DG paradigm usually involves delays of many minutes (Forzano, Michels, Carapella, Conway, & Chelonis, 2011; Mischel & Ebbesen, 1970), although studies using other species show experimental control with much shorter delays (e.g., Reynolds, de Wit, & Richards, 2002).

Differences among species are readily apparent and not surprising, but it is too easy to jump to conclusions that imply a qualitative difference. It is important to choose tasks that conform to the abilities and ecological niches to which a species has adapted. Because most adult humans are not currently living in environments where a few seconds of delay of food, money, or an opportunity to mate has survival significance, it should not be surprising that there is little discounting of these rewards, as compared with that of other species, where competition for basic needs is much more fierce. Thus, using an environment where seconds do matter for humans provides a better test case for identifying whether certain variables are differentially affecting humans and nonhumans.