# Value-based decision-making battery: A Bayesian adaptive approach to assess impulsive and risky behavior

## Abstract

Using simple mathematical models of choice behavior, we present a Bayesian adaptive algorithm to assess measures of impulsive and risky decision making. Practically, these measures are characterized by discounting rates and are used to classify individuals or population groups, to distinguish unhealthy behavior, and to predict developmental courses. However, a constant demand for improved tools to assess these constructs remains unanswered. The algorithm is based on trial-by-trial observations. At each step, a choice is made between immediate (certain) and delayed (risky) options. Then the current parameter estimates are updated by the likelihood of observing the choice, and the next offers are provided from the indifference point, so that they will acquire the most informative data based on the current parameter estimates. The procedure continues for a certain number of trials in order to reach a stable estimation. The algorithm is discussed in detail for the delay discounting case, and results from decision making under risk for gains, losses, and mixed prospects are also provided. Simulated experiments using prescribed parameter values were performed to justify the algorithm in terms of the reproducibility of its parameters for individual assessments, and to test the reliability of the estimation procedure in a group-level analysis. The algorithm was implemented as an experimental battery to measure temporal and probability discounting rates together with loss aversion, and was tested on a healthy participant sample.

## Keywords

Delay discounting · Risk seeking · Intertemporal choice · Loss aversion · Bayesian estimation

Decision making is inseparable from daily life. Animals, including humans, frequently make choices between alternative options, both consciously and unknowingly. With respect to consequences, these choices are usually considered from normative and descriptive perspectives in many disciplines, including economics, social sciences, and psychology. The rationality of choices is addressed through normative aspects. In contrast, descriptive analysis investigates decisions as preferences, regardless of whether they are logical, beneficial, or harmful (Burns & Bechara, 2007; Kahneman & Tversky, 1984).

In many circumstances, the outcomes of the available options are subject to uncertainties such as delay and/or risk, which decrease the value of the choices in comparison to choices with immediate outcomes. Within a behavioral economic framework, this devaluation of reward is referred to as *discounting* (Ainslie, 1975). Decision-making theories, including expected utility theory (Von Neumann & Morgenstern, 1944), prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992), and reinforcement learning theories (Sutton & Barto, 1998), are based on the principal assumption that all dimensions of an option are integrated into a single measure called the *subjective value* of a choice, which is parameterized by the rate of discounting.

In exponential discounting, the value function is

\[ V = A e^{-k_o D}, \]

where the subjective value \(V\) of an outcome of amount \(A\), delivered after a delay \(D\), diminishes exponentially according to the discounting rate \(k_o\). Early studies in economics mostly employed this function to evaluate different options (Ainslie, 1975). Nevertheless, humans tend to deviate systematically from this value function (Doyle, 2013; Green, Fry, & Myerson, 1994; Madden & Johnson, 2010; Rachlin, Raineri, & Cross, 1991; Simpson & Vuchinich, 2000). Consequently, the most common value function in behavioral psychology seems to be Mazur’s model (Mazur, 1987),

\[ V = \frac{A}{1 + k_o D}, \tag{1} \]

where \(k_o > 0\). Moreover, with a transformation of the probability to the odds against winning, \(\theta = (1 - p)/p\), the same hyperbolic discounting function has been used to describe the declining subjective values of probabilistic outcomes (Rachlin et al., 1991):

\[ V = \frac{A}{1 + k_o \theta}. \tag{2} \]

To describe choice behavior for individuals or groups of individuals, the discount rate \(k_o\) is usually inferred by assessing several indifference points at separate delays. Fitting Eq. 1 or 2 then gives the best estimate of the discounting rate. Normalized indifference points have also been used to measure discounting rates, employing not a value function but the area under the indifference curve (Myerson, Green, & Warusawitharana, 2001).
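As a minimal sketch of the value functions discussed here, the following Python snippet evaluates exponential discounting, Mazur's hyperbolic discounting of delayed outcomes, and the hyperbolic discounting of probabilistic outcomes via the odds against winning. The function names are illustrative and are not taken from the battery's source code.

```python
import math


def exponential_value(amount, delay, k_o):
    """Exponential discounting: V = A * exp(-k_o * D)."""
    return amount * math.exp(-k_o * delay)


def hyperbolic_value(amount, delay, k_o):
    """Hyperbolic discounting of a delayed outcome: V = A / (1 + k_o * D)."""
    return amount / (1.0 + k_o * delay)


def odds_against(p):
    """Odds against winning for a win probability p: theta = (1 - p) / p."""
    return (1.0 - p) / p


def probabilistic_value(amount, p, k_o):
    """Hyperbolic discounting of a risky outcome: V = A / (1 + k_o * theta)."""
    return amount / (1.0 + k_o * odds_against(p))


# Illustrative values only: 100 units delayed by 30 days, k_o = 0.1.
print(exponential_value(100, 30, 0.1))
print(hyperbolic_value(100, 30, 0.1))
# With k_o = 1 the risky option is valued at its expected value.
print(probabilistic_value(50, 0.5, 1.0))
```

With the same rate, the hyperbolic function devalues long delays less steeply than the exponential one, which is one way to see why the two models make different predictions at long horizons.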

Although a range of more complex mathematical models have been tried (Doyle, 2013), the simple hyperbolic function has been widely used to model experimental data for various incentives, real or hypothetical, as well as positive or negative outcomes (Baker, Johnson, & Bickel, 2003; Johnson & Bickel, 2002; Petry, 2003). Typically, economists assess discounting by simply asking participants directly for their indifference value (Loewenstein, 1988), whereas the most common method in neuroscience and psychology is the binary-choice method (Mazur, 1987), employing a titration procedure through which the indifference point is inferred from a series of choices. Such procedures have been implemented with either a set of fixed, predefined choices (Madden, Petry, Badger, & Bickel, 1997) or, more commonly, using adjustments of amount or delay based on the individual’s choices (Loewenstein, 1988; Madden et al., 1997; Rachlin et al., 1991). Adjusting-amount procedures have been used mostly in studies with human subjects, whereas animal research has used delay adjustments in addition. As compared to nonadaptive methods, adjustment procedures have been found to have no effect on the processes that underlie discounting (Green, Myerson, Shah, Estle, & Holt, 2007; Holt, Green, & Myerson, 2012). Because they are based on mapping a set of indifference points, these tasks require a large number of choice trials, which can be time-consuming, and thus limiting in certain applications (Smith & Hantula, 2008). Recently, a hierarchical Bayesian model was developed to assess the temporal discounting rate (Vincent, 2016), and a five-trial adjusting-delay task has been shown to quickly measure discount rates in humans (Koffarnus & Bickel, 2014). Nonetheless, the latter approach does not allow controlling for unsystematic or illogical data.

Research investigating aberrant decision-making patterns within several mental disorders, such as drug abuse, is growing constantly. In particular, delay discounting, as a proposed transdisease process, demands further investigation, including directing interventions toward changing individuals’ discounting rates (Bickel, Jarmolowicz, Mueller, Koffarnus, & Gatchalian, 2012; Koffarnus, Jarmolowicz, Mueller, & Bickel, 2013). Furthermore, due to the partial overlap but also the evident differences between delayed and probabilistic choice behavior, the advantage of similar experimental procedures for different facets of decision making has been emphasized (Green & Myerson, 2004). Increased delay-discounting rates are seen in chronic users of alcohol and other drugs (Bjork, Hommer, Grant, & Danube, 2004; Dom, D’haene, Hulstijn, & Sabbe, 2006; MacKillop et al., 2011; Mitchell, Fields, D’Esposito, & Boettiger, 2005; Petry, 2003), and higher rates of risk taking are seen in pathological gamblers (Madden, Petry, & Johnson, 2009). Impulsive and risky decision making has also been linked to such clinically relevant constructs as treatment outcomes (Blanco et al., 2009; Krishnan-Sarin et al., 2007; MacKillop & Kahler, 2009; Petry, 2012; Stanger et al., 2012). Therefore, it has become increasingly desirable to measure choice behavior in a variety of contexts with improved, precise, and consistent methods, which may further our understanding of the mechanisms underlying value-based decision making, and thus potentially advance clinical care for people suffering mental disorders related to such behaviors.

Individuals may behave differently when they make decisions based on values. More sensitive participants may change their preferences sharply on the basis of small differences, whereas others may be neutral to the same changes. This can also be interpreted as a degree of consistency: A consistent choice policy gives a higher probability of choosing the option with the higher value. It has been shown that different samples behave differently in terms of consistency, which has been estimated by using the softmax function (Eq. 3 below) in maximum likelihood models (Hare, Hakimi, & Rangel, 2014; Yechiam, Busemeyer, Stout, & Bechara, 2005), or by constructing receiver operating characteristic curves for the sets of choices by individual participants (Ripke et al., 2012).

In the present study, we developed a mathematical framework for an isochronous amount and delay/risk adjusting procedure based on a Bayesian estimation approach (Garvert, Moutoussis, Kurth-Nelson, Behrens, & Dolan, 2015). Employing simulations and a sample of healthy participants, we show the robust and efficient estimation of the parameters of interest. The algorithm is presented and discussed in detail for the case of delay discounting. The adaptation of the mathematical framework for assessing probability discounting and loss aversion is straightforward and is omitted for the sake of brevity. Taken together, we here present a novel adaptive approach to measure different facets of value-based decision making, including choice consistency.

## Mathematical framework

In this section, we provide a detailed discussion of the mathematical modeling and parameter estimation algorithm for the delay discounting case. The main idea is to employ a Bayesian approach to improve our initial assumptions about the value of the discounting parameter trial-by-trial by using the choices that an agent—for example, a person—makes between a smaller immediate and a larger delayed monetary reward. We used the hyperbolic discounting function, Eq. 1, for evaluating the subjective value of the delayed offers.

\[ V_d = \frac{x_d}{1 + k_o d}, \]

where \(x_d\) represents the amount of the delayed offer associated with a delay of \(d\) units, and the subjective value is shown as \(V_d\). Similarly, \(x_i\) and \(V_i\) are the immediate offer and its subjective value, respectively. The subjective value of the immediate offer is trivially the value of the offer itself, that is, \(V_i = x_i\). The offers vary between \(r_1\) and \(r_2\), measured in a currency unit, and the delays are chosen from the set \(D = \{d_1, d_2, \ldots, d_7\}\), in days. Now suppose that \(a_i\) and \(a_d\) are the actions of choosing the immediate and delayed offers, respectively, and let \(Q(a_i)\) and \(Q(a_d)\) be the values of taking the corresponding actions (Sutton & Barto, 1998). Furthermore, we assume that the likelihood of choosing between the two offers follows a softmax probability function with an inverse temperature parameter, \(\beta_o > 0\), as

\[ P(a_i \mid k_o, \beta_o) = \frac{e^{\beta_o Q(a_i)}}{e^{\beta_o Q(a_i)} + e^{\beta_o Q(a_d)}}, \tag{3} \]

and \(P(a_d \mid k_o, \beta_o) = 1 - P(a_i \mid k_o, \beta_o)\).

Large values of *β* in Eq. 3 represent consistent choices—that is, a high probability of taking the most valuable action—whereas small values reveal some inconsistency.
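The role of the inverse temperature can be illustrated with a short Python sketch of a two-option softmax. It assumes that the action values equal the subjective values of the offers, \(Q(a_i) = V_i\) and \(Q(a_d) = V_d\); the function name and the example numbers are hypothetical.

```python
import math


def prob_delayed(v_immediate, v_delayed, beta_o):
    """Probability of choosing the delayed offer under a two-option softmax.

    Assumes the action values are the subjective values of the offers.
    The softmax is written in its logistic form and split by sign so that
    large arguments cannot overflow math.exp.
    """
    z = beta_o * (v_delayed - v_immediate)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Equal subjective values: both options are equally likely.
print(prob_delayed(10.0, 10.0, 2.0))
# The same value difference yields a sharper preference for larger beta_o.
print(prob_delayed(10.0, 12.0, 0.5), prob_delayed(10.0, 12.0, 5.0))
```

At the indifference point the choice probability is one half regardless of the inverse temperature, which is why offers near that point are the most informative about the discounting rate.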

Both parameters, \(k_o\) and \(\beta_o\), are nonnegative. Therefore we expected them to be positively skewed, that is, approximately lognormal (Lovric, 2010). To take advantage of close-to-normal distributions, we transformed the parameters to the natural-logarithmic scale and defined \(k = \ln(k_o)\) and \(\beta = \ln(\beta_o)\). For estimation purposes, we discretized the parameter space over an equally spaced 2-D region, \(\mathbf{R}\), such that \(-8 \le k \le 2\) and \(-5 \le \beta \le 5\). For simplicity, we assumed that the two parameters are independent and imposed liberal univariate priors on the parameters, such that \(k\) and \(\beta\) had a Beta and a uniform distribution, respectively. Under the independence assumption, one can build a joint probability distribution \(P(k, \beta)\) serving as a prior for the following Bayesian framework.

\[ P(k, \beta \mid a) = \frac{1}{Z}\, P(a \mid k, \beta)\, P(k, \beta), \tag{4} \]

where \(P(k, \beta)\) is the joint distribution over the parameters and \(P(a \mid k, \beta)\) is the likelihood of observing the action \(a \in \{a_i, a_d\}\), which is computed using Eq. 3. At every trial \(t\), the posterior distribution over the parameters, \(P(k, \beta \mid a)\), was updated by multiplying the prior by the likelihood of the agent’s action and then served as the prior for the upcoming trial. Note that \(1/Z\), in Eq. 4, is a simple normalization factor over the discrete domain \(\mathbf{R}\). At the end of each trial, the expected values of \(k\) and \(\beta\) were considered the current parameter estimations, \( {\widehat{k}}_t \) and \( {\widehat{\beta}}_t \). Using the current estimates based on the previous choice, we presented the following offers close to the indifference point, where the choices were equally likely, in order to retrieve the most informative data (Lewi, Butera, & Paninski, 2008; Sebastiani & Wynn, 2000).
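The trial-wise update can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the battery's implementation: the grid is coarser than one would use in practice, the prior is flat on both parameters (the text describes a Beta prior on \(k\)), the action values are taken to be the subjective values, and all function names are hypothetical.

```python
import math


def linspace(lo, hi, n):
    """Equally spaced grid of n points from lo to hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def p_delayed(x_i, x_d, d, k, beta):
    """Softmax probability of the delayed choice; k and beta are on the log scale."""
    v_d = x_d / (1.0 + math.exp(k) * d)  # hyperbolic value of the delayed offer
    z = math.exp(beta) * (v_d - x_i)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def update(prior, ks, betas, x_i, x_d, d, chose_delayed):
    """One Bayesian step on the grid: posterior = likelihood * prior / Z."""
    post = []
    for i, k in enumerate(ks):
        row = []
        for j, b in enumerate(betas):
            p = p_delayed(x_i, x_d, d, k, b)
            likelihood = p if chose_delayed else 1.0 - p
            row.append(likelihood * prior[i][j])
        post.append(row)
    z = sum(map(sum, post))  # normalization over the discrete domain
    return [[v / z for v in row] for row in post]


def posterior_means(post, ks, betas):
    """Expected values of k and beta, used as the current estimates."""
    k_hat = sum(k * sum(row) for k, row in zip(ks, post))
    b_hat = sum(b * row[j] for row in post for j, b in enumerate(betas))
    return k_hat, b_hat


ks, betas = linspace(-8.0, 2.0, 41), linspace(-5.0, 5.0, 41)
prior = [[1.0 / (41 * 41)] * 41 for _ in range(41)]  # flat prior (assumption)
post = update(prior, ks, betas, x_i=10.0, x_d=20.0, d=30, chose_delayed=True)
print(posterior_means(post, ks, betas))
```

Because the posterior of one trial is passed in as the prior of the next, running the procedure over a full session only requires calling `update` once per observed choice.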

Let \(\delta\) be the minimum difference between the two offers. Then a feasible delay was chosen randomly, and the next offers were provided such that they differed at least by \(\delta\) and had the same subjective values. In extreme cases in which the agent was too patient (or impulsive), even the minimal possible fractional increment was not feasible for the longest (or shortest) delay, that is, \( \left({r}_2-\delta \right)\left(1+{\widehat{k}}_t{d}_m\right)<{r}_2\;\mathrm{or}\kern0.24em {r}_2<{r}_1\left(1+{\widehat{k}}_t{d}_1\right) \). In these cases, amounts very close to the boundary values of the offer range and either the highest or the lowest possible delay were considered as the next options. In all cases, we chose randomly from the set of feasible delays, if any, and adjusted for the immediate and delayed amounts in such a way that the subjective values of both offers were approximately the same according to the current estimate, \( {\widehat{k}}_t \). The procedure continued for a certain number of trials, \(N\), and \( {\widehat{k}}_N \) was considered the estimated parameter. The risk with adaptive designs is that individuals will reverse-engineer the design. Therefore, to reduce the chance of an agent learning the pattern, we added random offers that were distributed throughout all trials.
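One possible reading of this offer-selection step is sketched below in Python. Several details are assumptions made for illustration: the rate entering the hyperbolic function is taken as \(e^{\widehat{k}_t}\), the delayed amount is drawn uniformly from the feasible interval, the fallback offers sit exactly on the boundaries of the range, and the interspersed random offers are left out. The function name `next_offer` is hypothetical.

```python
import math
import random


def next_offer(k_hat, delays, r1, r2, delta, rng=random):
    """Return (immediate amount, delayed amount, delay) near indifference.

    k_hat is the current estimate on the log scale; r1 and r2 bound the
    offers; delta is the minimum difference between the two amounts.
    """
    k_o = math.exp(k_hat)
    feasible = []
    for d in delays:
        growth = 1.0 + k_o * d
        # Smallest delayed amount keeping x_i >= r1 and x_d - x_i >= delta.
        lo = max(r1 * growth, delta * growth / (k_o * d))
        if lo <= r2:
            feasible.append((d, lo))
    if feasible:
        d, lo = rng.choice(feasible)
        x_d = rng.uniform(lo, r2)
        return x_d / (1.0 + k_o * d), x_d, d  # equal subjective values
    if r1 * (1.0 + k_o * min(delays)) > r2:
        # Too impulsive: shortest delay, amounts at the range boundaries.
        return r1, r2, min(delays)
    # Too patient: longest delay, smallest admissible difference.
    return r2 - delta, r2, max(delays)


delays = [3, 7, 14, 31, 61, 180, 365]  # delays used in the DD task (days)
print(next_offer(-4.0, delays, r1=3.0, r2=50.0, delta=1.0, rng=random.Random(1)))
```

A delay is treated as feasible exactly when both inequalities of the extreme-case condition fail for it, so the fallback branches correspond to the too-impulsive and too-patient cases described in the text.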

Equation 5, with the loss aversion parameter \(\lambda\), has a simple linear form in which loss aversion is the ratio of the contribution of the loss magnitude, \(L\), to the contribution of the gain magnitude, \(G\), to the participant’s decisions (Frydman, Camerer, Bossaerts, & Rangel, 2011; Tom, Fox, Trepel, & Poldrack, 2007).

## Simulations

The first simulation used prescribed values of *k* = –2 and *β* = 1.5. We depict five starting, middle, and final trials, which, based on the data from Table 1, show improving estimations across trials.

Table 1. First and last five trials of a simulation with *k* = –2 and *β* = 1.5

Trial | Accepted amount | Accepted delay | Rejected amount | Rejected delay | \( {\widehat{k}}_t \) | | \( {\widehat{\beta}}_t \) | |
---|---|---|---|---|---|---|---|---|
1 | 12 | 0 | 14 | 3 | –2.06 | 2.84 | –0.09 | 8.18 |
2 | 18 | 0 | 49 | 14 | –1.20 | 1.76 | 0.43 | 8.01 |
3 | 46 | 31 | 4 | 0 | –1.78 | 1.35 | 0.26 | 8.23 |
4 | 4 | 0 | 49 | 180 | –1.64 | 0.78 | 0.97 | 6.90 |
5 | 41 | 14 | 11 | 0 | –1.98 | 0.69 | –0.70 | 5.73 |
… | … | … | … | … | … | … | … | … |
46 | 40 | 14 | 13 | 0 | –1.95 | 0.0009 | 1.11 | 0.22 |
47 | 6 | 0 | 11 | 7 | –1.95 | 0.0007 | 1.17 | 0.22 |
48 | 44 | 31 | 6 | 0 | –1.95 | 0.0006 | 1.18 | 0.22 |
49 | 23 | 14 | 8 | 0 | –1.96 | 0.0009 | 1.03 | 0.22 |
50 | 37 | 31 | 7 | 0 | –1.96 | 0.0011 | 0.98 | 0.21 |

Next, simulations were run for *k* = –5.5, –5, …, –0.5, with a fixed value of *β* = 1.5. The estimated parameters are shown in Fig. 2a for *k* = –4, –3, –1 on a trial-by-trial basis. Figure S1 shows boxplots for all values of *k*. The values of *k* were reproduced well, and the accuracy of the estimations is supported by the small errors, represented by the low standard deviations. However, the estimations of *β* resulted in higher standard deviations, which is an indication of a suboptimal estimation (see Fig. 2b). Then, simulations for a fixed discounting rate, *k* = –3, and different *β* = –4.5, –3.5, …, 4.5 showed that *k* was estimated more accurately with higher values of *β* (Fig. 2c, Fig. S2). We demonstrate the trial-by-trial changes across the estimation procedures to emphasize the convergence of the algorithm.

For the group-level analysis, parameter values were drawn from *k* ~ *N*(–3, 1) and *β* ~ *N*(0.5, 2). Data sets of 50 trials were simulated for random pairs from these distributions, and the resulting data are shown in Fig. 3a. The correlations between the initial (true scores) and estimated parameters were .98 for *k* and .95 for *β*, which are depicted in Fig. 3b.

For decision making under risk, the discounting rate was bounded by \(k_o\) ≤ 3, given the fact that a \(k_o\) of 1 (that is, \(k\) = 0) is considered a baseline at which the subjective value corresponds to the expected value, and any value of \(k\) different from zero is considered to be an indicator of risk seeking or risk aversive behavior. The loss aversion parameter was set to 0 ≤ \(\lambda\) ≤ 4 without a transformation to the logarithmic scale. The reason was that any value of \(\lambda\) beyond this range was infeasible with our offer range, and any transformation could result in a change of direction of any potential skewness. The consistency parameter had the same range as before, –5 ≤ \(\beta\) ≤ 5. Predefined values of the discounting and loss aversion parameters were reliably recovered. As before, the consistency parameter was reproduced with less precision.

## Comparison to standard methods

To compare our approach to standard methods, we chose a value-adjusting method as a representative of a class of titration procedures that adjust the amount by half of the difference between the delayed and immediate offers depending on the previous choice (Ripke et al., 2012). Assuming consistent choices, after a certain number of trials this method will locate the indifference point for a given delay. Finally, the hyperbolic function was fit to indifference points of several delays to estimate the discounting rate.
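For orientation, a generic amount-adjusting titration can be sketched in Python as below. It is a stand-in for this class of procedures, not the exact method of Ripke et al. (2012): the delayed amount is adjusted with a step that halves after every trial, and the simulated agent is assumed to choose deterministically according to the hyperbolic value function. The function name and the example numbers are hypothetical.

```python
def titrate_indifference(x_i, d, k_o, start, n_trials=10):
    """Locate the delayed amount that is equivalent to an immediate amount x_i.

    The delayed amount starts at `start` and is lowered after a delayed
    choice or raised after an immediate choice; the step halves each trial.
    The agent is a noise-free hyperbolic discounter with rate k_o.
    """
    x_d = start
    step = start / 2.0
    for _ in range(n_trials):
        prefers_delayed = x_d / (1.0 + k_o * d) > x_i
        x_d += -step if prefers_delayed else step
        step /= 2.0
    return x_d


# Illustrative case: x_i = 10, d = 30, k_o = 0.1, so the true indifference
# amount is 10 * (1 + 0.1 * 30) = 40; the first offer is twice that value.
print(titrate_indifference(10.0, 30, 0.1, start=80.0, n_trials=20))
```

Each trial of this scheme resolves a single delay, so estimating a discounting rate additionally requires repeating the search at several delays and fitting the hyperbolic function to the resulting points.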

To make the comparison more reliable in terms of controlling for accuracy, the offer ranges, and similar initial assumptions, we simplified the question of estimating the discounting rate to finding the indifference points for certain discounting rates, immediate amounts, and delays. In other words, given a discounting rate of \(k\), an immediate amount of \(x_i\), and a delay of \(d\) days, we aimed to locate the indifference point, that is, to find the delayed amount \(x_d\) such that \(x_d = x_i(1 + kd)\). We further assumed that the choices were made using Eq. 3 with fixed values of \(\beta\).

We considered three cases. (1) For a single immediate amount \(x_i\) = 10, a single delay \(d\) = 30, and consistent choices with \(\beta\) = 1.5, we looked at the number of iterations each algorithm needed to locate the indifference point for \(k\) ∊ {–6, –5.9, …, 1} (see Fig. 5a). Although the performance of both methods overlaps for small values of \(k\), for stronger discounting one needs to provide larger delayed values, which makes the amount-adjusting method inefficient. (2) The number of iterations was observed for multiple immediate amounts \(x_i\) ∊ {5, 10, 15, 20}, multiple delays \(d\) ∊ {10, 30, 60, 120, 180}, and consistent choices with \(\beta\) = 1.5 for \(k\) ∊ {–6, –5.9, …, 1}. Our approach exhibited more stable performance in this case, too (Fig. 5b). (3) Starting with an immediate amount of \(x_i\) = 10 for multiple delays \(d\) ∊ {10, 30, 60, 120, 180}, and allowing 50 trials in total, we applied both methods to reproduce discounting rates of \(k\) ∊ {–6, –5.9, …, 1}. For every \(k\), we conducted 1,000 simulations with \(\beta\) ~ \(N\)(0, 2) (Fig. 5c). The median estimated \(k\) using our approach (Fig. 5c, top) closely tracked the true values (black line), and the 5th and 95th percentiles (dotted lines) were evenly distributed for different \(k\) values. In contrast, the amount-adjusting procedure performed somewhat less well in terms of the medians and skewed residuals (Fig. 5c, bottom).

For the first two cases, we applied each algorithm 10,000 times for every \(k\) value and random combinations of \(x_i\) and \(d\). We terminated the algorithm either upon reaching the indifference point, up to an absolute error of .05, or after 200 trials if the results did not converge. The delayed offer at the first trial for both methods was set to twice the amount of the true indifference point, \(2x_i(1 + kd)\). The algorithms had no restriction on the amounts they could offer. To avoid any bias, we used a flat prior on \(k\) for our algorithm.

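In participants who completed both tasks, the discounting rates estimated with our approach were correlated with those from the standard amount-adjusting procedure, *r* = .66, *p* < .001, as is shown in Fig. 6a.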

Furthermore, we applied our algorithm—that is, the sequential Bayesian update—to the data from the standard method. The data comprised a total of 50 choices between delayed offers and a fixed immediate offer. The iterative procedure ran through all trials and updated the estimations on the basis of the trial-by-trial likelihood of the choices. The resulting discounting rates were correlated to the results from the standard method, *r* = .99, *p* < .001, as is given in Fig. 6b, which shows that our approach gave almost the same results as fitting to the indifference points. This introduces a new and computationally easy way of estimating discounting rates in data from standard methods, and it could also serve as a proof of validity for our approach: Assuming that the classical method measures a construct, our algorithm does so to the same degree.

## The battery

During each task, participants are supposed to choose one of the two offers presented simultaneously for 5 s on a computer screen. The time limit was set in light of the average response times from previous assessments. For each trial, the participant’s choice is highlighted with a frame before presenting the next offer. Presenting the outcomes of gambles during the experiment and the time interval of each trial, as well as the number of trials, are all optional and can be set initially. In general, participants are informed by instructions before each task that at the end of the experiment one trial per task will be selected randomly from among their choices and credited toward their compensation. However, these instructions are integrated into the battery and can be modified on the basis of alternative task designs. The temporal delays in DD are set to 3, 7, 14, 31, 61, 180, and 365 days. For PDG and PDL, gambles are played with five possible probability values: 2/3, 1/2, 1/3, 1/4, and 1/5. The task length for DD, PDL, and PDG is 50 trials, and monetary gains/losses range from €3 to €50. For MG, 50 trials are performed, presenting amounts of €1–€40 for gains and €5–€20 for losses. The number of trials, 50, was chosen according to data acquired by previous implementations of the algorithm, so as to end up with stable estimates. At the beginning of the MG task, participants receive €10 as “house money.” During all tasks, offers are randomly assigned to presentation on the left or the right of the screen.

The experiments, including instructions, binary choices, and outcomes, were initially implemented using MATLAB, Release 2010a (The MathWorks, Inc., Natick, MA) and Psychtoolbox 3.0.10, based on the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997), and are now available under GNU Octave. The initial settings of the tasks, including reward types and ranges, temporal delays, probabilities for gains and losses, and gambling, together with the instructions and payment schemes, are easily accessible through the source code. This enables an end user to modify the layout and initial settings on the basis of different hypotheses and requirements.

## Piloting

The battery was piloted in a sample of 26 healthy participants (15 female, 11 male; mean age = 26.23 years, *SD* = 7.8). The participants were recruited through flyers distributed around the university campus and neighborhood, and they were paid a fixed amount of money for compensation. Participants completed the battery within 19 min, on average, of which the estimation procedure required 13 min (i.e., 6 min for the instructions). The estimated parameters are shown in Fig. 8a as boxplots, and the summary statistics for the sample are presented in Table 2. To test the convergence of the estimation procedure, we calculated the absolute difference between the estimated value of each parameter on any trial, \( {\widehat{k}}_t\left({\widehat{\beta}}_t\right) \), and the final estimation, \( \widehat{k}\left(\widehat{\beta}\right) \). The median of this error term was then depicted for all participants, together with the 75th percentile and maximum values. Figure 9 shows a decreasing pattern that can roughly be interpreted as the convergence of the procedure. Regarding the distributions of the parameters, the Kolmogorov–Smirnov test rejected the normality null hypothesis for DD, PDG, and MG at *α* = .05 in our sample. Therefore, we used Spearman’s rank correlation coefficient to look for any associations between the parameters, which are summarized in Table 3.
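The convergence measure described above, the absolute difference between each trial-wise estimate and the final estimate, summarized across participants, can be sketched in Python as follows. The trajectories in the example are made-up numbers for illustration only, and the function name is hypothetical.

```python
from statistics import median


def convergence_profile(trajectories):
    """Summarize |estimate on trial t - final estimate| across participants.

    `trajectories` holds one list of trial-wise estimates per participant,
    all of equal length. Returns the per-trial median and maximum error.
    """
    errors = [[abs(est - traj[-1]) for est in traj] for traj in trajectories]
    n_trials = len(errors[0])
    medians = [median(e[t] for e in errors) for t in range(n_trials)]
    maxima = [max(e[t] for e in errors) for t in range(n_trials)]
    return medians, maxima


# Hypothetical three-trial trajectories of the estimate for three participants.
example = [
    [-3.0, -4.0, -5.0],
    [-1.0, -2.0, -2.0],
    [0.0, -1.5, -1.0],
]
print(convergence_profile(example))
```

By construction the error on the last trial is zero, so a curve of this kind shows how early the estimates settle close to their final values rather than how accurate those final values are.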

Table 2. Participant sample description, with *k* and *β* on a logarithmic scale

Measure | Parameter | Median | Mean | SD | Min | Max |
---|---|---|---|---|---|---|
Gender: ♀ 15, ♂ 11 | | | | | | |
Age (years) | | 25 | 26.23 | 7.8 | 20 | 61 |
DD | *k* | –5.43 | –5.24 | 1.92 | –7.94 | 0.45 |
DD | *β* | –1.50 | –1.35 | 1.32 | –3.36 | 1.14 |
PDG | *k* | 0.13 | 0.17 | 0.82 | –2.40 | 2.54 |
PDG | *β* | –1.67 | –1.62 | 1.05 | –3.34 | 0.90 |
PDL | *k* | 0.17 | 0.21 | 0.66 | –1.56 | 1.19 |
PDL | *β* | –2.20 | –2.15 | 1.02 | –3.73 | 0.05 |
MG | *λ* | 1.96 | 1.95 | 0.70 | 0.59 | 3.97 |
MG | *β* | –0.93 | –1.06 | 1.36 | –3.05 | 2.61 |

Table 3. Correlations between parameters

Task | DD | PDG | PDL |
---|---|---|---|
PDG | –.14 | | |
PDL | .33 | –.0003 | |
MG | –.42 | .47 | –.37 |

Previous studies have shown a negative correlation between the probability discounting rates for gains and losses (Shead & Hodgins, 2009; Takahashi, Takagishi, Nishinaka, Makino, & Fukui, 2014). Nevertheless, we observed no correlation, Spearman’s *ρ* = –.0003, though a linear trend was present, as is shown in Fig. 8b, and a post-hoc analysis revealed a power of .35 for our sample size to detect a correlation of –.25 with *α* = .05. Furthermore, the probability discounting rates for gains and losses in our sample did not show any significant differences in terms of means and medians. For mixed gambles, the resulting distribution was in line with the literature, although we endowed the participants with money in advance and used different framings (including zero values). Such differences in the presentation and/or context of instruction have been shown to affect choice behavior (Ert & Erev, 2008; Kahneman & Tversky, 1979; Silberberg, Murray, Christensen, & Asano, 1988; Thaler & Johnson, 1990). Moreover, we observed significant associations between probability discounting for gains and both loss aversion and temporal discounting, *ρ* = –.41, *p* < .05, and *ρ* = .47, *p* < .05, respectively. We also observed a nonsignificant association between loss aversion and probability discounting for losses, *ρ* = –.37, *p* = .061.

We next examined the reproducibility of *β* in the low value range using simulations. Specifically, we conducted simulations for the delay discounting case using samples of the initial parameters distributed according to Table 2 [1,000 simulations with *k* ~ *N*(–5.24, 2) and *β* ~ *N*(–1.35, 1.32)]. The estimated parameters had significant correlations of .92 and .86 for *k* and *β*, respectively. Hence, in simulations the reproducibility appeared acceptable, even at the lower end of the parameter range.

## Discussion

In this study, we developed and implemented a novel adaptive algorithm to measure different metrics related to impulsive and risky decision making, such as the temporal and probability discounting rates and loss aversion.

The main advantage of this approach is its isochronous adaptive nature, which aims at providing the most informative offers at each trial on the basis of the choices that have been made earlier. Theoretically, this should allow for a very efficient inference of behavioral parameters. Accordingly, we showed that the model parameters converge within the first few trials, both in simulations and in tasks completed by real participants. This provides stable estimates and can differentiate between participants, assuming that the hyperbolic and linear value functions, Eqs. 1, 2, and 5, capture the behavior to an acceptable degree. The temporal discounting rates obtained with computerized adjusting-amount procedures (Reynolds, Richards, Horn, & Karraker, 2004; Richards, Zhang, Mitchell, & Wit, 1999; Ripke et al., 2012) or the Monetary Choice Questionnaire (Kirby, Petry, & Bickel, 1999; Koff & Lucas, 2011) vary in terms of their summary statistics between different studies reported in the literature, so the absolute values from different studies are not directly comparable. Nevertheless, we showed a correlation of *r* = .66 between our approach and a standard amount-adjusting procedure for the same participants, which seems to be acceptable in terms of test–retest reliability, considering that the two tasks had different settings of amounts and delays (Beck & Triplett, 2009; Craig, Maxfield, Stein, Renda, & Madden, 2014; Kirby, 2009).

Moreover, we showed through simulations that our approach outperforms a standard amount-adjusting method specifically for the higher values of delay discounting rates. Recent advances (Koffarnus & Bickel, 2014; Yoon & Chapman, 2016) have introduced methods to estimate discounting parameters with very few trials (five and ten, respectively, for Koffarnus & Bickel and Yoon & Chapman). Both methods are theoretically variants of titration procedures. Such extremely brief tasks are advantageous because they save time, but due to theoretical considerations we assume that this might come at the cost of precision. Future research might therefore compare these very short task variants with our algorithm by utilizing simulations and participant samples.

The algorithm was also used to reestimate the discounting rates for data that were acquired by nonadaptive methods. The initial estimates and reestimates were nearly identical (*r* = .99). Furthermore, the medians of the probability discounting rates in our piloting were very close to zero (log-transformed), which is theoretically consistent and corresponds to the evaluation of the risky option by its expected value. For loss aversion, the medians were comparable to what has been reported in the literature (Tom et al., 2007).

The described algorithm was used to implement four independent measures of value-based decision making as an experimental package. This shows that the Bayesian framework can easily be adapted to assess a range of behavioral concepts and suggests that future work can be directed toward the development of additional tasks. For our test battery, we provide a graphical user interface that gives access to the model and runtime variables, as well as task-specific settings such that one can set the values for every single experiment. This and the efficient inference of behavior enhance the flexibility to cover different ranges and commodities of outcomes, even when time is a limiting factor. Most importantly, behavioral estimations are immediately available along with other data such as response times, rendering post-hoc parameter inference unnecessary.

The adaptiveness of the estimation procedure and the instruction disclosing that the outcome of a randomly picked trial will be credited as compensation make our approach vulnerable to reverse-engineering. For example, in DD after observing a delayed choice, the algorithm increases the relative amount of the immediate option for the next offer. Upon learning this pattern, some participants may deliberately pick the delayed option until the immediate offers reach a maximum. To reduce this effect, we distributed random offers in the course of the tasks, to make the patterns less obvious. This, however, decreases the power of the algorithm because of added noise and sampling from less informative data. On the other hand, in exceptional cases of extreme behavior (for example, if a participant tends to take just one type of offer, such as the immediate ones in delay discounting), the procedure runs normally, but these cases could be treated in a different way by allowing for adaptive offer ranges. However, these instances are easily detectable by their value and choice behavior and can be treated as outliers.

In summary, we developed a new approach for adaptive offer presentation in binary choice settings to estimate discounting parameters. We showed that this approach is quick and reliable and that it outperforms the most widely used classical method. Furthermore, it can easily be transferred to other concepts of decision making. Our findings support construct validity under the mathematical framework, and we conclude that the proposed Bayesian approach is a functional alternative to those existing in the literature. Thus, our work might advance the evaluation of decision-making processes in single studies or in research consortia that need to collect large numbers of datasets in a flexible and efficient way. Nevertheless, future studies will be required to improve the estimation precision of the consistency parameter, *β*, and to compare the approach with more recent methods and with more complex decision models that have more parameters.

## Notes

### Author note

We thank Zeb Kurth-Nelson for sharing his ideas on the mathematical framework, Nils B. Kroemer for insightful discussions, and Elisabeth Jünger and Christian Sommer for collecting the pilot data. This study was supported by the Deutsche Forschungsgemeinschaft (DFG FOR 1617 Grants RA 1047/2-1, SM 80/7-1, and SM 80/7-2; DFG SPP 1226 Grant SM 80/5-2; and DFG SFB 940/1 and SFB 940/2 grants). Q.J.M.H. and M.S. contributed to the conception and design of the study. A.G. and Q.J.M.H. implemented the mathematical algorithm, which was improved by S.P. for this work. Piloting of participants and data collection were performed by N.B., and N.B. and S.P. drafted the manuscript. All authors provided critical revision of the manuscript for important intellectual content and approved the final version for publication.

## Supplementary material

## References

- Ainslie, G. (1975). Specious reward: Behavioral theory of impulsiveness and impulse control. *Psychological Bulletin, 82,* 463–496. doi: 10.1037/H0076860
- Baker, F., Johnson, M. W., & Bickel, W. K. (2003). Delay discounting in current and never-before cigarette smokers: Similarities and differences across commodity, sign, and magnitude. *Journal of Abnormal Psychology, 112,* 382–392.
- Beck, R. C., & Triplett, M. F. (2009). Test–retest reliability of a group-administered paper-pencil measure of delay discounting. *Experimental and Clinical Psychopharmacology, 17,* 345–355. doi: 10.1037/a0017078
- Bickel, W. K., Jarmolowicz, D. P., Mueller, E. T., Koffarnus, M. N., & Gatchalian, K. M. (2012). Excessive discounting of delayed reinforcers as a trans-disease process contributing to addiction and other disease-related vulnerabilities: Emerging evidence. *Pharmacology & Therapeutics, 134,* 287–297. doi: 10.1016/j.pharmthera.2012.02.004
- Bjork, J. M., Hommer, D. W., Grant, S. J., & Danube, C. (2004). Impulsivity in abstinent alcohol-dependent patients: Relation to control subjects and type 1-/type 2-like traits. *Alcohol, 34,* 133–150. doi: 10.1016/j.alcohol.2004.06.012
- Blanco, C., Potenza, M. N., Kim, S. W., Ibáñez, A., Zaninelli, R., Saiz-Ruiz, J., & Grant, J. E. (2009). A pilot study of impulsivity and compulsivity in pathological gambling. *Psychiatry Research, 167,* 161–168. doi: 10.1016/j.psychres.2008.04.023
- Brainard, D. H. (1997). The Psychophysics Toolbox. *Spatial Vision, 10,* 433–436. doi: 10.1163/156856897X00357
- Burns, K., & Bechara, A. (2007). Decision making and free will: A neuroscience perspective. *Behavioral Sciences and the Law, 25,* 263–280. doi: 10.1002/bsl.751
- Craig, A. R., Maxfield, A. D., Stein, J. S., Renda, C. R., & Madden, G. J. (2014). Do the adjusting-delay and increasing-delay tasks measure the same construct: Delay discounting? *Behavioural Pharmacology, 25,* 306–315. doi: 10.1097/Fbp.0000000000000055
- Dom, G., D’haene, P., Hulstijn, W., & Sabbe, B. (2006). Impulsivity in abstinent early- and late-onset alcoholics: Differences in self-report measures and a discounting task. *Addiction, 101,* 50–59. doi: 10.1111/j.1360-0443.2005.01270.x
- Doyle, J. R. (2013). Survey of time preference, delay discounting models. *Judgment and Decision Making, 8,* 116–135.
- Ert, E., & Erev, I. (2008). The rejection of attractive gambles, loss aversion, and the lemon avoidance heuristic. *Journal of Economic Psychology, 29,* 715–723. doi: 10.1016/j.joep.2007.06.003
- Frydman, C., Camerer, C., Bossaerts, P., & Rangel, A. (2011). MAOA-L carriers are better at making optimal financial decisions under risk. *Proceedings of the Royal Society B, 278,* 2053–2059. doi: 10.1098/rspb.2010.2304
- Garvert, M. M., Moutoussis, M., Kurth-Nelson, Z., Behrens, T. E., & Dolan, R. J. (2015). Learning-induced plasticity in medial prefrontal cortex predicts preference malleability. *Neuron, 85,* 418–428. doi: 10.1016/j.neuron.2014.12.033
- Green, L., & Myerson, J. (2004). A discounting framework for choice with delayed and probabilistic rewards. *Psychological Bulletin, 130,* 769–792. doi: 10.1037/0033-2909.130.5.769
- Green, L., Fry, A. F., & Myerson, J. (1994). Discounting of delayed rewards: A life-span comparison. *Psychological Science, 5,* 33–36. doi: 10.1111/j.1467-9280.1994.tb00610.x
- Green, L., Myerson, J., Shah, A. K., Estle, S. J., & Holt, D. D. (2007). Do adjusting-amount and adjusting-delay procedures produce equivalent estimates of subjective value in pigeons? *Journal of the Experimental Analysis of Behavior, 87,* 337–347. doi: 10.1901/jeab.2007.37-06
- Hare, T., Hakimi, S., & Rangel, A. (2014). Activity in dlPFC and its effective connectivity to vmPFC are associated with temporal discounting. *Frontiers in Neuroscience, 8,* 50. doi: 10.3389/fnins.2014.00050
- Holt, D. D., Green, L., & Myerson, J. (2012). Estimating the subjective value of future rewards: Comparison of adjusting-amount and adjusting-delay procedures. *Behavioural Processes, 90,* 302–310. doi: 10.1016/j.beproc.2012.03.003
- Johnson, M. W., & Bickel, W. K. (2002). Within-subject comparison of real and hypothetical money rewards in delay discounting. *Journal of the Experimental Analysis of Behavior, 77,* 129.
- Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. *Econometrica, 47,* 263–291. doi: 10.2307/1914185
- Kahneman, D., & Tversky, A. (1984). Choices, values, and frames. *American Psychologist, 39,* 341–350. doi: 10.1037/0003-066X.39.4.341
- Kirby, K. N. (2009). One-year temporal stability of delay-discount rates. *Psychonomic Bulletin & Review, 16,* 457–462. doi: 10.3758/PBR.16.3.457
- Kirby, K. N., Petry, N. M., & Bickel, W. K. (1999). Heroin addicts have higher discount rates for delayed rewards than non-drug-using controls. *Journal of Experimental Psychology: General, 128,* 78–87.
- Koff, E., & Lucas, M. (2011). Mood moderates the relationship between impulsiveness and delay discounting. *Personality and Individual Differences, 50,* 1018–1022. doi: 10.1016/j.paid.2011.01.016
- Koffarnus, M. N., & Bickel, W. K. (2014). A 5-trial adjusting delay discounting task: Accurate discount rates in less than one minute. *Experimental and Clinical Psychopharmacology, 22,* 222–228. doi: 10.1037/a0035973
- Koffarnus, M. N., Jarmolowicz, D. P., Mueller, E. T., & Bickel, W. K. (2013). Changing delay discounting in the light of the competing neurobehavioral decision systems theory: A review. *Journal of the Experimental Analysis of Behavior, 99,* 32–57. doi: 10.1002/jeab.2
- Krishnan-Sarin, S., Reynolds, B., Duhig, A. M., Smith, A., Liss, T., McFetridge, A., … Potenza, M. N. (2007). Behavioral impulsivity predicts treatment outcome in a smoking cessation program for adolescent smokers. *Drug and Alcohol Dependence, 88,* 79–82. doi: 10.1016/j.drugalcdep.2006.09.006
- Lewi, J., Butera, R., & Paninski, L. (2008). Sequential optimal design of neurophysiology experiments. *Neural Computation, 21,* 619–687. doi: 10.1162/neco.2008.08-07-594
- Loewenstein, G. (1988). Frames of mind in intertemporal choice. *Management Science, 34,* 200–214.
- Lovric, M. (2010). *International encyclopedia of statistical science*. New York, NY: Springer.
- MacKillop, J., & Kahler, C. W. (2009). Delayed reward discounting predicts treatment response for heavy drinkers receiving smoking cessation treatment. *Drug and Alcohol Dependence, 104,* 197–203. doi: 10.1016/j.drugalcdep.2009.04.020
- MacKillop, J., Amlung, M. T., Few, L. R., Ray, L. A., Sweet, L. H., & Munafò, M. R. (2011). Delayed reward discounting and addictive behavior: A meta-analysis. *Psychopharmacology, 216,* 305–321. doi: 10.1007/s00213-011-2229-0
- Madden, G. J., & Johnson, P. S. (2010). A delay-discounting primer. In G. J. Madden & W. K. Bickel (Eds.), *Impulsivity: The behavioral and neurological science of discounting* (pp. 11–37). Washington, DC: American Psychological Association.
- Madden, G. J., Petry, N. M., Badger, G. J., & Bickel, W. K. (1997). Impulsive and self-control choices in opioid-dependent patients and non-drug-using control patients: Drug and monetary rewards. *Experimental and Clinical Psychopharmacology, 5,* 256.
- Madden, G. J., Petry, N. M., & Johnson, P. S. (2009). Pathological gamblers discount probabilistic rewards less steeply than matched controls. *Experimental and Clinical Psychopharmacology, 17,* 283–290. doi: 10.1037/A0016806
- Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), *Quantitative analysis of behavior: Vol. 5. The effect of delay and of intervening events on reinforcement value* (pp. 55–73). Hillsdale, NJ: Erlbaum.
- Mitchell, J. M., Fields, H. L., D’Esposito, M., & Boettiger, C. A. (2005). Impulsive responding in alcoholics. *Alcoholism: Clinical and Experimental Research, 29,* 2158–2169. doi: 10.1097/01.alc.0000191755.63639.4a
- Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. *Journal of the Experimental Analysis of Behavior, 76,* 235–243. doi: 10.1901/jeab.2001.76-235
- Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. *Spatial Vision, 10,* 437–442. doi: 10.1163/156856897X00366
- Petry, N. M. (2003). Discounting of money, health, and freedom in substance abusers and controls. *Drug and Alcohol Dependence, 71,* 133–141. doi: 10.1016/S0376-8716(03)00090-5
- Petry, N. M. (2012). Discounting of probabilistic rewards is associated with gambling abstinence in treatment-seeking pathological gamblers. *Journal of Abnormal Psychology, 121,* 151–159. doi: 10.1037/A0024782
- Rachlin, H., Raineri, A., & Cross, D. (1991). Subjective-probability and delay. *Journal of the Experimental Analysis of Behavior, 55,* 233–244. doi: 10.1901/jeab.1991.55-233
- Reynolds, B., Richards, J. B., Horn, K., & Karraker, K. (2004). Delay discounting and probability discounting as related to cigarette smoking status in adults. *Behavioural Processes, 65,* 35–42.
- Richards, J. B., Zhang, L., Mitchell, S. H., & de Wit, H. (1999). Delay or probability discounting in a model of impulsive behavior: Effect of alcohol. *Journal of the Experimental Analysis of Behavior, 71,* 121–143.
- Ripke, S., Hübner, T., Mennigen, E., Müller, K. U., Rodehacke, S., Schmidt, D., … Smolka, M. N. (2012). Reward processing and intertemporal decision making in adults and adolescents: The role of impulsivity and decision consistency. *Brain Research, 1478,* 36–47. doi: 10.1016/j.brainres.2012.08.034
- Sebastiani, P., & Wynn, H. P. (2000). Maximum entropy sampling and optimal Bayesian experimental design. *Journal of the Royal Statistical Society: Series B, 62,* 145–157.
- Shead, N. W., & Hodgins, D. C. (2009). Probability discounting of gains and losses: Implications for risk attitudes and impulsivity. *Journal of the Experimental Analysis of Behavior, 92,* 1–16. doi: 10.1901/Jeab.2009.92-1
- Silberberg, A., Murray, P., Christensen, J., & Asano, T. (1988). Choice in the repeated-gambles experiment. *Journal of the Experimental Analysis of Behavior, 50,* 187–195. doi: 10.1901/jeab.1988.50-187
- Simpson, C. A., & Vuchinich, R. E. (2000). Reliability of a measure of temporal discounting. *Psychological Record, 50*(1), 3–16.
- Smith, C. L., & Hantula, D. A. (2008). Methodological considerations in the study of delay discounting in intertemporal choice: A comparison of tasks and modes. *Behavior Research Methods, 40,* 940–953. doi: 10.3758/brm.40.4.940
- Stanger, C., Ryan, S. R., Fu, H., Landes, R. D., Jones, B. A., Bickel, W. K., & Budney, A. J. (2012). Delay discounting predicts adolescent substance abuse treatment outcome. *Experimental and Clinical Psychopharmacology, 20,* 205–212. doi: 10.1037/a0026543
- Sutton, R. S., & Barto, A. G. (1998). *Reinforcement learning: An introduction* (Vol. 1). Cambridge, MA: MIT Press.
- Takahashi, T., Takagishi, H., Nishinaka, H., Makino, T., & Fukui, H. (2014). Neuroeconomics of psychopathy: Risk taking in probability discounting of gain and loss predicts psychopathy. *Neuroendocrinology Letters, 35,* 510–517.
- Thaler, R. H., & Johnson, E. J. (1990). Gambling with the house money and trying to break even: The effects of prior outcomes on risky choice. *Management Science, 36,* 643–660.
- Tom, S. M., Fox, C. R., Trepel, C., & Poldrack, R. A. (2007). The neural basis of loss aversion in decision-making under risk. *Science, 315,* 515–518. doi: 10.1126/science.1134239
- Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. *Journal of Risk and Uncertainty, 5,* 297–323.
- Vincent, B. T. (2016). Hierarchical Bayesian estimation and hypothesis testing for delay discounting tasks. *Behavior Research Methods, 48,* 1608–1620. doi: 10.3758/s13428-015-0672-2
- Von Neumann, J., & Morgenstern, O. (1944). *Theory of games and economic behavior*. Princeton, NJ: Princeton University Press.
- Yechiam, E., Busemeyer, J. R., Stout, J. C., & Bechara, A. (2005). Using cognitive models to map relations between neuropsychological disorders and human decision-making deficits. *Psychological Science, 16,* 973–978. doi: 10.1111/j.1467-9280.2005.01646.x
- Yoon, H., & Chapman, G. B. (2016). A closer look at the yardstick: A new discount rate measure with precision and range. *Journal of Behavioral Decision Making, 29,* 470–480. doi: 10.1002/bdm.1890