# Hierarchical Bayesian estimation and hypothesis testing for delay discounting tasks

## Abstract

A state-of-the-art data analysis procedure is presented to conduct hierarchical Bayesian inference and hypothesis testing on delay discounting data. The delay discounting task is a key experimental paradigm used across a wide range of disciplines including economics, cognitive science, and neuroscience, all of which seek to understand how humans or animals trade off the immediacy versus the magnitude of a reward. Bayesian estimation allows rich inferences to be drawn, along with measures of confidence, based upon limited and noisy behavioural data. Hierarchical modelling allows more precise inferences to be made, thus making the most efficient use of data that are sometimes expensive or difficult to obtain. The proposed probabilistic generative model describes how participants compare the present subjective value of reward choices on a trial-to-trial basis, and estimates participant- and group-level parameters. We infer discount rate as a function of reward size, allowing the magnitude effect to be measured. Demonstrations are provided to show how this analysis approach can aid hypothesis testing. The analysis is demonstrated on data from the popular 27-item monetary choice questionnaire (Kirby, Psychonomic Bulletin & Review, **16**(3), 457–462, 2009), but will accept data from a range of protocols, including adaptive procedures. The software is made freely available to researchers.

### Keywords

Decision making · Delay discounting · Inter-temporal choice · Magnitude effect · Time preference · Bayesian estimation · MCMC · Financial psychophysics

## Introduction

Appropriately trading off the immediacy versus the magnitude of a reward is a fundamental aspect of decision making across many domains. Would you like 1 marshmallow now, or 2 in 15 minutes? Should you spend your wages on a holiday now, or contribute to a larger pension in a few decades’ time? Should society consume fossil fuels now or maintain the biosphere in the long run? Learning how people discount future rewards is crucial across many fields of study, so that we can understand, predict, and nudge people’s decisions.

Psychologists attempt to survey the behavioural phenomena and propose cognitive mechanisms (Mischel et al. 1972; Green et al. 1994; Weatherly and Weatherly 2014). Neuroscientists study the neural mechanisms (Cohen et al. 2004; Kable and Glimcher 2007; Kalenscher and Pennartz 2008; Peters and Büchel 2011). Theorists attempt to explain why discounting behaviour might arise in the first place (Kurth-Nelson et al. 2012; Killeen 2009; Fawcett et al. 2012; Stevens and Stephens 2010; Cui 2011; Sozou 1998). Economists attempt to understand microeconomic decision making with focus upon any violations of rationality (Frederick et al. 2002). And policy theorists study how groups with different time preferences could come to a collective decision over issues such as climate change (Millner and Heal 2014). Therefore, making accurate and rich inferences about how people discount future rewards is important in a wide variety of domains.

A typical delay discounting question takes the form “Would you prefer £*A* now, or £*B* in *D* days.” Most people would choose an immediate reward of £100 now over £101 in 1 year, but as the value of the delayed reward increases there will be a point at which the delayed reward becomes preferable. Behaviour in delay discounting tasks will depend upon how a participant discounts future rewards, and this is often measured in the form of a discount function describing how the present subjective value of a reward decreases as its delivery is delayed. A more general form of the delay discounting question has also been explored in the form “Would you prefer £*A* in *D*^{A} days, or £*B* in *D*^{B} days.”

### Challenges

The first challenge is that inferences about a participant’s discounting behaviour are based upon a limited number of data points. For example, given the data points from a hypothetical participant in Fig. 1, we can see that we will have a degree of uncertainty over the discount function; each of the 3 discount functions shown is about equally consistent with the data. If we had more data points we could better infer the discount function, but this comes at a cost. Data cost is most acute in experiments paying real monetary rewards, but even if the rewards are hypothetical, testing time is still a limiting factor, particularly so when testing special populations.

Second, participants occasionally make response errors, so some of the data does not accurately represent subjective preferences. Figure 1 demonstrates two likely response errors; should we let these bias our estimates of a participant’s discount rate, or should we discount them as response errors? How can we incorporate this uncertainty when inferring a participant’s discount rate?

Third, previous research has established that a participant’s intertemporal preferences are influenced by many factors such as age (Green et al. 1994), income (Green et al. 1996), and the magnitude of rewards on offer (Kirby and Maraković 1996; Johnson and Bickel 2002). Can we achieve better parameter estimates and more robust hypothesis tests by incorporating prior knowledge that discount rates may vary as a function of covariates of interest?

### Solutions

The software is simple to use (see Section “Using the software”), comes with installation and usage instructions, and fits within the research workflow described in Fig. 2. By addressing the challenges set out above, this work makes a number of contributions to the analysis of delay discounting data:

1. I formulated a novel Bayesian probabilistic generative model of behaviour in the delay discounting task (Section “The model”). This allows us to: specify prior knowledge about latent variables driving discounting behaviour, update this knowledge rationally in the light of new empirical data, specify our level of confidence (or lack thereof) in our beliefs, and conduct Bayesian hypothesis testing. The proposed model allows hierarchical Bayesian inference (also known as multi-level modelling) to be conducted at the trial level, the participant level, and the group level. This means, for example, that relationships between participant covariates (such as age) and discount rates can affect inferences made at the participant level. The model is flexible enough to analyse data obtained from a variety of delay discounting protocols where the question is in the form “Would you prefer £*A* in *D*^{A} days, or £*B* in *D*^{B} days.” A rich set of inferences results from using this model, demonstrated in Section “An example” with a dataset consisting of 15 participants who completed the widely used 27-item monetary choice questionnaire (Kirby 2009).
2. I explicitly account for measurement error, that is, erroneous responses given by the participant in the delay discounting task which are not truly reflective of their discounting preferences. This is done with a psychometric function which describes how participants’ responses are probabilistically related to the present subjective value of the sooner and later rewards (Section “Choosing between options”). This approach is common in visual psychophysics, and I propose it is useful to incorporate it into what is essentially financial psychophysics. This allows us to distinguish between a participant’s baseline response error rate, and errors that may arise from their imprecise comparison between the present subjective values of *A* and *B*.
3. I utilise prior knowledge of factors that influence discount rates in two ways. Firstly, the structure of the model incorporates our knowledge that a participant’s discount rate decreases as the reward magnitude increases (the magnitude effect, see Section “Calculating present subjective value”). So rather than estimating a single discount rate, we estimate how the discount rate varies as a function of reward magnitude. Secondly, Appendix B discusses how to extend the model to consider linear relationships between participant covariates (such as age) and group level parameters.

## The model

### The probabilistic generative model

#### Calculating present subjective value

On each trial the participant chooses between the smaller sooner reward (£*A*) and the larger later reward (£*B*), see next section. However, participants do not compare these quantities directly, but rather the present subjective value of each reward. That is, the present subjective value of a reward *V*^{reward} is equal to the actual reward multiplied by a discount factor. We assume the simple 1-parameter hyperbolic discount function (Mazur 1987),

\[ V^{\text{reward}} = \text{reward} \times \frac{1}{1 + k \times \text{delay}}, \tag{1} \]

where *k* is the discount rate. If a reward is delivered immediately (for example *D*^{A} = 0), then *V*^{A} = *A*. No strong claim is being made that Equation 1 is a true description of how people discount future rewards (e.g. Luhmann, 2013); it was chosen because of its ubiquitous use in the literature, but also see Section “Choice of the 1-parameter discount function”. However, the magnitude effect shows that the discount rate *k* varies as a function of the magnitude of the delayed reward (see Fig. 4), so the actual function used was

\[ V^{\text{reward}} = \text{reward} \times \frac{1}{1 + f(\text{reward}) \times \text{delay}}, \tag{2} \]

where *f*(reward) describes the magnitude effect. Based upon Figure 8 of Johnson and Bickel (2002) we assume log(*k*) = *m* log(reward) + *c*, where *m* and *c* describe the slope and intercept of a line describing how the log discount rate decreases linearly as log reward magnitude increases (also see Appendix A). Therefore

\[ V^{\text{reward}} = \text{reward} \times \frac{1}{1 + \exp\left(m \log(\text{reward}) + c\right) \times \text{delay}}. \tag{3} \]
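The present subjective value calculation described in this section can be sketched in a few lines of Python. This is only an illustration of the hyperbolic form with a log-linear magnitude effect, log(*k*) = *m* log(reward) + *c*; it is not the Matlab/JAGS code distributed with the paper, and the function names and the values of `m` and `c` are hypothetical.

```python
import math


def discount_rate(reward, m, c):
    """Magnitude effect: log(k) = m * log(reward) + c, so k = exp(...)."""
    return math.exp(m * math.log(reward) + c)


def present_subjective_value(reward, delay, m, c):
    """Hyperbolic discounting with a reward-dependent discount rate k."""
    k = discount_rate(reward, m, c)
    return reward / (1.0 + k * delay)


# Hypothetical parameter values, chosen only for illustration.
m, c = -0.5, -1.0

v_now = present_subjective_value(100.0, 0.0, m, c)      # no delay: V equals the reward
v_later = present_subjective_value(100.0, 365.0, m, c)  # same reward, delayed by a year
k_small = discount_rate(10.0, m, c)                     # discount rate for a small reward
k_large = discount_rate(1000.0, m, c)                   # discount rate for a large reward
```

With a negative slope `m`, larger rewards receive a smaller discount rate, which is the magnitude effect the model is built to capture.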

#### Choosing between options

A perfect decision maker would choose the delayed reward if *V*^{B} − *V*^{A} > 0, or the immediate reward otherwise. Participants’ responses may not be so clear cut however, so we model the participant’s probability of choosing the delayed reward using a psychometric function (see Fig. 5). This relates the difference between the present subjective value of the delayed and immediate rewards (*V*^{B} − *V*^{A}) to the probability of choosing the delayed reward,

\[ P(\text{choose delayed}) = \epsilon + (1 - 2\epsilon)\, \Phi\left( \frac{V^{B} - V^{A}}{\alpha} \right), \]

where Φ is the standard cumulative Normal distribution and *α* defines the ‘acuity’ of the comparison between options (Fig. 5). The mechanisms responsible for this uncertainty are not explored further here. If *α* = 0, the psychometric function would be a step function and participants would always choose the reward with the highest present subjective value (Fig. 5a). As *α* increases, there is more error in the comparison between *V*^{A} and *V*^{B}. Comparison acuity *α* takes on positive values, and we define a simple Normal prior distribution (truncated at zero) with uninformative uniform priors over the mean *μ*^{α} and standard deviation *σ*^{α}. The psychometric function also incorporates that participants make response errors at some unknown rate *ε* (Fig. 5c, d) regardless of the proximity of the present subjective values of the rewards.

Responses are modelled as Bernoulli trials (a biased coin flip) where *P*(choose delayed) is the bias, and a value of *R* = 1 means the delayed choice (*B*) was preferred.
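A minimal sketch of the choice stage follows. It assumes the psychometric function has the form *ε* + (1 − 2*ε*) Φ((*V*^{B} − *V*^{A}) / *α*), with Φ the standard cumulative Normal; this form is an assumption chosen to match the described roles of *α* and *ε*, and the function names and numerical values are hypothetical.

```python
import math
import random


def standard_normal_cdf(x):
    """Cumulative distribution function of the standard Normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_choose_delayed(v_a, v_b, alpha, epsilon):
    """Assumed form: epsilon + (1 - 2 * epsilon) * Phi((v_b - v_a) / alpha)."""
    difference = v_b - v_a
    if alpha == 0.0:
        # Limiting case: a step function at difference = 0.
        # Returning 0.5 for an exact tie is a convention of this sketch.
        core = 1.0 if difference > 0 else (0.0 if difference < 0 else 0.5)
    else:
        core = standard_normal_cdf(difference / alpha)
    return epsilon + (1.0 - 2.0 * epsilon) * core


def simulate_response(v_a, v_b, alpha, epsilon, rng):
    """Bernoulli trial: R = 1 means the delayed reward (B) was chosen."""
    return 1 if rng.random() < p_choose_delayed(v_a, v_b, alpha, epsilon) else 0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical values: V_A = 50, V_B = 60, acuity 10, error rate 1 %.
responses = [simulate_response(50.0, 60.0, 10.0, 0.01, rng) for _ in range(1000)]
```

Under this form the curve is bounded between *ε* and 1 − *ε*, so even a very large difference in present subjective value leaves a small probability of an erroneous response.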

### Hierarchical modelling

Figure 3 shows that we model at the trial, participant, and group levels. Interested readers are referred to Lee (2011) for an introduction to hierarchical Bayesian modelling. The model presented here is useful when analysing data from participants assumed to be drawn from a single group level population.

Each participant has 4 parameters: two governing their magnitude effect (slope *m*_{p} and intercept *c*_{p}), a comparison acuity *α*_{p}, and an error rate *ε*_{p}. Participant level magnitude effect parameters were assumed to be normally distributed. Comparison acuity parameters were also assumed to be normally distributed but were constrained to take on positive values. The error rate parameter was assumed to be Beta distributed, but values above 0.5 were not allowed as this flips the psychometric function, implying that participants prefer the smaller of the present subjective values.

Rather than making inferences about participants in isolation, it is appealing to be able to infer something about the population that these participants are drawn from. The hierarchical modelling approach allows us to do this by modelling participants (and their 4 parameters per person) as being drawn from a group level distribution (see outer nodes in Fig. 3). The actual priors therefore are placed upon these group-level parameters. The group level magnitude effect slope and intercept priors were based upon estimates from previous literature (see Fig. 3 and Appendix A). Uninformative priors were chosen for the group level comparison acuity parameters. The group level error rate priors represented a mild belief that the error rate was 1 % (for *ω*) and an uninformative prior was placed over the concentration parameter *K*.
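As a hedged illustration of how a Beta distribution can be described by a central value *ω* and a concentration *K*, the sketch below uses the mode and concentration parameterisation, in which the shape parameters are *a* = *ω*(*K* − 2) + 1 and *b* = (1 − *ω*)(*K* − 2) + 1 for *K* > 2. Treating *ω* as the mode is an assumption of this sketch, and the function names and example values are hypothetical.

```python
def beta_shape_from_mode(omega, kappa):
    """Convert a mode (omega) and concentration (kappa > 2) to Beta shapes."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    if kappa <= 2.0:
        raise ValueError("kappa must exceed 2 for the mode to be defined")
    a = omega * (kappa - 2.0) + 1.0
    b = (1.0 - omega) * (kappa - 2.0) + 1.0
    return a, b


def beta_mode(a, b):
    """Mode of a Beta(a, b) distribution, valid for a > 1 and b > 1."""
    return (a - 1.0) / (a + b - 2.0)


# Hypothetical example: a mode of 1 % with a modest concentration.
a, b = beta_shape_from_mode(0.01, 12.0)
recovered_mode = beta_mode(a, b)
```

A small concentration gives a broad distribution around the mode, which is one way to express a mild rather than a strong prior belief.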

Our knowledge of the population from which the participants were drawn (the group level) can be summarised simply by generating posterior predictive distributions for our 4 parameters (*m*, *c*, *α*, *ε*), which we label as *G*^{m}, *G*^{c}, *G*^{α}, *G*^{ε}, respectively. These can also be thought of as representing our prior over the parameters of an untested participant randomly selected from the population, given the currently observed participant dataset.
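The idea of a group-level posterior predictive distribution can be sketched as follows for the magnitude effect slope. For each MCMC sample of the group mean and standard deviation, one value is drawn from the corresponding Normal distribution. The sample arrays here are synthetic stand-ins rather than output from the paper's software, and all names are hypothetical.

```python
import random
import statistics


def posterior_predictive_draws(mu_samples, sigma_samples, rng):
    """Draw one new-participant value per posterior sample of (mu, sigma)."""
    return [rng.gauss(mu, sigma) for mu, sigma in zip(mu_samples, sigma_samples)]


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Synthetic stand-ins for MCMC samples of the group-level mean and
# standard deviation of the slope (not real results).
mu_samples = [rng.gauss(-0.3, 0.05) for _ in range(5000)]
sigma_samples = [abs(rng.gauss(0.1, 0.02)) for _ in range(5000)]

g_m = posterior_predictive_draws(mu_samples, sigma_samples, rng)

spread_of_mean = statistics.pvariance(mu_samples)
spread_of_prediction = statistics.pvariance(g_m)
```

The predictive draws are wider than the posterior over the group mean alone, because they combine uncertainty about the group parameters with between-participant variability.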

### Using the software

1. Collect experimental data and save it in a tab-delimited text file with columns representing *A*, *D*^{A}, *B*, *D*^{B}, *R*, and each row representing a trial. Example datasets are provided.
2. Start Matlab and navigate to the folder containing the downloaded analysis software.
3. An analysis session can be run with a few simple commands, outlined in full in the software documentation. Example analysis scripts are provided. Functions are provided to export publication-quality figures such as those shown in Section “An example”, and best-guess (point estimate) parameters for later use in hypothesis testing.

The software is freely available to download.
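The trial-per-row data layout can be illustrated with a short Python sketch that writes and re-reads a tab-delimited table. The column labels (`A`, `DA`, `B`, `DB`, `R`), the presence of a header row, and the trial values are assumptions made for illustration; the software documentation and example datasets define the exact format expected.

```python
import csv
import io

# Assumed column labels: reward A, delay of A, reward B, delay of B, response.
COLUMNS = ["A", "DA", "B", "DB", "R"]


def write_trials(trials, handle):
    """Write one trial per row as tab-delimited text with a header row."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(trials)


def read_trials(handle):
    """Read the table back, converting text fields to numbers."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {"A": float(row["A"]), "DA": float(row["DA"]), "B": float(row["B"]),
         "DB": float(row["DB"]), "R": int(row["R"])}
        for row in reader
    ]


# Hypothetical trials: R = 1 means the delayed reward (B) was chosen.
trials = [
    {"A": 50.0, "DA": 0.0, "B": 60.0, "DB": 30.0, "R": 0},
    {"A": 20.0, "DA": 0.0, "B": 80.0, "DB": 14.0, "R": 1},
]

buffer = io.StringIO()  # in-memory file, so nothing is written to disk
write_trials(trials, buffer)
text = buffer.getvalue()
round_trip = read_trials(io.StringIO(text))
```

Storing both delays explicitly means the same layout covers protocols with and without a front-end delay.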

### Inference using MCMC

The analysis code is written in Matlab. The JAGS package was used to conduct Markov chain Monte Carlo (MCMC) sampling based inference (Plummer 2003). By default, the posterior distributions are estimated with a total of 100,000 MCMC samples over 2 chains. This excludes the first 1,000 samples of each chain, which were discarded (the burn-in interval). Convergence upon the true posterior distributions was checked by visual inspection of the MCMC chains, and by confirming that the \(\hat {R}\) statistic was below 1.001 for all estimated parameters.
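For readers unfamiliar with the \(\hat {R}\) statistic, the sketch below computes the classic Gelman-Rubin form from several equal-length chains. It is an illustration of the diagnostic only and may differ in detail from the variant reported by the software; the chains are synthetic and the function name is hypothetical.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Two synthetic chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(2)]
# Two synthetic chains stuck in different regions (not converged).
stuck = [[rng.gauss(-1.0, 1.0) for _ in range(5000)],
         [rng.gauss(1.0, 1.0) for _ in range(5000)]]

r_hat_mixed = gelman_rubin(mixed)
r_hat_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that the chains agree with one another; values well above 1 suggest they have not converged on the same distribution.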

## An example

### Delay discounting dataset

The analysis software was applied to delay discounting data from 15 participants who completed a 27-item monetary choice questionnaire (Kirby, 2009), denominated in pounds sterling rather than U.S. dollars. This questionnaire does not include a front-end delay, so the smaller sooner reward was to be delivered immediately (*D*^{A} = 0). No particular research question was posed; the example is provided to demonstrate the nature of the inferences drawn, and how to use the software.

### Summarising our inferences about delay discounting behaviour

This particular dataset shows, for all parameters, little between-participant variance. The parameter distributions for each participant are broadly similar to the group level parameter distributions. This could be because participants display similar discounting behaviour, or it could be because the data is insufficient to draw more precise conclusions about each participant. This should not be surprising as we only have 27 questions per participant, and the fact that the rewards cover a limited range of magnitudes (£11 – £85) means that *a priori* we should not expect the data to maximally constrain our estimates of the magnitude effect.

### Rich forms of analysis

The magnitude effect parameters (*m*, *c*) are also shown as a bivariate density plot (column 3). This magnitude effect function is visualised in column 4, again with 95 % credible regions. The MAP estimates of the *m* and *c* parameters were used to visualise the discount surface (column 5). Participant level plots also show raw response data. This is not present at the group level because the hierarchical inference does not operate by simply pooling all the participant-level data; there is no group level data, just latent parameters describing the group’s properties.

Visualising the psychometric function (column 2) is informative. A perfect financial decision maker would have a psychometric curve that was a step function, which means they would always choose the higher of the immediate reward *V*^{A} = *A* and the present subjective value of the delayed reward *V*^{B}. Inspecting the response data and discount surfaces (column 5) we can see that with rare exceptions, the discount surface successfully separates immediate and delayed choices.

The bivariate density plots of the magnitude effect parameters (column 3) demonstrate that our uncertainty in the two parameters is anti-correlated, which is expected for slope and intercept parameters. This knowledge was not available from the univariate summary in Fig. 6.

When the posterior of these parameters is used to generate a magnitude effect plot (column 4) we can see that our certainty is highest around the reward magnitudes in the monetary choice questionnaire, between £11 and £85. We rightly have less confidence in the group and participant discount rates at much lower or higher reward magnitudes.

### Hypothesis tests

In order to demonstrate how the parameter estimation can contribute to research conclusions, we test whether there is evidence that participants exhibit a magnitude effect, more specifically that the slope of the magnitude effect at the group level is less than zero.

#### Traditional workflow

Using the traditional workflow (see Fig. 2), point estimates (posterior mode) of *m*_{p} were analysed using JASP statistical software (Love et al. 2015). A Bayesian one-sample t-test (Rouder et al. 2009) was conducted to evaluate if the population mean *μ* was less than 0: \(\mathcal {H}_{0}: \mu = 0, \mathcal {H}_{1}: \mu < 0\). The resulting log Bayes Factor was log(*BF*_{10}) = 43.2, meaning that, under the scale of Jeffreys (1961), we have decisive evidence for \(\mathcal {H}_{1}\), that there is a population level magnitude effect.

#### Fully Bayesian methods

However, we can achieve more robust research conclusions by using the more advanced workflow (see Fig. 2). By exporting point estimates of the slope of the magnitude effect for each participant *m*_{p}, we lost all knowledge of how certain or uncertain we were in those estimates. A fully Bayesian approach can be taken by drawing research conclusions from posterior distributions directly, in this case the group level magnitude effect slope *G*^{m}.

There are two approaches that could be taken. The first is Bayesian hypothesis testing, which results in a Bayes Factor summarising the evidence for or against competing hypotheses. The second approach is parameter estimation, where for example the focus is on estimating an effect size or the value of a parameter. Rather than one being more correct than another, they achieve different goals (Kruschke 2011). There is a lively debate over which of these methods will be most scientifically useful (e.g. Morey et al., 2014; Kruschke and Liddell, 2015; Wagenmakers et al., 2015) and so I demonstrate both methods applied to the question: is the magnitude effect slope less than zero at the group level?

Under the Bayesian hypothesis testing approach, the resulting Bayes Factor was *BF*_{01} ≈ 545 (see Fig. 8a), meaning that again we have decisive evidence that the slope of the magnitude effect at the group level is less than 0 (Jeffreys 1961).

Under the parameter estimation approach however, we simply examine our posterior over the relevant latent variable *G*^{m} (see Fig. 8b). We see that the 95 % credible region is far from the value of interest, *G*^{m} = 0, and so it is reasonable to believe the slope of the magnitude effect at the group level is less than zero.
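The parameter estimation summary can be sketched from a set of posterior samples as follows. The sketch uses an equal-tailed 95 % interval, which is an assumption and may differ from the credible region computed by the software (for example a highest density interval). The samples are synthetic and the function names are hypothetical.

```python
import math
import random


def equal_tailed_interval(samples, mass=0.95):
    """Equal-tailed credible interval taken from sorted posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - mass) / 2.0
    lower = ordered[int(math.floor(tail * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper


def proportion_below(samples, value):
    """Fraction of posterior samples lying below a value of interest."""
    return sum(sample < value for sample in samples) / len(samples)


rng = random.Random(3)  # fixed seed so the sketch is repeatable
# Synthetic stand-in for posterior samples of the group-level slope.
g_m_samples = [rng.gauss(-0.3, 0.08) for _ in range(10000)]

lower, upper = equal_tailed_interval(g_m_samples)
p_negative = proportion_below(g_m_samples, 0.0)
```

If the whole interval lies below zero, the samples support the conclusion that the slope is negative, and the proportion of samples below zero gives a direct posterior probability for that statement.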

#### Extending the model

The default analysis strategy is to estimate best guess parameters and conduct hypothesis testing alongside other participant data in an alternative software package (see Fig. 2 and Section “Traditional workflow”). In some research contexts however the ‘fully Bayesian’ approach can be taken, as in the above section, but in many research contexts this will require extending the model. Readers are referred to Kruschke (2015) for a thorough overview of Bayesian approaches to general linear modelling, which can be added on to account for relationships between latent parameters and observed participant covariates. Appendix B outlines two examples of how to do this: when participants are members of 1 of *G* groups (a discrete, within participant variable), and when a group-level parameter varies as a function of a covariate.

## Discussion

### Choice of the 1-parameter discount function

The aim of this paper is to introduce Bayesian estimation and hypothesis testing for delay discounting tasks. It is open for debate whether discounted value or discounted utility is the best way to frame discounting behaviour and we do not know the ‘correct’ discount function. The 1-parameter hyperbolic discount function (Equation 2) was used for a number of reasons. First, even though it is unlikely to be a complete description of how people discount future rewards (e.g. Luhmann, 2013), the 1-parameter hyperbolic model is simple and provides a good account of discounting in humans (McKerchar et al. 2009) and non-human animals (Freeman et al. 2009). Second, a comparison of 4 prominent models shows clear superiority for the 1-parameter hyperbolic over a 1-parameter exponential model, but there is no clear rationale for placing attention on one of the more complex discount functions (McKerchar et al. 2009; Doyle 2013).

### Comparison to existing analysis approaches

A common way to analyse data from delay discounting tasks is to fit a curve to indifference points as a function of delay (Robles and Vargas 2008; Reynolds and Schiffbauer 2004; McKerchar et al. 2009; Whelan and McHugh 2009). The benefit of this approach is its simplicity. However, it does not easily allow for the number of trials to influence the certainty we have about the estimates made. Bootstrap procedures may be used, but they do not have the same intuitive meaning as Bayesian posterior distributions or credible intervals. A straightforward extension to estimating the magnitude effect (by independently estimating discount rates for different reward magnitudes) would also be limited without some form of hierarchical treatment. Curve-fitting approaches also place emphasis upon the parametric form of the discount function rather than the processes underlying behaviour. The approach presented does indeed rely upon a parametric form of the discount rate, but the explicit modelling of choices made on a trial to trial basis means that this model can be extended in future work to explain causal processes responsible for discounting behaviour (see Vincent, 2015, for a perceptual decision making example).

An advantage of the present model is that it utilises all trial data from each participant. This, in combination with Bayesian inference methods, allows the confidence in parameters to be influenced by the number of trials and participants. Wileyto et al. (2004) present an analysis approach which also utilises data from every trial, and which was also applied to the 27 item questionnaire. While this approach is better than fitting parameters to indifference points, it did not model the magnitude effect, did not include hierarchical estimation, and did not use a Bayesian inference approach.

### Advantages of this approach

While the modelling details presented in this paper are more complex than existing approaches, it is likely that a net benefit is conferred to researchers. Firstly, the complexities of the model are not as relevant as the ease of using the software (see Section “Using the software” and online usage instructions). Secondly, the Bayesian approach taken here provides a range of non-trivial advantages.

First, the full joint posterior distribution over parameters is calculated which represents our degree of belief in these parameters having observed the data. Section “Rich forms of analysis” showed how this can provide greater understanding of our data and of our uncertainty in latent variables putatively underlying the participant’s behaviour.

Second, hierarchical Bayesian estimation simultaneously estimates trial-level responses, and participant- and group-level parameters. This has numerous benefits, most notably when it comes to providing additional prior knowledge (see Appendix B) and incorporating participant level uncertainty into hypothesis testing.

Third, the widely used and easily interpreted 1-parameter hyperbolic discount function is used to not only estimate a participant’s discount rate, but how that varies as a function of reward magnitude (the magnitude effect). The model is extendable to investigate alternative, or even multiple, discount functions.

Fourth, the psychometric function incorporates measurement errors (participant response errors). Effects of baseline error rates and comparison acuity can be separated, which could be informative in comparing theoretical explanations of discounting behaviour. It could also provide a quantitative justification to exclude participants from a larger dataset on the basis of a high error rate. However, this is optional: group-level inferences are less likely to be affected by such individuals, as the error rate parameter can account for inconsistent responding rather than affecting more critical parameters related to discounting behaviour.

Fifth, the estimation procedure is flexible enough to utilise data obtained from a variety of different delay discounting protocols such as: studies with or without a front-end delay before the sooner reward, the Kirby 27-item test, fixed immediate reward protocol, fixed delayed reward protocol, and adaptive protocols.

### Conclusion

The probabilistic model (Fig. 3) underlying the data analysis approach described here, and the use of Bayesian inference, offers a number of advantages over traditional approaches to both estimating discount rates and subsequent hypothesis tests.

The proposed analysis can fit into traditional research workflows (see Fig. 2) by exporting best guess parameter estimates which have been derived with all the advantages of the Bayesian approach and the particular model proposed. This approach is most flexible in allowing for hypothesis tests in a wide range of research contexts. The analysis also supports a more fully Bayesian approach to hypothesis testing, in which uncertainty at the participant level propagates to the group level where hypothesis tests are typically focussed. Examples have also been provided to show how this latter approach can be extended to more complex research questions (see Appendix B).

In conclusion, using this hierarchical Bayesian data analysis approach with the freely available software allows researchers from multiple disciplines to draw more robust research conclusions by making the best use of their prior knowledge and new delay discounting data.

## Acknowledgments

The author thanks Rebecca Morris and Vaidas Staugaitis for help with collecting data.