Thinking Inside the Bounds: Improved Error Distributions for Indifference Point Data Analysis and Simulation Via Beta Regression using Common Discounting Functions

Standard nonlinear regression is commonly used when modeling indifference points due to its ability to closely follow observed data, resulting in a good model fit. However, standard nonlinear regression currently lacks a reasonable distribution-based framework for indifference points, which limits its ability to adequately describe the inherent variability in the data. Software commonly assumes data follow a normal distribution with constant variance. However, typical indifference points do not follow a normal distribution or exhibit constant variance. To address these limitations, this paper introduces a class of nonlinear beta regression models that offers excellent fit to discounting data and enhances simulation-based approaches. This beta regression model can accommodate popular discounting functions. This work proposes three specific advances. First, our model automatically captures non-constant variance as a function of delay. Second, our model improves simulation-based approaches since it obeys the natural boundaries of observable data, unlike the ordinary assumption of normal residuals and constant variance. Finally, we introduce a scale-location-truncation trick that allows beta regression to accommodate observed values of 0 and 1. A comparison between beta regression and standard nonlinear regression reveals close agreement in the estimated discounting rate k obtained from both methods.


Introduction
Delay discounting refers to the phenomenon where the subjective value of a reward diminishes as the delay to its receipt increases.Individuals with high impulsivity tend to exhibit a stronger inclination towards immediate gratification, demonstrating higher rates of delay discounting.In contrast, individuals with low impulsivity demonstrate a greater ability to exercise self-control and opt for delayed, larger rewards.Therefore, delay discounting serves as an essential behavior analytic marker, offering insights into a range of problematic behaviors including drug dependence, gambling dependence, and obesity.More details about delay discounting related with problematic behaviors can be found in [Bickel et al., 2019].The purpose of this paper is to suggest a modeling approach that more carefully characterizes variability and obeys natural boundaries of delay discounting indifference points.
An indifference point at a given delay describes the proportion of the larger later reward that an individual is willing to accept to receive reward sooner.For example, if someone would be just as happy with $87 today as $100 in a week, the indifference point at a delay of one week is 0.87.
We focus on discounting tasks where the upper bound A, (i.e. the larger later amount, $100 in this example) for indifference points is pre-set, so observed indifference points can be rescaled to be bounded bounded between 0 and 1 simply by dividing by A. We suggest a nonlinear beta regression approach that can be applied to common discounting functions.Further, we extend beta regression model by developing a Scale Location Truncated (SLT) beta regression model to accommodate 0 and 1 values, which cannot easily be incorporated with typical beta regression approaches.
The hyperbolic discounting function [Mazur, 1987] is a widely used model for characterizing choice behavior in delay discounting studies.It can be expressed as where D is delay, k is the unknown discounting rate parameter to be estimated using data, y is indifference point, and E(y) indicates the expected value of Y , i.e, the value of the regression line for k and D. The left panel of Figure 1 shows an example of a series of indifference points with the hyperbolic discounting function plotted.The nonlinear least squares (NLS) method [Bates and Watts, 1988] is commonly used to fit the hyperbolic discounting function.NLS estimates value of k as the value that is the closest to the observed data by minimizing the sum of squared residuals.However, NLS does not assume a specific probability distribution for observed data, and thus alone NLS does not prescribe a specific way to characterize variability in the data.In statistics, capturing variability inherent in the observed data is important because it enables accurate estimation and prediction.In addition, while a common assumption in regression modeling is that residual error follows a normal distribution, this assumption does not hold for delay discounting.Data simulation based on real world delay discounting data [Jarvis et al., 2019] shows this clearly.When simulating data with normal residuals, 19.28% of the indifference points were invalid, and in 76.3% of cases, at least one subject had an invalid point when conducting 1000 Monte Carlo simulations.By contrast, simulating data from the beta distribution did not result in any invalid points.More detail of this Simulation can be found in Section 2.4 and Figure 7 in Section 3.
Figure 1 shows the hyperbolic discounting function fit for subjects 4 and 56 from [Jarvis et al., 2019].The left panel of Figure 1  By contrast, the beta regression model [Ferrari and Cribari-Neto, 2004] is based on the beta distribution, which describes data on a range between 0 and 1.This coincides with the range of the indifference points.The beta distribution has two shape parameters which make the distribution flexible.Like the normal curve, the beta distribution has a curve that expresses probabilities in terms of area under the curve.In general, the normal curve and the beta curve (Figure 2) are examples of probability density functions (PDF).Thus, the PDF of the beta distribution describes the behavior of the data, allowing users to compute the probabilities of various outcomes.In this case, indifference points are the outcomes.This means that beta regression can be widely used in various fields where response variable is continuous and in bounded between 0 and 1.
There are some existing implementations of beta regression.The R package betareg [Cribari-Neto and Zeileis, 2010] fits the linear beta regression model.The limitations of this method are that (i) the response variable can not include 0 or 1 while those are possible values in some data sets (including in the left panel of Figure 1).In addition, (ii) this package only fits linear relationships and thus it is not applicable to common nonlinear discounting functions.Another approach for beta regression to model nonlinear functions is in [de Brito Trindade et al., 2021].
This approach also does not accommodate 0 or 1 for the response variable.
Some transformation methods have been suggested for cases where beta regression is desired but the observed data contains values of 0 or 1.One such method is to transform data using  , 2006].However, this is not a model-based approach, and the choice of transformation for zeroes in particular can be influential since the log function ascends so rapidly close to 0. Liu and Kong [2015] suggested zero/one inflated beta regression (ZOIB).The ZOIB consists of three segmented beta regression models that address response values at 0, 1, and values between 0 and 1.This mixture model-based approach may not be effective when there are a limited number of point mass values at 0 and 1 as we might expect in a typical discounting analysis.This is because mixture model requires relatively large number of observations.If sample size is not big enough for both the endpoints and the interior of the space, it is hard to reliably estimate the parameters.Berk et al. [2021] recognized the same problem with non-normal indifference points but they use a different approach from ours.While we use hyperbolic discounting function which is nonlinear form, they transform the hyperbolic discounting function to a linear form.Thus, their response variable encompasses values ranging from zero upwards, with no upper bound which is different from our response range.
In this paper, we propose the beta distribution as a probability model for indifference points, which correctly adheres to the [0,1] range for delay discounting data.The de-identified data and code used for the analyses in this paper can be found as online supplemental materials accompanying this paper and at the following link: https://bitbucket.org/paper-code-kmg/slt_beta/src/main/.This section covers the beta distribution, beta nonlinear regression, and SLT beta regression.
First we explore the beta density function, reparameterization, and log-likelihood function, and the reason that standard beta regression cannot accommodate 0 and 1 values.We will also develop a better understanding about building nonlinear beta regression models.We develop SLT beta regression and show how it can accomodate indifference points with values of 0 and 1.
While other common discounting functions can be used, we focus on beta nonlinear regression using the hyperbolic discounting function.Finally, we describe simulation methods based on real world data [Jarvis et al., 2019].
The data we analyze is from Jarvis et al. [2019].The data include 146 participants with 34 individuals that have at least one 0 or 1 indifference point.Participants are a subset of participants analyzed in the original study.The collection of discounting data in that study was an add-on procedure that started after the main study was already ongoing, so not all participants completed the task, and those who did complete the task did not necessarily do so at all time points.Among the 146 participants, we used 126 subjects who passed Johnson-Bickel Criteria [Johnson and Bickel, 2008] to identify systematic delay discounters.

Beta distribution
The beta curve, i.e, PDF is typically expressed as where shape parameters α, β > 0 and Γ(•) is the gamma function.the beta density equals 0 when the range is less than or equal to 0 or greater than or equal to 1. On the other hand, PDF of normal distribution is bell-shaped, can vary in both mean and dispersion, and has positive values even beyond 0 and 1.
Figure 2: PDF plot of normal distribution (left) and PDF of beta distribution (right).In the left panel of the plot, the white region represents the range of indifference points, spanning values from 0 to 1, while the grey area represents the values outside of [0, 1].The right panel shows that the beta distribution's PDF exhibits a variety of shapes contingent upon the values of its shape parameters, α and β, and is constrained to be between zero and one.
Ferrari and Cribari-Neto [2004] proposed a different parameterization for the beta density that better accomodates regression problems.This reparameterization rewrites the beta density in terms of mean parameter µ and scale parameter ϕ.The mean µ is set equal to the value of the hyperbolic discounting function (or other discounting function preferForestGreen by the user. The scale parameter ϕ accommodates non-constant variance in indifference points as a function of delay.For the ordinary normal case, variance of y equals σ 2 regardless of delay and beta regression.However, in the beta distribution, variance of y equals to µ(1 − µ)/(1 + ϕ), which varies depending on delay.

Scale Location Truncated beta regression (SLT beta regression)
By default, the beta distribution cannot accommodate values at 0 or 1.However, values of 0 and 1 may appear among indifference points.While many common adjusting procedures do not produce 0 or 1 values, there are several cases in which these extreme values can show up in the data, and in some cases the experimental question may require us to establish whether indifference points are at 0 or 1, not just "close" to 0 or 1.In order to give the beta density mass

Nonlinear Beta Regression
We next pair the hyperbolic discounting function [Mazur, 1987] with our SLT beta regression approach.Recall that Equation (1) is the hyperbolic discounting function.To complete the specification of our SLT Beta regression model, consider the response variable y ij as the indifference point for ith subject and jth delay.Let y ij be a random sample which follows The mean of the distribution µ ij depends on both the subject and the delay, and follows the hyperbolic discounting function.Thus, where the D j variable is jth delay, and we estimate discounting rate k i for participant i using observed indifference point data.Instead of k i , we typically directly estimate As previously mentioned, the PDF and hence log-likelihood of the beta distribution cannot accommodate 0 and 1.Thus, we use our SLT beta regression approach for estimating ψ and ϕ.This is the log-likelihood for the SLT Beta regression model: and details of its derivation can be found in the Appendix.
In Equation ( 9), we introduce a reparameterization of the response variable, denoted as g.
The variable y ij represents the original form of the response variable, while g serves as a reparameterized version used for mathematical clarity.This reparameterization helps differentiate the response variable from other variables, such as z, introduced in Equation ( 6).Despite being represented differently, g and y ij fundamentally refer to the same response variable.
By maximizing L(y ij , ψ i , ϕ i ), we can estimate ψ i and ϕ i for each participant.In other words, using maximum likelihood estimation, we select values of ψ i and ϕ i that make observed data most likely to have occurForestGreen for each individual series of data.An area of future research would be to extend the SLT beta regression approach for hierarchical modeling.

Reanalysis of human discounting data
We use human discounting data [Jarvis et al., 2019] to assess the utility of the proposed SLT Beta regression approach with the following strategy.First, we conduct similar analyses as shown in Figure 1, showing the SLT beta regression model fit and density curves within the space of indifference points to establish adequare model fit and distribution that obeys the [0,1] bounds.
We then compare estimates of variability between NLS and SLT beta regression approaches to demonstrate the ability of SLT beta regression to model non-constant variance.We show that the empirical pattern of variability in indifference points as a function of delay matches the pattern shown using SLT beta regression.We show that estimates of ln(k) are highly correlated between NLS and SLT beta regression, indicating high agreement the estimation of discounting rate between the methods.We plot model fits alongside data for several participants to show similarity in the regression lines produced by NLS and SLT Beta regression.Finally, we store estimates of ln(k) and variance for use in our simulation study.

Simulation
We conducted a simulation based on real world data [Jarvis et al., 2019].Following the conventions of the Statistics literature, we denote the estimator of k with k.The simulation's algorithm is structuForestGreen as follows.

Simulation from Normal distribution
1. Fit each subjects data to the hyperbolic discounting function using NLS, the current standard approach.Store estimated values of ln( k) and variance σ2 .
2. Based on the estimated k from NLS, we simulate the jth subject's indifference points from a normal distribution with parameters µ ij and σ 2 j that are set equal to empirical estimates from the [Jarvis et al., 2019] data.In other words, simulated data reflects observed regression lines and variance in data, but residuals follow a normal distribution with constant variance, i.e., the typical assumptions for NLS modeling.Thus: where D j stands for the jth delay, d is the number of delays,.and knls,i is the estimated k from NLS of the ith subject.

Simulation from Beta distribution
1. Fit each participants' data to the hyperbolic discounting function using SLT beta regression.
2. Based on the estimated ki and φi from beta regression, we simulate the ith subject's indifference points from the beta distribution with parameters α ij and β ij as shown in Equation ( 2).Thus, where D j stands for jth delay point and d is the number of delay points.kbeta,i is estimate k from NLS of ith subject and ϕ i is the estimated ϕ of ith subject.

Monte Carlo simulation
Monte Carlo simulation employs repeated sampling to assess the properties of specific phenomena.In this paper, we use Monte Carlo simulation to learn the percentage of invalid indifferent points outside of the [0,1] bounds generated by the normal distribution and beta distribution strategies described above.We use 1000 as a replication number.While it may be obvious to some readers that the SLT beta regression approach cannot generate data outside [0, 1], we include a Monte Carlo study to empirically validate that our SLT beta regression model correctly obeys the [0, 1] bounds.We demonstrate that the SLT beta regression approach does not produce invalid out-of-bounds data, and that simulating normal residuals does produce invalid data.Crucially, this indicates that SLT beta regression model more closely reflects empirical data and thus is the more suitable method when using Monte Carlo simulation-based approaches for the study of delay discounting.
The process of Monte Carlo simulation is as follows.
1. Simulate the data as described above from normal and SLT beta distributions.Replicate the data generation 1000 times using the inputs from the Jarvis et al. [2019] data.
2. Examine the simulated indifference points for each subject to determine whether the values exceed 1 or fall below 0. Calculate the percentage of simulated data points exceeding 1 and those falling below 0 for each subject and at various delay points.Summarize overall percentage of invalid points and percentage of invalid points by delay.Compute the percentage of participants with at least one invalid indifference point.

Results
The left panel of Figure 3 shows the model fit of participant 4 from Jarvis et al. [2019].The right panel of Figure 3 shows how the SLT beta density is properly constrained in 0 and 1 when modeling indifference points with the hyperbolic discounting function.By contrast, the normal PDF in Figure 1 gives positive probability for values over 1 and below 0. The left panel of Figure 4 shows variance among subjects at each delay.This plot shows that at low and high delays, variance is lower, and in the middle of the delay range, variance is higher.This is because low delays tend to elicit little discounting and there is an upper bound on indifference points at 1. Similarly, for very large delays, individuals tend to have indifference points near the boundary of 0 where there is little "wiggle room" and thus lower variance.By contrast, indifference points in the middle of the range are free to vary without bumping into any boundaries which is why empirically we tend to see larger variances here.The right panel of Figure 4 shows the estimated variance at each delay according to the SLT beta regression fit.On the right plot, while model estimated variance of SLT beta at each delay shows similar pattern with variance of subjects at fixed delay, model estimated variance from NLS has common variance at all delays.Importantly, the NLS with constant variance approach tends to produce an estimate of variance that is larger than most of the delay-specific variances that can be accommodated by the SLT beta regression approach.NLS misses the pattern of variability inherent in the data.SLT beta regression at each delay and variance from NLS (right panel).Delays are graphed as equally spaced for ease of visualization.
In Figure 5, we can see that points are closely located around the the line of equality (y=x).
This demonstrates that the estimated values of ln( k) obtained from NLS and SLT beta regression are remarkably similar.Table 1 demonstrates the similarity between the distributions of ln(k) obtained from NLS and SLT beta regression.method Min Q1 Median Q3 Max Mean SD NLS ln( k) -9.Table 1: Summary statistics of ln(k) from NLS and SLT beta regression for subjects which satisfy the Johnson-Bickel criteria [Johnson and Bickel, 2008].Q1 indicates 25 percentile and Q3 indicate 75 percentile.
Figure 6 shows model fits from NLS and SLT beta regression for indicated subjects.The four plots show that their fits are very close various in different settings.We also simulated data from the normal distribution and beta distribution based on data from Jarvis et al. [2019].Figure 7 shows an example data set that shows that simulation from the normal distribution gives invalid values, while the beta distribution gives valid data.The right panel of Figure 8 shows the Monte Carlo simulation result.This shows the proportion of invalid previous discounting data when estimating k is the only goal.The agreement between estimates of ln(k) between NLS and SLT beta regression indicate that our methodology does not call previous analyses of discounting rate into question; rather we hope we have added the ability to more carefully characterize variability and simulate more realistic data in future studies.
We note that the previous analyses does not consider standard beta regression, which has its limitation that it cannot estimate k values when there exist a subject which has 0 or 1 indifference point.For those who are interested, we include a comparison of "vanilla" beta regression and SLT beta regression in the Appendix.Of course, this analysis only includes the subset of particiopants who do not exhibit 0s or 1s in their data.As shown in the Figure 8 in Appendix, the PDF of beta distribution and SLT beta distribution looks identical to the naked eye.Therefore, SLT beta regression has advantages of beta regression and does not have limitation that beta regression has; it can estimate k values when the subjects has 0 or 1 values for indifference points.
We also illustrated simulated data in Figure 7.This an example underscores the superiority of using beta model over standard normal approach.Since beta distribution is bounded between 0 and 1 (including 0 and 1), simulation from beta model always generate more human-like data.
When we refer to human-like data, we imply that it falls within the range from 0 to 1 [0, 1].This is because the questionnaire was intentionally crafted to prevent values exceeding 1 or going below 0. This method shows similar performance with NLS regression and has advantage in terms of simulations.
The proposed SLT beta regression approach does have some limitations and opportunities for future research.The beta regression models described in this paper target the analysis of delay discounting data that can be sensibly normalized to the [0,1] interval.Tasks that have a fixed upper amount meet this criterion.However, tasks that have indifference points on the [0, ∞) range, such as those arising from certain cross-commodity discounting procedures, would not seem to fit sensibly into the methods we propose.Another opportunity for future research is to use the ability of SLT beta regression to quantify non-constant variance to further study patterns of variability in delay discounting data.Finally, an additional future research goal is to fully deploy SLT beta regression in a hierarchical model that simultaneously estimates fixed and random effects.While PDF of beta is 0 in when x-axis is 0 and 1, SLT beta distribution has bigger value of PDF than 0.

Standard beta regression
For the reader who is curious about the standard (i.e., non-SLT) beta regression, Figures 9 and 10 show results similar to those presented in Section 3. Since standard beta regression cannot accommodate 0 and 1 indifference point values, we selected participants who pass Johnson-Bickel Criteria and whose indifference points do not include 0 or 1.
In Figure 9, the model estimated variance from beta and SLT beta regression are very similar.
is the hyperbolic discounting function fit for subject 4 .We can see that the indifference points range from [0, 1].In this paper,[a, b]  indicates a bounded range between a and b, including a and b.The right panel of Figure1shows the normal distribution curve for each delay.The normal distribution curves are centered at the regression line, and variance is estimated from the model fit, i.e., how close the observed indifference points are to the regression line.From the plot, we can see that the normal curves are not bounded and they stretch past the [0,1] boundaries.This indicates that the actual distribution of residual error for delay discounting analysis is not well characterized by the normal distribution.

Figure 1 :
Figure 1: Indifference points for participant 4 [Jarvis et al., 2019] and NLS hyperbolic discounting function fit to the data (left panel), and normal distribution curve at each delay for participant 56 (right panel).Variance in normal curves is estimated on the basis of this model fit.The right panel of the plot shows the the problem with using the normal distribution to describe variance for indifference point data.The normal distribution includes probability mass for data outside of 0 and 1, which is invalid.The dotted lines in the right panel indicate invalid values.

Figure 2
Figure 2 illustrates examples of normal and beta PDFs.The normal PDF exhibits different patterns based on its mean parameter µ and standard deviation σ.The beta PDF exhibits different patterns based on the values of the shape parameters α and β.However, in all scenarios, on 0 and 1 endpoint values, we propose a scale-location truncation (SLT) strategy.Truncating a density is typically used for restricting values to a specific range.An interesting property of the truncated beta density is that it takes positive values on their endpoints unlike the usual beta density.By using truncation on the beta density, we have developed the SLT beta density function which has mass on 0 and 1, unlike the usual beta density function.This is shown in the right panel of Figure8in the Appendix.Thus, our SLT beta density function can be used to form a likelihood function, which can in turn be maximized to furnish estimates of discounting rate and variance, even when there are 0 and/or 1 values among the indifference points.The technical details of the SLT beta density function can be found in the Appendix.

Figure 3 :
Figure3: Indifference points and SLT beta regression fit to the data from participant 4 (left panel), and the PDF of the SLT beta distribution at each delay for subject 56 (right panel).In the right plot, unlike normal distribution case (Figure1) , the SLT beta distribution has positive probability between 0 and 1.

Figure 4 :
Figure4: Variance among subjects at each delay (left panel).Box plot of model estimated variance from SLT beta regression at each delay and variance from NLS (right panel).Delays are graphed as equally spaced for ease of visualization.

Figure 5 :
Figure 5: Scatter plot of ln( k).Horizontal axis is ln( k) from NLS and vertical axis is ln( k) from SLT beta regression.Red line indicates y=x.

Figure 8 :
Figure8: Left panel is the probability density function (PDF) of SLT beta and beta distribution.We can two density functions looks very close to each other.The right panel shows the region near 0 and 1.While PDF of beta is 0 in when x-axis is 0 and 1, SLT beta distribution has bigger value of PDF than 0.

Figure 9 Figure 9 :Figure 10 :
Figure 9 is similarto Figure 4 but the estimated variance from beta regression is added.In addition, to make the estimated k value comparison clearer, we have added the Figure 10.In the left panel of the plot, it similar to Figure 5 but instead of comparing the estimated ln( k) of SLT beta and NLS, we compared that of beta and NLS.In the right panel of Figure 10 , We compared the estimated ln( k) of SLT beta and beta regression, and they are remarkably close.