According to the RR hypothesis, decision-makers should choose a decision criterion that maximizes their expected RR. The specific shape of the RR optimal decision criterion depends on the structure of the task environment and the decision-maker’s knowledge of this structure. Here we will consider three factors that influence the shape of the RR optimal decision criterion. We already mentioned the role the dynamics of the decision environment play in determining the shape of the optimal decision criterion. In a task environment with constant total rewards and constant difficulties across trials, a decision criterion that remains constant throughout the decision process (i.e., a static decision criterion) will yield the maximal RR (Wald, 1945). On the other hand, if the decision environment is dynamic, either due to a variable task difficulty or due to a time-dependent total reward, a criterion that changes over the course of the decision process (i.e., a dynamic decision criterion) is optimal (Frazier & Yu, 2008; Rapoport & Burkheimer, 1971). Here we will focus on the effect variable sampling costs have on the shape of the optimal decision criterion as this allows for a relatively straightforward mathematical analysis. A discussion of the effect of variability in task difficulty can be found in Malhotra et al. (2018).
The second factor we will consider is the overall difficulty of the experimental task. Although we assume that task difficulty is constant, a higher overall task difficulty means that correct decisions require more observations. This introduces a trade-off between the time spent on a decision and the probability of earning a reward, which should be reflected in the shape of the RR optimal decision criterion.
The third factor we will consider is uncertainty about the structure of the decision environment. Uncertainty may concern several aspects of the experimental task, such as the rate parameters of the target and distractor stimulus, response deadlines, or the sampling costs the decision-maker has accrued at a given point in time. However, many sources of uncertainty can be controlled experimentally. Uncertainty about response deadlines and sampling costs, for instance, can be eliminated by explicitly displaying the remaining time and the accrued sampling costs. We will therefore focus on the effect uncertainty about the rate parameters of the target and distractor stimulus has on the shape of the RR optimal decision criterion.
Formal definition of reward rate
Reward rate can be generally defined as (Drugowitsch et al., 2012):
$$ RR = \frac{\langle R \rangle - \langle C(T_{d}) \rangle}{\langle T_{t} \rangle + \langle t_{i} \rangle + \langle t_{p} \rangle}, $$
(3)
where 〈⋅〉 indicates the average over choices, decision times, and values of ti and tp. 〈R〉 is the average reward for the final decision. 〈C(Td)〉 denotes the average total sampling costs at decision time Td. These are the costs a decision-maker incurs by postponing the final decision by at least one time step to observe an additional sensory event. The sampling costs at each point during the decision process are given by the cost function c(t) and a decision-maker who gives a final decision after Td time steps will have to pay total sampling costs \(C(T_{d}) = {\sum }_{t=1}^{T_{d}} c(t)\).
The quantities in the denominator in Eq. 3 represent the effect of temporal discounting; rewards and sampling costs affect RR less strongly as they are accumulated over a longer period of time. 〈Tt〉 is the expected total duration of each trial, 〈ti〉 is the average inter-trial interval and 〈tp〉 is the average punishment delay imposed for incorrect responses. Note that this formulation of RR differentiates between the decision time Td and the total trial duration Tt; although the decision-maker’s accumulated sampling costs depend on Td, the trial might continue without further sampling costs for an additional period of time Tt − Td after the decision-maker has indicated a final decision.
In this general form, RR depends on a number of factors that complicate the derivation of the optimal decision criterion and are not an essential part of expanded judgment tasks. We will therefore introduce some simplifying assumptions that make the formulation more amenable to our theoretical analysis. First, we will assume that all trials have the same length Tt, independent of the decision-maker’s decision time Td, and that the inter-trial interval ti is fixed. Second, we will assume that there is no punishment delay tp associated with incorrect responses. With these simplifications in place, the denominator in Eq. 3 becomes a constant and decision-makers can maximize RR by maximizing the expected net rewards in the numerator.
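Under these simplifying assumptions, the maximization target becomes transparent. A minimal sketch of Eq. 3 in this reduced form (the numerical values below are purely illustrative, not taken from the task):

```python
def reward_rate(avg_reward, avg_cost, t_trial, t_iti, t_penalty=0.0):
    """Reward rate as in Eq. 3. With a fixed trial length, a fixed
    inter-trial interval, and no punishment delay, the denominator is
    constant, so maximizing RR reduces to maximizing the expected net
    reward avg_reward - avg_cost in the numerator."""
    return (avg_reward - avg_cost) / (t_trial + t_iti + t_penalty)

# Illustrative values: reward 1000, total sampling costs 500,
# trial length 30 time steps, inter-trial interval 10 time steps.
rr = reward_rate(avg_reward=1000.0, avg_cost=500.0, t_trial=30.0, t_iti=10.0)
```

Because the denominator is the same for every strategy under these assumptions, comparing strategies by RR is equivalent to comparing them by expected net reward.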
Given the sequential sampling model and the structure of the experimental task with parameters 𝜃T and 𝜃D and a cost function c(t), the optimal decision criterion can now be derived using dynamic programming techniques (Bellman, 2003; Rapoport & Burkheimer, 1971). In what follows, we will first show how the RR optimal decision criterion is affected by sampling costs and task difficulty under the sequential sampling model outlined above, where it is assumed that the decision-maker has complete knowledge of the stochastic structure of the decision environment. We will subsequently modify our sequential sampling model to include uncertainty about the stochastic structure and show how this uncertainty affects the RR optimal decision criterion. Moreover, we will compare the RR optimal dynamic decision criterion to the best static decision criterion, which yields the highest RR among all possible static decision criteria.
Influence of sampling costs
We consider two different reward schemes and the optimal decision criteria they imply. Both reward schemes have in common that the decision-maker receives a constant reward of 1000 points for correct decisions and a constant penalty of -500 points for incorrect decisions. In addition, the decision-maker incurs sampling costs every time the final decision is postponed by one time step. Under the first reward scheme, additional observations become more expensive as time passes, that is, sampling costs increase. Under the second reward scheme, additional observations become cheaper as time passes, that is, sampling costs decrease. We implement these two reward schemes using a logistic cost function that we parameterize so that, over the course of 30 observations, the total sampling costs accrue to 500 points. Together with the fixed rewards and penalties for correct and incorrect decisions, this choice of the cost function implies that, after 30 observations, the expected reward for guessing is 0 points. We will furthermore assume that the decision-maker has to commit to a final decision after 30 time steps and not deciding will result in a penalty of -1000 points (i.e., the total sampling costs for 30 time steps plus the penalty for an incorrect response). For the increasing costs case the cost function is:
$$ c(t)=\frac{74.92217}{1+e^{3-(t/10)}} $$
(4)
and the function for the decreasing costs case is obtained by replacing t by 31 − t, which means that the function is traversed in the opposite direction. Our choice of the logistic cost function was based on a specific experimental setup in which the accumulated sampling costs were displayed in real time. Sampling costs had to grow non-linearly for decision-makers to be able to clearly identify changes in the speed at which sampling costs grew at the start of a trial compared to the end of a trial. Nevertheless, as the argument below shows, a large class of monotonically increasing or decreasing cost functions will lead to qualitatively similar results.
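As a quick numerical check on this parameterization (a sketch; the constant is taken from Eq. 4), both cost schemes accrue the same total of approximately 500 points over 30 observations, since the decreasing variant is simply the increasing variant traversed in reverse:

```python
import math

def c_increasing(t):
    """Logistic cost function of Eq. 4 (increasing sampling costs)."""
    return 74.92217 / (1 + math.exp(3 - t / 10))

def c_decreasing(t):
    """Decreasing-costs variant: the same function with t replaced by 31 - t."""
    return c_increasing(31 - t)

# Total sampling costs over 30 observations; both sums cover the same
# 30 function values, only in opposite order, and accrue to about 500 points.
total_inc = sum(c_increasing(t) for t in range(1, 31))
total_dec = sum(c_decreasing(t) for t in range(1, 31))
```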
We will focus on an intuitive account of the different effects of the two cost functions on the optimal dynamic decision criterion here. A formal description of the dynamic programming techniques used to derive the optimal decision criterion is presented elsewhere (e.g., DeGroot, 1969; Rapoport & Burkheimer, 1971). Figure 1 shows the cost function (top panels) and optimal dynamic decision criteria (solid lines, bottom panels) for the increasing cost case (left) and the decreasing cost case (right) with 𝜃T = 0.38 and 𝜃D = 0.24. The jagged appearance of the decision criteria is due to the discrete time steps and evidence units in the experimental task considered here. Decision-makers update their posterior beliefs after each new observation, and observations are presented at fixed time intervals. Moreover, because the number of possible observations at any time is finite, the posterior belief is updated in discrete steps.
As can be seen in the bottom left panel of Fig. 1, increasing sampling costs lead to a dynamic decision criterion that collapses quickly toward 0.5 as time passes. This result can be intuitively understood in terms of a trade-off between the chances of making a correct decision and the mounting costs of waiting. Assuming that the left stimulus is indeed the target (i.e., \(\mathcal {H}_{l}\) is true), as the decision-maker waits longer to make a final decision, the posterior probability for \(\mathcal {H}_{l}\) will slowly increase. Therefore, the expected reward, which is 1000 ⋅ π(t) − 500 ⋅ (1 − π(t)), will also slowly increase. However, at the same time the total sampling costs increase at an ever higher rate, thus increasingly offsetting the small gains in expected reward as time passes. Consequently, the decision-maker stands to gain less and less from a correct decision but risks losing more and more for an incorrect decision, and should therefore become increasingly willing to risk an incorrect decision while it is still relatively cheap.
Decreasing sampling costs, on the other hand, lead to a non-monotonic dynamic decision criterion that increases as time passes but eventually collapses toward 0.5 at the decision deadline (see bottom right panel of Fig. 1). This result can again be understood in terms of a trade-off between the chances of making a correct decision and the costs of waiting. As the decision-maker gathers more observations from the stimuli, the posterior probability for \(\mathcal {H}_{l}\) increases, and so does the expected reward. Although the total sampling costs also increase, they do so at a decreasing rate. Consequently, the increase in expected reward increasingly dominates the trade-off, and the decision-maker should become increasingly willing to pay the small additional sampling cost of waiting for another time step, as doing so sacrifices little while securing further gains in expected reward.
The solid gray lines in the bottom panels of Fig. 1 show the RR optimal static decision criteria for the two reward schemes. As can be seen, the best static decision criterion in the increasing costs case (left panel) intersects the optimal dynamic decision criterion after about half of the available time for the decision process and subsequently stays above the optimal criterion. This might suggest that a static decision criterion leads the decision-maker to wait too long before committing to a final decision, thus losing expected rewards due to mounting sampling costs. In the decreasing costs case (right panel), the best static decision criterion intersects the optimal criterion repeatedly and the differences between the two criteria appear to be relatively small except at the time of the decision deadline when the optimal criterion collapses to 0.5. In this case, the decision-maker will tend to commit to a final decision at similar times and evidence values under the static decision criterion as under the optimal criterion. The decision performance before the deadline might, therefore, be expected to be near-optimal even under a static decision criterion. However, for both reward schemes the best static decision criterion remains at a high value at the time of the decision deadline and will therefore incur a certain loss if the posterior probability has not reached the decision criterion. The optimal dynamic decision criteria, on the other hand, collapse towards 0.5 before the decision deadline, which avoids certain loss due to the penalty for a late response. These qualitative considerations suggest that differences between static and dynamic decision criteria might only result in markedly different RRs in the case of increasing sampling costs or if task difficulty is very high.
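The backward-induction computation that produces such criteria can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the authors' implementation: it tracks the discrete net-evidence state (left-only minus right-only flashes), uses the increasing-costs function of Eq. 4, the rewards of 1000 and −500 points, and 𝜃T = 0.38, 𝜃D = 0.24, forces a decision at the deadline, and reads the criterion off as the lowest belief at which stopping is at least as good as continuing:

```python
import math

THETA_T, THETA_D = 0.38, 0.24
P = THETA_T * (1 - THETA_D)        # prob. that only the target flashes
Q = THETA_D * (1 - THETA_T)        # prob. that only the distractor flashes
T_MAX = 30
REWARD, PENALTY = 1000.0, -500.0

def cost(t):
    """Increasing-costs scheme of Eq. 4."""
    return 74.92217 / (1 + math.exp(3 - t / 10))

def belief(d):
    """Posterior for H_l after net evidence d, starting from a flat prior."""
    lr = (P / Q) ** d              # each discriminating event multiplies odds by P/Q
    return lr / (1 + lr)

def stop_value(d):
    """Expected reward for committing to the currently more likely stimulus."""
    pi = max(belief(d), 1 - belief(d))
    return REWARD * pi + PENALTY * (1 - pi)

# Backward induction over time t and net evidence d.
V = {T_MAX: {d: stop_value(d) for d in range(-T_MAX, T_MAX + 1)}}
criterion = {T_MAX: 0.5}           # a decision is forced at the deadline
for t in range(T_MAX - 1, -1, -1):
    V[t], crit = {}, 1.0           # crit = 1.0 means no reachable belief warrants stopping
    for d in range(-t, t + 1):
        pi = belief(d)
        p_up = pi * P + (1 - pi) * Q   # predictive prob. of a left-only flash
        p_dn = pi * Q + (1 - pi) * P   # predictive prob. of a right-only flash
        cont = -cost(t + 1) + (p_up * V[t + 1][d + 1]
                               + p_dn * V[t + 1][d - 1]
                               + (1 - p_up - p_dn) * V[t + 1][d])
        stop = stop_value(d)
        V[t][d] = max(stop, cont)
        if stop >= cont and pi >= 0.5:
            crit = min(crit, pi)   # lowest belief at which stopping is optimal
    criterion[t] = crit
```

Note that this sketch swaps the continuous posterior grid of a full implementation for the beliefs reachable on the discrete evidence lattice; the qualitative collapse of the criterion under increasing costs is preserved.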
Influence of task difficulty
As described above, our decision environment is characterized by the two rate parameters 𝜃T and 𝜃D that determine the likelihood functions λi(x) under the two competing hypotheses. The decision-maker uses the observed stimulus events to update the belief π(t) about which stimulus is the target. However, not all stimulus events provide discriminating information; if either both stimuli flash or neither flashes, the posterior probability remains unchanged. Discriminating information in favor of the correct hypothesis is observed if only the target stimulus flashes but the distractor stimulus does not flash. This occurs with probability p = 𝜃T(1 − 𝜃D). Discriminating information against the correct hypothesis is observed if only the distractor stimulus flashes but not the target stimulus, which occurs with probability q = 𝜃D(1 − 𝜃T). Hence, task difficulty can be conceptualized as the difference between the probability p of observing veridical information and the probability q of observing misleading information. We will use this conceptualization in terms of the probabilities p and q to investigate the influence of task difficulty on the shape of the RR optimal decision criterion.
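For the example rates used above (𝜃T = 0.38, 𝜃D = 0.24), these probabilities are easily computed; a small sketch:

```python
theta_t, theta_d = 0.38, 0.24
p = theta_t * (1 - theta_d)   # only the target flashes: veridical information
q = theta_d * (1 - theta_t)   # only the distractor flashes: misleading information
difficulty_gap = p - q        # a smaller gap means a harder task
```

Here p = 0.2888 and q = 0.1488, so a discriminating event occurs on fewer than half of all observations, and roughly a third of the discriminating events point the wrong way.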
Panel a of Fig. 2 shows the parameter space for our decision environment. The gray shaded area represents the set of possible values of p and q. To investigate the influence of task difficulty on the shape of the RR optimal decision criterion, we sampled 201 pairs of values (p,q) and computed the optimal decision criterion for the reward schemes with increasing and decreasing sampling costs. As the qualitative patterns only depend on the difference between p and q but not on the specific values of the two parameters, we only discuss the results for a fixed value of q and a representative set of four values of p, which are indicated by red circles in panel a. We will return to the full set of 201 (p,q) pairs when we compare the expected RR under the optimal decision criterion to the expected RR under a suboptimal static decision criterion below.
Panel b of Fig. 2 shows the RR optimal decision criterion for different task difficulties. The dotted lines show the optimal criterion if the decision-maker knows the rate parameters 𝜃T and 𝜃D exactly. The top row shows the results for the case of increasing sampling costs, with task difficulty decreasing from left to right. As can be seen, in line with the results established in the previous section, increasing sampling costs induce a monotonically decreasing optimal decision criterion, irrespective of the task difficulty. However, the overall height of the decision criterion is lower for higher task difficulties. This result can be understood intuitively as follows. Because discriminating information (i.e., only one of the stimuli flashes) is observed less frequently in a high difficulty task, decision-makers need to acquire more observations to reach a given value of π(t). At the same time, expected RR decreases as average decision times become longer. Hence, to maintain an acceptable balance between average decision times and expected rewards, decision-makers need to sacrifice some of the expected rewards by accepting a lower value of π(t) at decision commitment. In the most extreme case, this might result in a premature collapse of the decision criterion to 0.5, as shown in the leftmost panel. In this case of an extremely high task difficulty, at some point during the decision process the decision-maker stands to gain fewer rewards from an additional observation than is incurred in sampling costs by postponing the final decision by one step, so the optimal strategy is to guess.
The bottom row of panel b shows the results for the case of decreasing sampling costs. In line with the results established in the previous section, decreasing sampling costs induce a non-monotonic decision criterion that increases at first but eventually collapses to 0.5 at the time of the decision deadline, irrespective of the task difficulty. Similar to the case of increasing sampling costs, a higher task difficulty results in a lower overall setting of the RR optimal decision criterion. In the case of an extremely high task difficulty, shown in the leftmost panel, the decision criterion might have an initial value of 0.5 so that the optimal strategy is to guess immediately. Due to the high task difficulty, even modest gains in expected rewards require several observations. At the same time, sampling costs are high initially and might therefore outweigh these modest gains in expected rewards.
Influence of uncertainty about rate parameters
We account for a decision-maker’s uncertainty about the rate parameters by replacing the exact rates 𝜃T and 𝜃D in our sequential sampling model by probability distributions. In particular, we describe the uncertainty about the target and distractor rate by beta distributions with parameters αT and βT, and αD and βD, respectively:
$$ \theta_{T} \sim B(\alpha_{T}, \beta_{T}), \quad \theta_{D} \sim B(\alpha_{D}, \beta_{D}). $$
(5)
The beta distribution is the natural choice for expressing uncertainty about a binomial rate. If αi = βi = 1, i ∈{T,D}, the beta distribution is uniform over [0,1], which means that all values for the rates are equally likely. If the distribution parameters are set to positive integer values, the resulting distribution is the posterior distribution a decision-maker obtains from a uniform prior distribution after a total of αi + βi observations, of which αi were positive events (i.e., stimulus i flashed αi times) and βi were negative events (i.e., stimulus i did not flash during the remaining βi observations). We will model different levels of uncertainty about the rate parameters of the experimental task by setting αi = 𝜃iK and βi = (1 − 𝜃i)K, where K is a positive integer. The resulting distribution has its mean at the true rate (and its mode approximately there) and has its mass more concentrated around the mean for larger values of K; this distribution can be interpreted as the posterior distribution obtained from K observations of stimulus i. We will symbolically write K = ∞ for the case where the rate parameters are known exactly.
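This parameterization can be illustrated directly. A small sketch with assumed values 𝜃 = 0.38 and K = 100 (the variance identity follows from the standard beta-distribution moments):

```python
theta, K = 0.38, 100                      # true rate and prior "precision"
alpha, beta = theta * K, (1 - theta) * K  # alpha_i = theta_i * K, beta_i = (1 - theta_i) * K
mean = alpha / (alpha + beta)             # equals theta for every K
mode = (alpha - 1) / (alpha + beta - 2)   # approaches theta as K grows
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
# var equals theta * (1 - theta) / (K + 1): the prior mass concentrates
# around the true rate as K (the notional number of observations) grows.
```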
Due to uncertainty about the rate parameters, the likelihood of any particular type of observation for the target and the distractor stimulus depends on the plausibility of different values of 𝜃T and 𝜃D. The decision-maker can account for this uncertainty by marginalizing over all possible values for the rate parameters. The updating rule for the decision-maker’s belief about \(\mathcal {H}_{l}\), the probability of the left stimulus being the target, now is:
$$ \pi(t) = \frac{\pi(t - 1){{\int}_{0}^{1}}{{\int}_{0}^{1}} f_{\alpha_{T},\beta_{T}}(u)f_{\alpha_{D},\beta_{D}}(v)\lambda_{l}(x_{t},u,v)\mathrm{d}u\mathrm{d}v} {\left( \begin{array}{l} \!\pi(t - 1){{\int}_{0}^{1}}{{\int}_{0}^{1}} f_{\alpha_{T},\beta_{T}}(u)f_{\alpha_{D},\beta_{D}}(v)\lambda_{l}(x_{t},u,v)\mathrm{d}u\mathrm{d}v\\ ~~ +(1 - \pi(t - 1)){{\int}_{0}^{1}}{{\int}_{0}^{1}} f_{\alpha_{T},\beta_{T}}(u)f_{\alpha_{D},\beta_{D}}(v)\lambda_{r}(x_{t}, u, v)\mathrm{d}u\mathrm{d}v \end{array} \right)}. $$
(6)
Here \(f_{\alpha _{j},\beta _{j}}\), j ∈{T,D}, denotes the probability density function of the beta distribution, and λi(xt,u,v) is the likelihood of xt under \(\mathcal {H}_{i}\) given that 𝜃T = u and 𝜃D = v. A consequence of the marginalization over 𝜃T and 𝜃D is that observations that do not directly discriminate between the target and distractor stimulus nevertheless change the decision-maker’s posterior belief π(t). If a positive event is observed for both stimuli (i.e., both stimuli flash), for instance, the decision-maker updates the beliefs about the two rate parameters, shifting the mass of the two beta distributions to higher values. This, in turn, may give higher or lower a posteriori plausibility to \(\mathcal {H}_{l}\), depending on the decision-maker’s prior beliefs about 𝜃T and 𝜃D.
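Eq. 6 can be approximated numerically with a simple midpoint-rule grid over the two rate priors. The sketch below is an illustration under assumed parameter values (𝜃T = 0.38, 𝜃D = 0.24, K = 100), not the authors' implementation; with independent beta priors, a single left-only flash moves an initially flat belief to 𝜃T(1 − 𝜃D)/(𝜃T(1 − 𝜃D) + 𝜃D(1 − 𝜃T)) ≈ 0.66, matching the known-rates update evaluated at the prior means:

```python
import math

THETA_T, THETA_D, K = 0.38, 0.24, 100
A_T, B_T = THETA_T * K, (1 - THETA_T) * K     # beta parameters for the target rate
A_D, B_D = THETA_D * K, (1 - THETA_D) * K     # beta parameters for the distractor rate

def beta_pdf(x, a, b):
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def marginal_likelihood(x, left_is_target, n=200):
    """Midpoint-rule approximation of the double integral in Eq. 6."""
    # Under H_r the left stimulus is the distractor, so swap the observation.
    xl, xr = x if left_is_target else (x[1], x[0])
    grid = [(i + 0.5) / n for i in range(n)]
    f_t = [beta_pdf(u, A_T, B_T) for u in grid]   # prior density of the target rate
    f_d = [beta_pdf(v, A_D, B_D) for v in grid]   # prior density of the distractor rate
    total = 0.0
    for fu, u in zip(f_t, grid):
        lt = u if xl else 1 - u                   # target-stimulus likelihood term
        for fv, v in zip(f_d, grid):
            ld = v if xr else 1 - v               # distractor-stimulus likelihood term
            total += fu * fv * lt * ld
    return total / (n * n)

def update(pi, x):
    """One belief-updating step of Eq. 6; x = (left flash, right flash)."""
    num = pi * marginal_likelihood(x, True)
    return num / (num + (1 - pi) * marginal_likelihood(x, False))
```

In this single-step sketch with independent priors, a both-flash observation leaves π unchanged by symmetry; the effect of non-discriminating observations described above arises once the beliefs about the rate parameters themselves are updated across observations.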
We investigated the influence of uncertainty about the rate parameters on the RR optimal decision criterion for the 201 (p,q) pairs shown in panel a of Fig. 2. Panel b shows the comparison of the RR optimal decision criterion for different levels of uncertainty for a fixed value of q and four representative values of p. Dotted lines show the optimal decision criterion when K = ∞, that is, when the rate parameters are known exactly, and solid lines show the optimal criterion for different levels of uncertainty, orange for K = 10000, light blue for K = 1000, and dark blue for K = 100. As can be seen, for lower levels of uncertainty the optimal criterion quickly approaches the criterion when rates are known exactly, and for K = 1000 and K = 10000 the optimal criterion is visually indistinguishable from the optimal criterion when rates are known exactly. Moreover, the qualitative patterns, even under high uncertainty (i.e., K = 100), match the qualitative patterns described in the preceding section for the case where rates are known exactly. We will illustrate the effect uncertainty has on the shape of the RR optimal decision criterion in more detail for the case 𝜃T = 0.38 and 𝜃D = 0.24, and discuss how the RR optimal dynamic decision criterion compares to the best static decision criterion.
Figure 3 shows how the RR optimal decision criterion changes as uncertainty about the rate parameters increases. The top row of plots shows the prior distributions on 𝜃T and 𝜃D for different values of K. For K = ∞ the prior distributions are point masses at the true values of the rate parameters. As K decreases from left to right, the overlap between the prior distributions for the two rate parameters increases, which means that the two hypotheses, \(\mathcal {H}_{l}\) and \(\mathcal {H}_{r}\), assign similar likelihood to different types of observations, and are thus harder to discriminate.
The middle row of plots shows the optimal dynamic and static decision criteria for the case of increasing sampling costs. The RR optimal dynamic decision criterion, shown as solid black lines, has the same shape for different levels of uncertainty but collapses more quickly as uncertainty increases from left to right. This quicker collapse is due to the increasing overlap between the prior distributions for 𝜃T and 𝜃D with increasing uncertainty, which causes discriminating information between the two hypotheses to accumulate more slowly. To compensate for this increase in expected decision time and accompanying higher cumulative sampling costs, the decision-maker needs to accept a higher probability of an incorrect decision. The best static decision criterion, shown as solid gray lines, is also set to lower values as uncertainty increases from left to right. Compared to the optimal dynamic decision criterion, the best static decision criterion is set to considerably lower values for the largest part of the decision process but has the same height as the optimal dynamic criterion close to the decision deadline. This might suggest that the best static criterion should result in a larger number of incorrect decisions than the optimal dynamic criterion, and might therefore yield a lower expected reward rate. We will revisit this prediction in the next section.
Finally, the bottom row of plots in Fig. 3 shows the case of decreasing sampling costs. The RR optimal dynamic decision criterion, shown as solid black lines, has a qualitatively similar shape across levels of uncertainty, although it is somewhat less smooth if uncertainty is very high (i.e., K = 100). Similar to the results for the case of increasing sampling costs, the optimal dynamic decision criterion is set to overall lower values if uncertainty is high, which is again due to the slower accumulation of discriminating information. The best static decision criterion, shown as solid gray lines, is also set to lower values as uncertainty increases from left to right. Similar to the case of increasing sampling costs, the best static decision criterion is set to overall lower values than the optimal dynamic decision criterion. In the next section we will investigate how the expected RR compares under the optimal dynamic and under the best static decision criterion, and how this relationship depends on sampling costs, task difficulty, and uncertainty about the rate parameters. As the qualitative results in this section were similar for low levels of uncertainty, we will only consider the cases K = 1000 and K = 100.
To summarize, in the preceding sections we investigated the influence of sampling costs, task difficulty, and uncertainty on the shape of the RR optimal decision criterion and on the setting of the best static decision criterion. The results of this analysis show that, first, the main determinant of the shape of the RR optimal decision criterion (i.e., collapsing or expanding) is the structure of the sampling costs, and, second, task difficulty and uncertainty determine the overall height of the decision criterion but have only a negligible effect on the shape of the decision criterion. Moreover, the best static decision criterion is generally set to a lower value than the optimal dynamic decision criterion, and higher task difficulty and uncertainty result in a lower value of the best static decision criterion.