Paradoxical evidence weighting in confidence judgments for detection and discrimination

Mazor, Matan; Maimon-Mor, Roni O.; Charles, Lucie; Fleming, Stephen M.

doi:10.3758/s13414-023-02710-8

Open access
Published: 20 June 2023

Volume 85, pages 2356–2385, (2023)

Attention, Perception, & Psychophysics

Abstract

When making discrimination decisions between two stimulus categories, subjective confidence judgments are more positively affected by evidence in support of a decision than negatively affected by evidence against it. Recent theoretical proposals suggest that this “positive evidence bias” may be due to observers adopting a detection-like strategy when rating their confidence—one that has functional benefits for metacognition in real-world settings where detectability and discriminability often go hand in hand. However, it is unknown whether, or how, this evidence-weighting asymmetry affects detection decisions about the presence or absence of a stimulus. In four experiments, we first successfully replicate a positive evidence bias in discrimination confidence. We then show that detection decisions and confidence ratings paradoxically suffer from an opposite “negative evidence bias” to negatively weigh evidence even when it is optimal to assign it a positive weight. We show that the two effects are uncorrelated and discuss our findings in relation to models that account for a positive evidence bias as emerging from a confidence-specific heuristic, and alternative models where decision and confidence are generated by the same, Bayes-rational process.

When considering two alternative hypotheses, the probability of the chosen hypothesis being correct is a function of the availability of evidence supporting not only the chosen hypothesis but also the unchosen one. For example, when deciding that there are more ants in the kitchen than in the living room, confidence should not only positively weigh the number of ants found in the kitchen (positive evidence) but also negatively weigh the number of ants found in the living room (negative evidence). Specifically, a decision should be based on the difference in the number of ants between the kitchen and the living room, but not on the total number of ants found in both rooms together (we refer to these quantities as relative evidence and sum evidence, respectively).

While sum evidence is irrelevant to discrimination decisions between two symmetrical hypotheses (e.g., kitchen or living room), it is highly informative with respect to detection decisions about the presence or absence of a signal. For example, when deciding that an ant colony is nesting in the house, we should also care about the total number of ants, irrespective of whether they are found in the kitchen or living room (see Fig. 1).

A surprising finding is that, despite the irrelevance of sum evidence to the accuracy of discrimination decisions, people are systematically more confident in their perceptual discrimination decisions when sum evidence is high. For example, Zylberberg et al. (2012) had subjects judge which of two flickering stimuli was brighter on average. Subjects were more confident in their decisions when both stimuli were brighter, indicating an effect of sum evidence (here, overall luminance) on decision confidence. A positive effect of sum evidence on decision confidence is mathematically equivalent to a disproportionate weighting of positive evidence over negative evidence, also known as a positive evidence bias (Koizumi et al., 2015; Peters et al., 2017; Rollwage et al., 2020; Samaha & Denison, 2020; Sepulveda et al., 2020; Zylberberg et al., 2012). The two are equivalent because positively weighing the sum of positive and negative evidence effectively weakens the negative contribution of negative evidence to decision confidence, while strengthening the contribution of positive evidence. Notably, this finding stands in contrast to what is expected from the linear scaling of sensory noise relative to stimulus energy (Weber’s law). Instead, an effect of sum evidence on discrimination confidence may indicate a profound link between how confidence is formed in general, and the processes underpinning perceptual detection (Rausch et al., 2018; Samaha et al., 2020).
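
The equivalence can be made explicit with a short worked equation. As an illustrative assumption (not a model from this article), suppose confidence $C$ depends linearly on relative and sum evidence with hypothetical weights ${\beta}_{rel}$ and ${\beta}_{sum}$, where ${E}_{+}$ and ${E}_{-}$ denote positive and negative evidence:

$$C={\beta}_{rel}\left({E}_{+}-{E}_{-}\right)+{\beta}_{sum}\left({E}_{+}+{E}_{-}\right)=\left({\beta}_{rel}+{\beta}_{sum}\right){E}_{+}-\left({\beta}_{rel}-{\beta}_{sum}\right){E}_{-}$$

Under this assumption, any ${\beta}_{sum}>0$ makes the weight on positive evidence, ${\beta}_{rel}+{\beta}_{sum}$, larger in magnitude than the weight on negative evidence, ${\beta}_{rel}-{\beta}_{sum}$, which is the asymmetry described above.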

Different models identify the origin of this evidence-weighting asymmetry at different levels of the cognitive hierarchy, ranging from positing a metacognitive bias that ignores conflicting information (metacognitive level; Maniscalco et al., 2016; Peters et al., 2017), to asymmetries in the active sampling of evidence (attention allocation level; Sepulveda et al., 2020), and down to perceptual asymmetries between the representations of signal and noise (perception level; Miyoshi & Lau, 2020; Webb et al., 2021). These models vary in whether they postulate separate evidence accumulation processes for decisions and confidence judgments, and in whether they model confidence formation as following a suboptimal heuristic, or alternatively as being optimal with respect to available information (information which may be limited or corrupted by noise).

Here we focus on a subset of models which assume that subjects are rational decision makers equipped with veridical beliefs about the world, but who only have limited access to noisy evidence. Our models further assume that subjects’ confidence ratings are Bayesian estimates of the probability of being correct, given the exact same evidence that was used to make the decision. The models do not postulate any metacognitive biases, heuristics, or suboptimalities. We show that two of these models reproduce a positive evidence bias (that is, a positive effect of sum evidence) in discrimination confidence. The same models also make predictions for evidence weighting in detection judgments and confidence ratings. In four experiments, reverse correlation analysis revealed evidence weighting patterns that only partly agree with the predictions of our models. Most notably, our four models fail to account for a negative evidence bias we observed in detection decisions and confidence: a tendency to irrationally place a negative weighting on evidence, such as being more confident in the presence of a bright stimulus when one of the presented stimuli was unusually dark. In what follows we first describe the four models and the predictions they make, before turning to empirical findings from our four experiments.

Computational models

We model a setting in which agents are presented with a sequence of samples from two noisy sensory channels: E₁ and E₂. The agents’ task is to decide which of the two channels was the signal channel (discrimination), or whether any of the channels had signal in it at all (detection). When a signal is present in a channel, evidence E is sampled from a normal distribution $\mathcal{N}\left(0.5,1\right)$, and when a signal is absent evidence is sampled from $\mathcal{N}\left(0,1\right)$ (see Fig. 2, upper panel). In all four models agents only have access to a noisy version of these samples E^′, corrupted by additional internal sensory noise. After each time step, they update their belief about the relative likelihood of the observed samples under the two possible world states (signal in Channel 1 versus 2, or signal presence versus absence), and given full knowledge of the true sample-generating process, including the properties of sensory noise. Each trial comprises 12 time steps. At the end of a trial, agents report the world state that maximizes the likelihood of the observed evidence, and rate their confidence as the objective probability that their decision was correct given the accumulated likelihood estimates. The four models vary in the properties of sensory noise, and in the selection of some channels for inspection by selection mechanisms.

Vanilla model

In the basic, vanilla model, sensory noise is sampled from a normal distribution $\mathcal{N}\left(0,2\right)$. This model corresponds to a standard equal-variance signal detection model, as illustrated in Fig. 1.
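
The following is a minimal sketch of a Bayes-rational agent of this kind, not the authors' simulation code. It assumes that the "2" in $\mathcal{N}\left(0,2\right)$ is a standard deviation, that perceived samples are therefore Gaussian with variance equal to the sum of stimulus and sensory variances, and that in detection the agent places equal prior probability on the signal being in either channel. All function and constant names are hypothetical.

```python
import math
import random

MU = 0.5          # signal mean, from the text
STIM_SD = 1.0     # SD of external evidence, from the text
NOISE_SD = 2.0    # ASSUMPTION: the "2" in N(0, 2) is read as an SD
TOTAL_VAR = STIM_SD ** 2 + NOISE_SD ** 2  # variance of a perceived sample


def perceive(samples, rng):
    """Corrupt external evidence E with internal Gaussian sensory noise."""
    return [e + rng.gauss(0.0, NOISE_SD) for e in samples]


def channel_llr(perceived):
    """Log likelihood ratio of 'signal' versus 'noise' for one channel.

    For Gaussians with equal variance, each sample contributes
    (MU * e - MU**2 / 2) / TOTAL_VAR.
    """
    return sum((MU * e - MU ** 2 / 2) / TOTAL_VAR for e in perceived)


def discriminate(ch1, ch2):
    """Return (choice, confidence), where choice is channel 1 or 2."""
    llr = channel_llr(ch1) - channel_llr(ch2)
    p_ch1 = 1.0 / (1.0 + math.exp(-llr))  # posterior under an equal prior
    return (1, p_ch1) if llr >= 0 else (2, 1.0 - p_ch1)


def detect(ch1, ch2):
    """Return (response, confidence), where response is 'yes' or 'no'."""
    l1, l2 = channel_llr(ch1), channel_llr(ch2)
    # log of 0.5 * exp(l1) + 0.5 * exp(l2), computed stably
    m = max(l1, l2)
    log_lr = m + math.log(0.5 * math.exp(l1 - m) + 0.5 * math.exp(l2 - m))
    p_present = 1.0 / (1.0 + math.exp(-log_lr))
    return ("yes", p_present) if log_lr >= 0 else ("no", 1.0 - p_present)


# Usage: one discrimination trial with the signal in channel 1.
rng = random.Random(0)
ch1 = perceive([rng.gauss(MU, STIM_SD) for _ in range(12)], rng)
ch2 = perceive([rng.gauss(0.0, STIM_SD) for _ in range(12)], rng)
choice, confidence = discriminate(ch1, ch2)
```

In this sketch the discrimination decision depends only on the difference between the two channel likelihood ratios, whereas the detection decision depends on both ratios entering with the same sign.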

Firing rate model

The firing rate model is similar to the vanilla model, with the exception that perceived values are sampled from a Poisson, rather than a normal distribution. An important property of the Poisson distribution family, commonly used to model firing rates in neuronal populations, is that their mean and variance are lawfully coupled: the stronger the activation, the more variable it is. When applied to sensory neurons, this results in strong stimuli being subjectively perceived as noisier, consistent with the Weber–Fechner law (Fechner & Adler, 1860). In identifying the origin of the positive evidence bias at the perceptual level, this model shares a family resemblance with the unequal-variance model by Miyoshi and Lau (2020). An important feature of this model is that perceptual noise is conditioned not on stimulus class, but on the perceptual sample. This seems plausible, as the perceptual system has no access to stimulus class beyond the information that is available in perceptual samples.
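
The coupling between mean and variance can be illustrated with a small sampler. This sketch only demonstrates that property of the Poisson family; how stimulus energy maps onto a Poisson rate in the firing rate model is not specified in this section, so the rates used here are arbitrary and the function names are hypothetical.

```python
import math
import random


def poisson_sample(rate, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var


# Usage: stronger activation (a higher rate) yields more variable samples.
rng = random.Random(1)
for rate in (5.0, 20.0):  # arbitrary illustrative rates
    draws = [poisson_sample(rate, rng) for _ in range(5000)]
    print(rate, mean_and_variance(draws))
```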

Random attention model

Like the vanilla model, sensory noise is again sampled from $\mathcal{N}\left(0,2\right)$. Unlike the vanilla model, however, here agents have access to one channel per time point only (they ‘attend’ to one channel at a time). At the start of each trial, agents randomly choose a preferred channel. Then, on each time point, they attend to the preferred channel with probability 0.95, and the nonpreferred channel with probability 0.05, and update their beliefs accordingly. We include this model because it is inherently asymmetric: on each trial, evidence from the preferred channel contributes more to both decision and confidence, simply because it is more visible to the agent.
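
The attention schedule described here can be sketched as follows. This is an illustration under the stated probabilities, not the authors' implementation; channels are indexed 0 and 1, and the function names are hypothetical.

```python
import random

P_PREFERRED = 0.95  # probability of attending the preferred channel, from the text
N_STEPS = 12        # time steps per trial, from the text


def attention_schedule(rng, n_steps=N_STEPS):
    """Return the preferred channel and the channel attended at each step."""
    preferred = rng.choice((0, 1))
    attended = [
        preferred if rng.random() < P_PREFERRED else 1 - preferred
        for _ in range(n_steps)
    ]
    return preferred, attended


def attended_samples(ch0, ch1, attended):
    """Keep, for each time step, only the sample from the attended channel."""
    channels = (ch0, ch1)
    return [(c, channels[c][t]) for t, c in enumerate(attended)]
```

Only the (channel, sample) pairs returned by `attended_samples` would then enter the belief update, which is what makes evidence from the preferred channel more influential.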

Goal-directed attention model

This model is similar to the random attention model, except that here attention is biased towards channels that are more likely to include signal. Specifically, agents track the log likelihood ratio LLR_r between signal presence in the left or in the right channels, with the probability of attending the right channel being dynamically set at each time point to S(LLR_r) where S is a sigmoid function with a steep slope of 5 and LLR_r is based on all previous sensory samples in the trial. A conceptually similar drift diffusion model was previously shown to produce a positive evidence bias in confidence ratings (Sepulveda et al., 2020).
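
A minimal sketch of this attention rule is given below. It assumes that S is a logistic function with slope 5, that LLR_r starts at 0, and that only the attended sample updates LLR_r at each time step. The argument `sample_llr` is a hypothetical callback returning the log likelihood ratio of signal versus noise for a single perceived sample.

```python
import math
import random

SLOPE = 5.0  # sigmoid slope, from the text


def sigmoid(x, slope=SLOPE):
    """Numerically stable logistic function S(x) = 1 / (1 + exp(-slope * x))."""
    z = slope * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def goal_directed_schedule(left, right, sample_llr, rng):
    """Choose a channel at each time step with P(attend right) = S(LLR_r)."""
    llr_r = 0.0  # ASSUMPTION: no preference before the first sample
    attended = []
    for e_left, e_right in zip(left, right):
        look_right = rng.random() < sigmoid(llr_r)
        attended.append("right" if look_right else "left")
        if look_right:
            llr_r += sample_llr(e_right)  # evidence for signal on the right
        else:
            llr_r -= sample_llr(e_left)   # evidence for signal on the left
    return attended, llr_r
```

With a slope this steep, even a small imbalance in accumulated evidence drives attention almost exclusively to the currently favoured channel.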

Simulations

We simulated 20,000 discrimination and 20,000 detection trials per model (100 trials × 200 simulated agents per model). On each discrimination trial, the signal channel was designated as right or left with equal probability. On half of the detection trials both channels were noise channels. We then sampled, for each trial, 12 values from each channel. These 24 values were then passed on to the simulated agent, who returned a decision and a confidence rating. We then subjected the agents’ decisions and confidence ratings to a reverse correlation analysis. We now turn to describe this analysis, which will also be used to analyze the behaviour of human participants in Exps. 1–4.
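
Trial generation along these lines can be sketched as below. This is an illustration rather than the authors' code: it assumes that signal-present detection trials place the signal on the left or right with equal probability, and that "half" of the detection trials is implemented as a probability of 0.5 per trial. The function name is hypothetical.

```python
import random

MU = 0.5      # signal mean, from the text
N_STEPS = 12  # samples per channel, from the text


def make_trial(task, rng):
    """Return (signal_side, samples) for one simulated trial.

    signal_side is 'left', 'right', or None for a noise-only detection trial.
    """
    if task == "discrimination":
        signal_side = rng.choice(("left", "right"))
    else:
        present = rng.random() < 0.5  # ASSUMPTION: probabilistic, not exact, halves
        signal_side = rng.choice(("left", "right")) if present else None
    samples = {}
    for side in ("left", "right"):
        mean = MU if side == signal_side else 0.0
        samples[side] = [rng.gauss(mean, 1.0) for _ in range(N_STEPS)]
    return signal_side, samples


# Usage: 100 detection trials for one simulated agent.
rng = random.Random(0)
trials = [make_trial("detection", rng) for _ in range(100)]
```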

Reverse correlation analysis

Following Zylberberg et al. (2012), we took a reverse correlation approach and asked which sources of evidence (positive, negative, relative, and sum evidence) contribute to agents’ decisions and confidence ratings. This analysis focuses on random fluctuations in signal intensity, and asks how they affect behaviour (here, decisions and confidence in these decisions). Accordingly, in analyzing data from our simulated agents, we contrasted external stimulus energy (E) and not internal stimulus energy (E^′), leaving internal noise hidden.

Methodological note: Positive evidence bias in perceptual decisions

The positive evidence bias in decision confidence is often seen as particularly striking, given that positive and negative evidence are equally weighted in forming a decision (Peters et al., 2017; Zylberberg et al., 2012). For example, using reverse correlation, Zylberberg et al. (2012) showed that momentary fluctuations in the availability of perceptual evidence for and against a decision were equally predictive of the decision itself. Similarly, Peters et al. (2017) showed that in classifying rapidly presented images as ‘face’ or ‘house’, decisions are not solely guided by positive evidence (e.g., face-related brain activity when deciding ‘face’), but also by negative evidence (e.g., house-related brain activity when deciding ‘face’).

In both cases, it is useful to ask what it would look like for an agent to only consider positive evidence in making a decision. This soon becomes circular, because positive and negative evidence are defined with respect to the decision itself. For example, when analyzing the decisions of an agent that consistently ignores evidence for one alternative (similar to the random attention model above), both positive and negative evidence should still be predictive of decisions. The effect of positive evidence is then driven by those trials in which the agent selected the attended alternative, and the effect of negative evidence by those trials in which the agent selected the ignored alternative (because the evidence for the attended alternative was insufficient). Put differently, asymmetries of positive and negative evidence cannot affect the decision itself, because at the time of making the decision there is no positive and negative evidence to speak of—instead, there are two sources of evidence that may become positive or negative, depending on the decision that is selected. For this reason, in measuring evidence weighting in decision formation, we defined relative and sum evidence relative to the ground truth rather than the agents’ decision.

Discrimination decisions

From each trial (tr) we extracted random fluctuations in perceptual evidence in the signal ${E}_s^{tr}(t)$ and nonsignal ${E}_n^{tr}(t)$ sensory channels. To make sure we are measuring true random fluctuations and not systematic differences between noise and signal channels, we mean centered the signal channels across trials to 0, such that the average time course across all agents and trials was constant at 0. For simplicity, in extracting qualitative predictions from model simulations we averaged all time points in a trial to obtain trial-level estimates ${E}_s^{tr}$ and ${E}_n^{tr}$. Human data were analyzed in a similar fashion, but separately for each time point. Time-resolved decision and confidence kernels derived from model simulations are available in the Appendix.

‘Relative evidence’ was defined as the difference in noise terms between the signal and nonsignal channels (${E}_{relative}^{tr}={E}_s^{tr}-{E}_n^{tr}$). To obtain a decision kernel, we took the difference between the average relative evidence in trials where agents chose the signal and nonsignal channels ${E}_{relative}={\left\langle {E}_{relative}^{tr}\right\rangle}_{CORRECT}-{\left\langle {E}_{relative}^{tr}\right\rangle}_{INCORRECT}$. This was done separately for each simulated agent, and the resulting values were tested against zero in a t test. In all four models, relative evidence was higher on trials in which the agent correctly identified the signal channel (Fig. 3A, orange markers).

‘Sum evidence’ was defined as the total sum of noise terms across both channels (${E}_{sum}^{tr}={E}_s^{tr}+{E}_n^{tr}$). Similarly, we used the difference between sum evidence in correct and incorrect trials ${E}_{sum}={\left\langle {E}_{sum}^{tr}\right\rangle}_{CORRECT}-{\left\langle {E}_{sum}^{tr}\right\rangle}_{INCORRECT}$ to probe effects of sum evidence on decision. Sum evidence had no effect on decision in any of the four models (Fig. 3A, black markers).
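
The two decision kernels can be computed for a single agent as in the sketch below. It assumes trial-level evidence has already been mean-centered as described above; the function name and the toy data are hypothetical.

```python
from statistics import mean


def decision_kernels(trials):
    """Compute relative and sum decision kernels for one agent.

    trials: list of (e_signal, e_nonsignal, correct) tuples, where the two
    evidence values are trial-level averages and correct is a bool.
    """
    relative = {True: [], False: []}
    total = {True: [], False: []}
    for e_s, e_n, correct in trials:
        relative[bool(correct)].append(e_s - e_n)
        total[bool(correct)].append(e_s + e_n)
    return {
        "relative": mean(relative[True]) - mean(relative[False]),
        "sum": mean(total[True]) - mean(total[False]),
    }


# Usage with made-up values (two correct and two incorrect trials).
toy = [(0.4, -0.2, True), (0.2, 0.0, True), (-0.3, 0.1, False), (-0.1, 0.3, False)]
kernels = decision_kernels(toy)
```

One such pair of values per agent is what would then be tested against zero across agents.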

Discrimination confidence

In all four models, confidence was defined as the Bayesian probability of being correct, given an equal prior over the two world states (see Appendix). The median confidence rating was used to split evidence channels into four sets, according to decision (chosen or unchosen, depending on the agent’s decision) and confidence level (high or low). Confidence kernels for the chosen and unchosen channels were then extracted by subtracting the mean low-confidence from the mean high-confidence values for each channel:

$${E}_{conf- chosen}={\left\langle {E}_{chosen}^{tr}\right\rangle}_{HIGH}-{\left\langle {E}_{chosen}^{tr}\right\rangle}_{LOW}$$

$${E}_{conf- unchosen}={\left\langle {E}_{unchosen}^{tr}\right\rangle}_{HIGH}-{\left\langle {E}_{unchosen}^{tr}\right\rangle}_{LOW}$$

Confidence kernels were also extracted for relative and sum evidence:

$${E}_{conf- relative}=\left({\left\langle {E}_{chosen}^{tr}\right\rangle}_{HIGH}-{\left\langle {E}_{unchosen}^{tr}\right\rangle}_{HIGH}\right)-\left({\left\langle {E}_{chosen}^{tr}\right\rangle}_{LOW}-{\left\langle {E}_{unchosen}^{tr}\right\rangle}_{LOW}\right)$$

$${E}_{conf- sum}=\left({\left\langle {E}_{chosen}^{tr}\right\rangle}_{HIGH}+{\left\langle {E}_{unchosen}^{tr}\right\rangle}_{HIGH}\right)-\left({\left\langle {E}_{chosen}^{tr}\right\rangle}_{LOW}+{\left\langle {E}_{unchosen}^{tr}\right\rangle}_{LOW}\right)$$
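
The four confidence kernels defined above can be sketched in code as follows. The handling of ratings that equal the median (assigned here to the low-confidence set) is an assumption, and the function name and toy data are hypothetical.

```python
from statistics import mean, median


def confidence_kernels(trials):
    """Compute chosen, unchosen, relative, and sum confidence kernels.

    trials: list of (e_chosen, e_unchosen, confidence) tuples for one agent.
    """
    cutoff = median([conf for _, _, conf in trials])
    # ASSUMPTION: ratings equal to the median count as low confidence.
    high = [(c, u) for c, u, conf in trials if conf > cutoff]
    low = [(c, u) for c, u, conf in trials if conf <= cutoff]
    chosen = mean([c for c, _ in high]) - mean([c for c, _ in low])
    unchosen = mean([u for _, u in high]) - mean([u for _, u in low])
    return {
        "chosen": chosen,
        "unchosen": unchosen,
        "relative": chosen - unchosen,  # rearrangement of E_conf-relative
        "sum": chosen + unchosen,       # rearrangement of E_conf-sum
    }


# Usage with made-up values.
toy = [(0.5, -0.1, 0.9), (0.3, 0.1, 0.8), (0.1, 0.1, 0.6), (-0.1, 0.3, 0.55)]
kernels = confidence_kernels(toy)
```

Because the relative and sum kernels are the difference and the sum of the chosen and unchosen kernels, a positive sum kernel means the chosen-channel kernel is larger in magnitude than a negative unchosen-channel kernel.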

In all four models, high confidence ratings were associated with stronger evidence in the chosen channel (Fig. 3B, green markers) and weaker evidence in the unchosen channel (Fig. 3B, purple markers). As expected, this translated to an effect of relative evidence on decision confidence: agents were more confident when the evidence difference between the chosen and unchosen channels (E_{conf − relative}) was high (Fig. 3C, orange markers).

Critically, only the firing rate and goal-directed attention models produced an effect of sum evidence (E_{conf − sum}) on decision confidence, such that agents were more confident when overall evidence was high (Fig. 3C, black markers). As reviewed above, this effect is consistent with a positive evidence bias in discrimination confidence.

Detection decisions

For the reverse correlation analysis of detection decisions, we focused on trials in which a signal was present. This allowed us to disentangle the effects of evidence in the signal and nonsignal channels on detection decisions and confidence. We subtracted evidence in trials that resulted in a ‘no’ (target absent) decision from evidence in trials that resulted in a ‘yes’ (target present) decision, separately for the signal and nonsignal channels:

$${E}_{detection-s}={\left\langle {E}_s^{tr}\right\rangle}_{YES}-{\left\langle {E}_s^{tr}\right\rangle}_{NO}$$

$${E}_{detection-n}={\left\langle {E}_n^{tr}\right\rangle}_{YES}-{\left\langle {E}_n^{tr}\right\rangle}_{NO}$$

We similarly obtained detection kernels as a function of relative and sum evidence:

$${E}_{detection- relative}=\left({\left\langle {E}_s^{tr}\right\rangle}_{YES}-{\left\langle {E}_n^{tr}\right\rangle}_{YES}\right)-\left({\left\langle {E}_s^{tr}\right\rangle}_{NO}-{\left\langle {E}_n^{tr}\right\rangle}_{NO}\right)$$

$${E}_{detection- sum}=\left({\left\langle {E}_s^{tr}\right\rangle}_{YES}+{\left\langle {E}_n^{tr}\right\rangle}_{YES}\right)-\left({\left\langle {E}_s^{tr}\right\rangle}_{NO}+{\left\langle {E}_n^{tr}\right\rangle}_{NO}\right)$$
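
Rearranging the terms of the two definitions above shows that the relative and sum kernels are simply the difference and the sum of the channel-specific kernels:

$${E}_{detection- relative}={E}_{detection-s}-{E}_{detection-n}$$

$${E}_{detection- sum}={E}_{detection-s}+{E}_{detection-n}$$

It follows that ${E}_{detection-n}=\left({E}_{detection- sum}-{E}_{detection- relative}\right)/2$, so a positive weighting of evidence in the nonsignal channel corresponds to a sum kernel that exceeds the relative kernel.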

In all four models, ‘yes’ responses were associated with stronger evidence in the signal channel (Fig. 3D, blue markers). Importantly, the same was true for evidence in the nonsignal channel: Agents were more likely to respond ‘yes’ when evidence was stronger in this channel too (Fig. 3D, red markers). This is a key prediction of our Bayes-rational models: In detection, evidence in both channels should be weighted positively, as the agent’s goal is to detect any signal relative to noise. Together, these two positive effects translated to a strong effect of sum evidence on detection decisions: Agents were more likely to respond ‘yes’ when the total sum of evidence was high (Fig. 3E, black markers). A weaker effect of relative evidence on detection decisions was observed in all models except for the random attention model (Fig. 3E, orange markers).

Detection confidence

Similar to the discrimination task, the median confidence rating was used to split evidence channels into four sets, according to signal (signal channel or nonsignal channel) and confidence level (high or low). This was done separately for ‘yes’ and ‘no’ responses. Confidence kernels for the signal and nonsignal channels were then extracted by subtracting the mean low-confidence from the mean high-confidence evidence values for each channel and decision. For example, for ‘yes’ responses this meant computing:

$${E}_{conf- yes-s}={\left\langle {E}_s^{tr}\right\rangle}_{YES, HIGH}-{\left\langle {E}_s^{tr}\right\rangle}_{YES, LOW}$$

$${E}_{conf- yes-n}={\left\langle {E}_n^{tr}\right\rangle}_{YES, HIGH}-{\left\langle {E}_n^{tr}\right\rangle}_{YES, LOW}$$

$${E}_{conf- yes- relative}=\left({\left\langle {E}_s^{tr}\right\rangle}_{YES, HIGH}-{\left\langle {E}_n^{tr}\right\rangle}_{YES, HIGH}\right)-\left({\left\langle {E}_s^{tr}\right\rangle}_{YES, LOW}-{\left\langle {E}_n^{tr}\right\rangle}_{YES, LOW}\right)$$

$${E}_{conf- yes- sum}=\left({\left\langle {E}_s^{tr}\right\rangle}_{YES, HIGH}+{\left\langle {E}_n^{tr}\right\rangle}_{YES, HIGH}\right)-\left({\left\langle {E}_s^{tr}\right\rangle}_{YES, LOW}+{\left\langle {E}_n^{tr}\right\rangle}_{YES, LOW}\right)$$

In all four models, agents were more confident in their decisions about signal presence when evidence in the signal channel was stronger (Fig. 3F, blue markers). Mirroring the detection decision kernel means, confidence in signal presence was also positively affected by evidence for signal in the nonsignal channel (Fig. 3F, red markers). Together, these two positive effects produced an overall positive effect of sum evidence on confidence in signal presence (Fig. 3G, black markers). All four models predicted a weaker effect of relative evidence (Fig. 3G, orange markers).

Finally, we asked how random variability in sensory noise contributed to confidence in detection “no” responses. Here, a low number of misses made it difficult to reliably estimate confidence kernels for the firing rate model. In the remaining three models, agents were more confident in decisions about signal absence when evidence in both signal and nonsignal channels was weaker (Fig. 3H, blue and red markers, respectively). Together, these negative effects translated to a total negative effect of sum evidence on confidence in absence (Fig. 3I, black markers). None of the four models predicted a negative effect of relative evidence on confidence in absence, but the random attention model predicted a subtle positive effect (Fig. 3I, orange markers).

Equipped with qualitative predictions from four Bayes-rational models, we now turn to describing our empirical results. As we report below, these models failed to account for a key signature of human decision making: in both decisions and confidence ratings, subjects negatively weigh evidence in the nonsignal channel when inferring signal presence, as if they are making a discrimination judgment about the origin of the signal, rather than inferring signal presence.

Experiment 1

Methods

Participants

The research complied with all relevant ethical regulations and was approved by the Research Ethics Committee of University College London (UCL; study ID number 1260/003). Ten participants were recruited via UCL’s psychology subject pool, and gave their informed consent prior to their participation. Each participant performed four sessions of 600 trials each, in blocks of 100 trials. Sessions took place on different days and consisted of three discrimination blocks interleaved with three detection blocks.

Experimental procedure

The experimental procedure for Exp. 1 largely followed the procedure described in Zylberberg et al. (2012), Exp. 1. Participants observed a random-dot kinematogram for a fixed duration of 700 ms. In discrimination trials, the direction of motion was one of two opposite directions with equal probability, and participants reported the observed direction by pressing one of two arrow keys on a standard keyboard. In detection blocks, participants reported whether there was any coherent motion by pressing one of two arrow keys on a standard keyboard. In half of the detection trials, dots moved coherently in one of two opposite directions, and in the other half all dots moved randomly.

In both detection and discrimination blocks, participants indicated their confidence following each decision. Confidence was reported on a continuous scale ranging from chance to complete certainty. To avoid systematic response biases affecting confidence reports, the orientation (vertical or horizontal) and polarity (e.g., right or left) of the scale were set to agree with the Type 1 response. For example, following an up-arrow press, a vertical confidence bar was presented where ‘guess’ appeared at the center of the screen and ‘certain’ appeared at the upper end of the scale (see Fig. 4).

To control for response requirements, for five subjects, the dots moved to the right or to the left, and for the other five subjects, they moved upward or downward. The first group made discrimination judgments with the right and left keys and detection judgments with the up and down keys, and this mapping was reversed for the second group. The number of coherently moving dots (‘motion coherence’) was adjusted to maintain performance at around 70% accuracy for detection and discrimination tasks independently. This was achieved by measuring mean accuracy after every 20 trials, and adjusting coherence by a step of 3% if accuracy fell below 60% or went above 80%. We opted for a block-wise staircasing procedure in order to keep motion energy relatively stable across trials, allowing participants to optimally place their detection criterion. The staircasing procedure for both tasks started at a coherence value of 1.0.
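
The staircase rule described here can be sketched as a single update function. This is an illustration, not the experiment code: it assumes the 3% step is 0.03 on a 0 to 1 coherence scale, that coherence is raised when accuracy is too low and lowered when it is too high, and that coherence is clamped to the range 0 to 1. The function name is hypothetical.

```python
def update_coherence(coherence, recent_correct, step=0.03, low=0.60, high=0.80):
    """Return the coherence to use after a window of 20 trials.

    recent_correct: sequence of booleans, one per trial in the window.
    """
    accuracy = sum(recent_correct) / len(recent_correct)
    if accuracy < low:
        coherence += step   # task too hard: add coherently moving dots
    elif accuracy > high:
        coherence -= step   # task too easy: remove coherently moving dots
    return min(1.0, max(0.0, coherence))  # ASSUMPTION: clamp to [0, 1]


# Usage: starting value of 1.0 followed by a perfect window of 20 trials.
next_coherence = update_coherence(1.0, [True] * 20)
```

Leaving coherence unchanged anywhere between 60% and 80% accuracy is what keeps motion energy relatively stable across neighbouring trials.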

Stimuli for discrimination blocks were generated using the exact same procedure reported in Zylberberg et al. (2012).^{Footnote 1} Trials started with a presentation of a fixation cross for 1 second, immediately followed by stimulus presentation. The stimulus consisted of 152 white dots (diameter = 0.14°), presented within a 6.5° circular aperture centered on the fixation point for 700 ms (42 frames, frame rate = 60 Hz). Dots were grouped in two sets of 76 dots each. Every other frame, the dots of one set were replaced with a new set of randomly positioned dots. For each coherence value c′, a proportion c′ of the dots from the second set moved coherently in one direction by a fixed distance of 0.33°, while the remaining dots in the set moved in random directions by a fixed distance of 0.33°. On the next update, the sets were switched, to prevent participants from tracing the position of specific dots. Frame-specific coherence values were sampled for each screen update from a normal distribution centred around the coherence value c with a standard deviation of 0.07, with the constraint that c′ must be a number between 0 and 1.
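
Sampling of frame-specific coherence values can be sketched as below. The text does not say how the constraint to the range 0 to 1 was enforced; this sketch assumes out-of-range draws are redrawn (clipping would be another option). The function name and the `n_updates` argument are hypothetical.

```python
import random


def frame_coherences(c, n_updates, rng, sd=0.07):
    """Sample one coherence value per screen update around the trial value c."""
    values = []
    while len(values) < n_updates:
        draw = rng.gauss(c, sd)
        if 0.0 <= draw <= 1.0:  # ASSUMPTION: redraw values outside [0, 1]
            values.append(draw)
    return values


# Usage: a trial with mean coherence 0.3 and an illustrative 21 updates.
rng = random.Random(0)
per_update = frame_coherences(0.3, 21, rng)
```

These random fluctuations around c are the source of variability that the reverse correlation analysis exploits.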

Stimuli for detection blocks were generated using a similar procedure, where on a random half of the trials coherence was set to 0%, without random sampling of coherence values for different frames.

To probe global metacognitive estimates of task performance, at the end of each experimental block (100 trials) participants estimated the number of correct responses they had made. Analysis of these global metacognitive estimates is provided in the Appendix.

Analysis

Experiment 1 was preregistered (preregistration document is available here: https://osf.io/z2s93/). Our full preregistered analysis is available in the Appendix.