Global visual confidence

Visual confidence is an observer's estimate of the precision of a single perceptual decision. Often, however, observers need to judge their confidence in a task overall rather than in a single decision. Here, we measured the global confidence acquired across multiple perceptual decisions. Participants performed a dual task on two series of oriented stimuli. The perceptual task was an orientation-discrimination judgment. The metacognitive task was a global confidence judgment: observers chose the series for which they felt they had performed better in the perceptual task. We found that choice accuracy in global confidence judgments improved as the number of items in the series increased, regardless of whether the global confidence judgment was made before (prospective) or after (retrospective) the perceptual decisions. This result is evidence that global confidence judgments were based on an integration of confidence information across multiple perceptual decisions rather than on a single one. Furthermore, we found a tendency for global confidence choices to be influenced by response times, more so for recent perceptual decisions than for earlier ones in the series. Using model comparison, we found that global confidence is well described as a combination of noisy estimates of sensory evidence and position-weighted response-time evidence. In summary, humans can integrate information across multiple decisions to estimate global confidence, but this integration is not optimal, in particular because of biases in the use of response-time information.

Supplementary Information: The online version contains supplementary material available at 10.3758/s13423-020-01869-7.

This logistic regression does not include any additional degrees of freedom, as the sensitivity to the difference in uncertainty between the two sets can be represented directly in the weights β_i.
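As an illustrative sketch (not the authors' code), a logistic regression with position-specific weights β_i can be fitted by maximum likelihood. The data below are simulated, with an assumed recency profile in the true weights:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_trials, set_size = 2000, 4

# x[t, i]: difference (set A minus set B) in local evidence at serial position i
x = rng.normal(size=(n_trials, set_size))
true_beta = np.array([0.4, 0.6, 0.8, 1.0])  # assumed recency profile (illustrative)
p = 1.0 / (1.0 + np.exp(-(x @ true_beta)))
choice_a = rng.random(n_trials) < p          # True = chose set A

def neg_log_lik(beta):
    z = x @ beta
    # Numerically stable negative log-likelihood of the logistic model
    return np.sum(np.logaddexp(0.0, -z) * choice_a +
                  np.logaddexp(0.0, z) * (~choice_a))

fit = minimize(neg_log_lik, np.zeros(set_size), method="BFGS")
beta_hat = fit.x
print(np.round(beta_hat, 2))
```

With enough trials, the recovered weights β̂_i approximate the generating profile, so a recency trend in the true weights shows up as increasing β̂_i over positions.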

Statistical tests for recency effects on logistic regression weights

For each participant and for each set size of 2 and larger, we fitted a linear regression model to the logistic regression weights β_i over serial positions i. Table S1 shows the results of the one-sample t tests against zero across subjects, for both accuracy and the reciprocal of response time.

Table S1. Statistical results of one-sample t tests against zero across subjects on the linear-regression slopes (position-specific weights over serial position) for each set size larger than two. The "avg." column indicates the within-observer averages of the linear slopes across set sizes, as an estimate of the per-observer effect regardless of set size. Bayes factors below 1 favor the null hypothesis (that the mean equals zero); Bayes factors above 1 favor the alternative hypothesis (that the mean differs from zero).
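The slope-then-t-test procedure above can be sketched as follows. The per-subject weights here are simulated with an assumed mild recency trend; they are not the actual data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, set_size = 20, 8
positions = np.arange(1, set_size + 1)

# Simulated per-subject logistic weights with a mild recency trend (assumption)
weights = 0.05 * positions + rng.normal(0, 0.1, size=(n_subjects, set_size))

# One linear slope over serial position per subject, then a one-sample t test
slopes = np.array([np.polyfit(positions, w, 1)[0] for w in weights])
t, p = stats.ttest_1samp(slopes, 0.0)
print(f"mean slope = {slopes.mean():.3f}, t({n_subjects - 1}) = {t:.2f}, p = {p:.4f}")
```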

Generic response pattern as an alternative explanation
We need to rule out the possibility that the RT recency effect was due to a generic response pattern unrelated to confidence choices (e.g., response times increasing with serial position within a set regardless of confidence choice). We thus computed the average perceptual accuracy and average response time across all sets and trials for each serial position. Figure S2 shows the average position-specific accuracies and response times. Average accuracies and response times (except for the first decision within a set) remained constant over serial positions within a set for all set sizes. The slow average response times for the first decision within each set might be due to an attention lapse right after the prompt before each set began. The absence of a "recency" pattern in these averages suggests that the recency effect in response times was not generic, but was specifically related to global confidence choices.

Figure S2. Perceptual accuracies (orange) and response times (blue) averaged at each serial position. Panels represent different set sizes (from left to right: set size = 1, 2, 4, and 8). Error bars represent +/- one standard error of the mean.

Model formulation
We considered two cues from which an observer would gather evidence in making the global-confidence choice.
The first cue was based on the sensory stimuli within each set. In each confidence-comparison trial in Experiment 2, observers viewed two sets of Gabors, one after the other. With respect to the orientation-discrimination task, the sensory information is the difference between the Gabor orientation and the reference orientation.
We denote by u_{i,j,t} this orientation difference for the stimulus presented at serial position i in set j of trial t. The corresponding quantity Z_{i,j,t} represents the standardized "distance" of the sensory representation of the stimulus from the decision criterion. We took this as a local confidence estimate based on sensory information for each perceptual decision. We denote this cue as DIST.
The second cue was based on the response time RT_{i,j,t} for each perceptual decision within a set. Using the same notation as above, we computed the reciprocal 1/RT_{i,j,t}. This reciprocal of the response time was taken as the local confidence estimate based on response time for each perceptual decision. We denote this cue as RT.
In the first analysis, about summary statistics, we considered three ways to obtain the global confidence estimate for each set across perceptual decisions. The "average" strategy was to compute the arithmetic mean of the values within each set as the evidence estimate; for example, the DIST evidence for set j in trial t was computed as the mean of Z_{i,j,t} across serial positions i (the evidence based on response time was computed likewise from 1/RT_{i,j,t}). The "maximum" and "minimum" strategies were to take, respectively, the maximum and minimum values across items within each set as the evidence estimate for the set.

In the second analysis, about the position-specific weights, each local evidence estimate was multiplied by a position weight w_i before being summed, for the DIST cue (and likewise for the RT cue). For uniform weights over positions, we set w_i = 1/n, so that the computation of the evidence is identical to the arithmetic mean.
For exponential weights, we computed w_i based on an exponential weight function, w_i = e^{r·i} / Σ_k e^{r·k}. The parameter r controls the variation in position weighting. When r > 0, later items in the set are given heavier weights, with the last item having the heaviest (i.e., simulating a "recency" effect). When r < 0, earlier items in the set are given heavier weights, with the first item having the heaviest (i.e., simulating a "primacy" effect). When r = 0, the exponential weight function is identical to uniform weights. The local estimates based on the DIST and RT cues could be combined using separate values of r, so we made r_DIST and r_RT two different parameters.

Then, the evidence for set B was subtracted from that for set A to obtain the combined evidence for choosing set A as the more-confident set in trial t; for example, for the DIST evidence, E_{DIST,t} = E_{DIST,A,t} - E_{DIST,B,t}. The evidence based on response time, E_{RT,t}, was computed likewise. Then, the overall evidence for choosing set A was the linear combination of the evidence from the two cues, plus a constant term to capture bias in confidence choices: E_t = β_DIST·E_{DIST,t} + β_RT·E_{RT,t} + β_0. The probability of choosing set A was modeled with the logistic function, as described in Equation (3).
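The evidence computation described above can be sketched in a few lines. This is an illustrative reimplementation under our reading of the text (the normalization of the exponential weights is our assumption), with made-up input values:

```python
import numpy as np

def exp_weights(n, r):
    """Normalized exponential position weights; r = 0 gives uniform weights,
    r > 0 weights later positions more (recency), r < 0 earlier (primacy)."""
    i = np.arange(1, n + 1)
    w = np.exp(r * i)
    return w / w.sum()

def choice_prob_A(z_A, z_B, rt_A, rt_B, beta_dist, beta_rt, beta0,
                  r_dist=0.0, r_rt=0.0):
    """Probability of choosing set A as the more-confident set."""
    n = len(z_A)
    # Position-weighted set-A-minus-set-B evidence for each cue
    e_dist = exp_weights(n, r_dist) @ (np.asarray(z_A) - np.asarray(z_B))
    e_rt = exp_weights(n, r_rt) @ (1 / np.asarray(rt_A) - 1 / np.asarray(rt_B))
    dv = beta_dist * e_dist + beta_rt * e_rt + beta0
    return 1.0 / (1.0 + np.exp(-dv))

# Set A has larger sensory distances and faster responses than set B,
# so the model should favor choosing set A (illustrative numbers only).
p = choice_prob_A([1.2, 0.8, 1.0], [0.5, 0.4, 0.6],
                  [0.6, 0.7, 0.5], [0.9, 1.0, 0.8],
                  beta_dist=1.0, beta_rt=0.5, beta0=0.0)
print(round(p, 3))
```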
In terms of this model formulation, the six models differ in which parameters were fitted to the data by maximum-likelihood estimation (all models included the intercept term β_0). Below are the models we considered, with the fixed parameters described. Model 1: the uniform-DIST-only model (fitted β_DIST and β_0 only).

Model selection
We took the parameter set θ̂_m that maximized the likelihood as the estimated parameter set for each model m. We then selected the best model by comparing the model evidence across models.
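One common approximation to model evidence is the Bayesian information criterion; this is our assumption for illustration (the passage above does not state the authors' exact criterion), with made-up likelihood values. Lower BIC indicates a better trade-off between fit and complexity:

```python
import numpy as np

def bic(neg_log_lik, n_params, n_obs):
    """BIC = 2 * negative log-likelihood + k * ln(N)."""
    return 2 * neg_log_lik + n_params * np.log(n_obs)

# e.g., a 2-parameter model vs. a slightly better-fitting 4-parameter model
# (illustrative negative log-likelihoods, 960 confidence choices assumed)
print(bic(520.3, 2, 960), bic(514.8, 4, 960))
```

Here the small improvement in fit of the larger model does not justify its extra parameters, so the simpler model would be selected.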

ADDITIONAL STATISTICAL ANALYSES ON THE SET-SIZE EFFECT
Background

Suppose the evidence for global confidence was computed by averaging multiple noisy local confidence estimates arising from the perceptual decisions within a set. Then, as set size increases, the central limit theorem implies two consequences for the distribution of the internal representation of global evidence. First, the global-evidence distribution becomes more similar to a normal distribution, increasing the fidelity of the normality assumption made when using signal detection theory (SDT) to analyze the data. Second, the variability of the global-evidence distribution decreases for both the hard and easy sets and, with the means being constant as set size varies, the signal-to-noise ratio for the global-confidence choice task increases. The analyses below ask whether the set-size effect on confidence-choice d' (i.e., the increase in global metacognitive sensitivity as set size increased) was mainly due to the second consequence (i.e., the increase in internal signal-to-noise ratio) rather than the first (i.e., the increase in the fidelity of the normality assumption for SDT).
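The second consequence can be illustrated with a small simulation (illustrative parameters, not the experimental values): averaging n independent local estimates leaves the set means unchanged but shrinks the spread by about 1/sqrt(n), so the separation (d') between the easy and hard sets grows with set size:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_easy, mu_hard, sigma, n_sets = 1.0, 0.5, 1.0, 20000  # illustrative values

dprimes = {}
for n in (1, 2, 4, 8):
    # Global evidence = mean of n noisy local estimates per set
    easy = rng.normal(mu_easy, sigma, (n_sets, n)).mean(axis=1)
    hard = rng.normal(mu_hard, sigma, (n_sets, n)).mean(axis=1)
    # d' between the easy- and hard-set global-evidence distributions
    dprimes[n] = (easy.mean() - hard.mean()) / np.sqrt(
        0.5 * (easy.var() + hard.var()))
    print(n, round(dprimes[n], 2))
```

The printed d' values grow roughly as sqrt(n), matching the signal-to-noise argument above.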

Simulation preparation
We obtained samples of noisy local estimates by generating 100 simulated runs of Experiment 2 for each observer. In each simulation run, the noise for each local estimate was independently sampled from a Gaussian distribution, with mean and standard deviation given, respectively, by the PSE and sensitivity of the observer's own perceptual psychometric curve. Then, for each trial, we computed the global evidence for the easy and the hard set by averaging the noisy local confidence estimates within each set. Figure S4 shows the distributions of local (top row) and global (bottom row) evidence for the two sets as set size increases. To quantify the effects of set size on the above variables, we computed the linear slopes of regressing the respective variables on the natural log of set size (i.e., the same method used to compute the set-size effect on confidence-choice d' in the main text). As a result, for each observer in each simulation run, we obtained a "set-size effect" on each of the above variables. We then averaged these set-size effects across the 100 simulation runs to obtain an estimate of each type of set-size effect for each observer.
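The "set-size effect" described above amounts to a simple linear regression on log set size. A minimal sketch, with illustrative d' values rather than the reported data:

```python
import numpy as np

set_sizes = np.array([1, 2, 4, 8])
dprimes = np.array([0.52, 0.70, 1.01, 1.38])  # illustrative values only

# Set-size effect = slope of the variable regressed on ln(set size)
slope = np.polyfit(np.log(set_sizes), dprimes, 1)[0]
print(round(slope, 3))
```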
Correlating set-size effects

If any of the variables is related to the set-size effect on metacognitive sensitivity, its set-size effect should correlate with the set-size effect on confidence-choice d'. We computed this correlation for each of the set-size effects. Figure S5 shows the scatter plots with correlations. If the increase in the fidelity of the normality assumption could explain the set-size effect on confidence-choice d', we should observe a correlation between the respective set-size effects on these two variables.
However, we did not find such a correlation (Figure S5, left panel; r = .2543, p = .2793). Instead, we found a positive correlation between the respective set-size effects on confidence-choice d' and on internal SNR (middle panel; r = .6236, p = .0033). We found similar results for the set-size effects on confidence-choice d' and on the AUROC (right panel; r = .5762, p = .0078).
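The across-observer correlation analysis can be sketched as follows, with 20 simulated observers whose two set-size effects are correlated by construction (these values are not the reported data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_obs = 20

# Per-observer set-size effects, made correlated by construction (illustrative)
snr_effect = rng.normal(0.4, 0.1, n_obs)
dprime_effect = snr_effect + rng.normal(0, 0.08, n_obs)

r, p = stats.pearsonr(snr_effect, dprime_effect)
print(f"r = {r:.3f}, p = {p:.4f}")
```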

Mediation analysis
To further examine the possible influence of the fidelity of the normality assumption, we constructed the following mediation model. The purpose of this analysis is to evaluate the mediating effects of two variables: the normality of the global-evidence distribution and the internal SNR.
We included four variables for each observer and each set size. For the simulated variables, we used the average across the 100 simulation runs. As a result, each variable contained 80 observations (20 observers x 4 set sizes). Figure S6 illustrates the mediation model with the fitted coefficients and statistics, which are also reported in Table S2. We found a significant total effect of set size on confidence-choice d' (β = .541, Z = 5.714, p < .001), consistent with what is reported in the main text. However, the direct effect was not significant (β = -.158, Z = -1.193, p = .233).
Critically, the indirect effect from set size via "normality of evidence" to confidence-choice d' was not significant (β = .132, Z = 0.562, p = .574), which may be due to the weak path from "normality of evidence" to confidence-choice d' (β = .161, Z = 0.570, p = .569). Instead, we found a significant indirect effect from set size via "internal SNR" to confidence-choice d' (β = .590, Z = 3.734, p < .001). All other component paths were significant (all ps < .001). Taken together, these results suggest that the set-size effect on confidence-choice d' reported in the main text was not mediated by an increase in the fidelity of the normality assumption on the internal evidence distributions. Instead, a more likely mediating factor was the enhanced signal-to-noise ratio in the internal representations of global evidence.