Appendix A: Proof of p(F) ≤ 1/2 in the RC2a rule
On pages 1014-1015, Petrov (2009) wrote:
“...if the first stimulus is classified as “B” when x1 > c2, the second stimulus is classified as “A” when x2 < −c2. If the first stimulus is classified as “A” when x1 < c1, then second stimulus is classified as “B” when x2 > −c1. With symmetric criteria (c1 = −c2), the resulting rule coincides with the CC2s rule”.
It should be apparent from the above that c1 ≤ 0, because c1 ≤ −c1 is required to avoid double labeling (otherwise, if −c1 < c1, then when x1 < c1, x2 could be labeled both as “A” and as “B” whenever −c1 < x2 < c1). It is also apparent that when the first stimulus is classified as “B,” correctly or incorrectly, criterion c2 is used for both the first and second stimuli. Therefore, when the response is correct, p(“first response is B” | B) = \(b_{B1} = 1 - {\Phi}\left(c_{2} - \frac{d'}{2}\right)\). When the response is incorrect, p(“first response is B” | A) = \(b_{A1} = 1 - {\Phi}\left(c_{2} + \frac{d'}{2}\right)\).
Likewise, if the first stimulus is classified as “A” (the A distribution being centered at \(-\frac{d'}{2}\)), correctly or incorrectly, criterion c1 is used for both the first and second stimuli. Therefore, when the response is correct, p(“first response is A” | A) = \(a_{A1} = {\Phi}\left(c_{1} + \frac{d'}{2}\right)\). When the response is incorrect, p(“first response is A” | B) = \(a_{B1} = {\Phi}\left(c_{1} - \frac{d'}{2}\right)\).
We now consider the false alarm rate p(F). By definition,
$$ \begin{array}{@{}rcl@{}} p(F) &= &p(\text {``responding different''}|AA \cup BB)\\ &= &p(\text {``responding different''}|AA)p(AA|AA \cup BB)\\ && + p(\text {``responding different''}|BB)p(BB|AA \cup BB)\\ &= & \frac{1}{2} \left( p \left( \text {``responding different''}|AA \right)\right.\\ &&\left.+ p \left( \text {``responding different''}|BB \right) \right) \\ &=& \frac{1}{2} \left( a_{A1} b_{A2}+b_{A1} a_{A2} + a_{B1} b_{B2} + b_{B1} a_{B2} \right). \end{array} $$
(A.1)
Let us look at the first two terms in the parentheses, \(a_{A1} b_{A2} + b_{A1} a_{A2}\). Recall that \(a_{A1}\) is the probability of responding “A” to the first stimulus when that stimulus is A, that c1 is used for the second stimulus when the first is responded “A,” and that c2 is used when the first is responded “B”.
$$ \begin{array}{@{}rcl@{}} a_{A1} b_{A2} &=& {\Phi} \left( c_{1}+\frac{d'}{2} \right) \left( 1 - {\Phi} \left( -c_{1} - \left( -\frac{d'}{2} \right) \right) \right),\\ b_{A1} a_{A2} & =& \left( 1 - {\Phi} \left( c_{2} + \frac{d'}{2} \right) \right) {\Phi} \left (-c_{2} - \left( -\frac{d'}{2} \right) \right).\\ \text{Note that } a_{A2} &=& {\Phi} \left (-c_{2} - \left( -\frac{d'}{2} \right) \right) = 1 - {\Phi} \left (c_{2} - \frac{d'}{2} \right), \\ \text{ and } b_{A2} &= &1 - {\Phi} \left( -c_{1} - \left( -\frac{d'}{2} \right) \right) = {\Phi} \left( c_{1} - \frac{d'}{2} \right). \end{array} $$
(A.2)
A similar derivation, or more simply a symmetry argument, gives (as it must, given the structure of the RC2a rule)
$$ p(\text {responding ``different''}|BB) = p(\text {responding ``different''}|AA). $$
(A.3)
Hence,
$$ \begin{array}{@{}rcl@{}} p(F)& =& {\Phi} \left( c_{1} + \frac{d'}{2} \right) {\Phi} \left( c_{1} - \frac{d'}{2} \right) \\&&+ \left( 1 - {\Phi} \left( c_{2} + \frac{d'}{2} \right) \right) \left( 1 - {\Phi} \left( c_{2} - \frac{d'}{2} \right) \right). \end{array} $$
(A.4)
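As an informal check (not part of the original derivation), Eq. A.4 can be compared against a direct Monte Carlo simulation of the sequential RC2a rule. The short sketch below uses arbitrary example values for d′, c1, and c2:

```python
# Minimal Monte Carlo sketch of the RC2a rule (illustrative parameter values only).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d, c1, c2 = 1.0, -0.3, 0.5          # assumed example values; c1 <= 0 <= c2
n = 1_000_000

# "Same" trials: both stimuli drawn from the same category (AA or BB).
mu = np.where(rng.random(n) < 0.5, -d / 2, d / 2)   # A centered at -d'/2, B at +d'/2
x1 = rng.normal(mu, 1.0)
x2 = rng.normal(mu, 1.0)

# RC2a: classify the first stimulus, then apply the mirrored criterion to the second.
first_A = x1 < c1                    # first covertly labeled "A"
first_B = x1 > c2                    # first covertly labeled "B"
diff = (first_A & (x2 > -c1)) | (first_B & (x2 < -c2))   # "different" response

pF_mc = diff.mean()
Phi = norm.cdf
pF_eq = Phi(c1 + d / 2) * Phi(c1 - d / 2) + (1 - Phi(c2 + d / 2)) * (1 - Phi(c2 - d / 2))
print(f"Monte Carlo p(F) = {pF_mc:.4f},  Eq. A.4 p(F) = {pF_eq:.4f}")
```

The simulated and analytical false alarm rates agree to within sampling error.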
Note that the derivations above might not appear to require c1 ≤ c2. However, since \(a_{A1} = b_{B2}\) and \(b_{A1} = a_{B2}\), it is necessary that c1 ≤ 0 ≤ c2 to avoid overlapping labels. We next prove that p(F) ≤ 1/2.
$$ \begin{aligned} \because & c_{1} \leq 0,\\ \therefore & {\Phi} \left( c_{1} + \frac{d'}{2} \right) \leq 1 - {\Phi} \left( c_{1} - \frac{d'}{2} \right). \end{aligned} $$
(A.5)
This means that, at the criterion c1 ≤ 0, the right tail (above c1) of the normal distribution \(N \left(-\frac {d'}{2}, 1 \right)\) has a greater area than the left tail (below c1) of the normal distribution \(N \left(\frac {d'}{2}, 1 \right)\).
$$ \begin{aligned} \therefore & {\Phi} \left( c_{1} + \frac{d'}{2} \right) {\Phi} \left( c_{1} - \frac{d'}{2} \right) \leq \left( 1 - {\Phi} \left( c_{1} - \frac{d'}{2} \right) \right) {\Phi} \left( c_{1} - \frac{d'}{2} \right) \leq \frac{1}{4}. \end{aligned} $$
(A.6)
Using left-right symmetry, it can be similarly proved that
$$ \left( 1 - {\Phi} \left( c_{2} + \frac{d'}{2} \right) \right) \left( 1 - {\Phi} \left( c_{2} - \frac{d'}{2} \right) \right) \leq \frac{1}{4}. $$
(A.7)
Therefore,
$$ \begin{array}{@{}rcl@{}} p(F) &= &{\Phi} \left( c_{1} + \frac{d'}{2} \right) {\Phi} \left( c_{1} - \frac{d'}{2} \right) \\&&+ \left( 1 - {\Phi} \left( c_{2} + \frac{d'}{2} \right) \right) \left( 1 - {\Phi} \left( c_{2} - \frac{d'}{2} \right) \right)\\ & \leq &\frac{1}{2}. \end{array} $$
(A.8)
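The same bound can be illustrated numerically by sweeping Eq. A.4 over a grid of criteria satisfying c1 ≤ 0 ≤ c2 (the grid ranges below are arbitrary):

```python
# Numerical sweep of Eq. A.4 (RC2a false alarm rate) over c1 <= 0 <= c2 and d'.
import numpy as np
from scipy.stats import norm

Phi = norm.cdf
c1 = np.linspace(-4.0, 0.0, 81)[:, None, None]   # c1 <= 0
c2 = np.linspace(0.0, 4.0, 81)[None, :, None]    # c2 >= 0
d = np.linspace(0.0, 3.0, 31)[None, None, :]     # d' >= 0

pF = (Phi(c1 + d / 2) * Phi(c1 - d / 2)
      + (1 - Phi(c2 + d / 2)) * (1 - Phi(c2 - d / 2)))
print(f"max p(F) over the grid = {pF.max():.4f}")   # never exceeds 0.5
```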
Appendix B: Proof that p(F) ≤ 1/2 in CC2a, CC2s, and CC1
To ensure absolute clarity regarding how the CC2a rule (covert classification with two asymmetric criteria) works according to Petrov (2009), and to ensure that we have not added any extra assumptions or conditions, we first quote the relevant definition from Petrov (2009), page 1024:
The CC2a decision rule is a generalization of CC1 that uses three covert categories: “A,” “B,” and “ambiguous”. This requires two criteria c1 ≤ c2...
The observer responds “different” iff one stimulus is unambiguously classified “A” and the other “B”.
It should be emphasized that the condition c1 ≤ c2 for the two criteria was specified in Petrov (2009), and not enforced by us. It should also be emphasized that the “different” response is definitive and unambiguous, because this response is made when and only when both stimuli are classified unambiguously and differently, i.e., as either “AB” or “BA”.
When a “different” response is correct, it is defined as a hit. When a “different” response is incorrect, it is defined as a false alarm. Since a “different” response is defined unambiguously, a hit or false alarm is defined unambiguously also. As a result, when c1 and c2 are defined with c1 ≤ c2, the CC2a rule is completely defined.
To recap, under the CC2a rule each stimulus x is covertly classified in one of three ways: “a” when x ≤ c1, “b” when x > c2, and “ambiguous” when c1 < x ≤ c2. Obviously this classification is independent of the stimulus sequence, which is why CC2a belongs to the case in which \(p_{AB} = p_{BA}\). The probability that an AA pair is classified as “different” is \(p_{AA} = a_{A} b_{A} + b_{A} a_{A} = 2 a_{A} b_{A}\), and likewise \(p_{BB} = 2 a_{B} b_{B}\). Following the same reasoning as in Eq. A.1, \(p(F) = \frac {1}{2} (p_{AA} + p_{BB}) = a_{A} b_{A} + a_{B} b_{B}\).
$$ \begin{array}{@{}rcl@{}} a_{A} b_{A}& =& {\Phi} \left (c_{1} + \frac{d'}{2} \right) \left( 1 - {\Phi} \left( c_{2} + \frac{d'}{2} \right) \right),\\ a_{B} b_{B} &=& {\Phi} \left (c_{1} - \frac{d'}{2} \right) \left( 1 - {\Phi} \left( c_{2} - \frac{d'}{2} \right) \right),\\ \therefore p(F) &= &{\Phi} \left (c_{1} + \frac{d'}{2} \right) \left( 1 - {\Phi} \left( c_{2} + \frac{d'}{2} \right)\right) \\&&+ {\Phi} \left (c_{1} - \frac{d'}{2} \right) \left( 1 - {\Phi} \left( c_{2} - \frac{d'}{2} \right)\right). \end{array} $$
(B.1)
$$ \begin{array}{@{}rcl@{}} &\because& c_{1} \leq c_{2},\\ &\therefore & {\Phi} \left (c_{1} + \frac{d'}{2} \right) \leq {\Phi} \left( c_{2} + \frac{d'}{2} \right),\\ &\therefore & a_{A} b_{A} = {\Phi} \left (c_{1} + \frac{d'}{2} \right) \left( 1 - {\Phi} \left( c_{2} + \frac{d'}{2} \right)\right) \\&&\leq {\Phi} \left (c_{1} + \frac{d'}{2} \right) \left( 1 - {\Phi} \left( c_{1} + \frac{d'}{2} \right)\right) \leq \left( \frac{1}{2} \right)^{2}.\\ && \text{By the same argument, } a_{B} b_{B} \leq \frac{1}{4}.\\ &\therefore & p(F) = a_{A} b_{A} + a_{B} b_{B} \leq \frac{1}{2}. \end{array} $$
(B.2)
Given that CC2s and CC1 are both special cases of CC2a, this proof applies to CC2s (when −c1 = c2) and CC1 (when c1 = c2) as well. In particular, when c1 = c2 = 0, the rule becomes the optimal independence rule, which is thereby also shown to be restricted to p(F) ≤ 1/2.
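As in Appendix A, the bound can be illustrated numerically. The sketch below sweeps Eq. B.1 over pairs c1 ≤ c2 (the grid ranges are arbitrary) and also evaluates example CC2s and CC1 points:

```python
# Numerical sweep of Eq. B.1 (CC2a false alarm rate) over c1 <= c2 and d'.
import numpy as np
from scipy.stats import norm

Phi = norm.cdf

def pF_cc2a(c1, c2, d):
    """Eq. B.1: p(F) = a_A * b_A + a_B * b_B."""
    aA, bA = Phi(c1 + d / 2), 1 - Phi(c2 + d / 2)
    aB, bB = Phi(c1 - d / 2), 1 - Phi(c2 - d / 2)
    return aA * bA + aB * bB

crit = np.linspace(-4.0, 4.0, 81)
ds = np.linspace(0.0, 3.0, 31)
worst = max(pF_cc2a(c1, c2, d)
            for d in ds for c1 in crit for c2 in crit if c1 <= c2)
print(f"max p(F) for CC2a (c1 <= c2): {worst:.4f}")          # stays at or below 0.5
print(f"CC2s example (c1=-1, c2=1, d'=1): {pF_cc2a(-1, 1, 1):.4f}")
print(f"CC1  example (c1=c2=0,  d'=1): {pF_cc2a(0, 0, 1):.4f}")
```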
This CC2a rule can also be extended mathematically to overcome the restriction p(F) ≤ 1/2. Recall that CC2a was defined in Petrov (2009) as follows: if a stimulus x ≤ c1, then x is labeled “A”; otherwise, if x > c2, x is labeled “B” (c1 ≤ c2); a “different” response is made iff one stimulus is unambiguously labeled “A” and the other unambiguously “B”. The proposed extension is twofold: (1) when c1 ≤ x < c2, respond “different” regardless of how the other stimulus x2 is labeled (the opposite of the original definition); (2) when c2 ≤ c1, so that x is labeled both as “A” and as “B” whenever c2 ≤ x ≤ c1, respond “same” regardless of how x2 is labeled. Under this extension, one can see that, at one extreme, when \(c_{1} \rightarrow -\infty \) and \(c_{2} \rightarrow +\infty \), the response is always “different,” so that \(p(F) \rightarrow 1\) and \(p(H) \rightarrow 1\). At the other extreme, when \(c_{1} \rightarrow +\infty \) and \(c_{2} \rightarrow -\infty \), all stimuli are double-labeled as both “A” and “B,” so that \(p(F) \rightarrow 0\) and \(p(H) \rightarrow 0\). As a result, the ROC covers the entire range and p(F) > 1/2 becomes possible.
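To make the effect of extension (1) concrete, the following Monte Carlo sketch simulates that part of the extended rule for one arbitrarily chosen, widely spaced pair of criteria. It is meant only as a demonstration that p(F) can then exceed 1/2, not as a model of any particular observer:

```python
# Monte Carlo sketch of extension (1) of CC2a: respond "different" whenever
# either stimulus falls in the ambiguous region c1 <= x < c2.
import numpy as np

rng = np.random.default_rng(1)
d, c1, c2 = 1.0, -2.0, 2.0          # wide ambiguous zone (arbitrary example values)
n = 500_000

# "Same" (AA or BB) trials; A is centered at -d'/2, B at +d'/2.
mu = np.where(rng.random(n) < 0.5, -d / 2, d / 2)
x1, x2 = rng.normal(mu, 1.0), rng.normal(mu, 1.0)

def label(x):
    # -1 = "A", +1 = "B", 0 = ambiguous
    return np.where(x <= c1, -1, np.where(x > c2, 1, 0))

l1, l2 = label(x1), label(x2)
ambiguous = (l1 == 0) | (l2 == 0)
# Extension (1): any ambiguous stimulus forces "different"; otherwise apply CC2a.
different = ambiguous | (l1 != l2)
print(f"extended-rule p(F) = {different.mean():.3f}  (exceeds 1/2 for wide c1, c2)")
```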
Such an extension may be mathematically “natural,” but not necessarily psychologically so. For example, Petrov (2009) considered it psychologically natural for two uncertain stimuli to be perceived as “same”. As far as we know, such an extension has not been proposed as a psychological decision rule, possibly because of the “same” response bias when stimuli are ambiguous. For example, Bamber (1969) postulated that the “same” decision is processed in parallel, whereas the “different” decision is processed serially. This is termed the “fast-same” effect and suggests that “same” and “different” may not be symmetric psychologically, even if they are symmetric mathematically (see also Egeth, 1966). Testing this extension is hence beyond the scope of the current study, but we will address it in a future study in which participants are trained over multiple sessions.
Appendix C: Verifying CC2a and CC1 model fitting
Recall that the independence rule is a special case of the covert-classification model with one parameter, namely the decision criterion applied to each of the two stimuli; when this criterion is unbiased, the model is called the independence rule. When this parameter is varied systematically over \((-\infty , +\infty )\), a linear ROC with unit slope is obtained in Z-space (the Z-ROC; Macmillan and Creelman, 2005). We have empirically verified (Figs. 2 and 3) that the slopes of the human Z-ROCs were smaller than one for both the 8∘ participants and (after participants performing at chance were excluded) the 4∘ participants. Based on this empirical result, we can now independently verify whether our model fitting leads to the same conclusion, using the covert-classification model with two parameters (CC2a) and with one parameter (CC1).
Since we have proven that p(F) ≤ 1/2 for CC2a and CC1, we use only the human data with p(F) ≤ 1/2 to fit these models. It also turned out that adding an additional Euclidean-distance term between human and model in the coordinates \((p_{AA} + p_{BB}, p_{AB} + p_{BA})\), which is equivalent to insisting that (p(F), p(H)) fit well, allowed the best and second-best fitting d′ values to be separated more clearly. As a result, all subsequent model fittings incorporated this additional constraint.
Since human data are used only if p(F) ≤ 1/2, where \(p(F) = \frac{1}{2}(p_{AA} + p_{BB})\), each of the 10 experimental sections had two to four 4-D data points satisfying this constraint. As a result, each participant contributed on average 30 4-D data points to the model fitting. During model fitting, an exhaustive search was conducted over the range d′ ∈ [0, 2.5] with a step size of Δd′ = 0.05.
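For concreteness, the exhaustive search can be sketched as follows for the single-criterion CC1 case. This is a simplified reconstruction under stated assumptions: the criterion grid, the squared-error loss over the four \(p_{XY}\) dimensions plus the additional \((p_{AA} + p_{BB}, p_{AB} + p_{BA})\) distance, and the human data point are all illustrative and do not reproduce the actual fitting code:

```python
# Hypothetical sketch of the exhaustive d' search for the CC1 model.
# The "human" data point is a fabricated 4-D vector (pAA, pAB, pBA, pBB) of
# "different"-response rates; the loss combines the four dimensions with the
# additional (pAA + pBB, pAB + pBA) distance described above.
import numpy as np
from scipy.stats import norm

Phi = norm.cdf

def cc1_pdiff(d, c):
    """CC1 predictions of p('different' | XY) for XY in {AA, AB, BA, BB}."""
    aA, bA = Phi(c + d / 2), 1 - Phi(c + d / 2)
    aB, bB = Phi(c - d / 2), 1 - Phi(c - d / 2)
    pAA, pBB = 2 * aA * bA, 2 * aB * bB
    pAB = pBA = aA * bB + bA * aB
    return np.array([pAA, pAB, pBA, pBB])

def loss(human, model):
    same_diff_h = np.array([human[0] + human[3], human[1] + human[2]])
    same_diff_m = np.array([model[0] + model[3], model[1] + model[2]])
    return np.sum((human - model) ** 2) + np.sum((same_diff_h - same_diff_m) ** 2)

human = np.array([0.30, 0.55, 0.50, 0.25])        # fabricated example data point
d_grid = np.arange(0.0, 2.5 + 1e-9, 0.05)         # d' in [0, 2.5], step 0.05
c_grid = np.linspace(-2.0, 2.0, 81)               # assumed criterion grid

best = min((loss(human, cc1_pdiff(d, c)), d, c) for d in d_grid for c in c_grid)
print(f"best-fitting d' = {best[1]:.2f}, criterion = {best[2]:.2f}, loss = {best[0]:.4f}")
```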
The model fitting produced residuals as a function of the four \(p_{XY}\) dimensions and of the rating criteria (two to four levels, since p(F) needed to be ≤ 1/2). There are two ways to analyze the residuals. The first is to check their magnitudes: the larger the residuals, the poorer the fit. The second is to check whether the residuals are evenly distributed across the \(p_{XY}\) dimensions and rating criteria. One can argue that even if the residual magnitudes are large, as long as the residuals are reasonably evenly distributed across these dimensions, the model fitting is unbiased and has captured the mean values of human performance.
Since the participants in the current study were non-experts, we expected that the residuals could be large. Consequently, we focused our analysis on the second aspect, namely whether the residuals were evenly distributed across the \(p_{XY}\) dimensions and rating criteria. To accomplish this, we restricted the rating data to the two smallest p(F) values so that there would be almost no empty cells and an ANOVA would be possible. Because only two levels of rating data were used, our emphasis was on the four \(p_{XY}\) dimensions, to see whether the residuals were evenly distributed across them. Figure 8 shows the mean residuals across the four dimensions \(p_{XY}\), X, Y ∈ {A, B}, and the two lowest rating criteria for the two models, CC2a and CC1.
Since this is a verification of the results in “Human behavioral results”, where data from 26 8∘ participants and 18 4∘ participants (whose accuracies were > 0.52) were used, the same participants’ data were used here. We first submitted the residuals from the CC2a model fitting to a 4 × 2 ANOVA with \(p_{XY}\) and rating criterion as the main factors. The main effect of \(p_{XY}\) was significant, F(3, 129) = 10.64, \(p = 3 \times 10^{-6}\). The main effect of rating criterion did not reach significance, F(1, 43) = 2.02, p = 0.16. The interaction was significant, F(3, 129) = 5.67, p = 0.0011.
A similar analysis using the residuals from the CC1 model fitting yielded significant effects for all factors: \(p_{XY}\), F(3, 129) = 33.85, \(p = 3.33 \times 10^{-16}\); rating criterion, F(1, 43) = 9.52, p = 0.0035; interaction, F(3, 129) = 2.95, p = 0.035. Taken together, the residuals in both the CC2a and CC1 model fittings were unevenly distributed across the four \(p_{XY}\) dimensions. The results suggest that these two models could not explain the human data well. This conclusion is consistent with that obtained in “Human behavioral results” using completely different analysis methods.
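For readers wishing to reproduce this type of analysis, a 4 × 2 repeated-measures ANOVA on the residuals can be run along the following lines. This is a generic sketch using the statsmodels AnovaRM class; the column names, data layout, and placeholder residuals are assumptions rather than the analysis script used here:

```python
# Hypothetical sketch of the 4 x 2 repeated-measures ANOVA on the residuals.
# Assumes a long-format table with one residual per participant x pXY x criterion.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)
subjects = [f"s{i:02d}" for i in range(44)]          # e.g., 26 + 18 participants
pxy_levels = ["pAA", "pAB", "pBA", "pBB"]
crit_levels = ["crit1", "crit2"]                     # two lowest rating criteria

rows = [{"subject": s, "pXY": p, "criterion": c,
         "residual": rng.normal()}                   # placeholder residuals
        for s in subjects for p in pxy_levels for c in crit_levels]
df = pd.DataFrame(rows)

aov = AnovaRM(df, depvar="residual", subject="subject",
              within=["pXY", "criterion"]).fit()
print(aov)
```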
Appendix D: Comparing between CC2s and DF1 model fittings
In Petrov (2009), the best-fitting d′ from the CC2s model was mathematically predicted to be smaller than that from the DF1 model (\(d'_{CC2s} < d'_{DF1}\)), since CC2s closely approximates the optimal model (and therefore uses the signal more efficiently). Here, we verified this mathematical prediction, independently of the mathematical constraint that p(F) ≤ 1/2 for the CC2s model, which is a special case of CC2a.
Figure 9 shows the best-fitting human ROC in p-coordinates (since all data lie within the 1 × 1 square), the best-fitting CC2s ROC, the human data, and the resulting CC2s d′. There are 11 panels because 11 participants’ data were potentially explainable by CC2s and DF1. A similar fitting procedure was applied using the DF1 model, and the corresponding best-fitting DF1 d′ was also obtained. According to the mathematical prediction, \(d'_{CC2s} < d'_{DF1}\). Among the 11 participants, eight confirmed this inequality. The remaining three showed zero difference; for two of these three, both fitted values were d′ = 0. We conclude, separately from the p(F) ≤ 1/2 constraint, that the mathematical prediction \(d'_{CC2s} < d'_{DF1}\) was consistent with the human data.
Appendix E: Comparing RC2a and LR2 rules
We demonstrate here that the conjectured approximate equivalence between the RC2a and LR2 models holds only in a very limited sense, both mathematically and experimentally, not to mention that p(F) ∈ [0, 1/2] for RC2a whereas p(F) ∈ [0, 1] for LR2.
Petrov (2009) stated that the LR2 rule could be approximated by the RC2a rule. However, this can only be the case when the two rules give rise to similar hit and false alarm rates. As proved earlier, p(F) ≤ 1/2 for RC2a, yet LR2’s p(F) covers the full range [0, 1]. This full range can be seen from the following example:
\(p(F) = 1 - \sum_{i=1}^{2} \left(1 - {\Phi}\left(\frac{\ln(\beta)}{d'} + (-1)^{i} \frac{d'}{2}\right)\right)^{2}\), where β > 1. Assuming that d′ > 0, when \(\beta \rightarrow +\infty\), \(\frac{\ln(\beta)}{d'} \pm \frac{d'}{2} \rightarrow +\infty\), so \({\Phi}\left(\frac{\ln(\beta)}{d'} \pm \frac{d'}{2}\right) \rightarrow 1\) and therefore \(p(F) \rightarrow 1\).
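Evaluating this expression numerically makes the contrast with the RC2a cap of 1/2 concrete (the β values below are arbitrary):

```python
# Evaluate the LR2 false alarm rate above for increasing beta (d' fixed at 1).
import numpy as np
from scipy.stats import norm

Phi = norm.cdf
d = 1.0
for beta in [2.0, 10.0, 1e2, 1e4, 1e6]:
    t = np.log(beta) / d
    pF = 1 - sum((1 - Phi(t + s * d / 2)) ** 2 for s in (-1.0, 1.0))
    print(f"beta = {beta:>10.1f}:  LR2 p(F) = {pF:.4f}")
# p(F) climbs toward 1, well beyond the RC2a bound of 1/2.
```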
Figure 10 shows a direct comparison between the ROCs of RC2a and LR2 when they share the same d′.
Appendix F: Comparing CC1 and CC2a model fittings with human data
Although p(F) ≤ 1/2 for both the CC1 and CC2a models, we nevertheless fitted the four candidate participants’ data with these models, as shown in Fig. 11. The purpose was to verify that the best-fitting d′ values from the two models were similar to each other for any given participant’s data, since CC1 is a special case of CC2a (when c1 = c2). These four pairs of d′ values were indeed similar to each other, attesting to the reasonableness of the model fits even though p(F) ≤ 1/2 for both models; the (CC1, CC2a) pairs were (2.125, 2.175), (2.150, 2.225), (0.600, 0.537), and (0.475, 0.450) (\(\chi^{2} = 1.00 < \chi^{2}_{\text{critical value}} = 9.49\)). Note also that when c1 = c2 = 0, both models become the optimal independence rule.
In addition, for these two models, Petrov (2009) stated that \((p_{AB} - 0.5)^{2} = (p_{AA} - 0.5)(p_{BB} - 0.5)\). Given that \(p_{AB} = p_{BA}\), the equation becomes \((p_{AB} - 0.5)(p_{BA} - 0.5) = (p_{AA} - 0.5)(p_{BB} - 0.5)\). We also checked this equality in the four participants’ data, as follows. We computed the χ2 statistic assuming either that all five criteria were used or that only the middle criterion was used, per experimental block. Two of the four participants’ data sets, HJL and LJJ, rejected the equality in both χ2 tests.
Appendix G: Model comparison between CC2s, CC2a, and CC1
Via model simulations, we observed that, as a model with two parameters (c1, c2), CC2a occupies a cloud of model data points. One of its special cases, CC2s, defined by the two criteria satisfying c1 + c2 = 0, has its ROC on the upper boundary of the cloud. The other special case, CC1, defined by c1 = c2, has its ROC on the lower boundary of the same cloud. Figure 12 shows the three models’ ROCs for d′ = 1, 1.5, and 2.
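The cloud and its two boundaries can be reproduced with a small sweep over (c1, c2). In the sketch below, the false alarm rate follows Eq. B.1, while the hit rate is computed as \(p(H) = a_{A} b_{B} + b_{A} a_{B}\), which is our reading of the CC2a “different” response on AB/BA trials; plotting is omitted:

```python
# Sweep (c1, c2) with c1 <= c2 to trace the CC2a cloud of (p(F), p(H)) points,
# plus its CC2s (c1 = -c2) and CC1 (c1 = c2) special cases, for a fixed d'.
import numpy as np
from scipy.stats import norm

Phi = norm.cdf

def cc2a_roc_point(c1, c2, d):
    aA, bA = Phi(c1 + d / 2), 1 - Phi(c2 + d / 2)
    aB, bB = Phi(c1 - d / 2), 1 - Phi(c2 - d / 2)
    pF = aA * bA + aB * bB            # Eq. B.1
    pH = aA * bB + bA * aB            # assumed hit-rate analogue
    return pF, pH

d = 1.5
crit = np.linspace(-3.0, 3.0, 61)
cloud = [cc2a_roc_point(c1, c2, d) for c1 in crit for c2 in crit if c1 <= c2]
cc2s = [cc2a_roc_point(-c, c, d) for c in crit if c >= 0]   # upper boundary
cc1 = [cc2a_roc_point(c, c, d) for c in crit]               # lower boundary
print(f"{len(cloud)} CC2a points; example CC2s point: {cc2s[len(cc2s)//2]}")
```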
Appendix H: Model comparison between RC2s and RC2a
As in the previous appendix, we also compared model performance between RC2s and RC2a, where RC2s is the special case of RC2a with two criteria placed symmetrically about the midpoint between the two stimulus distributions. Figure 13 shows these models’ performance for d′ = 1, 1.5, and 2. Note that, in each panel, the RC2s ROC lies on top of the RC2a data cloud (RC2s = CC2s).