Theories of cognitive architectures and processes can perhaps best be evaluated by deriving quantitative predictions that can be tested against data from behavioral experiments. Determining which mathematical models best describe human behavior is an interesting way to evaluate theories and a crucial step towards finding methods to answer our research questions. To model effectively, the types of variables involved (nominal, ordinal, interval, or ratio) and the roles they play in specific experimental paradigms (as dependent or independent variables, etc.) must be defined, and all variables that carry relevant information should be included in the model. In the present study, we will illustrate this point by focusing on recognition memory models.

In standard recognition-memory experiments, participants first study a list of items (usually words) and then try to classify items presented in a test phase according to whether or not they had been previously studied. Words that had been studied are to be classified as “old,” and nonstudied words are to be called “new.” Knowing the type of word presented on a test trial and the response of the participant, we can distinguish four categories of responses (see Table 1): previously studied (target) words result in a Hit (“old” response to an old item) or a Miss (“new” response to an old item), whereas new (distractor) words can result in a false alarm (FA: “old” response to a new item) or a correct rejection (CR: “new” response to a new item). To explain the experimental results obtained with this paradigm, two classes of mathematical models have been fitted: signal detection theory (SDT) models (Ashby, 2014; Atkinson & Juola, 1973, 1974; Juola et al., 1971; Luce, 1959; Thurstone, 1927) and multinomial processing tree (MPT) models, which include the two-high-threshold (2HT) model (Batchelder & Riefer, 1990; Bröder & Schütz, 2009; Erdfelder et al., 2009; for a recent MPT tutorial, see Schmidt et al., 2023).

Table 1 Recognition paradigm categories

Although these models have proven useful, they were originally designed for categorical variables, whereas recognition experiments often also measure continuous variables such as response times (RTs), as well as ordinal variables such as confidence levels (CLs). Traditional SDT and 2HT models make predictions about these variables, including speed–accuracy relationships (Atkinson & Juola, 1973; Hockley, 1982; Murdock & Dufty, 1972). However, when both categorical and quantitative variables are measured (e.g., by recording RT in a recognition task), they are usually analyzed separately, or one variable is simply ignored. Such practices have several limitations.

First, there is a risk of interpreting only the significant effects of one variable (e.g., RT) without considering the other (e.g., accuracy), a practice prone to the accumulation of Type I errors (Voss et al., 2013). Second, because of the well-known speed–accuracy trade-off, we cannot predict whether a participant will prioritize performing the task accurately or performing it fast. The response strategy may vary among individuals, items, experimental designs, instructions, and conditions, and this will affect both RT and accuracy (Tourangeau et al., 2000). Consequently, we cannot reach an overall interpretation when ignoring one variable, and analyzing both measures separately can lead to incongruent results (Liesefeld & Janczyk, 2019). Furthermore, when considering only one variable we lose information about the other, which can affect the accuracy of parameter estimates. Also, if individual differences in performance are distributed between the two measures (e.g., RTs and accuracy), we might not detect significant effects in either of them, resulting in reduced statistical power (Voss et al., 2013). Psychometrics has addressed this issue by using hierarchical models based on item response theory (Molenaar et al., 2015; van der Linden, 2007), demonstrating that if two dimensions are related, the estimates of one are improved when the other’s data are analyzed jointly. These psychometric models are applicable to many research areas because they are general measurement models and are not limited to any specific process models (De Boeck & Jeon, 2019; Wagenmakers, 2009).

Thanks to recent developments in MPT models, many previous obstacles can be overcome, and quantitative and categorical variables can be modeled jointly (Heck & Erdfelder, 2016; Heck et al., 2018b; Klauer & Kellen, 2018; Schweickert & Zheng, 2019). MPT models for discrete and continuous variables (MPT-DC) are process models whose theoretical rationale assumes that processes have intrinsic attributes (e.g., processing time); modeling these variables can help to identify and measure these attributes and their theoretical sources. Therefore, MPT-DC models have both methodological and theoretical benefits. They can yield different conclusions than models that do not jointly analyze quantitative and categorical variables. Furthermore, they allow us to make new inferences about the structure and functioning of cognitive models, to study the relationships between dependent variables (including trade-offs), and to reach a finer understanding of how individuals process information and respond accordingly.

Quantitative variables, such as RTs alone (Heck & Erdfelder, 2016, 2020; Klauer & Kellen, 2018) and RTs and CLs jointly (Starns, 2021), have been included in 2HT models through MPT-DC models. Here we propose to include both RTs and CLs jointly in SDT models via a solution based on reparametrizing SDT models as multinomial models (estimating SDT models as multinomial models was previously proposed by Singmann & Kellen, 2013). To the best of our knowledge, this extension of MPT-DC and SDT models to joint RT and CL data has not yet been developed.

In short, MPT-DC models allow us to (1) solve methodological problems resulting from studying categorical and continuous variables separately; (2) study discrete latent cognitive states (e.g., as in 2HT models) and the latent continuous variables associated with these states; and (3) estimate parameters of SDT models that include ordinal and continuous variables.

The main purpose of the present study is to highlight the advantages of incorporating quantitative variables in both 2HT and SDT models. More specifically, we aim to widen the range of research questions that can be addressed with these models and to draw more accurate conclusions from them. To achieve these goals, we use the data from Juola et al. (2019), in which continuous and categorical variables (RTs and confidence levels, respectively) were measured in a recognition experiment using word lists, and the proportion of targets was manipulated across trial blocks.

Theoretical assumptions and modeling approach

The psychological assumptions about confidence ratings and RTs are different for the SDT and 2HT models. In the next section we will explain the theoretical assumptions of each model and, following the experimental design of Juola et al. (2019), we will describe their reparameterization as MPT-DC models.

Classic two-high-threshold model

Although the underlying variable of memory strength might be continuous, the 2HT model assumes that two thresholds exist that define three states, within each of which the cognitive experience is the same. One can experience relatively certain states of detect new, detect old, and an intermediate state of uncertainty that results in old or new guesses (see Fig. 1). These cognitive states are determined by a series of discrete processes whose probabilities can be estimated using MPT models (Bröder & Schütz, 2009).

Fig. 1

Classic 2HT model. Note. The left column refers to the presented stimulus, whereas the 2HT model column includes the process probabilities. The responses produced by each 2HT branch are displayed in the category column (Hit/Miss/FA/CR), while distinguishable branches, which form cognitive states, are shown in the right column. Parameters do, dn, and g represent the probabilities of the detect old, detect new, and guess old processes, respectively. The subscript j of parameter gj indicates the condition corresponding to relative target frequencies

As can be seen in Fig. 1, the detect old and detect new states are determined by the probabilities of detecting old words, do, and new words, dn, respectively. The high thresholds do not allow the detect old state to be entered on new-word trials, nor can target trials result in the detect new state. The guessing states involve both a nondetection probability and a guessing probability. Depending on whether old or new words are presented, the probability of nondetection is the complement of the detection probability for each tree (i.e., 1 − do for old words and 1 − dn for new words). The guess states reflect a bias for saying “old” or “new” when the participant is not in a detect old or detect new state (Delay & Wixted, 2021; Malmberg, 2008). While the guess old state has a guessing-old probability, g, the guess new state has a guessing-new probability, 1 − g. Notice that in the current model the guessing-old (g) and guessing-new (1 − g) probabilities are the same for old and new stimuli. This assumption is based on previous studies (Juola et al., 2019), although under different theoretical assumptions other restrictions could be established (Heck & Erdfelder, 2016; Heck et al., 2018b; Kellen & Klauer, 2014; Kellen et al., 2015). Also, because manipulating the proportion of targets is assumed to change the guessing bias but not the detection probability, the probability g varies among the J experimental conditions defined by the proportions of targets, yielding parameters gj. The expected effect is that gj increases as the proportion of targets in a trial block increases.
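For concreteness, the branch probabilities above can be written down and fitted directly; the following is a minimal sketch in R using the MPTinR package employed later in this article, with a single guessing parameter g (rather than the condition-specific gj) and purely hypothetical frequency counts.

```r
# Minimal sketch: classic 2HT model in MPTinR's "easy" model format.
# Each line gives the probability of one response category; the two trees
# (old items, new items) are separated by a blank line.
library(MPTinR)

htm_2ht <- "
# old-item tree: Hit, Miss
do + (1 - do) * g
(1 - do) * (1 - g)

# new-item tree: FA, CR
(1 - dn) * g
dn + (1 - dn) * (1 - g)
"

freqs <- c(Hit = 160, Miss = 40, FA = 30, CR = 170)  # hypothetical counts
fit_2ht <- fit.mpt(freqs, textConnection(htm_2ht))
fit_2ht$parameters  # estimates of do, dn, and g
```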

Confidence levels for the two-high-threshold model

CLs in 2HT models are given by a translation of memory states into confidence scales. However, the way this translation is done is not entirely straightforward. One possible assumption, called the certainty assumption, is that a detection state, for both old and new stimuli, can produce only high-confidence responses, whereas states of uncertainty (guess old or guess new) can generate responses at all CLs. The certainty assumption has been widely criticized as poorly fitting the data and being unrealistic (e.g., Luce, 1963). In response to this issue, more flexible models have been generated, but these have in turn been criticized for being too unconstrained and neither generalizable nor useful. Authors such as Bröder et al. (2013) suggest that model flexibility might be essential to capture the actual complexity of some experimental paradigms, because the individual distribution of confidence ratings in a 2HT model may be influenced by many variables unrelated to the recognition system. Such nuisance variables reflect the participants’ response styles (Henninger & Plieninger, 2021; Naemi et al., 2009) more than their detection abilities or guessing strategies.

In other words, the use of a 2HT model without too many constraints is motivated by the vast number of ways of responding and the various factors that influence the distribution of CLs. That said, here we will assume that any cognitive state can be the source of “high,” “medium,” and “low” confidence responses, but the distribution of CLs is conditioned by the cognitive states included in the 2HT model (e.g., confidence distributions in detect states are usually characterized by a higher probability of high- than of low-confidence levels).

To simplify the description of the 2HT model in which CLs are incorporated (2HT-CL), we exemplify the extension of the detect old state of the MPT-DC model, with probability do (see Fig. 2). As in Juola et al. (2019), the CL is an ordinal variable with three levels, and the confidence bins (high, medium, and low confidence) have been included for each branch of the 2HT model. This implies that the total number of trials must be distributed among more subcategories. The new nodes are associated with the L parameters that model the probability of each confidence category and branch. Hence, for the detect old state, the probability of a high-confidence response is Ldo1; of a medium-confidence response, (1 − Ldo1) · Ldo2; and of a low-confidence response, (1 − Ldo1) · (1 − Ldo2).

Fig. 2

Confidence level and two-high-threshold model extension example. Note: The do parameter in the 2HT column represents the detect old state probability of the classic 2HT model. The L parameters characterize the probability of each confidence bin (CL-BIN) included in each state. The subscript “do” of the Ldo parameter denotes that it is the L parameter associated with the detect old state. The probability of a high-confidence response is Ldo1; of a medium-confidence response, (1 − Ldo1) · Ldo2; and of a low-confidence response, (1 − Ldo1) · (1 − Ldo2)
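As a quick numerical illustration of this branching (with hypothetical parameter values, not estimates from the present data), note that the three confidence-bin probabilities always sum to the probability of the state they subdivide:

```r
# Hypothetical values for the detect-old state and its two L parameters
do <- 0.6; Ldo1 <- 0.7; Ldo2 <- 0.6

p_high   <- do * Ldo1                     # "old" response, high confidence
p_medium <- do * (1 - Ldo1) * Ldo2        # "old" response, medium confidence
p_low    <- do * (1 - Ldo1) * (1 - Ldo2)  # "old" response, low confidence

stopifnot(isTRUE(all.equal(p_high + p_medium + p_low, do)))  # bins exhaust the state
```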

Reaction time in the two-high-threshold model

Regarding RTs in 2HT models, it has been proposed that once the detection threshold is exceeded, responses are faster than those produced in nondetection states, which is compatible with the results of several studies (Heck & Erdfelder, 2016; Klauer & Kellen, 2018; Starns, 2021). To study the relationships between cognitive states and their RTs, a model that allows latent RTs to be modeled is needed.

Two procedures have been proposed to evaluate an MPT model in which a quantitative variable, such as RT, has been incorporated: parametric (Heck et al., 2018b; Klauer & Kellen, 2018) and nonparametric (Heck & Erdfelder, 2016). The parametric approach rests on assumptions about the shape of the distribution of the quantitative variables and the parameters that characterize them (Heck et al., 2018b). Although it leverages and provides more information from the data than the nonparametric version, it may not be suitable when there are substantial discrepancies between the assumed and actual distributions of the data (Klauer & Kellen, 2018). The nonparametric procedure, instead of analyzing the whole set of observed RTs, assumes that any continuous variable can be discretized into bins and represented by a histogram (Van Zandt, 2000). Although it provides limited information about quantitative data, it has the advantage of requiring fewer assumptions about their distributions.

As far as usability is concerned, the nonparametric procedure may be easier to apply with a user-friendly library originally made for classical MPTs, called MPTinR (Singmann et al., 2022). This procedure enables fitting SDT models with continuous variables, something that, to the best of our knowledge, is still not allowed by libraries specifically built for fitting parametric MPT-DCs (Heck et al., 2018a; Klauer & Kellen, 2018).

Considering all the above, we will now explain how RTs are modeled according to the nonparametric approach of Heck and Erdfelder (2016). To do this, the continuous variable is discretized into RT bins and the model is then reparametrized, forming an extended MPT model. In other words, the branches of the original MPT are subdivided into as many subbranches as there are RT bins, which implies that the total number of observations must be distributed among the bins. A new probability parameter is then assigned to each bin. In our work, the continuous RT variable has been divided into fast and slow bins. We exemplify the extension of the detect old state in Fig. 3.

Fig. 3

Reaction time two-high-threshold model extension example. Note. The do parameter in the 2HT column represents the detect old state probability of the classic 2HT model. The H parameters characterize the probability of each RT-bin included in each state. Hdo1 is the probability that a response triggered by the detect old state (subscript “do”) is fast, and 1 − Hdo1 is the probability that it is slow
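In model-file terms, the extension multiplies each branch by its H parameter, and branches ending in the same observable category (response × RT bin) are summed. A sketch of the resulting old-item tree, in the same easy format as above and with assumed parameter names, could look as follows:

```r
# Sketch of the old-item tree of the 2HT-RT model (easy format);
# Hdo1, Hgo1, and Hgn1 are the state-specific fast-bin probabilities.
htm_rt_old <- "
do * Hdo1 + (1 - do) * g * Hgo1              # Hit, fast
do * (1 - Hdo1) + (1 - do) * g * (1 - Hgo1)  # Hit, slow
(1 - do) * (1 - g) * Hgn1                    # Miss, fast
(1 - do) * (1 - g) * (1 - Hgn1)              # Miss, slow
"
```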

Confidence level and reaction time two-high-threshold model

To study whether, as theory predicts (Juola et al., 2019; Murdock, 1985; Ratcliff & Murdock, 1976), higher CLs result in faster responses than lower CLs, we need to study the interactions between RTs and CLs, and therefore to model these variables jointly. For this purpose, we can use the 2HT-CL model and, from each of the extended CL branches, add two new branches resulting from discretizing the RT data into two bins, fast and slow. These bins have an associated probability H, the probability of each RT bin given an extended CL branch. For example, extending the high-confidence detect old branch (see Fig. 4) yields the parameter Hdo1, which represents the probability that a high-confidence detect response is fast, with 1 − Hdo1 the probability that it is slow.

Fig. 4

Confidence levels and reaction times for the two-high-threshold model extension example. Note. The 2HT column represents the detect old state probability of the classic 2HT model. The L parameters characterize the probabilities of entering each confidence bin (CL-bin) and the H parameters the probabilities of each reaction time bin (RT-bin) included in the detect old state
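Spelled out for the high-confidence subbranch of the detect old state (the medium- and low-confidence subbranches follow the same pattern with their own H parameters, e.g., Hdo2 and Hdo3):

$$P\left(\text{“old”},\ \text{high},\ \text{fast}\right)={d}_o\cdot {L}_{do1}\cdot {H}_{do1},\kern2em P\left(\text{“old”},\ \text{high},\ \text{slow}\right)={d}_o\cdot {L}_{do1}\cdot \left(1-{H}_{do1}\right).$$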

Classic signal detection theory model

We will now describe a continuous recognition memory model, the SDT model. According to this model, the test stimulus is translated by the observer into a continuous random variable of memory strength or familiarity. As can be seen in Fig. 5, the probability density functions of this random variable for old and new stimuli are assumed to follow two normal (i.e., Gaussian) distributions that overlap to some extent (Macmillan & Creelman, 2004). Here the recognition model is described by the SDT sensitivity parameter, d, and by the amount of variability in familiarity (the standard deviation), which could be σ1 and σ2 for new and old items, respectively, or just σ if homoscedasticity is assumed. The subject establishes a decision criterion, c, that divides the familiarity continuum into two regions: an “old” response region above c and a “new” response region below c. The criterion c can vary with experimental conditions. For example, it is expected that when the proportion of targets in a test block is increased, c decreases, while sensitivity should not change (Juola et al., 2019). Then, as there are J = 3 experimental conditions differing in target proportions (65%, 50%, and 35%), there will be three parameters cj. Note that, to reduce the number of parameters, the mean of the new-stimulus distribution, μ1, is set to 0 and its standard deviation, σ1, to 1; then d, the difference between the means of the two distributions, is equal to μ2. Thus only d, σ, and cj need to be estimated.

Fig. 5

Signal detection theory model. Note. Signal detection model of recognition. The areas to the right of cj in the target (right) distribution and the distractor (left) distribution represent Hit and FA probabilities, respectively. The areas to the left of cj in the target and distractor distributions represent the respective probabilities of a Miss and a CR
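Under this parameterization (μ1 = 0, σ1 = 1), the four category probabilities in condition j take the following form (anticipating the reparameterization shown in Fig. 7, with Φ the standard normal cumulative distribution function):

$$P\left(\mathrm{Hit}\right)=1-\Phi \left(\frac{c_j-d}{\sigma}\right),\kern1em P\left(\mathrm{Miss}\right)=\Phi \left(\frac{c_j-d}{\sigma}\right),\kern1em P\left(\mathrm{FA}\right)=1-\Phi \left({c}_j\right),\kern1em P\left(\mathrm{CR}\right)=\Phi \left({c}_j\right).$$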

Confidence in the signal detection theory model

The establishment of additional criteria allows for capturing the distribution of CLs (see Fig. 6). The continuous process model assumes that confidence judgments are based on a more specific set of decision boundaries that allows participants to give combinations of the type of stimulus and the level of confidence in their responses. The CLs given by each participant are simply graded “old”/“new” judgments made according to where the familiarity of each test stimulus is positioned (Bröder et al., 2013; e.g., in Fig. 6 a familiarity value between c3j and c4j would result in an “old-low” response). Thus, for modeling the Juola et al. (2019) experiment, where there are six possible answers (“old-high,” “old-medium,” “old-low,” “new-low,” “new-medium,” and “new-high”), K = 5 criteria ckj are included. Note that the j subscript of ckj implies that the manipulation of the target proportions (j) affects all criteria. Nevertheless, if, as expected, the response criteria change monotonically with the relative target frequency blocks (Juola et al., 2019), then the j conditions shift all ck parameters by the same amount. That is, although the manipulation of target proportion alters the position of all response criteria, the distances between the different ck are assumed to be the same across the j conditions. Therefore, only the K − 1 distances between neighboring criteria (∆ck = ck − ck−1) have to be estimated, such that,

$${c}_{2j}={c}_{1j}+\Delta {c}_2,$$
(1)
$${c}_{3j}={c}_{1j}+\Delta {c}_2+\Delta {c}_3,$$
(2)
$${c}_{4j}={c}_{1j}+\Delta {c}_2+\Delta {c}_3+\Delta {c}_4,$$
(3)
$${c}_{5j}={c}_{1j}+\Delta {c}_2+\Delta {c}_3+\Delta {c}_4+\Delta {c}_5.$$
(4)
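A small worked example of Equations 1–4 in R, with hypothetical values for the first criteria c1j and the increments, shows how the full criterion matrix is built:

```r
# Hypothetical first criteria per condition j and criterion increments
c1j <- c(-0.3, 0.0, 0.3)      # c_1j for the three target-proportion conditions
dc  <- c(0.4, 0.3, 0.5, 0.6)  # increments dc_2, ..., dc_5

# rows: conditions j; columns: c_1j, ..., c_5j (same spacing in every row)
ckj <- outer(c1j, c(0, cumsum(dc)), "+")
ckj
```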
Fig. 6

Signal detection theory confidence model. Note. Signal detection model of recognition with varying confidence levels. Each decision criterion ckj along the familiarity continuum delimits the area or probability of CLs for new and old responses (from “new-high” to “old-high”) in both distractor (left) and target (right) distributions

Reaction time in the signal detection theory model

Because of its direct relationship with SDT models, confidence has been extensively studied; RTs, however, tend to be ignored. Strength theory is precisely an attempt to theoretically relate SDT models and RTs. It suggests that the closer the familiarity evidence of the stimulus is to the criterion, the longer the RT (Emmerich et al., 1972; Malmberg, 2008; Murdock, 1985; Norman & Wickelgren, 1969). To test these hypotheses, we propose to employ MPT-DC models.

To build an SDT model in which RTs are incorporated, the SDT model is first reparametrized as a multinomial model (see Singmann & Kellen, 2013). Figure 7 shows the reparametrized SDT model, in which the branches of the multinomial model represent the different response categories and the probability of each category is a function of the typical SDT parameters. The parameters included are d, which represents the sensitivity, and σ, the variability of the signal distribution (the mean of the new-stimulus distribution, μ1, is set to 0 and its standard deviation, σ1, to 1; see Footnote 3). The cumulative normal distribution function is represented by Φ.

Fig. 7

Signal detection theory model. Note. The left column refers to the presented stimulus whereas the SDT model column includes the probability for each category. The responses produced by each SDT model branch are displayed in the category column (Hit, Miss, FA, CR)
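A minimal sketch of this reparameterization with MPTinR's fit.model(), which accepts branch expressions built from R functions such as pnorm (Singmann & Kellen, 2013); the frequencies and parameter bounds are illustrative assumptions, and a single criterion c1 is used instead of the three cj:

```r
# Sketch: SDT model reparametrized as a multinomial model (one condition).
library(MPTinR)

sdt <- "
# old-item tree: Hit, Miss
1 - pnorm((c1 - d)/s)
pnorm((c1 - d)/s)

# new-item tree: FA, CR
1 - pnorm(c1)
pnorm(c1)
"

freqs <- c(Hit = 160, Miss = 40, FA = 30, CR = 170)  # hypothetical counts
fit_sdt <- fit.model(freqs, textConnection(sdt),
                     lower.bound = c(-5, 0, 0.1),  # alphabetical order: c1, d, s
                     upper.bound = c(5, 5, 5))
fit_sdt$parameters
```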

Once the SDT model is reparametrized, we can follow the logic used previously to include the H parameters associated with the probability of each RT bin (see Fig. 8 for an example of RT inclusion in the SDT model for the first branch). Again, Hso1 is the probability of a fast response and 1 − Hso1 that of a slow one. The subscript “so” has been added to the H parameter in Fig. 8 to refer to the first branch of the SDT model, where a signal is presented and the response is “old.”

Fig. 8

Reaction time signal detection theory model extension example. Note. The SDT column represents the Hit category probability of the classic SDT model. The H parameters characterize the probability of each reaction time bin (RT-BIN) included in the example
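In the same model syntax, the Hit category of Fig. 8 would split into two lines; this fragment is again an illustrative sketch rather than the full model:

```r
# Sketch: Hit category of the reparametrized SDT model split into RT bins
sdt_rt_hit <- "
(1 - pnorm((c1 - d)/s)) * Hso1        # Hit, fast
(1 - pnorm((c1 - d)/s)) * (1 - Hso1)  # Hit, slow
"
```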

Confidence and reaction time in the signal detection model

As noted above, strength theory (Emmerich et al., 1972; Murdock, 1985; Norman & Wickelgren, 1969) suggests that more extreme familiarity (high-confidence responses) results in shorter RTs. Consequently, there is an inverse relationship between confidence and RTs. Indeed, results from numerous studies have shown an inverted U-shaped relationship between familiarity and latencies, and a U-shaped relationship between familiarity and confidence judgments, which is compatible with strength theory (Baranski & Petrusic, 1994, 1998; Murdock & Dufty, 1972; Ratcliff & Starns, 2009; Starns, 2021; Weidemann & Kahana, 2016). However, strength theory does not capture whether the relationship differs between correct (CR, Hit) and incorrect (Miss, FA) responses, and it would imply an inverse relationship between RTs and CLs for all categories, which may not be a feasible prediction.

In short, to disentangle this issue, it is necessary to build models that allow the relationships among response categories, CLs, and RTs to emerge.

In order to develop a more general model of CLs and RTs, we propose to use a reparametrized SDT confidence model, SDT-CL, and then include RTs by means of an MPT-DC. The parameters included in the SDT-CL are those already described for an SDT model parameterized as a multinomial model (see Fig. 7) plus Δc2, Δc3, Δc4, and Δc5, the K − 1 criterion increments (∆c) (see Equations 1–4). Having done the above, as shown in the example in Fig. 9, it suffices to include the H parameters that characterize the RT bins of the SDT-CL model.

Fig. 9

Confidence levels and reaction times in the signal detection theory extension model example. Note. The SDT-CL column represents the Hit category probability of the SDT model for high (top row), medium (middle row), and low (bottom row) CLs. The H parameters characterize the probabilities of each reaction time bin (RT-BIN) included for the Hit category

In summary, with the described MPT-DC procedure we can (1) fit classic 2HT and SDT models of recognition memory (without CLs and/or RTs), (2) include CLs (by adding the ∆c parameters or the L parameters to the classical SDT and 2HT models, respectively), (3) include RTs (by adding the H parameters to the classical models), or (4) include all variables simultaneously. We will work with all these versions of the models with the aim of studying the methodological and theoretical advantages of including none, one, or both of the aforementioned variables.

Method

Data availability

The data were collected from https://osf.io/y78mk/. Detailed information on the participants, materials, and experimental procedure can be found in Juola et al. (2019). The following is a summary of these sections:

  • PARTICIPANTS: The data were obtained from 47 students and members of the academic community from the Universidad Autónoma de Madrid.

  • MATERIALS: The study used 500 common Spanish nouns. These words were displayed one at a time on a computer monitor during the study and test phases. The experiment was conducted using E-Prime 3, a software tool specifically designed for psychological research.

  • EXPERIMENTAL PROCEDURE: The participants were presented with a study list consisting of 250 words, displayed one at a time, and they were instructed to pronounce each word as it appeared. Following the completion of the study phase, participants engaged in a short intervention task followed by five blocks of test trials. Each block consisted of 100 words, and the target proportions in each block were varied between .15 and .85. Participants were informed in advance of these proportions and were told to respond as rapidly as possible with an old/new response while being careful to avoid errors. After the response, they were asked to indicate their confidence on a 3-point scale indicating whether they were certain, relatively certain, or uncertain of the accuracy of their response. The entire session lasted less than 1 hour. In analyzing the data, we became aware that the most extreme manipulation conditions, when the proportions of targets were .15 or .85, seriously affected the goodness of fit of the models. This was likely due to these conditions yielding parameter estimates that were close to the boundaries of the parameter space (e.g., guessing probabilities very close to either 0 or 1; Silvapulle & Sen, 2005). Therefore, we decided to restrict our analysis to the blocks with target proportions of .35, .50, and .65.

Model details

Four separate analyses were conducted to study: (1) classic SDT and 2HT recognition models with only the accuracy measures; (2) 2HT and SDT models with CL bins (2HT-CL and SDT-CL, respectively); (3) 2HT and SDT with RT bins (2HT-RT and SDT-RT); and (4) 2HT and SDT with both CL and RT bins (2HT-CL-RT and SDT-CL-RT, respectively).

Models were built using the nonparametric procedure (Heck & Erdfelder, 2016), with the continuous RTs discretized into bins individually for each subject using the geometric mean of their RTs. RTs that exceeded this criterion were assigned to the slow bin, while those below it were assigned to the fast bin. The criterion was set using data-dependent RT boundaries, namely the log-normal approximation, which is the default recommendation derived from previous simulation studies. The procedure consists of log-transforming the RTs, calculating the mean and variance, obtaining the quantiles, and then converting them back to the standard RT scale to obtain the cutoffs needed for categorization (see the log-normal approximation procedure and simulations in Heck & Erdfelder, 2016).
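A sketch of this discretization for one subject is given below; the function name and the simulated RTs are our own illustrative assumptions. For two bins, the single boundary reduces to the geometric mean, exp(mean(log(rt))):

```r
# Log-normal approximation: cutoffs are normal quantiles on the log scale,
# back-transformed to the RT scale (after Heck & Erdfelder, 2016).
rt_bins <- function(rt, n_bins = 2) {
  m <- mean(log(rt))
  s <- sd(log(rt))
  probs <- seq_len(n_bins - 1) / n_bins        # interior quantile probabilities
  cuts  <- exp(qnorm(probs, mean = m, sd = s)) # n_bins = 2: exp(m), the geometric mean
  cut(rt, breaks = c(0, cuts, Inf), labels = FALSE)  # 1 = fastest bin
}

set.seed(1)
rt <- rlnorm(200, meanlog = 6.5, sdlog = 0.4)  # hypothetical RTs in ms
table(rt_bins(rt))                             # counts in the fast and slow bins
```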

Both 2HT and SDT classic versions include five parameters. The 2HT model includes two detection parameters, do and dn, and J = 3 guessing parameters, gj, one for each target proportion manipulation condition j. The SDT model includes one sensitivity parameter, d, one variability parameter, σ, and J = 3 criteria, cj.

The 2HT-CL model contains 13 parameters: the five from the classical version plus the L parameters that define the three CL bins for the detect old, detect new, guess old, and guess new states. The SDT-CL model includes nine parameters: the five previously indicated for the SDT model plus the K − 1 = 4 criterion increments (∆c) that allow estimating the distances between the different confidence criteria.

The 2HT-RT model contains nine parameters: the five classic parameters plus the four H parameters that define the two RT bins of the detect old, detect new, guess old, and guess new states. The SDT-RT model also contains nine parameters: the five classic parameters plus the four H parameters that define the two RT bins for Hits, Misses, FAs, and CRs.

The 2HT-CL-RT model includes 25 parameters: the 13 indicated for the 2HT-CL model plus the 12 H parameters that define the probabilities of the two RT bins for each extended branch of the 2HT-CL model. The SDT-CL-RT model has 21 parameters: the nine from the SDT-CL model plus the 12 H parameters that define the probabilities of the two RT bins per branch.

Fits

All models were fitted using MPTinR (Singmann et al., 2022) for R (R Core Team, 2022), and their formulations are available in Appendix A Tables 6, 7, 8, 9, 10, 11, 12 and 13. The standard errors (SEs) of the estimated parameters were obtained directly from MPTinR using the Hessian matrix. For assessing goodness of fit, we initially employed the Pearson chi-squared (χ2) goodness-of-fit test. However, due to the presence of low expected frequencies in some cells of the fitted models, the use of chi-squared tests is not recommended, since the asymptotic properties are not preserved (Langeheine et al., 1996; Lin et al., 2015). As an alternative, we adopted a parametric bootstrap goodness-of-fit test. To do so, we generated 1,000 parametric bootstrap samples for each to-be-tested model and subject, fitting the corresponding model to each of these samples and calculating χ2. From each bootstrap-generated χ2 distribution, we obtained the critical value. To evaluate statistical significance, we compared the χ2 value estimated from fitting the observed data with this critical value. If the estimated χ2 value was less than or equal to the critical value, we retained the null hypothesis, indicating that the differences between observed and expected frequencies were not statistically significant, and thus concluding that the model provided a suitable fit to the data.
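The following sketch illustrates the logic of this bootstrap for one subject and a toy two-tree model. The category probabilities are hypothetical, and for brevity the statistic is computed against the generating probabilities; in the full procedure the model is refitted to every bootstrap sample and the refitted expected frequencies are used instead.

```r
set.seed(2)
p_old <- c(Hit = .70, Miss = .30)  # hypothetical model-implied probabilities
p_new <- c(FA  = .20, CR   = .80)
n <- 100                           # trials per tree

chisq_stat <- function(obs, exp) sum((obs - exp)^2 / exp)

boot_chisq <- replicate(1000, {
  sim <- c(rmultinom(1, n, p_old), rmultinom(1, n, p_new))
  chisq_stat(sim, n * c(p_old, p_new))
})

crit <- quantile(boot_chisq, 0.95)  # bootstrap critical value at alpha = .05
```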

In addition, since the ratio between the number of observations (n) and the number of parameters (p) is relatively low, a corrected AIC was calculated (Burnham & Anderson, 2002), which follows the expression below:

$$AIC_C = AIC + \frac{2\,p\,\left(p+1\right)}{n-p-1}.$$
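Equivalently, as a one-line R helper:

```r
# Corrected AIC from the AIC, number of parameters p, and observations n
aicc <- function(aic, p, n) aic + (2 * p * (p + 1)) / (n - p - 1)
```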

Analysis

To determine whether parameter estimation could be improved by adding more dependent variables, we computed the standard errors (SEs) of the estimated parameters of the SDT and 2HT models without and then with the inclusion of CLs (2HT-CL and SDT-CL), RTs (2HT-RT and SDT-RT), and both CLs and RTs simultaneously (2HT-CL-RT and SDT-CL-RT). The results of this analysis can be found in the first results section.

In the second results section we assess the scope of research questions that can be addressed using MPT-DC models, fitting 2HT and SDT models that jointly model CLs and RTs. Descriptive analyses were performed on raw aggregated data (see Appendix B Tables 14, 15 and Fig. 12), while statistical comparisons for each model were made on individually estimated parameters. As for the analyses of the parameter predictions, we mention only those related to the interactions between latent branches (cognitive states or categories), CLs, and RTs. Details concerning all analyses performed and their predictions can be found in Appendix C Figs. 13, 14, 15, 16 and 17.

In the third results section we address the continuous-process versus discrete-state question to demonstrate that CL and RT evidence can alter conclusions about model performance. Individual comparisons between the SDT and 2HT models were made using the Akaike information criterion (AIC) obtained by fitting the models to the experimental data from Juola et al. (2019). Cross tabulations were constructed to analyze changes in the selection of the best individual model (in the comparison between the SDT and 2HT models) depending on the variables included in these models. The χ2 test of independence was applied to each crosstab, with Phi (φ) used as a measure of effect size. The significance level was set to α = .05.

Results

Accuracy analysis

Figure 10 shows box and whisker plots for individual SEs for the estimated parameters of the 2HT model. Despite a slight decrease in SEs for the \(\hat{dn}\), \(\hat{do}\), and \(\hat{g_j}\) parameters with the inclusion of CL parameters, we observed a clear decrease of SEs when CL and RT were simultaneously analyzed. Regarding the SDT model (see Fig. 11), we observed a slight reduction of \(\hat{\sigma}\) and the \(\hat{c_k}\) SEs when including both CL and RT.

Fig. 10

2HT individual standard errors for detect new, detect old, and guess old probability parameter estimates. Note. Box and whisker plots for the SEs of the estimated parameters in the 2HT models. The bottom and top sides of the boxes are the first and third quartiles while the horizontal lines inside the boxes are the median SE values. The lower and upper limits of the whiskers represent ±3 standard deviations, respectively

Fig. 11

SDT individual standard errors for the sensitivity, variability, and decision criteria parameters. Note. Box and whisker plots for the SEs of the estimated parameters in the SDT models. The bottom and top sides of the boxes are the first and third quartiles while horizontal lines inside the boxes are the median SE values. The lower and upper limits of the whiskers represent ±3 standard deviations, respectively

In summary, by jointly modeling accuracy, confidence, and speed, the standard errors of the individual parameter estimates were reduced, especially in the 2HT model but also in the SDT model.

Discrete state versus continuous process

Table 2 displays the percentage of subjects for whom the null hypothesis of the bootstrapped χ2 test was rejected (see Appendix D Tables 16, 17, 18 and 19 for individual χ2 estimates and bootstrap critical values). Based on the bootstrapped chi-squared tests, there is apparently no difference between the SDT and 2HT models in the percentage of subjects with acceptable fits. In particular, of the 47 subjects, only two (4.3%) showed a lack of fit to the classical models. This percentage increases as additional DVs are added, reaching almost 60% inadequate fits when both RTs and CLs are included. In principle, this could lead us to think that choosing the classical versions of these models would be most appropriate. As we will see in the results derived from the simulations, however, goodness-of-fit tests are not a good indicator of the relevance of the DVs when what really interests the researcher is the comparison between models.

Table 2 Frequency of hypothesis rejection of the bootstrap goodness-of-fit test

Regarding whether recognition memory is best explained by a continuous process or by discrete states, we find that, for Juola et al.’s (2019) experimental data (see Table 3), when considering neither CLs nor RTs, more individuals are better fit by the continuous SDT model (72.3%) than by the discrete 2HT model (27.7%). The same is true when considering RTs alone, although the advantage in selection frequency for the SDT model is slightly smaller (70.2%). When CLs are considered, the SDT model is selected as the better one even more often (80.9% for CL and 87.2% for CL+RT). The AICC values for each subject and model can be found in Appendix D Tables 16, 17, 18 and 19.

Table 3 Best model selection

Considering the cross tables of model selection shown in Table 4, we observe that model selections made when accounting for CLs or for both CLs and RTs (CL+RT) tend to coincide. Specifically, Table 4 (A) shows that, among matching selections made between models with CL and CL+RT, 38 subjects’ data are best classified by an SDT model and six by a 2HT model, with only three subjects changing their best-model classification. In fact, we reject the null hypothesis of independence between selections made for models including CLs and those including CL+RT, with a considerably larger effect size compared with the other tests of independence (see Table 5). We also observe in Table 4 (E) and (F) that the hypothesis of independence between selections made with RTs and those made with CLs or CL+RT, respectively, is rejected, although the effect size is not as strong as in Table 4 (A). In other words, model selections made with CLs or CL+RT are related to those made when including RTs, but there are still quite a few inconsistencies between these classifications. Specifically, the number of subjects with incongruent classifications increases to 11 when comparing the selections made with models that include RTs and those that include CLs+RTs, and to 12 when comparing models that include RTs with models that include CLs. Finally, when we analyze the classifications made with the classical versions of the models, Table 4 (B), (C), and (D), we retain the null hypothesis of independence: selections made with the classical models are not related to those made with any other version.

Table 4 Crosstabs for selection of the best model according to the variables included
Table 5 Test of independence for selection crosstabs

Discussion

Accuracy improvements

In this section we discuss the benefits of jointly analyzing different dependent variables in 2HT and SDT models. When we use only the frequency of each response category (Hit/Miss/FA/CR), we may lose information regarding other variables intrinsically linked to our experimental design.

Our results indicate that, indeed, modeling category frequencies together with their confidence ratings and RTs reduces SEs and thus improves the precision of the classical parameter estimates of the 2HT and SDT models. However, the gain in information depends entirely on which variables are included in each model. In our case, including RTs alone does not improve the precision of the estimates. This does not mean that RTs cannot reduce SEs, but only that they do so in conjunction with CLs. Including CLs alone leads to a slight reduction of the SEs, while the joint inclusion of CLs and RTs yields more precise estimates than the inclusion of either variable separately.

Recognition memory predictions

The SDT and 2HT models in their classical versions do not allow the simultaneous study of RTs, CLs, and accuracy.

To work with these variables, authors such as Juola et al. (2019) performed three separate analyses to study the effects of (1) the experimental manipulation of target rates, (2) the clustering of responses into confidence bins, and (3) the clustering of responses into RT bins. This approach has several limitations. First, CLs and RTs are not studied as dependent variables, so we can study neither the interactions between dependent variables nor latent CLs and RTs (e.g., separate RT estimates for the detect old and guess old states). Second, trials from different blocks of relative target frequency are clustered into the same CL and RT bins and may interact in a masked way with the estimates from each model. Third, running multiple separate analyses increases the risk of Type I errors and may encourage reporting only partial information when one variable has significant effects but the other does not. The use of MPT-DC models, however, allows us to overcome these methodological difficulties while studying the compatibility of the theoretical hypotheses reported in the literature.

As expected, we found that the probability of guessing old, for the 2HT model, and the decision criterion between “new”/“old” responses, for the SDT model, changed monotonically with relative target frequency, the former being a direct relationship and the latter an inverse one. This pattern agrees with the analyses of the effect of aggregating data according to the relative frequency of targets in the Juola et al. (2019) study, in which the value of g decreased and the criterion values (c1 to c3) shifted from left to right as the target frequencies decreased (see Appendix C Figs. 13, 14, 15, 16 and 17).

The 2HT-CL-RT model was consistent with evidence relating detection responses to high confidence. High-confidence responses were faster than medium- or low-confidence responses. However, when we separated this confidence probability distribution by states, we found some peculiarities. For example, the inverse relationship between RTs and CLs holds for all states except the guess old state, where we actually find shorter RTs for low-confidence responses than for medium-confidence responses. Note that these results could not be detected without a model capable of studying the interactions among RTs, CLs, and latent cognitive states (detect old/detect new/guess old/guess new).

As for the SDT model, some of the expected interaction trends were also not fulfilled. For instance, there is a clear pattern of faster responses at high CLs than at medium and low CLs in all categories except FA. That is, we again observe a different pattern of responses between “new”/“old” decisions.

Thus, the distribution of response times depends on the confidence level and on the cognitive state (in 2HT) or response type (in SDT), a result that contradicts the idea that RTs and CLs measure the same thing (Thomas & Myers, 1972; Weidemann & Kahana, 2016). These results also demonstrate the need to estimate the variables jointly to unravel the complexity of the relationships between them.

Discrete-state versus continuous-process dilemma

The debate on whether recognition processes are discrete or continuous has so far been carried out with the classical versions of the SDT and 2HT models. Now that we can analyze the data with MPT-DC models, with increased validity and the integration of continuous and discrete variables, we may draw a different and more complete set of conclusions.

The result of comparing discrete and continuous models differs depending on whether we treat quantitative variables as dependent variables, as proposed here, or as factors, as in Juola et al. (2019). What the two approaches have in common is that not all subjects appear to be best fit by the same model, and the proportions of subjects who follow a continuous or a discrete model vary with the variables considered.

These variations can have implications for comparisons between non-nested models, such as SDT and 2HT. Consider a scenario in which an experimenter compares the SDT and 2HT models when RTs alone are included. In this case, if only the AICs of both models are compared, one could conclude that a substantial proportion of subjects fit the 2HT model better than the SDT model (27.7%). However, as discussed below, this conclusion would be misleading, because models that incorporate only RTs do not appear to rest on the best variable for this particular comparison, and if we were to take other variables into account we would see that this percentage is considerably lower. Hence, not all the variables that can be included in the models hold the same level of validity or relevance, which indicates the importance of judicious selection among the variables to be considered and manipulated.

In contrast, our results when classifying participants using CL and RT, or only CL, as dependent variables are much more consistent with each other than those using only RT, and they allow us to make a more appropriate model selection. In the following section, we justify this statement by means of a simulation study.

Model simulations

In order to study possible explanations and thus to verify in which situations the comparison between models is more accurate, we have used a simulation approach. Specifically, we generated data following each 2HT and SDT model version (Classic, CL, RT, and CL+RT versions), with parameter values based on each model’s estimations when fitted to data from Juola et al. (2019). We then fitted the various versions of the SDT and 2HT models to the simulated data and performed a model comparison based on AIC. Simulation details and results are described in Appendix E Tables 20, 21, 22 and 23.

Based on the simulations performed, the models that include CLs, or CLs and RTs together, resulted in selection of the true model for a large majority of subjects. However, when neither of these variables was considered, nearly one-third of the subjects yielded incorrect model selections. Considering the results of the simulations, we can expect the individual model selections for the experimental data to be more reliable when based on CL or CL+RT; therefore, the individual selections made with CL and with CL+RT should be consistent with each other.

With the simulation results in mind, it is not surprising that, of the 47 participants in the Juola et al. (2019) experiment, 38 subjects’ data are best fit by a continuous SDT model and nine by a discrete 2HT model when CLs are analyzed, whereas 41 subjects’ data were best fit by the SDT model and six by the 2HT model when CLs and RTs are jointly analyzed, leaving only three subjects with inconsistent best-model selections (see Table 4). Furthermore, the results of the simulations indicated that when CLs and RTs were not modeled, the frequency of selection of the true model was lower. One would therefore expect a decrease in the frequency of selection of the true model in the experimental data, leading to the erroneous conclusion that an increased number of subjects fit the 2HT model better than the SDT model when comparing classical models or models that include only RTs. This is precisely what we found when fitting the experimental data from Juola et al. (2019).

Thus, based on the simulation results, the usefulness of MPT-DC models can be improved with the appropriate selection and inclusion of multiple dependent variables. For example, the conclusion as to whether the recognition model of each participant is better described by a discrete or a continuous model depends on the selection of such variables. Yet why is it that the true models are selected as the best ones only when CL and RT, or only CL, are included? One possible reason is that the MPT-DC model takes advantage of information that allows it to differentiate model branches (e.g., CLs are higher in detect old than in guess old states), thus improving estimates and power. This issue was analyzed through simulations by Heck et al. (2018b), who found that when the distributions of RTs are equal across branches, the model does not benefit from the inclusion of this variable, and the estimates are therefore equivalent to those of a classical MPT model.

Another possible explanation is that comparisons between models with two RT bins fare worse than those with three CL bins, not because of the amount of information each variable yields but because of the number of bins each variable has. Increasing the number of bins allows us to be more precise about the shape of the variable’s distribution. However, it also means that we must discretize the same set of observed data into more subcategories, and therefore there is a greater risk of having bins with very few or even no observations. This can affect the estimation capability of the model and thereby prevent the calculation of standard errors and the use of chi-squared fit indices. In fact, we were not able to fit 2HT models or estimate standard errors with three RT bins, since the information matrix was not invertible, something that can occur with insufficient numbers of observations per bin. Unfortunately, despite having a relatively high number of trials per subject (100 trials per tree and subject in our case), we cannot ensure adequate observations solely on the basis of this number, because the combination of branches and bins of RTs and CLs can lead to sparse data in certain cells, particularly when dealing with extreme probabilities.

Nonetheless, the choice of the number of RT bins will depend, in practice, on the substantive question. If the question of interest is a measure of the relative speed of cognitive processes, two RT categories may be sufficient. For example, stating that responses from detection states are faster than those from guessing states implies that their probability of being fast is greater, which does not require a three-bin categorization into slow, medium, and fast responses; a simple division into slow and fast may suffice.

Conclusion

The purpose of the present study was to analyze the advantages of jointly modeling the continuous and discrete variables typically studied in recognition memory paradigms using the 2HT and SDT models. To this end, we explored whether the inclusion of CLs and RTs in MPT-DC models (1) improves the estimation accuracy of classical recognition models, (2) allows us to address new research questions, and (3) encourages us to rethink old ones (i.e., the discrete vs. continuous dilemma).

Concerning the first of these goals, we can conclude that, indeed, the inclusion of variables by means of MPT-DC models reduces the standard errors of parameter estimates. However, what holds for one model may not hold for another. Specifically, we find a reduction in the uncertainty of the estimates when adding CLs in the 2HT models, but only when CLs and RTs are added simultaneously is there an obvious improvement in accuracy in both the SDT and 2HT models. It should be mentioned that this result is not necessarily generalizable to other quantitative variables of interest (e.g., eye tracking, mouse trajectories) or to other models (e.g., the two-low-threshold and one-high-threshold models). Therefore, the experimenter wishing to use MPT-DC models must rely on data (e.g., comparing models) and theory to choose which variables to include in the MPT-DC model.

As to whether MPT-DC models can expand the scope of testable research questions, we have concluded that by jointly modeling CLs and RTs we can study effects that cannot be studied when analyzing these variables separately. Juola et al. (2019) studied the effects of grouping the data into relative target frequency blocks, confidence level blocks, or RT blocks, but not all three variables at once. The data indicated that, unlike some previous research indicating similar effects of these variables, all three methods of grouping the data yielded differences in the receiver operating characteristic (ROC) curves. However, it was not possible to estimate the distribution of CLs or RTs associated with each response category or cognitive state, nor to study the interactions between the two variables or how they might interact with relative target frequency. By contrast, with the 2HT-CL-RT and SDT-CL-RT models we were able to detect, among other effects, patterns such as (a) “old” responses tend to be of high confidence, and confidence is generally higher for responses emanating from the detect old state than from the guess states; (b) low-confidence responses are more likely than medium-confidence responses only for the guess old state; and (c) the inverse relationship between CLs and RTs does not hold in the guess old state in 2HT models, nor in the FA category in SDT models. That said, studying the continuous variables simultaneously as dependent variables allows us to make inferences about the relationships between processes and their outcomes in an integrated manner, avoiding the methodological issues derived from including quantitative variables as factors. In addition, although it was not the aim of our study, MPT-DC models can be used to disentangle response strategies or styles (Heck & Erdfelder, 2020), for example, whether extreme styles of confidence rating tend to result in “high-fast” and “low-fast” responses rather than the more typical response styles that result in “high-fast” and “low-slow” responses.

To address the third research question, on whether our modeling proposal allows us to reach new conclusions on old research questions, we faced the continuous-process versus discrete-state recognition model dilemma. Our results allowed us to reconsider several aspects of this matter.

First, our results suggest that the question of whether recognition memory is continuous or discrete may need to be rephrased. Do subjects always fit a discrete or a continuous model better? The answer seems to be negative. Our results indicate that only 23 out of 47 subjects were always best fit by the same model, regardless of the variables included.

This leads us to the second question: does the validity of the model comparison depend on the measured variables? On one hand, we find that adding more DVs, regardless of the model fitted or the type of DV added, can apparently have a negative impact on the goodness of fit of the model. The more DVs we added, the fewer the subjects whose data passed the goodness-of-fit test. This seems to indicate that these tests do not tell us much about the fit of each model, but rather that the fit is influenced by the number of categories we are modeling. As mentioned above, by adding more categories we subdivide the number of trials into more cells, which decreases the number of observations per category and appears to affect the estimate of χ2 (Davis-Stober, 2009). On the other hand, it is worth noting that our simulation results indicate that only when comparing 2HT and SDT models that include CLs and RTs simultaneously, or CLs alone, can we correctly identify the data-generating model. The experimental results were also consistent with the simulations: if the selections made with CL and CL+RT are correct, as the simulations indicate, then model selections made with CL or CL+RT should match, which is what we found with the actual data. In short, according to our results, the comparison between discrete and continuous models depends on the included variables, and we should give more weight to the model selection made when both CLs and RTs, or only CLs, are taken into account.

Third, individual differences prevent us from answering the dilemma in a deterministic way. Bearing in mind the relevant variables analyzed in the particular sample used by Juola et al. (2019), between 81% and 87% of subjects’ data fit better to a continuous model, and between 13% and 19% to a discrete one. This could indicate that participants have different response strategies (Tourangeau et al., 2000), that other factors not studied here come into play that make the data fit better to one model or the other, or even that there is a more general or hybrid model encompassing both the SDT and 2HT models.

The use of MPT-DC models has demonstrated their methodological and theoretical usefulness for modeling the effects of discrete and quantitative variables simultaneously in a recognition memory paradigm. However, as discussed above, the experimenter must choose which variables are most relevant to study, and this choice, motivated by both theoretical and methodological considerations, may substantially affect the results. In this study we have seen that the accuracy of the estimated parameters and model comparisons improves when modeling CLs alone or CLs together with RTs; with another sample or other models, this conclusion might differ. To construct a protocol for selecting the variables of interest, new methodological studies would have to be carried out (e.g., studies that manipulate the number of bins and observations, the distribution of the variable to be included, and the cognitive-process probabilities). Furthermore, because MPT-DC models are relatively recent, several lines of research on their use remain open. Among them, we highlight the question of which MPT-DC procedure is optimal for each experimental situation. Because the current procedures differ in how they incorporate the data, it seems reasonable that different procedures should suit different situations, depending on the distribution of the data and our prior knowledge about it. For example, we were unable to establish three RT bins because there were too few observations per bin. This problem might be solved by using a parametric MPT-DC procedure since, unlike the nonparametric one, it fits the raw data instead of discretizing them into bins. On the other hand, as we have suggested, given the interest of MPT-DC as a method for including variables in SDT models, libraries would have to be modified or created to allow parametric MPT-DC procedures adapted to continuous models, something that to our knowledge is not currently possible. Hence, despite the encouraging results of our study in favor of MPT-DC models, several practical aspects of their use remain to be developed and investigated.
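
As an illustration of the binning step at issue, the sketch below (with hypothetical data; the function name discretize_rts is ours) shows quantile-based RT discretization of the kind used by nonparametric MPT-DC procedures. In practice each response-by-confidence category is binned separately, so per-cell counts shrink much faster than this single-variable example suggests.

```python
# Minimal sketch of quantile-based RT discretization of the kind used by
# nonparametric MPT-DC procedures: RTs are assigned to bins defined by the
# participant's own RT quantiles. Data and names are illustrative.
import numpy as np

def discretize_rts(rts: np.ndarray, n_bins: int) -> np.ndarray:
    """Assign each RT to one of n_bins bins via equally spaced quantiles."""
    # Interior cut points (e.g., the median for 2 bins).
    cuts = np.quantile(rts, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(rts, cuts)  # 0 = fastest bin, n_bins - 1 = slowest

rng = np.random.default_rng(seed=2)
rts = rng.gamma(shape=2.0, scale=0.4, size=120)  # hypothetical RTs (seconds)
for n_bins in (2, 3):
    bins = discretize_rts(rts, n_bins)
    print(f"{n_bins} bins -> observations per bin:",
          np.bincount(bins, minlength=n_bins))
```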

Alternative modeling approaches

In addition to Heck and Erdfelder’s (2016) proposal to extend MPT models to account for continuous variables using nonparametric methods, there are parametric versions, such as those by Klauer and Kellen (2018) and Heck et al. (2018a), that fit fully continuous RT distributions. Klauer and Kellen’s (2018) version is specialized for fitting RTs assumed to follow ex-Gaussian distributions and assumes sequential processes. One of its major advantages is the ability to study the duration of each cognitive process in the sequence. This procedure can be fitted using the “RT-MPT” library (Klauer & Kellen, 2018), which is designed specifically for RTs, although it would not be difficult to extend the library to other distributions and thus make it applicable to other continuous variables. Additionally, Klauer and Kellen’s (2018) proposal requires serial processes and does not allow for parallel processes, unlike the proposals of Heck and Erdfelder (2016) and Heck et al. (2018b). In summary, the latter two proposals have the advantage of being applicable to a multitude of continuous variables (RT, eye tracking, cursor movement) and are flexible enough to study a wide variety of models in social psychology, basic psychology, neuroscience, and more, at the cost of being less explicit about the true nature of the processes involved. For example, they estimate the distributions of branches but do not specify the distributional components of the processes within them. Furthermore, Starns (2018) proposed a race model with a structure similar to that of an MPT-DC model, which integrates RTs by modeling the time between the start of a trial and either a detection or the participant’s decision to guess. This model does not allow for the study of process durations but rather estimates how long participants take before resorting to a guess, a crucial point for understanding the speed–accuracy trade-off. An alternative to the race model within the nonparametric MPT-DC framework has been proposed by Heck and Erdfelder (2020).
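
For readers unfamiliar with the ex-Gaussian, the following minimal sketch (parameter values are hypothetical, not estimates from any fitted model) shows the distribution that Klauer and Kellen’s (2018) approach assumes: the sum of a normal and an exponential component, available in SciPy as exponnorm with shape K = τ/σ.

```python
# Sketch of the ex-Gaussian (exponentially modified Gaussian) RT
# distribution: the sum of a Normal(mu, sigma) and an Exponential(tau)
# component. Parameter values are illustrative, chosen only to produce a
# plausible right-skewed RT shape.
import numpy as np
from scipy.stats import exponnorm

mu, sigma, tau = 0.45, 0.08, 0.25   # seconds; hypothetical values
K = tau / sigma                     # SciPy's shape parameter

dist = exponnorm(K, loc=mu, scale=sigma)
rng = np.random.default_rng(seed=3)
sample = dist.rvs(size=10_000, random_state=rng)
print(f"mean RT: {sample.mean():.3f} s (theory: {mu + tau:.3f} s)")
print(f"right tail from tau: P(RT > 1 s) = {dist.sf(1.0):.4f}")
```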

When fitting models to a data set with multiple subjects, we usually must choose between fitting a model to the aggregated data and fitting a separate model to each individual subject. Individual fitting is often preferred for MPT models because aggregating data can distort the shape of the distributions we want to study; however, individual fits can be quite poor when there are few observations per subject (Chechile, 2009). As a solution to this dilemma, hierarchical MPT-DC models allow us to estimate distribution shapes quite reliably, even with few observations per subject. However, hierarchical MPT-DC fits have two issues when used to derive individual fits. The first is that they assume that all subjects follow the same model, whereas our data, supported by simulations, indicate that some subjects fit better to SDT models while others fit better to 2HT models (it is possible that there is a general model encompassing both SDT and 2HT, which would allow a single hierarchical model to be applied). The second issue is that the current hierarchical libraries for parametric MPT-DC models (“RT-MPT” from Klauer & Kellen, 2018) and nonparametric models (e.g., “TreeBUGS” from Heck, Arnold, et al., 2018a; see also Nestler & Erdfelder, 2023, for a random-effects MPT modeling approach) do not allow fitting of SDT models reparametrized as multinomial models. Therefore, it was not feasible to use a hierarchical MPT-DC procedure in the present study.
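
The aggregation problem can be made concrete with a small simulation (all parameters hypothetical): pooling RTs from two subjects with different ex-Gaussian parameters produces an aggregate distribution whose dispersion and shape match neither subject, which is why individual or hierarchical fits are preferred.

```python
# Illustrative sketch (hypothetical parameters): pooling RTs across two
# subjects with different ex-Gaussian parameters yields an aggregate
# distribution more dispersed and differently shaped than either
# individual's distribution.
import numpy as np
from scipy.stats import exponnorm, skew

rng = np.random.default_rng(seed=4)
fast = exponnorm(0.20 / 0.05, loc=0.40, scale=0.05).rvs(500, random_state=rng)
slow = exponnorm(0.10 / 0.05, loc=0.80, scale=0.05).rvs(500, random_state=rng)
pooled = np.concatenate([fast, slow])

for name, x in (("fast subject", fast), ("slow subject", slow), ("pooled", pooled)):
    print(f"{name:12s} mean {x.mean():.3f}  sd {x.std():.3f}  skew {skew(x):.2f}")
```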

On the other hand, instead of using MPT-DC models, one can turn to other types of cognitive models that allow for the joint analysis of continuous and discrete variables. Among these, diffusion models stand out as a specific class of evidence-accumulation models that can be used to analyze recognition memory. For example, the circular diffusion model of continuous-outcome source memory (Zhou et al., 2021) adapts the diffusion model to a continuous source-memory retrieval task. Unlike existing source retrieval models that attribute all response variability to memory, the circular diffusion model decomposes the noise into variability arising from both memory and decision-making processes. Findings with this model suggest that, in continuous-source memory retrieval, there is a memory-strength threshold that must be reached to retrieve information about the stimulus source; below this threshold, no information is retrieved, leading to guessing responses. Furthermore, the study controlled for participants’ confidence in an old/new recognition task, ruling out the possibility that participants’ guesses are simply due to a lack of recognition of the items. Additionally, there are the models proposed by Donkin et al. (2013), which combine a discrete-state representation with a particular sequential-sampling model, the linear ballistic accumulator (LBA; Brown & Heathcote, 2008). The LBA has an evidence accumulator for each possible response in the task (e.g., “change” or “no change”), and a response is triggered when the evidence in one of the accumulators reaches a criterion. The model assumes that some trials are “guessing” trials, in which the average accumulation rate is the same for both accumulators, and the remaining trials are “memory” trials, in which accumulation rates are determined by the presented stimulus. In other words, the distribution of accumulation rates across trials is a discrete mixture over trial types.
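
To fix ideas, the following is a minimal simulation sketch of the discrete-mixture LBA logic just described; all parameter values are hypothetical and chosen only for illustration, not taken from Donkin et al. (2013).

```python
# Minimal simulation sketch of the discrete-mixture LBA idea described
# above (Donkin et al., 2013; Brown & Heathcote, 2008). All parameter
# values are hypothetical.
import numpy as np

rng = np.random.default_rng(seed=5)
A, b, s, t0 = 0.5, 1.0, 0.3, 0.2   # start-point range, threshold, drift SD, nondecision time
p_guess = 0.3                      # mixture weight for "guessing" trials

def lba_trial(drift_means):
    """One two-choice LBA trial: return (winning accumulator, RT in seconds)."""
    starts = rng.uniform(0, A, size=2)                     # independent start points
    drifts = np.maximum(rng.normal(drift_means, s), 1e-3)  # truncate drifts positive (a simplification)
    times = (b - starts) / drifts                          # time for each accumulator to reach b
    winner = int(np.argmin(times))
    return winner, t0 + times[winner]

results = []
for _ in range(1000):
    if rng.random() < p_guess:
        means = (1.0, 1.0)   # guessing trial: equal mean drifts
    else:
        means = (2.0, 0.8)   # memory trial: drift favors the stimulus-consistent response
    results.append(lba_trial(means))

choices, rts = zip(*results)
print(f"P(response 0) = {np.mean(np.array(choices) == 0):.3f}, "
      f"mean RT = {np.mean(rts):.3f} s")
```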

Similarly, attempts have been made to model continuous variables such as RTs within SDT models. According to strength theory, both RTs and CLs can be defined as functions of the distance of memory strength from a decision criterion. For example, an exponential decay function was used to fit the distribution of RTs as a function of distance to the criteria (Atkinson & Juola, 1973). However, this approach did not improve the model fit enough to justify the additional parameters. Crucially, these functions usually assume an inverse relationship between CL and RT; if this assumption does not hold, it can lead to incorrect predictions. To address this problem, MPT-DC models estimate RT distributions within each branch, which allows us to examine whether the inverse relationship between RTs and CLs holds across branches. In our research, we found that it does not. This knowledge of the relationship between RT and CL is valuable, as it helps to avoid potentially erroneous predictions based on assumptions that are not directly observable and that fail when fitted to real data.
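
As an illustration of the strength-theory mapping just described, the sketch below implements an exponential decay of expected RT with distance from the criterion; the functional form and all parameter values are illustrative, not the fitted values of Atkinson and Juola (1973).

```python
# Illustrative strength-theory mapping: expected RT decays exponentially
# with the distance of memory strength from the decision criterion, so
# items near the criterion are slowest (and, under the inverse-relation
# assumption, also lowest in confidence). Parameters are hypothetical.
import math

t0, a, decay, criterion = 0.40, 0.50, 2.0, 0.0   # hypothetical parameters

def expected_rt(strength: float) -> float:
    """Expected RT (seconds) for an item with the given memory strength."""
    return t0 + a * math.exp(-decay * abs(strength - criterion))

for x in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(f"strength {x:+.1f} -> expected RT {expected_rt(x):.3f} s")
```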

In conclusion, depending on our theoretical models (e.g., continuous-process versus discrete-state models), the type of theoretical hypotheses we want to test (e.g., hypotheses about fast/slow responses versus expected changes in the parameters of continuous RT distributions), and the distribution of our data, it may be better to employ one approach over another. However, it would be worthwhile to leverage the strengths of each modeling approach to address the same scientific objective, thereby gaining robustness in our conclusions through converging evidence. By combining these approaches, we can enhance our understanding and provide more comprehensive insights into the cognitive processes under investigation.