# Item response and response time model for personality assessment via linear ballistic accumulation

## Abstract

On the basis of a combination of linear ballistic accumulation (LBA) and item response theory (IRT), this paper proposes a new class of item response models, namely LBA IRT, which incorporates the observed response time (RT) by means of LBA. Our main objective is to develop a simple yet effective alternative to the diffusion IRT model, which is one of the best-known RT-incorporating IRT models that explicitly models the underlying psychological process of the elicited item response. Through a simulation study, we show that the proposed model yields parameter estimates that correspond to those of the diffusion IRT model while converging much faster. Furthermore, the application of the proposed model to real personality measurement data indicates that it fits the data better than the diffusion IRT model in terms of predictive performance. Thus, the proposed model exhibits good performance and promising modeling capabilities in terms of capturing the cognitive and psychometric processes underlying the observed data.

## Keywords

Item response theory, Response time, Diffusion model, Linear ballistic accumulation

## 1 Introduction

In recent years, personality assessments have been widely administered via computers and tablets. Through such computerized tests, researchers can obtain not only the item responses but also the response times (RTs) of respondents. Considering the fact that every testing situation typically involves at least some time pressure, the time required to respond to items can reveal useful information about respondents’ traits (Tuerlinckx et al. 2016).

Tuerlinckx et al. (2016) conceptually classified previously proposed models for dealing with both the item responses and RTs into three types. First, there exist models that regard the RT as collateral information. These models are primarily concerned with the item response model and use the RT to improve the efficiency of parameter estimation or to provide additional information such as the detection of cheating and guessing. For example, Roskam (1987, 1997) added the logarithm of the RT as an independent variable to the logistic item response model. This class of models essentially represents the probability of a correct item response and uses the RT merely as a source of additional information. The second type of model includes normative models, which typically employ some scoring rules. For example, van der Maas and Wagenmakers (2005) introduced the correct item summed residual time (CISRT) scoring rule, which assigns the sum of all residual times for correct responses as the score of the respondent for the measurement of chess expertise. Although scoring rules are popular in games and sports, they are not commonly adopted in psychometrics because it is difficult to establish reasonable and widely acceptable scoring rules.

The third type of model includes process models, which make specific assumptions about the underlying psychological process that results in the item responses and RTs. Such models enable us to apply accumulated scientific knowledge in psychology for the direct modeling of the relationships between the observed test performance and the inner psychological process that drives the item-answering behavior. In contrast, the first and second types of models mentioned above have no obvious connection to the psychological models for representing the underlying mechanisms of such information processing; hence, it is difficult for them to consider the underlying cognitive process of the item response and RT. Nevertheless, most previously proposed RT-incorporating item response models are of the first or second type (van der Linden 2016).

By explicitly taking into account the cognitive process underlying the observed responses, we can obtain psychologically meaningful parameter estimates such as the speed of information uptake and the amount of information used to make a decision. This helps us to quantify and empirically test psychological theories in an applied context. In fact, process models have been successfully applied to understand and explain the processes underlying a variety of topics in human sciences, such as memory, familiarity effects, perceptual judgments, and decision making (see van Ravenzwaaij and Oberauer 2009). For example, Ratcliff et al. (2007) applied a process model to examine why the elderly take more time than younger ones in lexical decision tasks. Their finding based on the obtained parameter estimates is that this is not due to the decreased rate of information processing but because older respondents set their response boundaries more conservatively. Palada et al. (2016) also applied process models to response data with complex stimuli. In their study, respondents tended to perform a task faster when the stimulus is more complex. On the basis of the parameter estimates of process models, they found that this is not because the respondents’ capacity increases with the complexity of stimuli (which is a phenomenon called “super-capacity”) but because respondents set a lower threshold for a response when the task is complex. As Donkin et al. (2011, p. 61) state, “this kind of conclusion would have been very difficult to draw without a cognitive model of choice RT.”

On the basis of these considerations, we focus on process models in this study. First, we review the relevant literature.

In the field of cognitive and mathematical psychology, which has a rich tradition of RT modeling (Luce 1986; Voss et al. 2013), psychological models have been developed to explain the course of information processing that leads to the observed human response and the associated RT. Two of the best-known models are the diffusion model and the linear ballistic accumulation (LBA) model. By introducing several cognitive parameters, as elaborated in the next section, the diffusion and LBA models address the tradeoff between the response accuracy and the speed, thereby facilitating the quantification of individual performance. The diffusion model, originally formalized by Ratcliff (1978) on the basis of preceding studies such as Stone (1960) and Laming (1968), is a representative model that considers the underlying response generation process. In contrast, the LBA model was proposed by Brown and Heathcote (2008), who aimed to propose a simple yet effective alternative to the diffusion model. They demonstrated that the LBA model accounts for many important empirical phenomena from choice tasks, such as the speed difference between correct and incorrect responses and the shape of the speed–accuracy tradeoff function.

The original diffusion and LBA models do not distinguish respondent and item parameters, which are fundamental characteristics of item response theory (IRT) models. However, considering the fact that RT-incorporating IRT models are meant to be applied to item response data obtained as a result of internal human information processing, it would be reasonable to believe that the combination of traditional IRT models for the item response and psychological models for the RTs would lead to the emergence of a novel, important, and practical class of models. On the basis of this idea, the D-diffusion IRT model (Tuerlinckx and De Boeck 2005; van der Maas et al. 2011) has been proposed as a novel category of RT-incorporating IRT models and is currently the most representative process model for item-answering behavior for personality assessments. It is an elegant combination of the diffusion and IRT models with both respondent and item parameters. Its respondent parameters provide psychological insights into the underlying cognitive properties of human information processing.

Nevertheless, the diffusion model is a complex nonlinear model. It is expressed as a sum of infinite series, and the parameters vary across trials. Wagenmakers et al. (2007) suggested that mathematical psychologists use such a complicated model because of the substantial payoff involved; the estimated parameter values of the diffusion model can provide psychological insights that cannot be provided by standard superficial methods of analysis. However, when combined with IRT in the form of the diffusion IRT model, the diffusion model becomes even more complex. The increased complexity of the model might prevent its application to real datasets. This is a critical issue in practice, especially when the data have a large sample size or when the hierarchical structure of the data is to be taken into account.

In addition, in the diffusion IRT model, the discriminability and expected RT have a linear relationship, as we elaborate and illustrate in Sect. 3.2. The discriminability is equivalent to the slope of the logistic curve, i.e., how well an item can distinguish high-scoring people from low-scoring ones. However, this linear relationship may be a strong and unrealistic assumption. In the case of a reaction time task in cognitive psychology, which is the original context modeled with the diffusion model, the expected RT is typically shorter than a few seconds; in that case, the linearity assumption might work as a first approximation. However, when the expected RT is longer than a few seconds, which is actually the case in typical personality assessments, the estimates of the discriminability could be considerably inflated owing to this linear relationship. One possible way to deal with this problem is to add another parameter to moderate the relationship. However, as noted above, the diffusion IRT model is already a complex model. It would be better not to increase the number of parameters further, in the interest of estimation efficiency. Therefore, this study instead focuses on a simpler alternative to the diffusion IRT model.

Donkin et al. (2011) investigated whether and to what extent conclusions regarding psychological processes depend on the choice between the diffusion and LBA models. They found a largely straightforward correspondence between the parameters of the two models. In fact, for the diffusion and LBA models, they concluded that “inferences about psychological processes made from real data are unlikely to depend on the model that is used” (p. 61).

On the basis of the abovementioned observations, the central idea of this study is that the combination of the LBA and IRT models would be beneficial as a novel model for modern test data. In particular, it would provide us with insights into the underlying psychological process, distinguish the item and respondent characteristics, and facilitate faster and more stable estimation than the diffusion IRT models, even when new parameters are included. Thus, the objectives of the present study are to propose a new LBA IRT model and to comparatively evaluate it against existing diffusion IRT models using simulated and real data.

The remainder of the paper is organized as follows. Section 2 reviews the existing diffusion and diffusion IRT models. Section 3 introduces the proposed LBA IRT model. Section 4 describes a simulation study conducted to compare the performance of the proposed LBA IRT model with that of existing diffusion IRT models. Section 5 provides an empirical illustration of the proposed and existing methods using a real personality dataset. Finally, Sect. 6 concludes the paper with a brief discussion of the directions for future research.

## 2 Existing models

### 2.1 Diffusion model

In the diffusion model, information accumulation begins from the starting point (*z*). Once it reaches the upper (\(\alpha\)) or lower (0) boundary, the respondent answers the item. The upper boundary corresponds to the correct response, and the lower boundary corresponds to the incorrect response. The increase in the amount of information in unit time follows a normal distribution with a mean *v* and within-trial variance \(s^{2}\).

The diffusion model has four main parameters. \(\alpha\) represents the distance between the upper and lower boundaries. A larger value of \(\alpha\) means that a longer time is required to answer the item, which suggests that the respondent’s choice is more deliberate. *z* denotes the starting point. If the subject has a positive response bias to the item from the beginning, *z* becomes larger; thus, the starting point approaches the upper bound (\(\alpha\)), which leads to a higher probability of giving the correct answer. When there is no response bias, i.e., when the respondent does not favor or avoid one of the choice alternatives at the beginning (Leite and Ratcliff 2011), *z* equals \(\alpha /2\). *v* represents the average slope of the information accumulation process. The process approaches the upper (resp. lower) bound when *v* is positive (resp. negative). \(\tau\) represents the nonresponse time, i.e., the duration of nondecisional processes, which may comprise basic encoding processes.
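As an illustration of how the four parameters shape the observed data, the following minimal sketch simulates the accumulation process with a simple Euler discretization. The function names, the step size `dt`, and the default `s = 1` are assumptions made for this example; it is not the estimation code used in this study.

```python
import math
import random


def simulate_diffusion_trial(alpha, z, v, tau, s=1.0, dt=0.001, rng=random):
    """Simulate one trial of a two-boundary diffusion process (Euler scheme).

    Returns (x, rt): x is 1 if the upper boundary alpha is hit first and
    0 if the lower boundary 0 is hit first; rt includes the nondecision time tau.
    """
    evidence = z
    elapsed = 0.0
    step_sd = s * math.sqrt(dt)  # within-trial noise per time step
    while 0.0 < evidence < alpha:
        evidence += v * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
    x = 1 if evidence >= alpha else 0
    return x, tau + elapsed


def summarize(alpha, z, v, tau, n_trials=2000, seed=1):
    """Return (proportion of upper-boundary responses, mean RT) over many trials."""
    rng = random.Random(seed)
    trials = [simulate_diffusion_trial(alpha, z, v, tau, rng=rng)
              for _ in range(n_trials)]
    p_upper = sum(x for x, _ in trials) / n_trials
    mean_rt = sum(rt for _, rt in trials) / n_trials
    return p_upper, mean_rt
```

With `z = alpha / 2`, increasing `alpha` lengthens the simulated RTs, and a positive `v` raises the proportion of upper-boundary responses, in line with the parameter interpretations above.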

Let *X* be a binary observed variable that takes the value 1 when the correct answer is observed (or the accumulation reaches the upper bound) and 0 when the wrong answer is observed (or the accumulation reaches the lower bound). To derive the formulation of the diffusion model, consider a discrete random walk process at first. In the process, the accumulator goes up one unit with a probability *p* and goes down one unit with a probability \(q=1-p\). Let us consider the probability that the accumulator reaches the lower boundary when it starts from *z*, which we denote by \(P(X=0|z)\). From the above assumption, \(P(X=0|z)\) must satisfy the recursive equation \(P(X=0|z)=pP(X=0|z+1)+qP(X=0|z-1).\) By solving this equation with the boundary conditions \(P(X=0|0)=1\) and \(P(X=0|\alpha )=0\), \(P(X=0|z)\) becomes (for \(p\ne q\))

\[P(X=0|z)=\frac{(q/p)^{\alpha }-(q/p)^{z}}{(q/p)^{\alpha }-1}.\]

Next, let \(p(X=0,n|z)\) denote the probability that the accumulator reaches the lower boundary at the *n*-th step when it starts from *z*. \(p(X=0,n|z)\) must satisfy the recursive equation \(p(X=0,n+1|z)=pp(X=0,n|z+1)+qp(X=0,n|z-1).\) As a result, \(p(X=0,n|z)\) becomes

The joint distribution of the response (*X*) and RT (*T*) in the diffusion model is represented as

### 2.2 Diffusion IRT model

When *z* is set to \(\alpha /2\) (no response bias), the probability that respondent *i* answers the *j*-th item correctly is given as

We use the subscripts *i* and *j* for the parameters of the IRT-based models but suppress them for the original diffusion and LBA model parameters. This is because, whereas the separation between item and respondent parameters is the key characteristic of IRT models, the original diffusion and LBA models essentially do not distinguish them. In the diffusion IRT model, this separation can be represented as

IRT models are typically used for two different types of psychological measurement: personality measurement such as the Big Five scale and ability measurement such as college entrance examinations. Corresponding to these two scenarios, two types of diffusion IRT models with different functional forms have been developed.

In the D-diffusion model, *v* is expressed as the *difference* between the item threshold and the respondent’s latent trait. In this model, the specific form of Eq. (6) is given as

The Q-diffusion model expresses *v* as the *quotient* of the item difficulty and respondent’s ability. In this model, the specific form of Eq. (6) is given as

Here, \(\theta\) and *b* are restricted to be nonnegative and positive, respectively. This corresponds to the assumption that a more competent respondent tends to answer faster than a less competent one. In addition, when a respondent has the lowest competence and answers a two-alternative forced choice item at random, the probability of obtaining a correct response should equal 50%. In Eq. (8), the lower bound of \(v_{ij}\) is 0 (when \(\theta _i =0\)), and the probability of reaching the upper bound (i.e., choosing the correct response) is exactly 50% in this case. In this manner, the Q-diffusion model takes into account the effect of guessing. The reader may refer to van der Maas et al. (2011) and Tuerlinckx et al. (2016) for further details regarding the differences between the D- and Q-diffusion models.
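The difference between the two drift-rate forms can be sketched numerically. The example below assumes an unbiased starting point (\(z=\alpha /2\)) and \(s=1\), under which the probability of reaching the upper boundary of a diffusion process is a logistic function of \(\alpha v\). It also assumes that the boundary is \(\gamma _i/a_j\) in both models and that the drift rates are \(\theta _i-b_j\) and \(\theta _i/b_j\), following the verbal descriptions in the text. The function names are hypothetical.

```python
import math


def p_upper(alpha, v):
    """P(upper boundary is reached first), assuming z = alpha / 2 and s = 1."""
    return 1.0 / (1.0 + math.exp(-alpha * v))


def d_diffusion_prob(theta, b, gamma, a):
    """D-diffusion: drift is the difference theta - b (assumed form)."""
    alpha = gamma / a  # assumed boundary separation gamma_i / a_j
    return p_upper(alpha, theta - b)


def q_diffusion_prob(theta, b, gamma, a):
    """Q-diffusion: drift is the quotient theta / b (assumed form)."""
    if theta < 0 or b <= 0:
        raise ValueError("Q-diffusion requires theta >= 0 and b > 0")
    alpha = gamma / a  # assumed boundary separation gamma_i / a_j
    return p_upper(alpha, theta / b)
```

Under these assumptions, `q_diffusion_prob` never falls below 0.5, which reflects guessing by the least competent respondent, whereas `d_diffusion_prob` covers the whole interval from 0 to 1 and equals 0.5 when `theta == b`.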

### 2.3 Linear ballistic accumulation model

Brown and Heathcote (2008) proposed a simple cognitive model called the LBA model, which is schematically illustrated in Fig. 2. In the LBA model, information regarding a certain choice of an item is linearly accumulated with time, whereas the amount of accumulation is normally distributed in the diffusion model. Once the item is presented to the respondent, the evidence toward each choice of the item accumulates independently (the accumulation of one choice is irrelevant to that of any other choice) and linearly (the amount of accumulation does not change with time). This means that the model assumes no within-trial variance (\(s^2=0\)). Instead, the LBA model introduces the between-trial variance \(\eta ^2\), which indicates that the amount of information accumulation over time varies between each trial, even if a respondent answers the same item repeatedly. When the information for any one choice reaches the boundary (\(\beta\)), a corresponding response is provided by the respondent. The starting point of evidence accumulation for each choice is a random realization between 0 and *A*, and the amount of evidence accumulated in unit time is a realization from a normal distribution with a mean *v* and between-trial variance \(\eta ^2\). All choices have *A* and \(\beta\) in common, whereas *v* differs among the choices.
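The generative process described above can be sketched as a short simulation. This is a minimal illustration rather than the authors' implementation: the function name is hypothetical, \(\beta \ge A\) is assumed, and redrawing a trial in which no accumulator has a positive slope is one common convention adopted here as an assumption.

```python
import random


def simulate_lba_trial(drift_means, beta, A, eta, tau, rng=random):
    """Simulate one LBA trial; returns (choice, rt) with choice in 1..K.

    Assumes beta >= A so that every start point lies at or below the boundary.
    """
    while True:
        finish_times = []
        for v_k in drift_means:
            start = rng.uniform(0.0, A)   # start point ~ U[0, A]
            slope = rng.gauss(v_k, eta)   # between-trial drift ~ N(v_k, eta^2)
            # A nonpositive slope never reaches the boundary.
            finish_times.append((beta - start) / slope if slope > 0 else float("inf"))
        fastest = min(finish_times)
        if fastest != float("inf"):
            choice = finish_times.index(fastest) + 1
            return choice, tau + fastest
        # No accumulator had a positive slope: redraw the trial (assumption).
```

For example, `simulate_lba_trial((0.7, 0.3), beta=1.0, A=0.5, eta=0.3, tau=0.2)` returns the winning response category together with an RT that always exceeds the nondecision time `tau`.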

*v*, \(\tau\), and \(\beta - \frac{A}{2}\) in the LBA model can be interpreted in a similar manner as *v*, \(\tau\), and \(\alpha\) in the diffusion model, respectively. However, the LBA model differs from the diffusion model in two major aspects. First, each choice has its own *v*, while the standard deviation of the drift rate, denoted by *s*, is common to all choices. On the contrary, in the diffusion model, the information for each choice accumulates dependently; for example, approaching one choice (e.g., the upper bound) indicates drifting away from the other choice (i.e., the lower bound), which is shown in Fig. 1. Hence, the diffusion model has only one drift rate parameter (*v*). In addition, because of the problem of scaling, the standard deviation of the drift rate is typically fixed. Second, *A* indicates the upper bound of the starting point, whereas *z* in the diffusion model corresponds to the starting point of information processing.

Brown and Heathcote (2008) have pointed out several differences between the LBA model and other decision-making models. First, the EZ-diffusion model (Wagenmakers et al. 2007) is simpler than the LBA model, but Wagenmakers et al. note that the EZ-diffusion model was developed for expressing data in a simple form rather than fully modeling the cognitive process. Therefore, it may not reflect all of the important features of the RT. For example, the EZ-diffusion model assumes that the RT distributions of the correct and incorrect responses are identical, which can be an unrealistically strong assumption in practice. On the other hand, the LBA model is considered as the simplest yet complete decision-making model of the RT because the model successfully accounts for important empirical phenomena of the choice RT, such as the speed–accuracy tradeoff and the relative speed of correct vs. incorrect responses (Brown and Heathcote 2008). Similarly, other choice RT models such as the latency model (Grice 1968) and the LATER model (Reddi and Carpenter 2000) can be seen as simplifications of the diffusion model that assume no trial-to-trial variability among evidence accumulation processes. In these models, however, the model-predicted RT distribution of the incorrect choice is negatively skewed, and this would never occur in the observed data. This inadequate negative skew is caused by projecting the tail of a normal distribution that generates negative slopes (Brown and Heathcote 2008). On the other hand, the LBA model represents accumulation processes for every choice; hence, the RT distributions of all choices can be fitted well. Finally, the LBA model has simple analytic solutions for choices among any number of different alternatives.

### 2.4 Relation between the diffusion and LBA models

Donkin et al. (2011) conducted a parameter recovery simulation study to examine the relation between the diffusion and LBA models. They considered the relationship between the two models with the following settings. (1) In the LBA model, the sum of *v* should be one. When the LBA model is compared with the diffusion model, \(v^{(2)}\) becomes \(1-v^{(1)}\). Here, \(v^{(1)}\) is the mean of the slope for the first response category (\(k=1\); e.g., “I agree”), and \(v^{(2)}\) is that of the second response category (\(k=2\); e.g., “I disagree”). (2) In the diffusion model, the distance from the starting point to the boundary is \(\frac{\alpha }{2}\). In the LBA model, the starting point is uniformly distributed from 0 to *A*. Therefore, the expected distance from the starting point to the boundary becomes \(\beta -\frac{A}{2}\). Accordingly, they compared \(\frac{\alpha }{2}\) with \(\beta -\frac{A}{2}\) rather than comparing \(\alpha\) with \(\beta\) directly. In their simulation study, simulated data were generated and estimated with both models. Their results indicated the existence of a nearly one-to-one relationship with regard to the drift rate or nondecision time parameters, while the boundary parameters did not exhibit simple mapping (even though they have a fairly high correlation).

On the other hand, there are also studies that discuss the differences between the diffusion and LBA models. For example, Heathcote and Hayes (2012) pointed out that the parameters of the two models would result in equivalent inferences under some conditions and different inferences under other conditions. This is not surprising because the precise functional forms of the diffusion and LBA models are different. Thus, caution may be needed when qualitatively translating a parameter estimate of one model to the other. In general, however, most core parameters of the diffusion and LBA models are comparable and have similar empirical meanings (Heathcote et al. 2015).

## 3 Proposed model

We apply the LBA framework of modeling the RT data, which has been proved to be useful in the field of cognitive and mathematical psychology, to IRT models that are popular in psychometrics. For this purpose, we first present the original formulation of the LBA model (Brown and Heathcote 2008) in this section and then reparameterize it to yield the proposed LBA IRT model. This reparameterization allows us to combine the strengths of the LBA and IRT models, which are both popular in different fields. Note that in this study, we only focus on two-alternative (\(k=1, 2\)) forced choice tasks, although the original LBA model applies to *K*-alternative (\(k = 1,\ldots , K\)) tasks.

Let *c* be the random value derived from \(U[\beta -A,\beta ]\). This *c* represents the distance from the start point, which is randomly derived from *U*[0, *A*], to the threshold \(\beta\). Moreover, let *d* be the random value derived from \(N(v^{(k)},\eta ^2)\). Then, the cumulative distribution function of RT (*T*) for the *k*-th choice is given as

The cumulative distribution function of RT (*T*) for the *k*-th response category (\(k=1, 2\)) is re-expressed as (for the detailed derivation, see Appendix A in Brown and Heathcote 2008)

The corresponding probability density function is obtained by differentiating it with respect to *t*:

These functions describe the time at which response *k* reaches the boundary, regardless of any other response. However, in typical applications of the LBA model, we can only observe the time for a single (chosen) choice to reach the threshold; the times for the other choices are not known. Therefore, we need to derive the *defective* (meaning not summing to 1) distribution, which is the distribution of a certain choice reaching the threshold before any other choice at time *t*. From Eqs. (10) and (11), we obtain the defective density function of choice *k* with RT *t* as

The probability of choosing response category *k* can be derived by the integration of Eq. (12) over all \(t (0\le t\le \infty )\) or by evaluating the corresponding cumulative distribution function at \(t \rightarrow \infty\). Evidently, the result cannot be written as a logistic (or normal ogive) function in the form of Eq. (5). Therefore, it is impossible to obtain a simple relationship between the functional forms of the LBA and IRT equations from Eq. (12). Nevertheless, we can adopt a numerical approach, namely the Markov chain Monte Carlo (MCMC) estimation method, for the proposed model.
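To make the quantities discussed in this section concrete, the sketch below implements the closed-form LBA distribution and density functions as given by Brown and Heathcote (2008), the defective density for one choice, and a numerical approximation of the choice probability. The function names, the trapezoidal integration, and its settings (`t_max`, `step`) are assumptions for illustration. Here, `t` denotes the decision time (the RT minus \(\tau\)), and \(\beta > A\) is assumed.

```python
import math


def _norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lba_cdf(t, v, beta, A, eta):
    """P(an accumulator with mean drift v has reached beta by decision time t > 0)."""
    u_low = (beta - A - t * v) / (t * eta)
    u_high = (beta - t * v) / (t * eta)
    return (1.0
            + (beta - A - t * v) / A * _norm_cdf(u_low)
            - (beta - t * v) / A * _norm_cdf(u_high)
            + (t * eta) / A * (_norm_pdf(u_low) - _norm_pdf(u_high)))


def lba_pdf(t, v, beta, A, eta):
    """Derivative of lba_cdf with respect to t."""
    u_low = (beta - A - t * v) / (t * eta)
    u_high = (beta - t * v) / (t * eta)
    return (v * (_norm_cdf(u_high) - _norm_cdf(u_low))
            + eta * (_norm_pdf(u_low) - _norm_pdf(u_high))) / A


def lba_defective_pdf(t, k, drifts, beta, A, eta):
    """Density of choice k (0-based index) finishing at t before all other choices."""
    density = lba_pdf(t, drifts[k], beta, A, eta)
    for j, v_j in enumerate(drifts):
        if j != k:
            density *= 1.0 - lba_cdf(t, v_j, beta, A, eta)
    return density


def choice_probability(k, drifts, beta, A, eta, t_max=60.0, step=0.005):
    """Trapezoidal approximation of the integral of the defective density."""
    n_steps = int(t_max / step)
    total = 0.0
    previous = 0.0  # the density vanishes as t -> 0 when beta > A
    for i in range(1, n_steps + 1):
        current = lba_defective_pdf(i * step, k, drifts, beta, A, eta)
        total += 0.5 * (previous + current) * step
        previous = current
    return total
```

The choice probabilities obtained this way sum to slightly less than 1 because, with normally distributed slopes, there is a small probability that no accumulator ever reaches the boundary. No logistic closed form emerges from this integral, which is why a numerical estimation approach is needed.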

### 3.1 Parameter settings

As stated in Sect. 2.2, there exist two classes of diffusion IRT models. In this study, we chose to extend the D-diffusion model for the following reasons. First, \(\theta\) and *b* of the D-diffusion model can be regarded as nearly the same as those of traditional IRT, whereas those of Q-diffusion are restricted in that they cannot take negative values. Second, *v* in D-diffusion is simply the *difference* between \(\theta\) and *b*; therefore, it is easier to estimate than in the case of Q-diffusion, where *v* is a *quotient*. Third, the responses of personality measurement tend to be faster than those of ability measurement. The LBA and diffusion models were typically applied to cognitive tasks, the RT of which is typically less than a few seconds (although there are recent exceptions; see Palada et al. 2016). In general, ability measurements require a much longer RT than personality measurements, and the model properties under such conditions are less well-known. Therefore, we adopt D-diffusion. Accordingly, the proposed model is called the *D-LBA IRT* model (we may simply call it the D-LBA model hereafter).

*Boundary* Donkin et al. (2011) showed that \(\alpha\) in the diffusion model can be interpreted as nearly the same as \(\beta -\frac{A}{2}\) of the LBA model. To retain the same number of parameters in the D-LBA model as that in the D-diffusion model, we need to introduce a parameter constraint. For this purpose, we consider *A* as fixed in this study; specifically, we set

*Drift rate* We can treat the drift rate in the same manner as in the case of the D-diffusion model, except that it needs to satisfy an identification constraint, which is required for latent variable models. In the original LBA model, a common way of incorporating this constraint is to set the sum of the drift rates among the alternative choices to be one. We follow this approach by letting *v* be the difference between \(\theta\) and *b* scaled by a logistic function:

*Nondecision time* The nondecision time is nearly the same as that in the case of D-diffusion. We set the nondecision time as the item parameter \(\tau _{j}\).
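The drift-rate constraint described above can be sketched as follows. The sketch assumes that the scaling is the standard logistic function applied to \(\theta _i-b_j\), so that the two drift rates lie between 0 and 1 and sum to one. The function name is hypothetical, and the exact functional form used in the model may differ.

```python
import math


def d_lba_drift_rates(theta, b):
    """Return (v1, v2) for the two response categories, with v1 + v2 = 1.

    Assumes v1 = logistic(theta - b); this form is an illustrative assumption.
    """
    v1 = 1.0 / (1.0 + math.exp(-(theta - b)))
    return v1, 1.0 - v1
```

Under this assumption, `d_lba_drift_rates(0.0, 0.0)` gives equal drift rates for both categories, and a larger `theta - b` shifts the drift toward the first response category.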

*Between-trial variability of the slope* As with the LBA model, the diffusion model involves the identification issue. To deal with this in the diffusion model, the within-trial variance of the slope, *s*, has to be fixed at a certain value such as 0.1 or 1. This is due to the indeterminacy of the scale among \(s, \alpha\), and *v* in the diffusion model. On the other hand, the LBA model has no within-trial variance *s*; instead, it incorporates the between-trial variance \(\eta\). By decomposing this variance into the item and respondent parameters, the proposed model can solve the problem of the linear relationship between them, which was, as we noted before, one of the major problems of the diffusion IRT model. Specifically, the proposed model decomposes the between-trial variance \(\eta\) into person and item factors:

### 3.2 Meanings of the parameters in the D-diffusion and D-LBA models

To facilitate the understanding of parameters, in this section, we briefly show the relationships between the parameter values and the observed quantities, which are the RTs and item responses.

*Expected RT* Figure 3 shows the relationships between the expected RTs and \((\theta _i-b_j)\). This \((\theta _i-b_j)\) is equal to the drift rate in the D-diffusion model (Eq. 7) and is in one-to-one correspondence with the drift rate in the D-LBA model (Eq. 15); thus, we collectively call it the drift rate component here. Each of the three lines corresponds to a different boundary parameter value. We can observe two major characteristics common to both models. First, the expected RT is longer when the absolute difference \(|\theta _i-b_j|\) is close to zero. Second, the expected RT is longer when the boundary (\(\gamma _{i}/a_{j}\)) is larger. In the D-diffusion model, the expected RT is given by (Tuerlinckx et al. 2016)

*Item response* Figure 5 shows the relationship between the probability of choosing the first response category and the drift rate component \((\theta _i-b_j)\). Each of the three lines corresponds to a different boundary parameter value. In the D-diffusion model, the discriminability differs in conjunction with the boundary parameters. In the D-LBA model, on the other hand, the discriminability does not differ even when the boundary parameters differ. This suggests that, in the D-LBA model, the boundary parameters only affect the expected RT. Here, Fig. 6 indicates how the between-trial variability in the slope, \(\eta _{ij}\), affects the discriminability and expected RT. The results suggest that \(\eta _{ij}\) affects both the discriminability and expected RT.

The relationship shown in Fig. 6 illustrates the advantage of the proposed D-LBA model. By incorporating two different parameters, \(\alpha _{ij}\) and \(\eta _{ij}\), the D-LBA model is free from the strong and unrealistic assumption of linearity between the discriminability and the expected RT. In contrast, the D-diffusion model imposes this linearity, which may limit its applicability to empirical data with longer RTs than those in the typical context of the diffusion model.
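The coupling in the D-diffusion model can be made concrete with a small sketch. It assumes \(s=1\), an unbiased starting point, a boundary of \(\gamma _i/a_j\), and the standard expression for the mean decision time of a diffusion process, \(\frac{\alpha }{2v}\tanh (\alpha v/2)\), which tends to \(\alpha ^2/4\) as \(v\rightarrow 0\). The function names are hypothetical.

```python
import math


def d_diffusion_expected_rt(theta, b, gamma, a, tau):
    """Expected RT in the D-diffusion model, assuming s = 1 and z = alpha / 2."""
    alpha = gamma / a  # boundary separation
    v = theta - b      # drift rate component
    if abs(v) < 1e-8:
        return tau + alpha ** 2 / 4.0  # limit of the expression below as v -> 0
    return tau + alpha / (2.0 * v) * math.tanh(alpha * v / 2.0)


def d_diffusion_discrimination(gamma, a):
    """Slope of the logistic response curve in (theta - b), assuming s = 1."""
    return gamma / a
```

In this sketch, both functions depend on the person and item only through the single boundary `gamma / a` (apart from the drift component), so a longer expected RT necessarily comes with a steeper response curve. The D-LBA model relaxes this coupling through the additional parameter \(\eta _{ij}\).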

### 3.3 Prior distribution

In the original LBA model, some parameters (viz., \(\beta\), *A*, and \(\eta\)) are permitted to differ between different response categories. As pointed out by Heathcote and Love (2012), allowing this flexibility can make the model fit better. However, in this study, we set these parameters to be equal between response categories for the following reasons. First, the proposed D-LBA model is meant to be a simpler alternative to the D-diffusion model; thus, we would not want to increase the number of parameters unless they are of substantial importance. Second, this constraint facilitates faster and more stable estimation.

## 4 Simulation study

In this simulation study, we investigated several issues. First, we assessed parameter recovery. If the proposed model cannot properly recover parameters with the simulation data, it is pointless to examine other issues. Second, we investigated the similarity between the D-diffusion and D-LBA models. Because the parameters of these two models do not analytically map to each other, the empirical correspondence between the parameters needs to be investigated. Although the D-LBA and D-diffusion models may have similar parameter interpretations, they have different parameter scales. Hence, we evaluated the similarity by correlations rather than absolute differences. Third, we compared both models in terms of the number of iterations to convergence. Previous studies have shown that the LBA model is simpler than the diffusion model. If so, it may be natural to expect the D-LBA model to converge faster than D-diffusion, even if new parameters are added to the D-LBA model. Finally, we examined the model performance according to the information criteria.

*Conditions* In this simulation, we generated simulation data from both the D-diffusion and D-LBA models, and we estimated the parameters using both models. In addition, we set \(3\times 3=9\) conditions by combining the following factors: (1) the number of respondents \(I=(100,300,500)\) and (2) the number of items \(J=(10,20,30)\). Overall, we simulated \(3\times 3\times 2\times 2=36\) conditions with 20 replications for each condition.
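The design can be enumerated explicitly, which makes the count of 36 conditions easy to verify. The variable names and dictionary keys below are chosen for illustration only.

```python
from itertools import product

respondents = (100, 300, 500)        # number of respondents I
items = (10, 20, 30)                 # number of items J
models = ("D-diffusion", "D-LBA")    # used for both generation and estimation

# Cross sample sizes with every generating/estimating model pair.
conditions = [
    {"I": I, "J": J, "generating_model": gen, "estimation_model": est}
    for I, J, gen, est in product(respondents, items, models, models)
]

n_replications = 20
n_model_fits = len(conditions) * n_replications
```

Here `len(conditions)` is 36, and with 20 replications per condition the design implies 720 model fits in total.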

*Data generation* To generate simulation data from the D-LBA IRT model, the following distributions were used:

*s* to 1 in the diffusion model.

Using the true parameter values generated from Eq. (21), we generated simulation data with the rdiffusion function in the rtdists R package. In this process, we found that a very small proportion of the generated RTs were greater than 120 s. However, such data make estimation (calculation of the log-probability) difficult and unstable for technical reasons, especially in the case of diffusion IRT. Moreover, in practice, the observation of such large RT data is unlikely; if they exist, they are usually excluded from the analysis as “lazy” responses. For these reasons, we used only data for which the RT was less than 120 s.

Descriptive statistics of the RTs generated by the simulation

| Generation model | > 120 s | Mean | 2.5% | 50% | 97.5% |
|---|---|---|---|---|---|
| D-diffusion | 0.054% | 2.111 s | 0.254 s | 1.066 s | 10.587 s |
| D-LBA | 0.007% | 1.539 s | 0.282 s | 0.907 s | 6.635 s |
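The exclusion rule and the kind of summary shown in the table can be sketched as follows. This is an illustrative helper, not the authors' code: the function names are hypothetical, the percentile uses simple linear interpolation, and whether the published statistics were computed before or after exclusion is not stated, so computing them on the retained RTs is an assumption.

```python
import statistics

def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("no values to summarize")
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def summarize_rts(rts, cutoff=120.0):
    """Drop RTs at or above the cutoff (in seconds) and summarize the rest."""
    kept = sorted(rt for rt in rts if rt < cutoff)
    return {
        "prop_excluded": 1.0 - len(kept) / len(rts),
        "mean": statistics.fmean(kept),
        "p2.5": percentile(kept, 2.5),
        "p50": percentile(kept, 50),
        "p97.5": percentile(kept, 97.5),
    }

# Toy RTs in seconds; the last value would be excluded.
example = summarize_rts([0.4, 0.9, 1.1, 2.5, 130.0])
```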

*Prior distributions* For estimation with the proposed D-LBA model, we used the priors in Eq. (20). We used the same priors for the D-diffusion model, except for those on \(\psi _j\) and \(\sigma _i\), which appear only in the D-LBA model.

For all of the estimation results presented below, we used R (3.4.0) and rstan 2.15.0 on a Windows 10 PC. Three MCMC chains were run for each dataset. The number of MCMC iterations per chain was 10,000, of which the first 9000 were discarded as warmup. The Stan code for LBA was obtained from the work of Annis et al. (2016) and extended to the D-LBA model. The Stan code used in this study is provided on the Open Science Framework (https://osf.io/ck7fr/). We used the posterior means of the parameters as their point estimates.

### 4.1 Results

*Parameter recovery (RMSE and bias)* Table 2 lists the mean root-mean-square error (RMSE) values for each condition when the generation and estimation models are the same. The RMSE for the parameter \(\theta _i\), for example, is calculated by

\[\mathrm{RMSE}(\theta ) = \sqrt{\frac{1}{I}\sum _{i=1}^{I}\left( {\hat{\theta }}_i-\theta _i\right) ^2},\]

where \({\hat{\theta }}_i\) is the point estimate and \(\theta _i\) is the true value. As *I* increases, the RMSEs for all item parameters decrease in both the D-diffusion and D-LBA models. On the other hand, the RMSEs for the respondent parameters did not decrease as a function of *I*. This can be attributed to the well-known “Neyman–Scott paradox” (Neyman and Scott 1948). Specifically, in some IRT models, the estimates of the respondent parameters do not converge to the true value even in the large-sample limit because the number of parameters also increases as the number of respondents increases.
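The two recovery measures can be computed along the following lines. This is a minimal sketch; in particular, defining bias as the mean of estimate minus true value is an assumption, since the definition is not given in this section.

```python
import math

def rmse(estimates, truths):
    """Root-mean-square error between point estimates and true values."""
    n = len(truths)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)

def bias(estimates, truths):
    """Mean signed error (estimate minus truth); assumed definition."""
    return sum(e - t for e, t in zip(estimates, truths)) / len(truths)

# Toy values standing in for posterior means and true parameters.
theta_true = [1.0, 2.0, 5.0]
theta_hat = [1.0, 2.0, 3.0]
print(rmse(theta_hat, theta_true), bias(theta_hat, theta_true))
```

A negative bias, as seen for \(a_j\) in both models, then means that the estimates fall below the true values on average.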

Mean RMSE for each model

| Model | \(I\) | \(J\) | \(a_j\) | \(\log (\gamma _i)\) | \(b_j\) | \(\theta _i\) | \(\tau _j\) | \(\psi _j\) | \(\log (\sigma _i)\) |
|---|---|---|---|---|---|---|---|---|---|
| D-Diffusion | 100 | 10 | 0.062 (0.047) | 0.175 (0.075) | 0.144 (0.050) | 0.465 (0.046) | 0.003 (0.002) | – | – |
| D-Diffusion | 100 | 20 | 0.119 (0.054) | 0.224 (0.079) | 0.152 (0.050) | 0.391 (0.051) | 0.003 (0.001) | – | – |
| D-Diffusion | 100 | 30 | 0.213 (0.080) | 0.348 (0.103) | 0.144 (0.040) | 0.364 (0.052) | 0.003 (0.001) | – | – |
| D-Diffusion | 300 | 10 | 0.032 (0.022) | 0.150 (0.024) | 0.093 (0.031) | 0.482 (0.032) | 0.001 (0.000) | – | – |
| D-Diffusion | 300 | 20 | 0.039 (0.026) | 0.119 (0.034) | 0.094 (0.027) | 0.390 (0.028) | 0.001 (0.000) | – | – |
| D-Diffusion | 300 | 30 | 0.055 (0.033) | 0.128 (0.048) | 0.085 (0.028) | 0.338 (0.028) | 0.001 (0.000) | – | – |
| D-Diffusion | 500 | 10 | 0.029 (0.016) | 0.145 (0.017) | 0.070 (0.026) | 0.471 (0.023) | 0.001 (0.000) | – | – |
| D-Diffusion | 500 | 20 | 0.035 (0.018) | 0.115 (0.021) | 0.072 (0.021) | 0.389 (0.027) | 0.001 (0.000) | – | – |
| D-Diffusion | 500 | 30 | 0.037 (0.016) | 0.108 (0.024) | 0.065 (0.012) | 0.340 (0.018) | 0.001 (0.000) | – | – |
| D-LBA | 100 | 10 | 0.230 (0.128) | 0.302 (0.056) | 0.435 (0.158) | 0.670 (0.071) | 0.010 (0.004) | 0.525 (0.208) | 0.509 (0.046) |
| D-LBA | 100 | 20 | 0.323 (0.167) | 0.276 (0.054) | 0.429 (0.138) | 0.548 (0.058) | 0.010 (0.004) | 0.616 (0.219) | 0.401 (0.047) |
| D-LBA | 100 | 30 | 0.456 (0.171) | 0.288 (0.051) | 0.381 (0.096) | 0.523 (0.058) | 0.010 (0.003) | 0.674 (0.228) | 0.348 (0.045) |
| D-LBA | 300 | 10 | 0.120 (0.057) | 0.281 (0.034) | 0.285 (0.133) | 0.684 (0.055) | 0.005 (0.003) | 0.330 (0.180) | 0.485 (0.020) |
| D-LBA | 300 | 20 | 0.176 (0.099) | 0.252 (0.028) | 0.233 (0.089) | 0.564 (0.032) | 0.005 (0.001) | 0.317 (0.105) | 0.373 (0.026) |
| D-LBA | 300 | 30 | 0.199 (0.097) | 0.224 (0.035) | 0.228 (0.067) | 0.517 (0.037) | 0.004 (0.001) | 0.308 (0.103) | 0.321 (0.024) |
| D-LBA | 500 | 10 | 0.094 (0.056) | 0.284 (0.020) | 0.175 (0.106) | 0.650 (0.053) | 0.004 (0.002) | 0.224 (0.121) | 0.478 (0.018) |
| D-LBA | 500 | 20 | 0.106 (0.060) | 0.233 (0.018) | 0.176 (0.099) | 0.553 (0.031) | 0.003 (0.001) | 0.212 (0.083) | 0.360 (0.017) |
| D-LBA | 500 | 30 | 0.105 (0.062) | 0.209 (0.017) | 0.167 (0.072) | 0.495 (0.027) | 0.003 (0.001) | 0.209 (0.069) | 0.311 (0.020) |

Mean bias for each model

| Model | \(I\) | \(J\) | \(a_j\) | \(\log (\gamma _i)\) | \(b_j\) | \(\theta _i\) | \(\tau _j\) | \(\psi _j\) | \(\log (\sigma _i)\) |
|---|---|---|---|---|---|---|---|---|---|
| D-Diffusion | 100 | 10 | −0.044 | −0.084 | 0.020 | 0.021 | 0.000 | – | – |
| D-Diffusion | 100 | 20 | −0.109 | −0.194 | −0.030 | −0.032 | 0.001 | – | – |
| D-Diffusion | 100 | 30 | −0.205 | −0.338 | 0.022 | 0.008 | 0.000 | – | – |
| D-Diffusion | 300 | 10 | −0.023 | −0.056 | −0.012 | −0.020 | 0.000 | – | – |
| D-Diffusion | 300 | 20 | −0.030 | −0.061 | 0.001 | −0.001 | 0.001 | – | – |
| D-Diffusion | 300 | 30 | −0.049 | −0.095 | 0.009 | 0.007 | 0.000 | – | – |
| D-Diffusion | 500 | 10 | −0.013 | −0.035 | −0.019 | −0.010 | 0.000 | – | – |
| D-Diffusion | 500 | 20 | −0.028 | −0.059 | −0.013 | −0.014 | 0.000 | – | – |
| D-Diffusion | 500 | 30 | −0.033 | −0.070 | −0.008 | −0.002 | 0.001 | – | – |
| D-LBA | 100 | 10 | −0.129 | −0.090 | 0.008 | −0.028 | 0.000 | −0.244 | −0.145 |
| D-LBA | 100 | 20 | −0.269 | −0.163 | 0.066 | 0.006 | 0.001 | −0.428 | −0.189 |
| D-LBA | 100 | 30 | −0.405 | −0.215 | 0.007 | 0.001 | 0.000 | −0.490 | −0.172 |
| D-LBA | 300 | 10 | −0.054 | −0.067 | 0.023 | −0.005 | 0.001 | −0.134 | −0.137 |
| D-LBA | 300 | 20 | −0.142 | −0.100 | 0.005 | −0.003 | 0.000 | −0.134 | −0.098 |
| D-LBA | 300 | 30 | −0.163 | −0.101 | 0.006 | 0.010 | 0.000 | −0.186 | −0.099 |
| D-LBA | 500 | 10 | −0.050 | −0.064 | 0.001 | 0.005 | 0.001 | −0.050 | −0.110 |
| D-LBA | 500 | 20 | −0.057 | −0.052 | 0.006 | 0.000 | 0.000 | −0.064 | −0.074 |
| D-LBA | 500 | 30 | −0.076 | −0.063 | −0.011 | −0.006 | 0.000 | −0.106 | −0.077 |

*Correspondence (correlations)* Tables 4, 5, 6, 7 and 8 summarize the correlation of each parameter for each condition. With regard to the correlations for the conditions in which data were generated with the D-diffusion model and estimated with the D-LBA model, all of the parameters except \(a_j\) took sufficiently high values (greater than 0.9). Even though \(a_j\) had lower correlations than the others, this is consistent with the results of Donkin et al. (2011). In addition, the proposed model expresses the relationship between the discriminability and the RT with two parameters, \(\alpha _{ij}\) and \(\sigma _{ij}\), while the D-diffusion model uses only \(\alpha _{ij}\) to express this relationship. This would explain why the correlation of \(a_j\) is lower, particularly when the data were generated from the D-diffusion model and estimated with the D-LBA model. These results, especially those in Table 7, show an interesting point. When the data generating and estimation models were the same, the D-diffusion model exhibited a higher correlation than the D-LBA model. On the other hand, when the data generating and estimation models were different, the D-LBA model exhibited a higher correlation than the D-diffusion model. This suggests that when the data generating model is known to be the D-diffusion model, fitting the D-diffusion model performs better than fitting the D-LBA model. However, when the data generating model is unknown, as is typical in practice, the D-LBA model performs more stably regardless of the true data generating model.
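Why correlations are a more suitable similarity measure than absolute differences when two models place a parameter on different scales can be illustrated with a small sketch. The numbers are made up for illustration, and the linear rescaling is a deliberately simple stand-in for a scale difference between models.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical true values under one model ...
true_values = [-1.2, -0.3, 0.1, 0.8, 1.5]
# ... and estimates under another model that uses a different scale.
estimates = [2.0 * v + 3.0 for v in true_values]

r = pearson(true_values, estimates)
max_abs_diff = max(abs(e - t) for e, t in zip(estimates, true_values))
print(r, max_abs_diff)
```

The correlation is essentially 1 even though the absolute differences are large, which is the situation the cross-model comparisons in the tables are designed to handle.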

Correlation between the true parameter value of \(a_{j}\) in the data generating model and its estimate in the estimation model

| Data generating model | | D-diffusion | D-diffusion | D-LBA | D-LBA |
|---|---|---|---|---|---|
| Estimation model | | D-diffusion | D-LBA | D-diffusion | D-LBA |
| \(I\) | \(J\) | | | | |
| 100 | 10 | 0.973 | 0.407 | 0.920 | 0.990 |
| 100 | 20 | 0.982 | 0.598 | 0.905 | 0.991 |
| 100 | 30 | 0.981 | 0.665 | 0.890 | 0.989 |
| 300 | 10 | 0.994 | 0.519 | 0.942 | 0.996 |
| 300 | 20 | 0.993 | 0.547 | 0.939 | 0.996 |
| 300 | 30 | 0.993 | 0.553 | 0.936 | 0.996 |
| 500 | 10 | 0.996 | 0.530 | 0.937 | 0.998 |
| 500 | 20 | 0.996 | 0.582 | 0.932 | 0.998 |
| 500 | 30 | 0.996 | 0.603 | 0.948 | 0.998 |

Correlation between the true parameter value of \(\log (\gamma _{i})\) in the data generating model and its estimate in the estimation model

| Data generating model | | D-diffusion | D-diffusion | D-LBA | D-LBA |
|---|---|---|---|---|---|
| Estimation model | | D-diffusion | D-LBA | D-diffusion | D-LBA |
| \(I\) | \(J\) | | | | |
| 100 | 10 | 0.992 | 0.947 | 0.917 | 0.960 |
| 100 | 20 | 0.996 | 0.968 | 0.939 | 0.976 |
| 100 | 30 | 0.997 | 0.977 | 0.942 | 0.982 |
| 300 | 10 | 0.992 | 0.936 | 0.921 | 0.964 |
| 300 | 20 | 0.996 | 0.964 | 0.932 | 0.973 |
| 300 | 30 | 0.997 | 0.974 | 0.940 | 0.980 |
| 500 | 10 | 0.991 | 0.938 | 0.918 | 0.962 |
| 500 | 20 | 0.996 | 0.966 | 0.931 | 0.974 |
| 500 | 30 | 0.997 | 0.974 | 0.941 | 0.980 |

Correlation between the true parameter value of \(b_{j}\) in the data generating model and its estimate in the estimation model

| Data generating model | | D-diffusion | D-diffusion | D-LBA | D-LBA |
|---|---|---|---|---|---|
| Estimation model | | D-diffusion | D-LBA | D-diffusion | D-LBA |
| \(I\) | \(J\) | | | | |
| 100 | 10 | 0.998 | 0.958 | 0.944 | 0.981 |
| 100 | 20 | 0.998 | 0.971 | 0.947 | 0.984 |
| 100 | 30 | 0.998 | 0.976 | 0.934 | 0.982 |
| 300 | 10 | 0.999 | 0.964 | 0.946 | 0.990 |
| 300 | 20 | 0.999 | 0.978 | 0.963 | 0.993 |
| 300 | 30 | 0.999 | 0.980 | 0.952 | 0.993 |
| 500 | 10 | > 0.999 | 0.964 | 0.970 | 0.996 |
| 500 | 20 | > 0.999 | 0.980 | 0.961 | 0.996 |
| 500 | 30 | > 0.999 | 0.980 | 0.955 | 0.996 |

Correlation between the true parameter value of \(\theta _{i}\) in the data generating model and its estimate in the estimation model

| Data generating model | | D-diffusion | D-diffusion | D-LBA | D-LBA |
|---|---|---|---|---|---|
| Estimation model | | D-diffusion | D-LBA | D-diffusion | D-LBA |
| \(I\) | \(J\) | | | | |
| 100 | 10 | 0.882 | 0.739 | 0.612 | 0.736 |
| 100 | 20 | 0.921 | 0.853 | 0.695 | 0.836 |
| 100 | 30 | 0.935 | 0.894 | 0.738 | 0.856 |
| 300 | 10 | 0.879 | 0.721 | 0.596 | 0.727 |
| 300 | 20 | 0.924 | 0.830 | 0.708 | 0.826 |
| 300 | 30 | 0.941 | 0.892 | 0.741 | 0.861 |
| 500 | 10 | 0.880 | 0.716 | 0.652 | 0.754 |
| 500 | 20 | 0.922 | 0.838 | 0.719 | 0.831 |
| 500 | 30 | 0.941 | 0.887 | 0.745 | 0.870 |

Correlation between the true parameter value of \(\tau _{j}\) in the data generating model and its estimate in the estimation model

| Data generating model | | D-diffusion | D-diffusion | D-LBA | D-LBA |
|---|---|---|---|---|---|
| Estimation model | | D-diffusion | D-LBA | D-diffusion | D-LBA |
| \(I\) | \(J\) | | | | |
| 100 | 10 | > 0.999 | 0.999 | 0.986 | 0.995 |
| 100 | 20 | > 0.999 | 0.999 | 0.988 | 0.996 |
| 100 | 30 | > 0.999 | 0.999 | 0.985 | 0.997 |
| 300 | 10 | > 0.999 | > 0.999 | 0.997 | 0.999 |
| 300 | 20 | > 0.999 | > 0.999 | 0.995 | 0.999 |
| 300 | 30 | > 0.999 | > 0.999 | 0.995 | 0.999 |
| 500 | 10 | > 0.999 | > 0.999 | 0.997 | 0.999 |
| 500 | 20 | > 0.999 | > 0.999 | 0.998 | > 0.999 |
| 500 | 30 | > 0.999 | > 0.999 | 0.997 | > 0.999 |

*Estimation efficiency (\({\hat{R}}\) and effective sample size)* We computed the Gelman–Rubin diagnostic statistic (\({\hat{R}}\); Gelman et al. 2014) as a convergence diagnostic measure. We found that when the number of respondents is small, both models converge quite fast (fewer than 1000 iterations when \(I=100\)). Therefore, in this paper, we show the results when the number of respondents is the largest under the conditions that we considered.

The *y* axis represents the proportion for which \({\hat{R}}\) is below the threshold. Gelman et al. (2014) suggested a threshold of 1.1 for \({\hat{R}}\); therefore, we set the same threshold. These proportions were computed on the basis of the MCMC samples from zero to the *x*-axis value iterations in increments of 500. The solid and dashed lines represent the results estimated by the D-LBA and D-diffusion models, respectively.
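The convergence check described here can be sketched as follows. The code implements the split-\({\hat{R}}\) formula from Gelman et al. (2014); whether rstan 2.15.0 computes exactly this variant, and whether the plotted proportions are taken over parameters as done below, are assumptions. The function names and the toy chains are illustrative only.

```python
import math
import random
import statistics

def split_rhat(chains):
    """Split R-hat for one parameter; chains is a list of equal-length draw lists."""
    half = len(chains[0]) // 2
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = half
    within = statistics.fmean([statistics.variance(h) for h in halves])
    between = n * statistics.variance([statistics.fmean(h) for h in halves])
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)

def proportion_converged(draws_by_param, n_iter, threshold=1.1):
    """Share of parameters whose R-hat, using the first n_iter draws, is below threshold."""
    rhats = [
        split_rhat([chain[:n_iter] for chain in chains])
        for chains in draws_by_param.values()
    ]
    return sum(r < threshold for r in rhats) / len(rhats)

rng = random.Random(0)
# Three well-mixed chains versus three chains stuck at different locations.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 3.0, 6.0)]
prop = proportion_converged({"mixed": mixed, "stuck": stuck}, n_iter=1000)
print(split_rhat(mixed), split_rhat(stuck), prop)
```

Calling `proportion_converged` for `n_iter` = 500, 1000, 1500, and so on would trace out a curve of the kind described for the figure.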

Average effective sample sizes for each parameter under the conditions \(I=500\) and \(J=30\)

| Generation model | D-Diffusion | D-Diffusion | D-LBA | D-LBA |
|---|---|---|---|---|
| Estimation model | D-Diffusion | D-LBA | D-Diffusion | D-LBA |
| \(a_j\) | 48.25 | 146.09 | 49.03 | 63.41 |
| \(\gamma _{i}\) | 333.72 | 938.00 | 320.33 | 781.89 |
| \(b_{j}\) | 188.29 | 2101.40 | 202.00 | 1164.00 |
| \(\theta _{i}\) | 969.67 | 3615.56 | 2411.72 | 4153.28 |
| \(\tau _{j}\) | 1253.42 | 2291.05 | 2739.38 | 3153.62 |
| \(\psi _{j}\) | – | 508.43 | – | 430.78 |
| \(\sigma _{i}\) | – | 1516.56 | – | 2171.01 |

*Information criteria* In addition to the results presented above, we assessed the fit of the models in terms of information criteria (WAIC: widely applicable information criterion and WBIC: widely applicable Bayesian information criterion; Watanabe 2010, 2013; Vehtari et al. 2017). Figure 9 shows the results under the largest data conditions (\(I=500, J=30\)). In all of these graphs, the solid lines represent the results estimated by the D-LBA model, and the dot–dash lines represent the results estimated by the D-diffusion model. The upper half represents the WAIC, and the lower half represents the WBIC. The graphs on the left are for data generated by the D-diffusion model, whereas those on the right are for data generated by the D-LBA model. For all datasets generated by the D-LBA model, both indices are lower when estimated by the D-LBA model. On the other hand, for all datasets generated by the D-diffusion model, the D-LBA model shows worse values than the D-diffusion model. These results are expected, and we confirmed the same results under all conditions.
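For readers unfamiliar with the WAIC, the following minimal sketch computes it from a matrix of pointwise log-likelihood values (posterior draws by observations). It is not the authors' code. The per-observation scaling follows Watanabe's convention and is an assumption about the scale used here, suggested by the WAIC values below 1 reported for the real data. The WBIC is not covered because it requires sampling from a tempered posterior.

```python
import math
import statistics

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def waic_per_observation(log_lik):
    """WAIC on a per-observation scale; log_lik[s][n] is draw s, observation n."""
    n_obs = len(log_lik[0])
    total = 0.0
    for n in range(n_obs):
        column = [draw[n] for draw in log_lik]
        lppd_n = log_mean_exp(column)          # log pointwise predictive density
        p_waic_n = statistics.variance(column)  # functional variance penalty
        total += lppd_n - p_waic_n
    return -total / n_obs

# Toy input: two posterior draws, two observations, no posterior variability.
toy = [[-1.0, -2.0], [-1.0, -2.0]]
print(waic_per_observation(toy))
```

Lower values indicate better expected predictive performance, which is how the comparisons in this section are read.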

## 5 Real data application: extraversion data

In this section, we consider a more realistic situation using real data to examine the applicability of the proposed D-LBA model.

*Data* We used the extraversion data in the diffIRT R package. These data, obtained by Molenaar et al. (2015), comprise 146 respondents for 10 items. Each item is a particular word or phrase related to extraversion behavior (e.g., “active” or “noisy”). Respondents were asked whether each item is appropriate to their personalities. For all respondents and all items, the actual response (yes/no) and RT were recorded, some of which are missing.

### 5.1 Results

Item parameters obtained by D-LBA and D-diffusion

| | Item | Prop. | MRT | \(a_j\) (D-LBA) | \(a_j\) (D-diff) | \(b_j\) (D-LBA) | \(b_j\) (D-diff) | \(\tau _j\) (D-LBA) | \(\tau _j\) (D-diff) | \(\psi _j\) (D-LBA) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Active | 0.741 | 1.486 | 0.966 | 0.520 | −2.019 | −0.704 | 0.451 | 0.575 | 1.503 |
| 2 | Noisy | 0.538 | 1.357 | 1.091 | 0.540 | −0.087 | −0.117 | 0.327 | 0.475 | 2.513 |
| 3 | Energetic | 0.846 | 1.120 | 1.573 | 0.597 | −2.032 | −1.322 | 0.394 | 0.502 | 2.813 |
| 4 | Enthusiastic | 0.916 | 1.000 | 2.090 | 0.579 | −2.440 | −1.854 | 0.398 | 0.458 | 2.957 |
| 5 | Impulsive | 0.539 | 1.298 | 1.240 | 0.551 | −0.240 | −0.213 | 0.339 | 0.464 | 2.751 |
| 6 | Jovial | 0.902 | 1.262 | 1.187 | 0.507 | −4.338 | −1.380 | 0.412 | 0.501 | 1.937 |
| 7 | Viable | 0.937 | 1.142 | 1.306 | 0.490 | −2.694 | −1.788 | 0.372 | 0.511 | 3.393 |
| 8 | Eupeptic | 0.958 | 1.090 | 1.419 | 0.454 | −3.448 | −2.065 | 0.352 | 0.434 | 2.955 |
| 9 | Communicative | 0.824 | 1.728 | 0.786 | 0.408 | −2.056 | −0.904 | 0.472 | 0.609 | 2.088 |
| 10 | Spontaneous | 0.860 | 0.986 | 1.848 | 0.632 | −2.488 | −1.527 | 0.370 | 0.462 | 2.750 |

In addition, Fig. 10 shows the posterior density for each item parameter. In the figure, the left-hand side shows the plot for \(a_j\), and the right-hand side shows the plot for \(b_j\). From this figure, we can see that each parameter estimate was properly obtained because each density has only one peak. Moreover, the parameter estimates for each model were highly correlated with the summary statistics. For items having a high proportion of “yes” responses (e.g., “viable” and “eupeptic”), \(b_j\) becomes lower. As for \(a_j\), it corresponds to the mean response time (MRT). This can be considered evidence for the validity of the estimates.

Table 11 summarizes the mean effective sample sizes for both models. All but \(\tau _j\) indicated higher values in the D-LBA model than in the D-diffusion model. This result suggests that the D-LBA model can estimate parameters more efficiently than the D-diffusion model, even for real data.

Mean effective sample size for each parameter in the D-diffusion and D-LBA models

| Estimation model | D-Diffusion | D-LBA |
|---|---|---|
| \(a_j\) | 101.84 | 260.60 |
| \(\gamma _{i}\) | 370.02 | 843.82 |
| \(b_{j}\) | 684.45 | 1255.58 |
| \(\theta _{i}\) | 2465.66 | 3201.51 |
| \(\tau _{j}\) | 2035.36 | 1706.55 |
| \(\psi _{j}\) | – | 720.89 |
| \(\sigma _{i}\) | – | 1414.07 |

One of the major advantages of using the model-based parameters instead of much simpler descriptive statistics such as the MRT or the proportion of choosing the first category is that, while the theory of psychological measurement suggests that the observed data contain random fluctuations or errors, the substantively informed model parameters directly reflect the underlying psychological process that elicited the observed responses (Molenaar et al. 2015). The model also decomposes the observed information into several different meaningful sources of variability. For instance, the MRT is used as a property of an item, although it may be influenced by some respondents’ traits, which are represented by the parameter \(\gamma _i\). If more “deliberate” respondents happen to be sampled, the observed MRT may be longer even for the same items. However, we cannot distinguish the respondents’ traits from the item traits as long as the simple MRT is used. By estimating both \(\gamma _i\) and \(a_j\) at the same time, \(a_j\) can be interpreted as an item property that is not influenced by respondents’ traits. This also makes it reasonable to examine the item parameter estimates from a qualitative perspective without considering respondents’ traits. For example, it may be difficult to provide an answer quickly and intuitively for an item having a higher \(a_j\) (e.g., “communicative”). In other words, such items seem to have more complicated meanings than those having a lower \(a_j\). In addition, more respondents seem to answer “yes” for items having a lower \(b_j\). From a qualitative viewpoint, items with a higher \(b_j\) tend to have negative meanings. For example, in the Oxford Advanced Learner’s Dictionary (Deuter et al. 2015), the word “noisy” is defined as “making a lot of noise,” and the word “impulsive” is defined as “acting suddenly without thinking carefully about what might happen because of what you are doing.” Obviously, these two words have more negative connotations than positive ones. Therefore, respondents may be reluctant to answer “yes” to these items.

Among the models listed in the table below, the *full* D-LBA model exhibits the best values in terms of both the WAIC and WBIC. Specifically, the obtained WAIC values of the *full* D-diffusion and *full* D-LBA models are 0.829 and 0.757, whereas the WBIC values are 1024.98 and 898.71, respectively. Therefore, the results indicate that the proposed D-LBA model fits this dataset better than the D-diffusion model in terms of these information criteria. In other words, the assumption that the item discriminability and expected RTs are completely correlated is unlikely to hold in real data. In addition, the results indicate that both \(\alpha _{ij}\) and \(\eta _{ij}\) should be decomposed into item factors and person factors to show better fit to real data.

Information criteria values of the full model as well as more parsimonious submodels for the D-LBA and D-Diffusion models

| Model | Boundary | Between-trial variance | Number of parameters | WAIC | WBIC |
|---|---|---|---|---|---|
| D-LBA | \(\beta _{ij} = \gamma _{i}/a_{j}\) | \(\eta _{ij}=\sigma _{i}/\psi _{j}\) | \(4J+3I\) | 0.757 | 898.71 |
| D-LBA | \(\beta _{ij} = \gamma _{i}/a_{j}\) | \(\eta _{ij}=\sigma _{i}\) | \(3J+3I\) | 0.777 | 968.52 |
| D-LBA | \(\beta _{ij} = \gamma _{i}/a_{j}\) | \(\eta _{ij}=\psi _{j}\) | \(4J+2I\) | 0.828 | 1023.02 |
| D-LBA | \(\beta _{ij} = \gamma _{i}/a_{j}\) | \(\eta _{ij}=\eta\) | \(3J+2I+1\) | 0.784 | 982.39 |
| D-LBA | \(\beta _{ij} = a_{j}\) | \(\eta _{ij}=\sigma _{i}/\psi _{j}\) | \(4J+2I\) | 0.868 | 1123.83 |
| D-LBA | \(\beta _{ij} = a_{j}\) | \(\eta _{ij}=\sigma _{i}\) | \(3J+2I\) | 0.909 | 1234.57 |
| D-LBA | \(\beta _{ij} = a_{j}\) | \(\eta _{ij}=\psi _{j}\) | \(4J+I\) | 0.939 | 1234.60 |
| D-LBA | \(\beta _{ij} = a_{j}\) | \(\eta _{ij}=\eta\) | \(3J+I+1\) | 0.915 | 1241.96 |
| D-LBA | \(\beta _{ij} = \gamma _{i}\) | \(\eta _{ij}=\sigma _{i}/\psi _{j}\) | \(3J+3I\) | 0.816 | 989.34 |
| D-LBA | \(\beta _{ij} = \gamma _{i}\) | \(\eta _{ij}=\sigma _{i}\) | \(2J+3I\) | 0.818 | 1030.26 |
| D-LBA | \(\beta _{ij} = \gamma _{i}\) | \(\eta _{ij}=\psi _{j}\) | \(3J+2I\) | 0.818 | 1030.26 |
| D-LBA | \(\beta _{ij} = \gamma _{i}\) | \(\eta _{ij}=\eta\) | \(2J+2I+1\) | 0.820 | 1040.59 |
| D-LBA | \(\beta _{ij} = \beta\) | \(\eta _{ij}=\sigma _{i}/\psi _{j}\) | \(3J+2I+1\) | 0.880 | 1144.93 |
| D-LBA | \(\beta _{ij} = \beta\) | \(\eta _{ij}=\sigma _{i}\) | \(2J+2I+1\) | 0.931 | 1269.21 |
| D-LBA | \(\beta _{ij} = \beta\) | \(\eta _{ij}=\psi _{j}\) | \(3J+I+1\) | 0.962 | 1273.85 |
| D-LBA | \(\beta _{ij} = \beta\) | \(\eta _{ij}=\eta\) | \(2J+I+2\) | 0.944 | 1286.09 |
| D-Diffusion | \(\alpha _{ij}=\gamma _{i}/a_{j}\) | - | \(3J+2I\) | 0.829 | 1024.98 |
| D-Diffusion | \(\alpha _{ij}=a_{j}\) | - | \(3J+I\) | 0.902 | 1184.12 |
| D-Diffusion | \(\alpha _{ij}=\gamma _{i}\) | - | \(2J+2I\) | 0.849 | 1052.88 |
| D-Diffusion | \(\alpha _{ij}=\alpha\) | - | \(2J+I+1\) | 0.914 | 1204.62 |
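To give a sense of model size, the parameter-count formulas in the table can be evaluated for this dataset. This is a small illustrative calculation; it assumes that all 146 respondents and 10 items enter the model, which the text does not state explicitly given the missing responses.

```python
# Dataset dimensions reported for the extraversion data.
I, J = 146, 10

# Number of parameters for the two full models, using the formulas in the table.
param_counts = {
    "D-LBA (full)": 4 * J + 3 * I,
    "D-diffusion (full)": 3 * J + 2 * I,
}
extra_parameters = param_counts["D-LBA (full)"] - param_counts["D-diffusion (full)"]
print(param_counts, extra_parameters)
```

Under this assumption the full D-LBA model has \(J+I\) more parameters than the full D-diffusion model, yet it still attains the lower WAIC and WBIC.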

## 6 Discussion

In this study, we proposed a new cognitively-based IRT model that can explain the flexible relationship between the item discriminability and the expected RTs for personality assessment using RT information. The likelihood function of the proposed D-LBA IRT model can be essentially seen as a reparameterization of the LBA model. Our argument is that this reparameterization is the point: the LBA framework for modeling the RT data, which has been proved to be useful in the field of cognitive and mathematical psychology, has not been applied to IRT models in psychometrics. The aims of this study are to clearly reveal the relationship between these two models, which are both popular in different fields, and to combine the strengths of both models to propose the D-LBA IRT model.

From the simulation results, we identified four advantageous properties of the proposed D-LBA model. First, the proposed model recovers parameters adequately, as does the D-diffusion model. Second, each parameter in the proposed and D-diffusion models can be interpreted in nearly the same way. Third, the correlations between the true values and the estimates obtained from the proposed model are higher than those from the D-diffusion model when the true data generating model differs from the estimation model. Fourth, the proposed model converges much faster and estimates more efficiently than the D-diffusion model. These findings suggest that the proposed D-LBA model is a more realistic, efficient, and practical yet simpler alternative to the D-diffusion model of the item response and RT.

In addition, we applied the D-LBA and D-diffusion models to a real personality measurement dataset. Consequently, from the viewpoint of the information criterion, the D-LBA model was found to fit this dataset better than the D-diffusion model.

By introducing a new parameter and extending the simple LBA model, the proposed model can mitigate the problems that originate from the diffusion IRT model. Nevertheless, in empirical applications of the proposed D-LBA model, three potentially significant issues might persist.

First, the time required for the MCMC estimation algorithm of the proposed D-LBA model might be substantial. In our simulation study, the proposed model took around 7000 iterations to achieve full convergence, which was judged by \({\hat{R}}\) being below 1.1 for all parameters. This corresponds to a few hours (with \(I=500\) and \(J=30\)) in our computational environment (CPU: Intel Core i7-7700K; Memory: 64 GB; Operating system: Windows 10). Note that a greater computational time was needed for the existing D-diffusion model; at times, even 40,000 iterations were not sufficient for convergence. Nevertheless, for researchers who want to analyze real data, the estimation of the proposed model might not be sufficiently fast. In this case, one may be able to use variational Bayes (VB) inference instead of MCMC estimation for the proposed D-LBA model. Using rstan, it is easy to apply automatic differentiation variational inference (ADVI) without the need to specify the approximating variational distribution. With this approach, researchers need to satisfy only one condition: each parameter should be transformed so that its posterior is approximately normal.

The second potential issue is related to the test properties. In this study, we applied the D-LBA model to a single-dimensional personality scale to compare the results with those of the existing D-diffusion model. However, many existing personality scales have more than two dimensions. Therefore, an interesting direction for future research would be to extend the proposed D-LBA model to the multidimensional case. One simple approach to deal with multidimensionality is to adopt the Thurstonian IRT model (Brown and Maydeu-Olivares 2011). In the Thurstonian IRT model, each choice corresponds to a different dimension, and the respondent is asked to choose the option that best describes himself/herself. A typical forced-choice questionnaire consists of three or four choices; hence, LBA-based models would be appropriate for this case as compared to the diffusion-based model, which can only handle two-choice items. In this study, we focused on cases in which an item has only two choices. However, the LBA-based approach provides the possibility of extension to items that have more than two alternative choices. We have already started examining the possibility of this extension, and our preliminary findings are that we might need additional parameter constraints to achieve identification in this case. This may be because the LBA model considers only the first choice that reached the boundary, even when the item has more than two choices. Therefore, extension of the proposed model to more than two alternative choices is a vital area of future study.
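The point that an LBA-type model naturally accommodates more than two response options, and that only the first accumulator to reach the boundary is observed, can be illustrated with a minimal simulation of the standard LBA race (Brown and Heathcote 2008). This sketch is not the D-LBA IRT parameterization proposed in this paper: the parameter names (`b` for the threshold, `A` for the upper limit of the start point, `s` for the between-trial drift standard deviation, `t0` for the non-decision time) and the resampling of trials in which no drift is positive are assumptions made for illustration.

```python
import random

def simulate_lba_trial(drift_means, b, A, s, t0, rng):
    """Simulate one standard LBA trial with one accumulator per response option.

    Returns (index of the winning accumulator, response time).
    Trials in which no accumulator has a positive drift are resampled.
    """
    while True:
        finish_times = []
        for v in drift_means:
            start = rng.uniform(0.0, A)   # start point, uniform on [0, A]
            drift = rng.gauss(v, s)       # drift rate varies between trials
            if drift > 0:
                finish_times.append((b - start) / drift)
            else:
                finish_times.append(float("inf"))
        fastest = min(finish_times)
        if fastest != float("inf"):
            # Only the first accumulator to reach the threshold is observed.
            return finish_times.index(fastest), t0 + fastest

rng = random.Random(1)
# Three response options with decreasing mean drift rates.
trials = [
    simulate_lba_trial([1.0, 0.5, 0.2], b=1.0, A=0.5, s=0.3, t0=0.2, rng=rng)
    for _ in range(2000)
]
wins = [sum(1 for choice, _ in trials if choice == k) for k in range(3)]
print(wins)
```

Because the losing accumulators leave no trace in the data, their parameters are informed only indirectly, which is one way to see why additional constraints may be needed for identification with more than two options.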

Third, our proposed approach is model-based; therefore, strictly speaking, the advantages of the model only apply when the underlying model is correct (Tuerlinckx et al. 2016). However, as is often said, “all models are wrong, but some are useful.” We believe, in line with Box (1979), that the relevant question is not whether the model assumptions are met exactly but rather whether the model is illuminating and sufficiently useful as an approximation of reality. On the basis of the model comparison and posterior predictive check that we present in this paper, we are particularly positive about the empirical applicability of the model. That being said, a more thorough investigation regarding the fit and prediction of the model, such as the evaluation of person fit (e.g., Ferrando 2007), would be desirable in future studies.

Note that the D-LBA model cannot be used for ability measurements because the drift rate is expressed as the difference between the respondent and item parameters. As mentioned earlier, this model assumption corresponds to the nearness hypothesis (Kuncel 1973) and the distance–difficulty hypothesis (Ferrando and Lorenzo-Seva 2007) of personality measurement. On the other hand, under these hypotheses, the respondent tends to answer more quickly when he or she has an extremely low ability; this is obviously unlikely in ability measurement unless the respondents are simply guessing. Therefore, another interesting direction for future research would be to develop a Q-version of the LBA IRT model for ability measurements.

Finally, one of the major advantages of the IRT framework is that, thanks to the decomposition of the observed data variability into item and respondent parameters, items can be scaled independently of respondents; likewise, respondents can be scaled independently of items. As one of the anonymous reviewers pointed out, this characteristic would be particularly advantageous in situations in which the observed data are accumulated from different sets of samples and items over time, as in the typical application of IRT for educational measurement. On the other hand, in typical personality assessment, items do not change across respondents. However, recent technological developments allow us to administer large-scale personality assessments in which different items are presented to different respondents and to model such data (e.g., Condon and Revelle 2015; Okada et al. 2018). Therefore, the proposed D-LBA IRT model may be applicable in such modern personality measurement studies in which the RT is also collected. This can be a fruitful direction of future research.

## Notes

### Funding

Funding was provided by Japan Society for the Promotion of Science (Grant nos. JP17J07674, JP17H04787) and Okawa Foundation Research Grant.

## References

- Annis, J., Miller, B. J., & Palmeri, T. J. (2016). Bayesian inference with Stan: A tutorial on adding custom distributions. *Behavior Research Methods*, *48*, 1–24.
- Box, G. E. P. (1979). Robustness in the strategy of scientific model building. In R. L. Launer & G. B. Wilkinson (Eds.), *Robustness in statistics* (pp. 201–236). New York: Academic Press.
- Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced choice questionnaires. *Educational and Psychological Measurement*, *71*, 460–502.
- Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. *Cognitive Psychology*, *57*, 153–178.
- Condon, D. M., & Revelle, W. (2015). Selected personality data from the SAPA-Project: On the structure of phrased self-report items. *Journal of Open Psychology Data*, *3*, e6.
- Deuter, M., Bradbery, J., & Turnbull, J. (2015). *Oxford advanced learner’s dictionary* (9th ed.). London: Oxford University Press.
- Donkin, C., Brown, S., Heathcote, A., & Wagenmakers, E. J. (2011). Diffusion versus linear ballistic accumulation: Different models but the same conclusions about psychological processes? *Psychonomic Bulletin & Review*, *18*, 61–69.
- Feller, W. (1968). Random walk and ruin problems. In W. Feller (Ed.), *An introduction to probability theory and its applications* (3rd ed., Vol. 1, pp. 342–371). New York: Wiley.
- Ferrando, P. J. (2007). A Pearson-type-VII item response model for assessing person fluctuation. *Psychometrika*, *72*, 25–41.
- Ferrando, P. J., & Lorenzo-Seva, U. (2007). An item response theory model for incorporating response time data in binary personality items. *Applied Psychological Measurement*, *31*, 525–543.
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). *Bayesian data analysis* (3rd ed.). New York: CRC Press.
- Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. *Annals of Applied Statistics*, *2*, 1360–1383.
- Grice, G. R. (1968). Stimulus intensity and response evocation. *Psychological Review*, *75*, 359–373.
- Heathcote, A., Brown, S. D., & Wagenmakers, E.-J. (2015). An introduction to good practices in cognitive modeling. In B. U. Forstmann & E.-J. Wagenmakers (Eds.), *An introduction to model-based cognitive neuroscience*. Berlin: Springer Science & Business Media.
- Heathcote, A., & Hayes, B. (2012). Diffusion versus linear ballistic accumulation: Different models for response time with different conclusions about psychological mechanisms? *Canadian Journal of Experimental Psychology*, *66*, 125–136.
- Heathcote, A., & Love, J. (2012). Linear deterministic accumulator models of simple choice. *Frontiers in Psychology*, *3*, 292.
- Kuncel, R. B. (1973). Response process and relative location of subject and item. *Educational and Psychological Measurement*, *33*, 545–563.
- Laming, D. R. J. (1968). *Information theory of choice reaction time*. New York: Wiley.
- Leite, F. P., & Ratcliff, R. (2011). What cognitive processes drive response biases? A diffusion model analysis. *Judgment and Decision Making*, *6*, 651–687.
- Luce, R. D. (1986). *Response times*. New York: Oxford University Press.
- Molenaar, D., Tuerlinckx, F., & van der Maas, H. L. J. (2015). Fitting diffusion item response theory models for responses and response times using the R package diffIRT. *Journal of Statistical Software*, *66*, 1–34.
- Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. *Econometrica*, *16*, 1–32.
- Okada, K., Vandekerckhove, J., & Lee, M. D. (2018). Modeling when people quit: Bayesian censored geometric models with hierarchical and latent-mixture extensions. *Behavior Research Methods*, *50*, 406–415.
- Palada, H., Neal, A., Vuckovic, A., Martin, R., Samuels, K., & Heathcote, A. (2016). Evidence accumulation in a complex task: Making choices about concurrent multiattribute stimuli under time pressure. *Journal of Experimental Psychology: Applied*, *22*, 1–23.
- Ratcliff, R. (1978). A theory of memory retrieval. *Psychological Review*, *85*, 59–108.
- Ratcliff, R., Thapar, A., & McKoon, G. (2007). Application of the diffusion model to two-choice tasks for adults 75–90 years old. *Psychology and Aging*, *22*, 56–66.
- Reddi, B. A. J., & Carpenter, R. H. (2000). The influence of decision time on performance. *Nature Neuroscience*, *3*, 827–830.
- Roscam, E. E. (1987). Toward a psychometric theory of intelligence. In E. E. Roscam & R. Suck (Eds.), *Progress in mathematical psychology* (pp. 151–174). Amsterdam: North-Holland.
- Roscam, E. E. (1997). Models for speed and time-limit tests. In W. J. van der Linden & R. K. Hambleton (Eds.), *Handbook of modern item response theory* (pp. 187–208). Berlin: Springer Science & Business Media.
- Stone, M. (1960). Models for choice-reaction time. *Psychometrika*, *25*, 251–260.
- Thissen, D. (1983). Timed testing: An approach using item response theory. In D. J. Weiss (Ed.), *New horizons in testing: Latent trait test theory and computerized adaptive testing* (pp. 179–203). New York: Academic Press.
- Tuerlinckx, F., & De Boeck, P. (2005). Two interpretations of the discrimination parameter.
*Psychometrika*,*70*, 629–650.MathSciNetCrossRefzbMATHGoogle Scholar - Tuerlinckx, F., Molenaar, D., & van der Maas, H. L. J. (2016). Diffusion-based response-time models. In W. J. van der Linden & R. K. Hambleton (Eds.),
*Handbook of item response theory, volume one: Models*(pp. 283–300). Boca Raton: Chapman & Hall/CRC Press.Google Scholar - van der Linden, W. J. (2016). Lognormal response-time model. In W. J. van der Linden (Ed.),
*Handbook of item response theory, volume one: Models*(pp. 261–282). Boca Raton: Chapman & Hall/CRC Press.Google Scholar - van der Maas, H. L. J., Molenaar, D., Maris, G., Kievit, R. A., & Borsboom, D. (2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences.
*Psychological Review*,*118*, 339–356.CrossRefGoogle Scholar - van der Maas, H. L. J., & Wagenmakers, E.-J. (2005). A psychometric analysis of chess expertise.
*The American Journal of Psychology*,*118*, 29–60.Google Scholar - van Ravenzwaaij, D., & Oberauer, K. (2009). How to use the diffusion model: Parameter recovery of three methods: EZ, fast-dm, and DMAT.
*Journal of Mathematical Psychology*,*53*, 463–473.MathSciNetCrossRefGoogle Scholar - Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.
*Statistics and Computing*,*27*, 1413–1432.MathSciNetCrossRefzbMATHGoogle Scholar - Voss, A., Nagler, M., & Lerche, V. (2013). Diffusion models in experimental psychology: A practical introduction.
*Experimental Psychology*,*60*, 385–402.CrossRefGoogle Scholar - Wagenmakers, E.-J., van der Maas, H. L. J., & Grasman, R. P. P. P. (2007). An EZ-diffusion model for response time and accuracy.
*Psychonomic Bulletin & Review*,*14*, 3–22.CrossRefGoogle Scholar - Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory.
*Journal of Machine Learning Research*,*11*, 3571–3594.MathSciNetzbMATHGoogle Scholar - Watanabe, S. (2013). A widely applicable Bayesian information criterion.
*Journal of Machine Learning Research*,*14*, 867–897.MathSciNetzbMATHGoogle Scholar