
Computational Brain & Behavior, Volume 1, Issue 3–4, pp 228–236

A Comparison of Approximations for Base-Level Activation in ACT-R

  • Christopher R. Fisher
  • Joseph Houpt
  • Glenn Gunzelmann

Abstract

Cognitive models provide a principled alternative for hypothesis testing and measuring individual differences. However, many cognitive models are computationally intensive to simulate, making their use difficult. Using approximations can make the application of cognitive models more tractable. We compare the standard and hybrid approximations of the base-level activation equation for the ACT-R cognitive architecture with respect to four criteria: (1) preservation of core properties, (2) computational efficiency, (3) robustness to violations of assumptions, and (4) mathematical tractability. Contrary to a core property of the theory, activation for the standard approximation was non-monotonic with respect to the decay parameter, rendering the decay parameter unidentifiable. Consequently, the standard approximation is not valid for investigating individual differences in decay or experiments designed to manipulate decay. However, monotonicity is largely preserved with the hybrid approximation. Additionally, we show that the two approximations are equally computationally efficient and that both are far more efficient than the exact equation. Furthermore, the hybrid approximation can achieve the same level of computational efficiency as the standard approximation while achieving greater accuracy. Our robustness analysis reveals that the hybrid approximation is more robust to violations of the assumption of equally spaced retrievals compared to the standard approximation. However, the hybrid approximation sacrifices some mathematical tractability in order to achieve improvements along the other criteria. Based on these findings, we encourage the use of the hybrid approximation for parameter estimation.

Keywords

ACT-R · Memory · Approximations

Introduction

Formal cognitive models (henceforth cognitive models) are invaluable research tools in cognitive science. In comparison to verbal models, cognitive models offer several benefits, such as explicating theoretical assumptions, providing a coherent framework for exploring theoretical concepts, and making precise, quantitative predictions (McClelland 2009). Cognitive models also provide an alternative method for hypothesis testing and measuring individual differences. With this approach, it is possible to interpret experimental effects and differences among individuals in terms of cognitive processes. Some examples of these applications of cognitive models include comparing clinical populations in terms of memory processes (Riefer et al. 2002) and decision-making processes (Yechiam et al. 2005) and examining the effect of cognitive moderators on cognition, such as sleep deprivation (Walsh et al. 2017).

The cognitive architecture Adaptive Control of Thought-Rational (ACT-R; Anderson 2007; Anderson et al. 2004) has been developed to provide a comprehensive framework for studying cognition across disparate domains, spanning perception, motor processes, memory, and reasoning. ACT-R is well suited for hypothesis testing and measuring individual differences because it provides a unified framework within which competing hypotheses can be evaluated. However, using ACT-R poses several practical challenges. One challenge in particular is that models developed within ACT-R can be complex and computationally intensive to simulate, sometimes requiring the use of high-performance computing (Harris 2008). This problem is not unique to ACT-R, but follows a more general trade-off in theory development: As a theory advances, it will make increasingly specific predictions about an expanding range of phenomena. In the process, computational implementations of the theory may become less tractable, making it practically more difficult to examine and apply as a formal model.

One approach for mitigating these problems is to use approximations that abstract away less important details while maintaining core theoretical properties of the model. The quality of an approximation can be judged according to four related criteria: (1) preservation of core properties, (2) computational efficiency, (3) robustness to violations of assumptions, and (4) mathematical tractability. A successful approximation will improve mathematical tractability and computational efficiency while maintaining the core properties of the model. If core properties of the model are not preserved, the approximation might not be valid under some or perhaps any circumstances. A successful approximation is also robust to violations of its simplifying assumptions. If an approximation is highly sensitive to violations of those assumptions, it will have limited applicability and unknown validity when assumptions cannot be verified.

In the present paper, we compare two approximations for ACT-R’s memory activation equation (Anderson 2007) according to the four criteria outlined above. Our comparison demonstrates that the commonly used standard approximation (Anderson 1982)—sometimes referred to as optimized learning—fails to preserve a core property of the model—namely, that memory activation should decrease monotonically with increases in the decay parameter. By contrast, the more recently proposed hybrid approximation (Petrov 2006) preserves this core property except in some edge cases. Our analysis further shows that the hybrid approximation can achieve the same high level of computational efficiency as the standard approximation while producing much less approximation error. In contrast to the standard approximation, the hybrid approximation is robust to violations of the assumption of equally spaced retrievals for a wide range of decay values. However, we find that the hybrid approximation sacrifices some mathematical tractability in order to achieve these improvements. As a general principle, we recommend the use of the hybrid approximation, particularly when investigating individual differences and experimental manipulations of decay, or transient memory dynamics, such as sequential effects (Petrov and Anderson 2005).

The remainder of the paper is organized as follows. We begin with a brief overview of the ACT-R cognitive architecture before introducing base-level memory activation, which forms the basis of ACT-R’s declarative memory system. Next, we discuss several applications in which allowing parameters to vary is necessary to address substantive research questions. On this basis, we argue that an approximation should yield valid output not only for default parameter values but for the entirety of its parameter space. Finally, we compare the standard and hybrid approximations according to the four criteria outlined above. The code used in this paper can be found at https://osf.io/nwy8z/.

ACT-R

Adaptive Control of Thought-Rational, or ACT-R (Anderson 2007; Anderson et al. 2004), is a computational cognitive architecture designed for the development and simulation of unified theories of cognition. As envisioned by Allen Newell (1990), a unified theory of cognition “provides the total picture and explains the role of the parts and why they exist” (p. 18). Such theories should encompass all aspects of cognition (e.g., reasoning, memory, perception) and scale up to relatively complex tasks. Following this idea, the ACT-R architecture consists of multiple independent information processing modules, which include procedural memory, declarative memory, goal management, vision, audition, and motor modules, among others (Anderson 2007). Behavior of the architecture is coordinated through the procedural memory module, which operates as a production system. Modules interface with the procedural memory module through capacity-limited buffers. Requests to a given module are sent via the module’s buffer(s) where the results of the request are ultimately stored and made available to the procedural module, which represents central cognition. Information processing bottlenecks can emerge as a consequence of the buffers’ capacity limitations: only one request can be processed at a time, and only one unit of information can be stored as a result of a single request.

Base-Level Activation

One of the most extensively developed components of ACT-R is the declarative memory module (Anderson 2007; Anderson et al. 1998). The declarative memory module serves as a long-term memory store for factual information. The basic unit of declarative memory in ACT-R is a feature-value vector called a chunk, which is similar to a memory trace in other memory models (e.g., Hintzman 1984; Nosofsky 1986; Shiffrin and Steyvers 1997). According to ACT-R, the probability and speed with which a chunk is retrieved from declarative memory are increasing functions of activation. Activation in ACT-R is influenced by several components, including base-level activation, spreading activation, mismatch penalty, and activation noise. For the present purposes, we focus on the core component called base-level activation. A key characteristic of base-level activation is that it is context independent. It varies simply as a function of the frequency and recency with which a chunk has been retrieved. Base-level activation is defined as the logarithm of a sum over each prior use of the chunk, in which the time elapsed since each use is raised to a negative decay exponent:
$$ a = \log\left( \sum\limits_{j = 1}^{N} t_{j}^{-d}\right) $$
(1)
where N is the total number of times the chunk has been used or retrieved, t_j is the elapsed time in seconds since the jth retrieval, and d ∈ [0,1) is a decay parameter.¹ Because the exponent of t_j is negative, higher values of the decay parameter are associated with greater decay, except when t_j ≤ 1.²,³
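For concreteness, Eq. 1 can be computed directly; the following is a minimal Python sketch (the function name, default d, and example lags are illustrative choices rather than part of the original article):

```python
import numpy as np

def exact_activation(lags, d=0.5):
    """Exact base-level activation (Eq. 1).

    lags : elapsed times (in seconds) since each use of the chunk, t_1, ..., t_N.
    d    : decay parameter in [0, 1).
    """
    lags = np.asarray(lags, dtype=float)
    return np.log(np.sum(lags ** -d))

# Example: a chunk used three times, 1, 10, and 100 seconds ago.
print(exact_activation([1.0, 10.0, 100.0]))  # ~0.35
```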

Parameter Variability

A common practice in applying ACT-R to data involves fixing several parameters to default values that have worked well in previous applications. The purpose of using default parameter values is to limit the flexibility of the model and to make a priori predictions. Given the benefits of using default parameters and the fact that the standard approximation works well with the default decay value, one might question whether the standard approximation is problematic. Below, we present several situations in which the use of default parameters might prevent one from answering substantive questions.

One important use of parameters is to test hypotheses regarding cognitive mechanisms. The logic behind this idea is that a change in a parameter value reflects a change in the operation of a cognitive mechanism. One particularly relevant example involving the decay parameter was reported by Halverson et al. (2010), who varied decay as a function of circadian rhythm to capture performance decrements due to sleep deprivation in a spatial orientation task. The results from this model suggest that degraded performance was due to greater difficulty learning and retaining information. Some related examples of this approach include modeling the effect of sleep deprivation on alertness (Walsh et al. 2017), the effect of toluene on cognitive performance (Fisher et al. 2017), the effect of stress on cognition (Dancy et al. 2015), and integrating neuroimaging with cognitive models (Turner et al. 2015).

In addition, variability in parameter values can be used to investigate individual differences in cognition. It is well known that individuals vary in terms of cognition and strategy use. In addition to providing a richer and more accurate characterization of the data, it is possible to assess the relationship between model parameters and other psychological constructs, such as personality, using individualized models (Vandekerckhove 2014). Even when individual differences in cognition are not the primary focus of investigation, accounting for individual differences can prevent potential averaging artifacts (e.g., Brown and Heathcote 2003; Estes 1956; Siegler 1987). Taken together, this suggests that approximations that are accurate across the full range of parameters are needed.

Base-Level Activation Approximations

The computation of base-level activation presents two practical challenges. One challenge is computational: As the number of chunks and retrieval history increases, the computational demand required to evaluate the model also increases. The other challenge arises when the modeler wishes to incorporate prior knowledge into the model or to predict future behavior. In both cases, an approximation would be convenient as the distribution of memory usage is not precisely known. Challenges such as these have prompted the development of several approximations.

Standard Approximation

The derivation of the standard approximation is based on the simplifying assumption that the retrieval history of a chunk is equally spaced across time (Anderson 1982). Under the equal-space assumption, only two quantities are needed: (1) the time elapsed since the creation of the chunk, L, and (2) the number of uses, N. This eliminates the need to track each time lag, tj, and perform costly computations over the entire history of retrievals. Assuming equally spaced retrievals, Eq. 1 can be rewritten as:
$$ a = \log\left( \sum\limits_{j = 1}^{N} \left( j\cdot\frac{L}{N}\right)^{-d} \right). $$
(2)
Using the approximation, \({\sum }_{j = 1}^{N} j^{-d} \approx N^{1-d}/(1-d) \), Anderson (1982) showed that:
$$ \sum\limits_{j = 1}^{N} \left( j\cdot\frac{L}{N}\right)^{-d} = N^{d} L^{-d}\sum\limits_{j = 1}^{N} j^{-d} \approx N^{d} N^{1-d}\frac{L^{-d}}{1-d} = N\frac{L^{-d}}{1-d}. $$
(3)
By taking the log of Eq. 3 and rearranging, we arrive at the standard approximation:
$$ a \approx \log\left( \frac{N}{1-d}\right) -d \log\left( L\right). $$
(4)
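A minimal Python sketch of Eq. 4, compared against the exact equation under the equal-spacing assumption of Eq. 2 (the particular N, L, and d are arbitrary illustrations):

```python
import numpy as np

def standard_approximation(N, L, d):
    """Standard (optimized learning) approximation (Eq. 4).

    N : number of uses of the chunk.
    L : lifetime of the chunk in seconds.
    d : decay parameter in [0, 1).
    """
    return np.log(N / (1.0 - d)) - d * np.log(L)

# Compare with the exact equation under equally spaced retrievals (Eq. 2).
N, L, d = 10, 200.0, 0.5
lags = np.arange(1, N + 1) * (L / N)           # t_j = j * L / N
exact = np.log(np.sum(lags ** -d))
print(standard_approximation(N, L, d), exact)  # the approximation overestimates
```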

Alternatively, it is possible to derive the standard approximation by assuming time lags are uniformly distributed across the lifetime of a chunk. Under this assumption, there is a bias towards overestimating activation.

Theorem 1

If the times of events are independently and identically uniformly distributed, \(T_{j} \overset {iid}{\sim } \text {Uniform}(0,L)\) , then the expected value of the standard approximation of activation is greater than or equal to the expected value of activation under the exact equation.

Proof

We first demonstrate that the standard approximation is equal to the logarithm of N times the expected value of \(T_{j}^{-d}\), where T_j is uniformly distributed.
$$ \log\left( \frac{N}{1-d}\right) - d \log(L) = \log\left( \frac{N}{1-d}\right) + \log(L^{-d}) = \log\left( \frac{N L^{-d}}{1-d}\right) $$
$$ \frac{L^{-d}}{1-d} = \frac{1}{L}\int_{0}^{L} s^{-d}\, ds = \mathrm{E}[T_{j}^{-d}] $$
Therefore,
$$\log\left( \frac{N}{1-d}\right) -d \log(L) = \log\left( N \mathrm{E}[T_{j}^{-d}]\right). $$
However, the expected value of the exact equation is
$$\mathrm{E}\left[\log\left( N T_{j}^{-d}\right)\right].$$
The logarithm is a concave function, so according to Jensen’s inequality,
$$\log\left( N \mathrm{E}[T_{j}^{-d}]\right) \geq \mathrm{E}\left[\log\left( N T_{j}^{-d}\right)\right].$$
Therefore,
$$\log\left( \frac{N}{1-d}\right) -d \log(L) \ge \mathrm{E}\left[\log\left( N T_{j}^{-d}\right)\right].$$
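This bias is easy to illustrate with a quick Monte Carlo check (a sketch; the particular N, L, d, and number of draws are arbitrary). By Jensen’s inequality the same overestimation holds when the expectation is taken over the full sum in Eq. 1:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, d = 10, 100.0, 0.5
reps = 100_000

# Exact activation under uniformly distributed lags, averaged over many draws.
lags = rng.uniform(0.0, L, size=(reps, N))
exact_mean = np.mean(np.log(np.sum(lags ** -d, axis=1)))

standard = np.log(N / (1.0 - d)) - d * np.log(L)
print(standard, exact_mean)  # the approximation should not be smaller on average
```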

In the following sections, we detail two problems with the standard approximation. In particular, we find that memory activation does not decrease monotonically with respect to the decay parameter, and approximation error grows unacceptably large for high values of decay.

Non-monotonicity

In ACT-R, the decay parameter d governs the rate with which memory activation decreases during the passage of time. Higher values of d are intended to correspond to faster decreases in activation. While this is true for the exact base-level activation in Eq. 1 so long as t_j > 1, it is not the case for the standard approximation in Eq. 4.

Theorem 2

The standard approximation is non-monotonic in d for chunks with a lifetime larger than Euler’s number, L > e.

Proof

To demonstrate that the standard approximation is non-monotonic in d, we take the derivative of Eq. 4 with respect to d and show that it changes sign for some d ∈ (0,1).
$$ \frac{d}{d d} \left[\log\left( \frac{N}{1-d}\right) -d \log\left( L\right) \right] = \frac{1}{1-d} - \log(L) $$
(5)
Setting Eq. 5 equal to zero and solving for d yields the location of the minimum:
$$ d_{*} = 1 - \frac{1}{\log(L)}. $$
(6)
This implies that when d < d∗, the derivative of the standard approximation is negative, and whenever d > d∗, the activation increases as d increases. As long as d∗ ∈ (0,1), there is a minimum activation as a function of d within the range of possible parameter values. This will occur as long as L > e. □
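The non-monotonicity is straightforward to verify numerically; the sketch below uses the same L and N as the left panel of Fig. 1:

```python
import numpy as np

def standard_approximation(N, L, d):
    # Standard approximation (Eq. 4).
    return np.log(N / (1.0 - d)) - d * np.log(L)

N, L = 10, 200.0
d_star = 1.0 - 1.0 / np.log(L)   # location of the minimum (Eq. 6), ~0.81 for L = 200

# Activation falls as d rises toward d_star, then rises again beyond it.
print(d_star)
print(standard_approximation(N, L, 0.5) > standard_approximation(N, L, d_star))   # True
print(standard_approximation(N, L, 0.95) > standard_approximation(N, L, d_star))  # True
```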
Figure 1 illustrates both of these properties. Each plot compares the standard approximation to the exact equation in terms of the relationship between activation and d (assuming equal time intervals between retrievals for both equations). As shown in the left panel, activation decreases with increasing values of d until it reaches d∗, beyond which point activation increases with increasing values of d. Non-monotonicity is especially problematic for parameter estimation because no unique mapping exists between d and activation, except at the minimum. As a consequence, the decay parameter is unidentifiable. Although non-identifiability is typically not problematic when d is fixed at the default value, it does become a significant issue when d is treated as a free parameter.
Fig. 1

Left panel: a demonstration of the non-monotonic relationship between decay and activation in the standard approximation. The red vertical line marks the minimum point d∗. L = 200 and N = 10. Right panel: a demonstration of the monotonically increasing relationship between decay and activation in the standard approximation with L = 2 and N = 1

Perhaps even more problematic, the right panel of Fig. 1 demonstrates that when L is sufficiently small, activation increases monotonically with d, in direct contrast with the intended behavior of the parameter. For chunks with longer lifetimes, d∗ approaches 1 (the maximum value of d): \(\lim _{L\to \infty } 1 - \frac {1}{\log (L)} = 1\). However, Fig. 2 demonstrates that the limit is approached only logarithmically, requiring more time than is generally found in experimental contexts.
Fig. 2

The minimum d∗ as a function of L (log scale)

Error Profile

In the previous sections, we detailed several undesirable properties of the standard approximation. In this section, we examine the magnitude of the approximation error—defined as the difference between the standard approximation and the exact equation—and how it varies as a function of decay rate (d), chunk lifetime (L), and total number of retrievals (N). Across a wide range of L and N, higher values of d resulted in higher error, as evident in Fig. 3. The left panel shows the error does not change as a function of L, regardless of d. The right panel indicates error decreases as N increases. Additionally evident in the right panel, error decreases faster and asymptotes closer to zero for smaller values of d.
Fig. 3

Left panel: approximation error as a function of d and L. N is fixed at 20. Right panel: approximation error as a function of d and N. L is fixed at 10
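A sketch of the error computation underlying Fig. 3, assuming equally spaced retrievals (the particular values of d, L, and N are illustrative; the invariance with respect to L follows algebraically from Eqs. 2 and 4):

```python
import numpy as np

def approximation_error(N, L, d):
    """Standard approximation minus the exact equation,
    assuming equally spaced retrievals (t_j = j * L / N)."""
    lags = np.arange(1, N + 1) * (L / N)
    exact = np.log(np.sum(lags ** -d))
    standard = np.log(N / (1.0 - d)) - d * np.log(L)
    return standard - exact

# Error grows with d but does not depend on L (cf. the left panel of Fig. 3).
for d in (0.1, 0.5, 0.9):
    print(d, [round(approximation_error(20, L, d), 3) for L in (10, 100, 1000)])
```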

Hybrid Approximation

The standard approximation is the most commonly used approximation. However, Petrov (2006) proposed a hybrid approximation to overcome some limitations of the standard approximation. In particular, the standard approximation fails to adequately account for transient spikes in activation immediately following the use of a chunk. As its name implies, the hybrid approximation combines the exact equation and the standard approximation into a more general expression. The hybrid approximation divides the chunk history into two sets: a recent set that is computed according to the exact equation and a distant set that is computed according to the standard approximation. The hybrid approximation is defined as
$$ a \approx \log\left( \sum\limits_{j = 1}^{k} t_{j}^{-d} + \frac{(N-k)(L^{1-d} - t_{k}^{1-d})} {(1-d)(L - t_{k})}\right). $$
(7)
The parameter k denotes the number of records in the recent set with an exact time stamp, and t_k is an adjustment term corresponding to the time elapsed since the kth use of the chunk. When k = 0, the hybrid approximation reduces to the standard approximation, and, at the other extreme, when k = N, it reduces to the exact equation.
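A Python sketch of Eq. 7 (our own helper function; it assumes k ≥ 1, with k = 0 covered by the standard approximation given earlier):

```python
import numpy as np

def hybrid_approximation(recent_lags, N, L, d):
    """Hybrid approximation (Eq. 7; Petrov 2006), assuming k >= 1.

    recent_lags : the k most recent lags t_1 <= ... <= t_k (exact time stamps).
    N           : total number of uses of the chunk.
    L           : lifetime of the chunk in seconds.
    d           : decay parameter in [0, 1).
    """
    recent_lags = np.asarray(recent_lags, dtype=float)
    k = recent_lags.size
    t_k = recent_lags[-1]                     # oldest lag in the recent set
    recent = np.sum(recent_lags ** -d)
    distant = (N - k) * (L ** (1 - d) - t_k ** (1 - d)) / ((1 - d) * (L - t_k))
    return np.log(recent + distant)

# k = 1: track only the most recent use exactly.
print(hybrid_approximation([2.0], N=10, L=200.0, d=0.5))
```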

Petrov (2006) showed that tracking the exact history of the most recent retrieval (k = 1) is sufficient to accurately capture the spikes in activation. Tracking the most recent use is effective because retrievals occurring early in the chunk history tend to have large lags and, as a consequence, contribute little to the resulting activation computed from Eq. 1. In his analysis, however, Petrov (2006) did not investigate the monotonicity of activation as a function of d. In what follows, we show that the hybrid approximation mitigates the limitations of the standard approximation outlined above as long as minimally restrictive conditions are satisfied.

Monotonicity

In this section, we will identify the conditions under which activation decreases monotonically in d for the hybrid approximation. Our basic approach involves demonstrating when both terms in the derivative presented below will be negative. We restrict our analysis to the special case in which k = 1 as this is computationally efficient and provides an accurate approximation.

With t denoting the time elapsed since the most recent retrieval, Eq. 7 simplifies to,
$$a \approx \log\left( t^{-d} + \frac{(N-1)(L^{1-d} - t^{1-d})}{(1-d)(L - t)}\right). $$
In the case that there has only been one retrieval, there is no difference between the approximation and the exact equation. When N = 1, the derivative of Eq. 1 is − log (t), which is negative as long as t > 1. Thus, for the exact equation and the hybrid approximation, as long as t > 1, activation is monotonically decreasing in d. For the approximation, we focus only on the cases when t > 1.

Theorem 3

The hybrid approximation for base-level activation with k = 1 is monotonically decreasing in d when either,
  1. t > exp [1/(1 − d)], or

  2. 1 < t < exp [1/(1 − d)] and L > exp [1/(1 − d)].

Proof

To characterize the approximation as a function of d for the general case, when N > 1, it is sufficient to examine the derivative of the sum inside the logarithm because the logarithm is a monotonic transform,
$$\begin{aligned} &\frac{\partial}{\partial d}\left[t^{-d} + \frac{(N-1)(L^{1-d} - t^{1-d})}{(1-d)(L-t)}\right] \\ &\quad = -t^{-d} \log\left( t \right) + \frac{(N-1)L^{-d}t^{-d}}{(L-t)(1-d)^{2}} \left[ Lt^{d}\left(1 - (1-d)\log\left( L \right)\right) - L^{d} t\left(1 - (1-d) \log\left( t \right)\right)\right]. \end{aligned}$$

The first term in the sum is negative as long as t > 1, as that implies log(t) > 0. When N > 1, there has been more than one retrieval, so N − 1 > 0, and the most recent retrieval is not as old as the lifetime of the chunk, so L − t > 0. Furthermore, L^{−d}, t^{−d}, and (1 − d)^{2} are positive regardless of the value of d, so the multiplier \(\frac {(N-1)L^{-d}t^{-d}}{(L-t)(1-d)^{2}}\) is positive for all d. Hence, as long as the difference inside the bracket is negative, the derivative of the activation with respect to d is negative. To assess the sign of this difference, we examine two cases: first when t > exp [1/(1 − d)], and second when 1 < t < exp [1/(1 − d)] and L > exp [1/(1 − d)]. In both cases, we will use the fact that the natural logarithm is a monotonic function, so L > t implies log(L) > log(t). Further, when L, t > 1 and d ∈ (0,1), then Lt^{d} > L^{d}t.

Because the natural logarithm is a monotonic transform, the first case assumption that t > exp [1/(1 − d)] implies that log(t) > 1/(1 − d), thus (1 − d)log(t) > 1. Similarly, because L > t, (1 − d)log(L) > 1.

We need to show that,
$$Lt^{d}(1-(1-d)\log(L)) - L^{d}t(1-(1-d)\log(t)) < 0 $$
or, equivalently,
$$Lt^{d}(1-(1-d)\log(L)) < L^{d}t(1-(1-d)\log (t)). $$
Both L and t are positive, so dividing both sides by L^{d} and t^{d} does not change the sign, and hence it is sufficient to show,
$$L^{1-d}(1-(1-d)\log (L)) < t^{1-d}(1-(1-d)\log(t)).$$
Both (1 − d)log(L) > 1 and (1 − d)log(t) > 1, so both sides of the inequality are negative. Thus, the ordering holds because L^{1−d} > t^{1−d} and because log(L) > log(t) implies −(1 − d)log(L) < −(1 − d)log(t), which in turn implies 1 − (1 − d)log(L) < 1 − (1 − d)log(t). Therefore, in this case, the approximation is monotonically decreasing in d.
Second, we assume that 1 < t < exp [1/(1 − d)], which implies that (1 − d)log(t) < 1, and we assume L > exp [1/(1 − d)], which implies (1 − d)log(L) > 1. In this case,
$$ Lt^{d}(1-(1-d)\log(L)) < 0 \quad\text{and}\quad -L^{d}t(1-(1-d)\log(t)) < 0. $$
Thus,
$$Lt^{d}(1-(1-d)\log(L)) - L^{d}t(1-(1-d)\log(t)) < 0. $$
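As an informal spot check on the factored derivative used above, the expression can be compared numerically against a symbolic derivative of the term inside the logarithm (an illustrative SymPy sketch; the test points are arbitrary):

```python
import sympy as sp

t, L, N, d = sp.symbols('t L N d', positive=True)
inner = t**(-d) + (N - 1) * (L**(1 - d) - t**(1 - d)) / ((1 - d) * (L - t))

# Factored form of the derivative given in the proof above.
factored = (-t**(-d) * sp.log(t)
            + (N - 1) * L**(-d) * t**(-d) / ((L - t) * (1 - d)**2)
            * (L * t**d * (1 - (1 - d) * sp.log(L))
               - L**d * t * (1 - (1 - d) * sp.log(t))))

# The difference should evaluate to (numerically) zero at arbitrary test points.
difference = sp.diff(inner, d) - factored
for vals in [(5, 50, 10, sp.Rational(3, 10)), (2, 7, 3, sp.Rational(4, 5))]:
    print(sp.N(difference.subs(dict(zip((t, L, N, d), vals)))))
```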

When L < exp [1/(1 − d)], the situation is more complicated; hence, we turn to simulation to explore both the range of parameters under which monotonicity will hold and the magnitude of the approximation error.

Parameter Sweep

Our analysis in the previous section showed that the behavior of the derivative is not straightforward for portions of the parameter space falling outside the rules we defined. We performed a parameter sweep to better understand the extent to which the derivative remains negative in these conditions. We varied d from 0.01 to 0.99 in steps of 0.01, and N from 2 to 100 in steps of 1. Based on our analysis above, we used exponential spacing for t_k and the difference between L and t_k to ensure that the conditions for producing non-negative values were present in our parameter sweep. Exponential spacing was achieved by varying the exponent of e from −20 to 6 in steps of 0.5 (min ≈ 2.06e−9, max ≈ 403). Overall, the derivative is non-negative in 72% of cases. Whether or not it was non-negative depended critically upon the value of t_k: the derivative is non-negative in 95% of cases in which t_k < 1, but only 0.6% of cases in which t_k ≥ 1. Among the cases in which t_k ≥ 1 and the derivative was non-negative, the maximum difference between L and t_k was extremely small: 8.32e−7.
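The sketch below mirrors the structure of this sweep, thinned for speed (so the exact percentages will differ); it checks the sign of a numerical derivative of the term inside the logarithm, which shares its sign with the derivative of the activation:

```python
import numpy as np

def hybrid_inner(t, N, L, d):
    # Term inside the logarithm of the k = 1 hybrid approximation (Eq. 7).
    return t ** -d + (N - 1) * (L ** (1 - d) - t ** (1 - d)) / ((1 - d) * (L - t))

ds = np.arange(0.01, 1.0, 0.01)[::10]          # thinned decay values
Ns = np.arange(2, 101)[::10]                   # thinned numbers of uses
spacing = np.exp(np.arange(-20.0, 6.5, 0.5))   # exponential spacing for t_k and L - t_k
eps = 1e-6

non_negative, total = 0, 0
for t in spacing:
    for gap in spacing:                        # gap = L - t_k
        L = t + gap
        for N in Ns:
            for d in ds:
                deriv = (hybrid_inner(t, N, L, d + eps)
                         - hybrid_inner(t, N, L, d - eps)) / (2 * eps)
                non_negative += deriv >= 0
                total += 1
print(non_negative / total)
```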

Taken together, these findings suggest that the derivative is likely to be non-negative when t_k ≤ 1, mirroring a similar problem with the exact equation in which the derivative is non-negative at small time scales. However, the hybrid derivative is likely to be negative so long as t_k ≥ 1 and the difference between L and t_k is not extremely small. The conditions for producing a non-negative derivative are rare in practice and over-represented in our parameter sweep by design. Such conditions would require creating a chunk and subsequently retrieving it multiple times within the span of one second, or retrieving the chunk multiple times in rapid succession.

Computational Efficiency

Big O notation is commonly used in computer science as a metric of algorithmic efficiency (Stephens 2013). Rather than measuring the runtime of an algorithm directly, Big O notation measures how runtime increases as the input size becomes arbitrarily large. For simplicity, we will evaluate the equations with respect to a single chunk as a function of N. In Big O notation, the standard approximation is O(1), meaning that the computing time is constant. This follows from the fact that the standard approximation only requires a single computation regardless of the chunk history. Similarly, computational efficiency for the hybrid approximation is O(1) because k is intended to be a small integer constant (e.g., 1), and thus has no bearing on the algorithm’s asymptotic behavior. By contrast, computational efficiency for the exact equation is O(N), indicating that runtime scales linearly with the number of retrievals.

As mentioned by a reviewer, Big O notation has several limitations, which we believe are worth discussing. These limitations stem from the focus on the change in runtime relative to an arbitrarily large input size. In practice, because the size of the input is not arbitrarily large, constants that are otherwise negligible in the limit can contribute significantly to the total runtime of an algorithm. Moreover, the nature of the computations that scale with input size also matters. For example, transcendental functions (e.g., non-integer powers and logarithms) require more computational resources than addition and multiplication. One could argue that, in general, comparing algorithms in terms of the number of transcendental operations is a better metric of computational efficiency. Although that might be true, in this particular case both methods lead to similar conclusions: The standard and hybrid approximations are equivalent in terms of computational efficiency, and both are much more efficient than the exact equation.
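A rough timing illustration of this point (a sketch; absolute times depend on hardware and on the vectorized NumPy implementation, but the cost of the exact equation grows with N while the approximations’ cost does not):

```python
import timeit
import numpy as np

d = 0.5
for N in (1_000, 100_000):
    L = 1_000.0
    lags = np.random.uniform(1.0, L, N)                       # arbitrary history
    exact = lambda: np.log(np.sum(lags ** -d))                # O(N)
    standard = lambda: np.log(N / (1 - d)) - d * np.log(L)    # O(1)
    print(N,
          timeit.timeit(exact, number=1_000),
          timeit.timeit(standard, number=1_000))
```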

Mathematical Tractability

Mathematical tractability is more difficult to evaluate than computational efficiency because it depends on the mathematical analyses one wishes to perform. We are not aware of a general metric for measuring mathematical tractability. However, as a heuristic approach, we consider one aspect of mathematical tractability: the ease with which the behavior of the derivative with respect to d can be analyzed. The derivative for the standard approximation was easy to obtain, and its behavior reduced to a few simple rules that were easy to understand. By contrast, the derivative of the hybrid approximation was more complex and more difficult to understand. Although we were able to obtain a small set of rules that guarantee a negative derivative, the behavior outside of those rules was not straightforward. As a consequence, we needed to perform a simulation to understand the behavior of the derivative outside of the rules. Our simulation revealed that the rules excluded a large set of cases in which the derivative was likely to be negative.

Robustness

We performed a simulation to compare the standard and hybrid approximations in terms of robustness to violations of the assumption of equally spaced retrievals. Petrov (2006) provided a single example illustrating that the hybrid approximation is more robust compared to the standard approximation when d = 0.5. We extend this analysis to examine robustness as a function of d and the chunk history distribution. In our simulation, the model created a chunk and retrieved it 10 times within a period of L_max = 100. The chunk history distributions were either \(T_{j} \sim \text {exponential}(\frac {1}{2}L_{\max })\) truncated at L_max or \(T_{j} \sim \text{uniform}(0, L_{\max })\). We used two values of k: k = 0, which reduces to the standard approximation, and k = 1. Decay was varied across values of d = {0.1, 0.3, 0.5, 0.7, 0.9, 0.99}. Activation was computed iteratively through each run of the simulation, starting at L = 0.5 and incrementing by 0.5 until reaching L_max. RMSE between the exact and approximate equations was computed for each run and averaged across 1000 replications of each condition.
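The sketch below is one way to reconstruct this procedure (the bookkeeping details are our reading of the description above; in particular, exponential(½L_max) is taken here to mean an exponential distribution with mean L_max/2, and the replication count is reduced):

```python
import numpy as np

rng = np.random.default_rng(1)

def exact(lags, d):
    return np.log(np.sum(lags ** -d))

def approx(lags, L, d, k):
    # k = 0 is the standard approximation; k = 1 also tracks the newest lag.
    N = lags.size
    if k == 0:
        return np.log(N / (1 - d)) - d * np.log(L)
    recent = lags[:k]                      # smallest (most recent) lags
    t_k = recent[-1]
    distant = (N - k) * (L ** (1 - d) - t_k ** (1 - d)) / ((1 - d) * (L - t_k))
    return np.log(np.sum(recent ** -d) + distant)

def rmse_one_run(d, k, dist, L_max=100.0, n_events=10):
    # Sample retrieval times, then step a clock and compare activations.
    if dist == "uniform":
        times = rng.uniform(0.0, L_max, n_events)
    else:                                  # exponential, mean L_max / 2, truncated
        times = rng.exponential(L_max / 2, n_events * 10)
        times = times[times < L_max][:n_events]
    times = np.sort(times)
    errors = []
    for clock in np.arange(0.5, L_max + 0.5, 0.5):
        past = times[times < clock]
        if past.size < 2:                  # need at least two uses for k = 1
            continue
        lags = np.sort(clock - past)       # most recent lag first
        errors.append(approx(lags, clock, d, k) - exact(lags, d))
    return np.sqrt(np.mean(np.square(errors)))

# Example: RMSE for k = 0 versus k = 1 with uniform histories and d = 0.9.
for k in (0, 1):
    print(k, np.mean([rmse_one_run(0.9, k, "uniform") for _ in range(200)]))
```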

The results in Fig. 4 reveal three important patterns. First, uniform and exponential chunk histories yield highly similar error patterns. Second, RMSE is much lower for k = 1 than the standard approximation in which k = 0. Third, RMSE increases as decay increases, but this trend is much less pronounced for k = 1. Taken together, this pattern of results suggests that the hybrid approximation is robust across a wide range of d.
Fig. 4

Robustness analysis of the hybrid approximation for uniformly and exponentially distributed chunk history. The hybrid approximation reduces to the standard approximation at k = 0

Discussion

The use of approximations can greatly facilitate the application of cognitive models by increasing their computational efficiency and transparency, making them accessible to a wider audience of researchers. As with any methodology in a researcher’s repertoire, it is important to fully understand the limitations of an approximation. Our comparison of the standard and hybrid approximations of the base-level activation equation for the ACT-R cognitive architecture will help researchers make informed decisions when selecting an approximation for a given application.

Petrov (2006) demonstrated that the standard approximation fails to adequately produce transient spikes in activation, which are important for capturing sequential effects and other dynamic effects (e.g., Petrov and Anderson 2005). Our comparison showed that the limitations of the standard approximation are, in fact, much broader in scope. Perhaps our most important finding was the failure of the standard approximation to preserve the decreasing monotonic relationship between activation and the decay parameter, which forms a core property of the underlying theory. By contrast, the hybrid approximation preserved this important property except in edge cases. Both approximations achieved equally high computational efficiency compared to the exact equation. Our robustness analysis revealed that the hybrid approximation was more robust to violations of the equal space assumption for memory retrievals than the standard approximation, even though computational efficiency was equal. The primary downside of the hybrid approximation was some loss of mathematical tractability. Taken together, we find a narrow set of use cases in which both approximations produce similar results, but no cases in which the standard approximation is superior to the hybrid approximation. For this reason, we recommend that modelers use the hybrid approximation.

The ability to perform parameter estimation with the hybrid approximation is important because it allows researchers to test hypotheses regarding memory, such as how exposure to a particular chemical may affect decay, or how decay increases with sleep deprivation (Halverson et al. 2010). This also provides an opportunity for researchers who use ACT-R to take advantage of improved methodologies that involve individual or hierarchical analyses, as opposed to data aggregation.

Footnotes

  1. We explicitly exclude the possibility that d = 1 because neither of the approximations discussed below is defined at that value.

  2. This reflects a more fundamental issue with the formulation of base-level activation that we do not address here, as our focus is on comparing the approximations.

  3. As mentioned by a reviewer, taking the log or non-integer power of a dimension, such as time, is problematic because the dimension will become scale dependent, causing the values of various parameters to change.

Acknowledgements

The views expressed in this paper are those of the authors and do not reflect the official policy or position of the Department of Defense or the US Government. Approved for public release; distribution unlimited. Cleared 06/06/2018; 88ABW-2018-3530. CF’s contributions to this work were supported by a postdoctoral research associateship, administered by the Oak Ridge Institute for Science and Education (ORISE).

Funding Information

This work was supported by the Air Force Research Laboratory’s Warfighter Readiness Research Division and the Air Force Office of Scientific Research (grant 18RHCOR068).

References

  1. Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89(4), 369.
  2. Anderson, J.R. (2007). How can the human mind occur in the physical universe? Oxford: Oxford University Press.
  3. Anderson, J.R., Bothell, D., Lebiere, C., & Matessa, M. (1998). An integrated theory of list memory. Journal of Memory and Language, 38(4), 341–380.
  4. Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036.
  5. Brown, S., & Heathcote, A. (2003). Averaging learning curves across and within participants. Behavior Research Methods, Instruments, & Computers, 35(1), 11–21.
  6. Dancy, C.L., Ritter, F.E., Berry, K.A., & Klein, L.C. (2015). Using a cognitive architecture with a physiological substrate to represent effects of a psychological stressor on cognition. Computational and Mathematical Organization Theory, 21(1), 90–114.
  7. Estes, W.K. (1956). The problem of inference from curves based on group data. Psychological Bulletin, 53(2), 134.
  8. Fisher, C.R., Myers, C., Reem, H.M., Stevens, C., Hack, J., Charles, G., & Gunzelmann, G.G. (2017). A cognitive-pharmacokinetic computational model of the effect of toluene on performance. In Gunzelmann, G.G., Howes, A., Tenbrink, T., & Davelaar, E. (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society. Austin: Cognitive Science Society.
  9. Halverson, T., Gunzelmann, G., Moore Jr., L.R., & Van Dongen, H.P. (2010). Modeling the effects of work shift on learning in a mental orientation and rotation task. In Proceedings of the 10th International Conference on Cognitive Modeling, pp. 79–84.
  10. Harris, J. (2008). MindModeling@Home: a large-scale computational cognitive modeling infrastructure. In Proceedings of the 6th Annual Conference on Systems Engineering Research.
  11. Hintzman, D.L. (1984). MINERVA 2: a simulation model of human memory. Behavior Research Methods, 16(2), 96–101.
  12. McClelland, J.L. (2009). The place of modeling in cognitive science. Topics in Cognitive Science, 1(1), 11–38.
  13. Newell, A. (1990). Unified theories of cognition. Cambridge: Harvard University Press.
  14. Nosofsky, R.M. (1986). Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General, 115(1), 39.
  15. Petrov, A.A. (2006). Computationally efficient approximation of the base-level learning equation in ACT-R. In Proceedings of the Seventh International Conference on Cognitive Modeling, pp. 391–392.
  16. Petrov, A.A., & Anderson, J.R. (2005). The dynamics of scaling: a memory-based anchor model of category rating and absolute identification. Psychological Review, 112(2), 383.
  17. Riefer, D.M., Knapp, B.R., Batchelder, W.H., Bamber, D., & Manifold, V. (2002). Cognitive psychometrics: assessing storage and retrieval deficits in special populations with multinomial processing tree models. Psychological Assessment, 14(2), 184.
  18. Shiffrin, R.M., & Steyvers, M. (1997). A model for recognition memory: REM—retrieving effectively from memory. Psychonomic Bulletin & Review, 4(2), 145–166.
  19. Siegler, R.S. (1987). The perils of averaging data over strategies: an example from children’s addition. Journal of Experimental Psychology: General, 116(3), 250.
  20. Stephens, R. (2013). Essential algorithms: a practical approach to computer algorithms. New York: Wiley.
  21. Turner, B.M., Van Maanen, L., & Forstmann, B.U. (2015). Informing cognitive abstractions through neuroimaging: the neural drift diffusion model. Psychological Review, 122(2), 312.
  22. Vandekerckhove, J. (2014). A cognitive latent variable model for the simultaneous analysis of behavioral and personality data. Journal of Mathematical Psychology, 60, 58–71.
  23. Walsh, M.M., Gunzelmann, G., & Van Dongen, H.P. (2017). Computational cognitive modeling of the temporal dynamics of fatigue from sleep loss. Psychonomic Bulletin & Review, 24(6), 1785–1807.
  24. Yechiam, E., Busemeyer, J.R., Stout, J.C., & Bechara, A. (2005). Using cognitive models to map relations between neuropsychological disorders and human decision-making deficits. Psychological Science, 16(12), 973–978.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Air Force Research Laboratory, Wright-Patterson AFB, USA
  2. Wright State University, Dayton, USA
