Similarity-Based Interference in Sentence Comprehension in Aphasia: a Computational Evaluation of Two Models of Cue-Based Retrieval

Sentence comprehension requires the listener to link incoming words with short-term memory representations in order to build linguistic dependencies. The cue-based retrieval theory of sentence processing predicts that the retrieval of these memory representations is affected by similarity-based interference. We present the first large-scale computational evaluation of interference effects in two models of sentence processing — the activation-based model and a modification of the direct-access model — in individuals with aphasia (IWA) and control participants in German. The parameters of the models are linked to prominent theories of processing deficits in aphasia, and the models are tested against two linguistic constructions in German: pronoun resolution and relative clauses. The data come from a visual-world eye-tracking experiment combined with a sentence-picture matching task. The results show that both control participants and IWA are susceptible to retrieval interference, and that a combination of theoretical explanations (intermittent deficiencies, slow syntax, and resource reduction) can explain IWA’s deficits in sentence processing. Model comparisons reveal that both models have a similar predictive performance in pronoun resolution, but the activation-based model outperforms the direct-access model in relative clauses.


Introduction
When hearing a sentence, the listener has to link incoming words together and build up syntactic and semantic relations in real time. For instance, verbs have to be linked with their dependent arguments, which is commonly assumed to require retrieval from memory (Lewis, 1999;Van Dyke & Lewis, 2003;McElree, 2006). In a sentence like (1), focusing on the relative clause (RC) The boy [who greeted the girl], when the verb greeted inside the relative clause is reached, the comprehender must retrieve the subject boy Paula Lissón and Dario Paape contributed equally to this work.
Dario Paape paape@uni-potsdam.de 1 Department of Linguistics, University of Potsdam, Potsdam, Germany from memory in order to understand who greeted whom. 1 (e.g., Chomsky, 1977). However, for the sake of simplicity, and assuming that both the noun phrase and the relative pronoun inherit the relevant retrieval features from the head noun boy, we will refer directly to the retrieval of boy. In linguistics, resolving who did what to whom is known as thematic role assignment. The doer of the action expressed by the relative clause verb greeted is the agent (boy) and the recipient of the action is the theme (girl).
(1) The boy who greeted the girl plays with the dog.
Cue-based retrieval theory (McElree, 2000;McElree, Foraker, & Dyer, 2003;Lewis & Vasishth, 2005;Lewis, Vasishth, & Van Dyke, 2006;Vasishth, Nicenboim, Engelmann, & Burchert, 2019) posits that items are retrieved from memory based on their syntactic and semantic features, 2 we refer to all the different types of cue-based retrieval accounts as "the" cue-based retrieval theory, even though there are important differences in the underlying latent process assumed and the quantitative predictions of the different variants (e.g., Lissón et al., 2021a). According to cue-based retrieval as implemented computationally in Lewis and Vasishth (2005), items are stored in memory as a bundle of feature-value pairs. In a subject relative clause such as in (1), the embedded verb greeted triggers the retrieval of an item in memory whose features match the retrieval cues [+animate] and [+RC subject], which identify the agent of the relative clause. When the relative clause verb greeted is read, the only item in memory that matches these retrieval cues is boy. Consider now sentence (2), which is an object relative clause.
(2) The girl who the boy greeted plays with the dog.
In this case, when the comprehender reaches the verb greeted, there is one item in memory, boy, that matches all the retrieval cues set by the verb, but there is also another item, girl, that matches some of the retrieval cues ([+animate, −RC subject]). Following Jäger, Engelmann, & Vasishth (2017), we will refer to the fully matching item (boy) as the retrieval target, and to items with partial feature match (girl) as distractors.
A core assumption of the cue-based retrieval theory is that memory retrieval is subject to interference: When a retrieval is triggered, processing difficulty is predicted if multiple items in memory match the same retrieval cues. Therefore, the verb greeted should be more difficult to process in (2) than in (1). This effect is known as similarity-based interference and is indexed by a slowdown at the retrieval site and/or by occasional misretrievals of a distractor item, which results in misinterpretation, such as the girl being interpreted as the agent in (2). Similaritybased interference has been attested in multiple linguistic constructions across different languages (e.g., Van Dyke and Lewis, 2003;Van Dyke & McElree, 2006Van Dyke, 2007;Vasishth, Brüssow, Lewis, & Drenhaus, 2008;Van Dyke & McElree, 2011;Martin, Nieuwland, & Carreiras, 2012;Jäger et al., 2017;Engelmann, Jäger, & Vasishth, 2019;Jäger, Mertzen, Van Dyke, & Vasishth, 2020;Vasishth & Engelmann, 2022).
Within the cue-based retrieval framework, two different models of retrieval processes have been proposed: the activation-based model (Lewis & Vasishth, 2005) and the direct-access model (McElree, 2000). The two models share the assumption that retrieval cues mediate access to items in memory. However, they make different assumptions 2 Following  regarding the underlying latent processes that unfold in memory retrieval.
The activation-based model was originally implemented in the cognitive architecture ACT-R (Anderson et al., 2004). Because the full ACT-R based model is implemented in the programming language Lisp, it is not easily accessible to the wider community in psycholinguistics. Partly in response to this problem, Engelmann et al. (2019) developed an implementation in R (R Core Team, 2020) that represents a simplified version of the full model. 3 The match between simulated data from the Lewis and Vasishth (2005) model and human experimental data has been studied in subject-verb and reflexive-antecedent dependencies, negative polarity items, and other linguistic constructions Wagers, Lau, & Phillips, 2009;Jäger et al., 2020;Dillon, Mishler, Sloggett, & Phillips, 2013;Jäger, Engelmann, & Vasishth, 2015;Nicenboim & Vasishth, 2016;Patil, Vasishth, & Lewis, 2016;Parker & Phillips, 2017;Vasishth et al., 2019).
In the activation-based model implemented in Lewis and Vasishth (2005), each memory item has a fluctuating activation value that determines both the probability and the latency of retrieval. When a retrieval is triggered, the retrieval cues spread activation to all matching items available in memory. Items with more matches accrue more activation, making them more likely to be retrieved, and decreasing retrieval latency. However, if the cued feature is present on multiple items in memory, the cue's activation is shared across all items, so that each item receives comparatively less activation. The reduced activation of the target item in memory and the increased activation of competing items are the source of similarity-based interference.
The direct-access model, developed by McElree and colleagues (McElree, 2000(McElree, , 2006McElree et al., 2003;Martin & McElree, 2011), also predicts similarity-based interference. It assumes that the availability of items in memory -that is, the probability of successful retrieval -decreases as a function of interference, but that retrieval times remain unaffected. However, low availability can lead to misretrievals and/or parsing failure: if a retrieval fails completely, that is, if no appropriate chunk can be retrieved to perform a syntactic attachment, words will be left "stranded," that is, fail to be integrated into the syntactic structure (Lewis & Vasishth, 2005;Bartek, Lewis, Vasishth, & Smith, 2011), and the parse will crash. If retrieval fails or if an incorrect chunk is retrieved, in a certain proportion of trials, a backtracking process is initiated that requires some extra processing time (Martin & McElree, 2008). Backtracking, also known as reanalysis, is implicitly assumed to lead to the retrieval of the target (McElree, 1993).  compared the activationbased model and the direct-access model using self-paced reading data from unimpaired adult readers (Nicenboim, Vasishth, Engelmann, & Suckow, 2018).  showed that the predictive performance of the default activation-based model implementation was worse compared to that of the direct-access model. However, the models had similar quantitative performance when the activation-based model was implemented with different variances for target and distractors. Lissón et al. (2021a) tested the two competing models against selfpaced listening data from individuals with aphasia and control participants (Caplan, Michaud, & Hufford, 2015). The authors modeled the comprehension of English subject and object relative clauses as in (1) and (2), in selfpaced listening and a sentence-picture matching task. Model comparisons showed similar quantitative performance, but major qualitative differences. Lissón et al. concluded that in order to disentangle the differences between the models, more studies using different linguistic constructions and different experimental paradigms were needed. This is the empirical gap that the current study aims to fill.
In the context of aphasic sentence processing, one assumption of the direct-access model is potentially overly constraining, namely the assumption that backtracking, if initiated, always leads to correct retrieval of the target. Because of this assumption, due to the added backtracking time in some of the correct trials, the direct-access model assumes that correct retrievals are, on average, slower than misretrievals. However, it is known from different cognitive tasks that "slow errors" can occur in addition to "fast errors" (e.g., Van Maanen, Katsimpokis, & Van Campen, 2019). The direct-access model's assumption that correct retrievals are on average slower than incorrect retrievals leads to incorrect predictions when modeling data from individuals with aphasia with the direct-access model (Lissón et al., 2021a): Individuals with aphasia (IWA) often have slower latencies in incorrect trials relative to correct trials (see Hanne, Burchert, de Bleser, & Vasishth, 2015;Adelt, Stadie, Lassotta, Adani, & Burchert, 2017, Pregla, Lissón, Vasishth, Burchert, & Stadie, 2021. Based on the high prevalence of "slow errors" in IWA, Lissón et al. (2021b) implemented a modified version of the direct-access model. In this model, there is not only a distinction between trials in which backtracking is initiated and trials in which it is not, but also a distinction between trials in which backtracking is successful and trials in which it fails. Trials with failed backtracking are slower than trials without backtracking, but the parser is stuck with the result of the original misretrieval. Lissón et al. (2021b) tested the modified direct-access model and the original directaccess model against self-paced listening data from IWA and control participants in German (Pregla et al., 2021). The models were compared using Bayes factors, and the result was inconclusive. In the present study, we compare the modified direct-access model against the activation-based model using a larger dataset.
We model interference effects in IWA and control participants using a subset of the data from Pregla, Vasishth, Lissón, Stadie, & Burchert (2022). We focus on two linguistic constructions in German: pronoun resolution and relative clauses. These constructions are well-suited for our modeling goals because IWA have difficulty processing them (Burchert, de Bleser, & Sonntag, 2003;Choy & Thompson, 2010;Caplan et al., 2015;Adelt et al., 2017;Pregla et al., 2021). Furthermore, given that cue-based retrieval is intended as a general model of sentence processing, it is necessary to investigate different constructions and test whether the proposed implementations are able to account for the entire range of data.
We also aim to establish links between model parameters and prominent theories of processing deficits in aphasia. Linking these verbally stated theories to model parameters is crucially important, because it enables us to derive constrained, testable predictions for each theory and to evaluate them against the data. Finally, our study is, to our knowledge, the first to compare two different models of cue-based retrieval using online eye-tracking data from the visual-world paradigm in combination with an offline sentence-picture matching task. 4 We seek to answer the following questions: 1. Which model of cue-based retrieval offers a better account of interference effects in IWA and control participants across pronoun resolution and relative clauses? 2. How do the parameters of each model map onto theories of processing deficits in IWA?
The paper is structured as follows: We begin by summarizing the theories of processing deficits in aphasia that we will evaluate, as well as their proposed connection to the parameters of the activation-based and modified direct-access models. We then introduce the two linguistic constructions of interest, namely pronoun resolution and relative clauses. Next, on the basis of these constructions, we discuss the implementation of the competing models in greater detail, and link the assumed parameters to influential theories from the aphasia literature. Finally, we fit the models to the data and evaluate whether the theoretical predictions are borne out. We also assess each model's predictive fit by repeatedly fitting the models to subsets of the data, simulating new data and comparing the simulated data to a different subset of the original data. To anticipate our results, across IWA and control participants, and across both linguistic constructions, neither of the two implementations of cue-based retrieval performs decisively better than the other in terms of predictive fit. However, the parameter estimates from each of the two models are informative with regard to the underlying deficits in IWA: There is support in the data for slow syntax, intermittent deficiencies, and resource reduction (see discussion below), but less support for delayed lexical access.

Theories of Processing Deficits in Aphasia
Individuals with aphasia have difficulties processing noncanonical sentences (e.g., Caramazza & Zurif, 1976;McAllister et al., 2009;Schumacher et al., 2015) such as object relative clauses and passives, especially when the thematic roles are semantically reversible. That is, IWA experience difficulties identifying the agent and theme of the verb (who did what to whom) based on morpho-syntactic cues alone. Similarly, IWA also experience difficulties comprehending binding relations, that is, pronouns and reflexives (e.g., Justin told [Thomas i to shave himself i ]; Edwards & Varlokosta, 2007;Choy & Thompson, 2010). Caplan et al. (2015) discuss the different theories that aim to explain why these constructions are challenging for IWA 5 . For instance, Burkhardt, Piñango, & Wong (2003) and Burkhardt, Avrutin, Piango, & Ruigendijk (2008) argue that IWA compute syntactic dependencies at a slower-thannormal pace, which can lead to comprehension failure. According to this theory, known as slow syntax, the processing deficit in IWA is specific to syntactic structure building. By contrast, Ferrill, Love, Walenski, & Shapiro (2012), and Love, Swinney, Walenski, & Zurif (2008) posit that delayed lexical access causes the slowdown in the formation of syntactic dependencies. Love, Swinney, Walenski, & Zurif (2008) claim that when the sentence requires the reactivation of a lexical item in order to complete a dependency, the lexical reactivation is too slow, and an extragrammatical heuristic may be used instead, which can lead to comprehension errors. These two theories -slow syntax and delayed 5 Although the slow syntax and the delayed lexical access theories were originally proposed for Broca's aphasia, the studies by Caplan and colleagues (Caplan, Michaud, & Hufford, 2013 show that these deficits could also be playing an important role in impaired sentence comprehension in patients with other types of aphasia. lexical access -have been tested using the cross-modal lexical decision paradigm. However, studies using the visual-world paradigm in aphasia do not support a delay in lexical access or in syntactic structure building as the main source of comprehension deficits in IWA (e.g., Dickey, Choy, & Thompson, 2007). The data modeled in the present work is well-suited to test the predictions of these two theories, because we consider visual-world data in combination with reaction times and accuracies from a picture-selection task, which require similar motor responses as in the crossmodal lexical decision task.
Another theory, proposed by Caplan and colleagues, argues that IWA may have an impairment in the resources needed for parsing, such as memory capacity (Caplan, Waters, DeDe, Michaud, & Reddy, 2007;Caplan, 2012). Complex sentences create greater processing demands, and therefore IWA have more difficulties in complex sentences. This account is known as resource reduction. In addition, Caplan et al. (2013) claim that IWA may also exhibit intermittent deficiencies in the parsing system that block access to parsing operations such as relating the surface and base positions of words in the structure. The intermittent nature of these breakdowns would explain why IWA are able to understand complex sentences on some but not all trials.
All of these theoretical proposals can be incorporated into computational models of retrieval. For instance, Patil, Hanne, Burchert, de Bleser, & Vasishth (2016) modeled the comprehension of active vs. passive sentences in a visual-world eye-tracking experiment in German (Hanne, Sekerina, Vasishth, Burchert, & de Bleser, 2011), using different implementations of the Lewis and Vasishth (2005) model. The best-fitting model for IWA was one that assumed generally slowed processing (understood as a combination of delayed lexical access and slow syntax), as well as intermittent deficiencies. In another modeling study, Mätzig, Vasishth Engelmann, Caplan, and Burchert mapped parameters of the Lewis and Vasishth (2005) model to slowed processing, intermittent deficiencies, and resource reduction. The authors modeled accuracies in English subject and object relative clauses using the data from Caplan et al. (2015). They concluded that IWA's performance can be explained by a combination of these three deficits, and that slowed processing, intermittent deficiencies, and resource reduction may affect each individual with aphasia to a different degree. Similarly, in English relative clauses, using self-paced listening data from Caplan et al. (2015), Lissón et al. (2021a) also found that intermittent deficiencies, delayed lexical access, and slow syntax can explain the main processing deficits in IWA.
In the current work, we focus on the role of the proposed processing deficits in the context of the activationbased and the direct-access models of cue-based retrieval.
In our modeling, we follow Lissón et al. (2021a) and Lissón et al. (2021b) and implement intermittent deficiencies as increased stochastic noise in memory activations/availabilities. A higher noise value in IWA would mean more mistretrievals and presumably more parsing failures due to failed retrievals compared to unimpaired individuals. Delayed lexical access or slow syntax is assumed to delay the retrieval of the target item from memory, leading to a slowdown at the retrieval site in the activation-based model, and/or to misretrieval in both the activation-based and the direct-access model. In the DA model, which assumes backtracking as a key resource in parsing, resource reductions could disrupt this mechanism and lead to comprehension deficits.
We fit the models to data from a picture-selection task and a visual-world experiment that tested the comprehension of pronouns and relative clauses in German (Pregla et al., 2022). We now introduce each linguistic construction in turn.

Experiment 1: Pronoun Resolution
Consider sentence (3a). When the pronoun er ("he") is encountered, cue-based retrieval predicts that a search for its antecedent is triggered, using the cues [+animate, +masculine, +singular]. 6 The experiment makes use of the fact that for some verbs, the implicit subject of a sentential complement is coreferential with the main clause subject (subject control) while for others it is coreferential with the main clause object (object control, e.g., Chomsky, 1981;Comrie, 1985). The verb versprechen ("promise") is lexically specified as a subject control verb (Müller, 2002). Even though we do not investigate control structures, we assume that the additional retrieval cue [+subj] is set at the pronoun, and that Peter is the retrieval target: Because versprechen ("promise") enforces subject control, replacing er ("he") with sie ("she") to refer to Lisa in (3a) results in unacceptability, and there is no alternative antecedent in the discourse to which the pronoun could refer. 6 The parser presumably also honors structural constraints during retrieval, such as adherence to Binding Principle B (Chomsky, 1981), as opposed to first retrieving and then eliminating structurally illicit antecedents (e.g., Chow, Lewis, & Phillips, 2014). In the current context, however, both possible antecedents are structurally available, so that we abstract away from structural retrieval cues. 'Peter now promises Thomas that he will pet and ruffle the little lamb.' Across the two sentences, the main clause object nouns, Lisa in (3a) and Thomas in (3b), partially match the retrieval cues from the pronoun. Both mismatch the [+subj] cue that the pronoun inherits from the verb, but Thomas matches the gender cue from the pronoun, which should lead to increased similarity-based interference. We will refer to sentences like (3a) as mismatch conditions, because the target noun (Peter) and the distractor noun (Lisa) do not share the same gender, and sentences like (3b), as match conditions.
In unimpaired populations, a processing advantage in gender mismatch conditions has been observed in English by Badecker and Straub (2002) and Runner and Head (2014), but not by Chow et al. (2014). Laurinavichyute, Jäger, Akinina, Roß, & Dragoy (2017) reported mixed results for German. In the aphasia literature, Choy and Thompson (2010) and Engel, Shapiro, & Love (2018) found that IWA had difficulties in pronoun resolution, but these studies did not target the gender mismatch configurations that Pregla et al. (2022) tested, and that we model in the present work.
In Experiment 1, we model interference as a function of the gender cue at the pronoun in the Pregla et al. (2022) data. Based on cue-based retrieval theory, we predict a processing advantage in gender mismatch conditions relative to the gender match conditions. We aim to (a) compare how the activation-based and the modified direct-access model fit these data, and (b) evaluate the theoretical accounts of processing deficits in aphasia by mapping them onto model parameters.
Our study mainly focuses on the effect of number interference within subject and object relative clauses rather than on the well-studied SR/OR asymmetry. This is because similarity-based interference alone cannot account for the asymmetry in German relative clauses, as the retrieval point for both SRs and ORs is the clause-final verb (see discussion below). However, it is still informative to check if the asymmetry can be captured by assuming changes in the parameters of the activation-based model and the modified direct-access model.
Consider the sentences in (4). When the verb badet/baden (bathes/bathe) is encountered at the end of the sentence, two retrievals are triggered, because the agent and the theme of the action expressed by the verb need to be identified. Our modeling focuses on the retrieval of the agent because that is the theoretically interesting event (this is explained below). In (4a) and (4b), both noun phrases in the sentence are singular (der Esel, der/den Tiger). This is expected to cause similarity-based interference during the retrieval of the subject. By contrast, in (4c) and (4d), the second noun phrase is plural (die Tiger), which should result in easier identification of the subject, based on the retrieval cue from the verb ([singular] or [plural]). We will refer to sentences like (4c) and (4d) as mismatch conditions, because target and distractor do not share the same number, and sentences like (4a) and (4b), as match conditions. Both types of relative clauses should be easier to process in mismatch configurations compared to match conditions. Thus, (4c) and (4d) should be easier to process than (4a) and (4b), respectively.
In German, when the head noun is masculine (such as in our items), the morphological form of the relativizer (der for nominative, den for accusative) provides disambiguating information. Therefore, in our items, by the time the comprehender reaches the relativizer, they should be able to identify the agent or the theme of the relative clause due to the overt case marking. Retrieval should occur at the verb, and the number cue should facilitate processing in (4c) vs. (4a) and (4d) vs. (4b), because in (4c) and (4d) only the subject noun phrase matches the number cue at the verb.
Studies addressing the comprehension of subject vs. object relatives in German with case-unambiguous relativizers (such as our items) are scarce and have mainly addressed the SR/OR asymmetry (Friederici, Steinhauer, Mecklinger, & Meyer, 1998;Burchert et al., 2003;Adelt et al., 2017). By contrast, the main goals in our Experiment 2 are to compare the performance of the activation-based and the modified direct-access model by modeling number interference in both types of relative clauses, and to evaluate the different theories of processing deficits in aphasia based on the model estimates.

Methods
The data that we model come from the experiments carried out by Pregla et al. (2022). The participants, procedure, and materials described here summarize the methods in Pregla et al. (2022). By contrast, the dependent variables and contrast coding described here are specific to the present paper. Pregla et al. (2022) analyze the visual-world eyetracking data, whereas we model the reaction times from the picture-selection task that followed the visual-world paradigm task.
Participants and Procedure Twenty-one IWA (9 females, mean age = 60.2 years, SD = 11.4; mean education = 15.2 years, SD = 3.2) and fifty control participants (32 females, mean age = 47.7 years, SD = 19.6; mean education = 18.1 years, SD = 4.0), all native speakers of German, took part in an auditory sentence-picture matching task combined with visual-world eye-tracking. Individuals with aphasia were in the chronic phase (at least 1 year after onset of the aphasia). More information about the patients is given in Appendix A. The procedure was as follows: At the beginning of the trial, a preview phase of 4000 ms was used to introduce two pictures to the participants. One of the pictures (target) corresponded to the correct meaning of the sentence, whereas the other picture (foil) depicted the opposite thematic interpretation. After the preview phase, an auditory recording of the sentence started playing. Sentences were presented at a normal speech rate, and participants were instructed to select the picture that matched the meaning of the sentence. The pictures were displayed until participants made a choice, or for a maximum time of 30 s. Once the participant pressed the choice button, the trial ended. During the trial, eye movements were recorded using a SensoMotoric Instruments eye-tracker (SMI RED250mobile; binocular tracking, Experiment Center version 3.7, sampling rate 250 Hz). The proportion of looks to the target picture against looks to the foil (or no picture) was calculated. The response time and the accuracy of the picture selection were also recorded. Each participant completed the experiments twice, in two sessions (test and retest), with a gap of approximately 2 months. 7 Participants also performed a battery of tests in order to assess auditory and visual comprehension, morphological discrimination, and lexical decision latency.

Materials
In Experiment 1, two conditions (mach and mismatch, as in example 3), with 10 items each were included. Example pictures accompanying the pronoun sentences are shown in Fig. 1. The pronoun items always used subject-control verbs, so that the target of the retrieval was always the first noun phrase. 8 Control verbs were selected from the ZAS Database of Clause-Embedding Predicates (Stiebels et al., 2018).
In experiment 2, 20 items per relative clause type (subject/object relative) were included. The noun phrase of the matrix clause (henceforth NP1) was always masculine and singular. Out of the 20 items, 10 had a singular embedded noun phrase (henceforth NP2), and 10 had a plural embedded noun phrase. The items were constructed using 10 bisyllabic transitive action verbs, and the noun 7 The test vs. retest main effect is not included in our modeling (see the section on dependent variables) because the model structure had to be simplified due to convergence issues. The test-retest reliability for these data has been investigated in Pregla et al. (2021). 8 The pronoun items that we use here were extracted from a larger experiment that also contained object-control verbs, and fillers.

Fig. 1
Example pictures used in the picture-selection task in Experiment 1. For the sentences in example (3) the left picture is the target and the right picture is the foil phrases were always bisyllabic animal names. Example pictures for the relative clause conditions are given in Fig. 2.

Dependent Variables and Contrast Coding
To assess participants' lexical access speed, which is important for evaluating the delayed lexical access hypothesis of Love et al. (2008) and Ferrill et al. (2012), we computed their average reaction times in a lexical decision task, based on LEMO 2.0 (Stadie, Cholewa, & de Bleser, 2013). In this task, participants have to decide whether an auditorily presented item is a word or a non-word. Participants responded by pressing one of two buttons on a computer keyboard, and the accuracy and response times were recorded. We computed the average reaction times in correct trials, which yielded a single measure (lexical decision time, LDT) for each participant that we use as a continuous predictor in the models.
Response times in lexical decision tasks have been previously used as a measure of lexical access speed in IWA. For instance, Caplan et al. (2015) correlated the lexical decision times for correct responses with the accuracy and reading times in a self-paced listening task. We centered and scaled the LDT predictor within groups. An LDT × group interaction would thus tell us whether an increase in LDT leads to a larger increase in RT for IWA compared to controls.
Another predictor in the models is the proportion of fixations on the target picture (centered and scaled within groups) at the critical sentence region, where retrieval is assumed to take place. We use the proportions of looks to the target at the critical region as a proxy for retrieval (see the next section for more details). The remaining predictors used in both models were sum-coded, with the following contrasts: group was coded with IWA as +1 and controls as −1; the high interference conditions (gender match in pronouns, number match in relative clauses) were coded as +1, and the low interference conditions (gender mismatch, number mismatch) as −1. In the relative clauses sub-experiment, the relative clauses were coded as OR +1, and SR −1.
We base our statistical inferences on the posterior distribution of the parameters, which we summarize with the mean and 95% credible interval (CrI). This is the convention used to report summaries of parameter values for which there is support in the data. When interpreting the estimates, the width of the CrI should be taken into account, as it shows the range of plausible parameter values that lie with 95% probability given our model and data.
For both the activation-based model and the modified direct-access model, the data from the pronoun experiment and the relative clause experiment were fitted separately. 9 The four models were implemented in Stan (Carpenter et al., 2017) and fitted in R (R Core Team, 2020) via the rstan package (Stan Development Team, 2020). The packages brms (Bürkner, 2017) and bayesplot (Gabry, Simpson, Vehtari, Betancourt, & Gelman, 2019) were used for examining and plotting the posterior distributions of the parameters. For each model, three chains each, with at least 6000 iterations each were run. Each chain included at least 3000 warm-up iterations. Convergence was assessed by checking that R was below 1.01 and by visually inspecting the convergence of the chains (Gelman et al., 2013). We also verified that the models could recover simulated parameter values. For both models, mildly informative priors were used (Sorensen, Hohenstein, & Vasishth, 2016;Nicenboim et al., 2021;Schad, Betancourt, & Vasishth, 2021). Details about the implementation and the priors are available in Appendix C.

Modeling Assumptions
Neither the activation-based model nor the direct-acccess model have a linking function that maps proportions of looks to a picture to retrieval times and/or retrieval probabilities of memory chunks. Therefore, we need to specify linking assumptions between fixations on the target picture, the assumed retrieval processes, and the reaction times and accuracies in the picture-selection task.
For the two sentence types, we assume two retrieval events. The first retrieval takes place in the middle of the sentence, at the critical region. In pronoun resolution, the critical region is the pronoun, and in relative clauses, it is the relativizer. Our linking assumption is that proportions of looks to the target at the critical region can be used as a proxy for retrieval. This assumption is based on the fact that the critical region provides the necessary cues to retrieve the target. Therefore, we predict that more looks to the target picture at the retrieval site correspond to a higher probability that the target has been retrieved at this point.
The second retrieval event happens at the verb region, that is, at the end of the sentence. In both experiments, the retrieval target must be re-accessed at this point, as a dependency with the verb needs to be established. We assume that the second retrieval is linked to the first, so that more looks to the target at the critical region (the pronoun/relativizer) go along with higher activation/availability, resulting in faster and/or more accurate retrieval at the verb region. As the picture-selection task takes place immediately after hearing the verb region, we assume that accuracies and RT in this task should show the interference effects predicted by the cue-based retrieval theory. We do not model retrieval failures, that is, trials in which neither the target nor the distractor can be retrieved. This simplification is necessary because participants in the Pregla et al. (2021) study had to select one of the two pictures and were not given the option to respond "I don't know," which could be interpreted to index retrieval failure . Furthermore, even in the presence of such an option, participants may resort to guessing rather than responding "I don't know," although it has been suggested that IWA only use such compensatory processes rarely (Hanne et al., 2011;Burchert, Hanne, & Vasishth, 2013;Arantzeta, Webster, Laka, Martínez-Zabaleta, & Howard, 2018). Parameters related to guessing can, in principle, be implemented in probabilistic cognitive models (e.g., Oberauer, 2006;Logačev & Vasishth, 2016), including the direct-access model, but we do not attempt such an implementation here, given that our models are already relatively complex.
In what follows, we will present the implementation and the fits of the activation-based model and the modified direct-access model to the data in turn. We also present quantitative model comparisons, which allow us to assess the relative goodness of fit of each model to the data.

Activation-Based Model
The activation-based model can be implemented as a lognormal race of evidence accumulators : For the two experiments, we assume that there are two accumulators of noisy evidence that correspond to the retrieval candidates in memory, namely the first and the second noun phrase (NP1 or NP2, target or distractor). For each trial i, the finishing times F T for NP1 and NP2 are each sampled from a lognormal distribution with location μ NP 1 or μ NP 2 respectively, and scale σ . The accumulator with the faster F T i determines both the selected picture (target or foil) and the reaction time for trial i. This implementation maps straightforwardly onto the notion of memory chunks with fluctuating activation values, with the chunk with the highest activation being retrieved on a particular trial.
The hierarchical structure of the models is implemented in the μ of both accumulators, which include fixed and random effects. The fixed effects added to μ NP 1 and μ NP 2 in the model for pronoun resolution are as follows: group (IWA vs. control), condition (match vs. mismatch), and the group × condition interaction. We also added the average reaction time from the lexical decision task (LDT), and the group × LDT interaction. Furthermore, we added the proportion of looks to the target at the critical region (fixations), the fixations × group interaction, and the threeway interaction fixations × condition × group. In addition, both μ included by-subject and by-item varying intercepts; the fixed effect of group included an adjustment by item, and the fixed effect of condition included an adjustment by subject. The parameter σ included a fixed effect of group. In addition, the model for relative clause conditions also included a fixed effect for RC type, an RC type × group and an RC × condition interaction, and the RC × group × condition three-way interaction. The predictions of the activation-based model for the two experiments are as follows: 1. An increase in fixations to the target picture at the critical region should lead to a decrease in RT for the target accumulator in the picture-selection task, as we assume that the first retrieval influences the second retrieval. If participants retrieved the target at the critical region, re-accessing it at the verb should be easier, meaning that there should be more correct retrievals as well as faster retrieval times. 2. The mean finishing time of the target accumulator should be faster for the mismatch conditions relative to the match conditions, as similarity-based interference slows retrieval. The mean of the distractor accumulator should be similar or slower in mismatch conditions relative to match conditions. 3. IWA should have slower RT relative to controls, so IWA's μ should be higher. This would be in line with the slow syntax theory. Similarly, IWA should have a higher σ , that is, more noisy accrual of evidence corresponding to more variable activation values, which would be in line with intermittent deficiencies. 4. If a delay in lexical access is causing processing difficulties in IWA, we would expect the effect of LDT to lead to a bigger increase in RTs for the target accumulator for IWA relative to controls, as higher RT for the target accumulator would indicate more difficulty in retrieving the target. This would be in line with delayed lexical access.
In addition, given earlier results (Burchert et al., 2003;Adelt et al., 2017), in Experiment 2, IWA should have longer mean finishing times for the target accumulator in OR compared to SR.

Modified Direct-Access Model
We implement the modified direct-access model as a hierarchical mixture model in the Bayesian framework, following Lissón et al. (2021b). Mixture models integrate multiple generative processes in one model (see Nicenboim et al., 2021 chapter 20, for a tutorial on these models in Stan). The implementation of the modified direct-access model as a mixture model allows us to take into account the probability and cost of backtracking as a latent variable. We assume that both correct and incorrect responses are generated from one of two distributions: Responses without backtracking follow a distribution with parameters μ and σ , while responses with backtracking follow another distribution with parameters μ = μ + δ and σ , where δ is the time needed for backtracking.
The direct-access model assumes that the availability of items in memory determines their probability of retrieval. In our implementation, we map availability to the parameter θ, which is the probability of retrieval of the target. Given that interference is expected to affect availability, we add a main effect of condition to θ . Because we expect IWA to have lower base availability compared to control participants, we also add a main effect of group to θ. We also add a main effect of fixations, following the same logic as for the activation-based model: More fixations on the target at the critical region should lead to a higher probability of retrieval of the target at the verb. In order to evaluate the delayed lexical access theory, we include LDT as a fixed effect to θ , and the interaction LDT × group. This interaction tests the delayed lexical access theory in IWA: If longer LDT leads to a larger decrease in θ for IWA, this would suggest that delayed lexical access lowers the probability of retrieval of the target, causing difficulties in the retrieval process.
The original direct-access model assumes that when the initial retrieval fails, a costly process of backtracking (or reanalysis) can be triggered, which leads to correct retrieval of the target (McElree et al., 2003;Martin & McElree, 2008). Our modified direct-access model adds the assumption that backtracking can fail. This is reflected in the added parameter θ b , which represents the probability of correct retrieval after backtracking. The additional parameter makes the modified direct-access model more suitable for modeling data from individuals with aphasia, as it allows for slow, incorrect responses. If IWA show a lower θ b , relative to controls, this could point towards a disruption in the process of backtracking as a main source of comprehension difficulties in IWA. The parameter P b estimates the proportion of trials for which backtracking is performed after an initial misretrieval. The parameter δ estimates the amount of time (in log ms) that backtracking takes. Main effects of group are added to the parameters θ b , P b and δ. P b and θ b additionally have by-subject random intercepts. 10 The mixture process for a given trial i works as follows: (a) if the retrieval of the target succeeds, with probability θ , RT i is drawn from LogNormal(μ, σ ). (b) if the retrieval of the target fails (1 − θ), backtracking is initiated with probability P b . RT i is sampled from LogNormal(μ + δ, σ ). After backtracking, the target is retrieved with probability θ b , and the distractor with probability 1 − θ b . (c) if the retrieval of the target fails and there is no backtracking, a misretrieval is predicted with probability (1 − θ) · (1 − P b ), and RT i is sampled from LogNormal(μ, σ ).
Notice that the probability of successful retrieval of the target, θ , and the probability of backtracking, P b are assumed to be independent. Interference can only indirectly affect response times through lower θ and the added cost of backtracking δ. Therefore, in the μ parameter, which estimates the mean average RT, we do not include an adjustment by condition, but we do include an adjustment by group, since retrieval may generally be slower in IWA compared to controls. The noise parameter, σ , also has an adjustment by group, as IWA may have more variable retrieval times. The priors used, as well as the full hierarchical model, are shown in Appendix C.
Due to the cost of backtracking δ, correct and incorrect responses following backtracking (b) are expected to be slower, on average, than correct retrievals (a) and misretrievals without backtracking (c). The RTs corresponding to an initial successful retrieval of the target (a) and to misretrievals without backtracking (c) are sampled from the same distribution. The predictions of the modified direct-access model for the two experiments are explained below.
1. Fixations to the target picture at the critical region should lead to an increase in the probability of retrieval of the target, as we assume that the first retrieval influences the second retrieval. Therefore, the estimate of the main effect of fixations to the target on the probability of successful retrieval θ should be positive. 2. The probability of successful retrieval θ should be higher for non-interference conditions relative to interference conditions, that is, higher in mismatch vs. match conditions. 3. IWA should have slower RTs relative to controls, so IWA's μ should be higher. This would be in line with the slow syntax theory. Similarly, IWA should have a higher σ , which would be in line with intermittent deficiencies. 4. If IWA's slower access to items from memory leads to difficulties in the retrieval, we would expect LDT to lead to a bigger decrease in θ for IWA relative to controls. This would be in line with delayed lexical access. 5. We expect IWA to have a lower probability of backtracking, P b , and a lower probability of retrieval of the target after backtracking, θ b . This would be in line with the resource reduction theory, assuming that backtracking is a parsing resource that is impaired in IWA. 6. Similarly, we also expect IWA to have a higher cost of backtracking, δ, which would be in line with slow syntax.
In addition, given earlier results showing that OR are more difficult to process than SR (Burchert et al., 2003;Adelt et al., 2017), for the relative clause construction, θ should be lower in OR compared to SR.
We now move on to the modeling results, which will be presented separately for pronoun resolution and relative clauses.

Experiment 1 -Pronoun Resolution
In the pronoun resolution items, NP1 is always the target of the dependency. Therefore, μ NP 1 accumulates evidence for the retrieval of the target, and μ NP 2 for the retrieval of the distractor. Figure 3 shows the distribution of estimated finishing times for the two accumulators (NP1 and NP2) across conditions and groups. The results confirm our predictions: IWA have longer finishing times relative to controls in both conditions. In controls, the means of the NP1 accumulator in the mismatch and match conditions are quite similar (1391 ms vs. 1465 ms), although responses are faster on average in the mismatch condition, as expected. IWA show a larger effect of interference: The mean of the NP1 accumulator in the mismatch condition is 4532 ms, compared to 5735 ms in the match condition. The interference effect can also be seen in the overlap of the With regard to the fixed effects on μ NP 1 and μ NP 2 , due to space limitations, we will only comment on the estimates that are relevant to the processing theories of aphasia that we are evaluating. The estimates for all parameters in this model and their credible intervals are shown in Appendix D.
The NP1 accumulator showed an indication of an LDT × group interaction (836 ms, CrI: [539, 1152] ms), but no indication of such an interaction was observed for the NP2 accumulator. This suggests that additional time needed for lexical access leads to a larger slowdown in IWA in the target accumulator, as predicted by the delayed lexical access theory. The estimates for fixations and the fixations × group interaction do not point in the predicted direction: Rather than facilitating correct retrieval of the target, an increase in fixations on the target picture leads to an increase in RTs in both accumulators (NP1: 41 ms, CrI: [−77, 160] ms; NP2: 518 ms, CrI: [−141, 1212] ms). However, due to large uncertainty around the estimates, the results are also compatible with no effect of fixations.

Modified Direct-Access Model
We begin by assessing the posterior distribution of θ, which is the probability of retrieving the target during the first retrieval attempt. Figure 4 shows that the probability of retrieval of the target is very high for controls: The mean of the distribution lies above 95% in both conditions (CrI mismatch: [98, 99]%, CrI match: [96, 98]%). By contrast, IWA show lower retrieval probabilities overall. This can be also seen in Fig. 4, where IWA's mean estimate for mismatch is 72% CrI: [66, 77]%, whereas the estimate for match is 55% CrI: [47, 62]%. The group × condition interaction is inconclusive (2% CrI: [−1, 5]%).
A unit increase in LDT leads to −5% CrI: [−8, −1]% in θ, and a negative LDT × group interaction (−9% CrI: [−12, −7]%) is consistent with the assumption that IWA are more affected by increased LDT. There was neither an indication of an effect of fixations (2% [−1, 5]%), nor of a fixation × group interaction (−2% CrI: [−5, 2]%). This means that for both groups, there is no indication that an increase in fixations to the target picture led to an increase in the probability of successful retrieval of the target. The

Activation-Based Model
In the relative-clause items, the accumulator mean μ NP 1 stands for the retrieval of NP1 as the agent of the action, whereas μ NP 2 stands for the retrieval of NP2 as the agent. Depending on the trial, NP1 (in subject relatives) or NP2 (in object relatives) will be the target of the retrieval, as we model the retrieval of the agent. Figure 6 shows the distribution of finishing times of the two accumulators in subject relative clauses. As expected, IWA have higher finishing times than controls across conditions. The mean of the NP1 accumulator (target) is roughly the same across conditions, whereas the mean of the NP2 accumulator (distractor) is higher in the mismatch condition than in the match condition. In general, controls show almost no overlap between the distributions, which indicates that controls retrieve the target (NP1) most of the time. By contrast, in IWA, the two distributions partially overlap, meaning that IWA often retrieve the distractor (NP2). Figure 7 shows the distribution of finishing times of the two accumulators in object relative clauses. IWA have higher finishing times than controls across conditions, and Finishing times (ms) Accumulator NP1 NP2 Subject relatives both groups have slightly lower finishing times in the NP2 accumulator (target) in mismatch vs. match conditions. Crucially, in the match condition, for IWA (right upper panel in Fig. 7, light dashed line), the mean of the NP1 accumulator is lower than the mean of the NP2 accumulator. Since NP2 is the retrieval target in object relatives, the pattern indicates that in the match condition, IWA retrieve the distractor more often than the target. That is, in the match condition, IWA are more likely to misinterpret the sentence than to interpret it correctly. However, in the mismatch condition, the mean of the two accumulators overlap, which indicates that IWA are equally likely to retrieve NP1 or NP2 on average. Comparisons between Figs. 6 and 7 show that controls perform similarly in subject and object relatives, whereas IWA display a subject-object asymmetry: IWA are estimated to correctly interpret subject relatives most of the time. By contrast, IWA are estimated to misinterpret object relatives more often, especially in the match condition.

Object relatives
The model estimates for the fixed effects and interactions on μ NP1 and μ NP2 are shown in Appendix D. No indication of an effect was found for condition or the condition × group interaction, but there was a RC type × condition interaction on μ NP2 (631 ms, CrI: [385, 884] ms): Interference (match) in OR lead to higher finishing times for μ NP2 relative to no-interference (mismatch). The threeway interaction RC type × group × condition for μ NP2 (−249 ms, CrI: [−491, −10] ms) indicates that the effect of condition within RC is different for the two groups in the μ NP2 accumulator: In the SR trials, the difference between match and mismatch conditions is bigger for controls. By contrast, in OR trials, the difference between match and mismatch conditions is bigger for IWA.
There was no indication of an LDT × group interaction, a LDT × condition interaction, or a LDT × condition × group interaction. There was an effect of fixations on μ NP2 (289 ms, CrI: [61, 521] ms). This main effect is uninformative, given that NP2 was the retrieval target in OR but not in SR. There was no indication of a fixations × condition interaction or a fixations × group × condition interaction, so that the role of fixations remains inconclusive. Finally, as predicted, IWA have higher noise than controls (σ IWA 0.55 log ms, CrI: [0.53, 0.57 log ms], σ controls 0.31 log ms, CrI: [0.3, 0.32 log ms]).

Modified Direct-Access Model
The posterior distributions of θ, the probability of initial retrieval of the target, by group and condition are displayed in Fig. 8. While controls have a slightly lower θ in OR relative to SR in the match conditions, SR and OR have a similar θ in mismatch conditions, around 95%. This indicates that, in line with the model predictions, mismatch facilitates the retrieval of the target, especially in OR. The number mismatch also benefits IWA on average, but IWA exhibit a stronger subject-object asymmetry, irrespective of the number manipulation, with higher θ in SR relative to OR for both match and mismatch conditions.

Discussion
In the two models, the location and scale parameters (μ and σ ) of the RT distribution were consistently higher for IWA than for controls. We linked these parameters to the slow syntax and intermittent deficiencies theories, respectively. Both models thus seem to be generally in line with these two theories of processing deficits in aphasia. We will now discuss the implications for the remaining theories within each model.

Activation-Based Model
We hypothesized that the accumulators in the activationbased model should reflect the interference effect predicted by cue-based retrieval theory, namely, lower mean finishing times for the target accumulator, and similar or higher mean finishing time for the distractor accumulator in the mismatch conditions compared to the match conditions. The accumulators show this pattern across the two experiments. In addition, in Experiment 2, the distribution of the accumulators across relative clause types shows that IWA experience a subject-object asymmetry, that is, IWA have more difficulties processing object relatives, in line with previous findings.
The conclusions for the rest of our predictions are more complex, since the results differ across the two experiments. For instance, a group × LDT interaction was found for the target accumulator in pronoun resolution. This interaction indicates that slower lexical access leads to increased processing difficulty for IWA, as predicted by the delayed lexical access theory. However, there was no indication of such an interaction in relative clauses. We therefore conclude that more research is needed in order to establish the role of delayed lexical access in the activation-based model.
The effect of looks to the target at the critical region also remains inconclusive. No effect of fixations was found in pronoun resolution. In relative clauses, an effect of fixations was found for the NP2 accumulator, but no indication of an interaction between fixations and RC type was found. Given that NP2 was the retrieval target in OR but not in SR, the main effect of fixations is uninformative.

Modified Direct-Access Model
We expected similarity-based interference to result in a lower probability θ of successful retrieval for the target in the match conditions compared to the mismatch conditions. The data from both experiments are in line with this prediction. In addition, in Experiment 2, IWA show a large effect of relative clause type, irrespective of the condition: IWA have more difficulties understanding object relatives compared to subject relatives. This subject-object asymmetry is broadly in line with the accuracies in Adelt et al. (2017), although Adelt et al. (2017) found this pattern in both IWA and controls.
The probability of backtracking is consistently lower for IWA than for controls, as is the probability of retrieval of the target after backtracking (θ b ). This pattern is expected under the resource reduction theory, assuming that backtracking makes use of parsing resources. In addition, the average cost of backtracking, δ, is twice as high for IWA compared to controls in both experiments. This adds support for the slow syntax theory.
According to the delayed lexical access theory, IWA should be more affected by delays in lexical access, as measured by a lexical decision task. The observed group × LDT interaction lends some support to this theory in the pronoun resolution sub-experiment, but not in relative clauses. Therefore, the effect of delayed lexical access in the modified direct-access model remains inconclusive, as for the activation-based model. The effect of fixations is also inconclusive: Although in relative clauses there is some indication that fixations at the critical region may lead to a increase in the probability of retrieving the target for IWA, no effect of fixations was found in pronoun resolution.

Model Comparisons
The activation-based model and the modified direct-access model make different assumptions about the retrieval mechanism, and thus the generative process behind the observed data. Within the framework of each model's assumptions, conclusions can be drawn about plausible underlying deficits. However, one crucial question remains open: Which model captures the generative process better?
In order to answer this question, we performed 10-fold cross-validation (Vehtari, Gelman, & Gabry, 2017; see also Nicenboim et al., 2021, chapter 17, for a tutorial on carrying-out cross-validation for Bayesian models such as the ones discussed here). This is a standard procedure in machine learning for quantifying the relative predictive fit of two or more models. Importantly, cross-validation can also be applied when the models assume different generative processes, as is the case with the activation-based and the modified direct-access models.
The procedure for 10-fold cross-validation is as follows: The data are partitioned into 10 balanced subsets containing about the same amount of data per subject. 11 One of the 10 subsets is held out, and the model is fit to the remaining subsets. The posterior distributions from the resulting model are used to compute predictive accuracy on the held-out subset. This is repeated 10 times, so that all subsets are covered. The expected log pointwise predictive density, elpd, is then calculated as a measure of predictive accuracy. elpd is the summed log-likelihood of all observed, heldout data points under each model. Models are compared by computing the difference in elpd, ( elpd), with higher elpd indicating better predictive fit. Because elpd is an estimate, the difference in elpd between two models has an associated standard error, which has the standard frequentist interpretation: elpd ± 2 × SE gives a 95% confidence interval. If the difference in elpd between the models is greater than 2 × SE, we conclude that there are grounds to assume that the model with the higher elpd provides the better fit for the given data.
The results of the cross-validation are shown in Table 1. In pronoun resolution, the modified direct-access model has a predictive advantage, but since the SE of elpd is large, the result is not conclusive. In relative clauses, Positive differences indicate an advantage for the activation-based model, whereas negative differences indicate an advantage for the modified direct-access model the activation-based model has a clear advantage over the modified direct-access model, but the advantage is mostly driven by the control participants, as shown in Appendix E. We also evaluated the predictive performance of the original direct-access model, that is, a model in which backtracking can only lead to the retrieval of the target. The results are shown in Appendix F. In relative clauses, the activation-based model outperforms both the original and the modified-direct access model, while the result for pronoun resolution is inconclusive.

General Discussion
This is the first-ever computational investigation of competing models of similarity-based interference in German language comprehension in IWA and unimpaired controls. We investigated interference in two linguistic constructions, namely pronoun resolution and relative clauses. Two models of cue-based retrieval were implemented in a Bayesian framework: The activation-based model of Lewis and Vasishth (2005) and a modified version of the direct-access model of McElree (2000) as implemented by . The activation-based model assumes a direct connection between retrieval latency and retrieval probability for memory items, whereas the modified directaccess model assumes a constant retrieval latency, along with a costly backtracking mechanism that triggers when retrieval fails. In the original direct-access model, backtracking leads to the correct retrieval of the target item from memory (McElree, 1993). In our modified direct-access model, backtracking can fail, leading to a costly misretrieval. We argue that this is a more suitable model for individuals with aphasia, as it can account for slow incorrect responses, a pattern that is frequently found in the aphasia literature (Hanne et al., 2015;Adelt et al., 2017;Lissón et al., 2021b;Pregla et al., 2021).
The predictive performance of the two models was compared against data from a visual-world experiment (Pregla et al., 2021), using the reaction time and accuracy in the picture selection task as dependent variables. Looks to the target at the critical sentence region, where retrieval is assumed to occur, were used as a predictor, along with the mean reaction times from a lexical decision task. We linked the parameters of each computational model to prominent theories of processing deficits in aphasia, aiming to answer two main questions: (a) Which model is better able to fit the data from IWA and control participants across the two experiments? and (b) What do the parameters in each model tell about the processing deficits and about interference in IWA? We will now discuss the answers to these questions, as well as the relation of our results to prior work in computational modeling of processing deficits in aphasia.
First, both models of retrieval perform well across the two linguistic constructions tested, in the sense that the relevant parameters are affected in the expected direction by group differences and by similarity-based interference. The activation-based model outperforms the modified directaccess model in the relative clauses experiment, mainly because it provides a better predictive fit for the data from control participants. However, both models perform similarly at fitting data from IWA. In pronoun resolution, the two models show similar predictive fit across groups and across conditions. Second, with regard to the underlying processing deficits in aphasia, both models are in line with slow syntax (Burkhardt et al., 2003;Burkhardt et al., 2008) and intermittent deficiencies (Caplan et al., 2013). Resource reduction (Caplan et al., 2007;Caplan, 2012), as implemented here, can only be evaluated with respect to the modified directaccess model, and the results show that the model is in line with this deficit. There was no strong indication in our data, across the two experiments and for both models, that delayed lexical access (Love et al., 2008;Ferrill et al., 2012) is a source of processing deficits in IWA: The predicted relationship between individual lexical decision latency and participant group was only found in some conditions. More experiments are needed in order to explore the role of this deficit.
Regarding the effect of similarity-based interference, based on the results for the activation-based model, we can conclude that in pronoun resolution, IWA are more sensitive to gender interference than control participants. The interaction group × condition was inconclusive in the modified direct-access model. Both models estimate that in relative clauses, the effect of number interference is rather small for both groups. The models suggest that IWA experience a subject-object asymmetry, whereas control participants do not. Below, we discuss the comparatively small effect of number mismatch in relative clauses and some possible explanations of the subject-object asymmetry in IWA.

Number Mismatch Versus Subject-Object Asymmetry
The results of the models show that for IWA, the presence of two candidate NPs with distinctive number features is of limited use in both subject and object relatives with regard to successful comprehension (see also the descriptive statistics for relative clauses, split by condition, in Appendix B). The modified direct-access model estimates no effect of number mismatch for IWA, while pronouns did show some indication of a gender mismatch effect. The activation-based model estimates that number mismatch between the NPs in object relative clauses slightly increases the probability of retrieving the target; however, the target is only retrieved about half the time in these conditions. For IWA, the main difference is between subject and object relative clauses, not between high-and low-interference conditions, as shown in Fig. 10. The subject-object asymmetry in relative clauses in IWA is in line with the canonicity effects reported in several German studies with IWA (e.g., Burchert et al., 2003;Burchert & de Bleser, 2004;Hanne et al., 2011;Adelt et al., 2017;Pregla et al., 2021). Canonicity effects refer to the fact that sentences with non-canonical word order (e.g., object-subject-verb in German) are more difficult to process than sentences with canonical word order.
In contrast to English, where subject and object relative clauses are distinguished by the subject NP intervening or not intervening in the object-verb dependency, cue-based retrieval cannot explain the subject-object asymmetry in German: Both in subject and in object relative clauses, the verb is clause-final and two NPs have to be retrieved, one that is adjacent to the verb and one that is not. Consequently, cue-based retrieval would predict no processing difference between subject and and object relatives in German.
One possible explanation for the differential effects of number marking and RC type is that German relative clauses feature case marking, and that IWA may pay more attention to case than to number cues. Case marking may be more difficult to process in German object relative clauses compared to subject relative clauses, given that there is a case mismatch between the target noun phrase and the relative pronoun in the former. One influential proposal is that of case attraction. Case attraction is analogous to the well-studied phenomenon of number attraction, which has been explained in terms of feature percolation (e.g., Schlesewsky, 1996;Eberhard, 1997;Nicol, Forster, & Veres, 1997;Bader & Meng, 1999;Fanselow, Schlesewsky, Cavar, & Kliegl, 1999;Logačev & Vasishth, 2012;Czypionka, Dörre, & Bayer, 2018). If the head noun and the relative pronoun mismatch in case, as in object relatives, the [+nom- inative] case feature of the head noun could percolate down to the relative pronoun, overriding its original [+accusative] case feature. Because the case feature of the relative pronoun signals the syntactic role of the relativized noun phrase, the object relative could be misinterpreted as a subject relative. The case attraction theory thus predicts that subject relatives are easier to process than object relatives, because the head noun and the relativizer have an identical [+nominative] feature in subject relatives. The proposal that IWA pay more attention to case than to number cues can be seen a differential weighting of retrieval cues. The cue-weighting proposal, implemented by Engelmann (2016) in the framework of the Lewis and Vasishth (2005) model, claims that depending on the linguistic structure, some retrieval cues may receive more weight, and therefore contribute more strongly to memory activation, than others (Dillon et al., 2013;Cunnings & Sturt, 2014;Parker & Phillips, 2017;Engelmann et al., 2019;Vasishth et al., 2019). Differences in weighting between case and number cues could be integrated in both the activation-based and the direct-access model, possibly at the individual participant level, as recently proposed by Yadav, Paape, Smith, Dillon, & Vasishth (2021).
Studies that have investigated processing of number and case in IWA in German provide mixed evidence with regard to differential cue weighting. For instance, Hanne et al. (2015) investigated IWA's use of case and number cues to interpret semantically reversible SVO vs. OVS sentences in German. Their data indicate that processing of case marking may be more impaired than processing of number marking. This contrasts with the results in Adelt, Burchert, Adani, & Stadie (2020), who tested case-unambiguous vs. caseambiguous, number-disambiguated object relatives. The authors found that IWA have a general processing advantage in the case-unambiguous conditions. The study of Adelt et al. (2020) supports the idea that IWA may rely more on case cues than on number cues in relative clauses. However, neither Adelt et al. (2020) nor Hanne et al. (2015) included both case and number cues within the same items. Our modeling shows that when both case and number cues are included in a sentence, IWA do not benefit from the extra number cue, suggesting that cue weighting may be a factor. Lissón et al. (2021a) investigated English relative clause processing in IWA vs. controls using a large-scale dataset from Caplan et al. (2015). Lissón et al. found an agent-first bias for control participants in English: In non-canonical clauses, such as object relatives, unimpaired controls tend to initially assign the agent role to the first noun phrase in the sentence, which is incorrect in object relatives. By contrast, IWA do not show an agent-first bias. The agentfirst bias in unimpaired controls has been attested in visualworld studies in both English (Mack, Wei, Gutierrez, & Thompson, 2016, passives) and German (Hanne et al., 2015, OVS sentences;Hanne et al., 2015, object relatives). In these studies, control participants initially show increased looks to the foil picture in non-canonical sentences. The foil picture in non-canonical sentences depicts the canonical interpretation of the sentence. As soon as they hear the relevant morphological cues (e.g., the relativizer in unambiguous German relative clauses), control participants start looking at the target picture.

Comparison with Previous Work
In the study of Lissón et al. (2021a), which modeled self-paced listening data, this processing bias was reflected in the estimates for the direct-access model. Controls had a lower probability of initial correct retrieval than IWA in object relatives (controls 40%, IWA 50%). That is, controls were estimated to initially retrieve the distractor more often than the target in object relatives. The model could still account for the higher accuracy of controls by assuming a high probability of backtracking for controls (80%) relative to IWA (20%). This pattern supported the notion that controls initially processed the first noun phrase in object relatives as the agent, until they revised and corrected their interpretation by backtracking. Surprisingly, in the present modeling of the German data, we do not see such an agentfirst bias. In the present study, controls did not have a lower θ relative to IWA, and controls' θ was always above 90% for all conditions. One possibility is that in our modeling, the RT at the end of the sentence do not reflect the agent-first bias, given that the initial interpretation has already been revised at this point.
With respect to model comparisons, Lissón et al. (2021a) found that the activation-based model had a predictive advantage over the original direct-access model, although the difference was not decisive, given the large standard error of elpd. In the present study, we show that in relative clauses in German, the activation-based model provides a better fit than both the original and the modified directaccess model. Our results contrast with , who compared the predictive performance of the activation-based model and the original direct-access model using self-paced reading data from unimpaired controls in German .  found that the original direct-access model provided a better fit to their data. However, the data modeled in  differ from our data in one important aspect: In correct trials (that is, trials with correct responses to the comprehension question), RTs were on average higher than the RTs in incorrect trials. This pattern in Nicenboim et al.'s data is crucial, because higher RTs for correct responses is what the original direct-access model assumes. Consequently, the advantage in predictive performance of the original directaccess model vs. the activation-based model came from the slow, correct responses. By contrast, our data show the opposite pattern. Correct responses are, on average, faster than incorrect responses, especially for IWA. Although in the present paper we have implemented a modified directaccess model that can account for slow incorrect responses, the cross-validation shows that the activation-based model outperforms both the original and the modified-direct access model in relative clauses in German.
Overall, our results are in line with previous modeling work in sentence comprehension in aphasia using the activation-based model Mätzig et al., 2018;Lissón et al., 2021a). If we assume that sentence comprehension is mediated by an activation-based model of cue-based retrieval, the performance of IWA can be explained by a combination of processing deficits, namely slow syntax and intermittent deficiencies.

Limitations and Future Directions
The data that we modeled in this paper (Pregla et al., 2021) is the largest-ever compilation of online measures for IWA in German. Nevertheless, the size of the IWA group (21 subjects) remains relatively small when compared to the number of subjects tested in typical eye-tracking experiments with unimpaired participants. Collecting online data from impaired populations is difficult, which is why most studies in the aphasia literature have a smaller number of participants. Usually, online experiments have 3 to 12 IWA and 10 to 20 control participants (e.g., Burkhardt et al., 2003;Love et al., 2008;Dickey & Thompson, 2009;Choy & Thompson, 2010;Hanne et al., 2011;Mack, Ji, & Thompson, 2013;Hanne et al., 2015;Mack et al., 2016;Adelt et al., 2017;Engel et al., 2018). An exception is the data presented in Caplan et al. (2013Caplan et al. ( , 2015, with more than 50 IWA, although it does not include eye-tracking data. The activation-based model outperforms the modified direct-access model in relative clauses, but both models perform similarly in pronoun resolution. We believe that the inability to find a clear answer in pronoun resolution has to do with the inherent limitations of the sample size, both in terms of subjects and items. Crucially, the relative clause experiment tested 40 items per subject (10 items per condition, 4 conditions), whereas the pronoun resolution tested 20 items per subject (10 items per condition, 2 conditions). Future work comparing these models will require much more data in order to distinguish between the models.
We have reported the most complex hierarchical structure that yielded converging fits, but in order to account for individual differences, an even more complex hierarchical structure would be necessary. This is especially true for the modified direct-access model, where an individual adjustment for the effect of δ would help in understanding the variability in the process of backtracking. The same holds for both models regarding the effect of eye fixations in the visual world paradigm. In addition, this work focuses on average, group-level effects. While the comparisons between groups yield an estimate of the average performance, it has been shown that IWA have large within and between-subject variability Mätzig et al., 2018;Pregla et al., 2021).
A future direction is to develop an individual-level modeling approach, in which parameters are estimated for each subject, in a similar vein to the modeling work in Mätzig et al. (2018) for offline data. Recent modeling work shows that under the cue-based retrieval, even among unimpaired participants, individual differences can modulate interference effects . Therefore, obtaining individual parameter estimates for IWA would be more informative regarding both interference effects, and the extent to which each processing deficit plays a role for each IWA. Furthermore, the possible role of feature percolation, that is, of target features being overwritten by features of distractor items, has been recently investigated by . Their results support the view that in unimpaired populations, feature percolation feeds retrieval processes, suggesting that a hybrid model may better explain similarity-based interference.
A second limitation of this work concerns the implementation of the models. In order to compare the two competing models in a common architecture, we implemented simplified versions that focus on a single retrieval event. However, in order to account for sentence processing deficits in IWA across the whole trial, the models should include a parser. The original Lisp implementation of the activation-based model in ACT-R (Anderson et al., 2004;Lewis & Vasishth, 2005) includes a left-corner parsing algorithm that could conceivably be added to our model. However, as far as we are aware, there exists no computational implementation of a parser in the (modified) direct-access model. A future direction would be to incorporate a parser into the (modified) direct-access model, and to fit Bayesian versions of both models that account for individual parsing steps. This is especially challenging because it requires finding a common architecture that supports a parsing algorithm for both models, so that model comparisons can be performed.
Finally, another limitation concerns the use of looks to the target at the critical region as an index of retrieval. An analysis of eye fixation patterns across the entire trial would possibly be more informative regarding the timecourse of interference during sentence processing. This is especially true for relative clauses, where our results show that higher fixations to the target at the critical region do not lead to faster reaction times in the picture selection task. Specific modeling techniques for visual-world-data could be considered, such as growth curve analysis (Mirman, 2017) or divergence point analysis (Stone, Lago, & Schad, 2020). Integrating these analyses with our computational modeling approach may be possible, and may yield important insights into the differences between IWA and controls over the course of the trial.

Conclusion
We conducted the first large-scale evaluation of two computational models of sentence processing in individuals with aphasia (IWA) in German. Our study tested two competing models of cue-based retrieval -the activationbased model and a modified version of the direct-access model -against online and offline data from IWA and control participants. The data came from a visualworld eye-tracking experiment with a picture selection task. Similarity-based interference was manipulated in two linguistic constructions, namely pronoun resolution and relative clauses. Reaction times from the picture selection task were modeled as a function of interference, group (IWA vs. control), lexical access speed, and fixations to the target picture at the critical region of the sentence. The results show that in pronoun resolution, IWA experience greater gender interference effects relative to control participants. In relative clauses, the data suggest that IWA exhibit a larger subject-object asymmetry than controls. In IWA, the subject-object asymmetry is much stronger than the effect of number interference, suggesting that in relative clauses, IWA may rely more strongly on case cues compared to number cues. The parameter estimates from both implemented models are in line with the slow syntax and the intermittent deficiencies accounts. In addition, the parameters of the modified direct-access model are also in line with the resource reduction theory. The crossvalidation results show that while both models have similar quantitative performance for the pronoun structures, the activation-based model outperforms the modified directaccess model in relative clauses.   Localization  T3  T11  Aphasia type   2  F  72  8  7  IMI  L  77  19  Anomic  3  M  76  20  17  IMI  L/R  61  20  Not-classifiable  4  F  47  13  21  IMI  L  78  20  Anomic  6  M  55  14  10  IMI  L  67  20  Anomic  8  F  51  19  7  MA  L  74  20  Anomic  9  M  64  15  2  IMI  L  73  20  Anomic  10  M  58  18  1  IMI  L  52  20  Broca  11  F  63  12  1  IMI  L  73  Appendix B: Descriptive statistics Figure 11 shows descriptive statistics for the pronoun sub-experiment by group and condition. Figure 12 shows descriptive statistics for the relative clause sub-experiment by group and condition.   For both the activation-based and the direct-access model, the varying intercepts and slopes for subject, u, come from a multivariate normal distribution with x dimensions (where x = number of by-subject adjustments), abbreviated as MV N x . The varying intercepts and slopes for items, w, also come from a multivariate normal distribution with y dimensions, MV N y (where y = number of byitem adjustments). In the equations below, 0 is a column vector of zeros with the x or y dimensions. The σ are the variance-covariance matrices of the multivariate normal distributions.

Activation-Based Model
For each trial i, values from two independent distributions are sampled, as shown in Eq.
(2). The lowest value becomes the estimated reaction time for trial i (see Eq. (3)).
Accumulator NP1 The hierarchical structure in the location parameter of the lognormal distribution for each accumulator (μ 1 and μ 2 ) and in the standard deviation (σ ) is shown in Eq. (4).

Appendix D: Model estimates (activation-based model)
Pronouns Table 3 shows model estimates for the activation-based model for the pronoun sub-experiment.  Table 4 shows model estimates for the activation-based model for the relative clauses sub-experiment. Δelpd Activation−based model vs. MDA Fig. 13 Graphical representation of the elpd between the activationbased and the modified direct-access model across groups and conditions. The dot stands for the elpd and the bars indicate to the 95% confidence interval. Positive values indicate an advantage for the activation-based model, and negative values indicate an advantage for the modified direct-access model (MDA)

Appendix F: Comparisons with original direct-access model
The elpd between the original direct-access model and the modified direct-access model is −41, SE 134 for pronoun resolution; and −175, SE 166 for relative clauses. The negative elpd estimates indicate that the modified directaccess model may have a better predictive performance, but given the large SE, the elpd are inconclusive. When comparing the activation-based model with the original direct-access model, the difference is not conclusive in the pronoun experiment ( elpd −68, SE 140), but in relative clauses, the activation-based model performs better ( elpd 578, SE 173). Thus, in relative clauses, the activation-based model outperforms both the original and the modified-direct access model.

Author Contribution
The first draft of the manuscript was written by Paula Lissón and all authors commented on previous versions of the manuscript. Dario Paape carried out revisions after the first round of review. All authors read and approved the final manuscript. Data analysis and modeling were carried out by Paula Lissón and supervised by Shravan Vasishth and Dario Paape. Nicole Stadie, Frank Burchert, and Dorothea Pregla contributed to the study conception and design. Material preparation and data collection were performed by Dorothea Pregla.

Code Availability
The data generated and/or analyzed during the current study are available in the OSF repository, https://tinyurl.com/ yz9acste.

Declarations
Ethics Approval Not applicable.

Consent to Participate Not applicable.
Consent for Publication Not applicable.

Conflict of Interest
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.