Introduction

Decades of research have demonstrated that prediction, i.e., pre-activation of expected input given (sentential) context (e.g., Altmann & Kamide, 1999), is an integral aspect of language processing (Ferreira & Chantavarin, 2018; Kaan & Grüter, 2021). Given that language is rule-governed and systematic, anticipating what might come next in speech enhances processing efficiency (see Footnote 1).

Prediction takes place in a top-down way, meaning that our prior knowledge at different scales—general knowledge about the world, the context of the conversation, sentence meaning built so far, the quality of our linguistic representations, the consistency of provided grammatical cues in the current input, etc.—affects the integration of incoming information. The other side of the coin is bottom-up processing whereby incoming lower-level information (e.g., auditory or written words) is taken as an input which modifies higher-level information. While such bottom-up information is influenced by contextual constraints (Altmann & Kamide, 1999), research suggests that the language system also processes transient linguistic representations that operate independently of these constraints (Kukona et al., 2014; Swinney, 1979; Tanenhaus et al., 1979). Thus, language comprehension necessarily and inherently relies on both bottom-up and top-down information. Moreover, as work on prediction suggests, the interplay between the two is modulated by a number of factors such as cue reliability, language users’ goals and awareness, task demands and individual-difference factors such as comprehension ability, working memory, age, engagement with and proficiency in their language(s), etc. (Kaan & Grüter, 2021). Thus, in recent years, the focus in studying prediction has shifted away from the debate of whether it is an essential or necessary component of language processing to viewing it as a function of its expected utility (Kuperberg & Jaeger, 2016). Under this account, the degree to which comprehenders engage in prediction depends on their goals, confidence in their prior knowledge as well as their estimate of reliability of the input. Some of the pressing questions that are currently being explored in this domain ponder the circumstances under which prediction is most beneficial, the cues which are being used for it and how individual differences affect it.

In the context of this theoretical landscape, examining predictive processing in heritage language (HL) bilinguals, henceforth heritage speakers (HSs), could be especially illuminating. HSs are native speakers of a minority language acquired naturalistically in a context where there is a distinct majority language (Montrul, 2016; Polinsky & Scontras, 2020; Rothman, 2009; Rothman & Treffers-Daller, 2014). HSs most often (but not exclusively, see Kupisch & Rothman, 2018) wind up being dominant in the societal majority language, especially as age increases. Thus, HSs’ goals of language comprehension and ability to estimate the reliability of input cues can vary considerably, not least because their exposure to input and/or opportunities for engaging with the HL are reduced compared to other types of native speakers (e.g., monolinguals, L1-dominant bilinguals). While such variation also exists for monolinguals and L1-dominant bilinguals, it is much more pronounced across individual HSs. Among other co-occurring factors, this HS reality has been argued to be a source of grammatical representational and processing differences between HS aggregates and monolinguals and/or between individual HSs (see, e.g., Montrul, 2022; Polinsky & Scontras, 2020; Kupisch & Rothman, 2018). In principle, reduced input, the degree of variation in available input affecting cue validity, and/or fewer opportunities for language use should have correlational and explanatory validity in the domain of predictive processing. And yet, comparatively few studies have examined the factors that affect predictive processing in HSs, and fewer still have focused on the individual level. The present study seeks to begin to fill this gap, showing how doing so aids in better understanding both HS processing, on the one hand, and language processing more generally, in particular prediction, on the other.

We approach this by examining both top-down and bottom-up processing in the HL of a group of Russian HSs in North America. Using a webcam-based eye tracking task, we investigate two domains: (i) prediction based on verb semantic constraint and (ii) lexical interference caused by incoming lexical information which is locally but not contextually (globally) coherent. Targeting the verb semantic constraint effect allows us to examine whether HSs of Russian use semantic information of a verb in a top-down manner, i.e., to generate predictions about the upcoming linguistic input. Exploring the lexical interference effect allows us to examine the interaction between top-down semantic and morphosyntactic (grammatical gender) cues and bottom-up lexical information. As we will explain in greater detail below, by juxtaposing these effects specifically with HSs we are able to test the relative contribution of various factors for predictive processing in general while simultaneously contributing to a deeper knowledge of individual variation in HS processing.

The effects of interest

Verb semantic constraint effect

Language processing occurs incrementally with comprehenders using incoming linguistic information to continuously update the set of potential continuations to include referents that meet the accumulating constraints (Altmann & Kamide, 1999). Arguably, the most significant type of constraints imposed by individual words are those linked to verbs which provide both semantic and syntactic restrictions on the kinds of complements or arguments they can take (Trueswell et al., 1993). Here, we focus on the effect of verb semantic constraints, i.e. how the meaning of a verb narrows down the set of possible arguments. For example, consider the verb knit, which implies an action of creating something by manipulating yarn (or other materials) with needles. Therefore, the types of objects that can appropriately follow the verb knit are those which can be made through knitting (e.g. sweater, scarf, mittens, etc.). It is still a rather large set of objects but the verb knit is more constraining in this way than the verb lose, for example, which can include all of the knittable objects and many more. Previous work shows that comprehenders do use this semantic information to anticipate what comes next in a sentence. This has been shown with native L1-dominant and L2 speakers (Dijkgraaf et al., 2017; Hopp, 2015; Kamide et al., 2003a, 2003b; Koehne & Crocker, 2015, among many others), however much less work in this domain has been done with HSs. The present study thus contributes to this sparse literature while taking a closer look at the role of individual factors in HSs' linguistic and cognitive abilities in predictive processing based on verb semantic constraints.

Lexical interference effect

While comprehenders use top-down information to restrict a set of potential continuations, incoming information which does not satisfy the sentential constraints is still considered, at least to some degree. To give a few examples, previous work shows that comprehenders activate the contextually irrelevant meanings of ambiguous words (Swinney, 1979), consider locally but not globally consistent arguments for verbs (Peters et al., 2018), and process referents for color words despite their incongruence with the context (Kukona et al., 2014). The degree of activation of such locally coherent but globally inconsistent information has been shown to vary as a function of language experience, among other factors. For example, Kukona et al. (2014) used a self-organizing neural network to simulate the color word effect, whereby subjects considered a white car as a referent for the color word "white" presented as part of the sentence "The boy will eat the white cake". The results of their simulation suggested that such bottom-up activation is a residue of the learning process: first, the system learns to operate on bottom-up representations, and only once such representations are acquired do context-dependent effects emerge. Thus, when studying prediction, especially in non-dominant speakers of a language, it becomes important to consider not only the degree of activation of the predicted entities, but also the degree of activation/inhibition of competing representations and the factors that affect it. For example, does locally coherent but globally inconsistent information interfere less with top-down predictions as a function of the number and kind of cues used for prediction? And does this interference effect change as a function of the language and cognitive abilities of comprehenders? Here, taking advantage of the grammatical gender system of the Russian language, we build on the work by Kukona et al. (2014) by introducing a morphosyntactic feature, grammatical gender, into the "eat this white cake" context (as expressed both on the demonstrative pronoun and the adjective) to examine whether this additional cue leads to the fading of the interference effect. Testing this effect in a population of speakers with varying degrees of attainment of grammatical gender also allows us to investigate it as a function of individual differences.

Interim summary

Previous literature strongly suggests that comprehenders use verb semantic constraint cues to identify a set of plausible arguments. Both of our experiments tap into this ability. Additionally, the lexical interference paradigm taps into the ability to ignore incoming color word information when it applies to objects outside this set by relying on semantic and morphosyntactic (e.g., grammatical gender) cues. The lexical interference effect thus involves not only forming top-down predictions or understanding bottom-up input, but also the complex interaction between these processes. Juxtaposing these two effects makes HSs an ideal group for exploring how individual differences in language use and experience influence predictive processing. Specifically, we hypothesize that individual differences should not significantly impact the verb semantic constraint effect, or should do so significantly less than they impact the lexical interference effect, provided the subjects know the lexical meaning of the verbs themselves. Because such information is (more or less) conceptually universal, the reliability of this cue should be less vulnerable to input and engagement effects. In contrast, the interaction between top-down and bottom-up processing tested by the lexical interference effect is likely to be more sensitive to individual differences than simpler, more isolated processes, like making predictions based solely on verb semantics. Specifically, speakers with higher levels of HL proficiency and maintenance are likely to balance different types of linguistic information more successfully and to inhibit locally coherent but globally inconsistent input (Kukona et al., 2014; Peters et al., 2018). Additionally, previous work on grammatical gender in Russian HSs in North America has shown it to be a particularly vulnerable domain, or at least one where there is considerable interspeaker variability (Polinsky, 2006, 2008). Such variability would be predicted to affect at least some HSs’ determination of the reliability of grammatical gender as a viable predictive cue, e.g., those who have a simplified system and/or those whose exposure to others is characterized by significant degrees of variation in grammatical gender assignment/agreement.

Predictive processing in HSs

Verb semantic constraints

An emerging body of work has examined predictive processing in HLs across various domains. However, research specifically focusing on prediction based on verb semantic cues remains highly limited. To our knowledge, only Ito et al. (2023) have tested the verb semantic constraints in HSs. Using Visual World Paradigm (VWP), they tested anticipation in HSs in Germany by manipulating verb semantic constraints (choosing verbs that had the same or different constraints in Vietnamese and German) and classifier constraints (specific classifiers are associated with certain prototypical semantic classes, functioning as a semantic cue). The study showed that HSs were using both verb semantics and classifier information to predict the target. When comparing them to the group of L1 Vietnamese-L2 German late bilinguals, the authors did find that HSs were slower in attending to the target when verb semantic constraints were different in Vietnamese and German than the late bilinguals, suggesting an influence from the societal dominant language German. Overall, this study suggests that HSs do make use of semantic information being encoded in verbs or classifiers, albeit with a delayed anticipation (compared to L1-dominant users) that may be caused by the influence of the societal dominant language.

While not focusing specifically on verbs, Hao et al. (2024) have also shown that HSs use semantic information predictively. In this study, Mandarin-speaking HSs were tested using the VWP to examine how they make use of Mandarin sortal classifiers to predict upcoming nouns. Unlike in Vietnamese, Mandarin classifiers function both as a grammatical form class cue and as a semantic cue. The results suggested that HSs, at the aggregated group level, relied primarily on grammatical information provided by classifiers to identify the upcoming target. However, an analysis of individual differences revealed that not all participants relied on the grammatical form class cue. Instead, HSs with higher Mandarin Social Use scores (based primarily on questions related to informal literacy) and those who had attended Mandarin language programs were more sensitive to semantic cues. The authors suggest that the effects of Mandarin social use and language program attendance may be due to their impact on metalinguistic awareness, particularly given that the semantic association between classifiers and nouns is generally more robust than item-specific grammatical form class associations. Hao et al.’s study highlights the importance of an individual differences approach, and the current study extends this approach to verb-based semantic prediction in HSs.

Another study which did not examine the verb semantic constraint per se but is still relevant for our purposes here is the one by Parshina et al. (2022) who examined lexical and morphosyntactic prediction in Russian HSs in the US and focused on individual differences. Targeting various types of Russian grammatical structures (various word orders, active and passive voice, various types of relative clauses) in a reading paradigm, they found evidence for anticipation of specific lexical items and that this effect was modulated by the Russian literacy experience and English reading fluency. As for the morphosyntactic cues, the authors examined six different features: word class, noun gender, case and number, as well as verb tense and number. HSs showed evidence of anticipation of the word class as well as noun and verb number. Both the lexical and morphosyntactic effects were modulated by Russian literacy experience and reading fluency in English.

The findings from the studies reviewed above are relevant to the present research as they indicate that HSs do utilize semantic information for prediction, although how much they rely on it varies depending on their language background. Therefore, in the population tested herein we should expect there to be a discernible effect of the semantic properties of the verb, i.e., evidence of the verb semantic constraints. Moreover, our approach does not rely solely on aggregated trends, but rather seeks to unpack anticipated individual differences in HL processing. While inspired by previous work on HS individual differences, our selection of factors related to language use remains somewhat exploratory. These factors will be summarized below, after we introduce our second effect of interest.

Lexical interference effect

To our knowledge, no study has examined the lexical interference effect in HSs. However, previous research with other speaker populations offers valuable insights into the individual background factors that can predict participants' behavior. For example, Nozari et al. (2016) show that the effect of bottom-up interference varies as a function of domain-general inhibitory control measured in a Flanker task, i.e., the more people are distracted by the incongruent markers in the Flanker task, the more susceptible they are to locally but not globally coherent information in a linguistic task. Thus, the non-linguistic Flanker task (see Eriksen & Eriksen, 1974; Eriksen, 1995; Hübner & Töbel, 2019) and the linguistic lexical interference task seem to tap into the same underlying construct of competition suppression. Along similar lines, Peters et al. (2018) looked at how bilinguals activate representations which are compatible either with both the local and the global context or only with the local context. Using eye tracking, they showed that when hearing the sentence "The pirate chases the ship" and observing a display with a bone, a ship, a treasure box and a cat, participants with smaller vocabularies looked more at the locally but not globally coherent items (e.g., a cat, which is chaseable but unexpected in the context of a pirate). Summarizing across these studies, we predict that the interference effect in our sample of Russian HSs will vary as a function of language experience (specifically, vocabulary size and usage patterns) as well as inhibitory control ability.

Other relevant work

It should be noted that in the context of our study, grammatical gender cues may function as additional markers, strengthening participants' commitment to their predictions and directing their attention away from distractors that are inconsistent with the sentential context. A few studies have examined whether HSs use grammatical gender as a cue for anticipating what comes next as language unfolds. For example, Sekerina (2015) presented participants with four panels depicting a flying event with different objects and played a sentence describing it, e.g., По небу летела серебристая ракета (lit. "In sky was flying-FEM-SG silver-FEM-SG-NOM rocket-FEM-SG-NOM"). The gender and the number of the target object could be identified upon hearing the verb in the past tense and the adjective preceding the object name. Competitors either matched or did not match the target object in grammatical gender/number. Participants were found to use grammatical number and feminine (but not masculine) gender predictively.

Using eye tracking, Fuchs (2022a) investigated facilitative use of grammatical gender in on-line processing in HSs of Spanish. She recorded participants’ eye movements while they were observing a visual scene with two objects with either matched or mismatched grammatical gender and answering the question ¿Dónde está [ARTICLE] [NOUN]? (Where is [article] [noun])? In Spanish, grammatical gender is marked on the article and thus it is possible to identify the target in the gender mismatch condition already at the article. Fuchs found that when taking into account only trials with the canonical gender assignment by HSs, their processing is qualitatively similar to that of dominant L1 Spanish users and they do identify the target object earlier in the mismatch relative to the match condition. Some subsequent work with HSs of Polish (Fuchs, 2022b) showed predictive use of grammatical gender information in this population of speakers as well.

Other psycholinguistic work with HSs has investigated whether they use other grammatical cues predictively. For example, both Karaca et al. (2024) and Özsoy et al. (2023) looked at the predictive use of grammatical case in HSs of Turkish in the Netherlands. The results are not straightforward. Karaca et al. (2024) found that their participants used grammatical case only in the verb-medial position, when grammatical cues were scaffolded by verb semantics, and not in the verb-final position, when only morphosyntactic cues were available. Moreover, exposure to and use of Turkish as well as engagement with literacy (see Footnote 2) in both Turkish and the majority language (Dutch) predicted the strength of this effect. In Özsoy et al. (2023), two different approaches to statistical analysis resulted in two opposite patterns of results. But even the approach which supported the predictive use of case in this population did so only in the group of participants who took part in the lab-based experiment and not in the online experiment. A closer look at individuals across both testing modalities (in-lab and online) showed that they could be divided into three groups: those who reliably use case predictively, those who do not, and those who use case predictively at a chance level. While Özsoy et al. (2023) did not have any further language background or cognitive details about their subjects, they do acknowledge that further work should look into what factors underlie such variability.

In contrast to the studies reviewed above, Sekerina and Laurinavichyute (2020) report evidence that HSs of Russian failed to use case information predictively. They conducted an eye-tracking study to explore both real-time processing and offline comprehension of two types of questions with different word orders in dominant L1 users and HSs of Russian: SOV subject Wh-questions with a scrambled word order (absent in English, e.g., "Who-NOM girl-ACC kissed at school?") and OSV object Wh-questions (similar to English, e.g., "Who-ACC boy-NOM kissed at school?"), following a context sentence disambiguating who did what to whom. Dominant L1 Russian users were able to predict the target already at the verb (kissed), while HSs avoided commitment to the target reference for both types of questions. Offline comprehension of these questions was equally good in both groups. Sekerina and Laurinavichyute (2020) speculated that such online behavior may be attributable either to insufficient online resources while calculating a short syntactic dependency or to low confidence in processing in the HL. These speculations encourage follow-up work looking at individual variation in these domains and how it affects reliance on grammatical cues for predictive processing.

Summary of research questions

Following the emerging trend in the (heritage) bilingualism literature to investigate bilingualism as a spectrum of experiences (Rothman et al., 2023) rather than a categorical variable, the present study is asking the following research question: do prediction and lexical interference in HSs vary as a function of their individual differences in language experience, proficiency and cognitive skills? And what does the answer to this tell us about predictive processing in general? Inspired by the above-reviewed studies, the factors this work examines are vocabulary, cognitive control, knowledge of grammatical gender, engagement with literary-based activities in the HL and a multilingual engagement score. These factors are further discussed in "The receptive vocabulary task"–"The language history questionnaire" sections.

Methods

Participants

The final dataset consisted of 68 participants. Their demographic information is reported in Table 1. We recruited HS participants either through personal communication or by contacting US or Canadian universities with heritage Russian programs and asking their instructors to share the study details with students. Given that our main interest lies in describing and unpacking individual differences between HS bilinguals, and in keeping with the argument that this can best be done without direct comparison to L1-dominant users as an assumed baseline, our participant recruitment focused exclusively on HSs (e.g., De Houwer, 2023; Rothman et al., 2023). We defined HS inclusion for this study as individuals who (i) spoke Russian because it was a/the language spoken in their home, (ii) were born in the US or Canada or immigrated there before school age, (iii) also had native fluency in English and were primarily educated in English, and (iv) were at least 18 years old. Seventy-three participants entered the study on Gorilla Experiment Builder (Anwyl-Irvine et al., 2019, 2021). The data from four people were excluded during the preprocessing stage due to a low sampling rate (< 5 Hz; see the section Data preprocessing and analysis for more details). The data from one additional participant were excluded because they did not specify Russian as one of their languages in the language history questionnaire.
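The two exclusion criteria described above can be sketched as a simple filter. This is only an illustration under stated assumptions, not the authors' preprocessing pipeline: the record fields (`sampling_rate_hz`, `languages`) and the example values are hypothetical, and only the 5 Hz threshold and the requirement that Russian be listed in the questionnaire come from the text.

```python
from dataclasses import dataclass

MIN_SAMPLING_RATE_HZ = 5.0  # threshold stated in the text


@dataclass
class Participant:
    pid: str
    sampling_rate_hz: float  # hypothetical field: webcam sampling rate
    languages: tuple         # hypothetical field: languages listed in the questionnaire


def keep(p: Participant) -> bool:
    """Return True if the participant passes both exclusion criteria."""
    return p.sampling_rate_hz >= MIN_SAMPLING_RATE_HZ and "Russian" in p.languages


# Made-up records, for illustration only
sample = [
    Participant("p01", 18.2, ("Russian", "English")),
    Participant("p02", 3.1, ("Russian", "English")),  # excluded: low sampling rate
    Participant("p03", 22.0, ("English",)),           # excluded: Russian not listed
]
retained = [p.pid for p in sample if keep(p)]
print(retained)
```

Applied to the reported numbers, such a filter would take the 73 entrants down to 68 (four low-sampling-rate exclusions plus one questionnaire exclusion).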

Table 1 Participants’ demographic information

Tasks

The eye tracking task

We used the same materials (and setup) previously reported in Prystauka et al. (2024), who successfully used web-based eye tracking to test anticipation and lexical interference in just over 200 L1-dominant users of Russian. That study discusses methodological details related to online eye tracking, which we do not repeat here. Together with an emerging methodological literature on web-based eye tracking (Slim & Hartsuiker, 2022; Vos et al., 2022), it serves as the basis for our confidence in the method to address the present research questions. The experimental procedure comprised two sets of materials (intermixed), both presented to the same participants within the same study. The materials for testing the semantic constraint effect consisted of sixteen stimulus sets. Each set included a quadrant-based visual display featuring four images (Fig. 1) and a minimal pair of sentences. The two sentences differed only in the choice of verb: constraining or non-constraining, determined by the verb's meaning in relation to the visual scene.

Fig. 1

Example scene used to test the effect of verb semantic constraint. Note. Participants heard Жeнщинa пoльёт/пoдвинeт pacтeниe (The woman will water/move the plant)

In the constraining condition, verbs allowed reference to only one of the four objects in the visual scene post-verbally, for example, Женщина польёт растение (The woman will water the plant). In the non-constraining condition, all four objects could be referred to post-verbally, for instance, Женщина подвинет растение/вилку/пылесос/диван (The woman will move the plant/fork/vacuum cleaner/couch). However, the target object was always the same as in the constraining condition.

To counterbalance, we created two stimulus lists, each containing eight sentences in each condition. Participants were presented with only one sentence from a minimal pair during the experiment.
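A minimal sketch of this kind of two-list counterbalancing is given below. The rule used to assign a set to a condition (alternating by set index) and the list names "A" and "B" are assumptions made for illustration; the text specifies only that there were sixteen sets, two lists, and eight sentences per condition in each list.

```python
N_SETS = 16
CONDITIONS = ("constraining", "non-constraining")


def build_lists(n_sets: int = N_SETS) -> dict:
    """Give each stimulus set one condition per list, flipped across the two lists."""
    lists = {"A": {}, "B": {}}
    for set_id in range(1, n_sets + 1):
        # Assumed rule: alternate conditions by set index in list A,
        # and use the opposite condition in list B.
        lists["A"][set_id] = CONDITIONS[set_id % 2]
        lists["B"][set_id] = CONDITIONS[(set_id + 1) % 2]
    return lists


lists = build_lists()
for name, assignment in lists.items():
    counts = {c: list(assignment.values()).count(c) for c in CONDITIONS}
    print(name, counts)
```

Each participant would then be assigned one list, so that they hear exactly one member of every minimal pair while both conditions are equally represented.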

Materials for testing the lexical interference effect comprised thirty-two sets of stimuli. Each set included a quadrant-based visual scene displaying four images (Fig. 2) and a minimal pair of sentences. The two sentences differed only in the color adjective used. Visual scenes were composed of two pairs featuring distinct object types, with objects within each pair differing only in color; for instance, a scene might include both a black and a brown pipe and a black and a brown hat. In half of the stimulus sets, the object types matched in grammatical gender, and in the other half they mismatched (this manipulation is explained in more detail below).

Fig. 2

Example scene used to test the effect of lexical interference. Note. Participants heard Дeдyшкa выкypит этy чepнyю тpyбкy (The grandfather will smoke this black pipe) or Дeдyшкa выкypит этy кopичнeвyю тpyбкy (The grandfather will smoke this brown pipe) (color figure online)

All sentences in this set had constraining verbs, meaning only two of the four objects in a scene could appear in a post-verbal position. The target object could be identified upon hearing the color adjective. Each object in a scene represented one of four conditions in a 2 by 2 design, incorporating factors of verb consistency and color consistency. For example, in a scenario involving pipes and hats, a participant might hear the sentence Дeдyшкa выкypит этy чepнyю тpyбкy (The grandfather will smoke this black pipe). In this context, the black pipe is consistent with both the verb's selectional restrictions and the color adjective, the brown pipe aligns only with the verb and not the adjective, the black hat aligns with the adjective but not the verb, and the brown hat does not align with either the adjective or the verb.
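The 2 by 2 coding of the four objects in a scene can be made concrete with a short sketch. It assumes a simplification that holds for these displays: each scene contains exactly two object types, and the verb admits only the target's type, so verb consistency reduces to sharing the target's noun. The class and function names are hypothetical.

```python
from typing import NamedTuple


class SceneObject(NamedTuple):
    noun: str
    color: str


def label_scene(scene, target):
    """Label each object by verb consistency and color consistency with the target."""
    labels = {}
    for obj in scene:
        verb_ok = obj.noun == target.noun    # assumed: verb admits only the target's object type
        color_ok = obj.color == target.color
        labels[obj] = (
            "verb-consistent" if verb_ok else "verb-inconsistent",
            "color-consistent" if color_ok else "color-inconsistent",
        )
    return labels


# The pipe/hat scene from the text, with "this black pipe" as the target
scene = [
    SceneObject("pipe", "black"),
    SceneObject("pipe", "brown"),
    SceneObject("hat", "black"),
    SceneObject("hat", "brown"),
]
labels = label_scene(scene, target=SceneObject("pipe", "black"))
for obj, cell in labels.items():
    print(obj.color, obj.noun, cell)
```

The black hat, which is color-consistent but verb-inconsistent, is the locally coherent but globally inconsistent competitor of interest.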

To address potential issues related to color saliency, half of the participants were presented with sentences featuring one color as the target, while the other half heard sentences featuring the alternative color as the target.

There was also an additional factor of grammatical gender in this subset of stimuli, such that the target and the distractor either matched (N = 16) or did not match (N = 16) in grammatical gender. There are three grammatical genders in Russian: masculine (46% of the nominal lexicon), feminine (41% of the nominal lexicon) and neuter (13% of the nominal lexicon). Variation in the resulting gender systems of Russian in HS contexts has been reported, especially in the US. Russian HSs generally rely on phonological cues, leading to a straightforward categorization of nouns based on their endings: consonant endings are associated with the masculine gender for all speakers, while vowel endings can be categorized in two ways. Some HSs have a two-gender system, with feminine being assigned to all nouns ending in a vowel. Other HSs distinguish between neuter nouns (those ending in a stressed -o, thus easily differentiable from the rest of the vowel-final nouns) and all other vowel-final nouns, which are interpreted as feminine (Polinsky, 2008). With this in mind, we only used masculine nouns ending in consonants and feminine nouns ending in vowels (other than stressed -o). Within each condition (i.e., match and mismatch), an equal number of targets (N = 8) were of masculine or feminine grammatical gender. Grammatical gender in Russian is expressed on parts of speech agreeing with the noun, which in this design (not least because Russian does not have overt articles) included the demonstrative pronoun "this" (этот-MASC/эта-FEM) and an adjective. The prediction this study tests is that, in the mismatch condition, subjects will use an additional cue (grammatical gender), available already at the demonstrative pronoun, to disregard the contextually irrelevant distractor despite its consistency with the color word, resulting in fewer looks to this distractor.

A female native speaker of Russian (a professional voice actress) was instructed to produce the sentences naturally. The stimuli were recorded in a sound-attenuated booth and edited using the Audacity software. Visual stimuli were created with the images sourced from the ClipArt collection (clipart.com). All materials are provided in Appendix.

The receptive vocabulary task

To assess participants’ receptive vocabulary in Russian we used an adapted version of the KORABLIK test—Clinical Assessment of the Development of Basic Linguistic Competencies (Lopukhina et al., 2019). Specifically, we used a subset of the test designed to assess comprehension of nouns and verbs. On each trial, participants were presented with 4 pictures and an auditory word. Their task was to choose the matching picture. There were 4 practice trials and 48 experimental trials, evenly split between nouns and verbs. Given previous work, we hypothesized that vocabulary score would be an important factor for both the verb semantic constraint and lexical interference effects (Kukona et al., 2014; Peters et al., 2018).

The Flanker task

To assess inhibitory control, we used the Flanker task (Eriksen & Eriksen, 1974). Participants were presented with sets of five arrows or an arrow surrounded by two dashes on each side. Their task was to specify the direction of the arrow in the middle. Surrounding arrows (flankers) were pointing either in the same direction (congruent condition, <<<<<) or in a different direction (incongruent condition, <<><<). Trials with an arrow and dashes represented the neutral condition. Incongruent trials require participants to inhibit the conflicting information coming from the surrounding arrows, thus leading to greater recruitment of the cognitive control system. The task consisted of 10 practice and 144 experimental trials evenly split into three conditions.
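The trial structure described above can be sketched as follows. The stimulus strings follow the examples in the text; balancing target direction within each condition and shuffling with a fixed seed are assumptions made for illustration, since the text does not specify them.

```python
import random

N_TRIALS = 144
CONDITIONS = ("congruent", "incongruent", "neutral")


def make_stimulus(condition: str, direction: str) -> str:
    """Build a five-character display with the target arrow in the middle."""
    target = "<" if direction == "left" else ">"
    if condition == "congruent":
        flank = target
    elif condition == "incongruent":
        flank = ">" if target == "<" else "<"
    else:  # neutral: dashes replace the flanking arrows
        flank = "-"
    return flank * 2 + target + flank * 2


def build_trials(seed: int = 0) -> list:
    """Return 144 shuffled trials, evenly split across the three conditions."""
    per_condition = N_TRIALS // len(CONDITIONS)  # 48 trials per condition
    trials = []
    for condition in CONDITIONS:
        for i in range(per_condition):
            # Assumption: left and right targets are balanced within a condition.
            direction = "left" if i % 2 == 0 else "right"
            trials.append((condition, direction, make_stimulus(condition, direction)))
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials()
print(len(trials), trials[0])
```

In analyses of this task, inhibitory control is commonly summarized as the difference in response time (or accuracy) between incongruent and congruent trials; the text here does not state which summary measure was used.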

We included this task primarily to examine the link between inhibitory control and the lexical interference effect (Nozari et al., 2016). In addition, previous literature suggests a relationship between predictive behavior in language processing and different aspects of cognitive control (Noh & Lee, 2017; Zirnstein et al., 2018). We therefore also explored the role of cognitive control in the verb semantic constraint effect.

The gender knowledge task

The gender knowledge task consisted of the subset (N = 32) of items from the lexical interference condition. Participants were presented with an image of an object, such as a black pipe, and four auditory descriptions of the target object. The descriptions varied in the adjective, which was manipulated for gender agreement and color consistency with the noun. For the black pipe, participants heard черная трубка (black-fem pipe), белая трубка (white-fem pipe), черный трубка (black-masc pipe) and белый трубка (white-masc pipe). Their task was to click on the corresponding description.

We included this task to explore whether participants’ knowledge of gender assignments for the items in the study interacts with their ability to use grammatical gender predictively, which would provide them with an additional cue to inhibit contextually irrelevant items.

The language history questionnaire

We used the LHQ3 version of the Language History Questionnaire (Li et al., 2020), which includes a comprehensive set of questions for assessing the linguistic background of multilingual speakers. It also allows for the calculation of several aggregated scores: proficiency, dominance, immersion and multilingual language diversity (MLD). The proficiency score is derived from self-rated proficiency across the listening, speaking, reading and writing components. Immersion quantifies the amount of time speakers were immersed in a particular language. Dominance integrates self-reported proficiency and the time spent on various language components. Lastly, MLD estimates the balance in the use of up to four languages, considering both dominance and frequency (it is based on the language entropy construct discussed in Gullifer & Titone, 2020). When calculating the aggregated scores, researchers can assign different weights to the modules; for example, proficiency in writing and reading (i.e., skills that require literacy activities) can be estimated separately from speaking and listening. To test our research questions, we focused on two measures: self-rated proficiency in reading and writing in Russian, as a proxy for engagement with literacy in the HL, and the MLD score. Regarding proficiency, we adjusted the calculation of this score to include an estimation of engagement with reading and writing. We made this modification because HSs tend to vary considerably in this domain, and previous research has demonstrated its predictive value for language attainment (Bayram et al., 2019) and predictive processing (Mani & Huettig, 2014; Mishra et al., 2012). Regarding MLD, higher scores indicate more balanced (integrated) language use, whereby all languages are used across different contexts and language mixing is common. Lower scores indicate more compartmentalized language use, whereby one language is associated with one context. The MLD calculation is based on language dominance, which is in turn based on self-assessed proficiency and estimates of language use in different contexts. It is thus a comprehensive measure of individual differences in current language experience.
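To make the entropy idea behind MLD concrete, the sketch below computes Shannon entropy over a set of language-use weights. It is only an illustration of the general construct: the function name, the example weights and the use of base-2 logarithms are assumptions, and the LHQ3's own MLD computation (which works from dominance scores) may differ in its inputs and scaling.

```python
import math


def language_entropy(usage):
    """Shannon entropy (in bits) of non-negative language-use weights."""
    total = sum(usage)
    if total <= 0:
        raise ValueError("at least one language must have a positive weight")
    # Normalize to proportions; zero-weight languages contribute nothing.
    proportions = [u / total for u in usage if u > 0]
    return sum(-p * math.log2(p) for p in proportions)


# Hypothetical speakers (illustrative weights, not study data).
compartmentalized = language_entropy([0.95, 0.05])  # one language dominates
balanced = language_entropy([0.5, 0.5])             # two languages used equally
```

Under this sketch, a speaker who uses one language almost exclusively scores close to 0, while a speaker who uses two languages equally scores 1, which mirrors the compartmentalized versus integrated contrast described above.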

Procedure

The experiment was programmed in the Gorilla Experiment Platform (Anwyl-Irvine et al., 2019). Personal computers were the only permitted device type. The study started with a video instruction explaining the purpose and the general procedure of the experiment, after which participants were directed to the consent form. After providing consent, participants were redirected to the eye tracking task, which started with more specific video instructions, example trials and the first webcam calibration procedure. Participants who successfully completed calibration proceeded to the eye tracking experiment, which consisted of 48 trials (16 for the semantic constraint effect and 32 for the lexical interference effect, intermixed). The trials were split into three blocks separated by two additional calibration routines (i.e., a new calibration occurred after every 16 trials). Each trial started with a fixation cross and progressed to the visual display once the participant clicked on the cross. After a preview time of 1000 ms, the audio was played (the actual audio onset time varied somewhat among participants, as discussed in the Data preprocessing section). Participants were instructed to click on the object mentioned in the sentence once the sentence had ended; clicking was enabled only after the audio offset. Following the eye tracking task, participants completed the vocabulary task (one block), the Flanker task (three blocks) and the gender knowledge task (one block). Afterwards, they were redirected to Qualtrics (Qualtrics 2015), where they filled out the Language History Questionnaire. Because the study targeted HSs, whose reading and writing abilities may vary significantly, all instructions were auditory and all responses were programmed as button clicks (rather than typing). All task instructions were in Russian. The Language History Questionnaire required written responses and was in English.

Data preprocessing

We followed the same preprocessing as in Prystauka et al. (2024). Gorilla provides two data quality metrics for the eye tracking data. The first is the mean convergence value (“convergence”) for fitting a facial model. It represents the model’s confidence in finding a face (and accurately predicting eye movements), with values ranging from 0 to 1; values below 0.5 suggest that the model has likely converged. The second is “face_conf”, the Support Vector Machine (SVM) classifier score for the face model fit. It indicates the degree to which the image under the model resembles a face, with values ranging from 0 to 1; values above 0.5 indicate a well-fitted model. In our sample, all “convergence” values were 0. Data points with “face_conf” values below 0.5 (approximately 0.06% of the data) were excluded.

Incorrect responses (failures to click on the correct object) accounted for 1.5% of responses and were excluded from the data.

The sampling rate in the original sample (N = 73) varied from 1.7 to 29.9 Hz (see Fig. 3 for the distribution of sampling rate in our sample). We excluded participants with fewer than 5 samples per second, which removed 4 individuals; one additional participant was excluded based on their responses on the LHQ3. The mean sampling rate in the resulting group (N = 68) was 23.2 Hz (SD = 4.85 Hz, range 10.9–29.9 Hz).
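The exclusion rule can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, participant IDs and sample counts are hypothetical, and the text does not specify how the per-participant rate was computed, so an average over the recording duration is assumed here.

```python
def sampling_rate_hz(n_samples, recording_ms):
    """Average number of gaze samples per second over a recording."""
    if recording_ms <= 0:
        raise ValueError("recording duration must be positive")
    return n_samples / (recording_ms / 1000.0)


def apply_rate_cutoff(rates_hz, min_hz=5.0):
    """Keep participants whose sampling rate is at least min_hz."""
    return {pid: hz for pid, hz in rates_hz.items() if hz >= min_hz}


# Hypothetical participants: (number of samples, recording length in ms).
recordings = {"p01": (17, 10_000), "p02": (232, 10_000), "p03": (299, 10_000)}
rates = {pid: sampling_rate_hz(n, ms) for pid, (n, ms) in recordings.items()}
kept = apply_rate_cutoff(rates)  # p01 (1.7 Hz) falls below the 5 Hz cutoff
```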

Fig. 3 The distribution of participants’ sampling rate

The experimental design included an image preview time of 1000 ms, but the actual onset time of the sentence varied between participants because of differences in their hardware and connection speed. Gorilla offers the option to download additional metrics on audio events, including the time at which an audio event actually started (as opposed to when it was requested). Actual onset times ranged from 1001 to 3977 ms, with only five trials (from different participants) exceeding 2000 ms. The mean onset was 1055 ms (SD = 77 ms). The distribution of onset times is illustrated in Fig. 4. Our analysis accounted for this variation in audio onset.
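One simple way to account for such variation is to re-express each gaze timestamp relative to the actual (rather than requested) audio onset of its trial. The sketch below assumes timestamps measured from display onset; the function name and example values are hypothetical, and the exact correction used in the analysis is not described in the text.

```python
def align_to_audio_onset(sample_times_ms, actual_onset_ms):
    """Re-express gaze timestamps relative to the actual audio onset."""
    return [t - actual_onset_ms for t in sample_times_ms]


# Hypothetical trial: timestamps counted from display onset; audio was
# requested at 1000 ms but actually started at 1055 ms.
aligned = align_to_audio_onset([1000, 1055, 1255], actual_onset_ms=1055)
```

After alignment, time 0 corresponds to the moment the sentence actually began for that trial, so word onsets measured within the audio file can be applied directly.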

Fig. 4 The distribution of audio onset times among participants

Analysis of the eye tracking task

We defined our regions of interest (ROIs) as quadrants. For the part of the study testing the verb semantic constraint effect, the critical ROI was the quadrant containing the target image (i.e., the object mentioned in the sentence). For the lexical interference effect, the critical ROIs were the quadrants containing the distractor items (i.e., items of a different semantic type than the target).

To analyze our data, we employed (generalized) linear mixed effects modeling, given its ability to test for condition effects while accounting for random effects such as trials and/or subjects. Different dependent variables were specified for the two effects of interest.

Verb semantic constraint effect (Altmann & Kamide, 1999)

First, we defined our time window of interest as running from 200 ms after the verb onset until 200 ms after the noun onset (since it takes roughly 200 ms to plan and execute a saccade; Matin et al., 1993). We coded looks to the target as a binary variable (1 = at least one look to the target, 0 = no look to the target) for each 100 ms time bin within this window and fitted a mixed-effects logistic regression model (GLMM) using the glmer function from the lme4 package (Bates et al., 2015), which is commonly used for binary outcomes or proportions. The fixed factors were condition (constraining vs. non-constraining verb), mean vocabulary score (i.e., mean accuracy on the vocabulary task), Flanker effect size (defined as the reaction time difference between the incongruent and congruent trials), self-rated proficiency in writing and reading in Russian (a proxy for engagement with literacy) and MLD (based on the LHQ3). Our primary hypotheses concerned the interaction of the individual background variables with verb condition, so we included interaction terms between each of these variables and condition. Specifically, when the verb is constraining and enables predictions about the upcoming target, various language background factors are expected to predict how likely a speaker is to act on this information, i.e., to make predictions. The continuous variables were centered and scaled. VIF values calculated for the predictor variables ranged from 1 to 1.4, indicating that multicollinearity is not a significant concern in the model. Participants and trials were entered as random factors. The model included the following fixed and random effects:

$$\begin{aligned} & \text{Model} \leftarrow \text{glmer}(\text{Looks to the Target} \sim \text{Condition} * (\text{Mean Vocabulary Score} + \text{Flanker Difference RTs} \\ & \quad + \text{Self-Rated Proficiency in Writing and Reading in Russian} + \text{Multilingual Language Diversity}) \\ & \quad + (1 \mid \text{Trial}) + (1 \mid \text{Participant})) \end{aligned}$$
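The binary coding of looks that feeds this model can be sketched as below. This is an illustrative reconstruction under stated assumptions: the function name and the sample format (time in ms relative to audio onset, plus an ROI label) are hypothetical, and bins that contain no gaze samples are simply omitted here, a choice the text does not specify.

```python
def code_target_looks(samples, verb_onset_ms, noun_onset_ms,
                      bin_ms=100, saccade_ms=200):
    """Code each 100 ms bin as 1 if it holds at least one target look, else 0.

    samples: iterable of (time_ms, roi) pairs, time relative to audio onset.
    The window runs from verb onset + 200 ms up to noun onset + 200 ms.
    """
    start = verb_onset_ms + saccade_ms
    end = noun_onset_ms + saccade_ms
    bins = {}
    for t, roi in samples:
        if start <= t < end:
            bin_start = start + ((t - start) // bin_ms) * bin_ms
            looked = 1 if roi == "target" else 0
            bins[bin_start] = max(bins.get(bin_start, 0), looked)
    return dict(sorted(bins.items()))


# Hypothetical trial: verb onset at 500 ms, noun onset at 1000 ms.
trial = [(650, "distractor"), (720, "target"), (790, "target"),
         (810, "other"), (1190, "target"), (1210, "target")]
coded = code_target_looks(trial, verb_onset_ms=500, noun_onset_ms=1000)
```

Each resulting bin would then contribute one row (0 or 1) to the logistic mixed-effects model.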

Lexical interference effect (Kukona et al., 2014)

Because both distractors are present in the display, looks to one entail not looking at the other. We therefore computed an advantage score, defined as the log ratio of the proportion of looks to the distractor of the same color to the proportion of looks to the distractor of a different color within our time window of interest. We added 0.5 to both the numerator and the denominator to prevent computation errors when the denominator is zero (Ito & Knoeferle, 2022). Previous studies (Kukona et al., 2014; Prystauka et al., 2024), as well as the data from this experiment, suggest that the interference effect is most pronounced on the noun, so we defined our window of interest as the 1000 ms following noun onset. Using a linear mixed-effects model, we then examined the advantage score as a function of our factors of interest. The fixed factors were condition (gender match vs. gender mismatch), Flanker effect size (defined as the reaction time difference between the incongruent and congruent trials), mean accuracy on the gender knowledge task, mean accuracy on the vocabulary task, self-rated proficiency in writing and reading in Russian, and MLD, as well as the interaction between gender (mis)match and gender score. Because the dependent variable, the advantage score, already captures the difference in looks to the distractors of different colors, an effect of any individual difference variable reflects a relationship between that variable and this difference in looks. We included only one interaction term, between the gender manipulation and performance on the gender knowledge task. This decision was based on our prediction that participants who performed better on the gender task would be more inclined to use gender information as an additional cue to suppress contextually irrelevant distractors. We did not include interaction terms between the gender manipulation and the other individual difference variables because we lacked strong theoretical grounds to expect a relationship between these variables and the levels of the gender condition.
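The advantage score described above can be written compactly as log((p_same + 0.5) / (p_diff + 0.5)). The sketch below implements this directly; the function and argument names are hypothetical, and the natural logarithm is assumed because the base is not stated in the text.

```python
import math


def advantage_score(p_same_color, p_diff_color, constant=0.5):
    """Log ratio of looks to the same-color vs. different-color distractor.

    The constant is added to numerator and denominator so the score stays
    defined when a distractor received no looks at all.
    """
    return math.log((p_same_color + constant) / (p_diff_color + constant))


no_preference = advantage_score(0.2, 0.2)  # equal looks to both distractors
interference = advantage_score(0.3, 0.0)   # looks only to same-color distractor
```

A score of 0 indicates no preference between the two distractors, positive values indicate more looks to the color-consistent distractor (i.e., lexical interference), and negative values indicate the reverse.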

The continuous variables were centered and scaled. VIF values were calculated for the predictor variables, ranging from 1 to 1.9. These low VIF values indicate that multicollinearity is not a significant concern in the model. Participants and trials were used as random factors. The resulting model was as follows:

$$\begin{aligned} & \text{Model} \leftarrow \text{lmer}(\text{Log Ratio} \sim \text{Gender (Mis)Match} * \text{Gender Score} + \text{Vocabulary Score} \\ & \quad + \text{Self-Rated Proficiency in Writing and Reading in Russian} + \text{Multilingual Language Diversity} \\ & \quad + \text{Flanker Difference RTs} + (1 \mid \text{Trial}) + (1 \mid \text{Participant})) \end{aligned}$$
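For readers unfamiliar with the centering and scaling step applied to the continuous predictors in both models, a minimal sketch follows. It assumes the common convention of subtracting the mean and dividing by the sample standard deviation (the default behavior of R's scale() function); the text does not state which convention was used, and the example scores are hypothetical.

```python
from statistics import mean, stdev


def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("cannot scale a constant variable")
    return [(v - m) / sd for v in values]


# Hypothetical vocabulary accuracies in percent (not study data).
z_scores = center_and_scale([80, 90, 100])
```

After this transformation each predictor has a mean of 0 and a standard deviation of 1, so model coefficients are expressed per standard deviation of the predictor and are easier to compare across variables.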

Results

Gender knowledge and vocabulary tasks

In the gender knowledge task, accuracy varied from 66 to 100%, with an average of 97% (SD = 6.7%). Accuracy on the vocabulary task varied from 62.5 to 100%, with an average of 91% (SD = 8.8%).

Flanker task

Four participants did not provide responses on the Flanker task, so the data below are reported for the 64 participants who did. Accuracy in the congruent condition varied from 86 to 100%, with an average of 99% (SD = 2.2%). Accuracy in the incongruent condition varied from 46 to 100%, with an average of 96% (SD = 8.1%). Reaction times in the congruent condition varied from 337.3 ms to 574.7 ms, with an average of 444.6 ms (SD = 56.4 ms). Reaction times in the incongruent condition varied from 385.4 ms to 619.9 ms, with an average of 488 ms (SD = 56.6 ms).
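The Flanker effect used as a predictor is the reaction time cost of incongruent relative to congruent trials. At the group level, the means reported above imply an average cost of about 488 − 444.6 = 43.4 ms. A per-participant version can be sketched as below; the function name and trial values are hypothetical, and any trial filtering (e.g., removing errors or outliers) is not shown because it is not described in the text.

```python
from statistics import mean


def flanker_effect(rt_incongruent_ms, rt_congruent_ms):
    """Mean incongruent RT minus mean congruent RT, in milliseconds."""
    return mean(rt_incongruent_ms) - mean(rt_congruent_ms)


# Hypothetical trial-level RTs for one participant (not study data).
effect = flanker_effect([450, 470, 460], [400, 420, 410])
```

Larger positive values indicate a greater cost of resolving the conflict introduced by the flanking arrows.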

To address the issue of missing data (see Footnote 3), we used the Multiple Imputation by Chained Equations (MICE) method in R (van Buuren & Groothuis-Oudshoorn, 2011). Imputation was carried out on the part of the dataset that contained per-participant average values of the other predictor variables (accuracy on the Vocabulary and Gender knowledge tasks, Proficiency in Reading and Writing in Russian, and MLD). The imputed values followed the distribution of the observed data, suggesting that they are plausible values for the missing measurements.

LHQ3

A summary of the relevant LHQ3-based composite scores is provided in Table 2.

Table 2 Summary of LHQ3 based composite scores

Verb semantic constraint effect (Altmann & Kamide, 1999)

The time course of the proportions of looks to the target is illustrated in Fig. 5.

Fig. 5 The average proportions of looks to the target across time. Note. Error ribbons represent standard error. Vertical lines indicate the time window between 200 ms after the verb onset and 200 ms after the noun onset

The analysis revealed a significant main effect of condition (with more looks to the target following constraining verbs) and a significant interaction between condition and the Flanker effect. The results are summarized in Table 3 and plotted in Fig. 6. The interaction suggests that the influence of the Flanker effect on the allocation of visual attention to the target is moderated by the verb constraint. Specifically, in the constraining verb context, the likelihood of looking at the target decreased as the Flanker effect increased, whereas the non-constraining verb context showed the opposite pattern.

Table 3 Summary of the model to test the verb semantic constraint effect

Fig. 6 Predicted values illustrating (a) a main effect of Condition and (b) a significant interaction between Condition and Flanker effect (RT difference between the incongruent and congruent conditions). Note. The post-hoc tests on the Flanker effect indicate that the significant interaction is primarily driven by participants with a smaller Flanker effect, who exhibited a greater difference in the proportion of predictive looks to the target between the constraining and non-constraining conditions

Lexical interference effect (Kukona et al., 2014)

The time course of the proportions of looks to all items in a scene is illustrated in Fig. 7.

Fig. 7 The average proportions of looks to all objects in a display (a); to the distractors in a zoomed-in window (b); and to the distractors broken down by the gender manipulation (c)

The analysis revealed a main effect of Vocabulary score (Table 4 and Fig. 8): participants with a higher Vocabulary score exhibited a smaller difference between looks to the distractors of the same and different colors, indicating reduced susceptibility to lexical interference.

Table 4 Summary of the model to test the lexical interference effect

Fig. 8 Predicted values illustrating the main effect of Vocabulary Score and Flanker effect size

Discussion

This study investigated the interplay between language processing, visual attention, inhibitory control and language use in HSs of Russian in North America. We looked at two specific phenomena: (i) the anticipation driven by verb semantic constraints and (ii) the presence of lexical interference. Our approach focused on modeling these as potential functions of individual differences, including vocabulary size, knowledge of grammatical gender, cognitive control, self-rated proficiency in reading and writing in Russian, and language entropy (MLD). Crucially, we consider how doing so informs current discussion on predictive processing more generally.

The basic findings of this study suggest that the present Russian HSs used verb semantics to anticipate what comes next. They also experienced lexical interference from distractors that were consistent with the color word but inconsistent with the verb, similar to the results observed in the original studies with monolinguals (Altmann & Kamide, 1999; Kukona et al., 2014) and adult L2 learners (Dijkgraaf et al., 2017). Detecting the basic effects of our experimental manipulations in this population of HSs is a valuable finding in itself. While a sizeable literature investigating HSs’ grammars in different language combinations exists (see Montrul, 2022; Polinsky, 2018; Montrul & Polinsky, 2021; Kupisch & Rothman, 2018 for review), the majority of available data come from offline behavioral methods. Recently, there have been calls to expand the experimental methods used with HSs to include the same variety of psycholinguistic experimentation used with L1-dominant speakers and L2 learners, especially online methods (Bayram et al., 2021; Polinsky & Scontras, 2020). Indeed, online (real-time) psycholinguistic measures provide a more automatic way of estimating individuals’ linguistic knowledge. As such, their increased use might help overcome issues related to HSs’ unfamiliarity with using their HL in a formal context, which, in turn, might affect their metalinguistic judgements (Bayram et al., 2021; Montrul et al., 2014; Montrul, 2022). Our findings thus contribute to a growing literature on language processing in HSs and underscore the importance of using a broader array of experimental techniques to fully capture the nuances of their linguistic abilities. Moreover, the fact that these real-time processing effects were detected using online (i.e., web-based, as opposed to lab-based) eye tracking adds further to the literature, highlighting the efficacy and utility of this methodology in general and with HSs in particular (Degen et al., 2021; Slim & Hartsuiker, 2022; Vos et al., 2022; Prystauka et al., 2024). This approach is particularly advantageous for studying populations such as HSs, who are often geographically dispersed and difficult to access for testing in a single physical location.

Perhaps a more significant contribution of the present paper concerns the individual differences results. The effect of anticipation based on verb semantic constraint was modulated by participants’ performance on a Flanker task. The measure we chose for the Flanker effect is the difference in reaction times between the incongruent and congruent trials which measures the additional time required to process and respond to incongruent trials relative to congruent ones. This measure is a commonly used indicator of the efficiency of cognitive processes involved in handling interference. Our results suggest that in the constraining condition, looks to the target increased with a decreasing Flanker effect while in the non-constraining condition, looks to the target decreased as a function of a decreased Flanker effect. In other words, those who managed interference in the Flanker task more adeptly (indicated by a lower reaction time cost) were also better at leveraging semantic constraint cues, ignoring distractions, and focusing on the target. Assuming that larger Flanker effect size reflects reduced inhibitory control, our findings suggest that attending to the target is not just a function of linguistic context but also relates to an individual participant’s ability to effectively and strategically allocate attention in situations which afford that. There is a sizeable body of work looking at the relationship between language processing and domain-general cognitive control, particularly in the bilingualism field where researchers investigate control processes involved in managing multiple languages (Abutalebi & Green, 2007; Bialystok, 2024; Bialystok & Craik, 2022; Luk et al., 2012) or in studies with monolinguals investigating linguistic stimuli with some sort of conflict (e.g. ambiguity or violations at different levels, Hsu et al., 2021; Brown-Schmidt, 2009; Novick et al., 2014; for a review, see Ness et al., 2023; Patra et al., 2023).

The literature on the role of domain-general cognitive abilities in predictive processing is sparser, yet previous work has highlighted different aspects of this relationship in different types of speakers. Noh and Lee (2017) tested the effect of verb semantic constraint using a VWP design (similar to the one we report here) in a group of Korean L2 learners of English. They found that accuracy on a Flanker task correlated with anticipatory processing, a result very similar to the one reported here, although we detected this relationship using an RT measure. Zirnstein and colleagues conducted a series of EEG experiments exploring the relationship between cognitive control and predictive processing in native and L2 speakers (Zirnstein et al., 2018), in older mono- and bilingual adults (Zirnstein et al., 2019) and in heritage and L2 speakers under noise conditions (Fricke & Zirnstein, 2022). Our findings extend this and other work (Covey et al., 2024; Dave et al., 2021) by providing evidence that cognitive control modulates predictive processing in a group of HSs more generally. Our participants listened to sentences that neither were ambiguous nor contained violations (as opposed to the more complex/nuanced designs used in the above-mentioned studies), and yet their inhibitory control abilities still predicted how much they relied on the semantic information of the verb to anticipate how the sentence would unfold, constrained by the four images in the visual scene. It could be that this relationship was detected in our study because processing HLs is (often) more demanding and thus requires increased reliance on domain-general control even when there is no violation or ambiguity. If this is on the right track, it would underline a crucial role of cognitive control not only in navigating linguistic complexity per se, but also in enhancing the efficiency of language processing more generally. This is perhaps most readily detectable in populations like HSs because they are a type of native, naturalistic speaker who has had significantly less exposure to and opportunity to use the (heritage) language.

Of course, HSs vary at the individual level in terms of the quantity and quality of their language exposure and usage. It thus follows that the magnitude of such an effect would be calibrated to an individual’s scores on measures that proxy for these differences. In other words, it should not be taken for granted that (compensatory) processing consequences of reduced input/intake and usage patterns will manifest to the same degree for all HSs, if at all for some. This is borne out in the individual differences approach to the present data set and is in line with an emerging body of work focusing on the added value of individual differences approaches to bilingualism research (Rothman et al., 2023). As such, the present data set provides novel insights into the processing of HLs, on the one hand, and the relationship between domain-general cognition and language processing more generally, on the other (Luk & Rothman, 2022).

Our finding from testing the interference effect suggested that participants with higher vocabulary scores were less subject to lexical interference. This vocabulary finding is in line with our predictions, previous computational modeling, and experimental work (Kukona et al., 2014; Peters et al., 2018). This suggests that participants with larger lexicons are more likely to efficiently extract relevant information, i.e. from the constraining verb in our experiment, and inhibit the contextually irrelevant distractor despite it being locally coherent with the incoming lexical information. Vocabulary has long been used as a proxy for language proficiency (Treffers-Daller & Silva-Corvalán, 2016; Lemhöfer & Broersma, 2012), and our results add to the literature showing how knowing more words can affect sentence processing strategies and allocation of attention. Given that the domain we are investigating is highly dependent on lexically based processing, it stands to reason that vocabulary size would be a primary predictive factor for individual differences. As such, it should not be interpreted that vocabulary size is a holy grail of sorts to explain HS individual differences par excellence, but rather that it is a good candidate as one such variable for processing that is highly based on lexical constraints.

Returning to our predictions regarding the juxtaposition of the two effects, as anticipated, the factor that emerged as crucial for the verb semantics effect was not tied to language use and exposure, but instead was linked to participants’ domain-general cognitive abilities, specifically cognitive control. This finding underscores the idea that verb semantic constraints, being conceptually universal, are robust across varying levels of language experience and are less susceptible to the effects of reduced input or engagement. On the other hand, the factor that significantly influenced the interference effect was indeed related to participants' language abilities, reinforcing the notion that this effect is more sensitive to individual differences in language use and experience. These results contribute to the broader literature on predictive processing by highlighting the differential impact of cognitive and linguistic factors on the use of predictive cues (Huettig & Janse, 2016; Kaan & Grüter, 2021; Kuperberg & Jaeger, 2016).

One rather surprising finding was that our gender manipulation did not have any effect on participants’ attention. There are two possible explanations for this. First, it could be that HSs (or at least the participants in our sample) do not use gender information predictively as an additional cue to rule out contextually irrelevant information. Second, a methodological limitation may have prevented us from detecting this relationship: the lexical interference effect is rather small (around 5%) and thus does not leave much room for the exploration of individual differences. Follow-up work using more sensitive designs with larger expected effect sizes should further investigate this relationship in HSs. For example, one line of follow-up work could examine whether HSs of Russian use gender predictively and, if so, under what specific linguistic conditions and/or as a function of which individual differences in their personal language experience. Doing so would help to align this and related work, which capitalizes on the processing of grammatical gender as a potential secondary cue, with the literature that focuses primarily on gender acquisition/processing itself in HLs in general and in Russian specifically (Fuchs et al., 2022a; 2022b; Di Pisa et al., 2022; Mitrofanova et al., 2018; Mitrofanova et al., 2022; Polinsky, 2008; Sekerina et al., 2006; Sekerina, 2012, 2015).

Conclusions

In summary, this study, which examined 68 HSs of Russian in North America, showed that participants exhibited anticipatory processing and were susceptible to lexical interference. The analysis of individual differences underscored the non-uniform nature of these effects. Vocabulary and domain-general cognitive control emerged as important factors influencing lexical interference and prediction, respectively, among Russian HSs. These findings contribute to a deeper understanding of the interplay between cognitive mechanisms and language processing in HSs, shedding light on the multifaceted nature of bilingual experience.