The extent to which unconscious learning can guide humans’ pursuit of rewards is a matter of intense controversy (Hassin, 2013; Newell & Shanks, 2014; Reber, 1993, 2022). As with other phenomena claimed to be sustained by unconscious knowledge (e.g., fear conditioning, evaluative conditioning), some initial studies favored the possibility that unconscious knowledge guides humans’ adaptive interaction with stimuli that lead to rewards (Atas et al., 2014; Bechara et al., 1997; Pessiglione et al., 2008), but this phenomenon has been contested, especially in recent years (Newell & Shanks, 2014; Skora et al., 2021a,b). The capacity of unconscious cognition to guide these adaptive interactions has been studied mainly in two areas of research: instrumental conditioning and reward learning.

In instrumental conditioning paradigms, participants learn that different responses to different stimuli bring different outcomes. For example, a Go response in the presence of stimulus A brings a reward, but a Go response to stimulus B brings a punishment. A No-Go response to either stimulus brings a neutral outcome. Hence, in order to collect the rewards and avoid the punishments, participants have to learn that the Go response is adaptive for stimulus A, and the No-Go response is adaptive for stimulus B. Pessiglione et al. (2008) found that participants learned to respond adaptively, in the manner described above, to two stimuli, of which one led to a monetary reward and one to a monetary loss. Importantly, this instrumentally conditioned effect appeared even though the two stimuli were presented subliminally and hence were not consciously perceptible to participants (Atas et al., 2014; Mastropasqua & Turatto, 2015). However, Skora, Yeomans, et al. (2021a), adapting this task by employing individual exposure durations and more sensitive trial-by-trial awareness checks to ensure subliminal perception, found reliable evidence for the absence of unconscious conditioning (Reber et al., 2018; Skora, Livermore, et al., 2021b).

In reward learning tasks, participants are typically required to select, from several available options, the ones that are associated, in the long run, with the highest payoffs. However, to delay the conscious detection of the advantageous options, the outcomes for each option contain some degree of noise and uncertainty. For example, an option that in the long run leads to more gains than losses, occasionally will provide a loss. In an influential study that employed this approach, Bechara et al. (1997) asked participants to select among four card decks, of which two were advantageous and two were disadvantageous. They also probed participants’ awareness of the optimal selection strategy, first after 20, then after every 10 trials. They found that participants were able to preferentially select advantageous card decks, even before being aware of the outcomes associated with the available decks. Hence, they concluded that participants’ adaptive choices can be guided by unconscious knowledge. On the other hand, subsequent studies that used more sensitive, trial-by-trial, awareness measures found that advantageous selections are closely linked to awareness of the advantageous options (Maia & McClelland, 2004; see also Newell & Shanks, 2014, for a discussion).

Producing unawareness

As can be inferred from the previous section, in unconscious learning and conditioning studies, there are two general experimental approaches for stimulating unconscious processing. A first approach is to prevent the conscious perception of the predictive (conditioned) stimulus during the conditioning procedure. Most recent studies that have presented the conditioned stimuli subliminally, using improved methods for ensuring the unconscious character of exposure, have found evidence against instrumental learning outside of awareness (Reber et al., 2018; Skora et al., 2021a, b; see also Heycke & Stahl, 2020; Högden et al., 2018 for similar results obtained in other conditioning paradigms; contrast Greenwald & De Houwer, 2017). However, during subliminal exposure the stimulus is presented under conditions that are suboptimal for visual information processing. For instance, in masking studies the stimulus is exposed for a very short duration (e.g., 16 ms, 33 ms) and is immediately preceded and/or followed by the “mask,” which is another visual stimulus exposed for a longer duration, supraliminally (Skora, Yeomans, et al., 2021a). In crowding procedures, the stimulus is exposed parafoveally, in a low-resolution zone of the retina, and is surrounded by distractors. The low resolution makes the target stimulus difficult to differentiate from the flanking distractors (Atas et al., 2014). Finally, during continuous flash suppression, the target stimulus is exposed to the nondominant eye, while the dominant eye is presented with a high-contrast, flashing pattern. This flashing pattern interferes with the processing of the target stimulus and prevents it from entering awareness (Skora, Yeomans, et al., 2021a). However, these procedures might not only render the stimulus unconscious but also likely diminish the amount of information about the stimulus that reaches the cognitive system.
Consequently, it is possible that the information that reaches the cognitive system is insufficient for forming a representation of the subliminal stimulus of sufficient quality to support adaptive, selective, action (Peters et al., 2017; Reber et al., 2018; Sweldens et al., 2014; Timmermans & Cleeremans, 2015).

The second approach for studying unconscious conditioning consists in exposing participants to supraliminal stimuli but making the relations between the conditioned stimuli and the outcomes more difficult to detect or remember consciously. This is typically achieved by employing noisy, relatively complex regularities that are difficult to represent in the working memory or that are difficult to be accurately remembered consciously. However, typically in these experiments, some participants are aware of (parts of) the regularity. Hence, some conscious participants or conscious trials need to be excluded. This approach has been employed less often in studies on unconscious instrumental learning (Bechara et al., 1997; but see Maia & McClelland, 2004) but has been extensively used in other fields of research on unconscious learning processes, such as evaluative conditioning (Jurchiș et al., 2020; Olson & Fazio, 2001; Waroquier et al., 2020), Pavlovian conditioning (Leganes-Fonteneau et al., 2018, 2019), visuo-motor sequence learning (Kóbor et al., 2017; Fu et al., 2010) or learning of artificial grammars (Reber, 1967, 2022; Norman et al., 2019). The artificial grammar learning task (AGL) has been one of the most often employed experimental paradigms for investigations on implicit learning. In AGL, participants are exposed to multiple letter strings that follow a hypercomplex regularity, called “artificial grammar” (Fig. 1), which is difficult to detect consciously.

Fig. 1 Artificial grammars used in the present study (Norman et al., 2019; Reber, 1967). The letter strings are generated by following the arrows. For example, the string VVTRVM follows grammar A and VVTRXM follows grammar B

In an acquisition phase, participants are typically told to memorize several letter strings, but nothing is disclosed about the existence of a grammar embedded in the strings. In a subsequent test phase, they are able to discriminate new strings that follow the grammar from strings that violate the grammar (indicating they have learned the grammar), even when they report not being aware of the grammar (Dienes & Scott, 2005; Norman et al., 2016, 2019; Scott & Dienes, 2008, 2010; Ivanchei & Moroshkina, 2018; contrast Shanks, 2005). Most of the evidence for unconscious learning in the AGL task has been produced, in the past decades, by studies that employ sensitive trial-by-trial measures of awareness, while also minimizing the risk of some allegedly unconscious trials being substantially contaminated by conscious knowledge (Jurchis & Dienes, 2022; Shanks, 2017; Shanks et al., 2021). These include studies in which a clear majority of the trials are attributed to unconscious knowledge (Dienes & Scott, 2005; Scott & Dienes, 2008; Ivanchei & Moroshkina, 2018; Jurchiș et al., 2020; Norman et al., 2019; see Jurchis & Dienes, 2022 and Skora et al., 2020, for rationale), in which participants’ conscious accuracy is lower than their unconscious accuracy (Norman & Price, 2012; Scott & Dienes, 2010), or in which concurrent, independent measures of awareness provide evidence of unconscious knowledge (Jurchis & Dienes, 2022; Norman et al., 2019).

Present study

We tested whether instrumental responses can be sustained by unconscious knowledge of reward-predictive information by creating a paradigm based on learning complex regularities that are difficult to detect consciously. This approach has been successful in producing unconscious learning in a variety of research areas and could constitute, in the context of instrumental learning, an alternative to the common approach of exposing the predictive stimuli subliminally. In this task, we expect participants to implicitly learn two artificial grammars and associate one of the two grammars with rewards. Furthermore, we expect that participants will be able to use this implicitly learned knowledge instrumentally, that is, for acquiring additional rewards. Our expectations are based on the previous studies showing that participants can learn, largely implicitly, two different grammars (Dienes et al., 1995; Norman et al., 2011, 2016, 2019; Wan et al., 2008) and also that they can associate one of the grammars with a positive and the other with a negative valence, in an evaluative conditioning paradigm (Amd, 2022; Jurchiș et al., 2020).

The learning/conditioning phase of the task is based on a combination between the two-grammar learning AGL design (Dienes et al., 1995; Norman et al., 2019) and a task-irrelevant conditioning procedure (Leganes-Fonteneau et al., 2018, 2019). For each trial, participants first memorize a string (the target string). Unknown to them, the target string follows one of two possible artificial grammars. Subsequently, the target string is exposed again, together with a distractor string, and the participant has to correctly identify which one is the previously seen target string. After some of their correct responses, participants receive rewards; but, unknown to them, they receive rewards only when the target string follows a specific artificial grammar (e.g., grammar B). When they respond correctly to a target string from the other grammar (e.g., grammar A), they receive neutral feedback (Fig. 3).

In a subsequent test phase, they are presented, on each trial, with a new string from the rewarded grammar and a new string from the unrewarded grammar. Their task is now to select the string that is the most likely to bring rewards (i.e., the string from the rewarded grammar). Using an awareness measure widely employed in implicit learning and conditioning (Dienes & Scott, 2005; Norman et al., 2019; Waroquier et al., 2020), we assess the conscious/unconscious status of their structural knowledge, that is, knowledge of which structural elements (i.e., the features of the grammars) are associated with rewards. The measure also captures the conscious/unconscious status of their judgment knowledge (i.e., whether they are aware of which string is likely to bring the reward). Note that, while this distinction between structural and judgment knowledge has been highly influential in a wide variety of implicit learning paradigms (e.g., AGL, visuo-motor sequence learning, learning of conjunctive rules, evaluative conditioning; see Dienes, 2012; Waroquier et al., 2020), it has not yet been applied to disentangle the conscious/unconscious bases of instrumental responses.

If participants learn the grammars in the learning/conditioning phase, and learn which grammar leads to rewards, we expect them to choose more strings from the rewarded than from the unrewarded grammar. Importantly, we expect that this will be the case when participants acquire unconscious knowledge of the grammars leading to unconscious judgments (Guessing), unconscious knowledge leading to conscious judgments (Intuition and Familiarity), and conscious knowledge of the grammars leading to conscious judgments (Rules, Remembering) (Dienes & Scott, 2005). Our approach for measuring the conscious and unconscious status of knowledge is detailed in what follows.

Conceptualization and measurement of awareness

Because discriminating between conscious and unconscious knowledge is a highly sensitive issue, we discuss in more detail the rationale behind our measure. First, because any awareness measure presupposes an underlying theory, we used a measure of awareness grounded in the dominant theories of consciousness (Dienes, 2012; Dienes & Scott, 2005; Jurchiș et al., 2020; Waroquier et al., 2020). The higher-order thought theory of consciousness is built on the principle that, to be aware of a piece of information, one needs to have a meta-representation of having that information; that is, one needs to know that one knows the information and, consequently, one should be able to report the information (Dienes, 2012; Rosenthal, 2004). The global workspace/global availability theory (Baars, 1997; Shea & Frith, 2019) assumes that information is first processed locally, unconsciously, by specialized modules. However, information that reaches the global workspace becomes conscious and globally available to all relevant processing modules, including those responsible for communicating the information. Hence, conscious information is identifiable and reportable by the person. Accordingly, we use a subjective measure of awareness, which asks participants to discriminate and report on their own mental states (see Dienes, 2012; Dienes & Scott, 2005; Waroquier et al., 2020, for previous discussions of these arguments). Alternatively, one could use a performance-based, “objective” measure of awareness, which assumes that subjective measures can be insensitive and that participants’ conscious knowledge can be better indexed by objective above-chance performance in some specifically designed tasks. However, given the bulk of research showing that participants can be objectively accurate when they, subjectively, claim to have no conscious knowledge, the assumption of process-purity is very difficult to substantiate (Timmermans & Cleeremans, 2015; Waroquier et al., 2020).
Although there is no perfect subjective or objective measure of awareness, carefully designed and sensitive subjective measures are favored by a strong majority of consciousness researchers (Francken et al., 2021) and accepted even by authors that typically favor objective measures (Newell & Shanks, 2014).

Second, a valid and sensitive measure of awareness needs to specify what type of knowledge it attempts to capture. Specifically, Dienes and Scott (2005) have shown that in many implicit learning tasks and instances there are two main types of knowledge at work. The first type of knowledge, called structural knowledge, consists of representations of the learned regularities; for instance, of the configuration of an artificial grammar, or of the contingency between a stimulus (a card deck or some patterns of letters) and an outcome (a reward). Based on the knowledge of the regularity, one can judge whether a particular stimulus conforms to the regularity; this is called judgment knowledge (e.g., “Probably this string follows the previous grammar”; or “This string is likely to bring a reward”). When participants learn the regularity consciously (e.g., “I remember M appearing after X in the rewarded strings”), they can judge consciously whether a new stimulus conforms to the regularity (e.g., “…hence I believe the string XMVTRM will bring the reward”). However, when participants learn the regularity unconsciously, acquiring unconscious structural knowledge, their judgment knowledge can be either conscious or unconscious. Conscious judgment knowledge based on unconscious structural knowledge is typically experienced as a feeling of intuition or familiarity (e.g., “I (consciously) feel that this string follows the rewarded grammar, but I have no idea why”). Unconscious judgment knowledge is typically experienced as a guess or random response (“I have no idea whether the string follows the grammar and I responded at random”). 
In summary, when participants acquire conscious knowledge of the regularities, they can judge consciously whether a stimulus follows the rules or not, and are aware of using those rules; when they acquire unconscious structural knowledge, they can have conscious judgment knowledge, experienced as a feeling, or can have unconscious judgment knowledge, experienced as a guess (Fig. 2).

Fig. 2 Conscious/unconscious status of structural and judgment knowledge. Structural knowledge refers to knowledge of the acquired regularity. Judgment knowledge refers to whether a particular stimulus follows the regularity or not. Conscious structural knowledge leads to conscious judgment knowledge, experienced as using conscious Rules or Remembering. Unconscious structural knowledge may lead to conscious judgment knowledge (Intuition or Familiarity) or unconscious judgment knowledge (Guessing). Adapted from Mealor and Dienes (2013)

Participants’ awareness of their structural and judgment knowledge is assessed by asking them to report, after each response (e.g., whether a string follows the grammar or not), what was the basis of their response, choosing, from Guess (having no idea whether it follows the grammar, and responding at random), Intuition/Familiarity (having a feeling that it follows the grammar, but having no idea what it is based on), or Rules/Remembering (consciously knowing the Rule and responding accordingly). This method has been most commonly applied in AGL, where it has shown that participants acquire, not only conscious, but also unconscious (structural) knowledge of the grammars, because they respond accurately when they rely on unconscious structural knowledge (Guess, Intuition, Familiarity) (Dienes & Scott, 2005; Jurchis & Dienes, 2022; Norman et al., 2016, 2019; Norman & Price, 2012; Scott & Dienes, 2008, 2010). Similar results with this method have been found in evaluative conditioning (Jurchiș et al., 2020; Waroquier et al., 2020), implicit sequence learning (Fu et al., 2010, 2018; Zhang & Liu, 2021), language learning (Paciorek & Williams, 2015; Zhao et al., 2021), symmetry learning (Jiang et al., 2012; Ling et al., 2018), learning of conjunctive rules (Neil & Higham, 2012, 2020), or implicit social learning (Costea, 2018; Costea et al., 2022). See Dienes (2012) for a summary of empirical data supporting the validity of this measure.
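The correspondence between the five response options and the two types of knowledge can be written down compactly. The following is a minimal illustrative sketch, not part of the study materials; the names `KnowledgeStatus`, `ATTRIBUTION_STATUS`, and `classify_attribution` are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeStatus:
    """Conscious/unconscious status implied by one awareness attribution."""
    structural_conscious: bool
    judgment_conscious: bool


# Mapping described in the text (after Dienes & Scott, 2005).
# The dictionary name and layout are illustrative assumptions.
ATTRIBUTION_STATUS = {
    "Guess": KnowledgeStatus(structural_conscious=False, judgment_conscious=False),
    "Intuition": KnowledgeStatus(structural_conscious=False, judgment_conscious=True),
    "Familiarity": KnowledgeStatus(structural_conscious=False, judgment_conscious=True),
    "Rules": KnowledgeStatus(structural_conscious=True, judgment_conscious=True),
    "Remembering": KnowledgeStatus(structural_conscious=True, judgment_conscious=True),
}


def classify_attribution(attribution: str) -> KnowledgeStatus:
    """Return the knowledge status implied by a trial-level attribution."""
    return ATTRIBUTION_STATUS[attribution]
```

Under this coding, a trial counts as relying on unconscious structural knowledge whenever `structural_conscious` is false, that is, for Guess, Intuition, and Familiarity responses.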

Methods

We preregistered the hypotheses, procedure, data collection, and statistical analyses before data collection at https://osf.io/jy8am. Raw and processed data can be accessed at https://osf.io/v4b5t/. The complete list of stimuli is available in the supplementary material. The study was conducted in accordance with the ethical guidelines of Babeș-Bolyai University and the APA.

Participants

Initially, 223 participants were enrolled in the study in exchange for the chance to win one of four prizes, contingent on performance: the first place, 100 RON (roughly 20 euro, representing slightly more than 7% of the local minimum monthly net wage); the second place, 75 RON; and the third and fourth places, 50 RON each. Participants were recruited mostly from social media groups of local universities but also from the community. Of these, 12 participants were excluded for failing the attention/engagement checks, according to our preregistered criteria (i.e., more than 15% timeouts or mistakes in the learning/conditioning phase or more than 10% timeouts in the test phase). The remaining 211 participants (Mage = 26.90 years, SD = 8.72, 176 females) were included in the analyses. The sample size was determined with a preregistered optional Bayesian stopping rule: we recruited participants until the Bayes factor (B) indicated strong support either for or against our hypothesis that participants would be accurate when relying on unconscious structural knowledge that leads to conscious judgments (i.e., B ≥ 10 or B ≤ 1/10). At 80 participants we had already gathered the required evidence (i.e., B > 10), but the Bs were insensitive for most of the other hypotheses. Hence, we decided to recruit as many participants as we could to gather more robust evidence either for or against our secondary hypotheses.
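The exclusion criteria can be expressed as a simple check. The sketch below is illustrative only: the function name and arguments are hypothetical, it assumes that timeouts and mistakes in the learning/conditioning phase are counted together, and it takes the trial counts (128 learning/conditioning trials, 20 test trials) from the Procedure.

```python
def fails_engagement_checks(
    learning_timeouts_or_mistakes: int,
    test_timeouts: int,
    n_learning_trials: int = 128,  # 2 blocks x 64 trials, per the Procedure
    n_test_trials: int = 20,       # one trial per pair of test strings
) -> bool:
    """Return True if a participant meets either exclusion criterion.

    Assumption: timeouts and mistakes in the learning/conditioning phase are
    pooled into a single count before comparing against the 15% threshold.
    """
    learning_rate = learning_timeouts_or_mistakes / n_learning_trials
    test_rate = test_timeouts / n_test_trials
    return learning_rate > 0.15 or test_rate > 0.10
```

Both thresholds are strict ("more than"), so a participant with exactly 2 timeouts in the 20 test trials (10%) would be retained under this reading.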

Materials

We employed two artificial grammars commonly used in AGL studies (Dienes & Scott, 2005; Reber, 1967; Fig. 1) and used the same learning and test strings used in most AGL studies in which participants learn two grammars (Amd, 2022; Dienes et al., 1995; Jurchiș et al., 2020; Norman et al., 2011, 2016, 2019; Wan et al., 2008). The two grammars have the same possible starting and ending letters, and the length of the strings is balanced between grammars. Specifically, we used 32 learning strings from grammar A and 32 from grammar B, and 20 test strings from each grammar. As rewards, in the learning/conditioning phase, we used token points that could lead to a financial reward (signaled by the message “You win 1 point!”) together with emotionally positive images. As neutral feedback, we presented the message “You win nothing,” together with an emotionally neutral image. As positive images, we selected from the OASIS database (Kurdi et al., 2017) the 30 most positive images (Mvalence = 6.23, range = 6.11–6.49), and as neutral images, the 30 images that were the closest to the neutral point of the valence scale in the validation study (Mvalence = 4.02, range = 3.99–4.06). See the supplementary material for a complete list of stimuli.

Procedure

The experiment was conducted online, using gorilla.sc (Anwyl-Irvine et al., 2020). Participants were told that they would undergo tasks that investigate “some cognitive processes involved in memory and decision” and that some of their correct responses in these tasks will bring rewards (points and positive images). They also were instructed that the participants with the highest number of points gained throughout both tasks would be awarded one of four available monetary prizes, mentioned above. Nothing was disclosed, at the beginning of the task, about the existence of regularities or of other information that needs to be learned.

Learning/conditioning phase

Participants saw, on each trial, on the first screen, one target string from Grammar A or B. The target string was exposed for 5 seconds, and participants had to try to memorize the string. On the next screen the same string appeared, together with a distractor string from the same grammar, and participants had to choose, in less than 14 seconds, which was the previously seen target string. If participants chose incorrectly the feedback “INCORRECT” appeared for 1,000 ms. If participants chose correctly, unknown to them, the feedback was contingent on the grammar followed by the string. When the string obeyed the rewarded grammar, correct responses were followed by the reward screen (consisting of one positive image, the message “You win 1 POINT!,” the number of points earned, and the target string, all exposed for 4,500 ms; Fig. 3). When the string obeyed the unrewarded grammar, correct responses were followed by the neutral feedback screen (consisting of a neutral image, the message “You win nothing,” plus the target string, all exposed for 4,500 ms). Then, a 500-ms blank appeared and the next trial began. For counterbalancing, for half of the participants, grammar A strings were paired with rewards, and grammar B strings were followed by neutral feedback. For the other half of participants, grammar A was associated with neutral outcomes and grammar B with rewards. For each participant, it was randomly determined which grammar was associated with which type of outcome; hence, it was counterbalanced which was the rewarded and which the unrewarded grammar.
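The feedback contingency described above can be summarized in a few lines. This is a minimal sketch under stated assumptions rather than the code that ran on gorilla.sc: the names `Feedback` and `feedback_for` are hypothetical, and timeouts are not modeled because the text does not specify their feedback.

```python
from typing import NamedTuple


class Feedback(NamedTuple):
    kind: str         # "incorrect", "reward", or "neutral"
    message: str      # text shown on the feedback screen
    duration_ms: int  # exposure time of the feedback screen


def feedback_for(chose_target: bool, target_grammar: str, rewarded_grammar: str) -> Feedback:
    """Select the feedback screen for one learning/conditioning trial."""
    if not chose_target:
        # Incorrect identification: same feedback regardless of grammar.
        return Feedback("incorrect", "INCORRECT", 1000)
    if target_grammar == rewarded_grammar:
        # Correct response to a string from the rewarded grammar.
        return Feedback("reward", "You win 1 POINT!", 4500)
    # Correct response to a string from the unrewarded grammar.
    return Feedback("neutral", "You win nothing", 4500)
```

For a participant assigned grammar B as the rewarded grammar, `feedback_for(True, "B", "B")` yields the reward screen and `feedback_for(True, "A", "B")` the neutral screen, so the outcome depends only on the grammar of the correctly identified target.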

Fig. 3 Structure of a learning/conditioning trial. First, participants see the target string; then, they have to discriminate the target string from a distractor. If they respond incorrectly, the feedback “INCORRECT” appears (screen b). If they respond correctly, they receive a reward (screen a) if the target string followed the rewarded grammar or a neutral feedback (screen c) if the target string followed the unrewarded grammar

The learning/conditioning phase was divided into two blocks. Each block consisted of 64 trials. In each trial we presented, on the first screen, 1 of the 32 different strings from grammar A or 1 of the 32 different strings from grammar B. The strings were the same in both learning/conditioning blocks. Therefore, throughout the entire learning/conditioning phase, participants saw each string two times as the target string. Also, each string acted once per block as the distractor string. Rewarded and unrewarded trials (hence, the strings from the two grammars) alternated pseudorandomly. That is, participants saw several consecutive rewarded trials and several consecutive unrewarded trials, so that they might learn the two grammars better (see the supplementary material). After each learning/conditioning block, participants had a 30-second break.

Test/instrumental phase

Participants were instructed that the strings that were rewarded have been constructed following a complex set of rules, while the strings that were not rewarded were built following a different complex set of rules. Nothing was disclosed about the nature or configuration of the rules. Furthermore, participants were told that they will see new strings, some more similar to the strings that were rewarded, and that will bring further rewards (i.e., token points that will establish whether they receive a financial prize or not); they will also see some strings similar to the strings that were not rewarded, which will not bring any rewards. Their task was to choose as many rewarded strings as possible, in order to gather as many points as possible. They did not receive feedback on their performance after each trial, but only at the end of the task, in order to eliminate the possibility of conscious hypothesis testing. Hence, we presented participants with new strings: 20 strings followed the grammar paired with rewards and 20 strings followed the grammar associated with neutral feedback. On each trial, one string from the rewarded and one from the unrewarded grammar appeared, and participants had 12 seconds to choose one of them. After each choice, participants had to report their awareness level using the following awareness measure.

The awareness measure

For determining the awareness of structural and judgment knowledge, after choosing the string they thought would bring the reward, participants were asked to report what was the basis of their response, choosing from Guess/random response, Intuition, Familiarity, Rules, and Remembering, and were provided with the definitions from Table 1, before and throughout the entire duration of the test phase. Participants had no time limit for reporting their level of awareness (Fig. 4).

Table 1 Response options for reporting awareness and the corresponding definitions
Fig. 4 Structure of a test trial

Subjective, self-reported measures of awareness have often been criticized in the unconscious learning and conditioning literature because of the relatively low validity of some specific measures used (Newell & Shanks, 2014; Sweldens et al., 2017). Consequently, we took several specific steps to ensure the accuracy of the awareness measure. To this end, we closely followed the immediacy, sensitivity, information (relevance), and reliability criteria, which have been developed for ensuring the validity of awareness scales (Berry & Dienes, 1993; Newell & Shanks, 2014; Shanks, 2005; Shanks & John, 1994; Sweldens et al., 2014, 2017). First, participants reported their awareness level immediately after the behavioral response (immediacy). To ensure sensitivity, participants reported awareness while the two strings were still on the screen, the response to the awareness measure required the same amount of effort as choosing the rewarded string (one mouse click), and, crucially, we stressed that participants should report as conscious knowledge (“rules”/“remembering”) even knowledge that is incomplete, partial, and that they are unsure of. To satisfy the information criterion, the definitions of the response options did not restrict in any way the type of knowledge participants could report as conscious; they could refer to any type of conscious rules, fragments, or any consciously remembered information. Finally, the reliability criterion specifies that the awareness measure should not be influenced by extraneous factors, such as demand characteristics, desirability, or various sources of noise. Participants had no apparent motivation to report conscious knowledge as unconscious, as we insisted that it is essential to report the awareness of their knowledge as accurately as possible and that they should report as conscious even knowledge that is incomplete and that they are unsure of. In addition, the measure was robust to an inflation of unconscious accuracy due to regression to the mean effects because, as we show in the Results section, the majority of trials were attributed to unconscious knowledge (Jurchis & Dienes, 2022; Skora et al., 2020) and conscious knowledge was not more reliable than unconscious knowledge (Shanks, 2017).

In summary, participants first underwent the learning/conditioning phase, then the test phase, in which, on each trial, they had to choose which string is likely to bring the reward and, also, to report the conscious or unconscious basis of their choice.

Results

Our inferences are based on Bayesian analyses, although we also report the corresponding significance tests. For the Bayesian analyses (Dienes, 2016, 2021), we interpret Bayes factors (Bs) between 0.33 and 3 as insensitive, from 3 to 10 as providing moderate evidence, and B ≥ 10 as strong evidence for the alternative hypothesis. Conversely, we interpret Bs between 0.33 and 0.10 as moderate and B ≤ 0.10 as strong evidence for the null. For most analyses, we use a preregistered half-normal prior distribution with the mean of 0 and the SD corresponding to an expected choice accuracy in the test phase of 0.55 (which represents an effect of 0.05 above the chance level of 0.50; hence, noted as BH[0; .05]). The expected effect was derived from an unpublished study in our lab that used the same grammars, but in a paradigm in which participants directly selected between two grammars and received trial-by-trial feedback (rewards or punishment). In this study, in which participants had ample opportunities to test conscious hypotheses, their performance was 0.6 (i.e., 0.1 above chance). Given the more incidental nature of our task, we expected an effect that is roughly half of the original one (i.e., 0.05 above chance). We also report robustness regions (RR), which indicate the range of SDs for the prior that yield the same qualitative results.
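To make the logic of these Bayes factors concrete, the following minimal sketch computes B for a half-normal prior on the effect against a point null. It is an illustration under stated assumptions, not the preregistered analysis script: it assumes a normal likelihood summarized by the observed effect and its standard error, uses the closed-form solution for the resulting integral, and the function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_factor_half_normal(effect: float, se: float, prior_sd: float) -> float:
    """B for H1 (half-normal prior on effects >= 0, scale prior_sd) vs. H0 (effect = 0).

    Assumes the data enter only through a normal likelihood N(effect; theta, se).
    """
    total_var = prior_sd ** 2 + se ** 2
    # Completing the square turns the prior-weighted likelihood into a normal
    # density in `effect` times the posterior mass on theta >= 0.
    post_mean = effect * prior_sd ** 2 / total_var
    post_sd = math.sqrt(prior_sd ** 2 * se ** 2 / total_var)
    marginal_h1 = (
        2.0 * normal_pdf(effect, 0.0, math.sqrt(total_var)) * normal_cdf(post_mean / post_sd)
    )
    likelihood_h0 = normal_pdf(effect, 0.0, se)
    return marginal_h1 / likelihood_h0


def interpret(b: float) -> str:
    """Apply the evidence thresholds stated in the text."""
    if b >= 10:
        return "strong evidence for H1"
    if b >= 3:
        return "moderate evidence for H1"
    if b > 0.33:
        return "insensitive"
    if b > 0.10:
        return "moderate evidence for H0"
    return "strong evidence for H0"
```

As a usage example, an accuracy of 0.534 (SD = 0.138) across 211 participants corresponds to `bayes_factor_half_normal(0.034, 0.138 / math.sqrt(211), 0.05)`. Because such inputs are rounded summary statistics, the result should only approximate a B computed from raw data.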

Learning/conditioning phase

The overall accuracy in this phase was 0.97 (SD = 0.02), indicating that participants were engaged with the task and correctly discriminated the target from the distractor string in 97% of the trials.

Test phase

Participants recorded, on average, 0.22 timeouts (SD = 0.46), representing 1.1% of the 20 test trials, indicating that they were engaged with the task and, generally, responded on time. The overall performance in selecting the correct string (i.e., the string from the previously rewarded grammar) was M = 0.534, SD = 0.138, which was significantly above the chance level of 0.50, BH[0; .05] = 180.51, robustness region (RR) [0.004; 0.50], t(210) = 3.52, p < 0.001, Cohen’s d = 0.25, 95% confidence interval (CI) [0.015; 0.052]. Hence, participants learned which grammar was associated with rewards.

Regarding the awareness measure, 69.93% of participants’ responses were based on unconscious structural knowledge, similar to typical AGL studies (Dienes & Scott, 2005; Norman et al., 2019). More specifically, most responses were based on Intuition or Familiarity, M = 0.530, SD = 0.266, followed by Rules or Remembering, M = 0.307, SD = 0.287, followed by Guessing, M = 0.163, SD = 0.192.

When examining whether participants acquired unconscious and conscious knowledge, we found, first, strong evidence for unconscious structural knowledge: for Guess, Intuition, and Familiarity responses pooled together, participants had above-chance performance, M = 0.532, SD = 0.172, BH[0; .05] = 14.28, RR [0.006; 0.29], t(205) = 2.69, p = 0.004, d = 0.19, 95% CI [0.009; 0.056]. When analyzing only responses based on unconscious structural knowledge that led to conscious judgment knowledge (Intuition and Familiarity), we found, again, strong support for above-chance performance, M = 0.541, SD = 0.205, BH[0; .05] = 28.69, RR [0.006; 0.66], t(202) = 2.87, p = 0.002, d = 0.20, 95% CI [0.013; 0.070]. Regarding unconscious judgment knowledge (accuracy based on Guess), we found evidence against accurate responding, bordering the 0.33 B threshold, M = 0.487, SD = 0.320, BH[0; .05] = 0.34, RR [0.05; +∞], t(150) = −0.502, p = 0.692, d = −0.04, 95% CI [−0.066; 0.039]. In a direct, non-preregistered comparison, we also found that participants’ accuracy when relying on Intuition and Familiarity (together) was higher than their accuracy when Guessing, Mdiff = 0.057, SEdiff = 0.031, BH[0; .05] = 3.36, RR [0.05; 0.08], t(142) = 1.84, p = 0.067, d = 0.15, 95% CI [−0.044; 0.054]. This comparison shows that when participants are not aware of their structural knowledge, their accuracy is higher when their judgment knowledge is conscious than when it is also unconscious.

Data were insensitive when testing whether, in addition to above-chance performance supported by unconscious structural knowledge, participants also exhibited accurate performance based on conscious structural knowledge (i.e., when they relied on Rules or Remembering), M = 0.527, SD = 0.285, BH[0; .05] = 1.32, RR [−∞; 0.25], t(166) = 1.207, p = 0.115, d = 0.09, 95% CI [−0.017; 0.070] (Fig. 5). Furthermore, we tested whether there was a difference in accuracy when participants relied on unconscious versus conscious structural knowledge, and the data were insensitive, Mdiff = 0.005, SEdiff = 0.025, BH[0; .05] = 0.53, RR [0.001; 0.08], t(161) = 1.207, p = 0.839, d = 0.02, 95% CI [−0.044; 0.054].

Fig. 5

Participants’ accuracy split by response basis. BFs denote Bayes Factors for comparing accuracy against the chance level of 0.50. Error bars indicate the 95% confidence intervals for the mean.

As an additional index of awareness, we tested the non-preregistered directional hypothesis that participants are more accurate in selecting the string from the rewarded grammar when they think they know the grammar (Rules, Remembering) than when they think they do not know it (Guess, Intuition, Familiarity). If this is not the case, it means that their conscious metacognitive knowledge about the grammar is inaccurate, and hence that their objective performance is sustained not by conscious but by unconscious knowledge. Indeed, we found moderate evidence that they were not more accurate when they thought they knew the grammar, BU[0; 0.11] = 0.22, RR [0.09; +∞]. This provides additional evidence for unconscious knowledge sustaining performance in this task (see Footnote 1). The fact that in our task conscious knowledge did not outperform unconscious knowledge also precludes the possibility that, due to random measurement noise, the accuracy of unconscious trials could have been inflated by contamination with incorrectly classified conscious trials (Shanks, 2017).

When checking whether the counterbalancing factor (i.e., whether grammar A or grammar B was associated with rewards) had any influence on the results, we found, unexpectedly, that participants performed better in the grammar A rewarded condition (M = 0.58, SD = 0.12) than in the grammar B rewarded condition (M = 0.49, SD = 0.14), and the difference was significant, t(209) = 4.78, p < 0.001, d = 0.65, 95% CI [0.017; 0.070]. This indicates that participants had a preference for grammar A. However, if only this preference for grammar A had operated and no learning had occurred in our task, participants’ above-chance accuracy in one condition would have been canceled out by a proportional below-chance performance in the other condition. Hence, the pooled accuracy in the two conditions can surpass the chance level only if learning is also present, in addition to the preference for one of the grammars.
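The cancellation argument above can be illustrated with a simple additive sketch. It assumes, purely for illustration, that accuracy in each counterbalancing condition equals chance plus a learning effect, plus or minus a fixed preference for grammar A, and that the two conditions contain equal numbers of participants. The function names are hypothetical and this decomposition is not an analysis reported in the study.

```python
def decompose_accuracy(acc_a_rewarded, acc_b_rewarded, chance=0.5):
    """Split two condition means into a grammar-A preference and a learning effect.

    Assumed additive model:
      accuracy (A rewarded) = chance + learning + bias
      accuracy (B rewarded) = chance + learning - bias
    """
    bias = (acc_a_rewarded - acc_b_rewarded) / 2.0
    learning = (acc_a_rewarded + acc_b_rewarded) / 2.0 - chance
    return bias, learning


def pooled_accuracy(bias, learning, chance=0.5):
    """Pooled accuracy over two equally sized conditions under the same model."""
    acc_a = chance + learning + bias
    acc_b = chance + learning - bias
    return (acc_a + acc_b) / 2.0


# Condition means reported in the text: 0.58 (A rewarded) and 0.49 (B rewarded).
bias, learning = decompose_accuracy(0.58, 0.49)
print(f"bias = {bias:.3f}, learning = {learning:.3f}")

# A pure preference with no learning leaves pooled accuracy at chance.
print(f"pooled, bias only = {pooled_accuracy(bias, 0.0):.3f}")
```

Under these assumptions, the preference term drops out of the pooled mean, so any pooled accuracy above 0.50 has to come from the learning term.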

Discussion

We devised a novel paradigm for investigating instrumental responding based on unconscious knowledge, in which participants incidentally learned to associate a complex grammar with a reward and used this knowledge in order to obtain rewards. In this task, roughly 70% of participants’ responses were based on unconscious structural knowledge. Importantly, we found strong evidence that this unconscious structural knowledge was accurate, and hence that participants acquired unconscious knowledge of associations between elements of the grammar and the rewards. Even more specifically, they were accurate when the unconsciously acquired structural knowledge led to conscious judgment knowledge: that is, when they consciously knew which string was more likely to bring the reward but were not aware of what feature of the string made it bring the reward. In contrast, we found some “weak to moderate” evidence that when unconscious structural knowledge did not produce conscious judgments, participants’ accuracy was at chance. It appears, thus, that in the context of this paradigm, unconscious knowledge can produce accurate judgments only to the extent to which it leads to conscious judgments. When participants thought they were aware of the structures leading to the rewards, they did not have above-chance accuracy, but we also could not establish that their accuracy was at chance. A possible explanation for the high level of noise in the conscious responses could be the relatively small number of conscious trials. It is also possible that, while participants indeed acquired some accurate conscious knowledge, they also used inaccurate conscious information that resulted from post-hoc confabulation and hypothesis generation.

Although obtained in a different paradigm, the results of the present study paint a different picture from most recent studies on subliminal instrumental and evaluative conditioning (Heycke & Stahl, 2020; Skora et al., 2021a, b), which found that conditioning does not occur when the stimuli are presented subliminally. The conclusion emerging from these studies points towards the necessity of consciousness for learning the appetitive value of stimuli. As explained in the introduction, a failure to find conditioning effects for unconsciously (subliminally) presented stimuli might mean that consciousness is necessary for learning and conditioning. However, it also might mean that the representation in the cognitive system of unconsciously presented stimuli is too degraded to be further processed or to sustain selective action (Peters et al., 2017; Sweldens et al., 2014; contrast, Scott et al., 2018). The present paradigm, although investigating a related phenomenon, also differs from instrumental conditioning tasks in another respect that may explain why our results diverge from those of previous studies. In typical instrumental conditioning studies, the rewards follow an approach behavior (e.g., a Go response) made in the presence of a reward-predictive stimulus; an avoidance behavior (e.g., a No-Go response) in the presence of the same stimulus leads to no reward. In the presence of a punishment-predictive stimulus, the approach response leads to a punishment, while the avoidance response prevents the punishment (Pessiglione et al., 2008; Skora et al., 2020). Consequently, adaptive behavior in these tasks requires a more complex form of learning, which involves associating different responses with different stimuli. In contrast, in the learning/conditioning phase of the present study, receiving the reward was not predicted by different types of responses made by participants, but only by the grammar of the string they responded to. Only in the test phase did receiving additional rewards require differential, instrumental responses that were informed by the previously acquired knowledge of the appetitive value of the grammars.

On the other hand, the present results are consistent with recent data from Pavlovian and evaluative conditioning studies that attempted to stimulate unconscious processing not by subliminal exposure, but by employing regularities that are difficult to detect or to remember consciously. Leganes-Fonteneau et al. (2018, 2019) had participants detect the color of a target stimulus and rewarded them for correct responses in a hidden, task-irrelevant manner: they received rewards in 90% of the trials in which the target stimulus was superimposed on a square, but only in 10% of the trials in which it was superimposed on an octagon (or vice versa). When analyzing only participants who did not develop accurate conscious expectancies for the rewards, Leganes-Fonteneau et al. found that rewarded stimuli received more attention in a subsequent emotional attentional blink task, hence showing evidence of unconscious Pavlovian conditioning. See also Waroquier et al. (2020) and Jurchiș et al. (2020) for similar results obtained in evaluative conditioning paradigms. However, in these Pavlovian and evaluative conditioning paradigms, participants were not required to make instrumental responses. On a broader level, the results of the present study are consistent with the bulk of research from other implicit learning paradigms that use complex regularities (e.g., AGL, sequence learning), which provide often-replicated evidence that unconscious knowledge supports adaptive responses to the task requirements (Scott & Dienes, 2010; Norman et al., 2019; but see Shanks, 2005, for a critique of these paradigms). The relative success of paradigms based on complex regularities in producing evidence for unconscious effects, compared to those based on subliminal processing, might also be attributed to the different levels of ecological validity of these two approaches. While there may be few instances in real life in which we are required to learn about briefly presented or masked stimuli, there are instances in which our behavior appears to be guided by regularities that we do not consciously know or remember. For example, a native speaker of a language can immediately judge that a specific phrase is grammatically incorrect, even without being able to articulate what regularity has been violated (Zhao et al., 2021). Or, in the motor domain, a person can accurately anticipate the trajectory of a flying object without any conscious knowledge of the regularities that enable this anticipation (Reed et al., 2010).

In the context of our task, the combination of unconscious structural knowledge and conscious judgment knowledge was the most robust form of knowledge (i.e., it was used in the majority of the trials and was the only one that was reliably accurate). This could indicate that unconscious learning of differential regularities and of their adaptive character is robust under noise, when the relevant information is difficult to represent in working memory (Dienes & Scott, 2005, experiment 2). However, the necessity of conscious judgments for accurate responses might indicate that adaptive selective action still requires conscious control (Baars, 1997; Jacoby, 1991; Skora, Yeomans, et al., 2021a; but see Scott & Dienes, 2010, for a situation in which guessing is a robust strategy). In this context, it is worth mentioning that a contribution of the present study is that it distinguished, likely for the first time in the context of instrumental responding, between awareness of structural knowledge and awareness of judgment knowledge.

Limitations and future directions

A limitation of the present study is that the rewards we used were relatively weak (positive images and points that could lead to monetary rewards) compared with those used in other studies (e.g., immediate monetary rewards; Pessiglione et al., 2008). It is possible that a reward of a higher magnitude could have led to more accurate responding, but also to more consciously based responding, by better mobilizing explicit processing resources. Future research could address whether the magnitude of the reward influences the accuracy of responding and the conscious/unconscious bases of responding. Another limitation of the study is that the effect sizes for accuracy were relatively small compared with those from typical AGL studies. To obtain stronger effects, future studies could increase the length of the learning/conditioning phase. One might also want to prevent the development of conscious knowledge more effectively. This could be achieved by introducing even more noise in the surface stimuli, for example, by randomly varying the font and the color of the letters composing the strings (Norman et al., 2011, 2019).

While the present study shows that unconscious knowledge of artificial grammars supports instrumental responses, the present design leaves open some questions about the nature of the representations that support these effects. Past studies have found that participants are able to learn two grammars in two successive learning phases; they are then capable of flexibly judging which strings are grammatical relative to the first grammar and which are grammatical relative to the second grammar (Dienes et al., 1995; Norman et al., 2011, 2016, 2019). Thus, they hold comparable representations of the two learned grammars and are able to selectively deploy them to satisfy the task requirements. It follows then that in our study, participants could have held representations of the two grammars of similar strength and quality, but these representations also embedded contextual, reward-related information; in other words, participants knew both grammars, but also knew which one brought the rewards and which did not, and chose accordingly. A related possibility is that participants could have learned both grammars but associated the rewarded one with positive affect. In evaluative conditioning studies, Amd (2022) and Jurchiș et al. (2020) have found that artificial grammars can be affectively conditioned. Thus, participants could have used the positive affect primed by the strings from the rewarded grammar as an indicator that those strings would bring the reward.

A different possibility is that, given its superior adaptive value, participants formed representations of higher quality for the rewarded than for the nonrewarded grammar. Consequently, the former had a higher level of availability in participants’ memory, which could have made the strings that followed the rewarded grammar more familiar, relative to those from the nonrewarded grammar. Past studies have found that this “relative familiarity heuristic” is one of the main mechanisms used by participants to discriminate grammatical from nongrammatical strings in typical artificial grammar learning studies (Scott & Dienes, 2008, 2010). We suggest that a difference in the quality of representations for the two grammars could have made this heuristic viable in the present task. Of course, the mechanisms we propose are not mutually exclusive. Future studies could investigate their contributions by testing whether familiarity ratings or affective ratings for the strings predict the probability of choosing the strings; also, one could test whether the quality of representations for the two grammars is different, by including additional test phases. In these additional phases, participants could be asked to discriminate between strings from the rewarded grammar and strings that follow neither of the two previously seen grammars (rewarded and nonrewarded). Separately, they would be asked to discriminate between strings from the nonrewarded grammar and strings that follow neither of the two grammars. If participants were more accurate in discriminating strings from the rewarded grammar from the new category of strings than in discriminating strings from the nonrewarded grammar from the new category of strings, it would mean that participants learned the rewarded grammar better (see Footnote 2).

Conclusions

The present study shows that participants unconsciously learn complex contingencies that lead to rewards. It also shows that the selective, instrumental application of this reward-related unconscious knowledge might require conscious judgments.