Introduction

Cognitive flexibility is a component of human executive function that allows the individual to mentally shift between tasks, strategies, and rules (Knauft et al., 2021). It serves adaptation: individuals flexibly change strategies to adapt to a changing environment. Therefore, it is crucial to adopt an ecological perspective when studying cognitive flexibility, assessing this ability with tasks that are relevant to everyday life and considering possible moderating variables.

Toward this aim, in this work we focused on a specific task: the Reversal Learning Task, a neuropsychological task used to assess set-shifting ability. This task also allows the exploration of sensitivity to reward and punishment, two aspects that affect the ability to learn and adapt to reward contingencies. Our goal was to compare this task with one of the most widely recognized neuropsychological tests of cognitive flexibility, the Wisconsin Card Sorting Test (WCST; Grant & Berg, 1948; Heaton, Chelune, Talley, Kay, & Curtiss, 1993). Further, we explored construct validity and examined stress and gender as possible moderators that might influence this cognitive process.

Cognitive flexibility assessment: the WCST and the RLT

One of the most popular neuropsychological instruments of cognitive flexibility is the Wisconsin Card Sorting Test (WCST; Lange et al., 2017), which assesses executive function (Miles et al., 2021; Sherman, Tan & Hrabok, 2020). The WCST measures a sub-component of executive function, the ability of set-shifting (Kopp et al., 2020; Lange et al., 2017). In this task, participants are asked to pair the cards in a deck (response cards) with four target cards according to an unknown classification rule. The rule is learned by receiving “correct” or “wrong” feedback after each card pairing. During the test, the classification rule changes several times without warning, requiring the development of a new classification strategy. Despite its popularity, the literature points to important shortcomings of the WCST (Kopp et al., 2020). Critically, several scoring systems exist (Figueroa & Youmans, 2013) and they are complex (Greve, 1993), which makes this measurement tool difficult to use. Thus, the WCST should be administered and interpreted with caution (Miyake & Friedman, 2012). Among the dependent variables obtained from the WCST, the numbers of perseverative responses and perseverative errors are the most commonly used (Baker et al., 2018; Wollenhaupt et al., 2019), but their scoring is problematic, since researchers apply different scoring rules and this results in discrepancies when comparing results across studies (Miles et al., 2021). To overcome this issue, recent studies recommend the Heaton et al. (1993) method for scoring perseverative responses (Miles et al., 2021).

Another widely used measure of cognitive flexibility is the Reversal Learning Task (RLT; Cools et al., 2001; Izquierdo et al., 2017; Kehagia et al., 2010). This assessment proposes a paradigm that is more similar to daily life tasks than the WCST. Individuals are asked to learn the association between a given stimulus and reward (Learning phase), and to abandon this association when the reinforcement contingencies change (Reversal phase). The reversal phase occurs several times during the task and the individual is required to respond accordingly (Raio et al., 2017). The RLT also covers the aspects that adapting to a changing environment comprises: it tests the ability to learn from the rewards received (as well as the absence of reward, or reward omission) upon choosing different stimuli (Stalnaker et al., 2015), it estimates the likelihood or prior probability that reversals can occur (Costa et al., 2015), and it generates an understanding of task or option space (Wilson et al., 2014; Saez et al., 2015).

To fully explore cognitive flexibility in ecological environments, many variants of the RLT paradigm have been developed, with several details (e.g., nature of stimuli, nature of reward/punishment feedback, timing, etc.) changing from one study to another. One source of variation concerns the association between feedback and stimuli, which can be deterministic (one stimulus is always associated with reward and the other with punishment) or probabilistic (one stimulus is mostly associated with reward, e.g., 80% true reward feedback and 20% false punishment feedback, while the other is mostly associated with punishment) (Bari et al., 2010; Dalton et al., 2014; Ineichen et al., 2012; Rygula et al., 2014). In the present study, we were interested in the latter, since it reproduces the ambiguity of real-life settings. In fact, during reversal phases, the individual cannot be 100% sure that the association has changed (Cools et al., 2002). As a result, punishment feedback could be interpreted as true punishment feedback (the stimulus–outcome association has changed: reversal phase), which leads to changing the response accordingly, or as false punishment feedback (the stimulus–outcome association has not changed: still learning phase), which leads to continuing to select the same stimulus. This uncertainty makes the task more difficult and reduces the individual’s ability to anticipate the reversal phase and promptly select the stimulus associated with reward.

A peculiarity of the RLT: the reward and punishment sensitivity

A peculiarity of the RLT is that this task assesses cognitive flexibility influenced by motivation. Specifically, the RLT stresses the importance of the final goal and measures the ability to shift from one solution to another to attain the goal of avoiding punishments and approaching rewards. In light of this, the RLT measures reward and punishment sensitivity (Friedel et al., 2015), two individual traits that determine the motivation to approach reward and avoid punishment. The sensitivity to reward–punishment is the individual's responsiveness to such feedback and it measures the extent to which the individual perseverates in choosing the rewarding stimulus (reward sensitivity) or avoiding the punishing stimulus (punishment sensitivity) (Schlagenhauf et al., 2014). Researchers recently evidenced that approach–avoidance sensitivity is significantly related with cognitive flexibility, such that approach sensitivity is related with an enhanced cognitive flexibility and avoidance sensitivity is related with a reduced cognitive flexibility (Baas et al., 2020). Thus, these personality differences are important in understanding cognitive flexibility in individuals.

The reward and punishment sensitivities are usually assessed in the approach–avoidance literature with self-report measures that explore different facets of approaching reward and avoiding punishment (Monni et al., 2020), such as the BIS-BAS scale (Carver & White, 1994) and the Approach-Avoidance Temperament Questionnaire (Elliot & Thrash, 2010). The BIS-BAS scale explores the Behavioral Inhibition Sensitivity (BIS) and the Behavioral Activation Sensitivity (BAS) as stable and innate neurobiological sensitivities to attractors (BAS) and threatening stimuli (BIS) (Gray & McNaughton, 2000). The Approach-Avoidance Temperament Questionnaire investigates the individual’s predisposition to be extroverted, emotionally positive and more sensitive to rewarding stimuli (Approach temperament) and to be neurotic, emotionally negative and more sensitive to punishment stimuli (Avoidance temperament) (Elliot & Thrash, 2010). These constructs reflect basic, rudimentary aspects of psychological functioning. BAS and BIS sensitivities differ from approach and avoidance temperaments in that the former constructs are linked to a highly constrained set of eliciting stimuli, neuroanatomical structures, and neurophysiological processes, whereas the latter are presumed to be elicited by a broader range of stimuli and to emerge from a broader network of interacting but partially independent neuroanatomical structures and neurophysiological processes operative across the neuraxis (including, but not limited to, those detailed in the BIS and BAS literature). Exploring variables that could affect cognitive flexibility is important to better study this cognitive ability, and sensitivity to reward–punishment seems to be one such variable. Thus, we were interested in exploring those sensitivities through a behavioral measure such as the RLT and comparing them with the self-reported reward–punishment sensitivities measured with the BIS-BAS scale and the approach–avoidance temperaments.

Assessing cognitive flexibility in RLT and WCST: a comparison

The RLT and WCST both assess cognitive flexibility from different perspectives, sharing some aspects and differing in others. Both RLT and WCST explore the ability to learn an association and abandon this association when the reinforcement contingencies change. However, different details in each task allow researchers to explore specific aspects of cognitive flexibility. The RLT can also measure the specific structure of learning obtained from trial-by-trial responses to feedback (Klanker et al., 2015; Stolyarova et al., 2014). In fact, it permits the estimation of the learning rate (Rizvi et al., 2016), that is, the ability to quickly update the new response–outcome associations (Izquierdo et al., 2017).

The WCST explores the individual differences in learning when the environment is fully predictive, whereas the probabilistic RLT focuses on ambiguous environments. With respect to feedback, in RLT, participants receive a positive or negative affective feedback (a smiling or a sad emoticon), whereas in WCST, they receive a correct or wrong neutral feedback (Dias et al., 1997).

Finally, the WCST explores Intra-dimensional shifting, which requires maintaining a reinforced categorization rule when it is presented with other forms (i.e., sorting stimuli by “color” and shifting from sorting blue items to red items). WCST and RLT both examine Extra-dimensional shifting, that is, shifting from a reinforced categorization rule to another rule (i.e., for WCST, from sorting stimuli by “shape” to sorting stimuli by “color”; for RLT, from choosing the deck of hearts associated with + 100 points to choosing the deck of diamonds subsequently associated with + 100 points) (Watson et al., 2006). To clarify their difference, Intra-dimensional shifting can be explored only when the task requires participants to shift both between categories (Extra-dimensional shifting) and within categories (Intra-dimensional shifting). When the task requires participants to shift according to only one rule (i.e., the described RLT), it is not possible to explore both types of shifting. Despite their differences, both instruments allow researchers to assess the ability of reversal learning (e.g., Dias et al., 1997) and to estimate similar cognitive flexibility scores (Murphy et al., 2003).

In sum, the WCST and RLT are characterized by differences that should be taken into account when choosing the appropriate assessment instrument for research. However, both tasks focus on measuring the same construct of cognitive flexibility. Since the RLT and WCST share some characteristics and differ in others, we analyzed, as an exploratory aim, whether and to what extent they converge, to better explore cognitive flexibility according to the different facets of each task.

Effect of stress and gender on cognitive flexibility

Recent studies in rodent behavioral models have evidenced two moderating variables that influence cognitive flexibility: sex and stress (Chowdhury et al., 2019; Gargiulo et al., 2020; Goodwill et al., 2018; Grafe et al., 2017; Hupalo et al., 2019). In particular, performance in a probabilistic RLT is differently influenced by sex and stress in rat models (Bryce & Floresco, 2020), but this aspect has been less explored in humans. Researchers confirmed the different impact of stress in men and women on cognitive flexibility (for reviews Lupien et al., 2009; Shields & Slavich, 2017). However, among studies that specifically measure cognitive flexibility with the WCST and RLT in humans, only a few also analyzed the effect of stress and gender. The sole impact of gender does not influence the WCST (Cinciute et al., 2018; Eling et al., 2008; Nyhus & Barceló, 2009) but has an impact on the probabilistic RLT. Only women exhibit a preferred punishment-based reversal learning after dopamine depletion (Robinson et al., 2010) and during reversal phase, and men make fewer errors than women (Evans & Hampson, 2015). According to these results, women are more sensitive to punishment in RLT learning and men show a better RLT performance. However, only two studies do not suffice to come to this definitive conclusion.

Stress has been examined through a larger variety of studies on WCST and RLT, but they mainly studied the effect of acute stress on cognitive flexibility. Acute stress (Hendrawan et al., 2012) or high stress symptoms measured with the Depression and Anxiety Stress Scale (a 42-item self-report instrument designed to measure depression, anxiety and tension/stress) (Ajilchi & Nejati, 2017) do not influence WCST performance. With respect to RLT, some found that acute stress impaired RLT performance (Raio et al., 2017), while others found enhanced RLT performance (Friedel et al., 2015; Robinson et al., 2013). Studies on the joint effect of stress and gender on RLT and WCST are scarce and explore only acute stress. For example, in two studies that employed a computerized version of the WCST, researchers demonstrated that acute stress induction worsens WCST performance in men (Kalia et al., 2018; Shields et al., 2016).

Studies on acute stress reported mixed results, probably due to different operationalization and intensity of acute stress. Conversely, studies on chronic stress showed more consistent results. In animal models, negative consequences of chronic stress are much more pronounced than acute stress (Wolf, 2003), but in general, negative impact of chronic stress on cognitive flexibility has been more supported in humans. Chronic stress impaired attention shift (Liston et al., 2009), set-shifting performance (Orem et al., 2008), and cognitive flexibility measured as the ability to find alternative solutions and to be in control when facing difficult situations (Kalia & Knauft, 2020). However, very few studies explored the impact of chronic stress on cognitive flexibility measured through WCST and RLT. Chronic stress negatively impacts reversal learning performance since it reduces blood flow and brain activity in brain regions related to the goal-directed actions (Ohira et al., 2011). A more recent study on WCST explored the interaction between acute and perceived chronic stress on perseverative errors, such that only individuals who were experiencing low and medium levels of perceived chronic stress exhibited a significant increase in the number of perseverative errors due to high acute stress condition. In contrast, those experiencing high levels of perceived chronic stress did not see any change in flexibility due to high or low acute stress conditions. Results were not influenced by gender (Knauft et al., 2021). To our knowledge, no study explored the moderation effect of gender in the stress impact with respect to RLT, but some researchers encouraged future studies to investigate this aspect (Friedel et al., 2015).

These limited results on this topic require further empirical studies. For this reason, since cognitive flexibility is an important ability for adaptation, we decided to explore stress and gender as factors that might affect it.

The present study

Cognitive flexibility performance changes according to situations and individual characteristics. Although it is important to explore this aspect using a laboratory approach, to reduce variables and simplify the process, it is also necessary to adopt a more ecological perspective that allows for identifying conditions in which cognitive flexibility could change. In this study, we assessed this cognitive process by employing a novel RLT, which allows a broader exploration of flexibility through a task that is more similar to everyday life tasks. We connected the RLT with the WCST, which analyzes a similar construct, and added moderating variables that influence this process, such as stress and gender.

In light of the evidence described above, the RLT permits the evaluation not only of cognitive flexibility but also of individual characteristics that might affect sensitivity to reward and punishment.

The classic RLT paradigm includes only one condition in which the individual receives both reward and punishment feedback (one stimulus = + 100; other stimulus = − 100). From this condition, the authors derive either the individual’s sensitivity to reward or sensitivity to punishment (Schlagenhauf et al., 2014). To specifically explore individual performance in a solely reward or solely punishment environment and calculate the specific reward or punishment sensitivity, the first aim of our study is to add two new conditions to Schlagenhauf and colleagues’ (2014) paradigm: a reward condition, in which the individual receives a reward versus a neutral feedback (+ 100; 0); and a punishment condition, in which the individual receives a punishment versus a neutral feedback (− 100; 0). We then compare these conditions to the classic condition, in which the individual receives a reward versus a punishment feedback. We suppose that the new conditions might allow researchers to assess a purer reward sensitivity in the reward condition and a purer punishment sensitivity in the punishment condition. We hypothesize that the three conditions would provide different scores of reward and punishment sensitivity.

As we described in the previous paragraph, the reward and punishment sensitivities are usually assessed in the approach–avoidance literature with two self-report measures which explore different facets of approaching reward and avoiding punishment (Monni et al., 2020): the BIS-BAS scale (Carver & White, 1994) and Approach-Avoidance Temperament questionnaire (Elliot & Thrash, 2010). To analyze the construct validity, our second aim is to investigate if the reward–punishment sensitivity scores obtained with the RLT could converge with the approach–avoidance measures.

The WCST is the most widely used task to explore cognitive flexibility but, despite its effectiveness, it simplifies the study of this cognitive process. We argue that an individual might be more committed to trying to earn or not lose points (RLT) than to sorting cards according to a rule (WCST), since the former task is more akin to everyday life tasks. Thus, taking the WCST as a benchmark for its widely supported validity, we were interested in understanding whether these tasks shared some components or were simply correlated. Therefore, to complete the construct validity analysis, we aimed to explore the RLT and WCST and analyze how they measured the different facets of cognitive flexibility. We conducted a principal component analysis and hypothesized that the RLT and the WCST, although measuring cognitive flexibility from different perspectives, would share some components. In addition, we hypothesized that the punishment–reward sensitivity measured with the RLT and the punishment–reward sensitivity measured with the BIS-BAS and Approach-Avoidance Temperament Questionnaires would be part of another component.

The third aim is to explore the moderating variables that should be taken into account in the cognitive flexibility analysis: gender and stress. In our study, we wanted to analyze the impact of stress moderated by gender on RLT in comparison to WCST. As we described above, the literature on cognitive flexibility mostly studied the acute stressors exposure. However, literature evidenced that psychological functioning is more influenced by the individual stress response compared to the mere exposure to the stressors (Roth et al., 2015). In addition, chronic stress has been demonstrated to have a stronger impact on cognitive and psychological health since it is a proved risk factor for several neuropsychiatric disorders (Caspi et al., 2003), it impairs flexible problem solving (Beversdorf et al., 1999) and it is associated with general cognitive decline (Wilson et al., 2005). For these reasons, in our study, we assessed the chronic stress response measured by one of the most commonly used measures of chronic stress response, the Perceived Stress Scale (PSS; Cohen, 1988; Cohen et al., 1983; Taylor, 2015).

We expect to confirm previous results: stress will negatively influence RLT performance (Ohira et al., 2011) and WCST performance but only in men (Kalia et al., 2018; Shields et al., 2016). Moreover, given that women exhibit a significant improvement in punishment processing (Robinson et al., 2010), we expect to better highlight this difference through the separated RLT reward (100, 0) and punishment conditions (− 100, 0). Moreover, since no relevant literature exists, we pose an exploratory research question about a possible stress–gender interaction. Finally, we include age and education as covariates to control their effect that could bias the independent variables effects, which are stress and gender.

In sum, our purpose is to highlight different facets of cognitive flexibility, including flexibility measures (correct responses, failures, perseverative errors) and sensitivity measures (sensitivity to reward or punishment), to explore their construct validity, and to show how they differ between men and women and as a function of chronic stress response.

Methods

Participants and procedures

All participants were recruited through internet ads, leaflets and face-to-face recruitment in public places. We included participants from 18 to 65 years old and excluded those with a concurrent diagnosis of psychiatric disorder (e.g., Major Depression, Generalized Anxiety Disorder) or substance abuse, as well as those with neurological or general medical disorders likely to affect cognition. Behavioral tasks and self-report measures were administered face-to-face in three sessions in a random order: one session for behavioral tasks, one session for approach–avoidance measures, and one session for the PSS and other self-report measures not included in this study. The sample was composed of 374 healthy volunteers (164 women), age 18–65 (mean = 34.91, SD = 13.41), with 7–26 years of education (mean = 14.63, SD = 2.87). Fourteen participants were excluded because they abandoned the study before completing the approach-avoidance questionnaires. The stress measure (PSS) was collected in a subset of the whole sample (see Footnote 1): 172 participants (57 women), age 18–64 (mean = 37.63, SD = 13.39), with 7–26 years of education (mean = 15.22, SD = 2.98).

A priori sample size was computed with the pwr package (Champely, 2018) in the R statistical software (R Core Team, 2019). We chose a power of 80% with alpha = 0.05. We used a small effect size (Cohen’s f = 0.1) for the repeated measures ANOVA with 3 conditions, a moderate effect size (Pearson’s r = 0.3) for the correlation analysis, and a small effect size (Cohen’s f = 0.14) for the moderated regression analysis. We obtained desired sample sizes of 322, 84 and 140 participants, respectively. We recruited a larger number of participants to account for possible outliers.
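As a rough cross-check of the correlation case only, the required sample size can be approximated with the Fisher z transformation. This is a minimal sketch and not the algorithm used by the pwr package; the function name `n_for_correlation` is hypothetical, and the approximation can differ from pwr's result by a participant or so.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a Pearson correlation r (two-sided test).

    Uses the Fisher z approximation:
        N ~ ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3
    This is an approximation, not the exact routine implemented in pwr.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    effect = math.atanh(r)                         # Fisher z of the expected r
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


# Moderate effect size, 80% power, alpha = 0.05, as in the text
n_corr = n_for_correlation(0.3)
print(n_corr)
```

With r = 0.3 the approximation lands close to the 84 participants reported for the correlation analysis; the ANOVA and regression cases need noncentral F distributions and are not covered by this sketch.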

The study has been conducted in accordance with the Declaration of Helsinki and it was approved by the Ethics Committees of the La Sapienza University of Rome and the University of Cagliari. Data have been anonymized and were collected after obtaining written informed consent.

Behavioral Tasks

Participants performed a variation of Schlagenhauf and colleagues’ (2014) RLT paradigm (Fig. 1). They received these instructions: “In this task, you have to choose between two decks of cards, hearts and diamonds. One deck makes you win most of the time and the other one makes you lose most of the time. During the task, you are requested to try to understand which deck makes you win more and choose that one because the goal is to earn as many points as possible. After a series of choices, the rule changes so that the deck associated with reward no longer makes you win”. To incentivize participants to earn as many points as possible, the investigator stimulated their tendency to compete with others.

Fig. 1 Novel RLT paradigm. Note. Two decks of cards are presented for a maximum of 2 s, or until the participant responds. The participant is requested to choose the deck positioned on the right or left side of the screen by pressing the “L” key or the “A” key of the keyboard, respectively. After the individual’s response, the chosen deck is highlighted by a blue square and the feedback appears in the center of the screen for 1 s. Participants are required to choose a deck as quickly as possible, and the trials are separated by a jittered interval that varies between 1 and 3.5 s to prevent habituation to the appearance time of the stimulus.

The paradigm is composed of three conditions: the reward–punishment condition, the reward condition and the punishment condition (for their descriptions see Fig. 1). Participants performed 50 trials in each condition, and the order of the blocks was randomized for each individual. To make the task more complex, feedback was probabilistic: choosing the reward deck yielded 80% true reward feedback and 20% false punishment feedback. Correspondingly, the punishment deck yielded 80% true punishment feedback and 20% false reward feedback. The deck–feedback association was reversed several times during the block, always after 16 trials, or after 10 trials if the participant had correctly chosen the reward deck 70% of the time. The experiment was preceded by a training phase of 10 trials.
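The feedback and reversal schedule described above can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' task code: the function names are hypothetical, the 70% criterion is assumed to be evaluated over all trials since the last reversal, and the simulated participant follows a simple win-stay/lose-shift rule chosen only to exercise the schedule.

```python
import random


def feedback(chose_reward_deck, rng, p_true=0.8):
    """Return True for reward feedback, False for punishment feedback.

    With probability p_true the feedback is truthful; otherwise it is
    misleading (false punishment for the reward deck, false reward for
    the punishment deck).
    """
    truthful = rng.random() < p_true
    return chose_reward_deck if truthful else not chose_reward_deck


def should_reverse(correct_since_reversal):
    """Reversal rule (assumed reading of the text): always after 16 trials,
    or from trial 10 onward if at least 70% of choices since the last
    reversal were correct."""
    n = len(correct_since_reversal)
    if n >= 16:
        return True
    return n >= 10 and sum(correct_since_reversal) / n >= 0.7


def run_block(n_trials=50, seed=0):
    """Simulate one block and return the trial numbers at which the
    deck-feedback association was reversed."""
    rng = random.Random(seed)
    decks = ("hearts", "diamonds")
    reward_deck = decks[0]
    choice = rng.choice(decks)
    since_reversal = []
    reversal_trials = []
    for trial in range(1, n_trials + 1):
        correct = choice == reward_deck
        rewarded = feedback(correct, rng)
        since_reversal.append(correct)
        if should_reverse(since_reversal):
            reward_deck = decks[1] if reward_deck == decks[0] else decks[0]
            since_reversal = []
            reversal_trials.append(trial)
        # Hypothetical win-stay/lose-shift participant: switch after punishment
        if not rewarded:
            choice = decks[1] if choice == decks[0] else decks[0]
    return reversal_trials


print(run_block())
```

Under this reading, consecutive reversals are always between 10 and 16 trials apart, so a 50-trial block contains between three and five reversals.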

We calculated 5 flexibility scores, learning rates and reward/punishment sensitivities for each of the reward, punishment and reward–punishment conditions. Following Murphy and colleagues’ study (2003), we calculated the total number of correct responses; the number of correct sets, which occur when participants make a consecutive series of 7 correct matches; the number of errors to criterion, which is the number of incorrect responses before the correct match when the criterion is reversed; the failure to maintain set, which indicates incorrect responses after a consecutive series of 5 correct matches; and perseverative errors, which quantify the number of perseverative incorrect responses. The described scores correspond to those calculated in the WCST, which allowed us to explore the convergence of these two instruments. In particular, the total number of correct sets is similar to the number of categories completed in the WCST, whereas the total number of correct responses, failures to maintain set and perseverative errors were calculated in both the WCST and the RLT.

The novel RLT paradigm is composed of 3 blocks: block 1 with reward (+ 100, smiley face) and punishment (− 100, sad face) feedback, block 2 with reward (+ 100, smiley face) and zero points, block 3 with punishment (− 100, sad face) and zero points.

To calculate the individual’s learning rates and reward and punishment sensitivities, we employed the Rescorla–Wagner model (Deserno et al., 2015; Friedel et al., 2015; Schlagenhauf et al., 2014). This model states that the individual’s choice is guided by the feedback that the individual expects to obtain. The individual’s choice on each trial is proportional to the desirability of each option and obtained using a softmax equation:

$$p\left(a \mid Q_{t}\right) = \frac{\exp\left[Q_{t}(a)\right]}{\sum_{a^{\prime}} \exp\left[Q_{t}(a^{\prime})\right]}$$

The desirability of each option [Qt (a)] is learned using the Rescorla–Wagner equation:

$$Q_{t}(a) = Q_{t-1}(a) + \alpha\left[R_{t} - Q_{t-1}(a)\right]$$

Here, α is the individual learning rate, which scales the prediction error, that is, the difference between the outcome obtained on the trial (Rt) and the expected value (Qt-1 (a)). So, when the obtained outcome (Rt) exceeds the expectation, the value of the chosen deck (Qt (a)) increases, and with it the probability of choosing that deck. The outcome variable takes the value Rt = βrew if a reward has been obtained, namely sensitivity to reward, and βpun if a punishment has been obtained, namely sensitivity to punishment. Using an algorithm based on maximum a posteriori (MAP) estimation, the learning rate (α), sensitivity to reward (βrew), and sensitivity to punishment (βpun) were estimated for each individual. We refer to the article by Friedel et al. (2015) for a detailed description of the calculation method. Since our paradigm is composed of three conditions (reward, punishment and reward–punishment), we calculated learning rates and reward–punishment sensitivities separately in each condition.
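To make the two equations concrete, the sketch below applies one Rescorla–Wagner update and then converts the deck values into choice probabilities with the softmax rule. It is a minimal illustration, not the estimation procedure of Friedel et al. (2015): the function names and all parameter values are hypothetical, the MAP fitting step is not shown, and representing βpun as a negative number is an assumption made only for this example.

```python
import math


def softmax_choice_probs(q):
    """p(a | Q_t) = exp[Q_t(a)] / sum_a' exp[Q_t(a')].

    q maps each option to its current value. The maximum is subtracted
    before exponentiating for numerical stability; this leaves the
    probabilities unchanged.
    """
    q_max = max(q.values())
    exps = {a: math.exp(v - q_max) for a, v in q.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def rw_update(q_prev, outcome, alpha):
    """Q_t(a) = Q_{t-1}(a) + alpha * [R_t - Q_{t-1}(a)]."""
    return q_prev + alpha * (outcome - q_prev)


def outcome_value(rewarded, beta_rew, beta_pun):
    """R_t equals beta_rew after reward feedback and beta_pun after punishment."""
    return beta_rew if rewarded else beta_pun


# Hypothetical parameter values, for illustration only
alpha, beta_rew, beta_pun = 0.5, 1.0, -1.0  # sign of beta_pun is an assumption
q = {"hearts": 0.0, "diamonds": 0.0}

# The participant chooses hearts and receives reward feedback
q["hearts"] = rw_update(q["hearts"], outcome_value(True, beta_rew, beta_pun), alpha)
probs = softmax_choice_probs(q)
print(q, probs)
```

After a single rewarded choice, the value of the chosen deck moves halfway toward βrew (because α = 0.5), and the softmax assigns that deck a probability above chance on the next trial.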

In this study, we employed the manual version of the WCST developed by Heaton and colleagues (1993) (Italian version edited by Hardoy et al., 2000). The WCST consists of 4 stimulus cards and 128 response cards (2 decks of 64 cards). The figures on the cards vary in number (from 1 to 4 per card), shape (circles, triangles, crosses or stars) and color (red, blue, yellow and green). The individual is asked to match the cards of the response deck with the target cards according to a category and to adjust the associations based on the examiner’s feedback. The individual should correctly match 10 cards per category; after 10 consecutive correct sorts, the category changes. The WCST continues until the participant has correctly sorted six categories or until all 128 cards have been used. WCST performance is analyzed through 6 scores: the number of categories completed, the number of total errors, the number of perseverative responses and perseverative errors (see Footnote 2), the number of non-perseverative errors, which are random errors, and the number of failures to maintain set, which occur when the individual changes response strategy despite the category remaining the same (Kopp et al., 2020). We included all six scores in the convergence analysis, but only the perseverative errors score was employed to assess cognitive flexibility in the moderation analysis.

Self-report instruments

The BIS-BAS scale (Carver & White, 1994; Campbell-Sills et al., 2004; Italian version Leone et al., 2002) is composed of 20 items with a 6-point Likert scale and measures 4 factors: Behavioral Inhibition Sensitivity (BIS) and three facets of behavioral activation, namely BAS Drive, BAS Reward Responsiveness and BAS Fun Seeking. The average of the items per scale gives the BAS Drive, BAS Reward Responsiveness, BAS Fun Seeking and BIS scores, which range from 1 to 6. Higher scores indicate the predominant sensitivity. We obtained acceptable Cronbach’s alphas (BASd = 0.64, BASfs = 0.63, BASrr = 0.71, BIS = 0.79).

The Approach-Avoidance Temperament Questionnaire, ATQ (Elliot & Thrash, 2010; Italian validation Monni & Scalas, 2020) is composed of 12 items with a 7-point Likert scale response format and investigates, with 6 items per scale, the Approach Temperament and the Avoidance Temperament. Approach and avoidance temperament scores range from 6 to 42 each; higher scores indicate the predominant temperament. The instrument showed acceptable internal reliability, confirmed in this study (Cronbach’s alphas: Approach Temp. = 0.75, Avoidance Temp. = 0.81).
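The internal-consistency coefficients reported for these scales follow the standard Cronbach's alpha formula, sketched below. The function name is hypothetical and the response matrix is made-up demonstration data, not data from this study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    item_scores: list of respondents, each a list of k item responses.
    """
    k = len(item_scores[0])
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
    total_var = variance([sum(resp) for resp in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses of five people to a 3-item scale (illustration only)
demo = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
print(round(cronbach_alpha(demo), 2))
```

Alpha approaches 1 when the items rise and fall together across respondents and drops toward 0 when the items are unrelated.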

The Perceived Stress Scale (PSS; Cohen, 1988; Cohen et al., 1983; Taylor, 2015; Italian translation by Cavallo et al., 2016) measures perceived chronic stress, that is, the degree to which situations in an individual’s life are appraised as stressful, how unpredictable, uncontrollable and overloaded respondents find their lives, and how much individuals perceived their demands to exceed their ability to cope over the previous 4 weeks. Items are rated on a 5-point Likert scale from 0 to 4. Scores range from 0 to 40, with 0–13 indicating low stress, 14–26 moderate stress and 27–40 high stress. The PSS is a valid and reliable measure, and this study confirmed its good reliability (Cronbach’s alpha = 0.87). In this sample, PSS scores are normally distributed (mean = 15.14, SD = 7.07).
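The score bands above translate directly into a small lookup. This is a minimal sketch with a hypothetical function name; it classifies an already computed total score and does not cover item-level scoring.

```python
def pss_band(total_score):
    """Map a PSS total score (0-40) to the stress band given in the text."""
    if not 0 <= total_score <= 40:
        raise ValueError("PSS total score must be between 0 and 40")
    if total_score <= 13:
        return "low"
    if total_score <= 26:
        return "moderate"
    return "high"


# The sample mean reported in the text (15.14) falls in the moderate band
print(pss_band(15.14))
```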

Results

Repeated measures ANOVA for the three RLT conditions

We conducted the analyses with SPSS software (version 24; SPSS Inc., Chicago, IL, USA). We performed a repeated measures ANOVA to test whether scores differed between the reward, punishment and classic reward–punishment conditions. We compared the average scores for learning rate, reward and punishment sensitivity, number of correct responses, correct sets, error to criterion, failure, and perseverative errors. We followed Girden’s (1992) recommendations for meeting the three assumptions of the repeated measures ANOVA: independent observations, normality and sphericity. The first two assumptions were met, and we applied the Huynh–Feldt correction (1976) whenever the sphericity assumption was violated.
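For readers who want to see what the one-way repeated measures ANOVA computes, the following sketch partitions the total sum of squares into condition, subject and error terms and returns the uncorrected F ratio with its degrees of freedom. It is not the SPSS procedure used here: the function name and input format are hypothetical, the sphericity test and the Huynh–Feldt correction are deliberately left out, and no p value is computed.

```python
def rm_anova(scores):
    """One-way repeated measures ANOVA (uncorrected).

    scores: list of participants, each a list with one score per
    condition (hypothetical input format, no missing values).
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)        # participants
    k = len(scores[0])     # conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # Error is what remains after removing condition and subject effects.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error
```

With three conditions and 374 participants, the degrees of freedom are 2 and 2 × 373 = 746, which matches the F(2, 746) reported for the uncorrected tests in this section.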

The results (Table 1) show that, except for Failure, all RLT scores differ significantly between conditions, although the effect sizes are small.

Table 1 Repeated measures ANOVA between the three RLT conditions and post hoc tests

Among the comparisons, the largest difference between conditions was in sensitivity to punishment scores (F(2, 746) = 9.959, p < 0.001; RewCond = 0.904; PunCond = 0.934; RewPunCond = 0.932). Specifically, Fisher’s post hoc tests showed that the reward and punishment conditions differed significantly on almost all scores. In addition, participants performed better in the punishment condition than in the reward and reward–punishment conditions: they obtained higher learning rates and more correct sets, and fewer errors to criterion, failures and perseverative errors.

RewCond = Reward condition; PunCond = Punishment condition; RewPunCond = Reward-Punishment condition. *p < 0.05; **p < 0.001. N = 374.

Convergence validity with Approach-Avoidance self-report measures and WCST

To explore the convergence between the RLT, the approach–avoidance measures and the WCST, we performed a principal component analysis (PCA). Preliminarily, we confirmed the assumptions required for PCA (Watkins, 2021): adequate linearity, sampling adequacy assessed with the Kaiser–Meyer–Olkin measure and Bartlett’s test, and no significant outliers. We first ran a PCA on the correlation matrix with oblimin rotation and unrestricted component extraction. We then restricted the solution to three components after inspecting the eigenvalues (components 1–3 ranged from 9.1 to 2.6; components from the fourth onward were below 2.5) and the variance explained (42.23%).
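The retention decision rests on two simple quantities that can be computed from the eigenvalues of the correlation matrix: how many eigenvalues exceed a chosen cut-off, and what share of the total variance the retained components explain. The sketch below illustrates both. The function names are hypothetical, the eigenvalues in the usage note are invented for illustration (only the values 9.1, 2.6 and the 2.5 cut-off come from the text), and the eigendecomposition and the oblimin rotation themselves are not shown.

```python
def variance_explained(eigenvalues):
    """Proportion of total variance carried by each component.

    For a PCA on a correlation matrix, the eigenvalues sum to the
    number of variables, so each share is eigenvalue / sum.
    """
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def n_retained(eigenvalues, cutoff):
    """Number of components whose eigenvalue exceeds the cut-off."""
    return sum(1 for ev in eigenvalues if ev > cutoff)
```

For instance, with the invented eigenvalues `[9.1, 4.0, 2.6, 2.4, 1.0]` and a cut-off of 2.5, `n_retained` returns 3; the cumulative variance explained by those components is then `sum(variance_explained(eigenvalues)[:3])`, computed over the full set of eigenvalues.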

Table 2 reports the results of the PCA. The RLT and WCST scores loaded on two separate components, whereas on the third component Error to Criterion converged with BAS reward responsiveness and reward sensitivity in the punishment condition. Neither BIS nor the approach–avoidance temperaments converged with any component. Analyzing the correlations between the components, we observed that the RLT and WCST components were negatively correlated (r = -0.232; p < 0.05). This negative correlation is explained by the direction of the scores: the WCST scores index inflexibility (errors, perseveration, etc.), whereas the RLT scores index flexibility (correct responses, correct sets).

Table 2 Principal component analysis of RLT, WCST, BIS-BAS and approach-avoidance temperaments scores

Impact of stress, moderated by gender, on cognitive flexibility

To explore the impact of stress, moderated by gender, on cognitive flexibility, we performed a moderated regression analysis using the PROCESS macro for SPSS (Hayes, 2012). We included age and level of education as covariates. We analyzed the number of perseverative errors as the most representative score of cognitive flexibility for both the RLT and the WCST, in line with previous meta-analyses (Gamboz et al., 2009; Li, 2004; Rhodes, 2004; Westwood et al., 2016). In addition, we explored the individual’s learning rate and reward-punishment sensitivities, which are scores specific to the RLT paradigm. We checked the data for four assumptions: normality, linearity, homoscedasticity and the absence of outliers, which we assessed with Mahalanobis distance, Cook’s distance and leverage values. We excluded all cases that exceeded the cut-off on at least two of these indices (see Table 3 for the number of individuals excluded for each cognitive flexibility variable). All data met the assumptions.
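The moderation model described above amounts to an ordinary regression with a stress × gender product term plus the two covariates. The sketch below fits such a model by least squares and derives the slope of stress within each gender group. It is a simplified illustration, not the PROCESS macro: all function names are hypothetical, gender is assumed to be coded 0 for men and 1 for women, predictors are not mean-centered, and only point estimates are returned (no standard errors, confidence intervals or significance tests).

```python
def design_row(stress, woman, age, education):
    """One row of the design matrix.

    Columns: intercept, stress, gender (assumed 0 = man, 1 = woman),
    stress x gender interaction, age, education.
    """
    return [1.0, stress, woman, stress * woman, age, education]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simple_slopes(coefs):
    """Slope of stress for each gender under the 0/1 coding above."""
    _, b_stress, _, b_interaction, _, _ = coefs
    return {"men": b_stress, "women": b_stress + b_interaction}
```

Under this coding, a pattern like the one reported here (an association between stress and the outcome in women but not in men) would appear as a slope near zero for men and a non-zero slope for women, that is, as a non-zero interaction coefficient.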

Table 3 Moderated regression. Stress moderated by gender predicts WCST and RLT scores with age and education as covariates

Results (Table 3) show that WCST performance was unaffected by stress and gender and was influenced only by age, with perseverative errors increasing with age. Conversely, significant results emerged for the RLT. The interaction of stress and gender affected perseverative errors and punishment sensitivity. Specifically, considering only the statistically significant differences between men and women (Fig. 2), we observed a positive association between perceived chronic stress and perseverative errors in the punishment condition (0, -100) and a negative association between perceived chronic stress and punishment sensitivity in the reward condition (+100, 0), in both cases only among women. Therefore, stressed women commit more errors in the riskiest condition, in which they might lose 100 points, and are less sensitive to punishment in the less risky condition, in which the worst outcome is gaining zero points. Conversely, stress was not associated with flexibility in men.

Fig. 2

Stress increases perseverative errors in the punishment condition and reduces punishment sensitivity in the reward condition only in women (Z-scores). Note. Solid lines and black dots represent women; dashed lines and white dots represent men.

Discussion

In studying cognitive flexibility, it is crucial to adopt an ecological perspective by assessing this ability using a task similar to everyday life tasks that allows considering possible moderating variables. Toward this aim, this research analyzed cognitive flexibility in a novel RLT paradigm, explored the RLT convergent validity with WCST and approach-avoidance tendencies, and studied the effects of gender and stress.

To address the first aim, we added a pure reward and a pure punishment condition to the classic RLT paradigm and found that these conditions assessed cognitive flexibility and reward–punishment sensitivity differently from the classic RLT condition. This new RLT paradigm, which isolates the pure effects of reward and punishment, could be a promising way of studying flexibility and reward–punishment sensitivity in a punishment environment (0, -100) and in a reward environment (+100, 0). One finding that emerged from this comparison is the better performance achieved by participants in the punishment condition (0, -100). The riskiest condition, in which individuals could only lose points, seemed to elicit greater commitment to the task and, consequently, a higher learning rate, more correct responses and fewer perseverative errors. This behavior could be considered a defensive reaction adopted by the individual to avoid failure, and it is in line with literature showing that motivation to avoid punishment enhances cognitive control and performance (Lindström et al., 2013).

With respect to the second aim, we explored the convergent validity of the RLT through PCA to strengthen the validity of the instrument. To our knowledge, only one study has documented its test–retest reliability (Freyer et al., 2009), but since the RLT allows the exploration of aspects of cognitive flexibility that other instruments cannot capture, we considered it important to analyze its psychometric characteristics. In the PCA, we observed that RLT and WCST scores formed two separate components that were nonetheless significantly related. This result is in line with other studies showing that the RLT and WCST explore cognitive flexibility from different perspectives (Dias et al., 1997; Nagahama et al., 2001). Specifically, both tasks are designed to measure extra-dimensional shifting, that is, shifting from a reinforced categorization rule to another rule (e.g., sorting stimuli by color or by shape; choosing the deck of diamonds or of hearts), but only the WCST also measures intra-dimensional shifting, which requires maintaining a reinforced categorization rule when it is presented in other forms (e.g., sorting stimuli by “color” and shifting from sorting blue items to red items) (Dias et al., 1997). While the WCST analyzes higher-level cognitive set-shifting (Nagahama et al., 2005), in a deterministic environment the RLT explores set-shifting of lower-level stimulus reward–punishment associations. In addition, in a probabilistic environment, the RLT captures the specific structure of learning derived from trial-by-trial responses (Klanker et al., 2015; Stolyarova et al., 2014) and permits the assessment of different sub-processes that could be compared with other cognitive flexibility tasks. This result underlines that the RLT and WCST are not interchangeable but give the opportunity to explore cognitive flexibility at different levels of analysis. Cognitive flexibility should be studied from several perspectives, and these assessments can explore different facets of this complex phenomenon. To our knowledge, this is the first study to explicitly explore the convergence of these measures in a principal component analysis.

In the third PCA component, sensitivity to reward measured with BAS reward responsiveness converged with sensitivity to reward in the punishment condition (0, -100; choosing the zero deck). Both also converged with the number of errors before reaching criterion, such that high sensitivity to reward seems to be associated with a high number of errors. In other words, high reward sensitivity seems to make individuals more prone to risk-taking and therefore to mistakes. However, this result should be interpreted with caution, since the RLT reward sensitivity score loads more strongly on the first component (the RLT component). Nevertheless, we highlight this result because, for the first time, a convergence has been found between a behavioral score measured with the RLT and a widely used approach–avoidance self-report measure (BAS reward responsiveness), which strengthens the validity of the RLT. The RLT has the advantage of assessing cognitive flexibility while also considering the individual’s motivation to approach a reward or avoid a punishment. We believe that a task in which the individual receives a reward or a punishment (gaining or losing points in the RLT) elicits a different response from a task in which the individual simply receives correct/wrong feedback (WCST). Therefore, the RLT, unlike the WCST, also allows the exploration of the personality characteristics that influence flexibility performance. This additional measure could be advantageous, since recent findings have shown the influence of approach–avoidance sensitivity on cognitive flexibility performance (Baas et al., 2020).

In relation to the third aim of our study, the moderated regression analysis showed a complex picture: whereas the WCST was affected only by age, the RLT was influenced by the interaction of gender and stress. In particular, stressed women showed increased perseverative errors in the punishment condition (0, -100) and reduced punishment sensitivity in the reward condition (+100, 0). Although normative WCST data identify age and education as relevant factors, it is possible that education effects did not emerge in our analysis because levels of education were not equally distributed in our sample (the majority of participants, 65.5%, had a university-level education). The null effect of stress on the WCST is in line with previous findings (Ajilchi & Nejati, 2017; Hendrawan et al., 2012). Regarding the moderating effect of gender, however, previous studies reported impaired cognitive flexibility in male participants (Kalia et al., 2018; Shields et al., 2016), which we did not observe. Unlike our study, which assessed perceived chronic stress, the studies mentioned above examined high stress symptoms measured with the DASS (Ajilchi & Nejati, 2017) or acute stress induction (Hendrawan et al., 2012; Kalia et al., 2018; Shields et al., 2016). Knauft and colleagues (2021) explored the effect of chronic stress and acute stress reactivity on WCST performance, and our results could be explained in light of their findings. They observed that differences in perseverative responses are mainly determined by the presence of acute stress. Specifically, acute stress does not influence WCST perseverative responses in individuals who report high levels of perceived chronic stress, whereas it does in individuals who report low and medium levels of perceived chronic stress. Thus, perceived chronic stress could modulate the effect of acute stress exposure and the individual response to this trigger (Epel et al., 2018). Since we assessed only perceived chronic stress in our study, we were not able to detect this difference on the WCST. We argue that perceived chronic stress leads to a stabilization of WCST flexibility performance, at least in the short term.

With respect to the RLT, our results are, to our knowledge, novel in the cognitive flexibility literature. In line with the results of Raio and colleagues (2017), we found that stress impairs performance in the RLT reward–punishment condition (i.e., the classic RLT). It is important to underline that Raio and colleagues (2017) studied physiological acute stress reactivity. Thus, this is the first study to explore perceived chronic stress and report these results. In addition, we showed for the first time that this impairing effect is present only in women. Stressed women make more perseverative errors in the riskiest condition (0, -100) and are less able to avoid the non-reward in the safer condition (+100, 0). In everyday life, the ability to avoid non-reward could make individuals less exposed to possible stressors. It could be hypothesized that this impairing effect on cognitive flexibility might be, in the long run, a vulnerability factor for the occurrence of psychiatric disorders. Since women have a higher incidence of psychiatric disorders (Kuehner, 2017), future studies might extend this research and clarify whether this gender difference generalizes to the general population.

Overall, these findings highlight the importance of studying cognitive flexibility from an ecological perspective, considering the variables that affect this cognitive process and the situations in which it has to function. A situation in which the individual is likely to receive a punishment elicits greater commitment to the task to avoid the negative outcome, as a defense mechanism. In contrast, rewarding situations elicit a more relaxed disposition. For this reason, cognitive flexibility should not be explored by considering a few variables in isolation; it should be analyzed together with the variables that affect individual performance. As mentioned above, cognitive flexibility is devoted to adaptation, and different life conditions or individual differences can boost or reduce the intention to adapt. In addition, since cognitive flexibility plays an important role in psychological health and disease prevention (Izquierdo et al., 2017), our findings might help identify which variables or situations promote or suppress flexibility and which might, in the long run, act as resilience or vulnerability factors for the occurrence of psychopathology.

Although this research reports novel results and takes a different perspective on the study of cognitive flexibility, some important limitations should be mentioned. The RLT paradigm was composed of only 50 trials per condition, compared with 200 trials in previous RLT paradigms; in addressing the third aim, men and women were not equally represented (115 men, 57 women); and, although we reported significant results, all effects were small. We hypothesize that these modest effect sizes are explained by the reduced number of RLT trials, which is below the average of 200 trials usually employed in research paradigms (Deserno et al., 2015; Friedel et al., 2015; Schlagenhauf et al., 2014). These limitations should be kept in mind when interpreting our results and could be considered starting points for future RLT research.

Future studies should further investigate the effects of gender and stress on cognitive flexibility using a larger and gender-balanced sample, and explore cognitive flexibility in pure reward and punishment conditions using a higher number of trials per condition. We encourage further analyses of this topic, since understanding cognitive function has a crucial role in psychological science.