Impairments in goal-directed action and reversal learning in a proportion of individuals with psychosis

Cognitive impairment in psychosis is one of the strongest predictors of functional decline. Problems with decision-making processes, such as goal-directed action and reversal learning, can reflect cortico-striatal dysfunction. The heterogeneous symptoms and neurobiology observed in those with psychosis suggest that specific cognitive phenotypes may reflect differing causative mechanisms. As such, decision-making performance could identify subgroups of individuals with more severe cortico-striatal dysfunction and help to predict their functional decline. The present work evaluated the relationship between goal-directed action, reversal learning, and symptom profiles in those with psychosis. We assessed decision-making processes in healthy controls (N = 34) and those with persistent psychosis (N = 45), subclassifying subjects based on intact/impaired goal-directed action. Compared with healthy controls (<20%), a large proportion (58%) of those with persistent psychosis displayed impaired goal-directed action, which predicted poor serial reversal learning performance. Computational approaches indicated that those with impaired goal-directed action had a decreased capacity to rapidly update their prior beliefs in the face of changing contingencies. Impaired decision-making was also associated with reduced levels of grandiosity and increased problems with abstract thinking. These findings suggest that prominent decision-making deficits, indicative of cortico-striatal dysfunction, are present in a large proportion of people with persistent psychosis. Moreover, these impairments would have significant functional implications in terms of planning and abstract thinking.

Supplementary Information
The online version contains supplementary material available at 10.3758/s13415-022-01026-8.


Inclusion criteria
Those with persistent psychosis must have been diagnosed with a Persistent Psychotic Disorder (schizophrenia, schizoaffective disorder, bipolar I disorder, delusional disorder). Participants had no organic cause of psychosis (i.e., epilepsy, intra-cranial pathology or HIV infection), and were between the ages of 18 and 50.

Serial reversal learning (SRL) task
For all stages of the reversal learning task, there were no limits on the time taken to respond. After a stimulus was selected, the outcome was presented on the screen for one second before the next trial was initiated. A running total of the 'credits' received was displayed in the bottom corner of the screen (participants were not aware of how much each credit was worth in monetary compensation). All stimulus pairs were binary images matched as closely as possible for white/black pixel ratio (see Fig. S2), with all combinations being counterbalanced.

Initial training
Participants were shown the following instructions on the screen: "Two pictures will appear on the screen. On each turn, use the joystick to choose one of these pictures. The computer will tell you what credits you earned for your choice. One of the pictures will get you a reward and the other will not. The pictures will change sides randomly, so be careful to select the correct one". They then began a deterministic discrimination with a single reversal, whereby every correct response was rewarded (outcome of 1) and incorrect responses were not (outcome of 0). The initial discrimination contingencies were reversed after 8 consecutive correct responses were made and both stages had to be completed within 200 trials (participants got up to two attempts using unique sets of stimuli).
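The run-length criterion described above can be sketched as follows. This is a minimal illustration only, not the task software used in the study: the function names `trials_to_criterion` and `training_passed`, and the representation of responses as a list of booleans, are assumptions made for the example. The sketch also reads the 200-trial limit as a combined limit across the discrimination and reversal stages, which is one possible interpretation of the description.

```python
from typing import Optional, Sequence


def trials_to_criterion(correct: Sequence[bool], run_length: int) -> Optional[int]:
    """Return the 1-based trial on which `run_length` consecutive correct
    responses is first reached, or None if the criterion is never met."""
    streak = 0
    for trial_number, is_correct in enumerate(correct, start=1):
        # An incorrect response resets the run of consecutive correct responses.
        streak = streak + 1 if is_correct else 0
        if streak == run_length:
            return trial_number
    return None


def training_passed(discrimination: Sequence[bool],
                    reversal: Sequence[bool],
                    run_length: int = 8,
                    max_trials: int = 200) -> bool:
    """Check whether both training stages reached criterion.

    Assumption: `max_trials` is applied to the two stages combined.
    """
    first = trials_to_criterion(discrimination, run_length)
    second = trials_to_criterion(reversal, run_length)
    if first is None or second is None:
        return False
    return first + second <= max_trials
```

The same helper could be reused for the probabilistic stages by passing `run_length=6`.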

Probabilistic serial reversal learning
After successfully completing the training stage, participants were administered a probabilistic SRL task. Two instruction screens were presented, with the first restating the instructions from the training stage. The second informed the participants of the probabilistic contingencies: "Unlike before, the correct picture will not always give you a reward and sometimes the wrong picture will give you a reward. Find out which picture earns the most credits. Stick with it even if it is sometimes wrong. At some point it may change so that the other picture earns more. Only start choosing the other picture when you are sure that the rule has changed".
The task consisted of 11 stages, each featuring the same pair of stimuli but varying in reward rate (probabilistic) and reward value (credits awarded). These comprised an initial discrimination (1 stage), an initial reversal (1 stage), serial reversal learning phase 1 (SRL1; 5 stages), and serial reversal learning phase 2 (SRL2; 4 stages). For the discrimination, initial reversal and SRL1 stages, the probabilistic reward contingencies were set at 80/20, meaning that the target stimulus was rewarded 80% of the time, whereas the non-target stimulus was rewarded only 20% of the time. The reward outcomes were 1 credit for a rewarded trial and 0 credits for a non-rewarded trial.
For the SRL2 stages, the probabilistic reward contingencies were set at 80/40 to increase the task difficulty, meaning that the target stimulus was rewarded 80% of the time, whereas the non-target stimulus was rewarded 40% of the time. The reward outcomes were 2 or 6 credits for a rewarded trial (with equal probability) and 0 credits for a non-rewarded trial. Variable reward values were included to analyze whether a participant's strategy was biased by receiving a greater reward on the preceding trial. The criterion for progressing to the next stage was 6 correct responses in a row. The test ended once the participant had completed all 11 stages or 500 trials, whichever came first. SRL1 and SRL2 trials to criterion were only included in analyses if at least 2 stages had been completed. All trials, whether or not the stage was completed, were included in all other analyses.
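The stage structure described above can be written down compactly as a schedule. The following is a hedged sketch rather than the original task code: the `Stage` dataclass, the stage names, and the functions `build_schedule` and `sample_outcome` are all hypothetical names introduced for illustration, and only the probabilities, credit values and stage counts come from the description.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

CRITERION_RUN = 6   # consecutive correct responses needed to progress
MAX_TRIALS = 500    # the test ends after this many trials at the latest


@dataclass(frozen=True)
class Stage:
    name: str                       # hypothetical label for the stage
    p_target: float                 # reward probability for the target stimulus
    p_nontarget: float              # reward probability for the non-target stimulus
    reward_values: Tuple[int, ...]  # credits drawn with equal probability when rewarded


def build_schedule() -> List[Stage]:
    """Build the 11 stages: discrimination, initial reversal, 5 SRL1, 4 SRL2."""
    stages = [
        Stage("discrimination", 0.8, 0.2, (1,)),
        Stage("initial_reversal", 0.8, 0.2, (1,)),
    ]
    stages += [Stage(f"SRL1_{i}", 0.8, 0.2, (1,)) for i in range(1, 6)]
    stages += [Stage(f"SRL2_{i}", 0.8, 0.4, (2, 6)) for i in range(1, 5)]
    return stages


def sample_outcome(stage: Stage, chose_target: bool, rng: random.Random) -> int:
    """Return the credits earned on one trial (0 if the trial is not rewarded)."""
    p_reward = stage.p_target if chose_target else stage.p_nontarget
    if rng.random() < p_reward:
        return rng.choice(stage.reward_values)
    return 0
```

Under these contingencies, the expected credits per trial in an SRL2 stage are 0.8 × 4 = 3.2 for the target and 0.4 × 4 = 1.6 for the non-target, since the mean of the 2- and 6-credit rewards is 4.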

Reversal learning performance measures and strategies
There are multiple measures of performance that can be quantified in reversal learning tasks. These include, but are not limited to, total trials to criterion, perseveration (number of errors in the first 6 trials after a reversal), and response rates (total, or for correct and incorrect responses).
Other measures allow for detailed inspections of choice strategy, including whether a subject selects the same stimulus after attaining a reward (Win-stay) or whether they select the other stimulus (Win-shift). Similar strategies were calculated after losses, including whether the subject selected the same stimulus after a non-rewarded trial (Lose-stay) or selected the other stimulus (Lose-shift). These were calculated as the proportion of each strategy relative to the trials in which that strategy could be used (i.e., P.Win-stay = Total number of times the same stimulus was selected after a rewarded trial/total rewarded trials). All values were calculated for each individual stage, as well as across the SRL1 and SRL2 stages (inclusive of all combined trials).
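The strategy proportions defined above can be computed from a sequence of choices and outcomes, as in the sketch below. This is an illustration under stated assumptions, not the authors' analysis code: the function name `strategy_proportions` and the input encoding (stimulus identifiers for choices, credits for outcomes) are hypothetical. A trial is treated as rewarded when its outcome is greater than 0, and the final trial of a sequence is left out of the denominators because no subsequent choice follows it.

```python
from typing import Dict, Optional, Sequence


def strategy_proportions(choices: Sequence[int],
                         outcomes: Sequence[int]) -> Dict[str, Optional[float]]:
    """Compute win-stay, win-shift, lose-stay and lose-shift proportions.

    choices  -- identifier of the stimulus selected on each trial
    outcomes -- credits received on each trial (0 means non-rewarded)
    A proportion is None when its denominator is zero.
    """
    if len(choices) != len(outcomes):
        raise ValueError("choices and outcomes must have the same length")

    counts = {"win_stay": 0, "win_shift": 0, "lose_stay": 0, "lose_shift": 0}
    # Pair each trial with the choice made on the following trial.
    for prev_choice, prev_outcome, next_choice in zip(choices, outcomes, choices[1:]):
        prefix = "win" if prev_outcome > 0 else "lose"
        suffix = "stay" if next_choice == prev_choice else "shift"
        counts[f"{prefix}_{suffix}"] += 1

    wins = counts["win_stay"] + counts["win_shift"]
    losses = counts["lose_stay"] + counts["lose_shift"]

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "win_stay": ratio(counts["win_stay"], wins),
        "win_shift": ratio(counts["win_shift"], wins),
        "lose_stay": ratio(counts["lose_stay"], losses),
        "lose_shift": ratio(counts["lose_shift"], losses),
    }
```

For example, `strategy_proportions([0, 0, 1, 1, 0], [1, 0, 1, 0, 1])` describes a participant who always repeats a rewarded choice and always switches after a non-rewarded one. Applying the function to the trials of a single stage, or to the pooled SRL1 or SRL2 trials, would give the per-stage and per-phase values respectively.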

Serial reversal learning exclusions
One participant (a male in the persistent psychosis group) failed to complete the training stages after two attempts. Data from this participant were included in the outcome devaluation analyses but not in the reversal learning or intact/impaired analyses.

28-day frequency (28d freq) was scored using the following criteria: 0 = no use, 1 = once in 28 days, 2 = 2-3x in 28 days, 3 = 1-2x/week, 4 = 3-6x/week, 5 = daily, or 6 = multiple uses daily. The data are expressed as mean (± standard deviation) where applicable. TOPF, Test of Premorbid Functioning; ss, standard score; WASI-II, Wechsler Abbreviated Scale of Intelligence, Second Edition.