Introduction

Perceptual learning describes the phenomenon of improved sensory discrimination abilities that occur with training. Associated perceptual improvements in visual discrimination tasks correlate with neuronal activity and tuning changes in subcortical and cortical areas (Adab & Vogels, 2011; Adab et al., 2014; Crist et al., 2001; Freedman & Assad, 2006; Ghose et al., 2002; Ito et al., 1998; Law & Gold, 2008; Li et al., 2004; Raiguel et al., 2006; Sanayei et al., 2018; Schoups et al., 2001; Thiele, 2004; Yan et al., 2014; Yang & Maunsell, 2004; Yu et al., 2016). However, the extent of changes at different levels of the processing hierarchy (e.g. Ghose et al., 2002; Gu et al., 2011; Law & Gold, 2008; Schoups et al., 2001; Uka et al., 2012) and the underlying mechanisms (Dosher & Lu, 1998, 1999; Lu & Dosher, 1998; Lu et al., 2010) remain under debate. Most prior studies have performed single-electrode recordings, comparing pre-training activity to post-training activity or activity from trained to untrained animals or hemispheres (e.g. Adab & Vogels, 2011; Adab et al., 2014; Raiguel et al., 2006; Yang & Maunsell, 2004). A few studies have analysed striate cortex (V1) activity using multiple chronically implanted electrodes or two-photon imaging during learning (Astorga et al., 2022; Schumacher et al., 2022). Often, it was reported that training improves coding abilities of neuronal populations by signal enhancement, while reduction in neuronal (correlated) noise made no contribution (Yan et al., 2014) (but see Cheng et al., 2023; Sanayei et al., 2018).

We previously investigated population coding mechanisms of perceptual learning in a mid-level visual area, where we recorded from chronically implanted electrodes in macaque area V4, while monkeys performed a two-alternative forced choice contrast discrimination task (Sanayei et al., 2018). In V4, perceptual learning increased information encoding in individual neurons. Additionally, it was accompanied by a reduction in noise correlations which further increased coding abilities of V4 neuronal populations.

Here, we used the same monkeys and performed the same non-roving task during V1 recordings to allow for a direct comparison between the effects in V1 and those previously seen in V4 (with the caveat that receptive field locations between the V1 and V4 recording sites differed and that sessions were consecutive, not simultaneous). This design nevertheless allows us to compare the strength of training effects on neural activity between cortical areas. While the consecutive measurements do not allow for a direct test of the reverse hierarchy theory of perceptual learning (RHT; Ahissar & Hochstein, 2004; Hochstein & Ahissar, 2002), RHT predicts that learning effects in area V4 would be stronger than in V1 (Ahissar & Hochstein, 2004), that improvement of fine contrast discrimination abilities should be location-specific, and that transfer between locations should be larger for easy discriminations.

In addition to non-roving conditions, we performed recordings under roving-without- and roving-with-flanker conditions. These were added for two reasons. (1) Human studies have shown that perceptual learning under roving-without-flanker conditions is slower and more limited (Adini et al., 2004; Kuai et al., 2005; Yu et al., 2004; Zhang et al., 2008) than under non-roving conditions (Yu et al., 2004). We wanted to explore whether similar effects occur in macaques, with the benefit of being able to obtain thousands of (training) trials from single individuals. (2) Flanker stimuli were used to explore the role of context-dependent neural plasticity in perception and learning. The addition of flanker stimuli may change the balance of excitation and inhibition in a local network and therefore increase plasticity and perceptual learning in adults (Adini et al., 2004; Polat & Sagi, 1993; Tsodyks et al., 2004) (but see Yu et al., 2004). A human study found that while training without flankers produced no significant improvement on contrast discrimination thresholds, the presence of flanker stimuli yielded threshold reductions of ~50% (Adini et al., 2002).

We found that perceptual learning, i.e. improved performance with training, occurred under non-roving and roving-with-flanker conditions (although performance under roving-with-flanker conditions never exceeded non-roving performance) but was largely absent under roving-without-flanker conditions. Changes in neural activity in V1 were overall modest and were mostly restricted to non-roving-based perceptual learning.

Methods

Most of the methods described here have been published previously. To preserve as much methodological detail as possible, we reproduce most of the relevant text from the original papers for reference purposes (Chen et al., 2013, 2014; Sanayei et al., 2018).

Data Collection

All procedures were approved by the Newcastle University Animal Welfare Ethical Review Board (AWERB) and carried out in accordance with the European Communities Council Directive RL 2010/63/EC, the US National Institutes of Health Guidelines for the Care and Use of Animals for Experimental Procedures, and the UK Animals Scientific Procedures Act. Two male macaque monkeys (5 and 14 years of age at the start of the study) were used.

Head Post Implantation

An initial surgical operation was performed under sterile conditions, in which a custom-made head post (PEEK, Tecapeek) was embedded into a dental acrylic head stage. Details of surgical procedures and post-operative care have been published elsewhere (Thiele et al., 2006).

General Training

Initially, monkeys were trained to perform a delayed match-to-sample task, in which they compared the colour of a circle stimulus with that of succeeding circle stimuli, while maintaining fixation on a central target. When a target stimulus appeared (a circle of a matching colour), subjects were required to release a touch bar in order to receive a fluid reward. Fluid control was within levels which do not negatively affect physiological or psychological welfare (Gray et al., 2016). Eye position was monitored using an infrared video tracking system (Dalsa CCD camera [model SIM-0002] and eye-tracking software from Thomas Recording ET-49 [version 1.2.8]). This initial training allowed subjects to familiarise themselves with the experimental setup and the timing structure of the task; this task was otherwise unrelated to the contrast discrimination experiment described below.

Electrode Array Implantation

For surgical preparation, animals were sedated with ketamine. During surgery, anaesthesia and analgesia were maintained by sevoflurane (gaseous, 1–3%) and alfentanil (intravenous 156 μg/kg/h), respectively. Blood pressure, rectal temperature, blood oxygen saturation, and end tidal CO2 were measured continuously. After the surgery, analgesic (Metacam 0.1/kg) and prophylactic antibiotics (Ceporex 0.5 ml/kg) were given for 3 to 5 days.

During surgery, the animals were placed in a stereotaxic head holder and the skull overlying the occipital and posterior temporal cortices was exposed. A craniotomy was made to remove the bone overlying V1, V2, and dorsal V4, using a pneumatic drill. The bone was kept in sterile 0.9% NaCl for refitting at the end of the surgery. The dura was opened to allow access to V1 and V4. Microelectrode chronic Utah arrays, attached to a CerePort™ base (Blackrock® Microsystems, connection dimensions of 16.5 mm [height] × 19 mm [base diameter] × 11 mm [body diameter]), were implanted under sterile conditions in the cortex, using a Blackrock microarray inserter. In monkey 1, one 5 × 5 grid of microelectrodes was implanted into area V1, two 4 × 5 grids of microelectrodes were implanted in area V4, and one 5 × 5 grid of microelectrodes was implanted into area 7a; in monkey 2, a 5 × 5 grid was implanted in V1 and V4 each. Electrodes were 1 mm in length, and their tips reached depths of up to 1 mm. Wire bundles were held in place with biologically compatible glue (histoacrylic), and the connector (CerePort™) was secured to the skull with titanium bone screws. Following array insertion, the dura was re-sutured over the array, the exposed area was thinly covered with sterile Tisseel Lyo two-component fibrin sealant (Baxter Healthcare), and the bone flap was reinserted into the skull (before the Tisseel had fully set). The bone flap was cross bridged to the surrounding skull using Synthes orbital plate fragments and Synthes titanium bone screws.

The V1 electrode arrays were inserted under visual guidance into V1. The recording locations were confirmed to be in area V1 in both animals via visual inspection immediately post-mortem and by analysis of post-mortem Nissl-stained brain sections in monkey 1. In and around the V1 and V4 implant locations, clear signs of gliosis were found in the Nissl stain.

Apparatus

Stimulus presentation was controlled using CORTEX software (Laboratory of Neuropsychology, NIMH, http://dally.nimh.nih.gov/index.html) on a computer with an Intel® Core™ i3-540 processor. Stimuli were displayed at a viewing distance of 0.54 m, on a 25″ Sony Trinitron CRT monitor with a resolution of 1280 by 1024 pixels, yielding a resolution of 31.5 pixels/degree of visual angle (dva). The monitor refresh rate was 85 Hz for monkey 1 and 75 Hz for monkey 2. The output of the red and green guns was combined using a Pelli-Zhang video attenuator, yielding a luminance resolution of 12 bits/pixel, allowing the presentation of contrasts that were well below contrast discrimination thresholds (Pelli & Zhang, 1991). A gamma correction was used to linearize the monitor output.

Data Acquisition and Processing

Raw data were acquired at a sampling frequency of 32,556 Hz with a 24-bit analogue-to-digital converter, with minimum and maximum input ranges of 11 and 136,986 microvolts respectively (pre-set by Neuralynx, Inc.), a DMA buffer count of 128, and a DMA buffer size of 10 ms, using a 64-channel Digital Lynx 16SX Data Acquisition System (Neuralynx, Inc.). Digital referencing of voltage signals was performed prior to the recording of raw data, using commercially provided Cheetah 5 Data Acquisition Software v. 5.4.0 (Neuralynx, Inc.), to yield good signal-to-noise ratios for each channel.

Following each recording session, the raw data were processed offline using both commercial (Neuralynx, Inc.) and custom-written (Matlab, MathWorks) software. Signals were extracted using Cheetah 5 Data Acquisition Software. The sampling frequency remained the same (32,556 Hz), while the bandpass filter frequency and input range settings were individually tailored to each channel. Raw data were bandpass filtered with a low-cut frequency of 600 Hz and a high-cut frequency of 4000 Hz and saved at 16-bit resolution. This stage of processing generated ‘continuous MUA’ data, which was further processed to yield ‘spiking MUA’.

Spiking Multi-unit Activity (Spiking MUA)

An iterative procedure was carried out on the continuous MUA signal for each channel, in which the threshold for spike extraction was varied according to a staircase procedure, in order to yield levels of spontaneous spiking MUA (before the onset of the sample stimulus) that were similar (within 1% of a ‘target’ level) across sessions. To set the target level for each channel, the threshold was initially selected manually for all channels and sessions, and a ‘representative’ session was selected for each channel (i.e. a session with an ‘average’ signal-to-noise ratio [see below for description] for that channel). Hence, the extraction of spiking MUA was performed such that spontaneous activity levels were standardized across recording sessions. As spontaneous activity levels were deliberately kept uniform across training days, we did not (and could not) study whether spontaneous activity levels changed during training. What this method did allow, however, was the rigorous comparison of levels of stimulus-evoked activity across the training period, relative to spontaneous levels.

Receptive Field Characterization

Receptive fields (RFs) were mapped using a reverse correlation procedure (Gieselmann & Thiele, 2008), for each recording channel prior to training and recording. Additionally, orientation and spatial frequency (SF) tuning were determined using a reverse correlation procedure (Gieselmann & Thiele, 2008). RF locations (see supplementary Figure S1) and tuning preferences were highly consistent across the training period as determined by regular remapping while learning occurred (every 3–5 days).

Behavioural Task

Each monkey was trained in a contrast discrimination task in which he differentiated between the relative contrasts of two successively presented stationary Gabor gratings (Fig. 1). Monkeys were initially trained on a very basic version of the contrast discrimination task in which stimuli were presented at a location in the upper visual field, i.e. at a substantial distance from the receptive fields covered by our electrodes which were located in the lower left visual field (for details see below). When the animal understood the main concept of the task in the upper visual field, the stimuli were shifted to the lower left visual field.

Fig. 1

Contrast discrimination task. Monkeys fixated upon a central spot, after which a sinusoidal grating sample stimulus of 30% contrast (20, 30, or 40% for the roving-without- and roving-with-flanker conditions) was presented for 512 ms, followed by a 512-ms interval. Thereafter, a sinusoidal grating test stimulus (of higher or lower contrast than the sample) was presented for 512 ms, followed by a second interval of 512 ms. Finally, two target stimuli appeared to the left and right of the location at which the sample and test had previously been presented; the fixation spot changed colour from black to grey, signalling that the animals were allowed to make a saccade to their chosen target. If the test was of a higher contrast than the sample, the monkeys had to saccade to the white target; otherwise, if the test stimulus was of a lower contrast, they had to saccade to the black target.

Before we performed the perceptual learning task described here, we initially assessed perceptual learning at a more peripheral location that covered the recording grids implanted in area V4 (Sanayei et al., 2018) of both monkeys. In that V4 study, the stimuli (Gabor gratings, σ = 4°, SF = 2 cycles per degree [cyc/°], orientation = 90°, i.e. vertical) were presented at an azimuth of −5° and an elevation of −16° in both monkeys (left and below relative to the fixation point). Relevant behavioural and neuronal data have been reported previously (Chen et al., 2013; Sanayei et al., 2018).

After having concluded the V4 study, we started the non-roving V1 study with the full range of contrasts, 4 days later in monkey 1 and 1 day later (i.e. the next day) in monkey 2. Behavioural and neuronal data reported in the current manuscript were obtained with stimuli located more centrally and using sinusoidal rather than Gabor gratings. In monkey 1, the stimuli were presented in the lower left quadrant at an eccentricity of −3.5° azimuth and −3° elevation (stimulus diameter 3°, SF 2 cyc/° [4 cyc/° during roving tasks]). In monkey 2, stimuli were at an eccentricity of −0.7° azimuth and −1.3° elevation (stimulus diameter 0.75°, SF 4 cyc/°). The stimulus locations were matched to the receptive field locations (supplementary Figure S1) covered by the Utah array electrodes.

Critically, the stimulus locations covered in the first (V4) and in this (V1) study had no overlap, and the minimal border distance between the stimuli that covered V4 RFs and those that covered V1 RFs was 3.58° in monkey 1 and 7.94° in monkey 2.

We initially assessed perceptual learning using a fixed sample contrast of 30% Michelson contrast and 14 different test contrasts (these were: 5, 10, 15, 20, 22, 25, 28, 32, 35, 40, 45, 50, 60, and 90% Michelson contrast). This is referred to as the non-roving condition. These sessions were followed by sessions where we employed stimulus roving, i.e. where the sample contrast could be 20, 30, or 40% Michelson contrast on any given trial (Chen et al., 2014). Each sample stimulus was followed by a test stimulus with a contrast chosen from 12 possible contrasts. For a sample of 20% contrast, the possible test contrasts were 5, 10, 12, 15, 18, 22, 25, 28, 35, 45, 60, or 90% Michelson contrast. For a sample of 30%, the possible test contrasts were 5, 10, 15, 22, 25, 28, 32, 35, 38, 45, 60, or 90% Michelson contrast. For a sample of 40%, the possible test contrasts were 5, 10, 15, 25, 32, 35, 38, 42, 45, 50, 60, or 90% Michelson contrast.

After perceptual learning under roving-without-flanker conditions had been assessed, we determined how performance and perceptual learning were affected by adding flanking stimuli. Here, the centre grating stimuli were identical to those used in the roving-without-flanker conditions (described above), but additional gratings were displayed collinearly, immediately above and below the vertically oriented sample and test stimuli, forming a column of three gratings, positioned edge to edge. The flanker stimuli had the same size, SF, and orientation as the sample and test stimuli. The contrast of the flankers was constant at 30% Michelson contrast (Chen et al., 2014).

Each trial was initiated by monkeys touching a touch bar and fixating a fixation spot (diameter = 0.1°, fixation window = 2° by 2°) presented on a grey background (52.17 cd/m²). Five hundred and thirty-nine milliseconds after fixation onset, a vertically oriented sinusoidal stimulus centred at the V1 receptive field coordinates was presented for 512 ms (this was flanked by the above-mentioned flankers in the flanker experiment). The sample contrast was as described above. This was followed by a 512-ms inter-stimulus interval (with only the fixation point present). Thereafter, a test stimulus was presented for 512 ms. The test was identical in size and orientation to the sample stimulus, but differed in contrast (chosen pseudo-randomly from 14 different contrasts for the non-roving conditions and from 12 different contrasts for the different roving conditions, as described above). After test offset, another blank period of 512 ms with only the fixation point occurred, followed by the appearance of two target squares (one black, one white, size = 0.5°) located to the left and right of the previous sample and test location, which also was the cue for the monkey to indicate whether the test had a higher or lower contrast than the sample stimulus. The monkeys had to make a saccade to the white square (within a 2° by 2° window) if the test stimulus had a higher contrast than the sample stimulus and to the black square if the test stimulus had a lower contrast than the sample. A correct saccade resulted in a fluid reward, while an incorrect saccade resulted in no reward and a 0.2-s timeout. If the monkey broke fixation before saccade cue onset or failed to respond within 1000 ms of the onset of the saccade cue, the trial was terminated immediately, followed by a 0.2-s timeout.
In order to motivate subjects to complete each trial and discourage them from guessing on difficult trials, stimulus drumming was implemented using the ‘repetition with delay’ function on CORTEX following error trials, i.e. enforcing the repeated presentation of a stimulus condition, until a minimum number of correct trials was accrued. Recording began simultaneously with the first day of training.
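The timing of a single trial can be reconstructed from the durations given above. The following is a minimal sketch of that arithmetic; the function and event names are hypothetical and are not taken from the CORTEX task code.

```python
# Event times (ms, relative to fixation onset) for one trial, built from the
# durations stated in the task description. All names are illustrative only.
PRE_SAMPLE_MS = 539       # fixation onset to sample onset
EPOCH_MS = 512            # sample, interval 1, test and interval 2 each last 512 ms
RESPONSE_WINDOW_MS = 1000 # time allowed for the saccade after target onset


def trial_timeline():
    """Return a dict mapping event names to their onset times in ms."""
    events = {}
    t = PRE_SAMPLE_MS
    for name in ("sample_on", "sample_off", "test_on", "test_off", "targets_on"):
        events[name] = t
        t += EPOCH_MS
    events["response_deadline"] = events["targets_on"] + RESPONSE_WINDOW_MS
    return events


timeline = trial_timeline()
for name, onset in timeline.items():
    print(f"{name:>17}: {onset} ms")
```

Under these assumptions, the targets appear 2587 ms after fixation onset, so a completed trial lasts at most about 3.6 s before reward or timeout.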

Spontaneous Activity Level Matching Across Sessions

We employed procedures to achieve approximately constant spontaneous activity across sessions as previously described and justified in detail in Sanayei et al. (2018). Briefly, we implemented an automated threshold for spike extraction using a Matlab routine, where we selected a target level of spontaneous activity (used as a reference across sessions). This was based on a session with ‘medium’ signal quality (i.e. with an ‘average’ SNR (see below) compared to other sessions) and with satisfactory stimulus-induced responses. The level of spontaneous activity obtained during this session was taken as the ‘target’ level of spontaneous activity across all sessions (rt), for that particular channel. We then employed an iterative staircase procedure to arrive (for each session and channel) at a level of spontaneous activity (rs) that lay within 1% of the target value.
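The staircase logic can be illustrated with a short sketch. This is not the Matlab routine used in the study: the step-halving rule, the starting values, and the toy rate function are all assumptions made for illustration. The only elements taken from the text are the target rate (rt), the session rate (rs), and the 1% tolerance.

```python
import math


def match_spontaneous_rate(rate_at, target, threshold=1.0, step=0.5,
                           tol=0.01, max_iter=200):
    """Adjust a spike-extraction threshold until the spontaneous rate (rs)
    lies within `tol` (1 %) of the target rate (rt).

    `rate_at(threshold)` must return the spontaneous rate obtained with that
    threshold and is assumed to decrease as the threshold rises.
    """
    last_direction = 0
    for _ in range(max_iter):
        rate = rate_at(threshold)
        if abs(rate - target) <= tol * target:
            return threshold, rate
        # Too many spikes: raise the threshold; too few: lower it.
        direction = 1 if rate > target else -1
        if last_direction and direction != last_direction:
            step /= 2.0  # reversal: refine the step size
        threshold += direction * step
        last_direction = direction
    raise RuntimeError("no threshold found within tolerance")


def toy_rate(thr):
    """Toy stand-in for counting threshold crossings before sample onset."""
    return 100.0 * math.exp(-thr)


thr, rate = match_spontaneous_rate(toy_rate, target=20.0)
print(f"threshold = {thr:.4f}, spontaneous rate = {rate:.3f}")
```

With real data, `rate_at` would count threshold crossings in the pre-stimulus window. Because spike counts are discrete, the 1% criterion may then need a fallback if no threshold satisfies it exactly.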

Signal-to-Noise Ratio Calculation

The signal-to-noise ratio (SNR) was calculated for each channel on each day. The SNR was calculated as

$$\text{SNR}=\frac{\text{mean}_{\text{stimulus activity}}-\text{mean}_{\text{spontaneous}}}{\text{SD}_{\text{spontaneous}}}$$
(1)

where the mean stimulus activity was obtained from 30 to 542 ms after test onset, while the mean spontaneous activity was obtained during the 512-ms period before test onset with a 30-ms offset for the time window to account for latencies. SD is the standard deviation of the mean response. This was calculated for each test contrast condition, yielding fourteen SNR values per recording session for a given channel. Trials were included regardless of whether the subject’s response was correct. The size of the SNR varied depending on the test contrast. The highest of the fourteen SNR values was then taken as being representative of the signal quality from a given channel for each session during non-roving sessions. During roving sessions, the SNR values obtained upon presentation of the highest possible test contrast (on trials with a 20% sample contrast) were taken as being representative of the signal quality from a given channel for each session. SNR values for different channels and training days are shown in Supplementary Figures S2 and S3.
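As a rough illustration of Eq. 1 and of the rule for picking a representative value in non-roving sessions, consider the sketch below. The data layout (one list of per-trial rates per test contrast) and the use of the sample standard deviation are assumptions, not details confirmed by the text.

```python
from statistics import mean, stdev


def snr(stimulus_rates, spontaneous_rates):
    """Eq. 1: (mean evoked - mean spontaneous) / SD of spontaneous activity.

    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption; the text does not state which estimator was used.
    """
    return (mean(stimulus_rates) - mean(spontaneous_rates)) / stdev(spontaneous_rates)


def session_snr(evoked_by_contrast, spontaneous_by_contrast):
    """Non-roving rule: the largest SNR across test contrasts represents the session."""
    return max(snr(evoked_by_contrast[c], spontaneous_by_contrast[c])
               for c in evoked_by_contrast)


# Made-up firing rates (spikes/s) for two test contrasts on one channel.
evoked = {90: [30, 34, 32], 32: [14, 16, 18]}
spontaneous = {90: [10, 12, 8], 32: [10, 12, 8]}
best = session_snr(evoked, spontaneous)
print(best)
```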

We performed analysis of the data in 2 ways: (1) Channels were included in the individual channel analyses if they had daily SNR ≥ 1, on at least 80% of the total number of recording days. This resulted in 15 channels being included in the non-roving analysis from monkey 1 and 25 channels from monkey 2. For the roving-without-flanker sessions, nine channels were included for monkey 1 and also for monkey 2. For the roving-with-flanker sessions, nine channels were included for monkey 1 and 25 channels for monkey 2. (2) Alternatively, we included all channels in the analysis, irrespective of their SNR. Overall, we did not observe any qualitative differences in our results between the two approaches, and we report the approach with channel exclusion (SNR-based approach) here. The reason for this approach is that channels with very poor signal-to-noise ratio throughout could add noise to the data and conceal perceptual learning effects at the population level. The safeguard of SNR ≥ 1, on at least 80% of the total number of recording days, ensures that channels which attain contrast tuning (or contrast responses) through learning will still be detected and included.
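The inclusion criterion in approach (1) amounts to a simple per-channel check, sketched below. The function name, the data layout, and the example values are hypothetical.

```python
def include_channel(daily_snr, min_snr=1.0, min_fraction=0.8):
    """True if the channel's SNR is >= min_snr on at least min_fraction of days."""
    days_ok = sum(1 for s in daily_snr if s >= min_snr)
    return days_ok >= min_fraction * len(daily_snr)


# Made-up daily SNR values for three channels over ten recording days.
snr_by_channel = {
    "ch01": [1.2] * 8 + [0.5] * 2,   # 80 % of days pass -> included
    "ch02": [1.2] * 7 + [0.5] * 3,   # 70 % of days pass -> excluded
    "ch03": [0.4] * 10,              # never passes      -> excluded
}
included = [ch for ch, values in snr_by_channel.items() if include_channel(values)]
print(included)
```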

Determination of Analysis Time Window

The results reported in this paper are based on the analysis of multi-unit spiking activity. We analysed data in two ways: (1) using the entire stimulus period (i.e. a time window of 512 ms shifted by 30 ms to account for the latency of the neuronal response). (2) We determined the period of neural activity that encoded the most information about stimulus contrast (discriminability) and used that time window for all analyses. The time window was determined by performing an ‘area under the receiver operating characteristics’ (AUROC) ‘ideal observer’ discrimination analysis using a sliding time window over the test period as described previously (Sanayei et al., 2018). To avoid biases in the assessment of how learning affects the discriminability of single channels, we used the summed activity from all channels for this analysis. Furthermore, to avoid biases due to possible differences between sessions, results from these exploratory analyses were considered only after averaging activity across all experimental sessions, without any distinction between early or late sessions. Discriminability varied over the 512-ms interval in both animals. In both monkeys, maximal stimulus discriminability values occurred shortly after stimulus onset and decayed sharply after the transient response to stimulus onset; i.e. a window of 128 ms yielded the best discriminability. Conversely, choice ‘discriminability’ (or choice probability) was shifted towards later time periods in monkey 2, where significant choice probability occurred for time periods during the last 256 ms of the analysis period. In monkey 1, choice probability was low throughout the analysis period. Given these differences between the periods of stimulus discriminability and choice probability in the data presented here, we used a single unified window that spanned the entire 512-ms presentation period.
We performed control analyses using just the 128-ms period after stimulus onset (shifted by 30 ms to account for response latency) for stimulus discriminability, but the results were qualitatively the same and quantitatively almost indistinguishable.

Contrast Response Function (Neurometric Functions)

To estimate neurometric functions, which give an indication of stimulus discriminability, we calculated the area under the receiver operating characteristic (AUROC), using the responses that occurred during a given sample (i.e. separate neurometric functions for each sample contrast) and each test presentation period (30 ms after stimulus onset to 542 ms after stimulus onset) for each recording session (see above for control analysis time windows using only the initial 128 ms of the response window). These AUROC values were then fitted with a four-parameter Weibull function using maximum likelihood estimation (MLE), according to the following formula:

$$y=\delta -\gamma\, e^{-\left(\frac{x}{\alpha }\right)^{\beta }}$$
(2)

where y is the AUROC value; x is the contrast of the test stimulus; α is the contrast at which the neurometric function is at 63% of its range; the shape exponent β modulates the slope at threshold; γ is the range; and δ is the maximum (fitted) AUROC value reached by the neurometric function.

We calculated the slope at 30% contrast as

$$\text{Slope at } 30\%=\left.\frac{d}{dx}\left[\delta -\gamma\, e^{-\left(\frac{x}{\alpha }\right)^{\beta }}\right]\right|_{x=30}=\frac{\beta \gamma \left(\frac{30}{\alpha }\right)^{\beta } e^{-\left(\frac{30}{\alpha }\right)^{\beta }}}{30}$$
(3)
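Equations 2 and 3 can be checked against each other numerically. The sketch below implements both and compares the analytic slope at 30% contrast with a central finite difference. The parameter values are hypothetical and chosen only for illustration; no fitting is performed.

```python
import math


def weibull(x, alpha, beta, gamma, delta):
    """Eq. 2: fitted AUROC as a function of test contrast x (in %)."""
    return delta - gamma * math.exp(-((x / alpha) ** beta))


def weibull_slope(x, alpha, beta, gamma):
    """Eq. 3: analytic derivative of Eq. 2 with respect to x."""
    u = (x / alpha) ** beta
    return beta * gamma * u * math.exp(-u) / x


# Hypothetical parameter values, not fitted to any data.
params = dict(alpha=30.0, beta=3.0, gamma=0.9, delta=0.95)

h = 1e-5  # step for the central finite difference
numeric = (weibull(30 + h, **params) - weibull(30 - h, **params)) / (2 * h)
analytic = weibull_slope(30.0, params["alpha"], params["beta"], params["gamma"])
print(f"analytic = {analytic:.6f}, numeric = {numeric:.6f}")
```

The slope does not depend on δ, since δ only shifts the curve vertically.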

We also determined the point of neuronal equality (PNE) for each channel and training day, i.e. the point where neuronal responses to sample and test contrasts were indistinguishable (AUROC = 0.5). During a subset of sessions for some channels, the range spanned by the AUROC values did not include the value of 0.5 (i.e. the fitted neurometric curve was located entirely within either the upper or lower half of the range spanned by the y-axis); thus, the PNE could not be calculated for these sessions. For days in which PNEs could not be calculated for certain channels, the averages were calculated across those channels for which PNEs could be calculated.
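Given fitted parameters of Eq. 2, the PNE follows from solving y = 0.5 for x, which gives x = α(−ln((δ − 0.5)/γ))^(1/β). A minimal sketch, assuming γ > 0 and using hypothetical parameter values, returns no value when the fitted curve never crosses 0.5:

```python
import math


def point_of_neuronal_equality(alpha, beta, gamma, delta, level=0.5):
    """Contrast at which the Weibull fit (Eq. 2) equals `level`.

    Returns None if the curve, which spans delta - gamma (at x = 0) to
    delta (as x grows), does not cross `level`. Assumes gamma > 0.
    """
    ratio = (delta - level) / gamma
    if not 0.0 < ratio < 1.0:
        return None
    return alpha * (-math.log(ratio)) ** (1.0 / beta)


# Hypothetical fits: one curve crossing 0.5, two lying entirely on one side.
pne = point_of_neuronal_equality(alpha=30.0, beta=3.0, gamma=0.9, delta=0.95)
entirely_below = point_of_neuronal_equality(30.0, 3.0, 0.9, 0.45)
entirely_above = point_of_neuronal_equality(30.0, 3.0, 0.3, 0.95)
print(pne, entirely_below, entirely_above)
```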

Additionally, we assessed contrast tuning by fitting a Naka-Rushton function to the single-channel response data (spikes/second) of each session. The Naka-Rushton fit yielded the following: (1) the slope of the tangent to the best-fitted Naka-Rushton function at a contrast level of 30% (the sample contrast). The steeper the slope at (and around) 30% contrast, the better the channel was at discriminating between stimuli with contrasts close to the sample contrast (the categorization boundary); (2) C50, the contrast that elicited a response of half the response range; and (3) the minimum and (4) maximum values of the best-fitted Naka-Rushton function.
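The text does not give the exact parameterization of the Naka-Rushton function. The sketch below assumes a common form, R(c) = R_min + (R_max − R_min) c^n / (c^n + C50^n), in which the exponent n is an assumed additional parameter. It computes the tangent slope at the 30% sample contrast, and all parameter values are hypothetical.

```python
def naka_rushton(c, r_min, r_max, c50, n):
    """Assumed form: response (spikes/s) as a function of contrast c (in %)."""
    return r_min + (r_max - r_min) * c ** n / (c ** n + c50 ** n)


def naka_rushton_slope(c, r_min, r_max, c50, n):
    """Analytic derivative of the assumed form with respect to c."""
    return (r_max - r_min) * n * c ** (n - 1) * c50 ** n / (c ** n + c50 ** n) ** 2


# Hypothetical parameters, not fitted to any data.
fit = dict(r_min=10.0, r_max=60.0, c50=30.0, n=2.0)

slope_at_sample = naka_rushton_slope(30.0, **fit)
h = 1e-5  # step for a central finite-difference check
numeric = (naka_rushton(30.0 + h, **fit) - naka_rushton(30.0 - h, **fit)) / (2 * h)
print(f"slope at 30% = {slope_at_sample:.4f} spikes/s per % contrast")
```

In this form, the response at c = C50 lies halfway between the minimum and maximum, matching the definition of C50 given above.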

Calculation of Point of Neuronal Equality (PNE) Changes at the Population Level

On some channels, the PNE was > 30% at the start of learning. On other channels, the PNE was < 30% at the start of learning. Hence, to examine whether the PNE changed with learning at the population level, we calculated the absolute value of the difference between PNE and 30% contrast. By using the absolute value of the difference, we were able to combine the two groups of channels (those with PNE > 30% at the start of learning and those with PNE < 30% at the start of learning) and investigate whether PNEs shifted systematically towards the sample contrast with learning, irrespective of their starting position.
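A small sketch with made-up PNE values shows why the absolute difference is used. Channels that start above and below 30% cancel in a signed average, whereas the absolute distance captures convergence towards the sample contrast. Channels without a computable PNE (marked `None`) are skipped, as described for the averaging above.

```python
def mean_abs_pne_distance(pnes, sample_contrast=30.0):
    """Mean |PNE - sample contrast| over channels with a computable PNE."""
    valid = [abs(p - sample_contrast) for p in pnes if p is not None]
    return sum(valid) / len(valid) if valid else None


# Made-up PNEs (% contrast) for four channels; None = PNE not computable.
early = [36.0, 24.0, None, 33.0]
late = [32.0, 28.0, 31.0, None]

early_distance = mean_abs_pne_distance(early)
late_distance = mean_abs_pne_distance(late)
print(early_distance, late_distance)
```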

Sample-Test Discriminability

To analyse how well channels discriminated between sample and test stimuli, we calculated AUROC values for each sample-test contrast pair (see above) and determined whether they systematically changed with learning. Specifically, we would expect the AUROC values for test contrasts that were higher than the sample stimulus to increase with learning and those for test contrasts that were lower than the sample contrast to decrease with learning.
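The AUROC for one sample-test pair equals the probability that a randomly drawn test-period response exceeds a randomly drawn sample-period response, with ties counted as one half. A minimal pairwise-comparison sketch follows, using made-up firing rates; treating the test distribution as the "positive" class is an assumption about the convention.

```python
def auroc(test_rates, sample_rates):
    """P(test response > sample response), counting ties as 0.5."""
    wins = 0.0
    for t in test_rates:
        for s in sample_rates:
            if t > s:
                wins += 1.0
            elif t == s:
                wins += 0.5
    return wins / (len(test_rates) * len(sample_rates))


# Made-up per-trial firing rates (spikes/s).
sample = [10, 12, 11, 13]
test_higher = [14, 15, 16, 13]   # test contrast above the sample
test_lower = [8, 9, 10, 7]       # test contrast below the sample

print(auroc(test_higher, sample))  # well above 0.5
print(auroc(test_lower, sample))   # well below 0.5
```

Under this convention, learning-related improvement would appear as values moving away from 0.5 in opposite directions on the two sides of the sample contrast.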

Choice Probability

Choice probabilities (CP) were monitored over the course of training to assess the degree to which neuronal activity reflected the identity of the monkey’s chosen target (i.e. was correlated with the choice). Levels of spiking activity for a given test stimulus were categorized according to whether the subject made a saccade to the black or to the white target; i.e. they were conditioned upon the monkey’s choice. This yielded two activity distributions for each test stimulus. CPs were calculated from these two distributions using the AUROC approach. This was done for the challenging test contrast conditions (e.g. for 22, 25, 28, 32, 35, and 38% when a 30% sample contrast was presented). For each channel, the mean CP (for a given test contrast) was calculated for early and late sessions (the first and last 5 days of training, respectively). A mixed-model two-way RM-ANOVA was performed to determine whether CPs changed significantly with training days (early versus late sessions, factor 1) and test contrast (factor 2). In addition, we calculated CP differences for contrasts on opposite sides of the categorization boundary (e.g. CP for 40% minus CP for 20% test contrast stimuli, or CP for 38% minus CP for 22%) and pooled these for each channel and recording day. We then determined whether CP difference distributions were significantly different between early (first 5 days of training) and late sessions (last 5 days of training) using a Wilcoxon signed-rank test.
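Conditioning on choice can be sketched as follows for a single test contrast. The labels `"white"` and `"black"`, and the convention that CP is the probability of a higher rate on white-choice than on black-choice trials, are assumptions for illustration; the text does not specify which distribution was treated as the reference.

```python
def choice_probability(rates, choices):
    """AUROC between rates on 'white'-choice and 'black'-choice trials.

    Returns None when the monkey made only one kind of choice, since the
    two distributions cannot then be compared.
    """
    white = [r for r, c in zip(rates, choices) if c == "white"]
    black = [r for r, c in zip(rates, choices) if c == "black"]
    if not white or not black:
        return None
    score = sum((w > b) + 0.5 * (w == b) for w in white for b in black)
    return score / (len(white) * len(black))


# Made-up trials for one test contrast on one channel.
rates = [20, 22, 25, 19, 24, 21]
choices = ["black", "white", "white", "black", "white", "black"]
cp = choice_probability(rates, choices)
print(cp)
```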

Grouping Data During the Roving Tasks

The roving tasks (without or with flankers) yielded many contrast differences between samples and test stimuli (36 in total, i.e. three different sample contrasts with 12 test contrasts each). Additionally, absolute contrast differences varied between sample contrasts. Assessment and visualization of these data in a comprehensive manner across training days were difficult; hence, after initial analysis of individual conditions, we decided to group the data according to assumed task difficulty, using Weber fractions (Weber fraction = test/sample contrast) as a grouping mechanism (pooling was done after initial CP and neurometric value calculation). Specifically, for each sample-test contrast, we calculated the Weber fraction and used eight groups in total. Groups consisted of the following Weber fractions: 0–0.5, > 0.5–0.75; > 0.75–0.875; > 0.875–1; > 1–1.11; > 1.11–1.25; > 1.25–1.5; and > 1.5. This grouping ensured that for almost every sample contrast, we obtained at least one sample-test fraction pair in each group. This allowed comparison of potential effects within and across sample contrasts. We then averaged Weber fractions within each group that had the same sample contrast. To assess changes over training, we calculated the non-parametric correlation coefficient (Spearman’s rho) between training days and AUROC values for neurometric and choice probability analyses. We used an FDR correction (Benjamini & Hochberg, 1995) to account for multiple comparisons.
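The grouping can be reproduced from the contrast lists given in the Behavioural Task section. The sketch below assigns each sample-test pair to one of the eight groups, treating each upper bound as inclusive (an interpretation of the "> a–b" notation). Group indices 0 to 7 are an illustrative labelling.

```python
from bisect import bisect_left

# Upper (inclusive) bounds of groups 0..6; group 7 holds fractions > 1.5.
UPPER_EDGES = [0.5, 0.75, 0.875, 1.0, 1.11, 1.25, 1.5]

# Test contrasts (% Michelson) per sample contrast, as listed in the text.
TEST_CONTRASTS = {
    20: [5, 10, 12, 15, 18, 22, 25, 28, 35, 45, 60, 90],
    30: [5, 10, 15, 22, 25, 28, 32, 35, 38, 45, 60, 90],
    40: [5, 10, 15, 25, 32, 35, 38, 42, 45, 50, 60, 90],
}


def weber_group(sample, test):
    """Index (0-7) of the group containing the fraction test / sample."""
    return bisect_left(UPPER_EDGES, test / sample)


groups = {s: sorted({weber_group(s, t) for t in tests})
          for s, tests in TEST_CONTRASTS.items()}
for sample, occupied in groups.items():
    print(sample, occupied)
```

Under this reading, the 30% and 40% samples populate all eight groups, while the 20% sample has no pair in the > 0.75–0.875 group, in line with the statement that almost every sample contrast is represented in each group.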

Noise and Stimulus Correlation Analysis

Noise correlations were calculated separately for each stimulus contrast and recording day. We calculated the Pearson correlation of firing rates between two channels given a specific test stimulus on each training day. Noise correlation values were then Fisher z-transformed (separately for each channel pair and for each test contrast) and finally averaged across the first and last 5 days of training. To determine whether noise correlations changed with learning, we performed a mixed-model two-factor RM-ANOVA, with contrast and training period as main factors. Stimulus correlations were calculated using mean contrast-dependent firing rates for each neuron.
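
A minimal sketch of this procedure for a single channel pair and a single test contrast is given below. The helper names `pearson_r` and `mean_fisher_z` and the data layout (a list of days, each holding trial-wise firing rates) are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of firing rates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_fisher_z(daily_rates_a, daily_rates_b):
    """Average Fisher z-transformed noise correlation across days.

    Each argument is a list of days; each day is a list of trial-wise firing
    rates of one channel for repeated presentations of one test contrast.
    """
    z_values = [math.atanh(pearson_r(a, b))
                for a, b in zip(daily_rates_a, daily_rates_b)]
    return sum(z_values) / len(z_values)


# Hypothetical firing rates for two channels on two days.
channel_a = [[10, 12, 14, 16], [1, 2, 3, 4]]
channel_b = [[20, 22, 21, 25], [1, -1, -1, 1]]
z_mean = mean_fisher_z(channel_a, channel_b)
```

The Fisher z-transform (the inverse hyperbolic tangent of r) is applied before averaging because correlation coefficients are bounded and their sampling distribution is skewed.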

Fisher Information Analysis

We used a recently published method and algorithms (Kanitscheider, Coen-Cagli, Kohn, Pouget, et al., 2015) to calculate the Fisher information in single channels and in populations of simultaneously recorded channels (Kanitscheider et al., 2015a, 2015b). We estimated the information present when comparing 28–32% contrast, 25–35% contrast, 22–38% contrast, etc. The derivative to calculate the Fisher information for, e.g. 28 to 32% contrast, is thus delta = 4% contrast (see Kanitscheider et al., 2015a, 2015b; Kanitscheider, Coen-Cagli, Kohn, Pouget, et al., 2015, for details). For 25–35% contrast, the delta = 10% contrast (and so forth). This is analogous to the methods described by Kanitscheider et al. (2015a, 2015b), but it is converted from the orientation domain to the contrast domain. In the orientation domain used by Kanitscheider et al. (2015a, 2015b), the Fisher information was scaled by the orientation difference (maxD = pi). We have used an analogous system where we assume that a 50% contrast difference is equal to maxD = pi; i.e. a 4% contrast difference would equate to (pi/50)*4. Note that even if this conversion is not equivalent as contrast data are not circular (while orientation data are), it does not affect the conclusions from our study. This is because absolute values of information were of little interest here; rather, our objective was to examine whether learning altered the information encoded for a fixed contrast difference. To calculate the information that a given channel (or channel population) encoded in the first (or last) 5 days of training, the trials from a given channel and a given contrast pair for all 5 days were concatenated, as if they had been recorded in a single session. We included trials with correct decisions in this analysis. 
The analysis required equal numbers of trials between the two stimulus conditions, but the number of trials was not equal as the animal stopped working at unpredictable times on individual days. We therefore used the lower number of trials available for a given test contrast pair on a given training day and discarded the excess trials. This approach yielded between 168 (minimum) and 335 (maximum) trials for each channel, test contrast comparison, and monkey (monkey 1: n = 168–236; monkey 2: n = 233–335).
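
The two preprocessing steps in this paragraph, rescaling the contrast difference and equalising trial counts, can be sketched as follows. The function names are hypothetical, and which excess trials were discarded is not stated in the text; dropping them from the end of the longer list is an assumption of this sketch.

```python
import math

MAX_D = math.pi          # scaling constant taken from the orientation domain
MAX_CONTRAST_DIFF = 50   # contrast difference (%) that is mapped onto MAX_D


def scaled_delta(contrast_low, contrast_high):
    """Convert a contrast difference (%) into the rescaled stimulus difference."""
    return (MAX_D / MAX_CONTRAST_DIFF) * (contrast_high - contrast_low)


def equalise_trials(trials_low, trials_high):
    """Truncate both stimulus conditions to the smaller trial count.

    Assumption: excess trials are removed from the end of the longer list.
    """
    n = min(len(trials_low), len(trials_high))
    return trials_low[:n], trials_high[:n]


delta_28_32 = scaled_delta(28, 32)   # (pi / 50) * 4
low, high = equalise_trials([3, 5, 4], [6, 7])
```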

The information encoded by differently sized (neuronal) populations was calculated by using the approach described above to concatenate trials from different recording channels and then calculate the information in a population of size x (i.e. number of channels) with channel and trial identity retained. To identify the extent to which correlated activity reduced the information present in a population, we calculated the activity when trials were shuffled, using the algorithms provided by Kanitscheider et al. (2015a, 2015b).
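
The idea behind trial shuffling can be sketched in a few lines of Python. This is not the algorithm of Kanitscheider et al.; it only illustrates, under the assumption of a simple dictionary layout and a hypothetical function name `shuffle_trials`, how permuting trial order independently for each channel (within one stimulus condition) removes trial-by-trial co-variation while leaving each channel's own response distribution unchanged.

```python
import random


def shuffle_trials(responses, seed=0):
    """Shuffle trial order independently for each channel.

    `responses` maps a channel label to its list of trial responses for one
    stimulus condition. The input is left unmodified.
    """
    rng = random.Random(seed)
    shuffled = {}
    for channel, trials in responses.items():
        permuted = list(trials)   # copy so the original order is preserved
        rng.shuffle(permuted)
        shuffled[channel] = permuted
    return shuffled


# Hypothetical responses of two channels on the same four trials.
simultaneous = {"ch1": [5, 7, 9, 11], "ch2": [6, 8, 10, 12]}
decorrelated = shuffle_trials(simultaneous)
```

Comparing the information in `simultaneous` and `decorrelated` responses isolates the contribution of noise correlations, because single-channel statistics are identical in both.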

Significance of Noise vs. Signal Correlation Regression Slope Changes

We performed a permutation test to determine whether the slopes (of an intermediate linear regression; intermediate as both variables were dependent variables) found for the late period were significantly different from the slopes during the early period for channel pairs with positive signal correlations. To do so, we joined the early and late distributions of the signal and of the noise correlations for the respective channel samples (separated according to their information content, see ‘Results’). We then drew 1000 random samples (with a sample size which equalled the sample size for the late distributions) from that joint distribution and calculated the slope for each of these. If the original slope from the late training period fell outside the 95% range of the slopes from the joint distributions, it was deemed significantly different to the slope from the early distribution.
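
The permutation procedure can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: an ordinary least-squares slope stands in for the regression used, samples are drawn without replacement, and the 95% range is taken as the central 95% of the sorted null slopes. All function names and the example data are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_outside_null(early_pairs, late_pairs, n_draws=1000, seed=0):
    """Check whether the late slope lies outside the central 95% of slopes
    from random samples of the pooled early + late channel pairs.

    Each pair is a (signal_correlation, noise_correlation) tuple.
    """
    rng = random.Random(seed)
    pooled = list(early_pairs) + list(late_pairs)
    observed = ols_slope(*zip(*late_pairs))
    null = []
    for _ in range(n_draws):
        # Sample size equals the size of the late distribution.
        sample = rng.sample(pooled, len(late_pairs))
        null.append(ols_slope(*zip(*sample)))
    null.sort()
    lower = null[n_draws * 25 // 1000]
    upper = null[n_draws * 975 // 1000 - 1]
    return (observed < lower or observed > upper), observed


# Hypothetical channel pairs with positive signal correlations.
early_pairs = [(0.05 * i, 0.02 * i + 0.01 * (i % 3)) for i in range(1, 21)]
late_pairs = [(0.05 * i + 0.01, 0.01 * i + 0.01 * (i % 3)) for i in range(1, 21)]
significant, late_slope = slope_outside_null(early_pairs, late_pairs)
```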

Results

Task

Two monkeys performed three versions of a two-alternative forced choice (2-AFC) task (Chen et al., 2013, 2014), where they discriminated whether a test stimulus had a higher or lower contrast than a preceding sample stimulus. The three different versions of the task are referred to here as (1) non-roving, (2) roving-without-flanker, and (3) roving-with-flankers.

Non-roving Task

Here, the sample stimulus contrast was fixed at 30%. The test stimulus contrast varied between 5 and 90% contrast in 14 steps (5, 10, 15, 20, 22, 25, 28, 32, 35, 40, 45, 50, 60, and 90% contrast). Sample and test stimuli were each presented for 512 ms, with a delay of 512 ms between stimuli (Methods and Fig. 1 for details). Monkeys indicated whether the test stimulus had higher or lower contrast by making a saccade to one of two targets appearing 512 ms after test offset (Fig. 1 for a task sketch and timeline). Sample and test stimuli were presented in the same visual field location, which covered the aggregate receptive fields (RFs) of the channels recorded (Supplementary Figure S1; additional details see Methods).

Roving-Without-Flanker Task

Here the contrast of the sample stimulus was not fixed at 30%, but could take on one of three values (20, 30 or 40%) on a given trial (pseudo-randomly). The test stimulus took on one of 12 possible contrasts, depending on the sample contrast (20% sample: [5, 10, 12, 15, 18, 22, 25, 28, 35, 45, 60, 90% test]; 30% sample: [5, 10, 15, 22, 25, 28, 32, 35, 38, 45, 60, 90% test]; 40% sample: [5, 10, 15, 25, 32, 35, 38, 42, 45, 50, 60, 90% test]), yielding 36 conditions in total.

Roving with Flankers

All basic grating parameters were identical to those in the roving-without-flanker task, but additional flanker gratings were displayed collinearly immediately above and below the central sample and test stimuli, forming a column of three gratings, positioned edge to edge. The flanker stimuli were identical to the sample and test stimuli in terms of size, SF, contrast, and orientation.

Grating stimuli were centred at parafoveal locations in the visual field at an eccentricity of 4.6° (azimuth − 3.5°, elevation − 3°) and 1.5° (azimuth − 1.3°, elevation − 0.7°) for monkeys 1 and 2, respectively. The gratings were vertically oriented; the SF was 4 cyc/° in both monkeys; and the diameter was 3° in monkey 1 and 0.75° in monkey 2. The stimulus size was chosen based on stimulus eccentricity and RF size, hence the difference in stimulus size between the two monkeys.

Data Set and Analyses

Spiking activity was obtained from chronically implanted Utah arrays (Methods). We refer to small multi-unit neuronal clusters, recorded from a given electrode, as ‘channels’. We recorded from 22 and 25 channels in monkeys 1 and 2, respectively. These yielded good responses (signal-to-noise ratio (SNR) > 1) on more than 80% of the recording days (‘Methods’). To obtain comparable activity levels across sessions, we carried out matching of baseline activity across sessions for MUA data (‘Methods’). We performed analyses where we included either all channels or only good SNR channels (‘Methods’). These two approaches yielded similar results overall; hence, we report the results where we excluded channels with poor SNR.

For all the main analyses, we used a 512-ms analysis window, from 30 ms after stimulus onset to 30 ms after stimulus offset. Note that we also applied an approach in which we used the window length that contained the maximum amount of information (see ‘Methods’) using a 128-ms window starting 30 ms after stimulus onset (to account for response latency), but this approach did not result in notable differences in the overall results. Hence, we used a fixed time window across all analyses reported here.
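
The fixed analysis window can be expressed as a simple spike-counting rule. The sketch below assumes spike times in milliseconds on the same clock as stimulus onset and treats the window as half-open; both the function name `count_spikes` and these conventions are assumptions for illustration.

```python
def count_spikes(spike_times_ms, stim_onset_ms, stim_duration_ms=512, latency_ms=30):
    """Count spikes from 30 ms after stimulus onset to 30 ms after offset.

    The window is [onset + latency, onset + duration + latency), so its
    length equals the 512-ms stimulus duration.
    """
    start = stim_onset_ms + latency_ms
    end = stim_onset_ms + stim_duration_ms + latency_ms
    return sum(1 for t in spike_times_ms if start <= t < end)


# Hypothetical spike times (ms) relative to stimulus onset at 0 ms.
n_spikes = count_spikes([10, 30, 100, 541, 542, 600], stim_onset_ms=0)
```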

Behavioural Data from Non-roving, Roving-Without-Flanker, and Roving-with-Flanker Conditions

Before we present neuronal and associated behavioural data obtained under learning of non-roving, roving-without-flanker, and roving-with-flanker conditions, we provide an overview of the large-scale behavioural changes seen across the three task conditions. These changes provide context to what can (or cannot) be expected in terms of neuronal changes and the discussion that follows later. For comparison, we also provide the behavioural data that were obtained under non-roving conditions when V4 neurons were recorded (previously published; Chen et al., 2013, 2014; Sanayei et al., 2018). We first calculated a single overall hit rate as a function of training days. This was done for matched test contrasts (i.e. only using the test contrasts that were used in both data sets) for the V1 and V4 non-roving sessions. Matched test contrasts were used as otherwise differences in task difficulty could account for possible performance differences. For roving conditions, we calculated a single performance measure across all sample/test contrasts, as this best gives an indication of whether true learning occurred or whether trade-offs did occur (e.g. improvements for some conditions counterbalanced by deteriorations for other conditions). The results are shown in Fig. 2A. Under non-roving conditions, performance improved with training in both monkeys. This was the case for the V1 and the V4 data set. For both data sets, it appears that the range of learning was slightly larger in monkey 2 than in monkey 1 (compare the differences between performance at the start and the end of learning). Under roving-without-flanker conditions, neither of the animals showed clear signs of performance improvement; indeed, performance remained below that attained at the end of the non-roving conditions, testament to the difficulty of adjusting a categorization boundary on a trial-by-trial basis. 
Learning did occur again under the roving-with-flanker conditions, whereby both monkeys started at a performance level that was below the level attained under roving-without-flanker conditions. Monkey 1 then improved to a level that matched the non-roving conditions (and exceeded the roving-without-flanker performance), while monkey 2 only reached the performance under roving-without-flanker, but not the performance under non-roving conditions. Thus, while introduction of flankers did aid learning, performance under roving-with-flanker never exceeded performance under non-roving conditions in either monkey. To summarize, (A) learning under non-roving conditions appeared to be slightly larger in monkey 2 than monkey 1, (B) under roving without flankers, no performance improvements occurred, and (C) performance improvements under roving with flankers did not exceed performance under non-roving, suggesting that overall sensitivity did not exceed sensitivity previously attained under non-roving conditions. From these behavioural data, we predict that if V1 neurons show signatures of perceptual learning, these would be present under non-roving conditions, and they would be slightly larger in monkey 2, while under roving conditions, neuronal changes would be largely absent in both animals.

Fig. 2

A Behavioural performance under non-roving (V1 and V4), roving-without-flanker, and roving-with-flanker conditions. Average behavioural performance (probability of a correct decision, y-axis) across matched test contrasts for V1 and V4 non-roving conditions and across all test contrasts (roving conditions) across training days (x-axis). V1 non-roving, roving-without-flanker, and roving-with-flanker conditions are indicated by magenta, black, and blue colours respectively. Data from the V4 sessions (peripheral stimulus locations) are added for comparison (red). R-values and p-values for correlations between behavioural performance and training day are given as insets. B Hit rate for the six easiest test contrasts under non-roving conditions as training progressed when recordings were performed at the V4 RFs (peripheral locations), which was followed by training and recording at the V1 RF (parafoveal) locations. Test contrasts are colour coded and values are given as insets. C Performance under non-roving conditions for matched test contrasts on the first and last day of training for V4 sites (light blue, magenta, open diamonds), for the first day of training at V1 sites (light blue filled circles), and for the last day of training at V1 sites (magenta, filled circles)

Figure 2B shows performance for the six ‘easiest’ test contrasts as learning progressed when stimuli were placed at the more peripheral locations (V4 recording sites) and when they were placed at the parafoveal (V1 recording site RF) locations. Note that in both monkeys, training at the V1 sites followed training at the V4 sites immediately (3-day gap in monkey 1, next day in monkey 2). The data show that for the easy test contrasts, performance initially dropped when the stimulus location changed, arguing against (full) transfer across retinotopic locations even for the easy conditions. Figure 2C shows the performance ‘drop’ for matched contrast conditions when training was moved from peripheral (V4) to parafoveal (V1) locations. Performance dropped for almost all matched test contrasts, suggesting that there were no test contrasts for which transfer between sites occurred. Following training, it recovered to levels similar to those attained at the V4 locations.

Non-roving Data: Neurometric Contrast Response Functions

To calculate neurometric functions and neuronal discriminability, we performed ‘area under the receiver operating characteristic’ (AUROC) analyses.

To detect changes in neurometric functions, we monitored the point of neuronal equality (PNE, ‘Methods’), which is the point where activity levels elicited by the sample and test stimuli were identical (AUROC = 0.5). Changes in the slope of the neurometric function at 30% contrast as well as changes in the PNE of an example channel are shown in Fig. 3A–C. The neurometric function became shallower (at 30% contrast) over the course of training (Fig. 3B). Moreover, the PNE shifted away from the value of 30% with training (Fig. 3C). The example shown in Fig. 3 reflects the pattern seen across the population in monkey 2.
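
To make the PNE concrete, the sketch below locates the contrast at which a set of AUROC values crosses 0.5. The paper derives the PNE as described in its Methods; this sketch substitutes plain linear interpolation between neighbouring test contrasts, which is an assumption made purely for illustration. The function name and the example AUROC values are hypothetical.

```python
def point_of_neuronal_equality(contrasts, aurocs, criterion=0.5):
    """Contrast at which the neurometric function crosses `criterion`.

    Interpolates linearly between the first pair of neighbouring test
    contrasts whose AUROC values bracket the criterion. Returns None if the
    function never crosses it.
    """
    points = list(zip(contrasts, aurocs))
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        brackets = (a0 - criterion) * (a1 - criterion) <= 0
        if brackets and a0 != a1:
            return c0 + (criterion - a0) * (c1 - c0) / (a1 - a0)
    return None


# Hypothetical AUROC values for the six test contrasts around a 30% sample.
test_contrasts = [22, 25, 28, 32, 35, 38]
auroc_values = [0.30, 0.38, 0.46, 0.54, 0.62, 0.70]
pne = point_of_neuronal_equality(test_contrasts, auroc_values)
```

In this symmetric example the crossing lies at the sample contrast; a PNE that drifts away from 30% over sessions would correspond to the shift described for the example channel.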

Fig. 3

Neurometric changes with learning for an example channel. A Single-channel neurometric functions and their changes with learning (earlier sessions in blue and later sessions in purple). Vertical lines show the point of neuronal equality (PNE) for each recording day. B Slope of the neurometric function at 30% (the sample contrast). C Change in the PNE with learning

To calculate whether the parameters of our fitting functions changed over time with training, we calculated Spearman rank correlations for average parameter values across channels (n = 15 for monkey 1 and 25 for monkey 2) for each session (n = 17 sessions for monkey 1; n = 22 sessions for monkey 2). None of the parameters of the neurometric function changed systematically with training in monkey 1 (see Fig. 3 for details and statistics). In monkey 2, the slope of the neurometric function at 30% contrast decreased significantly (Fig. 4, Spearman’s rank correlation, p < 0.001). Contrary to our expectations, the PNE shifted away from the sample contrast in monkey 2 (away from 30%, Spearman’s rank correlation monkey 2: p = 0.007). The exponent β of the Weibull function increased in monkey 2 (Fig. 4, Spearman’s p < 0.001). Contrast tuning assessed using a Naka-Rushton function is presented in the supplementary materials (Supplementary Figure S4). Overall, these data suggest that neuronal changes in V1 either show no systematic change (monkey 1) or show changes that are not in line with an increased sensitivity at the sample contrast (monkey 2).

Fig. 4

Learning-induced changes in selected parameters of the neurometric function. Changes in the contrast at which the neurometric function reached 63% of its range; the slope of the neurometric function at 30% contrast; and point of neuronal equality (PNE, relative to the sample contrast) of the neurometric function. Insets show the Spearman rank correlation coefficients (r) and the p-value of the parameter of interest (dependent variable) vs. recording day (independent variable). Averages across channels are displayed along with error bars showing S.E.M. (n = 15 and n = 25 for each recording day for monkeys 1 and 2 respectively)

The results differ from those obtained in area V4 of the same monkeys (Sanayei et al., 2018). Note that the spatial location of the sample and test stimuli was different from that used to assess the effects of perceptual learning that was previously reported in area V4; hence, training had not occurred specifically for the locations used in the current study in V1.

Changes in Test-Sample Neuronal Discriminability with Learning

Behavioural changes with learning for the most difficult contrasts occurred in both monkeys (Fig. 5A). We defined the first 5 days as being ‘early’ sessions and the last 5 days as ‘late’ and determined whether performance for the six most difficult contrasts changed significantly between early and late sessions using a two-factor ANOVA (factor 1: time, factor 2: contrast). Both factors changed significantly in both monkeys and there was also an interaction between the factors (see insets in Fig. 5A for F- and p-values). For this analysis, we used values obtained on individual days, not those obtained by averaging data across three consecutive recording days. This approach ensured independence of samples and was applied to all statistical tests performed throughout the paper.

Fig. 5

Changes in discriminability at behavioural and neuronal levels. A Average proportion of reports that the test contrast was higher than the sample contrast, as training progressed. For test contrasts higher than the sample contrast (yellow and red colours), the proportion increased, while for test contrasts lower than the sample contrast (blue colours), the proportion decreased, indicating improved performance across all conditions. Insets at the bottom indicate F- and p-values from an ANOVA, indicating that performance depended on contrast, training day, and an interaction between contrast and training day. B Neuronal discriminability (AUROC) for sample-test contrast as a function of learning. Error bars show S.E.Ms. C Distribution of discriminability difference for the three most difficult sample-test contrast comparison pairs (e.g. AUROC values for comparisons between 28% versus 32%, 25% versus 35%, and 22% versus 38%) for the first (blue) and last (magenta) 5 days of learning across all channels recorded. Darker red regions show overlap of the two distributions. Insets display the mean and S.E.M of the two distributions. p-values indicate whether distributions differed significantly. Performance and discriminability for each data point were averaged over three consecutive days; i.e. error bars in A and B denote S.E.M of performance (AUROC) averaged across 3 days (thus, the number of data points is the total number of recording days minus 2)

Learning-induced changes of neuronal discriminability were quantified using signal detection theory approaches (AUROC) comparing sample- and test-evoked activity (e.g. the difference between 30 and 28% contrast), for each day and channel. Values of AUROC that deviated from 0.5 indicated higher discriminability (0.5 corresponded to chance level; 0 and 1 indicate perfect discriminability). The 14 different test contrasts yielded 14 groups of AUROC values for each recording session. We focus on the six contrast levels that were closest to the sample contrast, namely the three contrasts just above (32, 35, and 38% contrast) and just below (22, 25, and 28% contrast) the sample contrast, as these were the most difficult discriminations, with clear changes in behavioural performance (Fig. 5A). The average AUROCs for these contrasts as a function of learning are shown in Fig. 5B. In both monkeys, the data suggest that AUROC differences (between lower and higher test contrasts) increased with learning; i.e. AUROCs on the two sides of the categorization boundary became more separated. To quantify this, we calculated AUROC differences between 22 and 38%, 25 and 35%, and 28 and 32% test contrasts for the first and last 5 days of training. We then averaged those three difference values for each training day. The difference distributions for these two training periods are shown in Fig. 5C. Training significantly increased the differences in both monkeys (monkey 1: p = 0.02; monkey 2: p < 0.001, two-sided Wilcoxon signed-rank test). Thus, there were changes in behavioural and neuronal discriminability in V1 neurons, even though these were not unequivocally apparent when analysing neurometric response functions (Fig. 4).
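
The averaged AUROC difference across the categorization boundary can be sketched as below. The direction of subtraction (higher minus lower test contrast) is an assumption of this sketch, and the names `PAIRS` and `mean_auroc_difference` as well as the example values are hypothetical.

```python
# Test contrast pairs on opposite sides of the 30% sample contrast.
PAIRS = [(22, 38), (25, 35), (28, 32)]


def mean_auroc_difference(auroc_by_contrast):
    """Average AUROC difference (higher minus lower test contrast) over PAIRS."""
    diffs = [auroc_by_contrast[high] - auroc_by_contrast[low]
             for low, high in PAIRS]
    return sum(diffs) / len(diffs)


# Hypothetical AUROC values for one channel on one training day.
day_aurocs = {22: 0.35, 25: 0.40, 28: 0.46, 32: 0.55, 35: 0.60, 38: 0.68}
day_difference = mean_auroc_difference(day_aurocs)
```

A larger value indicates that AUROCs on the two sides of the boundary are further apart, which is the quantity compared between early and late sessions.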

Choice Probability Analysis

To determine whether training affected the degree to which the monkeys’ upcoming decision was reflected in the neuronal responses (in neutral terms: whether the two were correlated), we computed choice probabilities (CP, see ‘Methods’ for details). This was done for each channel as a function of time after training onset (Fig. 6A, with a 3-day running average). Calculations of CP required a sufficient number of incorrect as well as correct trials; hence, this analysis focused on data obtained from the six most demanding test contrast conditions. CPs closer to 0 corresponded to the selection of the ‘lower test contrast’ target, while CPs closer to 1 corresponded to the selection of the ‘higher test contrast’ target. If neuronal activity in our target areas became more effective in influencing the animal’s upcoming decision (or if the readout of sensory information improved), then CP values for test contrasts of less than 30% should have decreased over the course of training, while CP values for test contrasts of more than 30% should have increased.

Fig. 6

Choice probability and noise correlations as a function of learning. A Choice probability as a function of learning for both monkeys for different test contrasts (colour coded and displayed as insets). Averages across channels are displayed. Left column for each monkey: Choice probability for test activity levels (separately for the three hardest contrast levels below and above sample contrast, respectively). Insets show CP difference (e.g. CP at 40% minus CP at 20%) distributions for the first 5 days of learning (light blue histograms) and the last 5 days of learning (magenta histograms). P-values for differences between the distributions are shown next to the histogram plots. Data are averaged over three consecutive days, i.e. number of data points = recording days minus 2. Error bars denote S.E.M. B Left subplot: average noise correlations between channel pairs for the different test contrasts during the first (light blue) and last 5 days (magenta) of learning. Right subplot: distribution of noise correlation across all test contrasts during the first (light blue) and last 5 days (magenta) of learning. Vertical bars indicate sample means; p-values indicate whether noise correlations differed between early and late training stages

To determine whether training significantly affected the CP distributions, CPs were calculated separately across the first and last 5 days for each recording channel and each monkey. A two-way ANOVA was performed, with training period (early or late) and test contrast as factors. In both monkeys, significant main effects of contrast occurred, and a significant interaction between period and contrast occurred (monkey 1: test contrast: F(5, 888) = 15.3, p < 0.001; training period: F(1, 888) = 0.2, p = 0.696; interaction: F(5, 888) = 3.7, p = 0.002; monkey 2: test contrast F(5, 1488) = 58.8, p < 0.001; training period: F(1, 1488) = 11.1, p < 0.001; interaction: F(5, 1488) = 27.2, p < 0.001).

We then calculated the CP difference for contrast pairs (38–22%, 35–25%, 32–28%) for each channel, averaged those differences for each recording day, and determined whether difference distributions over the first 5 days vs. over the last 5 days of training were significantly affected by training (Wilcoxon signed-rank test). In both monkeys, CP differences significantly increased with training (Fig. 6, p < 0.001).

Population Coding Analyses

Thus far, we analysed information content in individual recording channels. We next examined how the information present at the population level changed with learning. Changes in information across the population could have been due to changes in single-channel coding (see above), but may also be due to changes in the correlation structure (noise correlations) of simultaneously active channels. In line with the analysis performed on V4 data under identical task conditions (Sanayei et al., 2018), we first examined whether the information that was encoded by a single channel regarding the stimulus changed with learning and whether this depended on its coding abilities at the start of training. We then analysed information encoded by the population and associated changes. In monkey 1, information encoded was limited overall and changed very little with learning (Supplementary Figure S5). In monkey 2, the information content was much higher (compared to monkey 1) at the start of learning and increased with learning (Supplementary Figure S5).

Which channels improved most with learning? Those with large information content at the start of learning, or those with relatively little information content? To investigate this, we examined whether the amount of information encoded for a specific contrast pair was correlated with the information encoded for a different contrast pair between early and late training periods (Supplementary information and Supplementary Figure S6 A-C). We then analysed the correlation between information values during early training and the proportional gain in information that was obtained with learning (the proportional information gain was defined as the difference in information between late and early training, normalized by the information encoded in early training). If information increases were proportional across all channels, we would find no correlation. If the channels containing the lowest amount of information gained proportionally the most during learning, then this correlation would be negative (conversely, if channels containing the highest amount of information gained proportionally the most, this correlation would be positive). We found generally negative correlations for all contrast pairs (Supplementary Figure S6 D, for associated p-values, see inset in Figure S6 D). Neurons with relatively small discrimination power for small contrast differences gained proportionally more discrimination power, while already-selective neurons showed proportionally lower gains in selectivity. Thus, learning increased the number of neurons carrying useful information about difficult contrast differences, thereby increasing the size of the population that could contribute to solving the task.
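
The proportional information gain and its relation to early information can be sketched as follows. The type of correlation is not stated in this section; Spearman's rank correlation is used here as an assumption, and all function names and example values are hypothetical.

```python
def proportional_gain(early, late):
    """(late - early) / early for each channel."""
    return [(l - e) / e for e, l in zip(early, late)]


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


# Hypothetical information values for four channels.
early_info = [0.2, 0.5, 1.0, 2.0]
late_info = [0.6, 1.0, 1.5, 2.4]
gains = proportional_gain(early_info, late_info)
rho = spearman_rho(early_info, gains)
```

In this example every channel gains information, but the channels that started with the least gain proportionally the most, so the correlation is negative.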

Changes in Noise Correlations with Learning

Noise correlations were calculated for each contrast for the first 5 days of training and for the last 5 days of training, for each channel combination (see ‘Methods’ for details). Noise correlations (when averaged across contrasts) increased with learning in monkey 1 and decreased with learning in monkey 2 (Fig. 6B, rank-sum test). Thus, unlike in area V4 (Sanayei et al., 2018), noise correlations did not systematically decrease with perceptual learning in macaque V1.

Noise correlations affect coding abilities of neuronal populations (Abbott & Dayan, 1999; Panzeri et al., 1999; Pola et al., 2003). Thus, the decrease in correlations with learning in monkey 2 could improve population coding abilities beyond the single-channel discriminability increase described. Conversely, in monkey 1, where single-channel coding abilities did not increase notably, noise correlations actually increased and could possibly have been detrimental to population coding. To investigate these possibilities and allow for a comprehensive comparison to previously published V4 data from the same animals, we examined the amount of information encoded as a function of population size when we retained noise correlations (by analysing simultaneous responses) and when we removed correlations (by analysing shuffled population responses). We analysed linear Fisher information about test contrast as a function of population size, increasing the population one channel at a time (see ‘Methods’ for details). In monkey 1, population information coding was larger for difficult contrasts during early sessions, and the difference between unshuffled and shuffled population responses was relatively small (Supplementary Figure S7). For easy contrast conditions, learning increased population coding information and the difference between shuffled and unshuffled coding was generally similar (Supplementary Figure S7). Thus, the increased noise correlation in V1 appeared to reduce population coding abilities. In monkey 2, population-encoded information increased strongly with training for difficult contrasts but showed little difference for easy contrasts. Generally, the shuffled population encoded considerably more information than the population where noise correlations were retained. 
This showed that the reduction in noise correlations in monkey 2 was not sufficient to remove ‘detrimental’ correlations; however, the differences between shuffled and unshuffled information coding might have been larger had the noise correlation reduction not occurred.

To further determine how changes in noise correlations affected population coding abilities, we calculated the slope between signal and noise correlations for early and late learning periods. A shallower slope enables neuronal populations to encode more information (Gu et al., 2011; Minces et al., 2017). The slope between noise and signal correlation was calculated separately for channel pairs where both channels were part of a less sensitive population (bottom third of information-coding channels), or where both channels were part of a more sensitive population (top third of information-coding channels). The slope between signal and noise correlations was not significantly affected by training, irrespective of information content or animal (all p > 0.05, two-sided permutation test, Supplementary Figure S8).

In sum, training did not systematically change noise correlations in V1 across monkeys, and it did not change the relationship between signal and noise correlations. Thus, changes to the correlation structure of neurons in V1 did not systematically increase encoding abilities of neuronal populations.

Roving-Without-Flanker Data

Given the absence of overall behavioural improvements under roving-without-flanker conditions (Fig. 2), we did not expect to see changes in neuronal tuning in V1. Notably, overall performance dropped to levels below those attained under non-roving conditions, but this drop is probably a reflection of increased categorization difficulty, not in contrast discrimination difficulty. Previous studies have argued that adjustments to categorization boundaries are reflected in higher cortical areas, not in low/mid-level sensory areas (Freedman & Assad, 2006; Freedman & Miller, 2008). We therefore also did not expect to see adjustments to categorization boundaries reflected in V1 activity.

Neurometric Contrast Response Functions Under Roving-Without-Flanker Conditions

Under roving-without-flanker conditions, we performed the same analysis of neurometric functions as described for non-roving conditions; however, here, the analysis was done separately for each sample and its associated test contrasts. Contrary to the results under non-roving conditions (where changes were found in monkey 2, but not monkey 1), under roving-without-flanker conditions, we found that the slope of the neurometric function at the sample contrast significantly decreased with training in monkey 1 (details in Supplementary Materials, Supplementary Figure S9) and the point of neuronal equality changed for a sample contrast of 20% (moving towards 20%), for 30% (moving away from 30% towards 20%), and for 40% (moving from just above 40% to just below 40%). Neurometric function parameters hardly changed in monkey 2, except for a change in the point of neuronal equality for a sample contrast of 30% (moving from just above 30% to just below 30%; Supplementary Figure S9).

Changes in Test-Sample Neuronal Discriminability with Learning Under Roving-Without-Flanker Conditions

The average performance across all test contrasts conditioned upon the three different sample contrasts across training days is shown in Fig. 7A (note that this is similar to the roving data shown in Fig. 2, but is broken down into the different sample contrasts). Simply looking at the statistical significance of the correlations, it appears that some (tiny) improvements occurred. Across all test contrasts, significant improvements occurred for the 40% sample contrast in monkey 1 and for the 20% and 30% sample contrasts in monkey 2. However, these changes are very small in both monkeys, even if significant (FDR-corrected) for some sample contrasts.
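For readers unfamiliar with false discovery rate (FDR) correction, the sketch below shows one common procedure, the Benjamini-Hochberg step-up adjustment. That this particular procedure matches the one used here is an assumption, since the specific method is not stated in this section, and the function name `fdr_bh` is hypothetical.

```python
import numpy as np

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted
```

A correlation between performance and training day would then be called significant when its adjusted p-value falls below 0.05.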

Fig. 7

Behavioural performance and neuronal discriminability under roving-without-flanker conditions. A The average behavioural performance across all test contrasts for the three different sample contrasts across training days. R-values and p-values (FDR corrected for multiple comparisons) for correlations between behavioural performance and training day are shown. B Changes in neural (red) and behavioural (blue) discriminability for different test contrasts, pooled across different task difficulty levels using Weber fractions. Easy discriminations (where the difference between sample and test contrasts was large) correspond to Weber fractions that are much smaller or larger than 1. R-values and p-values (FDR corrected for multiple comparisons) for correlations between behavioural and neurometric performance and training day are shown. Significant negative correlations indicate that behavioural performance and/or neurometric discriminability decreased with training, while significant positive correlations indicate that they increased with training

Behavioural improvements may occur for some test contrasts, which are offset by deterioration for other test contrasts. To assess this, we grouped the behavioural and neuronal data according to task difficulty, using Weber fractions (Weber fraction = test/sample contrast) as a grouping mechanism (Fig. 7). For easy discriminations (where the sample-test contrast difference is large, i.e. for small and large Weber fractions), performance in both monkeys was close to perfect (values close to 1) for all sample contrasts (Fig. 7B, see data for Weber fractions < 0.5 and > 1.5, even though some significant changes did occur, see insets). For more difficult discriminations (Weber fractions closer to 1), performance improvement occurred for some sample-test contrasts, but this was often counterbalanced by performance deterioration with training for other contrasts (Fig. 7B; for example, compare the condition with a 20% sample contrast and a Weber fraction of 0.75–0.875 to that with a 20% sample contrast and a Weber fraction of 1.25–1.5 in monkey 2). Changes in neural discriminability showed decreasing discriminability for both monkeys for Weber fractions < 1 for many sample contrasts, especially in monkey 1 (Fig. 7B, ROC values). This was less pronounced for Weber fractions > 1, where significantly increased discriminability occurred occasionally (see r- and p-value insets in Fig. 7B, e.g. at Weber fraction 1.11–1.25, monkey 2, sample contrast 30%). However, increased discriminability was modest overall and occurred less often than decreased discriminability. To determine whether changes in behavioural discriminability were correlated with changes in neuronal discriminability, we calculated the correlation between behavioural and neuronal changes and associated significance. In monkey 1, no significant correlation existed between these measures for any of the Weber fraction grouped contrasts (FDR-corrected). 
In monkey 2, two significant (p < 0.05, FDR corrected) correlations were found, but only one of these was associated with significant behavioural and neuronal changes with training (increases for both; Weber fraction 1.11–1.25, 30% sample contrast). Overall, this suggests that the minimal (and counterbalanced) behavioural changes that occur under roving-without-flanker conditions are not associated with changes in V1 neural activity and are a reflection of (minor) high-level adjustments in behavioural strategy instead.

Roving-with-Flanker Conditions

Behavioural improvements occurred with learning after flankers were added (Figs. 2 and 8). However, the initial performance upon flanker introduction dropped to levels below those attained under non-roving and roving-without-flanker conditions. Even after learning, performance never exceeded non-roving performance (monkey 1) and even stayed below those performance levels (monkey 2, Fig. 2). As stated previously, given these behavioural data, we did not expect a further change in contrast sensitivity in V1 neurons, as perceptual learning-based contrast sensitivity had already been ‘maxed out’ under non-roving conditions.

Fig. 8

Behavioural performance and neuronal discriminability under roving-with-flanker conditions. A The average performance across all test contrasts for the three different sample contrasts across training days. Insets show correlations (and p-values, FDR corrected for multiple comparisons) between performance and training day. B Change in neural (red) and behavioural (blue) discriminability for different test contrasts, pooled across different task difficulty levels using Weber fractions. Easy discriminations (where the difference between sample and test contrasts was large) are indicated by Weber fractions much smaller or larger than 1. Significant negative correlations indicate that performance/discriminability decreased with training; significant positive correlations indicate that performance/discriminability increased with training

Neurometric Contrast Response Functions Under Roving-with-Flanker Conditions

Under roving-with-flanker conditions, none of the neurometric function parameters changed significantly with training in monkey 1 (see Supplementary Figure S10 for details). In monkey 2, the only changes observed in the parameters of the neurometric functions were changes in the slope for sample contrasts of 20% and 30% (Supplementary Figure S10).

Changes in Test-Sample Neuronal Discriminability with Learning Under Roving-with-Flanker Conditions

As with the analyses of roving-without-flanker conditions, we grouped the data based on Weber fractions. The average performance across all test contrasts for the three different sample contrasts across training days is shown in Fig. 8A. In both monkeys, the average performance significantly increased with training (see insets in Fig. 8A). Figure 8B shows that in monkey 1, increases in performance occurred mostly for more difficult conditions, as performance was close to perfect for the easiest conditions. In monkey 2, performance increased across all conditions, although improvements were more pronounced for difficult conditions (Fig. 8B, monkey 2). Changes in neural discriminability (ROC values in Fig. 8B) were very limited across all conditions in both monkeys, with significant increases for some conditions in both monkeys (see insets Fig. 8B; all p-values are FDR-adjusted [n = 23]). To determine whether changes in behavioural discriminability were correlated with changes in neuronal discriminability, we calculated the correlation between these two measures. In monkey 1, no significant (FDR corrected) correlation existed between these measures for any of the Weber fraction grouped contrasts.
In monkey 2, six (6/24) significant (p < 0.05, FDR corrected) correlations were found, but only four of these occurred for conditions where significant behavioural and significant neuronal changes with training occurred (increases for both; Weber fraction < 0.5, 20% and 30% sample contrast; Weber fraction 1.25–1.5, 30% sample contrast; Weber fraction 1.5–10, 30% sample contrast). One of the remaining significant correlations showed a significant increase at the behavioural level and a trend (p = 0.06, FDR corrected) for increased discriminability at the neuronal level (Weber fraction 1.11–1.25, 30% sample contrast), while one significant anti-correlation between behavioural (significant decrease) and neuronal changes (trending increase, p = 0.08, FDR corrected) was found for a Weber fraction of 1–1.11 (40% sample contrast). Thus, while some changes co-occurred at the behavioural and neuronal level in monkey 2, this was limited to 1/3 of conditions tested, and no correlated changes were found in monkey 1.

We therefore argue that behavioural changes that occurred under roving-with-flanker conditions are a reflection of improved learning of categorization boundaries, not improved contrast sensitivity, and are not (or only minimally) reflected in V1 activity.

As a final check for this argument, we determined choice probabilities under roving-without- and roving-with-flanker conditions. Under roving-without-flanker conditions, no significant changes of choice probabilities were found in either monkey (Supplementary Figure S11). Under roving-with-flanker conditions, 5/25 choice probabilities significantly changed in monkey 1, whereby 2/5 significantly decreased and 3/5 significantly increased. In monkey 2, 10/24 choice probabilities significantly changed with training, whereby 7/10 significantly decreased and 3/10 significantly increased (Supplementary Figure S12). Thus, changes in choice probabilities were overall very limited under roving conditions and showed increases as well as decreases.
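Choice probability is conventionally quantified as the area under the ROC curve (AUROC) comparing responses on trials that ended in one choice with responses on trials that ended in the other choice, for an identical stimulus. The sketch below uses the equivalent pairwise (Mann-Whitney) formulation. It is a generic illustration rather than the procedure from the Methods, and the function name `auroc` and its arguments are hypothetical.

```python
import numpy as np

def auroc(resp_a, resp_b):
    """Probability that a random response from A exceeds one from B.

    Ties count as 0.5. A value of 0.5 means the two response
    distributions cannot be told apart; 1.0 means perfect separation.
    """
    a = np.asarray(resp_a, dtype=float)
    b = np.asarray(resp_b, dtype=float)
    diff = a[:, None] - b[None, :]  # all pairwise differences
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
```

The same function applied to responses to test and sample stimuli, instead of responses split by choice, gives an AUROC-based discriminability measure.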

Changes in Noise Correlations with Learning Under Roving Conditions

To complement the comparison to non-roving conditions, noise correlations were calculated for each contrast for the first 5 days of training and for the last 5 days of training, for each channel combination (see ‘Methods’ for details). Noise correlations when averaged across contrasts did not show any consistent changes across sample contrasts with learning in monkey 1 (Fig. 9A). Noise correlations increased significantly for a sample contrast of 30%, decreased significantly for a sample contrast of 40%, and showed no significant change for a sample contrast of 20%. However, in monkey 2, noise correlations decreased significantly with learning across all three sample contrasts (Fig. 9A).
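A minimal sketch of a pairwise noise-correlation computation is given below. It assumes that responses for a single contrast are arranged as a trials-by-channels array and that noise correlations are Pearson correlations of the trial-to-trial responses; the exact procedure is described in the Methods and may differ. The function name `noise_correlations` is hypothetical.

```python
import numpy as np

def noise_correlations(responses):
    """Pairwise Pearson correlations of trial-to-trial response variability.

    responses: array of shape (n_trials, n_channels) for ONE stimulus
    contrast, so that the stimulus-driven component is constant across trials.
    Returns one correlation per unique channel pair, ordered
    (0, 1), (0, 2), ..., (n - 2, n - 1).
    """
    r = np.corrcoef(responses, rowvar=False)     # channels are columns
    upper = np.triu_indices(r.shape[0], k=1)     # exclude diagonal and duplicates
    return r[upper]
```

Under these assumptions, the function would be applied separately to the trials from the first 5 and the last 5 training days, and the two resulting distributions compared.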

Fig. 9

Noise correlations as a function of learning under different roving conditions. A Distributions of noise correlations across all test contrasts during the first (blue) and last 5 days (red) of learning under roving-without-flanker conditions. B Distributions of noise correlations across all test contrasts during the first (blue) and last 5 days (red) of learning under roving-with-flanker conditions. Vertical bars indicate sample means; p-values indicate whether noise correlations differed between early and late training stages

Under roving conditions with flankers (Fig. 9B), noise correlations increased significantly in monkey 1 for sample contrasts of 20% and 30%, while there were no significant changes for a sample contrast of 40%. In monkey 2, noise correlations decreased significantly with learning across all three sample contrasts (Fig. 9B).

Thus, in monkey 2, noise correlations significantly decreased with learning for all training conditions (non-roving, roving-without-flanker, roving-with-flankers), while for monkey 1, either no changes occurred or changes varied across conditions. If anything, noise correlations in monkey 1 showed a trend to increase with learning.

Discussion

Training improved behavioural performance of contrast discrimination in macaque monkeys under non-roving and roving-with-flanker conditions, but improvements were largely absent under roving-without-flanker conditions. In the case of non-roving stimulus conditions, behavioural improvements were accompanied by changes in neural discriminability and choice probability in area V1, but overall, these neuronal changes were limited and some of the measures employed (e.g. noise correlation) differed between animals.

Previous studies in humans of contrast discrimination perceptual learning under non-roving and roving conditions yielded somewhat contradictory results. Some studies argued against perceptual learning in the absence of flanker stimuli (Adini et al., 2002; Dorais & Sagi, 1997), while a separate study showed evidence for perceptual learning of contrast discrimination under non-roving conditions (Yu et al., 2004). In the latter study, flanker addition did not result in improvements beyond the levels attained under non-roving conditions (they even argued ‘…that contrast roving essentially kills contrast learning regardless of the presence of flankers.’ (Yu et al., 2004)). Our data are in line with those of Yu et al. (2004). We attribute the more limited changes seen under non-roving conditions in Adini et al. (2002) when compared to changes in our monkeys (but also somewhat in Yu et al. (2004)) to differences in training exposure. Our monkeys were generally trained over longer periods of time and performed a larger number of trials than human subjects (who usually performed ~ 4–5 sessions of ~ 2 h). Indeed, performance in both monkeys continued to increase after 4–5 training days under non-roving conditions.

Parameters of neurometric functions only changed in one of the two monkeys under non-roving conditions, and here, training shifted the point of neurometric equality away from the contrast that formed the decision boundary (30% contrast). This was associated with a decrease in the slope of the neurometric function at 30% contrast. Thus, perceptual learning did not cause a systematic sharpening of the tuning curve and did not improve neuronal tuning at the decision boundary in V1 neurons, when assessed with neurometric curve fitting approaches (or more traditional Naka-Rushton contrast fitting). This result is contrary to previous reports, where perceptual learning of orientation differences was investigated in areas V1 or V4 and higher ventral stream areas (Adab & Vogels, 2011; Adab et al., 2014; Raiguel et al., 2006; Yang & Maunsell, 2004), and it differs from our previously published results in V4 (in the same monkeys) where contrast discrimination was investigated (Sanayei et al., 2018), or V1 in cats that were trained in a contrast detection task (Hua et al., 2010). However, neurometric discriminability (AUROC, Fig. 5) itself increased close to the decision boundary under non-roving conditions. This is in line with results from V4 under identical task conditions (Sanayei et al., 2018), but the changes were substantially more modest in area V1 than in V4. Why did neurometric (or Naka-Rushton) fitted contrast tuning change so little (or not at all in monkey 1) when AUROC-based discriminability at the decision boundary did show small but significant changes? Fitted tuning functions are based on the entire range of response values, and small changes of just a few values may fail to result in consistent adjustments of the fitted parameters, as some changes in one parameter can be offset by changes in a different parameter. Determining AUROC-based discriminability bypasses that problem.
In addition to AUROC-based discriminability, changes in choice probability were also found under non-roving conditions, which qualitatively matched those from area V4 (Sanayei et al., 2018), but again changes in V1 were quantitatively smaller.

In V1, unlike in area V4, learning-induced changes in noise correlations were restricted to one of the two monkeys, both of which were also used in our previous V4 study. Thus, results in area V1 differ from data reported for various cortical areas (Gu et al., 2014; Kumano & Uka, 2013; Ni et al., 2018; Sanayei et al., 2018; Uka et al., 2012; Yan et al., 2014). The degree to which changes in noise correlations benefit decoding abilities is debatable (Gu et al., 2011; Minces et al., 2017; Moreno-Bote et al., 2014; Ni et al., 2018; Panzeri et al., 1999, 2022; van Kempen et al., 2017; Yan et al., 2014). In our previous V4 study, we found that a simultaneous change in signal and noise correlation could benefit decoding (Sanayei et al., 2018). This was not uniformly the case under identical task conditions in V1, and our data are thus more similar to the lack of reduction in noise correlations as a function of learning in area V1 that was previously observed in a contour detection task (Yan et al., 2014).

Training at V1 RF locations (parafoveal) occurred after training had been done at V4 RF (peripheral) locations. Could lack of location-specificity of perceptual learning or the sequence of the training at different locations have contributed to the smaller learning effects in V1? Several of our results speak against this possibility. Perceptual learning at the behavioural level was at least as pronounced at parafoveal locations (V1) as it was at peripheral (V4) locations (Fig. 2 and also see Fig. 4 in Chen et al., 2013). Indeed, in monkey 1, the behavioural changes were more pronounced at parafoveal (V1) than at peripheral (V4) locations. Critically, behavioural data suggest that transfer of learning across locations was very limited in both monkeys (Fig. 2B, C). The lack of transfer between locations also argues against the idea that monkeys simply became better at performing a classification task, as this would probably result in larger transfer between stimulus locations at least for easy conditions. The lack of transfer argues for retinotopic specificity. However, this does not mean that learning must occur at the lowest cortical level, as the spatial distance between training locations would allow for the retinotopic specificity to exist in low (V1, V2) as well as mid-level visual areas such as V4 or TEO. If the size of stimuli determined the location of learning (low-level stages for small stimuli vs. mid-level stages for larger stimuli), we would have expected to see similar-sized changes in our V4 and V1 study, which was not the case.

The reverse hierarchy theory of perceptual learning predicts that learning effects in area V4 would be stronger than in V1, that learning should be location specific for difficult discriminations, but transfer between locations should be larger for easy discrimination conditions than difficult conditions (Ahissar & Hochstein, 2004; Hochstein & Ahissar, 2002). While the first two predictions were met in our data set, transfer was not larger for easy vs. difficult conditions (Fig. 2C). This also differs from results obtained in a deep neural-network model of perceptual learning, which supports learning specificity for fine but not easy discriminations, but similar to our data argues for increasing changes in sensitivity with increasing hierarchy (Wenliang & Seitz, 2018).

Under roving-without-flanker conditions, perceptual learning was largely absent for both monkeys. Some improvements occurred at individual test and/or sample contrasts (see Fig. 6), but these were often counteracted by reduced performance at other test and/or sample contrasts (previously described in Chen et al., 2013, 2014). Given the absence of overall behavioural improvements, it is no surprise that neurometric functions, neuronal discriminability, and noise correlation also did not change in this condition. In the absence of flankers, it was not even the case that monkeys were able to adequately adjust their categorization boundaries under roving conditions.

Addition of Flanker Stimuli

Flanker introduction resulted in an initial drop in performance, relative to non-flanker conditions. While learning under roving-with-flanker conditions did occur in both monkeys, overall performance never exceeded that under non-roving conditions (monkey 1) and even stayed below it (monkey 2). Assuming that neuronal changes in V1 help to improve behavioural contrast discrimination, an absence of additional neuronal changes in our study (after non-roving training) is unsurprising given that overall performance does not improve any further. The learning that does take place under roving-with-flanker conditions is likely to be a ‘high-level-conceptual’ learning, namely the ability to heed the sample more and adjust the categorization boundary accordingly from trial to trial, but this was only possible after introduction of the flankers. In a previous study, Law and Gold (2008) recorded from areas MT and LIP while monkeys were being trained on a coarse motion discrimination task. In their training and recording, they changed the axis of motion from day to day, which would induce some roving, as with our design. While they did not find any changes in neural responses in area MT, they did find neural correlates of learning in LIP. In line with the results of Law and Gold (2008), our V1 data show that low-level sensory areas are not critical for representing variable categorization boundaries.

In this study, we used multi-unit activity (MUA) to investigate perceptual learning in V1. This was done to enable analysis of changes as learning progressed. We do not believe that results would have been different had we recorded from single units instead. Continuous recording from single units over extended periods of time is difficult. Spike amplitudes (and waveforms) may change across different recording days. Using a spike template across recording days could then result in dramatic activity changes over the course of training for some channels as spikes drift in and out. To counteract the problem of some neurons (spikes) drifting in and others drifting out, we applied a baseline firing rate matching approach in conjunction with multi-unit thresholding (Methods). In the previous study (Sanayei et al., 2018), we had a few single units that (we believe) were stable across all recording sessions and which produced the same overall results as those seen in our MUA approach. Our approach (continuous recording as learning progressed from multiple simultaneously recorded channels) also allowed us to perform population coding analyses, which is not possible when averaging across single-cell recording data. Using this approach, we found that changes to the structure of population responses increased coding abilities in area V4 (Sanayei et al., 2018), but not V1.

What can we conclude from the differences found between our V4 and V1 data? According to the reverse hierarchy theory of learning, learning-induced neuronal changes occur throughout the visual hierarchy, but are overseen by high-level cognitive processes (Ahissar & Hochstein, 2004; Hochstein & Ahissar, 2002), assumed to occur first in high- and mid-level areas. It proposes that attention mechanisms ‘alert’ the cortex to behaviourally relevant stimuli and that a form of gating is carried out by neuromodulators that operate in task-relevant regions, enabling plasticity. Thus, top-down mechanisms such as attention are responsible for selective alterations of relevant neuronal populations. Under contrast discrimination, this learning signal induced more plasticity in mid-level (V4) than low-level (V1) areas, and some of these low-level changes were present in our data set. The difference seen in our V1 and V4 data suggests that neuronal changes might be even more pronounced in areas TEO and IT than in area V4. Additionally, we might expect changes under roving-with-flanker conditions in high-level areas that represent categorization boundaries, in line with results from LIP, IT, and prefrontal recordings (Freedman & Assad, 2006; Law & Gold, 2008; Meyers et al., 2008).

How to interpret the differences across monkeys? Fig. 2 compares the level of behavioural learning across tasks and also to the previous V4 (peripheral) data. Monkey 2 overall shows a wider range of behavioural improvements (except for the roving-without-flanker data where neither monkey showed changes), even if the final performance can be above (V1 non-roving) or below (V1 roving-with-flankers, V4) the performance of monkey 1. Similar to the changes seen in neuronal V1 data, neuronal changes in monkey 2 in the previous V4 study were also overall larger than in monkey 1 (see e.g. Figure 3C, 4B, 5, 6, 7 in Sanayei et al., 2018). Thus, there seem to be genuine differences in the amount of neuronal change that occurred in the two monkeys which parallel the behavioural changes seen. Whether a larger neuronal change in monkey 2 was the cause for the wider range in behavioural changes (or vice versa) or whether the increased starting performance in monkey 1 limits the amount of neuronal change that could be obtained is unclear. We suspect that it was the higher starting level in monkey 1, maybe having better contrast discrimination naturally, that limits the overall neuronal change possible.

Overall, in our study, the contribution of V1 neurons to improved perceptual performance was comparatively limited, and changes were smaller than those previously seen in area V4 and differed from the results seen when monkeys were required to perform contour integration (Astorga et al., 2022; Yan et al., 2014) or bisection tasks (Crist et al., 2001). Perceptual learning of contrast discrimination is possible under non-roving conditions. Under roving conditions with flanker stimuli in macaque monkeys, we also saw behavioural improvements, but we believe these were a reflection of learning of categorization boundaries, not perceptual improvements per se. The neuronal structures underpinning changes associated with perceptual learning, while partly present in V1, appear to increase with cortical hierarchy. Changes associated with improved categorical decision-making were absent in V1.