Introduction

In a 1993 Nature paper, Rauscher et al. (1993) showed that listening to Mozart's sonata K448 enhanced spatiotemporal reasoning. Despite some conflicting results, subsequent studies substantially confirmed the presence of the so-called Mozart effect (Jenkins, 2001). It has been suggested that cognitive enhancements following music listening might be mediated by the arousal/attentional state of participants following aesthetic appreciation of music, rather than being caused by music per se (Thompson et al., 2001). Indeed, beauty perception may be associated with a specific attentional attitude (i.e., the “aesthetic attitude”) focused on the perceptual activity for its own sake (Menninghaus et al., 2019), which diverges from everyday pragmatic, interest-driven perception (Kemp, 1999; Nanay, 2016; Pearce et al., 2016). Accordingly, aesthetic appreciation has been linked to attention enhancement and knowledge acquisition (Schoeller & Perlovsky, 2016; Van de Cruys & Wagemans, 2011). Previous studies demonstrated that behavioural and electrophysiological markers of attention and perceptual learning, such as the mismatch negativity (MMN) response, are enhanced in response to more appreciated visual (Sarasso et al., 2020b) and auditory stimuli (Sarasso et al., 2019, 2021a, 2021b). However, whether or not this positive effect on perceptual learning can be protracted even after an aesthetic experience has not been systematically investigated yet. The present behavioural-EEG study aimed to: (1) test whether perceptual learning of sensory regularities is enhanced following preferred music listening, and (2) investigate the role of attentional deployment (toward the self vs. the environment) in the genesis of cognitive enhancement following music listening.

To this purpose, we designed two experiments, an EEG study (Experiment 1) employing a well-validated roving paradigm (see Methods) to measure the ability of the sensory system to learn statistical regularities of the sensory environment (Garrido et al., 2016; Ostwald et al., 2012), and a behavioural study (Experiment 2) exploring participants’ interoceptive awareness (Ainley et al., 2012) following music listening. To modulate participants’ aesthetic attitude in the present experiments, we employed four classical music pieces played forwards and backwards. We collected subjective aesthetic ratings for each of the eight musical pieces and we compared results obtained after listening to more and less appreciated music, grouped according to subjective ratings (Sarasso et al., 2021b).

In Experiment 1, following music listening, participants attended to a stream of sounds varying in their frequency (Hz), while we recorded their EEG. We then computed the MMN, a differential wave obtained by subtracting the event-related response to standard (i.e., repeated) sounds from that of deviant sounds (Näätänen et al., 2007). The MMN reflects activations in the posterior auditory cortex and the inferior frontal gyrus, and peaks around 100–250 ms post deviancy onset (Molholm et al., 2005). The MMN is commonly considered as a neurophysiological signature of perceptual learning, indicating the update of the predictive models generated by the nervous system to anticipate the sensory regularities occurring in the environment (Lieder et al., 2013). To further investigate the possible modulation of the neural encoding of sensory regularities/surprise induced by preferred music listening, we performed a point-by-point correlation (Novembre et al., 2018; Sarasso et al., 2021b) between EEG single-trial amplitude fluctuations and a theoretic index of perceptual learning, i.e., the Bayesian Surprise (Baldi & Itti, 2010). The Bayesian Surprise describes the potential learning occurring when the brain has “assimilated” the new input to update its predictive model of the environment (Faraji et al., 2018). It is computed as the divergence between prior and posterior beliefs following each sound and can be conceived as the quantity of information conveyed by single stimuli, assuming that the perceiver is an ideal Bayesian observer (see Methods and Online Supplemental Material (OSM)). Thereby, the correlation between brain responses and Bayesian Surprise captures how much the cognitive system is “attuned” to sensory information.

Altogether, MMN amplitude fluctuations and the single-trial correlation with the Bayesian Surprise index represent validated methods to track participants’ learning abilities throughout the experiment. These biomarkers allow us to measure the possible outcomes of the aesthetic attitude, but they do not uncover the neural mechanisms establishing such an attitude. To this aim, we recorded participants’ neural activity during music listening. Previous studies showed increased alpha oscillatory activity immediately after music listening (Jaušovec et al., 2006; Verrusio et al., 2015). Crucially, alpha power at rest is a powerful predictor of participants’ attentional state and ability to learn (Jann et al., 2010; Sigala et al., 2014). It has been demonstrated that approximately 40% of the inter-subject variability in perceptual learning can be explained by the resting-state posterior parietal alpha intrinsic oscillations (Freyer et al., 2013). Therefore, to verify whether listening to preferred music was able to enhance alpha oscillatory activity, we explored resting-state EEG spectral profiles recorded during the last minutes of listening to each musical piece.

We previously suggested (Sarasso et al., 2020a) that the beauty-induced aesthetic attitude toward learning might be paralleled by a transient disinterest in self-related utilitarian concerns (for a discussion on disinterestedness in neuroaesthetics, see Vassiliou, 2020). To test this hypothesis, in Experiment 2, we investigated whether the possible attentional modulations and perceptual learning enhancement following preferred music listening were associated with changes in self-focused attention. Importantly, interoceptive accuracy in heartbeat detection (HBD) appears to be increased by attentional focus on the self (Pollatos et al., 2016) and is commonly employed to assess attentional focus toward the self versus the environment (García-Cordero et al., 2017). Furthermore, it has been demonstrated that priming stimuli that direct attention toward the self (Ainley & Tsakiris, 2013; Ainley et al., 2012; Maister & Tsakiris, 2014; Weisz et al., 1988), or even pathological self-concern in panic/anxiety disorders (for a recent review, see Brewer et al., 2021), can enhance the accuracy in HBD tasks. Therefore, to test whether the beauty-induced aesthetic attitude may affect self-focused attention, we designed an HBD task performed by participants listening to the same musical stimuli employed in Experiment 1.

If the appreciation of preferred musical stimuli is able to induce an aesthetic attitude following music listening, we should observe an enhancement of participants’ attention and perceptual learning abilities in Experiment 1. More specifically, we expect: (1) MMN to be larger after listening to preferred versus non-preferred music in the roving paradigm; (2) the correlation between single-trial EEG response amplitudes and Bayesian Surprise to be stronger after listening to preferred versus non-preferred music, thus indicating a better neural encoding of sensory information (Experiment 1). If an aesthetic attitude is established during preferred music listening, recording participants’ neural activity should allow us to pinpoint the neural mechanism responsible for such an attitude. We predict (3) resting-state alpha power to be larger during listening to preferred music (Experiment 1). Importantly, the enhancement of participants’ attention toward the sensory environment, resulting in amplified perceptual learning, may come at the expense of self-focused attention, thus decreasing interoceptive awareness in the HBD task (Experiment 2). Therefore, we predict (4) the accuracy measured by the HBD task to be lower after listening to preferred music (Experiment 2).

Experiment 1 (EEG experiment)

Methods

Participants

Eighteen right-handed healthy subjects participated in Experiment 1 (11 women, mean age: 25.833 years; SD: ± 1.790; mean years of education: 18.833; SD: ± 1.505). All participants gave their written informed consent to participate in the study, which conformed to the standards required by the Declaration of Helsinki and was approved by the Ethics Committee of the University of Turin (Prot. n. 121724 – 01/03/18). Participants were not compensated for taking part in the experiment.

Sample size (N = 18) was determined a priori to match the average number of participants involved in previous studies highlighting event-related potential (ERP) modulations driven by aesthetic appreciation (Sarasso et al., 2019, N = 22; Sarasso et al., 2020b, N = 13; Sarasso et al., 2021b, N = 18; average = 17.7).

Stimuli and experimental design

Musical stimuli

The eight musical stimuli we employed were composed of four classical music excerpts cropped to a 5-min length, and the same four excerpts played backwards. We selected the following classical music pieces (see OSM): Debussy’s Violin Sonata in G Minor Mvt.1 (allegro vivo); Chopin’s Nocturne in B-Flat Minor Op.9 N.1 (larghetto); Mozart’s Piano Sonata No.11 in A Major K.331 (andante grazioso); Ravel’s Piano Concerto in G major (allegramente). These musical pieces were chosen for their low familiarity for a non-expert public. Indeed, none of our participants could recognize or remember them. As assessed by previous studies (Fritz et al., 2013), reversed music is effective in modulating participants’ aesthetic appreciation while maintaining similar acoustic features (e.g., rhythm and pitch). Moreover, other than direction (i.e., forward or backward), musical stimuli differed by two additional variables: key (i.e., Major or Minor) and tempo (i.e., fast or slow). Musical pieces were paired for key and tempo (two with major and two with minor key; two with fast and two with slow tempo). Stimuli were matched for key, tempo and the direction of presentation to control for whether aesthetic preferences were actually driving the observed effects, or whether instead other objective features of the music stimuli were responsible for the results. As reported in the Data analyses section, we compared ACC and aesthetic judgements (AJs) following forward versus backward, major versus minor and fast versus slow music listening. Furthermore, to avoid any sequence effect due to specific orders of presentation, musical pieces were played in a random order across the eight blocks. Therefore, the sequence of presentation of the eight musical excerpts was different for each subject.

Procedures

The experiment was based on a within-subject design and consisted of eight blocks of two subsequent tasks. First, for each block, participants passively listened to 5 min musical excerpts (i.e., ‘Musical Listening’; subjects passively listened to 40 min of music or reversed music in total in the whole experiment). Subsequently, subjects were asked to express an aesthetic judgment using a Likert scale ranging from 1 to 9 (where 1 corresponded to “The ugliest music I can imagine” and 9 corresponded to “The most beautiful music I can imagine”; Sarasso et al., 2019, 2020b, 2021) by pressing the corresponding key on the computer keyboard. AJs were registered for each block/musical piece (E-Prime 2.0 software, Psychology Software Tools, Inc. USA). After each musical excerpt, subjects listened to a stream of sounds presented according to a roving paradigm while we registered their EEG (see EEG Mismatch Negativity Task). Furthermore, we recorded the resting-state electrophysiological activity during the last 2 min of passive forwards or backwards music listening. Resting state EEG data served as input for the analyses in the frequency domain (see Data analysis). During the whole experiment, participants sat at a table with eyes open, in front of a 53-cm (diagonal) computer screen. The screen centre was aligned with the subjects’ trunk midline. Participants’ arms were resting on the ipsilateral leg during the MMN roving paradigm.

EEG Mismatch Negativity Task (MMN)

The EEG Mismatch Negativity Task is based on a roving auditory paradigm (Baldeweg et al., 2004; Ostwald et al., 2012), with standard and deviant sounds differing in their frequency (Hz). The roving paradigm was designed to investigate the MMN differential wave, a consolidated neurophysiological marker of implicit perceptual learning of sensory regularities (Garrido et al., 2016; Lieder et al., 2013). MMN responses were registered while subjects listened to sounds created with Csound (https://csound.com/). The software allowed us to select the frequency of the synthetic sounds composing the roving sequence. Low-pitch and high-pitch sounds had a frequency of 600 and 1,200 Hz, respectively. Sound sequences were presented with E-Prime 2.0 presentation software (Psychology Software Tools Inc., Pittsburgh, PA, USA). Each sound was played for 50 ms via loudspeakers. Loudness of sounds was set at a comfortable level (≅ 70 dB) and was kept equal across subjects and experiments.

Sound sequences consisted of trains of 288 sounds (duration 50 ms) per run, each lasting 302.4 s (sounds were played at a 1-Hz frequency – block whole duration 288 s + 50 ms duration of each sound). While listening to the sound sequences, participants were asked to remain silent and look straight ahead at a central fixation cross on the computer screen. Differently from traditional oddball paradigms, in roving protocols each stimulus type has the same probability to occur, thus allowing dissociation of genuine effects of Bayesian perceptual learning from rarity-driven modulations. In roving paradigms, different stimuli (high-pitch and low-pitch tones in our case) can represent both Deviant and Standard stimuli (Fig. 1), as opposed to traditional oddball paradigms (Näätänen et al., 1978) where the repeated presentation of standard sounds is occasionally interrupted by the occurrence of physically different deviant sounds. In our case, high-pitch and low-pitch intervals were presented in consecutive trains of alternating pitch with a constant inter-stimulus interval of 1 s (in accordance with previous studies employing similar inter-trial intervals; Ostwald et al., 2012). Any time a change in the stimulation stream occurs (i.e., the transition from a high-pitch to a low-pitch stimulus train or vice versa), the first stimulus of the new train constitutes a Deviant event, since it differs in frequency (Hz) from the preceding train of stimuli, which are therefore considered Standard. The length of the trains of high-pitch and low-pitch intervals was chosen according to a pseudo-random order, so that both the number of presentations and the average value of the Bayesian surprise (see Bayesian perceptual surprise computation) were equal across pitch types (i.e., high or low; Fig. 1). Moreover, the ratio between Standard (80%) and Deviant (20%) trials was kept equal across blocks. 
The average length of consecutive equal sounds was 5.143 ± 3.739, with a maximum of 16 equal consecutive sounds and a minimum of one sound.
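The labelling rule described above can be illustrated with a minimal sketch. The function names and the train lengths below are hypothetical and chosen only for illustration; they are not the sequences used in the experiment. The sketch assumes that the first sound of every new train is a Deviant and every other sound is a Standard, and it also reproduces the run duration from the stated timing.

```python
def make_roving_sequence(train_lengths, first_tone="low"):
    """Concatenate trains of alternating pitch with the given lengths."""
    tones, current = [], first_tone
    for length in train_lengths:
        tones.extend([current] * length)
        current = "high" if current == "low" else "low"
    return tones


def label_roving_sequence(tones):
    """First sound of each new train is a deviant; all others are standards."""
    labels = []
    for i, tone in enumerate(tones):
        if i > 0 and tone != tones[i - 1]:
            labels.append("deviant")
        else:
            labels.append("standard")
    return labels


# Hypothetical train lengths, for illustration only
tones = make_roving_sequence([3, 1, 5, 2])
labels = label_roving_sequence(tones)

# Run duration from the stated timing: 288 sounds, 1-s interval plus 50-ms sound
run_duration_s = 288 * (1.0 + 0.05)
```

Note that a train of length one yields two Deviants in a row, which is consistent with the stated minimum train length of one sound.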

Fig. 1. Experiment 1 – experimental paradigm. The sequence of deviant and standard sounds presented according to a roving paradigm followed 5 min of music (or reversed music) listening. Right after the end of each musical piece we collected aesthetic ratings. The whole procedure was identically repeated eight times, once for each musical stimulus. The EEG was registered during the whole experiment.

Bayesian perceptual surprise computation

For a detailed description of the mathematical computations please refer to the OSM. Similar to previous studies (Baldi & Itti, 2010; Ostwald et al., 2012), to relate single-trial EEG signals to Bayesian perceptual learning, we computed Bayesian Surprise for each single trial using a sequential Bayesian learning algorithm of stimulus probabilities, thus obtaining 288 (i.e., the total number of sounds composing the sequence presented in each session of the EEG MMN task) estimated surprise values. The model assumes that the brain implements a trial-by-trial Bayesian parameter learning scheme starting from an uninformative prior and computes Bayesian Surprise as the divergence between the parameter prior and posterior probability density functions at the single-trial level. Following Ostwald et al. (2012), we used a variant of this model that assumes an exponential forgetting of stimuli that were observed in the distant past (we set the forgetting parameter to τ = 4, which was shown to best describe neural activity; Ostwald et al., 2012). The degree of perceptual learning at the n-th trial is then defined as Bayesian Surprise, i.e., the Kullback–Leibler divergence between the prior and posterior distribution over the probability of observing a high-pitch interval on the n-th trial (Cover & Thomas, 1991). Surprise values are larger when the Deviant stimulus (e.g., a high-pitch sound) comes after longer trains of identical stimuli (e.g., low-pitch sounds), since strong evidence for Standard sounds (e.g., low pitch) accumulates with increased precision and the divergence between prior and posterior probability distributions is therefore larger.
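As a rough illustration of this computation, the sketch below implements a Beta-Bernoulli learner with exponentially decaying counts and returns the Kullback–Leibler divergence between posterior and prior on each trial. Several choices are assumptions rather than the authors' exact formulation, which is given in the OSM: the uniform Beta(1, 1) starting prior, the way forgetting is applied (each past observation is down-weighted by exp(-1/τ) per trial), and the direction of the divergence (posterior relative to prior, as in Baldi & Itti, 2010). All function names are hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def kl_beta(a1, b1, a2, b2):
    """KL( Beta(a1, b1) || Beta(a2, b2) ), in nats."""
    return (log_beta(a2, b2) - log_beta(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))


def bayesian_surprise(is_high, tau=4.0):
    """Surprise per trial for a binary sequence (1 = high pitch, 0 = low pitch)."""
    decay = math.exp(-1.0 / tau)   # assumed form of exponential forgetting
    n_high = n_low = 0.0           # decayed observation counts
    surprise = []
    for x in is_high:
        n_high *= decay
        n_low *= decay
        prior = (1.0 + n_high, 1.0 + n_low)       # Beta(1, 1) when counts are 0
        if x:
            n_high += 1.0
        else:
            n_low += 1.0
        posterior = (1.0 + n_high, 1.0 + n_low)
        surprise.append(kl_beta(*posterior, *prior))
    return surprise


# A deviant after a long train versus a deviant after a short train
after_long_train = bayesian_surprise([0] * 10 + [1])[-1]
after_short_train = bayesian_surprise([0] + [1])[-1]
```

Under these assumptions the deviant that follows ten identical sounds carries more surprise than the deviant that follows a single sound, in line with the qualitative behaviour described above.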

Data analysis

Aesthetic judgements

We first analysed behavioural results from the aesthetic judgment task. This allowed us to identify the preferred musical pieces for each subject. We divided the eight musical pieces into two categories (i.e., preferred and non-preferred), for each participant separately based on her/his ratings. The group-level effect of preference was evaluated by comparing the average ratings of the four preferred and non-preferred musical pieces. Thus, we performed a two-tailed t-test (N = 18) between the preferred and the non-preferred musical pieces. Moreover, to exclude that the results from subsequent analyses were influenced by musical stimulus objective features, rather than by subjective preferences, we compared average aesthetic judgements for forward versus backward music, major versus minor music, and fast versus slow tempo music. Therefore, three additional control two-tailed t-tests were performed to investigate whether key, tempo and direction significantly influenced aesthetic judgements.
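The per-participant split and the group-level comparison can be sketched as follows. The ratings are invented for illustration, the function names are hypothetical, and the handling of tied ratings (here resolved by the original order of the pieces) is an assumption, since the text does not specify it. The t statistic is the standard paired-samples form.

```python
import math
from statistics import mean, stdev


def split_by_preference(ratings):
    """Split pieces into the higher-rated and lower-rated halves.

    ratings maps piece name to a 1-9 aesthetic judgement for one participant.
    Ties are broken by insertion order (an assumption).
    """
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Invented ratings for one hypothetical participant
ratings = {"debussy_fwd": 8, "debussy_bwd": 4, "chopin_fwd": 9, "chopin_bwd": 5,
           "mozart_fwd": 7, "mozart_bwd": 3, "ravel_fwd": 6, "ravel_bwd": 2}
preferred, non_preferred = split_by_preference(ratings)
mean_preferred = mean(ratings[p] for p in preferred)
mean_non_preferred = mean(ratings[p] for p in non_preferred)
```

At the group level, `paired_t` would be applied to the 18 pairs of per-participant means (preferred versus non-preferred).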

Electrophysiological recordings and EEG preprocessing

EEG data were collected during the eight runs of the EEG MMN task and during the last 2 min of music listening, with 32 Ag-AgCl electrodes placed on the scalp according to the extended International 10–20 system and referenced to the nose. Electrode impedances were kept below 5 kΩ. The electro-oculogram (EOG) was recorded from two surface electrodes, one placed over the left lower eyelid and the other placed lateral to the outer canthus of the left eye. EEG activity was recorded with a HandyEEG amplifier (HandyEEG and SystemPlus Evolution, Micromed, Treviso, Italy) and continuously digitized at a sampling rate of 1,024 Hz.

MMN analyses

Data collected during the EEG MMN Task were off-line pre-processed with Letswave6. Data were segmented into epochs of 1 s, including 200-ms pre-stimulus and 800-ms post-stimulus intervals. Epochs were band-pass filtered (0.5–40 Hz) using a fast Fourier transform filter (in accordance with previous literature exploring MMN; Ostwald et al. 2012). Filtered epoched data were baseline corrected using the interval from -0.15 to 0 s as reference. Ocular artefacts were eliminated using Independent Component Analysis (ICA; Jung et al., 2000). ERPs were divided according to the participants' subjective aesthetic appreciation as measured by the aesthetic evaluation following each musical stimulus, resulting in two conditions: preferred and non-preferred. That is, independent of their objective features (e.g., forward or backward presentation), the four most appreciated musical excerpts corresponded to the preferred condition, while the four least appreciated excerpts were assigned to the non-preferred condition. ERPs belonging to the same condition (i.e., preferred or non-preferred) and to the same deviance condition (i.e., Standard vs. Deviant) were then averaged, to obtain four average waveforms for each subject (i.e., Preferred Standard, Preferred Deviant, Non-preferred Standard, Non-preferred Deviant).

MMN responses were obtained by subtracting the ERPs elicited by standard intervals from those elicited by deviant intervals (Näätänen et al., 2007). Importantly, in this analysis we included only the last standard trial for each stimulus train occurring before deviant trials (N = 52 per run; Ostwald et al., 2012). In this way, in the MMN analysis, the number of standard and deviant trials was matched. Single participants’ MMN registered on single channels were entered in subsequent group-level analyses. We were interested in testing for possible differences in MMN registered after listening to more and less appreciated musical stimuli.
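A minimal sketch of this subtraction for one channel is given below. It assumes that each epoch is a list of amplitude samples and that trial labels are available as strings; the function names and toy values are hypothetical. The sketch keeps only the standard that immediately precedes each deviant, and skips a deviant's predecessor when that predecessor is itself a deviant (trains of length one), which is an assumption about how such cases are handled.

```python
def average_waveform(epochs):
    """Average a list of equally long epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def mmn_difference_wave(epochs, labels):
    """Deviant ERP minus the ERP of the last standard before each deviant."""
    deviant_idx = [i for i, lab in enumerate(labels) if lab == "deviant"]
    last_standard_idx = [i - 1 for i in deviant_idx
                         if i > 0 and labels[i - 1] == "standard"]
    deviant_erp = average_waveform([epochs[i] for i in deviant_idx])
    standard_erp = average_waveform([epochs[i] for i in last_standard_idx])
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


# Toy data: three samples per epoch, deviants carry a negative deflection
labels = ["standard", "standard", "deviant", "standard",
          "standard", "standard", "deviant"]
epochs = [[1.0, -1.0, 1.0] if lab == "deviant" else [1.0, 1.0, 1.0]
          for lab in labels]
mmn = mmn_difference_wave(epochs, labels)
```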

In the present study we employed point-by-point statistical tests. Point-by-point analyses are a statistical approach common in EEG studies (Bruno et al., 2020; Harris et al., 2013; Novembre et al., 2018; Ronga et al., 2013), aimed at highlighting possible differences between conditions across the whole epoch time-course, without any a priori assumption. One statistical comparison is performed for each time point composing a waveform. To correct for multiple comparisons, a cluster-based permutation testing approach (1,000 random permutations) was applied to each point-by-point analysis (Maris & Oostenveld, 2007). The resulting clusters of significance represent the outcome of the point-by-point analyses, corrected by permutation testing. For a similar statistical approach, please refer to Sarasso et al. (2019, 2020b).

To test for possible differences in MMN registered after listening to preferred versus non-preferred musical stimuli, we performed a point-by-point t-test (Novembre et al., 2018), with clustersize-based permutation correction for multiple comparisons based on temporal consecutivity and spatial adjacency (1,000 permutations; alpha level = 0.05; percentile of mean cluster sum = 95; minimum number of adjacent channels = 2), on differential MMN waves (Deviant-Standard). The test compared single subjects’ MMN amplitudes corresponding to the preferred and non-preferred conditions, for each channel separately. This allowed us to identify time-clusters containing brainwave amplitudes that significantly differed between preferred and non-preferred conditions.
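The logic of a point-by-point test with cluster-based permutation correction can be sketched for a single channel as follows. This is a generic cluster-mass procedure in the spirit of Maris and Oostenveld (2007), not a reproduction of the Letswave implementation: it ignores spatial adjacency, uses random sign flips of the per-subject difference waves as the permutation scheme, and takes the cluster-forming threshold as an explicit argument (2.110 is the two-tailed critical t for 17 degrees of freedom at alpha = 0.05). All names and the toy data are hypothetical.

```python
import math
import random


def t_values(diffs):
    """One-sample t statistic at each time point; diffs is subjects x time."""
    n = len(diffs)
    out = []
    for column in zip(*diffs):
        m = sum(column) / n
        var = sum((v - m) ** 2 for v in column) / (n - 1)
        out.append(m / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def find_clusters(t, threshold):
    """Return (start, stop, mass) for runs of same-sign supra-threshold points."""
    clusters, start = [], None
    for i in range(len(t) + 1):
        inside = i < len(t) and abs(t[i]) > threshold
        same_sign = inside and start is not None and t[i] * t[start] > 0
        if start is not None and not same_sign:
            clusters.append((start, i, sum(t[start:i])))
            start = None
        if inside and start is None:
            start = i
    return clusters


def cluster_permutation_test(diffs, threshold, n_perm=1000, seed=0):
    """Cluster-mass test on paired differences (condition A minus condition B)."""
    rng = random.Random(seed)
    observed = find_clusters(t_values(diffs), threshold)
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))   # swap condition labels for a subject
            flipped.append([sign * v for v in row])
        masses = [abs(m) for _, _, m in
                  find_clusters(t_values(flipped), threshold)]
        null_max.append(max(masses, default=0.0))
    results = []
    for start, stop, mass in observed:
        p = sum(m >= abs(mass) for m in null_max) / n_perm
        results.append((start, stop, mass, p))
    return results


# Toy data: 18 subjects, 20 time points, a true effect at samples 8 to 11
rng = random.Random(1)
toy = []
for _ in range(18):
    row = [rng.gauss(0.0, 0.5) for _ in range(20)]
    for i in range(8, 12):
        row[i] += 1.0
    toy.append(row)
results = cluster_permutation_test(toy, threshold=2.110, n_perm=1000)
```

Each tuple in `results` gives the first sample of a cluster, the sample after its last one, its summed t-values and its permutation p-value.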

Trial-by-trial correlation with Bayesian Surprise

Pre-processed epochs and Bayesian Surprise values corresponding to single trials constituted the input of a point-by-point trial-by-trial correlation analysis (Novembre et al., 2018; Sarasso et al., 2019, 2020b). For each participant and for each block separately, the analysis computed the correlation between Bayesian Surprise and trial-by-trial (N = 288) fluctuations of the EEG signal amplitude registered at single channels. The outcome of the correlation analysis was eight 1-s long (from 0.2 s pre-onset to 0.8 s post-onset) time series of r-values for each channel and for each subject. R-values were then averaged across blocks according to subjective preference, resulting in two time series, one for preferred and one for non-preferred music. This constituted the input for a group-level two-tailed point-by-point t-test with permutation-based correction for multiple comparisons (1,000 permutations; alpha level = 0.05; percentile of mean cluster sum = 95). The test compared correlation coefficients corresponding to the preferred and the non-preferred conditions. Possible differences in r-values between amplitudes and Bayesian Surprise would indicate a different encoding of sensory surprise during the perception of sound sequences in the two different preference conditions.
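For one channel and one block, the point-by-point trial-by-trial correlation amounts to computing, at every time sample, the Pearson correlation across trials between the EEG amplitude at that sample and the surprise value of the trial. The sketch below uses hypothetical function names and toy numbers (four trials, two samples) purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pointwise_correlation(epochs, surprise):
    """epochs is trials x time samples; returns one r-value per time sample."""
    return [pearson_r(list(samples), surprise) for samples in zip(*epochs)]


# Toy data: at the second sample, amplitude scales negatively with surprise
surprise = [0.1, 0.4, 0.2, 0.8]
first_sample = [1.0, 2.0, 1.5, 0.5]
epochs = [[first_sample[i], -2.0 * s] for i, s in enumerate(surprise)]
r_series = pointwise_correlation(epochs, surprise)
```

In the analysis described above, one such r-value time series would be obtained per block and channel (288 trials each) and then averaged across the blocks belonging to the same preference condition.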

Analyses in the frequency domain

EEG recordings collected during passive music listening served as input to analyse possible differences following preferred and non-preferred music in the frequency domain and to explore whether such difference might be predicted by a specific frequency modulation observed in the last two minutes of music listening. The off-line pre-processing and analyses were conducted with Letswave6 toolbox (Nocions, Ucl. BE) for Matlab (Mathworks, Inc., USA). The EEG data corresponding to each of the eight musical excerpts were divided into 20 segments of 4 s registered at the end of each excerpt (i.e., 80 s at the end of each musical piece; Verrusio et al., 2015). Each segment (i.e., 40 segments) was band-pass filtered (0.5–70 Hz) and notch filtered (50 Hz) using a Fast Fourier transform filter. Filtered epoched data were transformed in the frequency domain using a Fast Fourier Transformation (FFT). Epochs served as input for calculating the power in the various EEG frequency bands from 1 to 50 Hz (delta = 1–3.5 Hz; theta = 4–7.5 Hz; alpha = 8–12.5 Hz; low beta = 13–19.5 Hz; high beta = 20–29.5 Hz; and gamma = 30–50 Hz). We then averaged the 20 FFT transformed segments (each corresponding to a frequency power spectrum) obtaining an average spectrum for each block/musical excerpt. The average spectral profiles were then averaged according to the participants' subjective aesthetic appreciation, resulting in two conditions: preferred and non-preferred. As previously described in the MMN analyses, independently from their objective features (e.g., forward or backward presentation), the four most appreciated musical excerpts corresponded to the preferred condition, while the four least appreciated excerpts were assigned to the non-preferred scenario. 
The four average power spectra corresponding to single blocks/musical stimuli that belonged to the same condition (i.e., preferred or non-preferred) were then averaged, to obtain two average spectral profiles for each single participant (i.e., preferred and non-preferred).
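The segment-wise spectral averaging can be illustrated with a small sketch. It uses a plain discrete Fourier transform from the standard library on synthetic 4-s segments containing a 10-Hz sinusoid, with a sampling rate reduced to 64 Hz so that the direct transform stays fast (the recordings were sampled at 1,024 Hz, and an FFT routine would be used in practice). Filtering is omitted, and the function names and the power normalisation are assumptions made for illustration. With 4-s segments the frequency resolution is 0.25 Hz.

```python
import cmath
import math


def power_spectrum(segment, fs):
    """Return (freqs, power) for non-negative frequencies via a direct DFT."""
    n = len(segment)
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(segment))
        power.append(abs(coeff) ** 2 / n ** 2)
    return freqs, power


def average_spectrum(segments, fs):
    """Average the power spectra of equally long segments."""
    freqs, total = None, None
    for seg in segments:
        freqs, power = power_spectrum(seg, fs)
        total = power if total is None else [a + b for a, b in zip(total, power)]
    return freqs, [p / len(segments) for p in total]


def band_power(freqs, power, low, high):
    """Mean power over the frequency bins between low and high (inclusive)."""
    values = [p for f, p in zip(freqs, power) if low <= f <= high]
    return sum(values) / len(values)


fs = 64  # Hz, reduced for illustration only
segments = [[math.sin(2 * math.pi * 10 * (t / fs) + 0.3 * s)
             for t in range(4 * fs)] for s in range(20)]
freqs, spectrum = average_spectrum(segments, fs)
alpha = band_power(freqs, spectrum, 8.0, 12.5)
theta = band_power(freqs, spectrum, 4.0, 7.5)
low_beta = band_power(freqs, spectrum, 13.0, 19.5)
```

The band limits passed to `band_power` follow the definitions given above (theta = 4–7.5 Hz; alpha = 8–12.5 Hz; low beta = 13–19.5 Hz).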

Single participants’ average power spectra registered on single channels were entered in subsequent group-level analyses. We performed a point-by-point t-test (Novembre et al., 2018), with clustersize-based permutation correction for multiple comparisons based on consecutivity across frequency points and spatial adjacency (1,000 permutations; alpha level = 0.05; percentile of mean cluster sum = 95; minimum number of adjacent channels = 2), on average spectra (preferred vs. non-preferred). The test compared single subjects’ spectral power corresponding to preferred and non-preferred conditions at each frequency point, for each channel separately. This allowed us to compare the power spectral profiles registered after listening to the most and the least appreciated musical pieces, in all the frequency bands of interest. Lastly, in order to obtain a visually intuitive representation of the effect of aesthetic appreciation on EEG spectral profiles, single subjects’ average spectra were merged to obtain a grand-average spectrum corresponding to preferred and non-preferred musical stimuli (see Fig. 2d).

Fig. 2. Experiment 1 – results. In panel A, grand average (N = 18) responses triggered by standard and deviant sounds after listening to preferred and non-preferred music are depicted. Scalpmaps represent average amplitudes across deviancy and preference conditions at 100, 224 and 390 ms post-onset, corresponding to the N1, P2 and N4 peak latencies. MMN differential waveforms (deviant-standard) are represented in panel B and peaked at around 150 ms post-onset. T-values from the point-by-point t-test comparing MMN after preferred and non-preferred music listening are shown at the bottom of the panel. Significant t-values are highlighted in black and grey shaded areas and correspond to significant clusters surviving cluster-based permutation correction. Scalpmaps represent t-values within significant clusters. Panel C shows average r-values between amplitudes at Fz and Bayesian Surprise and t-values from the point-by-point trial-by-trial correlation analysis. Shaded areas correspond to significant clusters surviving cluster-based permutation correction as revealed by the point-by-point t-test comparing average r-values corresponding to preferred vs. non-preferred music listening. The scalpmap represents t-values within the significant cluster. Panel D shows the resting state EEG spectral profile registered during the last 80 s of music listening. Shaded areas correspond to significant clusters surviving cluster-based permutation correction as revealed by the point-by-point t-test comparing the spectrum corresponding to preferred vs. non-preferred music listening. The scalpmap represents t-values within the cluster centred over the upper alpha frequency (10.5 Hz).

Results of Experiment 1

Aesthetic appreciation of musical stimuli

The aesthetic ratings associated with subjectively preferred (mean = 7.393; SD = 0.961) versus non-preferred (mean = 4.683; SD = 1.249) musical stimuli were significantly different (t17 = -13.43; p < 0.001; Cohen’s d = 2.432) as revealed by the two-tailed t-test. The comparison of aesthetic ratings corresponding to musical stimuli with different objective features (key, tempo, direction) revealed that, as expected, only the direction of presentation (forward vs. backward) significantly affected aesthetic ratings (t17 = 4.99; p < 0.001; Cohen’s d = 1.407), with higher ratings for forward (mean = 6.967; SD = 1.242) compared to backward music (mean = 5.18; SD = 1.298). Preferences were significantly affected neither by key (MINOR: mean = 6.298; SD = 1.223; MAJOR: mean = 5.848; SD = 1.045; t17 = -1.882; p = 0.077; Cohen’s d = 1.014) nor by tempo (SLOW: mean = 6.086; SD = 1.119; FAST: mean = 6.06; SD = 1.115; t17 = 0.121; p = 0.905; Cohen’s d = 0.922). Therefore, the subsequent control analyses, run to exclude the possibility that electrophysiological results were affected by mere objective features of musical stimuli rather than by subjective preferences, were restricted to the comparison of results following forward and backward music listening.

MMN results

Point-by-point group level analyses on single subjects’ MMN differential waves revealed the following results. To exclude that the effect on perceptual learning was merely due to the fact that musical pieces were played forward or backwards, we compared MMN responses registered after forward versus backward music listening. This analysis revealed no significant difference between MMN registered after forward versus backward music listening (no cluster survived permutation correction). Crucially, once MMN were grouped according to subjective aesthetic preferences (preferred vs. non-preferred), the same analysis highlighted several clusters of significance: the point-by-point t-test comparing MMN registered after preferred versus non-preferred music listening revealed a large significant cluster extending across frontal electrodes from 100 to 276 ms post-onset and including the MMN negative peak at 120–150 ms post-onset (Fig. 2b). The effect showed its maximum over F3 and F7 and remained significant over fronto-lateral electrodes during the entire time cluster (see scalpmaps in Fig. 2b). A second large frontal cluster (Fpz, Fp1, Fp2, F7, F3, Fz, F4, F8, Fc5, Fc1, Fc2, Fc6, Fcz), centred around 305–364 ms post-onset, corresponded to the latency of the offset of the P3 component and the onset of the N400 component.

Trial-by-trial correlation with Bayesian Surprise

In order to test for the presence of a possible enhancement of the encoding of sensory surprise (i.e., informative value) after preferred music listening, compared to non-preferred music listening, we performed a point-by-point correlation analysis between trial-by-trial fluctuations of EEG amplitudes and Bayesian Surprise, an information-theoretic index of sensory surprise (see Methods and OSM). In agreement with previous studies (Mousavi et al., 2020; Rabovsky et al., 2018), Bayesian Surprise was more strongly correlated with trial-by-trial amplitude fluctuations in correspondence of the MMN peak around 130 ms post-onset (as indicated by r-values local peaks; Fig. 2c), the P300 peak at 230 ms post-onset, and the N400 negative component (395 ms post-onset). A single significant cluster, however, survived permutation cluster-based correction in the point-by-point t-test comparing r-values corresponding to preferred versus non-preferred music listening. This cluster was centred over frontal electrodes (F7, F3, Fz, Fc1, Fc2) at 356–396 ms post-onset and corresponded to the latency and scalp amplitude distribution of the N400 component (Fig. 2c).

Results in the frequency domain

Additionally, to explore possible differences in resting state brain oscillatory activity following preferred versus non-preferred music listening, we computed, for each participant, the average power spectrum registered during the last 2 min of passive music (or reversed music) listening. For each participant and for each channel, we obtained two curves (preferred vs. non-preferred music) indicating the average power across frequencies (see grand-average values in Fig. 2d). These data served as the input for a subsequent group level point-by-point t-test comparing the distribution of power across frequencies following preferred and non-preferred music listening. In accordance with previous findings (Jaušovec et al., 2006; Verrusio et al., 2015), this analysis revealed a significant increase in the power of the posterior upper alpha rhythm following preferred versus non-preferred music listening (see Fig. 2d). As reported in Fig. 2d, the point-by-point t-test revealed a significant cluster surviving permutation cluster-based correction, corresponding to the upper alpha band (10.286–11.067 Hz), peaking at 10.5 Hz and centred over centroparietal electrodes (CP1, CP2, Pz, P4). As can be seen in Fig. 2d, this analysis revealed a change in the spectral profile of the alpha frequency, with increased power in the upper alpha (10–13 Hz), compared to the lower alpha (8–10 Hz), after listening to preferred music.

In sum, we performed Experiment 1 to investigate the neural attunement to sensory surprise after listening to subjectively preferred versus non-preferred music. An increased correlation between surprise and EEG responses, also evidenced by larger MMN responses, after preferred music listening demonstrates the possible presence of a positive aftereffect on perceptual learning dynamics triggered by subjective musical aesthetic pleasure. The aftereffect is preceded by a power shift toward upper alpha frequencies during preferred music listening. Such an increase in upper alpha power is generally related to increased attention to external auditory stimuli (Sadaghiani et al., 2010; see Discussion). To further evaluate the effect of musical preferences on attentional dynamics (toward external stimuli and toward the self) we designed Experiment 2.

Experiment 2

We predicted that the enhancement of participants’ attention toward the sensory environment, resulting in amplified perceptual learning, might come at the expense of self-focused attention. To verify the hypothesis that aesthetic pleasure redirects attentional resources from self-related stimuli to environmental stimuli, we designed Experiment 2, which exploits the well-acknowledged link between performance in heartbeat detection (HBD) tasks and self-directed attentional focus (Pollatos et al., 2016; see also the Discussion).

Methods

Participants

Twenty healthy right-handed subjects participated in the experiment (16 women, mean age: 25.500 years; SD: ± 2.212; mean years of education: 18.550; SD: ± 1.050). This group of participants was entirely different from the sample of Experiment 1 (i.e., participants of Experiment 1 did not take part in Experiment 2). All participants gave their written informed consent to participate in the study, which conformed to the standards required by the Declaration of Helsinki and was approved by the Ethics Committee of the University of Turin (Prot. n. 121724 – 01/03/18). Participants were not compensated for taking part in the experiment.

Sample size (N = 20) was determined a priori through a power analysis based on the effect size obtained in a pilot experiment identical to this behavioural experiment, involving ten additional participants and comparing HBD accuracies (ACC) between preferred and non-preferred conditions (Cohen’s d = 0.613; α = 0.05; required power = 0.95).

Stimuli and experimental design

Procedures

Similarly to Experiment 1, Experiment 2 was based on a within-subject design and consisted of eight blocks of three consecutive tasks: (1) passive listening to musical pieces (i.e., ‘Musical Listening’, as in Experiment 1); (2) an Aesthetic Judgement task identical to that of Experiment 1; (3) an active heartbeat counting task (i.e., ‘Counting Task’) in which participants were asked to estimate the number of heartbeats they felt during different time intervals (Ainley & Tsakiris, 2013; Koeppel et al., 2020), while their actual heartbeat was recorded using an ad hoc device (Astel Electronic Engineering Srl, Torino, Italy). Musical stimuli, passive music listening and aesthetic judgement procedures were identical to those of Experiment 1.

Counting task

Participants were asked to lie down after having listened to and judged each musical excerpt. After the aesthetic judgment was collected, the Counting Task began. In each block subjects were asked to silently count the number of heartbeats during four time intervals. To avoid participants using temporal cues during beat estimation, the duration of the four intervals varied: intervals of 25, 30, 35 and 45 s were used, in accordance with the Mental Tracking Method described by Schandry (1981). The order of presentation of the different time intervals employed in the counting tasks was fully randomised across blocks. The beginning and the end of each interval in the counting phase were signalled vocally by the experimenter. After the stop signal, for each interval, subjects were required to verbally report the number of heartbeats they counted. Subjects were informed about neither the length of the counting intervals nor their performance.

Data analysis

Aesthetic judgements

Analyses on aesthetic ratings were the same as those employed in Experiment 1.

HBD accuracy

Raw ECG data were analysed with the Signal Processing Toolbox in MATLAB. Low-pass (45 Hz) and high-pass (0.5 Hz) filters were applied to the signal to remove noise. The number of R-wave peaks on the ECG trace for each interval in each block was then measured and recorded. ECG recordings were visually inspected for artifacts and the number of R-wave peaks was recounted manually when necessary.

Interoceptive accuracy was derived according to Schandry (1981) as the difference between the reported and the objective (ECG-derived) number of heartbeats, divided by the objective number of heartbeats. This index is an error score and is thus inversely related to HBD accuracy. The resulting accuracy scores were averaged across the four intervals, yielding an average value for each block/musical stimulus for each participant. Average accuracies were grouped according to the subjective appreciation rating collected after each musical excerpt: the preferred group comprised the accuracies corresponding to the four most appreciated musical pieces, while the accuracies corresponding to the remaining four least appreciated pieces were assigned to the non-preferred group. A two-tailed t-test was performed to assess any significant difference in accuracies between the two conditions. Similar control analyses were conducted for the mean heart rate recorded during the listening phase.
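The scoring steps described above can be sketched as follows. The sketch takes the absolute value of the difference between reported and recorded heartbeats, as is common for Schandry-type scores; this is an assumption, since the text only mentions "the difference". It also assumes a paired (within-subject) t statistic and breaks ties in ratings by block order. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def hbd_error(reported, recorded):
    """Error score for one interval: |reported - recorded| / recorded (absolute value assumed)."""
    return abs(reported - recorded) / recorded


def block_error(reported_counts, recorded_counts):
    """Average error score across the counting intervals of one block."""
    return mean(hbd_error(r, o) for r, o in zip(reported_counts, recorded_counts))


def split_by_preference(ratings, block_errors):
    """Mean error for the most vs. least appreciated half of the blocks (one participant)."""
    # Rank blocks from highest to lowest rating; ties keep their original block order.
    order = sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)
    half = len(order) // 2
    preferred = mean(block_errors[i] for i in order[:half])
    non_preferred = mean(block_errors[i] for i in order[half:])
    return preferred, non_preferred


def paired_t(x, y):
    """Paired-samples t statistic with len(x) - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

With eight blocks per participant, `split_by_preference` returns one preferred and one non-preferred score per participant; the two resulting lists of scores are then passed to `paired_t`.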

Results of Experiment 2

Aesthetic appreciation of musical stimuli

As in Experiment 1, the aesthetic ratings associated with subjectively preferred (mean = 7.519; SD = 0.705) versus non-preferred (mean = 5.086; SD = 1.322) musical stimuli were significantly different (t19 = 12.99; p < 0.001; Cohen’s d = 2.906), as revealed by a two-tailed t-test. The direction of presentation of musical stimuli also affected AJs, with significantly higher aesthetic ratings for forward music (mean = 7.065; SD = 0.894) as compared to backward music (mean = 5.527; SD = 1.448; t19 = 4.816; p < 0.001; Cohen’s d = 1.077). Similarly, mean appreciation was also significantly higher for minor key music (mean = 6.665; SD = 1.018) as compared to major key music (mean = 5.927; SD = 1.074; t19 = 4.189; p < 0.001; Cohen’s d = 0.937), suggesting that AJs were also influenced by key. The comparison between AJs corresponding to fast versus slow tempo music revealed no significant difference (t19 = 0.384; p = 0.705; Cohen’s d = 0.086). Accordingly, direction and key were included as control factors in subsequent analyses (to verify whether HBD accuracies were modulated by these features).

HBD accuracy

To evaluate the effect of subjective preferences on HBD accuracies, as in Experiment 1, results were split into two conditions (Preferred and Non-preferred) according to subjective preferences: for each subject separately, HBD accuracy scores registered after listening to the four least appreciated musical stimuli were averaged together and corresponded to the non-preferred condition, while the remaining four most appreciated musical stimuli corresponded to the preferred condition. On average, as represented in Fig. 3, error scores were higher (i.e., HBD accuracies were lower) in the preferred condition (grand average (N = 20) = 0.327; SD = 0.204) compared to the non-preferred one (grand average (N = 20) = 0.298; SD = 0.208). The group-level two-tailed t-test comparing HBD accuracies revealed a significant difference between the preferred and non-preferred conditions (t19 = 2.87; p = 0.01; Cohen’s d = 0.141). To exclude the possibility that this result was influenced by differences in heart rate across preference conditions, we compared heart rate for preferred and non-preferred music. Heart rate was comparable after listening to preferred (mean = 72.526; SD = 10.347) and non-preferred (mean = 70.773; SD = 8.865) music, with no significant difference between the two (t19 = 1.25; p = 0.227; Cohen’s d = 0.274).

Fig. 3

Experiment 2 – results. The histogram represents average error scores (the index is inversely related to heart-beat detection (HBD) accuracies; see Methods) from the HBD task following preferred and non-preferred music listening. Dots represent single subjects’ error scores. Bars represent standard deviations

Additionally, as a further control, to exclude a potential effect of the basic features of the musical stimuli, we compared ACC collected after listening to forward versus backward music and after listening to major versus minor key music, independently of subjective aesthetic preferences. Unlike the modulatory effect of aesthetic preferences, ACCs in the HBD task were significantly modulated neither by the direction of presentation (forward vs. backwards: t19 = 0.533; p = 0.596; Cohen’s d = 0.06), nor by the musical key (major vs. minor: t19 = 1.747; p = 0.085; Cohen’s d = 0.195).

Discussion

In accordance with our initial hypotheses, after listening to preferred music compared to non-preferred music: (1) MMN responses to frequency deviant sounds were larger; (2) the correlation between single-trial N400 amplitudes and Bayesian Surprise was stronger; (3) resting-state upper alpha power was increased; (4) HBD accuracy was lower.

The magnitude of MMN responses is considered to reflect the update of brain predictive representations triggered by surprising sensory inputs (i.e., perceptual learning; Ostwald et al., 2012; Rosch et al., 2019). Therefore, we demonstrated the presence of an aftereffect on perceptual learning triggered by subjective musical preferences. This is further confirmed by the enhanced correlation between N400 amplitudes and Bayesian Surprise, suggesting a greater attunement of the sensory system to changes occurring in the sensory environment. The N400, like the MMN, is thought to reflect the amount of new information that is conveyed by an incoming unpredicted stimulus (Kuperberg, 2016). Accordingly, the N400 brainwave is sensitive to violations of global regularities in repetitive auditory stimulation (Liaukovich et al., 2020). On the other hand, the HBD accuracy results indicate diminished self-focused attention. Indeed, it is well known that self-focused attention modulates interoceptive accuracy in HBD tasks (Pollatos et al., 2016). For example, HBD accuracies can be enhanced by viewing one’s reflection in the mirror (Ainley et al., 2012; Weisz et al., 1988), or one’s own photograph (Maister & Tsakiris, 2014), or words that are related to the self (Ainley & Tsakiris, 2013).

Altogether, in agreement with previous accounts of aesthetic appreciation (Sarasso et al., 2020a), our findings suggest that aesthetic appreciation of music can enhance low-level perceptual learning via the redirection of processing resources from self-focused attention to environmental stimuli, minutes after the aesthetic experience is over. To our knowledge, our study provides the first empirical evidence of a long-lasting learning-oriented attentional modulation induced by aesthetic appreciation. These results confirm previous theoretical predictions suggesting that the physiological, cognitive and affective states characterising aesthetic appreciation do not only depend on the bottom-up processing of beautiful sensory features, but can also be induced by the top-down attitudinal (Stolnitz, 1978), task-related (Jacobsen et al., 2006) or contextual (Pelowski et al., 2017) expectations of the beholder (i.e., the expectancy of an aesthetically pleasurable sensory context). In our study, the beauty-induced effect on perceptual sensitivity might be mediated by a top-down attentional up-weighting of sensory information (Sarasso et al., 2020a), here referred to as aesthetic attitude. Indeed, attention can be operationalized as the responsiveness (i.e., sensitivity) of sensory cortices to sensory surprise driven by the plastic modulation of the synaptic gain in primary sensory cortices (Brown & Friston, 2012). The correlation between brainwaves and Bayesian Surprise measured in Experiment 1 ideally captures such neural dynamics, since it measures the extent to which sensory cortices respond to the information conveyed by sensory stimuli (Ostwald et al., 2012; Sarasso et al., 2021b). Therefore, in metaphorical terms, aesthetic appreciation induced by music might act as a sort of “valve” that turns up the volume of the sensory environment at the expense of self-focused attention.

This interpretation fits well with recent theories suggesting that aesthetic appreciation might be considered as a hedonic feedback (Schoeller & Perlovsky, 2016; Van de Cruys & Wagemans, 2011) on information gains (Gottlieb et al., 2013; Kaplan & Oudeyer, 2004), able to assign value to information and to further direct the sampling of sensory inputs (Sarasso et al., 2020a). The aesthetic attentional attitude might represent the long-lasting effect of this greater attunement with perceptual experiences.

Furthermore, we propose that the modulations in EEG alpha power might represent a possible neural signature underlying the aesthetic attitude. The alpha frequency spectral profile at rest is an effective marker of subjects’ state of arousal and attention. A change in the spectral profile toward the upper alpha band (10–13 Hz) was found to be associated with task performance and cognitive abilities (Angelakis et al., 2007; Klimesch, 1997; Richard Clark et al., 2004), reaction time and speed of information processing (Klimesch et al., 1996) in oddball (Gath et al., 1983) and detection tasks (Lockley et al., 2006), and the maintenance of auditory attention (Dockree et al., 2007). Consistently, the correlation between alpha spectral power shift toward the upper alpha frequency band (10–13 Hz) and cerebral blood flow, which indicates a high baseline energy metabolism associated with increased attention to external stimuli (Jann et al., 2010), is associated with a cortico-thalamic network of brain areas controlling the modulation of attention and preparedness (Alper et al., 2006; Jann et al., 2010). Similarly, the activity in a cingulo-insular-thalamic network responsible for sustained alertness and attention to external stimuli is positively correlated with the global field power of oscillations in the upper alpha band (Sadaghiani et al., 2010). Conversely, upper alpha frequency at rest was shown to be negatively correlated with activations in most brain areas included in the default mode network (Bowman et al., 2017). In sum, upper alpha power is generally regarded as a neural signature of tonic alertness and sustained attention to external auditory stimuli (see Sadaghiani et al., 2010, for a review). Namely, upper alpha might enhance sensitivity by rhythmically clearing the flood of sensory information on a rapid time scale to reduce distraction and hence enhance detection of novel (i.e., surprising) sensory information (Sadaghiani et al., 2010). This fits well with the evidence of enhanced MMN responses to surprising stimuli after more appreciated music and with the hypothesized role of beauty-induced alertness in triggering cognitive enhancements following music listening (Thompson et al., 2001). Together with our results on HBD accuracies, increased upper alpha power might therefore indicate a redistribution of attentional resources from self-focused perception toward novel environmental auditory stimuli following preferred music listening.

To conclude, we suggest that musical aesthetic experiences might trigger the attentional up-weighting of external sensory stimuli at the expense of self-focused attention. These results, if confirmed by future studies, suggest that aesthetic experiences could be employed as an original factor for the study of the neurocognitive mechanisms associated with learning and memory retrieval (Lehmann & Seufert, 2018). Moreover, the role of aesthetic experiences in automatically guiding attentional processes toward learning and change has potentially interesting applications in numerous human activities, for example teaching (Mastandrea et al., 2019), neurorehabilitation and psychotherapy (Beebe, 2010; Roubal et al., 2017; Spagnuolo Lobb, 2018).