Introduction

Understanding how our brain perceives complex, continuous input from the real world has been an attractive problem in cognitive neuroscience over the past few decades. Brain imaging technology provides an opportunity to address this issue. However, revealing brain states during real-world experiences is generally more difficult than during resting state or under simplified abstract stimuli such as controlled and rapidly repeated stimuli (Hasson et al. 2010; Malcolm et al. 2016; Spiers and Maguire 2007). Owing to the complexity of natural situations, how to disentangle stimulus-induced brain activity from spontaneous activity remains an open question. In the present study, we formulate an approach combining several analysis techniques, including spatial ICA, source localization, acoustic feature extraction, and temporal correlation, to examine the oscillatory brain networks elicited in ongoing electroencephalography (EEG) recorded during music listening.

Recently, brain states under naturalistic stimuli such as music and movies have been investigated with functional magnetic resonance imaging (fMRI) (Alluri et al. 2012a, b; Alluri et al. 2013; Burunat et al. 2014, 2016a, b; Liu et al. 2017; Toiviainen et al. 2014), MEG (Koskinen et al. 2013; Lankinen et al. 2014) and EEG (Cong et al. 2013a, b; Daly et al. 2014, 2015; Schaefer et al. 2013; Sturm et al. 2015; Zhu et al. 2019, 2020). Alluri et al. explored the neural correlates of musical feature processing as it occurs in a realistic or naturalistic environment, where eleven participants attentively listened to a whole piece of music (Alluri et al. 2012a, b; Burunat et al. 2016a, b). Using fMRI, they identified brain regions involved in processing musical features in a naturalistic paradigm and found large-scale responses in cognitive, motor and limbic brain networks during continuous processing of low-level (timbral) and high-level (tonal and rhythmic) acoustic features. Burunat et al. studied the replicability of Alluri's findings using a similar methodological approach with a similar group of participants and found that the processing mechanisms for low-level musical features were more reliable than those for high-level features (Burunat et al. 2016a, b). Unfortunately, all BOLD measurements by fMRI are to some degree confounded because they assess brain activity indirectly: they reflect blood flow rather than electrical processes and are therefore limited by poor temporal resolution due to the protracted hemodynamic response (Brookes et al. 2014; Li et al. 2019). Subsequently, Cong et al. applied an analogous correlation-based analysis to ongoing EEG data collected during listening to the same musical stimulus (Cong et al. 2013a, b; Wang et al. 2016). They found that theta and alpha oscillations over central and occipital scalp areas were significantly associated with the processing of high-level (tonal and rhythmic) acoustic features. Many other studies have likewise examined the neural underpinnings of music listening based on sensor-level EEG data (Jäncke et al. 2015, 2018; Markovic et al. 2017), in which different frequency bands were extracted using time–frequency analysis methods and analyzed separately (e.g., event-related synchronizations and oscillatory power changes). These studies showed the influence of different music listening styles on neurophysiological and psychological states as reflected in brain activation. Some sensor-level EEG studies examined the physiological correlates of continuous changes in subjective emotional states while listening to a complete music piece (Mikutta et al. 2012, 2014). Beyond sensor-level analysis, recent studies adopted a mathematical approach (called sLORETA–ICA) combining source localization techniques with ICA to detect independent functional networks during music listening (Jäncke and Alahmadi 2016; Rogenmoser et al. 2016). Although the aforementioned studies investigated oscillatory activation or functional networks during music listening, the specific networks emerging from dynamic processing of musical features are not yet fully understood (Meyer et al. 2006).
For example, there is evidence that timbral feature processing is associated with increased activation in cognitive areas of the cerebellum and in sensory and default mode network cerebrocortical areas, whereas musical pulse and tonality processing recruit cortical and subcortical cognitive, motor and emotion-related circuits (Alluri et al. 2012a, b; Meyer et al. 2006). Thus, we aimed to examine the electrophysiological underpinnings of these networks emerging from dynamic processing of musical features.

Independent component analysis (ICA) is a well-established data-driven approach increasingly used to factor resting-state fMRI data into temporally covarying, spatially independent sources or networks. By contrast, in the analysis of EEG/MEG data, ICA has mainly been applied for artifact rejection. However, spatial Fourier–ICA was proposed for data-driven characterization of oscillatory brain activity in EEG/MEG data. Compared with other ICA methods applied in the context of music listening, the spatial Fourier–ICA used in the current study can automatically extract narrowband oscillations from broadband data without requiring a manually specified frequency band of interest. Spatial Fourier–ICA has already proved fruitful in gaining insights into the electrophysiological underpinnings of brain networks (Kauppi et al. 2013; Li et al. 2018; Ramkumar et al. 2014).

By applying spatial Fourier–ICA in combination with acoustic feature extraction, this study aims at probing the spatial–spectral patterns of brain activity during music listening. In particular, the current study provides an analysis framework for identifying the spatial, temporal, and spectral signatures of brain activation recruited during dynamic processing of musical features. As in our previous music listening studies (Alluri et al. 2012a, b; Cong et al. 2013a, b), we extracted five musical features from the musical stimulus and applied spatial Fourier–ICA to the EEG data to obtain spatial, temporal, and spectral factors. We then correlated the temporal courses with the musical feature time series to identify frequency-specific brain networks emerging from dynamic processing of musical features. We expected spatial Fourier–ICA to reveal functionally meaningful oscillatory EEG sources contributing to musical feature processing.

Material and Methods

Data Acquisition

Participants

Fourteen right-handed, healthy adults aged 20 to 46 years were recruited for the current experiment after signing written informed consent. None of them reported hearing loss or a history of neurological illness, and none had professional musical education. However, many participants reported backgrounds in music-related interests such as learning to play an instrument, producing music with a computer, or singing. Table 1 lists the age and non-professional musical background of each participant. This study was approved by the local ethics committee.

Table 1 Age and musical background of each participant

EEG Data Acquisition

During the experiment, participants were instructed to listen to the music with eyes open. A 512 s long piece of modern tango by Astor Piazzolla was used as the stimulus. Music was presented through audio headphones with approximately 30 dB of noise attenuation. This music clip was of appropriate duration for the experimental setting and exhibits a wide range of variation in several musical features such as dynamics, timbre, tonality and rhythm (Alluri et al. 2012a, b). The EEG data were recorded according to the international 10–20 system with BioSemi electrode caps (64 electrodes in the cap and 5 external electrodes at the tip of the nose, the left and right mastoids, and vertically and horizontally around the right eye). EEG was sampled at a rate of 2048 Hz and stored for off-line processing. The external electrode at the tip of the nose served as the reference, and the EEG channels were later re-referenced to a common average. Data preprocessing was carried out using EEGLAB (Delorme and Makeig 2004). The EEG data were visually inspected for artifacts, and bad channels were interpolated using a spherical spline model. A notch filter at 50 Hz was applied to remove line noise. High-pass and low-pass filters with 1 Hz and 30 Hz cutoff frequencies were then applied, as our previous investigation of the frequency domain revealed no useful information at higher frequencies (Cong et al. 2013a, b). Finally, the data were down-sampled to 256 Hz. To remove EOG artifacts (i.e., eye blinks), ICA was performed on the EEG data of each participant. To additionally remove any DC jumps occasionally present in the data, we differentiated each time series, applied a median filter to reject large discontinuities, and reintegrated the signals (Ramkumar et al. 2012).
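
As an illustration of this last preprocessing step, the following Python sketch implements the differentiate, median-filter, reintegrate idea; the median-filter kernel length is our own assumption, since the paper does not state the filter parameters.

```python
import numpy as np
from scipy.signal import medfilt

def remove_dc_jumps(x, kernel=9):
    """Suppress DC jumps in a 1-D channel time series: differentiate,
    median-filter the derivative to reject large discontinuities, then
    reintegrate (after Ramkumar et al. 2012). The kernel length is an
    assumed value, not taken from the paper."""
    dx = np.diff(x, prepend=x[0])          # first difference; a DC jump
                                           # becomes a single large spike
    dx = medfilt(dx, kernel_size=kernel)   # the spike is rejected here
    return x[0] + np.cumsum(dx)            # reintegrate, restore the offset
```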

Musical Features

Based on the length of the analysis window, musical features can generally be classified into two categories: long-term and short-term features (Alluri et al. 2012a, b; Cong et al. 2013a, b). Five long-term musical features, Mode, Key Clarity, Fluctuation Centroid, Fluctuation Entropy and Pulse Clarity, were examined here. They were extracted using a frame-by-frame analysis approach commonly used in the field of Music Information Retrieval (MIR). The frame duration was 3 s and the overlap between adjacent frames was 67% of the frame length. The chosen frame length is approximately consistent with the length of auditory sensory memory (Alluri et al. 2012a, b). This analysis yielded musical feature time series at a sampling frequency of 1 Hz, in accordance with the short-time Fourier transform (STFT) analysis of the EEG data. Thus, both the musical features and the temporal courses of the EEG had 512 time points. All features were extracted using the MIRtoolbox (Lartillot et al. 2008) in the MATLAB environment.

For completeness, we briefly introduce the five features below. We extracted two tonal and three rhythmic features. Among the tonal features, Mode represents the strength of the major or minor mode, and Key Clarity is a measure of tonal clarity. The rhythmic features were Fluctuation Centroid, Fluctuation Entropy, and Pulse Clarity. Fluctuation Centroid is the geometric mean of the fluctuation spectrum, representing the global repartition of rhythm periodicities within the range of 0–10 Hz and indicating the average frequency of these periodicities (Alluri et al. 2012a, b). Fluctuation Entropy is the Shannon entropy of the fluctuation spectrum, a measure of the noisiness of the fluctuation spectrum (Alluri et al. 2012a, b; Cong et al. 2013a, b). Pulse Clarity, naturally, is an estimate of the clarity of the pulse (Alluri et al. 2012a, b; Cong et al. 2013a, b).
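
The features themselves come from the MIRtoolbox, but the framing logic is easy to make concrete. The Python sketch below shows the 3 s frame, 1 s hop decomposition that yields one feature sample per second; the RMS "feature" is a stand-in for illustration only, not one of the five MIR features.

```python
import numpy as np

def frame_feature(audio, sr, feature_fn, frame_s=3.0, hop_s=1.0):
    """Frame-by-frame feature extraction: 3 s frames advanced in 1 s hops
    (67% overlap) give a 1 Hz feature time series, matching the STFT-based
    EEG time courses. feature_fn maps one frame of samples to a scalar."""
    frame, hop = int(frame_s * sr), int(hop_s * sr)
    starts = range(0, len(audio) - frame + 1, hop)
    return np.array([feature_fn(audio[s:s + frame]) for s in starts])

# Toy usage with RMS energy as a stand-in feature (the actual Mode, Key
# Clarity, fluctuation and pulse features come from the MIRtoolbox):
sr = 44100
audio = np.random.randn(512 * sr).astype(np.float32)  # stand-in for the 512 s stimulus
series = frame_feature(audio, sr, lambda fr: float(np.sqrt(np.mean(fr ** 2))))
print(series.shape)  # about one value per second of audio
```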

Source Localization

For each subject, the brain’s cortical surface was reconstructed from an anatomical MRI template in Brainstorm (Tadel et al. 2011). Dipolar current sources were estimated at cortically constrained discrete locations (source points) separated by 15 mm. Each hemisphere was modelled by a surface of approximately 2000 vertices, so a mesh of approximately 4000 vertices modelled the cortical surface of each subject.

The measured EEG signals are generated by postsynaptic activity of ensembles of pyramidal neurons in the cerebral cortex (Lei and Yao 2011). These cortical pyramidal neurons can be modelled as current dipoles located on the cortical surface (Lin et al. 2006). The scalp potentials generated by each dipole depend on the characteristics of the various tissues of the head and are measured by the EEG scalp electrodes (Tian et al. 2011). Given the geometry of the anatomy and the conductivity of the subject’s head, the time course of each dipole’s activity can be assessed by solving two consecutive problems: the forward problem and the inverse problem.

The forward problem is to model the contribution of each dipole to the signals at the EEG electrodes by solving Maxwell’s equations, taking the geometry and conductivity of the head tissues into account. In this study, a forward solution was calculated for each source point using the symmetric boundary element method (BEM), with a relative conductivity coefficient assigned to each tissue (using the default MNI MRI template).

To solve the inverse problem, the minimum-norm estimate (Lin et al. 2006) was adopted with a loose orientation constraint favoring source currents perpendicular to the local cortical surface (no noise modelling). When computing the inverse operator, (1) the source orientations were constrained to be normal to the cortical surface; (2) a depth weighting algorithm was used to compensate for the bias toward superficial sources; and (3) a regularization parameter \(\lambda^{2} = 0.1\) was used to minimize numerical instability and to obtain a spatially smoothed solution. Finally, an inverse operator G of dimensions \(N_{s} \times N_{c}\) (where \(N_{s}\) is the number of source points and \(N_{c}\) is the number of channels, \(N_{s} \gg N_{c}\)) was obtained to map the data from sensor space to source space. Here, \(N_{s} = 4000\) and \(N_{c} = 64\).
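
For intuition, a minimal sketch of a \(\lambda^{2}\)-regularized minimum-norm operator is given below. It omits the depth weighting and loose orientation constraint described above, and the regularizer scaling is one common convention rather than the exact Brainstorm computation.

```python
import numpy as np

def minimum_norm_operator(L, lam2=0.1):
    """Regularized minimum-norm inverse operator G (Ns x Nc) from a
    leadfield L (Nc x Ns): G = L^T (L L^T + reg * I)^(-1). Depth weighting
    and the loose orientation constraint are omitted for brevity; scaling
    the regularizer by tr(L L^T)/Nc is one common convention."""
    n_c = L.shape[0]
    llt = L @ L.T
    reg = lam2 * np.trace(llt) / n_c
    return L.T @ np.linalg.inv(llt + reg * np.eye(n_c))

# Shapes as in this study: Nc = 64 channels, Ns = 4000 source points.
L = np.random.randn(64, 4000)     # stand-in leadfield (from the BEM model)
G = minimum_norm_operator(L)      # (4000, 64): maps sensor data to sources
```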

Spatial Fourier Independent Component Analysis

Spatial Fourier–ICA was recently proposed to characterize oscillatory EEG/MEG activity in cortical source space (Ramkumar et al. 2012, 2014). The main idea is to apply complex-valued ICA to short-time Fourier transforms of source-level EEG/MEG signals to reveal physiologically meaningful components. For completeness, we briefly introduce the main steps of spatial Fourier–ICA. Figure 1 demonstrates the analysis pipeline based on spatial Fourier–ICA and acoustic feature extraction.

Fig. 1
figure 1

Analysis pipeline based on spatial Fourier–ICA and acoustic feature extraction. Temporal, spectral and spatial profiles of brain patterns were extracted using spatial Fourier–ICA, and musical feature time series were extracted using acoustic feature extraction. Correlation analysis between the temporal courses of the components and the musical time series was then performed to retain music-elicited components, and the spatial maps of the retained components were clustered into several patterns

Time–Frequency Data in Cortical Source Space

Preprocessed EEG data \({{\varvec{Y}}}_{0}\) (\({N}_{c}\) channels × \({N}_{p}\) sampling points) were transformed by STFT to obtain complex-valued time–frequency representation (TFR) data \({{\varvec{Y}}}_{1}\) (\({N}_{c}\), \({N}_{f}\), \({N}_{t}\)). To obtain TFR data in source space, the three-way sensor-space TFR data \({{\varvec{Y}}}_{1}\) were reorganized as a two-way matrix \({\widehat{{\varvec{Y}}}}_{1}\) (\({N}_{c}\), \({N}_{t}\times {N}_{f}\)). The source-space TFR data \({\widehat{{\varvec{Y}}}}_{2}\) were then obtained by left-multiplying the sensor-space data \({\widehat{{\varvec{Y}}}}_{1}\) by the linear inverse operator G (\({N}_{s}, {N}_{c}\)) computed from the minimum-norm estimate inverse solution,

$$\widehat{{\varvec{Y}}}_{2} = {\varvec{G}}\,\widehat{{\varvec{Y}}}_{1}$$
(1)

The two-way data \({\widehat{{\varvec{Y}}}}_{2}\) (\({N}_{s}\), \({N}_{t}\times {N}_{f}\)) can be rearranged as a three-way tensor \({{\varvec{Y}}}_{2}\) (\({N}_{s}\), \({N}_{t}\), \({N}_{f}\)). For the application of spatial Fourier–ICA, we then rearranged the three-way tensor \({{\varvec{Y}}}_{2}\) as a two-way matrix \({{\varvec{X}}}_{0}\) (\({N}_{t}\), \({N}_{f}\times {N}_{s}\)). Thus, each row of \({{\varvec{X}}}_{0}\) contained the complex-valued short-time Fourier coefficients of all source points and frequency bins for one time point (one short-time window), and each column corresponded to a specific frequency bin at a specific source point. In this study, a Hamming window of 3 s length with 2 s overlap between adjacent windows was used, resulting in a sampling rate of 1 Hz along the time dimension. This sampling rate was consistent with the musical feature time series (see Musical Features). The duration of the EEG was 512 s, so \({N}_{t}=512\) time points were obtained. We adopted a 512-point FFT to calculate the STFT, resulting in 256 frequency bins (frequency range: 1–128 Hz) for each window. We selected the frequency bins covering 1–30 Hz (\({N}_{f}=60\)) for further analysis.
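
The sketch below traces this reshaping pipeline from preprocessed sensor data to the matrix \({{\varvec{X}}}_{0}\) with SciPy. The array contents are stand-ins (and the source count is reduced to keep the demo light), and SciPy's default FFT length is used in place of the paper's 512-point FFT, so the bin spacing differs slightly.

```python
import numpy as np
from scipy.signal import stft

fs, n_ch = 256, 64
n_src = 200               # 4000 in the study; reduced to keep the demo light
eeg = np.random.randn(n_ch, 512 * fs)   # stand-in for preprocessed sensor data
G = np.random.randn(n_src, n_ch)        # stand-in for the inverse operator

# 3 s Hamming windows with 2 s overlap give a 1 s hop, i.e. a 1 Hz time
# axis matching the musical feature series.
f, t, Z = stft(eeg, fs=fs, window='hamming', nperseg=3 * fs,
               noverlap=2 * fs, boundary=None, padded=False)
keep = (f >= 1) & (f <= 30)             # retain the 1-30 Hz band
Z = Z[:, keep, :]                       # (n_ch, n_freq, n_time)
n_freq, n_time = Z.shape[1:]

# Map the sensor-space TFR to source space (Eq. 1), then rearrange into the
# two-way matrix X0 (n_time, n_freq * n_src) used by spatial Fourier-ICA.
Z_src = (G @ Z.reshape(n_ch, -1)).reshape(n_src, n_freq, n_time)
X0 = Z_src.transpose(2, 1, 0).reshape(n_time, n_freq * n_src)
```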

Application of Complex-Valued ICA on Reshaped Data

We applied complex-valued ICA (Hyvärinen et al. 2010) to the data \({{\varvec{X}}}_{0}\), treating each row as an observed signal assumed to be a linear mixture of unknown spatial-spectral patterns. Since the dimension of the original data \({{\varvec{X}}}_{0}\) was relatively high for the complex ICA calculation, dimension reduction was required as a preprocessing step. A common linear approach to dimension reduction is principal component analysis (PCA); here we extended PCA to the complex domain by considering complex-valued eigenvalue decomposition (Li et al. 2011). The choice of model order was based on previous studies (Abou-Elseoud et al. 2010; Smith et al. 2009), which suggested a model order slightly larger than the expected number of underlying sources. In this study, we tried different model orders and found 20 to be a reasonable choice, preserving much of the information in the data while reducing the dimensionality of the results. We then extracted 20 independent components using the complex-valued FastICA algorithm, which applies ICA to the STFT of EEG data in order to find more interesting sources than time-domain ICA (Hyvärinen et al. 2010); this method is especially useful for finding sources of rhythmic activity. After complex-valued ICA, a mixing matrix \(\widehat{{\varvec{A}}}\) (\({N}_{t}\times {N}_{ic}\), where \({N}_{ic}=20\) is the number of components) and an estimated source matrix \(\widehat{{\varvec{S}}}\) were obtained. Each column of \(\widehat{{\varvec{A}}}\) represented the temporal course of one independent component (IC). The ICs in the rows of \(\widehat{{\varvec{S}}}\) (\({N}_{ic}\), \({N}_{f}\times {N}_{s}\)) represented spatial-spectral patterns, which can be decomposed into a spatial power map and a power spectrum.
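
A compact sketch of this decomposition step is given below: complex-domain PCA whitening followed by a symmetric complex FastICA iteration in the style of Bingham and Hyvärinen's algorithm. It illustrates the computation rather than reproducing the authors' exact implementation.

```python
import numpy as np

def complex_pca_whiten(X, n_comp):
    """PCA extended to the complex domain: eigendecomposition of the
    Hermitian covariance of X (n_obs x n_samples), keeping n_comp axes."""
    Xc = X - X.mean(axis=1, keepdims=True)
    C = (Xc @ Xc.conj().T) / Xc.shape[1]
    d, E = np.linalg.eigh(C)                          # ascending eigenvalues
    d, E = d[::-1][:n_comp], E[:, ::-1][:, :n_comp]
    K = (E / np.sqrt(d)).conj().T                     # whitening matrix
    return K @ Xc, K

def complex_fastica(Z, max_iter=500, tol=1e-6, seed=0):
    """Symmetric complex FastICA in the style of Bingham & Hyvarinen (2000)
    with nonlinearity G(u) = log(eps + u); Z (n x T) must be whitened."""
    rng = np.random.default_rng(seed)
    n, T = Z.shape
    W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    eps = 0.1
    for _ in range(max_iter):
        W_old = W
        Y = W @ Z                                     # rows: w_i^H z(t)
        u = np.abs(Y) ** 2
        g, dg = 1.0 / (eps + u), -1.0 / (eps + u) ** 2
        W = (Y * g) @ Z.conj().T / T \
            - np.mean(g + u * dg, axis=1)[:, None] * W
        d, E = np.linalg.eigh(W @ W.conj().T)         # symmetric decorrelation
        W = E @ np.diag(1.0 / np.sqrt(d)) @ E.conj().T @ W
        if np.max(np.abs(np.abs(np.diag(W @ W_old.conj().T)) - 1)) < tol:
            break
    return W @ Z, W                                   # sources S_hat, unmixing W

# X0 (n_time, n_freq * n_src) from the previous step; time points are the
# observed mixtures in spatial ICA. The mixing matrix follows as
# A_hat = np.linalg.pinv(W @ K), whose columns are the IC time courses.
# Z, K = complex_pca_whiten(X0, 20)
# S_hat, W = complex_fastica(Z)
```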

Spatial Map, Spectrum, and Temporal Course of ICs

By reshaping each row of \(\widehat{{\varvec{S}}}\), we obtained for each IC a matrix (\({N}_{f},{N}_{s}\)) containing a Fourier coefficient spectrum for each cortical source point. To obtain and visualize the spatial map of an IC, we computed the average of the squared magnitude of the complex Fourier coefficients across frequency bins. Since the distribution of mean squared Fourier amplitude over the whole brain is highly non-Gaussian, we did not apply conventional z-score-based thresholding; instead, we displayed for each component map only the source points with the top 5% of squared Fourier amplitude (Ramkumar et al. 2012). We also computed correlation coefficients between the spatial maps of individual frequency bins and found them to be similar. To obtain and visualize the spectrum of each IC, we calculated the mean of the Fourier power spectrum across the source points exceeding the 95th percentile (Ramkumar et al. 2012). Finally, we took the absolute values of the column of the mixing matrix \(\widehat{{\varvec{A}}}\) corresponding to each estimated IC as its time course, which reflects fluctuations of the Fourier amplitude envelope for the specific frequency and spatial profile.
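
These three IC summaries can be computed in a few lines; the sketch below assumes the rows of \(\widehat{{\varvec{S}}}\) are ordered frequency-major, as in the earlier reshaping step.

```python
import numpy as np

def ic_profiles(s_row, a_col, n_freq, n_src, top=0.05):
    """Spatial map, display mask, spectrum, and time course for one IC.

    s_row : one row of S_hat (length n_freq * n_src, complex, frequency-major)
    a_col : matching column of the mixing matrix A_hat (length n_time)
    """
    P = np.abs(s_row.reshape(n_freq, n_src)) ** 2  # squared Fourier amplitude
    spatial_map = P.mean(axis=0)                   # mean power per source point
    shown = spatial_map >= np.quantile(spatial_map, 1 - top)  # top 5% of sources
    spectrum = P[:, shown].mean(axis=1)            # mean spectrum over top sources
    time_course = np.abs(a_col)                    # Fourier amplitude envelope
    return spatial_map, shown, spectrum, time_course
```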

Stability of ICA Decomposition

To examine the stability of the ICA decomposition, we ran it 100 times for each subject with different initial conditions. For the real-valued case, the ICASSO toolbox (Himberg et al. 2004) has been used to evaluate stability among multiple estimates of the FastICA algorithm (Hyvärinen 1999). The components estimated from all runs were collected and clustered based on the absolute values of the correlation coefficients among the squared source estimates. Finally, a stability index \(\mathrm{Iq}\), which reflects the isolation and compactness of a cluster (Himberg et al. 2004), was computed for each component as follows:

$$\mathrm{Iq} = \overline{S}\left( i \right)_{int} - \overline{S}\left( i \right)_{ext}, \quad i = 1, \ldots, J$$
(2)

where \({\overline{S}\left(i\right)}_{int}\) denotes the average intra-cluster similarity, \({\overline{S}\left(i\right)}_{ext}\) the average inter-cluster similarity, and J the number of clusters. \(\mathrm{Iq}\) ranges from 0 to 1. An \(\mathrm{Iq}\) close to 1 means that the corresponding component is extracted in almost every ICA run, indicating high stability of the ICA decomposition for that component; otherwise, the decomposition is not stable. Correspondingly, if all clusters are well isolated from each other, the ICA decomposition should be stable. In general, there is no established criterion upon which to base a threshold for cluster quality. Given the preliminary nature of this investigation, we considered the decomposition stable if \(\mathrm{Iq}\) was greater than 0.7.

In this study, the ICASSO toolbox was modified to handle the complex-valued case as well. In real-valued ICASSO, the correlation matrix is used as the similarity measure for clustering; since our ICs were complex-valued, we performed the clustering on the correlation matrix among the magnitudes of the ICs (Li et al. 2011). We then took \(\mathrm{Iq}\) as the criterion for examining the stability of the ICA estimate.
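
Given the similarity matrix and cluster labels produced by the (modified) ICASSO clustering, Eq. (2) reduces to the following computation; this is a schematic re-implementation, not the toolbox code.

```python
import numpy as np

def iq_per_cluster(sim, labels):
    """Stability index Iq for each cluster: average intra-cluster similarity
    minus average inter-cluster similarity (Eq. 2). `sim` is the (n, n)
    similarity matrix, here |corr| among IC magnitudes across runs, and
    `labels` holds the cluster label of each of the n estimates."""
    out = {}
    for c in np.unique(labels):
        inside = labels == c
        n_in = int(inside.sum())
        intra = sim[np.ix_(inside, inside)]
        inter = sim[np.ix_(inside, ~inside)]
        # exclude the self-similarity diagonal from the intra-cluster mean
        intra_mean = (intra.sum() - n_in) / max(n_in * (n_in - 1), 1)
        inter_mean = inter.mean() if inter.size else 0.0
        out[int(c)] = intra_mean - inter_mean
    return out
```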

Testing for Stimulus-Related Networks

After ICA decomposition, we obtained 20 × 14 = 280 ICs (14 subjects, 20 components per subject). The challenge then is to determine which of these represent genuine brain responses. In all ICA-based methods, it is a general question which independent components should be retained and which merely reflect noise. Here, we examined which components were significantly modulated by the musical features. We computed the correlation (Pearson’s correlation coefficient) between the time courses of the musical features and the time courses of the ICs (both 512 points long) to select stimulus-related activations. We used the Monte Carlo method and permutation tests presented in our previous research (Alluri et al. 2012a, b; Cong et al. 2013a, b) to calculate the threshold for a significant correlation coefficient; in this method, a Monte Carlo simulation determined the threshold while accounting for multiple comparisons. We kept for further analysis those ICs whose time courses were significantly correlated (p < 0.05) with the time courses of the musical features.
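
A simple version of such a Monte Carlo threshold is sketched below: the observed musical feature is correlated with many random surrogate series and the threshold is read off the null distribution. The surrogate scheme is our illustrative choice and may differ from the exact procedure in the cited work.

```python
import numpy as np

def corr_threshold(feature, n_perm=10000, alpha=0.05, seed=0):
    """Monte Carlo threshold for |Pearson r| between a musical feature and
    IC time courses: correlate the feature with random surrogate series
    and take the (1 - alpha) quantile of |r| under the null. For multiple
    comparisons, alpha can be divided by the number of tests performed."""
    rng = np.random.default_rng(seed)
    n = len(feature)
    f = (feature - feature.mean()) / feature.std()
    surr = rng.standard_normal((n_perm, n))
    surr = (surr - surr.mean(1, keepdims=True)) / surr.std(1, keepdims=True)
    r_null = surr @ f / n                   # Pearson r of standardized series
    return np.quantile(np.abs(r_null), 1 - alpha)

# Example: threshold for a 512-point feature series.
feature = np.random.randn(512)              # stand-in for, e.g., Pulse Clarity
print(corr_threshold(feature))
```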

Cluster Analysis

The selected ICs were each represented by a spatial map, spectrum, and temporal course. Since spatial ICA was carried out on individual-level EEG data, we needed to examine the inter-subject consistency among participants. Because this study focuses on the spatial patterns emerging during free music listening, a group-level analysis was performed by clustering the spatial maps of the selected ICs to evaluate consistency among participants. For reliable clustering, we applied conventional z-score normalization to each spatial map. All spatial maps of the components significantly correlated with musical features were then clustered into M clusters to find spatial patterns common to most participants. For simplicity, a conventional k-means clustering algorithm was used with the Kaufman approach (KA) for initialization, and the minimum description length (MDL) criterion was used to determine the number of clusters M. Afterwards, we counted the number of subjects contributing ICs to each cluster. If the number of subjects in a cluster was less than half of all subjects, the cluster was discarded, as it does not reveal information shared among enough participants. For the retained clusters, the spatial-spectral-temporal information was represented by the centroid of the cluster, the spectra of its ICs, and the subjects whose temporal courses were involved in the cluster.
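
The sketch below outlines this group-level step with scikit-learn, using k-means++ initialization in place of the Kaufman approach and taking the number of clusters as given (the MDL model-order selection is omitted); the input arrays are assumptions about the data layout.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.cluster import KMeans

def cluster_spatial_maps(maps, subj, n_clusters, min_fraction=0.5):
    """z-score each spatial map, cluster with k-means, and keep only the
    clusters containing ICs from more than half of the subjects.

    maps : (n_ics, n_src) spatial maps of the stimulus-related ICs
    subj : (n_ics,) subject index of each IC (assumed data layout)
    """
    Z = zscore(maps, axis=1)                       # normalize each map
    labels = KMeans(n_clusters=n_clusters, n_init=20,
                    random_state=0).fit_predict(Z)
    kept, n_subj = {}, len(np.unique(subj))
    for c in range(n_clusters):
        members = labels == c
        cluster_subjects = np.unique(subj[members])
        if len(cluster_subjects) > min_fraction * n_subj:
            kept[c] = {'centroid': Z[members].mean(axis=0),
                       'subjects': cluster_subjects}
    return labels, kept
```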

Results

Musical Features

Five musical features, Fluctuation Centroid, Fluctuation Entropy, Key Clarity, Mode and Pulse Clarity, were extracted with the MIRtoolbox (Lartillot and Toiviainen 2007) using a 3 s time window with 2 s overlap, resulting in a 1 Hz sampling rate for the temporal courses. The time series of these features had a length of 512 samples, matching the length of the time courses of the EEG components. Figure 2 shows their temporal courses.

Fig. 2
figure 2

Time course of five musical features

Stability of ICA Decomposition

We extracted 20 ICs using the modified ICASSO with 100 runs on each subject’s data and obtained the stability index \(\mathrm{Iq}\). Figure 3 shows the \(\mathrm{Iq}\) values for each participant, which were greater than 0.7 for most ICs. From the clustering perspective, the 20 ICs were well separated from each other for every participant. Thus, the ICA estimates were stable, and the results of the ICA decomposition were satisfactory for further analysis of each participant’s data.

Fig. 3
figure 3

Iq of each component extracted. Different curves represent different participants

Interesting Clusters: Frequency-Specific Networks

After the 85 ICs whose temporal courses were significantly correlated with musical features had been selected, we set the number of clusters to five based on the MDL estimate of the optimal model order, and the spatial maps of these ICs were clustered into five clusters. Three clusters representing frequency-specific networks were retained, since the number of subjects in each was more than half of all subjects. Figure 4 shows one of these clusters, including the centroid of its spatial maps (Fig. 4a), the distribution of the number of subjects across musical features (Fig. 4b), and the spectra of the ICs in the cluster (Fig. 4c). We then computed the correlation coefficients among the spatial maps within each cluster to evaluate the clustering performance. Figure 5 shows this intra-cluster similarity: for cluster#1, the mean correlation coefficient is 0.642 with a standard deviation (SD) of 0.1238; for cluster#2, the mean is 0.7125 and the SD is 0.0572; for cluster#3, the mean is 0.8084 and the SD is 0.0747. This indicates that the spatial patterns are similar across participants. Table 2 lists the participants whose EEG data were correlated with each musical feature in each cluster.

Fig. 4
figure 4

Cluster#1: Beta-specific networks. a The centroid of the spatial maps in the cluster, reflecting a spatial pattern shared across most participants. The bilateral superior temporal gyrus (STG) was activated (from left to right: left hemisphere, top view, right hemisphere). b The numbers of ICs and subjects involved in the cluster, distributed across musical features. c The spectra of the components were located in the delta or beta band. Different curves represent different ICs

Fig. 5
figure 5

Correlation coefficient matrix among the spatial maps of the ICs in each cluster. For cluster#1, the mean correlation coefficient is 0.642 and the standard deviation (SD) is 0.1238; for cluster#2, the mean is 0.7125 and the SD is 0.0572; for cluster#3, the mean is 0.8084 and the SD is 0.0747

Table 2 Participants involved in each cluster across musical features among 14 subjects (from 1 to 14)

Beta-Specific Network

Figure 4 shows the beta-specific brain networks engaged in processing musical features. The spatial map shows that musical features were associated with increased activation in the bilateral superior temporal gyrus (STG). The spectra of the ICs in this cluster show that the beta rhythm (centered around 20 Hz) was involved in generating this network. Thus, a relatively large-scale brain region driven by the beta rhythm was activated in the bilateral STG, with somewhat stronger activation in the right hemisphere than in the left. This beta-specific network was found in seven subjects during free music listening (see the first row of Table 2). Fluctuation Centroid was associated with this brain network in subjects 2, 4, 5, and 12, and the networks of subjects 1, 2, 3 and 13 were correlated with Key Clarity. For Fluctuation Entropy, Pulse Clarity and Mode, one subject each was involved in this cluster. In addition, the number of ICs correlated with the musical features exceeded the number of participants, since there were 20 ICs per subject.

Alpha-Specific Network

Figure 6 displays relatively large brain activity in the bilateral occipital lobe according to the spatial map. As can be seen, the oscillations of this pattern were dominated by the alpha rhythm (centered around 10 Hz), with a few ICs located in the delta band. Alpha-specific occipital networks appeared in eight participants during free listening to music. The second row of Table 2 lists the subjects involved in the networks linked with each musical feature.

Fig. 6
figure 6

Cluster#2: Alpha–Delta-specific networks. a The centroid of the spatial maps in the cluster, reflecting a spatial pattern shared across most participants. As can be seen, the bilateral occipital cortex was activated (from left to right: left hemisphere, top view, right hemisphere). b The numbers of ICs and subjects involved in the cluster, distributed across musical features. c The spectra of the components were located in the alpha or delta band. Different curves represent different ICs

Delta-Beta-Specific Network

Figure 7a illustrates increased activity linked with musical features in the bilateral prefrontal cortex. The spectra (Fig. 7c) show that both beta and delta oscillations recruited these areas across participants. The delta–beta-specific networks were found in eight subjects. Mode was associated with this brain network in subjects 2, 3, 4, 6, 7 and 9, and the networks of subjects 4, 5, 7, 9 and 11 were correlated with Fluctuation Centroid (see the third row of Table 2).

Fig. 7
figure 7

Cluster#3: Delta–Beta-specific networks. a The centroid of the spatial maps in the cluster, reflecting a spatial pattern shared across most participants. As can be seen, the bilateral prefrontal cortex was activated (from left to right: left hemisphere, top view, right hemisphere). b The numbers of ICs and subjects involved in the cluster, distributed across musical features. c The spectra of the components were located in the delta or beta band. Different curves represent different ICs

Discussion

In this study, we investigated the spatial-spectral profiles of brain networks during free music listening. To this end, we proposed a novel method combining spatial ICA, source localization and music information retrieval. EEG data were recorded while participants listened freely to a piece of music. First, we applied the STFT to the preprocessed EEG data. Next, an inverse operator was obtained by source localization and the sensor-space data were mapped to source space. Complex-valued ICA was then performed to extract spatial-spectral patterns, and the stability of the ICA estimates was evaluated using a complex-valued ICASSO. Meanwhile, the temporal evolutions of five long-term musical features were extracted with the widely used MIRtoolbox. Following this, the spatial-spectral ICs related to the music stimulus were selected by correlating their temporal courses with those of the musical features. To examine inter-subject consistency, a cluster analysis was applied to the spatial patterns of the retained ICs. Overall, our results highlight frequency-dependent brain networks during free music listening. The results are consistent with previous findings (Alluri et al. 2012a, b; Cong et al. 2013a, b; Janata et al. 2002).

We found that beta-specific brain networks in the bilateral STG emerged from dynamic processing of musical features (see Fig. 4). The bilateral STG was the region most activated during music listening and was involved in processing the long-term musical features. Interestingly, beta oscillations were enhanced in this bilateral spatial profile (see Fig. 4c). This spatial-spectral pattern appeared more related to Fluctuation Centroid and Key Clarity processing than to Fluctuation Entropy, Mode and Pulse Clarity (see Fig. 4b). The same areas were found in previous fMRI studies, where timbre-related features were correlated with activations in large areas of the temporal lobe (Alluri et al. 2012a, b). In addition, earlier MEG studies demonstrated that cortical activity in the beta band (15–30 Hz) is tightly coupled to behavioral performance in music listening and associated with predicting upcoming note events (Doelling and Poeppel 2015). Since the beta band has been associated with motor and rhythmic processes, listeners may voluntarily engage in motor-related mental activities during segments that invite dancing (Meyer et al. 2006; Poikonen et al. 2018b). For participants who like dancing, music is comprehensive and collaborative: it forms a setting in which dancers produce movements that are coherent with (or intentionally in contrast to) the prevailing sound in terms of rhythm, sentiment, and movement style (Poikonen et al. 2018a). When listening freely, a participant might focus more on the gist of the music than on the sequence of an individual instrument, melody contour, or rhythmic pattern. Importantly, in the current study, no participant was familiar with the presented music stimulus. Thus, the beta-specific brain networks emerging in the bilateral STG could reflect the activation of higher-level brain processes (Pearce et al. 2010; Poikonen et al. 2018b).

We also observed alpha oscillatory visual networks (see Fig. 6), in line with our previous study (Cong et al. 2013a, b). Alpha oscillations play an important role in basic cognitive processes linked to the suppression and selection of attention (Klimesch 2012). Event-related brain activation in the alpha band has been found in studies with sensory or motor tasks and with attention and working memory tasks. For example, alpha event-related synchronization was shown over the leg area of the motor cortex while alpha event-related desynchronization was observed over the hand area when participants performed hand-movement tasks. This compensatory distribution of alpha activity indicates that alpha oscillation in task-irrelevant regions is associated with cortical disengagement (Pfurtscheller 2003). That could explain why alpha-specific power over the visual cortices was larger when attention was focused on the auditory stimuli.

A delta–beta oscillatory network in the prefrontal cortex was also observed during music listening (see Fig. 7). Helfrich et al. argued that the prefrontal cortex provides the structural basis for numerous higher cognitive functions and that its oscillatory dynamics provide a functional basis for flexible cognitive control of goal-directed behavior (Helfrich and Knight 2016). The prefrontal cortex also supports entrainment as a mechanism of top-down control (Helfrich and Knight 2016). Our findings provide evidence that higher cognitive functions with specific rhythms were engaged by continuous, naturalistic music. Janata et al. identified an area in the rostromedial prefrontal cortex as a possible substrate for tonal processing (Janata et al. 2002). In addition, some studies demonstrated that oscillations in the delta and beta bands were instrumental in predicting the occurrence of auditory targets (Arnal et al. 2015; Doelling and Poeppel 2015). Music has been shown to be a powerful stimulus for modulating emotional arousal, and an increase of posterior alpha, central delta, and beta rhythms has been observed during high arousal (Mikutta et al. 2012; Poikonen et al. 2016a, b). This may explain why the delta–beta oscillations in this study appear in the prefrontal cortex (Fig. 7).

From a methodological perspective, most previous studies investigated a single spatial-spectral profile and did not examine the interplay between brain networks and spectral modes. In contrast, we studied the interactions between brain regions and cortical oscillations and found that the brain networks during music listening were frequency-dependent. With our proposed approach for analyzing frequency-specific networks during naturalistic music listening, we can credibly find the spatial-spectral patterns elicited by a musical stimulus. Related approaches have used spatial ICA with various specific techniques to investigate RSNs in MEG data. Nugent et al. proposed a method named MultibandICA to derive frequency-specific spatial profiles in RSNs; however, six frequency bands (delta, theta, alpha, beta, gamma, high gamma) first had to be extracted from the MEG data and concatenated along a certain dimension before ICA was performed on the concatenated data (Nugent et al. 2017). Similar methods were proposed in (Sockeel et al. 2016). In contrast, the approach proposed here is completely data-driven and does not require pre-defining frequency bands. Another important asset of our study is that clustering was applied to the spatial maps to examine inter-subject consistency, with correlation coefficients computed within each cluster. We observed that the individual spatial-spectral profiles in every retained cluster were similar, but the corresponding time courses differed. This differs from event-related potential (ERP) analysis, where temporal ICA components sharing identical spatial profiles might be similar. The differences might result from participants' differing responses during real-world experiences. In the future, we will attempt to develop group spatial ICA for group-level data, in which the individual data are concatenated along the time dimension.

Conclusion

In this study, we introduced a novel framework with several techniques, including Fourier ICA, source estimation, acoustic feature extraction, and clustering, for exploiting the spectral–spatial structure of the brain under naturalistic stimulation. A complex-valued ICA was applied to the source-space time–frequency representation of the EEG data. Following this, a modified ICASSO was performed to evaluate the stability of the ICA estimates, and a cluster analysis was applied to examine inter-subject consistency. The identified networks involved in music perception were in line with previous studies. Furthermore, we found that the brain networks during music listening were frequency-specific: three frequency-dependent networks associated with the processing of musical features were observed.