Deep-MEG: spatiotemporal CNN features and multiband ensemble classification for predicting the early signs of Alzheimer’s disease with magnetoencephalography

In this paper, we present the novel Deep-MEG approach in which image-based representations of magnetoencephalography (MEG) data are combined with ensemble classifiers based on deep convolutional neural networks. For the scope of predicting the early signs of Alzheimer’s disease (AD), functional connectivity (FC) measures between the brain bio-magnetic signals originated from spatially separated brain regions are used as MEG data representations for the analysis. After stacking the FC indicators relative to different frequency bands into multiple images, a deep transfer learning model is used to extract different sets of deep features and to derive improved classification ensembles. The proposed Deep-MEG architectures were tested on a set of resting-state MEG recordings and their corresponding magnetic resonance imaging scans, from a longitudinal study involving 87 subjects. Accuracy values of 89% and 87% were obtained, respectively, for the early prediction of AD conversion in a sample of 54 mild cognitive impairment subjects and in a sample of 87 subjects, including 33 healthy controls. These results indicate that the proposed Deep-MEG approach is a powerful tool for detecting early alterations in the spectral–temporal connectivity profiles and in their spatial relationships.


Introduction
Deep convolutional neural networks (CNNs) have become very popular in recent years thanks to their ability to decode images [1], video streams [2], and other biomedical signals [3], including 2D and 3D neuroimaging data [4]. The use of CNNs, in fact, offers the possibility to recognize the presence of patterns that other techniques are not able to reveal. In clinical scenarios, when data availability is limited, transfer learning (TL) can be applied to transfer the knowledge, previously learned by the CNN, for solving new problems faster or with different learning solutions [5,6]. By combining the merit of multiple classifiers, ensemble learning can be an additional powerful instrument to improve the performance of predictive models [7][8][9][10].
Thanks to these advantages, the use of CNNs is becoming predominant for the analysis of many types of biomedical images, in particular for decoding encephalographic signals, in which it is important to recognize the various neural activation patterns in relation to diseases. Many studies [11][12][13][14] have applied deep learning to 1D electroencephalography (EEG) signals to reach this goal, exploring different neural network architectures in order to extract the most discriminating features.
In recent years, the analysis of brain activity has been extended to an innovative technique, known as magnetoencephalography (MEG) [15][16][17][18]. MEG is a powerful non-invasive diagnostic tool that possesses the unique advantage of providing a direct measure of the neural activity of the pyramidal neurons in the brain, ensuring high spatial and temporal resolutions (on the order of mm and ms, respectively), and a fast preparation time [18]. A set of MEG recordings, together with the positions of the corresponding sources, encompasses complex high-dimensional information on brain network functioning, which can be difficult to uncover via standard methodologies, as in the case of Alzheimer's disease (AD).
AD is a neurodegenerative disorder and the most common form of dementia worldwide [19]. AD may start decades before the symptoms occur and then gradually evolve, with progressive alteration of cognitive and functional abilities. A precursory condition to AD, named mild cognitive impairment (MCI), is known to indicate a deviation from normal aging and an increased risk of developing dementia in future [20]. MCI, which can be caused by disorders other than AD (such as frontotemporal dementia), can remain a stable condition over time (stable MCI or sMCI), or finally progress to AD (progressive MCI or pMCI). The full-blown AD is a disabling condition resulting from the synaptic disruption of local and large-scale networks of the brain for which there is no cure. Finding new methods to detect pre-symptomatic or prodromal phases, i.e., pMCI, and predict earlier their progression toward AD would facilitate the timely implementation of therapeutic strategies [26].
To date, the most effective approaches for early AD diagnosis involve the use of invasive techniques such as cerebrospinal fluid analysis [21] or positron emission tomography (PET) [22,23], which require performing a lumbar puncture or the use of radioactive tracers, respectively. Non-invasive diagnostic tools are being explored as alternatives [24][25][26][27][28], with MEG representing a promising technique to be taken into account [15][16][17][18]. In the following section, we report an overview of the state-of-the-art methods relative to the analysis of MEG data, with particular reference to early AD diagnosis.

Literature review
Research on deep learning-based analysis of MEG signals is in progress. Deep learning architectures have been applied for artifact removal [29] or to decode the brain responses to a set of visual, auditory, and somatosensory stimuli [30]. In particular, Croce and colleagues [29] derived spectra and 2D topographic representations of the independent components (ICs) of EEG and MEG recordings. The set of ICs was used as input to the convolutional layers of a CNN for the automatic identification of artifacts. The obtained accuracy values outperformed the state-of-the-art feature-based methods for artifact removal. Zubarev et al. [30] used a mixture of k-latent sources based on a linear autoregressive model to represent the MEG time courses. The authors designed two variants of CNNs, 1D and 2D, to process the temporal dynamics of the obtained signals and applied them to decode the brain responses to a set of visual, auditory, and somatosensory stimuli. Recently, Aoe and colleagues [31] proposed a deep neural network, MNet, which is based on EnvNet-v2 [32], an architecture originally designed to classify environmental sounds. By directly analyzing 160 channels of raw MEG signal and the relative powers of six frequency bands, the proposed approach achieved a high level of accuracy in the computer-aided diagnosis of spinal cord injury and epilepsy.
Recent studies addressed the discrimination of mild forms of cognitive impairment from healthy subjects. A shallow neural network was used by Amezquita-Sanchez et al. [33] to distinguish 18 MCI patients and 19 control subjects. MEG frequency sub-bands were characterized via ensemble empirical mode decomposition and permutation entropy measures and then classified via an enhanced probabilistic neural network (EPNN). In the work by Lopez-Martin et al. [34], CNN models were used to decode a large set of randomized features, i.e., mean, median, standard deviation, mean absolute deviation, and range, relative to the mutual information between paired MEG time series and rearranged as 2D matrices. Their method outperformed the classic machine learning approaches in the classification of patients as MCI or healthy subjects.
A very promising tool for the neuroimaging research community is represented by the MEG-based measures of functional connectivity (FC) [15][16][17][35][36][37][38]. FC analysis can be performed in relation to different frequency bands, and it is capable of providing a huge amount of information on the relationship between the brain regions and on their organization into large-scale networks. In fact, a reduced [36,37] or increased [15][16][17] synchronization between the activities of key brain regions has been revealed in AD patients by means of FC, posing MEG-based FC as a promising biomarker to evaluate AD progression. The biomagnetic activity of AD patients, from a spectral perspective, is generally associated with changes in the θ, β, and α bands [39]. Similar patterns have also been observed in the more severe forms of MCI [39], suggesting that MEG-based spectral characteristics are fundamental indices for AD diagnosis. The β band oscillations, in particular, have been proposed as quantitative indicators to predict the progression to AD at the MCI stage [40][41][42][43], while higher synchronization in low-frequency bands, e.g., the θ band, has been observed in MCI groups as compared to the healthy control groups [39,44].
Recently, based on the analysis of the functional strength between pre- and post-conversion MEG scans, Pusil and colleagues [38] succeeded in automatically detecting all the MCI subjects progressing to AD. Their analysis was based on the multivariate connectivity phase estimation (PCE) in five MEG frequency bands using both pre- and post-conversion MEG data. However, the temporal, power spectrum, and topological properties of MEG data seem to carry complementary information [17] that can be further characterized and investigated to detect AD earlier and before the symptoms occur. In fact, the diagnostic prediction of the conversion of MCI to AD, using MEG data of the asymptomatic at-risk stage, i.e., not showing clinical evidence of AD, is still an open problem [15-17, 36, 37]. To our knowledge, neither deep CNNs nor ensemble architectures have been deployed to recognize the FC alterations due to the early phases of AD. We believe that deep learning can help decode the more subtle changes in the brain network activity occurring during the early phases of AD progression, increasing the predictive capability of the automated approaches to the analysis of MEG data [45,46].

The proposed method
In this work, we propose to exploit transfer learning via pre-trained CNNs to decode FC maps, whose interpretation by the human eye is not trivial, with the aim of revealing the topography of the neural activations based on MEG/MRI data. Indeed, the key to uncovering early signs of AD may be hidden in the multidimensional nature of FC maps, which encode not only information on the electrical coupling between spatially distant neuronal populations but also on the way such neuronal activity is spatially distributed and coordinated [15][16][17].
As a continuation of our preliminary study [47], in the present work we take a step further in decoding the electrophysiological anomalies occurring before the conversion to AD (early diagnosis) by proposing a deep learning approach, named Deep-MEG, which exploits the new paradigm of image-based coding of MEG/MRI data and different ensemble classification architectures. The proposed methods include: (1) the extraction of temporal, multi-frequency, and spatial data from MEG recordings and MRI scans in the form of FC maps; (2) the novel coding of the FC maps into deep features by using transfer learning; (3) the implementation of an ensemble learning architecture to cooperatively combine the decisions of multiple predictive modules on the basis of different FC mappings.
Deep-MEG differs from the other existing approaches in the way the FC patterns of the brain network are decoded by 2D CNNs. Because the FC maps used in our approach are topologically organized based on the subject-specific MEG/MRI source reconstruction, Deep-MEG derives information not only on the individual hypo- or hypersynchronization responses, but also on the 2D patterns related to the spatial arrangement of FC values within the maps, an information route on the brain connectome that has so far been unexplored [48]. Pre-trained deep CNNs provide us with the means to decode those FC patterns via transfer learning, i.e., without the need for large datasets to set the network parameters at the training level. Ensemble classifiers, in addition, allowed us to extend our analysis to selected frequency bands, stressing the role of specific spectral profiles that are associated with different levels of AD progression [39][40][41][42][43][44].
To evaluate the predictive performance of the proposed system, we performed quantitative experiments on data from a longitudinal study from the Hospital Universitario San Carlos (Madrid, Spain) [15], involving 54 MCI patients (27 of whom were pMCI patients who progressed toward AD during a 3-year follow-up) and 33 healthy controls (HC).
The rest of the paper is organized as follows. In Sect. 2, we describe the characteristics of the subjects involved in the study and the acquisition process of MEG recordings and MRI scans. In Sect. 3, we describe the methods used in our study: the preprocessing of spatial and temporal data with the band-filtering operations, the different variants of FC indicators and their mapping into FC images, the derivation of deep spatiotemporal features based on AlexNet [49], and the ensemble learning architectures for the two-class and three-class scenarios [50]. Experimental results obtained with the proposed pipeline and a comparison with other existing approaches are reported in Sect. 4. Finally, a discussion of the results is included in Sect. 5.

The case study
A total of 54 MCI patients were recruited from the Hospital Universitario San Carlos (Madrid, Spain) [15], and 33 healthy controls were enrolled in this study after signing an informed consent. In Table 1, we present the demographic characteristics of the participants. All of them were right-handed [51]. The study was approved by the Hospital Universitario San Carlos Ethics Committee (Madrid). A diagnosis of MCI was made on 54 patients according to the National Institute on Aging-Alzheimer Association (NIA-AA) clinical criteria [52]. An additional 33 elderly healthy subjects were included in the present work as controls (HC). Besides meeting the clinical criteria, MCI participants had signs of neuronal injury (hippocampal volume measured by MRI). Thus, they might be considered as "MCI due to AD" with an intermediate likelihood [52]. The MCI patients were cognitively and clinically followed up for approximately 3 years (every six months) and were split into two groups, i.e., sMCI and pMCI, according to their clinical outcome. The sMCI group (n = 27) comprised those participants that still fulfilled the diagnostic criteria of MCI at the end of follow-up. The pMCI group (n = 27) was composed of those subjects that met the criteria for probable AD at the end of the follow-up [53]. None of the participants had a history of psychiatric or neurological disorders (other than MCI or AD). General inclusion criteria were: age between 65 and 80, a modified Hachinski score [54] ≤ 4, a short-form Geriatric Depression Scale score ≤ 5, and T1 MRI within 12 months and 2 weeks before the two MEG recordings without indication of infection, infarction, or focal lesions (rated by two independent experienced radiologists) [55]. Patients were off those medications that could affect MEG activity, such as cholinesterase inhibitors, 48 h before recordings.

MEG recordings
Weighted MEG recordings were acquired with a 306-channel Vectorview system (Elekta Neuromag) at the Center for Biomedical Technology (Madrid, Spain). MEG recordings were collected at the same time of the day in two different periods: (1) pre-conversion stage (54 MCIs and 33 HCs), at baseline (first MEG); (2) post-conversion stage (27 sMCIs and 27 pMCIs), 24 ± 6 months after the first MEG (second MEG).
In both sets of MEG recordings, participants were in an awake, resting state with their eyes closed. For each subject, 5-min task-free data were recorded at a sampling frequency of 1000 Hz. In the present study, the baseline pre-conversion data are used to test the predictive power of the system with reference to the early signs of AD in pMCI subjects, i.e., when the dementia is still not present. Post-conversion data, in which signs of AD are clinically evident in pMCI subjects but not in sMCI subjects, are used for comparative analysis.

Methods
A schematic representation of the pipeline of the proposed platform is given in Fig. 1. The main characteristics of the methods are as follows:
i. MEG recordings and MRI scans are processed to derive temporal, multi-frequency, and spatial data.
The system receives as input a set of MEG recordings and the corresponding MRI scan. Sensor-space MEG signals are filtered in different frequency bands. The MRI scan is used to reconstruct the MEG signal at the neural sources and derive the spatial relationships among the measured MEG time series [56][57][58].
The statistical interdependence between MEG signals measured at two or more spatially separated brain regions is quantified through functional

Artifacts removal, segmentation, and band filtering
MEG recordings were first band-pass-filtered online between 0.1 and 330 Hz. Then, the Maxfilter software (Elekta Neuromag® v2.2, correlation threshold = 0.9, time window = 10 s) was used to remove external noise from the raw MEG data with the temporal extension of the signal space separation method with movement compensation [59]. MEG data were automatically scanned for ocular, muscle, and jump artifacts using the Fieldtrip software [56]. Subsequently, artifacts were visually confirmed and removed by a MEG expert. The remaining artifact-free data were segmented into 4 s segments (epochs), as shown in Fig. 1. An independent component analysis-based procedure was used to remove the heart magnetic field artifact. Prior to source data calculation, MEG signals were filtered into θ (4-8 Hz), α (8-12 Hz), β (12-30 Hz), and γ (30-55 Hz) frequency bands with a 1800-order finite impulse response filter with Hamming window and a two-pass filtering procedure. Because the β band is very wide, for some analyses it was useful to further divide it into β1 (12-20 Hz) and β2 (20-30 Hz).

Source reconstruction and brain parcellation
We employed Freesurfer software (version 5.1.0.21) [60] to obtain the cortex, skull, and scalp segmentation. A regular grid with 10-mm spacing was created in the brain template from the Montreal Neurological Institute (MNI). This set of nodes was transformed to each participant's space using a nonlinear normalization between the native T1 image (whose coordinate system was previously converted to match the MEG coordinate system) and a standard T1 in the MNI space. The forward model was solved with a single-shell method [61] with a unique boundary defined by the inner skull (the combination of white matter, gray matter, and cerebrospinal fluid) taken from the individual T1. We carried out the source reconstruction independently for each subject and frequency band, using a linearly constrained minimum variance (LCMV) beamformer [62]. Beamforming filters were estimated with normalized lead fields, regularized covariance matrices averaged over trials, and a 1% regularization factor (Fig. 1). The neural MEG sources so derived were anatomically parcellated by dividing the cortex into 90 regions of interest (ROIs) according to the AAL atlas [58] as shown in Fig. 1.

Functional connectivity analysis
The spatial, temporal, and band-filtered data extracted through the MEG recordings and the MRI scans were analyzed to quantify the way in which the information is processed within the brain. For each frequency band, FC measures, named phase locking value (PLV) [63] and magnitude coefficient (MC) [64,65], were computed starting from the combinations of pairs of signals derived from the 90 ROIs in which the brain cortex was parcellated. Details on the computation of individual FC measures are reported in Appendix. Based on the time series used for the computation of the FC measures and on the averaging strategy along time, a set of seven different FC indices was obtained as follows.
Two representative sets of the band-filtered time series were considered: the cent signal and the pca signal. For the case of the cent signal, the geometrical centroid was computed for each ROI and the signal obtained from the source closest to the centroid was considered. To obtain the pca signal, the signals measured from the same brain area were subjected to a principal component analysis and the first principal component was considered. With the obtained combinations of signals, we extracted the FC measures for each 4 s segment and finally obtained the average value along the segments.
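As an illustration of the per-segment computation and averaging described above, the following is a minimal sketch that assumes the standard definition of the PLV and assumes that the instantaneous phases of the two representative signals are already available for each segment (their extraction, e.g., via a Hilbert transform, is not shown). The function names are hypothetical, and the exact formulation used in this work is the one reported in the Appendix.

```python
import cmath


def plv(phase_x, phase_y):
    """Phase locking value between two equally long phase series (radians)."""
    if len(phase_x) != len(phase_y) or len(phase_x) == 0:
        raise ValueError("phase series must be non-empty and equally long")
    # Average the unit phasors of the phase difference, then take the modulus.
    total = sum(cmath.exp(1j * (px - py)) for px, py in zip(phase_x, phase_y))
    return abs(total / len(phase_x))


def mean_plv_over_segments(segments_x, segments_y):
    """Compute the PLV on each segment separately and average the results."""
    if len(segments_x) != len(segments_y) or len(segments_x) == 0:
        raise ValueError("segment lists must be non-empty and equally long")
    values = [plv(sx, sy) for sx, sy in zip(segments_x, segments_y)]
    return sum(values) / len(values)
```

With a constant phase lag between the two signals the PLV is 1, whereas phase differences that are spread evenly around the circle give a value close to 0.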
An additional set of FC values was considered, the intra-ROI FC. In this case, the time series of all the sources pertaining to each ROI were used to estimate the FC indices among each combination of seed-test sources and finally a single average value has been extracted for each ROI.
Two different versions of the MC index, named MC ma and MC am , were derived with respect to the 4 s segments of the series. The MC ma was obtained by computing the mean of the Pearson complex correlation values among the segments first and then the absolute value, while the MC am was obtained by computing the absolute value of the complex Pearson correlation for each segment and then averaging the obtained results. The set of seven FC indices so derived are summarized in Table 2.
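The difference between the two averaging strategies can be sketched as follows. This is only an illustration under the assumption that the complex Pearson correlation of each 4 s segment has already been computed; the function names and the numerical values are hypothetical.

```python
def mc_ma(segment_corrs):
    """Mean over segments of the complex correlations first, then absolute value."""
    return abs(sum(segment_corrs) / len(segment_corrs))


def mc_am(segment_corrs):
    """Absolute value of each segment's complex correlation first, then mean."""
    return sum(abs(c) for c in segment_corrs) / len(segment_corrs)


# Made-up complex correlations for three segments of one ROI pair.
corrs = [0.6 + 0.0j, 0.0 + 0.6j, -0.6 + 0.0j]
value_ma = mc_ma(corrs)  # phases partly cancel before the modulus is taken
value_am = mc_am(corrs)  # insensitive to the phase of each segment
```

By the triangle inequality, the first index can never exceed the second: it is reduced whenever the phase of the correlation changes from one segment to the next, while the second index depends only on the per-segment magnitudes.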

Derivation of image-based representations of FC
For each MEG sample and for a given FC index, the measures computed between all possible pairs of ROIs, 90 in total, were topologically arranged into a 90 × 90 matrix. For each frequency band, seven FC maps corresponding to the seven FC indices, i.e., PLV cent, PLV pca, intra-ROI PLV, MC cent ma, MC pca ma, MC cent am, and MC pca am, with pixel values in the range [0,1] were derived. The θ, α, β, and β1 frequency bands were considered in this study, so that a total of 28 FC maps were generated per MEG sample. We rendered each map as a digital image, in which the topological arrangement of FC values and their spatial coordinates on the x-axis and the y-axis carry meaningful information. Such information, which is relative to the intricate communication patterns among neuronal populations, has not been fully investigated by previous MEG studies for AD diagnosis. In fact, MEG-based FC analysis has been addressed, up to now, by means of standard feature-based approaches, without contemplating the spatial information contained in the FC maps [15-17, 36, 37]. In Fig. 2, three examples of MC cent maps in the β band are reported for a control case, an sMCI patient, and a pMCI patient, respectively. Although globally similar, it can be noted that the maps contain sub-regions of hyper- or hypo-activation which provide information on both the amount of activation and the spatial location of the neuronal populations. Once derived for multiple frequency bands and for different FC indices, the set of images provides a direct visual representation of the neuronal activity ready to be decoded. A further image-based representation, named RGB, was generated by combining multiple FC indices and frequency bands, as shown in the graphical representation of Fig. 3. In fact, given the symmetrical nature of the FC maps, data integration at the level of diagonal values and as triangular portions could reduce redundancy or lack of information.
The discriminative power of different combinations of features was checked at the classification level, and the best image-based representation was obtained by integrating the β1 sub-band as the complementary triangular portion of the β band for the MC cent indicator and by substituting the ones on the main diagonal in each color channel with the intra-ROI PLV for the β1 sub-band, as illustrated in Fig. 3. After distributing the two image-based representations of the MC cent, i.e., MC cent ma and MC cent am, in different RGB channels, the RGB images so derived reduce redundancy by integrating multiple levels of data, all pertaining to a similar information content in terms of frequency band. This choice, made at the pre-conversion stage, is consistent with the results reported by previous studies in the field [40]. In particular, β oscillations are believed to maintain the sensorimotor and cognitive state of an individual [41], with the motor performance that is impaired in AD, but not in MCI [42,43], thus confirming the prominent role of the β band in the early detection of AD. We will see, in Sects. 4.1.1 and 4.1.2, that the phase synchrony in the β and β1 bands will be pivotal, also in the form of individual FC maps, to discriminate the sMCI from the pMCI cases at the pre-conversion stage.
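The composition of a single color channel can be sketched as follows. This is an illustrative example only: which triangular portion hosts which band, and how the channels are assigned, are defined in Fig. 3 and are assumptions here, as is the function name.

```python
def compose_channel(fc_beta, fc_beta1, intra_roi_plv):
    """Merge two symmetric FC maps and the intra-ROI values into one channel.

    fc_beta, fc_beta1: square nested lists (n x n) with values in [0, 1].
    intra_roi_plv: list of n values placed on the main diagonal.
    """
    n = len(fc_beta)
    channel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # The diagonal of an FC map is trivially 1: replace it.
                channel[i][j] = intra_roi_plv[i]
            elif i < j:
                # Upper triangle taken from the full-band map (assumed layout).
                channel[i][j] = fc_beta[i][j]
            else:
                # Lower triangle taken from the sub-band map (assumed layout).
                channel[i][j] = fc_beta1[i][j]
    return channel
```

Since each FC map is symmetric, one triangular portion already contains all of its pairwise values, so the other triangle and the diagonal can host additional data without discarding information.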

Deep-MEG feature transfer
The design of convolutional and pooling layers and their integration in deep learning architectures have boosted the performance of digital image classification in so many different scenarios that CNNs have become the preferred choice for image analysis. In fact, with CNNs, it is possible to extract meaningful image features automatically once the parameters of convolutional and pooling layers have been tuned and learned from big datasets of images, a procedure known as deep-feature transfer [5,6]. Among the existing deep neural networks, AlexNet [49] is a large network structure with 60 million parameters and 650,000 neurons, consisting of five convolutional layers, some of them followed by max-pooling layers, and of three fully connected layers with a final 1000-way softmax layer for classification into 1000 classes. For our purposes, the pre-trained AlexNet was used as a feature extractor without retraining the architecture, as shown in Fig. 1. After preliminary tests with other existing pre-trained CNNs providing comparable results, AlexNet was chosen due to its reduced number of intermediate descriptors. In particular, the pooled Conv5 layer was used to characterize the fine-grained structures present in MEG images and to decode, at the appropriate level of abstraction, the relatively simple patterns of interest [6]. For each patient, the image-based representations of FC were resized to 227 × 227 pixels using bicubic interpolation, and then, CNN feature transfer was performed using the pre-trained AlexNet architecture. The pooled Conv5 features so derived represent not only the individual values of the FC indicators for the relative frequency band but also their spatial arrangements and the generated patterns within the FC images. Dimensionality reduction on the features so derived was performed using standard deviation [66].
As the amount of dispersion of the deep features from their mean value should be indicative of higher information content and discrimination capability, only the features with a standard deviation higher than a given threshold were retained. The final subset of relevant features was selected using stepwise regression [68] with the training data of each round of cross-validation.
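The standard deviation criterion can be sketched as follows. The threshold value is not reported here, so the one used in the example is purely hypothetical, as are the function name and the choice of the population (rather than sample) standard deviation.

```python
import statistics


def select_by_std(features, threshold):
    """Return the indices of the feature columns whose std exceeds threshold.

    features: list of per-subject feature vectors of equal length.
    """
    n_features = len(features[0])
    kept = []
    for j in range(n_features):
        column = [row[j] for row in features]
        # Population standard deviation of the j-th feature across subjects.
        if statistics.pstdev(column) > threshold:
            kept.append(j)
    return kept


# Toy example: three subjects, three deep features (made-up numbers).
toy = [[1.0, 0.0, 5.0], [1.0, 2.0, 5.1], [1.0, 4.0, 4.9]]
kept_columns = select_by_std(toy, threshold=0.5)
```

In the toy example only the second feature varies appreciably across subjects and is therefore retained; the subsequent stepwise regression step is not covered by this sketch.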

Classification
Linear discriminant analysis (LDA) and support vector machines (SVMs) were used as classification algorithms [43] with the scope of classifying MEG recordings relative to each patient as HC, sMCI, or pMCI. For each frequency band and FC image, including the RGB images, the overall procedure was applied and the results obtained for each classification task are reported and discussed in Sects. 4 and 5 in terms of accuracy and area under the ROC curve. Leave-one-patient-out (LOPO) was used for cross-validation of results, and the classification was performed on a per-patient basis. Additional cooperative classification rules were designed to aggregate, at the test level, the assignment of base classifiers or ensemble modules trained with the image-based representations of FC. Further details are given in the following paragraph.

Cooperative classification
For the binary classification of MEG recordings as sMCI or pMCI, at the post- and pre-conversion stages, ensemble classifiers were derived to combine the probability scores of individual Deep-MEG modules (see ensemble #1 shown in Fig. 4a). For the more complex classification scenario including the MCI subjects at the pre-conversion stage and the HC subjects, a different ensemble architecture between two suboptimal binary classifiers was derived to aggregate the assessment of the individual Deep-MEG modules (ensemble #2 shown in Fig. 4b). With the second derived architecture, it was possible to detect the early signs of AD within a more complete scenario in which different rates of progression of the cognitive impairment (CI), going from absence of CI (in HC subjects) to pre-symptomatic AD phases (in pMCI subjects), were present.

Deep-MEG ensemble #1
An ensemble architecture was used in which base classifiers receive as input different FC images, also relative to diverse frequency bands, and are trained independently with the same set of patients. The outputs of individual classifiers, i.e., the probability scores of belonging to each class, are combined to derive the final assignment. In particular, the average values among the probability scores assigned to each class by the base classifiers were computed, and the sample was assigned to the class with the maximum obtained value between the two, as shown in Fig. 4a. Ensemble classifiers were obtained for discriminating the pMCI from the sMCI at both the pre- and post-conversion stages, as well as for discriminating HC from each of the two MCI classes.
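The averaging rule described above can be sketched as follows. The function name, the class labels passed as default argument, and the handling of ties (the first class is returned) are assumptions made for the purpose of illustration.

```python
def ensemble_average(prob_scores, classes=("sMCI", "pMCI")):
    """Average the class probabilities of the base classifiers and pick the max.

    prob_scores: one sequence of class probabilities per base classifier,
    ordered as in `classes`.
    """
    n_classifiers = len(prob_scores)
    mean_scores = [
        sum(scores[k] for scores in prob_scores) / n_classifiers
        for k in range(len(classes))
    ]
    # Index of the class with the highest mean probability score.
    best = max(range(len(classes)), key=lambda k: mean_scores[k])
    return classes[best], mean_scores


# Three base classifiers, each trained on a different FC image (made-up scores).
label, means = ensemble_average([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
```

In this example the first base classifier favors the sMCI class, but the averaged scores favor pMCI, which is the class finally assigned.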

Deep-MEG ensemble #2
The cooperative decision-making procedure, shown in Fig. 4b, is based on an AND logic between two base classifiers: one trained with the RGB images on MCI subjects for the discrimination of sMCI from pMCI patients and the other trained with the PLV cent map in the θ band, for the discrimination of HC from MCI patients. At the test level, a consensus mechanism is applied between the two classifiers so that the sample is assigned to the pMCI class only if both classifiers agree, i.e., if the probability scores of belonging to the pMCI and MCI classes are both higher than 0.5. The sample is assigned to the HC class if the probability score of belonging to the HC class is higher than 0.5; otherwise, the sample is assigned to the sMCI class.
The θ band has been chosen, in the present ensemble architecture, for the discrimination of HC from MCI patients due to its discrimination capability in preliminary tests and because changes in the θ band have been reported in the literature as indicative of MCI [39,44]. In particular, the studies conducted by Lopez et al. [39,44] outlined a hyper-synchronization of the θ band in MCI patients compared to the control subjects in resting state, which was also related to hippocampal atrophy and to lower global cognitive status. The increase in θ power is also considered as the most stable pattern of EEG activity in MCI patients [39], a claim that has been confirmed by the present and the other studies on MEG signals [39,44].
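The consensus rule can be written compactly as in the following sketch. It assumes that the HC/MCI classifier returns complementary probability scores, so that the HC score equals one minus the MCI score; the function and argument names are hypothetical.

```python
def ensemble_and(p_pmci, p_mci, threshold=0.5):
    """Three-class decision from two binary classifiers combined with AND logic.

    p_pmci: score of the sMCI/pMCI classifier for the pMCI class.
    p_mci:  score of the HC/MCI classifier for the MCI class.
    """
    p_hc = 1.0 - p_mci  # assumption: complementary scores for HC and MCI
    if p_pmci > threshold and p_mci > threshold:
        return "pMCI"  # both classifiers agree
    if p_hc > threshold:
        return "HC"
    return "sMCI"
```

For instance, a sample with a high pMCI score but a low MCI score is assigned to the HC class, because the consensus between the two classifiers is not reached and the HC score exceeds 0.5.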

Results
In this section, the obtained results are presented for different classification scenarios. First, we report the results obtained for the classification of MEG recordings of the MCI subjects as sMCI or pMCI with respect to two classification approaches: (1) individual Deep-MEG classifiers based on different FC maps and on the RGB images and (2) Deep-MEG ensemble #1. Finally, for the early detection of AD within a more complete scenario also including HC subjects, the results obtained with ensemble #2 are reported. Results are labeled as post-conversion when the MCI data include pMCI patients who met the criteria for probable AD and as pre-conversion when the pMCI patients were still clinically indistinguishable from the sMCI patients.

Individual Deep-MEG modules
The results obtained with individual FC maps are first considered, and the frequency bands and image-based representations of FC that are relevant for each classification task are reported. In Fig. 5, we show the accuracy obtained at the post-conversion (a) and pre-conversion stages (b, c) for the classification of MEG recordings as sMCI or pMCI using LOPO. The results obtained with LDA and SVM and relative to the three best FC maps are reported for both scenarios in Fig. 5a, b. For the post-conversion stage, when the signs of AD are clinically evident in pMCI patients, accuracy values of 0.78, 0.87, and 0.70 were obtained with the θ, α, and β1 bands, respectively, and the PLV pca, PLV cent, and MC cent am FC maps, using LDA. The obtained values are reported together with the values of accuracy obtained using SVM (Fig. 5a). For the pre-conversion stage (Fig. 5b), accuracy values of 0.74, 0.78, and 0.76 were obtained with the β, β1, and β bands, respectively, and the MC cent ma, MC cent am, and MC cent am FC maps. For both post- and pre-conversion stages, the rest of the FC maps or frequency bands provided lower results when decoded individually by a single classifier.
For the pre-conversion stage, using a single classifier based on the RGB images as image representation of FC, accuracy values of 0.89 and 0.87 were obtained, respectively, with LDA and SVM. The confusion matrix obtained with LDA is reported in Fig. 5c. The FC indicators and relative frequency bands used to derive the RGB images are summarized in Table 3. The RGB-based Deep-MEG model was trained using three deep-features, on average, selected in each round of LOPO cross-validation.
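The LOPO evaluation used throughout this section can be illustrated with a minimal sketch. It reads LOPO as a leave-one-out scheme over subjects with one feature vector per subject, which is an assumption, and it swaps in a toy nearest-class-mean classifier on a single scalar feature in place of the LDA or SVM modules and the deep-feature selection of the paper. The function names are hypothetical. The point it shows is that any selection or fitting step runs on the training fold only.

```python
def lopo_accuracy(samples, labels, fit, predict):
    """Leave-one-out accuracy: each subject is held out once.

    `fit` sees only the training fold, so any feature selection placed
    inside it never touches the held-out subject.
    """
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)


def fit_nearest_mean(xs, ys):
    """Toy stand-in classifier (not LDA/SVM): store the mean of each class."""
    return {
        c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c)
        for c in sorted(set(ys))
    }


def predict_nearest_mean(model, x):
    """Assign the class whose stored mean is closest to x."""
    return min(model, key=lambda c: abs(x - model[c]))


# Illustrative values only, not data from the study
toy_features = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
toy_labels = ["sMCI", "sMCI", "sMCI", "pMCI", "pMCI", "pMCI"]
toy_accuracy = lopo_accuracy(
    toy_features, toy_labels, fit_nearest_mean, predict_nearest_mean
)
```

In the paper's setting, `fit` would wrap both the per-fold selection of deep features and the training of the LDA or SVM classifier.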

Deep-MEG ensemble #1
Ensemble decisions were obtained by aggregating the probability scores of individual base classifiers (i.e., individual classifiers, LDA or SVM, of the Deep-MEG modules receiving as input a single FC map), as described in Sect. 3.6 and as illustrated in Fig. 4a. The FC maps performing best with individual Deep-MEG modules in the post- and pre-conversion stages were chosen to derive the corresponding ensembles. The FC indicators used to derive the image-based representations, or FC maps, are reported in Table 3: MC cent am in the β and β1 bands was used in the pre-conversion ensemble; PLV cent, PLV pca, and MC cent am in the α, θ, and β1 frequency bands, respectively, were selected for the post-conversion ensemble.
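The aggregation step can be sketched as follows, assuming that the probability scores are combined by a plain average and that the averaged score is thresholded at 0.5. The threshold, the function name, and the example scores are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean


def ensemble_average(prob_pmci_per_module, threshold=0.5):
    """Average per-module pMCI probabilities and threshold the mean.

    prob_pmci_per_module[m][i] is the probability that module m assigns
    to the pMCI class for recording i. The 0.5 threshold is an assumption.
    """
    n_recordings = len(prob_pmci_per_module[0])
    mean_scores = [
        mean(module[i] for module in prob_pmci_per_module)
        for i in range(n_recordings)
    ]
    labels = ["pMCI" if s >= threshold else "sMCI" for s in mean_scores]
    return mean_scores, labels


# Hypothetical scores from two base classifiers (illustrative values only)
scores_module_a = [0.80, 0.40, 0.55]
scores_module_b = [0.60, 0.20, 0.35]
mean_scores, labels = ensemble_average([scores_module_a, scores_module_b])
```

With these toy scores, the third recording is labeled pMCI by the first module alone but sMCI by the averaged decision, which shows how a second map can overturn a borderline call.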
The contribution of the β and β1 bands at the pre-conversion stage confirms their role in the discrimination of sMCI from the pMCI cases [40][41][42][43]. Regarding the post-conversion stage, our analyses revealed that the phase synchrony in the α band serves as a predominant sign of AD only in symptomatic patients. In fact, a hyper-synchronization in the α band between the anterior cingulate region and the temporo-occipital region of pMCI patients as compared to sMCI was also reported by previous studies [16,17] and seems to be correlated with cognitive performance. Two mechanisms may underlie such hyper-synchronization: (1) a compensation mechanism in response to the presence of compromised brain circuits in other brain areas; (2) the loss of GABAergic synapses caused by the accumulation of β-amyloid plaques, which leads to aberrant relationships between the areas affected by AD and is hence the result of an inhibitory deficit [16]. The presence of the phase synchrony map in the θ band confirms, as reported in Sect. 3.7.2, its role in the recognition of MCI and, in this case, in the discrimination of sMCI from MCI progressed toward AD, i.e., pMCI at the post-conversion stage.
Fig. 5c Confusion matrix obtained with the RGB images at the pre-conversion stage using an SVM classifier
The results obtained for the classification of MEG recordings as sMCI or pMCI are shown in Fig. 6a in terms of accuracy for LDA and SVM base classifiers and in Fig. 6b and c in terms of confusion matrices. For the post-conversion stage, accuracy values of 0.93 and 0.85 were obtained with LDA and SVM, respectively (Fig. 6a). In Fig. 6b, the confusion matrix relative to LDA indicates a sensitivity of 0.93 for the pMCI cases, which correspond to patients with evident signs of AD, and a specificity of 0.93. For the overall ensemble architecture, 42 deep features were automatically selected, on average, to train the base classifiers.
For the pre-conversion stage, the histogram in Fig. 6a indicates the accuracy values of 0.89 and 0.87 obtained with the ensemble classification of MEG recordings as sMCI or pMCI with LDA and SVM, respectively. In this case, sensitivity of 0.89 and specificity of 0.78 were obtained, as reported in the confusion matrix in Fig. 6c, and 16 deep features were automatically selected, on average, to train the base classifiers.
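For reference, the accuracy, sensitivity, and specificity quoted above follow from the entries of a binary confusion matrix in the usual way. The sketch below takes pMCI as the positive class, as in the text; the counts are invented for illustration and are not those of Fig. 6.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: pMCI recordings classified as pMCI / sMCI
    fp/tn: sMCI recordings classified as pMCI / sMCI
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # fraction of pMCI detected
        "specificity": tn / (tn + fp),   # fraction of sMCI correctly kept
    }


# Made-up counts, used only to exercise the formulas
example_metrics = binary_metrics(tp=8, fn=2, fp=1, tn=9)
```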

Deep-MEG ensemble #2
The aggregation method, named ensemble #2, was used for the classification scenario that included the HC subjects. An AND logic was used to derive the final assessment at the pre-conversion stage, as described in Sect. 3.7 and shown in Fig. 4b. In addition to the RGB images, which encode multiple FC indicators in the β and β1 bands, the information of the PLV cent map in the θ band was also taken into account. The accuracy results obtained with ensemble #2 are reported in Fig. 7 relative to LDA.
For the three-class classification, an accuracy of 0.74 was obtained (Fig. 7a). For the overall ensemble, eight deep features were automatically selected, on average during the rounds of LOPO cross-validation, to train both base classifiers. The ROC curves relative to the individual base-ensemble classifiers, each trained to solve a specific binary classification task, are reported in Fig. 7b. An AUC of 0.90 was obtained for the classification of MEG recordings as sMCI or pMCI by the Deep-MEG module based on RGB images, and an AUC of 0.83 was obtained for the classification of MEG recordings as HC or MCI using the PLV cent map in the θ band (Fig. 7b). Similar results were obtained using SVM. With the PLV cent map in the θ band alone, when received as input by a single Deep-MEG-based classifier, an accuracy of 0.80 was obtained. For the discrimination of pMCI cases from the rest of the cases, i.e., HC + sMCI, an accuracy of 0.87 and a detection sensitivity of 0.82 were obtained with ensemble #2, as shown by the confusion matrix in Fig. 7c.
Table 3 Image-based representations of FC, or FC maps, received as input by the ensemble classifiers and by the RGB images used for the classification of MEG recordings as sMCI or pMCI at the post- and pre-conversion stages. The FC indices and frequency bands from which the images were derived are indicated
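One way to read the AND logic is sketched below. It assumes that a recording is assigned to pMCI only when the HC-vs-MCI module says MCI and the sMCI-vs-pMCI module says pMCI. How the remaining combinations map to HC and sMCI is an assumption of this sketch, not something stated here, and the function names are hypothetical.

```python
def ensemble_and(is_mci, is_pmci):
    """Combine two binary decisions into one of three classes.

    is_mci:  True if the HC-vs-MCI module predicts MCI
    is_pmci: True if the sMCI-vs-pMCI module predicts pMCI
    """
    if is_mci and is_pmci:
        return "pMCI"          # both modules must agree (AND logic)
    if is_mci:
        return "sMCI"          # MCI, but not flagged as progressive
    return "HC"                # assumed: an HC vote overrides the second module


def pmci_vs_rest(is_mci, is_pmci):
    """Binary view used when HC and sMCI are merged into a single class."""
    return "pMCI" if ensemble_and(is_mci, is_pmci) == "pMCI" else "rest"
```

Under this reading, a recording flagged as pMCI by the second module but as HC by the first is not counted as pMCI, which trades some sensitivity for fewer false alarms among controls.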

Comparative analysis
In this section, the results obtained with the proposed approach are compared with the results obtained with the standard classification approach, i.e., when the FC indices were used as data for feature selection and classification without considering the information encoded in the spatial arrangement of pixel values. When the FC indicators were automatically selected at the training level using stepwise feature selection, we did not obtain satisfactory results. To improve the performance of the standard approach, only the FC indicators with AUC values higher than a given threshold were considered, and the best combination of features was selected among them. When single types of FC indicators were used, the best results were 0.83 and 0.77 for the classification of sMCI and pMCI in the post- and pre-conversion phases, respectively, as compared with accuracy values of 0.87 and 0.78 obtained with individual Deep-MEG classifiers. In addition, for each classification scenario, multiple types of FC indicators and frequency bands were used and combined as a single feature vector. In this case, after selection based on AUC, accuracy values of 0.85 and 0.83 were obtained for the classification of sMCI and pMCI in the post- and pre-conversion phases, respectively, as compared to accuracy values of 0.93 and 0.89 obtained with the proposed Deep-MEG approach. With the standard approach in the three-class scenario, we did not obtain satisfactory classification results. The best results obtained with the comparative analysis are reported in Table 4.
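The AUC-based filtering of FC indicators used for the standard approach can be sketched as below. The pairwise definition of the AUC is standard; the threshold value, the direction-agnostic scoring (taking the larger of AUC and 1 - AUC), and all names and numbers are assumptions made for illustration.

```python
def auc(pos, neg):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def select_by_auc(features, labels, threshold):
    """Keep the features whose individual AUC exceeds `threshold`.

    features: dict mapping a feature name to its values, one per subject
    labels:   list of 0/1 class labels aligned with the values
    """
    selected = {}
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        score = auc(pos, neg)
        score = max(score, 1.0 - score)  # assumed: sign of the effect is ignored
        if score > threshold:
            selected[name] = score
    return selected


# Toy data (not from the study): one informative and one uninformative feature
toy_labels = [1, 1, 1, 0, 0, 0]
toy_features = {
    "fc_a": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "fc_b": [0.5, 0.1, 0.9, 0.4, 0.6, 0.2],
}
kept = select_by_auc(toy_features, toy_labels, threshold=0.7)
```

As with the deep features, such a filter would need to be computed on the training folds only to avoid optimistic estimates.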

Discussion
We have presented a deep-feature transfer approach, named Deep-MEG, and a set of ensemble classification architectures for decoding MEG recordings based on a new visual perspective on FC for the early diagnosis of AD. Image-based representations of FC were derived starting from the MEG time series. The MEG signals were first processed and filtered to derive meaningful data for FC analysis and to quantify the spatiotemporal characteristics of the brain connectome, in conjunction with the anatomical information encoded in the MRI scans, as described in Sects. 3.1 and 3.2. Different versions of the PLV and MC indices were computed as FC descriptors and organized as RGB images, also relative to multiple frequency bands (see Sects. 3.3 and 3.4). Such images could be received as input data by the pre-trained CNN and pooling layers in the AlexNet network, used as feature extractors and decoders of FC patterns, as described in Sect. 3.5. Cooperative decision architectures among Deep-MEG modules allowed the integration of the brain signaling at multiple levels of frequency bands to derive increased performance (see Sect. 3.7).
Fig. 7 Results obtained with ensemble #2 for the early detection of AD. a Confusion matrix obtained for classification of MEG recordings as HC, sMCI, or pMCI. b ROC curves and corresponding AUC values obtained at pre-conversion stage using the two base classification modules composing the ensemble. c Confusion matrix obtained for classification of MEG recordings as pMCI or the rest. All the results are relative to the pre-conversion phase
The main novelty of the proposed study is the analysis of the MEG-FC patterns of the brain network via deep CNNs. We have shown that the information on hypo- or hyper-synchronization is conveyed not only by the FC values but is also embedded in their spatial arrangement as FC maps, which gave us additional information on the connectome disruption related to AD. Individual Deep-MEG modules (see Sect. 4.1) allowed the discrimination of sMCI and pMCI patients at the post- and pre-conversion stages, with accuracy values of 0.87 and 0.78, respectively, using the PLV cent in the α band and the MC cent am in the β1 band. For the binary classification of MEG recordings as HC or MCI, using a single Deep-MEG module based on the PLV cent map in the θ band, we obtained an accuracy of 0.80.
A composed image, named RGB, was designed to encode multiple levels of information while avoiding redundancies. The new set of deep features so extracted boosted the classification performance at the pre-conversion stage to 0.89. In this scenario, we found that when data are integrated in different color channels of a single image, the encoded information has to be similar and homogeneous to guarantee appropriate decoding by the CNNs, as is the case for the indicators relative to the β and β1 bands.
As the connectivity patterns of different frequency bands were unique and differently informative in terms of activation patterns, multiband ensemble classifiers were used to integrate the information encoded in different image-based representations (see Sect. 4.2). By averaging the probability scores of the best image representations of FC at the decision level, increased accuracy values, i.e., 0.93 and 0.89 for the post- and pre-conversion stages, respectively, were obtained for the binary classification of MEG recordings as sMCI or pMCI. These results showed the fundamental role of integrating heterogeneous and diverse data (at the spatiotemporal and frequency levels) for better representation and decoding of the brain functional connectivity.
In our experiments in the three-class scenario that also included the HC cases, none of the single image-based representations of FC was effective, nor was ensemble #1 among Deep-MEG modules based on LDA or SVM classifiers. More importantly, it was not possible to detect the pMCI cases at the pre-conversion stage when HC cases were also present, i.e., to discriminate pMCI cases from the HC and sMCI cases, which is the main goal for early detection of AD. The reason for this finding may be that the dynamic activity of the brain network manifests differently, in terms of spatiotemporal and frequency responses, under different clinical conditions, and that the information relevant to each binary sub-problem is encoded in different frequency bands or FC indicators. Therefore, we used another ensemble logic, named ensemble #2, in which two classification modules solve the two different sub-tasks, as described in Sect. 3.6. Finally, we used an AND logic to aggregate the assessments of the individual predictors. For the three-class problem, an accuracy of 0.74 was obtained. The best discriminated class was pMCI, with a detection rate of 0.82. The sMCI cases were mostly confounded with the HC cases. As the sMCI cases have different severity levels, which may lie on a continuum from the cognitive perspective [68], different levels of cognitive impairment may, in turn, be associated with different FC patterns, some of which turned out to be similar to those of the HC cases. This is not surprising, especially considering the stability in terms of AD conversion of the sMCI cases over the three years of observation. When the HC and sMCI were considered as a single class (Fig. 7c), the classification accuracy of ensemble #2 increased to 0.87, while the sensitivity for the pMCI class remained at 0.82.
Ensemble #2 is the result of the cooperation of only two Deep-MEG-based classifiers trained, on average, with eight deep features automatically selected during the rounds of LOPO cross-validation from the RGB images and the PLV cent map in the θ band. Data diversity and cooperation at the decision level were crucial to boost the recognition of pMCI cases at the pre-conversion stage and discriminate them from the HC and sMCI cases. We have also shown that the proposed combination of deep spatiotemporal features and multiband ensemble classification outperformed the standard approach to the analysis of FC, in which the FC indicators, also relative to different frequency bands, are used as feature descriptors. This was verified with respect to multiple combinations of FC features, even when the best FC features were selected from the complete pool based on their individual AUC values.
We tested multiple combinations of image representations of FC, and the best results are reported in this study. A major advantage of this approach is that the learned models can be interpreted in neurophysiological terms. The results obtained with the present study support the notion of different functional brain connectivity patterns associated with different rates of progression and conversion to AD [68]. In line with other work in the literature [15,16,[39][40][41][42][43][44], we have observed the role of specific frequency bands as potential biomarkers for the different phases of progression of the disease. In particular, a different set of FC images and frequency bands was determinant for the two ensemble classifiers relative to the post- and pre-conversion phases. At the post-conversion stage, our results indicate that the phase synchrony in the α band can serve as a predominant sign of AD in symptomatic patients [15,16], while, at the pre-conversion stage, the results indicate changes in the amplitude-correlation patterns in the β and β1 bands among the MEG signals [40][41][42][43]. Moreover, the discrimination of the HC cases from the MCI cases was favored by the presence of the FC indicators in the θ band [39,44], which were not informative for the sMCI vs pMCI scenario in the pre-conversion phase (Table 3).
To further validate the platform, it would be important to test the proposed methods in a larger study. In the present work, to avoid overfitting, we extracted the deep data features from a pre-trained AlexNet architecture. Such features were automatically selected at the training level within rounds of LOPO cross-validation, thus allowing the training of simpler LDA or SVM classification modules based on a small set of features. In addition, our effort was devoted to identifying the important MEG-based FC representations that inform classification (top-down approach), as the aggregation was performed using the best combinations of FC maps. The results obtained using knowledge-based computer vision techniques can be used as a reference for deriving possible biomarkers for AD (bottom-up approach).
The results obtained in the present work compare favorably with the standard approach, in which the FC indicators are used as mono-dimensional training features, and with previous studies in the literature on the pre-conversion phase based on MEG data [17] or other imaging modalities [22,23], posing the basis for further investigations on the proposed Deep-MEG architectures.

Conclusions
MEG provides the unique advantage of measuring brain function with a remarkable combination of spatial and temporal resolutions. With this work, we have presented a novel system for decoding MEG recordings based on image-based representations of FC, deep CNN features, and ensemble classification architectures. The proposed methods for deriving and codifying the MEG-based FC measures allow the generation of pictures that represent, visually and numerically, the intricate communication patterns among spatially separated brain regions, which could be decoded by deep CNN features. The derivation of different cooperative architectures for integrating the spatiotemporal and multi-frequency information encoded in such images was key to recognizing the early alterations of the brain connectome in patients who undergo conversion to AD over a 3-year follow-up period.
In the future, the resting-state analysis used in the present work may be extended with the analysis of task-related activation patterns in order to optimize future applications of Deep-MEG architectures for predicting early signs of AD. Our findings may also have implications for the use of MEG-based FC as a biomarker in therapeutic trials. Finally, the proposed methods can be applied in other predictive scenarios to decode early signs of diverse neurodegenerative or neuropsychiatric diseases as well as to decode EEG signals.
with PV the Cauchy principal value. From Eq. (A1), $A_s(t)$ and $\varphi_s(t)$ are the instantaneous amplitude and the instantaneous phase, respectively, of the analytic signal.
The Phase Locking Value (PLV) [63] was computed as the FC measure of phase synchrony, since it quantifies how well the phase difference between two signals is preserved over time. First, the instantaneous phase $\varphi(t)$ was extracted for each 4-s segment of the series $s(t)$. Then, starting from pairs of analytic signals, the PLV between two time series was evaluated with the following expression:
$$\mathrm{PLV} = \left| \left\langle e^{i\,\Delta\varphi_{rel}(t_n)} \right\rangle \right| = \left| \frac{1}{N} \sum_{n=1}^{N} e^{i\,\Delta\varphi_{rel}(t_n)} \right|$$
where $N$ indicates the number of time points, $\langle \cdot \rangle$ the time average, and $\Delta\varphi_{rel}(t_n)$ the cyclic relative phase, i.e., the difference between the instantaneous phases of the two signals, bounded in the interval $[0, 2\pi)$. PLV ranges from 0 to 1; a value close to 0 indicates that the relative phase is uniformly distributed (or that the phase distribution has $n$ peaks at values that differ by $2\pi/n$), while a value of 1 indicates perfect phase locking between the time series.
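As a minimal numerical sketch of this index, the function below assumes the standard form of the PLV, i.e., the modulus of the time-averaged unit phasor of the phase difference, and takes the instantaneous phases as already extracted (for example from the analytic signals). The function name and the synthetic phases are illustrative only.

```python
import cmath
import math


def plv(phase_1, phase_2):
    """Phase Locking Value between two equally long phase series (radians).

    Computes |(1/N) * sum_n exp(i * (phase_1[n] - phase_2[n]))|.
    Wrapping the difference to [0, 2*pi) is unnecessary here because
    the complex exponential is 2*pi-periodic.
    """
    n_points = len(phase_1)
    phasor_sum = sum(
        cmath.exp(1j * (p1 - p2)) for p1, p2 in zip(phase_1, phase_2)
    )
    return abs(phasor_sum) / n_points


# Constant phase lag: perfect locking, PLV close to 1
phases_a = [0.1 * k for k in range(100)]
phases_b = [p - 0.5 for p in phases_a]
plv_locked = plv(phases_a, phases_b)

# Phase differences spread evenly over the circle: PLV close to 0
spread = [2.0 * math.pi * k / 100 for k in range(100)]
plv_spread = plv(spread, [0.0] * 100)
```

The two synthetic cases reproduce the limits described in the text: a preserved phase difference gives a value near 1, and a uniformly distributed one gives a value near 0.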
As a second index of co-variability between two signals, we evaluated the magnitude of the complex Pearson correlation between the analytic signals associated with the time series [64,65], which we refer to as the Magnitude Coefficient (MC):
$$\mathrm{MC} = \frac{\left| \sum_{n=1}^{N} s_{AN,1}(t_n)\, s^{*}_{AN,2}(t_n) \right|}{\sqrt{\sum_{n=1}^{N} s_{AN,1}(t_n)\, s^{*}_{AN,1}(t_n)}\; \sqrt{\sum_{n=1}^{N} s_{AN,2}(t_n)\, s^{*}_{AN,2}(t_n)}} \tag{A4}$$
where the superscript * denotes the complex conjugate. The MC gives a measure of the strength of the linear relationship between the envelopes of the signals, on a scale that ranges from 0 to 1.
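A minimal sketch of the MC computation is given below. It assumes the normalized form of the complex Pearson correlation for zero-mean analytic signals, i.e., the modulus of the cross term divided by the square roots of the two signal energies, and it takes the analytic signals as lists of complex samples. The function name and the toy signals are illustrative only.

```python
import math


def magnitude_coefficient(analytic_1, analytic_2):
    """Magnitude of the complex correlation between two analytic signals.

    analytic_1, analytic_2: equally long sequences of complex samples,
    assumed zero-mean (e.g., after band-pass filtering).
    """
    cross = sum(a * b.conjugate() for a, b in zip(analytic_1, analytic_2))
    energy_1 = sum(abs(a) ** 2 for a in analytic_1)
    energy_2 = sum(abs(b) ** 2 for b in analytic_2)
    return abs(cross) / (math.sqrt(energy_1) * math.sqrt(energy_2))


# A scaled and phase-rotated copy is perfectly correlated in magnitude
signal = [1 + 1j, -2 + 0.5j, 0.5 - 1j, 0.5 - 0.5j]
rotated_copy = [2j * x for x in signal]
mc_same = magnitude_coefficient(signal, rotated_copy)

# Two orthogonal toy signals give zero
mc_orthogonal = magnitude_coefficient([1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j])
```

Because the modulus is taken, a constant phase rotation between the two signals does not lower the index, which keeps it bounded between 0 and 1.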