Introduction

Alzheimer’s disease (AD) is a clinical syndrome characterized by the progressive deterioration of memory and other cognitive functions, particularly in elderly people. The disease usually begins silently, and its course is slow and irreversible. According to the World Alzheimer Report 2019 [1], there are more than 50 million people with AD, and the figure may rise to 152 million by 2050.

In recent years, the attention paid to AD has been gradually increasing. So far, only five drugs have been approved by the Food and Drug Administration (FDA) for the treatment of AD [2], and all of them can only delay the progression of AD and alleviate its symptoms; none of them can cure the disease. Consequently, early diagnosis is important so that medication can delay the symptoms. Typically, AD is divided into four stages, and the best time to diagnose the disease is during the early stages of mild cognitive impairment (MCI) and mild AD [3,4,5].

Electroencephalography (EEG) is the non-invasive acquisition of signals corresponding to electrical activity in the brain using electrodes positioned directly on the scalp. Magnetoencephalography (MEG) is also a non-invasive technique which is used to acquire signals by recording the magnetic activity of the brain. Functional magnetic resonance imaging (fMRI) indirectly detects changes of the brain neuronal activity based on the linked alterations of cerebral blood flow as exhibited by the differentiated magnetic properties of the hemoglobin molecule between its oxygen saturated and desaturated states. The difference between AD patients and normal control subjects can be detected using these brain signals, each coming with different advantages and disadvantages. Machine learning methods related to the classification between AD patients and normal control subjects using EEG, MEG, and fMRI brain signals are listed in Table 1.

Table 1 Summary of papers using EEG/MEG/fMRI signals to design a classification system for AD/MCI detection

With the increasing use of deep learning techniques, many deep AD detection methods have recently emerged. Sarraf and Tofighi [14] used LeNet-5, a convolutional neural network (CNN) architecture, to classify fMRI data from AD subjects and normal controls, with an accuracy on the testing dataset of 96.85%. They used 5-fold cross-validation on a dataset containing 28 AD subjects and 15 normal controls. Kim and Kim [15] proposed a classifier based on deep neural networks using the relative power of EEG to fully exploit and recombine features through its own learning structure. Their dataset contained 10 MCI subjects and 10 normal controls, and leave-one-out cross-validation was used to evaluate the model’s performance. The accuracy obtained on the testing dataset was 59.4%. Duan et al. [16] used EEG functional connectivity as the network input to train ResNet-18, achieving an accuracy of 93.42% and 98.5% on the MCI and mild AD datasets, respectively, where the former contained 22 MCI subjects and 38 normal controls, and the latter contained 17 mild AD subjects and 24 normal controls.

Among the aforementioned brain signals (EEG, MEG, and fMRI), EEG has the best temporal resolution. Nevertheless, since EEG signals are acquired from several locations on the scalp with electrodes, their spatial resolution is not as good as that of the other two types of signals. Despite this, the spatial distribution of the signals can be optimized in the processing steps with the use of well-designed algorithms [17,18,19,20,21]. Given that EEG signals are easier to acquire and less expensive than those of the other techniques, EEG-based methods for AD detection are currently more popular.

In studies based on EEG signals, deep learning methods are often trained on small datasets, as electrophysiological signals are more difficult to acquire in AD patients. The learning capability of deep learning models partially relies on their large number of parameters. A large number of samples is required to fit these parameters and avoid over-fitting [22, 23]. One way to deal with this issue is data augmentation.

Data augmentation can be implemented by generating artificial data [24, 25]. The strategy of decomposing and recombining the original EEG signals is one possible way to create new artificial data for data augmentation [26,27,28]. EEG signals can be decomposed into different filter banks. In each filter bank, the frequency of the decomposed EEG signals is within a certain frequency band. Together, the filter banks cover a wide range of frequencies. This strategy helps deep learning models achieve better performance on small datasets. Note that in studies where this particular data augmentation strategy has been implemented, the details of the models differ from study to study, even though the overall approach is the same. For instance, Zhao et al. [26] proposed a method of random recombination of EEG signals in different filter banks, which are decomposed by the discrete cosine transform. This approach enhances the classification performance of one-dimensional convolutional neural networks in the epileptic seizure focus detection task. Zhang et al. [27] used the augmentation strategy to enhance the classification performance of motor imagery. Instead of decomposing signals with the discrete cosine transform, the empirical mode decomposition (EMD) technique was adopted [29]. In the decomposition–recombination strategy, EMD has the advantage that the signals can be recovered by simply adding up the decomposed intrinsic mode functions (IMFs). Besides the decomposition–recombination strategy, generative adversarial networks (GANs) also offer a solution to generate artificial signals [30]. However, GANs require a large dataset to tune the parameters and fit the model. Since the goal of data augmentation in small Alzheimer’s datasets is to solve the problem of insufficient samples, it is not practical to use GANs to generate artificial data.

In this paper, we propose a decomposition and recombination model for data augmentation in a small Alzheimer’s data set, which is used to distinguish AD patients from normal controls. The decomposition and recombination approach consists of three steps. First, multivariate empirical mode decomposition (MEMD) is used to decompose EEG signals into IMFs. These IMFs are then randomly recombined within each of the two groups. Finally, in each group, the IMFs are added up to generate a new artificial trial. These artificial trials are used to extend the AD training dataset.

This work is organized as follows. "Method" includes the description of the small Alzheimer’s datasets used, the scheme of the proposed decomposition and recombination approach, and the neural networks used for classification. "Results" presents the experimental results, including the classification performance of the neural networks during the training process and the effects of data augmentation in the datasets. Then, these results are discussed in "Discussion", together with the limitations associated with the method. Finally, the conclusions are presented in "Conclusion".

Method

Alzheimer’s Datasets

All experiments in this work use two datasets: the MCI dataset, containing 22 subjects with MCI and 38 normal controls, and the mild AD dataset, containing 17 subjects with mild AD and 24 normal controls. Other studies have been conducted based on these datasets [5, 7, 31].

Fig. 1

Schematic display of the electrode positions from above

The MCI Dataset

The MCI dataset comprises data from subjects who complained of memory impairment and from control subjects who had neither memory impairment nor other diseases. The patient group included 53 subjects who underwent a comprehensive neuropsychological test; the results showed quantitative and objective evidence of memory impairment, but their overall cognitive, behavioral, and functional status was not significantly impaired. The classification of mild cognitive impairment requires a score of at least 24 in the Mini-Mental State Examination (MMSE) [32], a score of 0.5 on the Clinical Dementia Rating (CDR) scale [33], and a memory performance one standard deviation below the normal reference value. All subjects met these criteria. These subjects then underwent an initial assessment, and their progress was monitored in the clinic during the subsequent 12–18 months. According to the criteria defined by the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA), 25 of these 53 MCI patients might develop AD. The average age of the 25 subjects in the MCI data set is 71.9 \(\pm\) 10.2 years, and their MMSE score is 28.5 \(\pm\) 1.6. The control group had 56 age-matched healthy subjects with an average age of 71.7 \(\pm\) 8.3 years and an MMSE score of 26 \(\pm\) 1.8.

Twenty-one electrodes from Biotop 6R12 (NEC-Sanei, Tokyo, Japan) were placed on the subject’s scalp according to the international 10–20 system, with a sampling frequency of 200 Hz. In addition, Fpz and Oz electrodes were added to the system, as shown in Fig. 1a. After the data were collected, analog band-pass filtering was used to retain data between 0.5 and 250 Hz, and then third-order Butterworth filters (forward and reverse filtering) were used to perform digital band-pass filtering between 0.5 and 30 Hz.

The Mild AD Dataset

The mild AD dataset is comprised of data from 17 mild AD patients (age: 69.4 ± 11.5 years) and 24 healthy subjects (age: 77.6 ± 10.0 years). The patient group underwent a full set of cognitive tests (MMSE, Rey auditory verbal learning, Benton visual retention, and memory recall tests) along with psychological tests. The results were graded and interpreted by psychologists and then discussed in meetings with multidisciplinary teams. The subjects in the control group were all healthy volunteers, and their EEG was judged to be normal by the clinical neurophysiology consultants.

Nineteen electrodes were placed on the subject’s scalp using the Maudsley system, which is similar to the international 10–20 system, as shown in Fig. 1b. The sampling frequency was 128 Hz. After data acquisition, a third-order Butterworth filter (forward and reverse filtering) was used for digital band-pass filtering between 0.5 and 30 Hz.
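The forward and reverse Butterworth filtering applied to both datasets can be sketched as follows. This is only an illustration under stated assumptions: the function name `bandpass_zero_phase`, the use of SciPy, and the synthetic test signal are not part of the original study, and it is assumed here that "third-order" refers to the `order` argument of the filter design.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_zero_phase(eeg, fs, low=0.5, high=30.0, order=3):
    """Band-pass filter each channel forward and backward (zero phase).

    eeg: array of shape (channels, samples); fs: sampling frequency in Hz.
    """
    # Design a Butterworth band-pass filter; cut-offs are given in Hz via fs=
    b, a = butter(order, [low, high], btype="band", fs=fs)
    # filtfilt runs the filter forward and then in reverse along the time axis
    return filtfilt(b, a, eeg, axis=-1)

# Synthetic stand-in for one 20 s, 21-channel recording sampled at 200 Hz
fs = 200
t = np.arange(20 * fs) / fs
rng = np.random.default_rng(0)
eeg = (np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t)
       + 0.1 * rng.standard_normal((21, t.size)))
filtered = bandpass_zero_phase(eeg, fs)
```

In this synthetic example the 10 Hz component lies inside the pass band and is kept, whereas the 60 Hz component is strongly attenuated. For the mild AD dataset, the same call would be made with `fs=128`.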

Recording Conditions in Both Datasets

During the collection process of the two aforementioned datasets, the subjects were awake with their eyes closed. The whole process lasted for 5 min. After that, the EEG data were checked by EEG experts, and the data containing artifacts were discarded. Finally, only 20 s of clean EEG data were saved for each subject, and the subjects whose data did not meet this condition were discarded. Based on this procedure, the MCI dataset finally comprised 22 subjects with MCI and 38 normal controls, while the mild AD dataset comprised 17 subjects with mild AD and 24 normal controls.

A Decomposition and Recombination System

In small data sets, neural networks often face overfitting problems. Data augmentation is used to enlarge the size of the training set, as shown in Fig. 2.

Fig. 2

The concept of data augmentation. In a small data set, the training set is small in size, since it is generated from only a portion of the (few) original data. When a neural network is used to fit the training set, there is a potential overfitting problem. Data augmentation is used to mitigate this issue by enlarging the size of the training set

In this work, we propose a decomposition and recombination system to generate artificial trials and thus enlarge the training set. For the decomposition part, the empirical mode decomposition (EMD) method is used. EMD can divide a signal into multiple intrinsic mode functions (IMFs). These IMFs cover different frequency bands, with low overlap. The original signal can then be recovered by adding up these IMFs [29]. The recombination part consists of adding IMFs from different trials, taking each of the IMFs from a different one.

The simplest EMD method is classical empirical mode decomposition (CEMD), which is the original version of EMD, as shown in Algorithm 1. A faster version of EMD is serial EMD (SEMD), which is used to deal with multi-channel signals. SEMD converts multi-channel signals into a single channel by concatenating them over time, ensuring the continuity of the signals by suitably adding a transient part between channels. CEMD is then used to decompose the single (long) channel. Multivariate EMD (MEMD) is also a method used for decomposing multi-channel signals, as shown in Algorithm 2. First, it places the multi-channel signals in a tangent space and then decomposes these signals into IMFs. The IMFs are finally reverted to normal space. Figure 3 shows the original multi-channel signals and the signals decomposed by MEMD. MEMD ensures that IMFs with the same index (shown in Fig. 3) cover the same frequency band.
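To make the sifting idea behind classical EMD concrete, a minimal single-channel sketch is given below. It is not the algorithm used in this work: the fixed number of sifting iterations (`n_sift`), the crude boundary rule that uses the end points as envelope knots, and the names `sift_once` and `emd` are all assumptions made for illustration. The sketch only demonstrates the property exploited later, namely that the IMFs and the residual add up to the original signal.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(h):
    """One sifting step: subtract the mean of the upper and lower envelopes."""
    n = np.arange(h.size)
    maxima = argrelextrema(h, np.greater)[0]
    minima = argrelextrema(h, np.less)[0]
    if maxima.size < 2 or minima.size < 2:
        return None  # too few extrema to build envelopes
    # Use the end points as extra knots so the splines span the whole signal
    up_idx = np.concatenate(([0], maxima, [h.size - 1]))
    lo_idx = np.concatenate(([0], minima, [h.size - 1]))
    upper = CubicSpline(up_idx, h[up_idx])(n)
    lower = CubicSpline(lo_idx, h[lo_idx])(n)
    return h - 0.5 * (upper + lower)

def emd(signal, max_imfs=8, n_sift=10):
    """Return (imfs, residual) with imfs of shape (n_imfs, n_samples)."""
    residual = np.asarray(signal, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if sift_once(residual) is None:  # residual has too few extrema: stop
            break
        h = residual.copy()
        for _ in range(n_sift):          # fixed sifting count (assumption)
            nxt = sift_once(h)
            if nxt is None:
                break
            h = nxt
        imfs.append(h)
        residual = residual - h          # remove the extracted IMF
    return np.array(imfs), residual

# Toy signal: fast oscillation + slow oscillation + linear trend
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 40 * t) + np.sin(2 * np.pi * 5 * t) + t
imfs, residual = emd(x)
reconstructed = imfs.sum(axis=0) + residual
```

By construction, `reconstructed` matches `x` up to floating-point error, which is what allows IMFs taken from different trials to be summed into a new signal.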

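Algorithm 1 Classical empirical mode decomposition (CEMD)
Algorithm 2 Multivariate empirical mode decomposition (MEMD)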
Fig. 3

Data decomposition with MEMD. MEMD can decompose multi-channel signals into IMFs. The IMFs are located in different frequency bands, but in all the decomposed channels, the kth IMF covers the same frequency band. In this figure, the IMFs are sorted in descending order in the frequency domain

In order not to decompose each trial separately, which would result in IMFs covering non-equal frequency bands in the same trial, and also to decrease the processing time, we combine the MEMD and SEMD methods as shown in Fig. 4. Multi-channel signals from several trials are first concatenated along the time axis as in SEMD, and then MEMD is used to decompose the concatenated signals, ensuring that each trial has the same number of IMFs. Figure 5 presents an example of generating an artificial trial with the original EEG signals.

Fig. 4

The procedure of SEMD-MEMD decomposition for multiple trials of multi-channel EEG signals. Trials of EEG signals are concatenated along the time axis and then decomposed

Fig. 5

Outline of the proposed decomposition and recombination system. As an example, for an artificial signal generated in channel c, the procedure consists of i randomly selecting \(N_{IMF}\) trials from the original EEG signals; ii obtaining the IMFs, which are decomposed using the method outlined in Fig. 4; iii collecting the decomposed IMFs of channel c from randomly selected \(N_{IMF}\) trials; iv recombining the IMFs in channel c. The \(n_{imf}\)-th IMF of the artificial signal is the \(n_{imf}\)-th IMF of the \(n_{imf}\)-th randomly selected trial; and v adding the IMFs and obtaining the artificial signal of channel c
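The recombination steps outlined in Fig. 5 can be sketched with NumPy as follows. The sketch assumes that the IMFs of all training trials of one class have already been computed (for example with the SEMD-MEMD procedure) and stored in an array of shape (trials, IMFs, channels, samples). The function name `make_artificial_trial`, the use of the same donor trials for all channels, and sampling without replacement when enough trials are available are assumptions, since these details are not fixed by the description above.

```python
import numpy as np

def make_artificial_trial(imfs, rng):
    """Recombine IMFs from different trials of ONE class.

    imfs: array of shape (n_trials, n_imf, n_channels, n_samples).
    Returns an artificial trial of shape (n_channels, n_samples).
    """
    n_trials, n_imf = imfs.shape[:2]
    # (i) pick one donor trial per IMF index
    donors = rng.choice(n_trials, size=n_imf, replace=n_trials < n_imf)
    # (iii)-(iv) the k-th IMF of the artificial trial is the k-th IMF
    # of the k-th selected donor trial
    picked = imfs[donors, np.arange(n_imf)]  # (n_imf, n_channels, n_samples)
    # (v) add the IMFs to obtain the artificial multi-channel signal
    return picked.sum(axis=0)

# Random numbers standing in for the decomposed IMFs of 10 training trials
rng = np.random.default_rng(0)
imfs = rng.standard_normal((10, 6, 21, 400))
artificial = make_artificial_trial(imfs, rng)
```

If all donor trials were identical, the artificial trial would simply be the sum of the IMFs of that trial, that is, the original signal; the variability of the artificial data comes entirely from mixing IMFs across trials of the same class.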

Neural Network Classifiers

In the analysis of EEG signals, there are two traditional options used as inputs for the neural networks. In the first case, the original multi-channel signals are used as inputs. In the second, the multi-channel signals are converted into a functional connectivity (FC) matrix [34]; this is an EEG-based connectivity matrix between brain regions obtained by calculating the inter-channel EEG similarity, e.g., by means of the coherence measure. The degree of similarity between two brain regions can be reflected in the FC matrix. In this way, the generated matrix preserves the spatial information of the multi-channel signals. To distinguish between controls and AD patients, EEG is often analyzed in four frequency bands: delta (0.1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), and beta (13–30 Hz). The signal in each band contains different information about brain connectivity and synchronization [35]. In this work, however, we adopt slightly different frequency bands, namely 4–8 Hz, 8–10 Hz, 10–13 Hz, and 13–30 Hz. These bands are derived from a previous work [16] and are optimized for the datasets used [7].

The main goal of this work is to measure the effect of the data augmentation method on the performance of the classifiers when functional connectivity matrices are used as inputs to the models. Therefore, it is not in the scope of this work to determine the best possible model. To evaluate the effects of the data augmentation method on the small AD datasets, three neural networks are used: BrainNet CNN [36], ResNet [37], and EEGNet [38]. To simplify the explanation of the networks, some symbols are defined here. In the following, B is the batch size, C is the number of input EEG signals, and T is the number of sample points of the EEG signals.

Methods such as Pearson’s correlation coefficient or coherence can be used to compute the correlation or relationship between channels. Here, we adopt the coherence to compute the FC matrices. EEG coherence measures the degree of phase synchronization of EEG spectral activity between two electrodes [39]. For two temporal signals x(t) and y(t), the coherence between them can be defined as follows:

$$C_{xy}=\frac{\left| G_{xy}(f)\right| ^{2}}{G_{xx}(f)\,G_{yy}(f)}, \tag{1}$$

where \(G_{xy}\) is the cross-spectral density between x and y, and \(G_{xx}\) and \(G_{yy}\) are the power-spectral density of x and y, respectively. Considering an EEG sample that has 21 channels containing data of 20 s of length, we can obtain an FC matrix with a size of \(C\times C\) by calculating the coherence between each pair of EEG signals. Here, we first divide the original signals into the four aforementioned frequency bands, namely 4–8 Hz, 8–10 Hz, 10–13 Hz, and 13–30 Hz. As a consequence, the input of the neural networks is of size \(4\times C \times C\) (where C is the channel number of EEG signals). The inputs for BrainNet CNN and ResNet are the FC matrices of the four frequency bands. The input for EEGNet is the original multi-channel time series.
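As an illustration of how Eq. (1) leads to an input of size \(4\times C \times C\), the following sketch computes one coherence matrix per frequency band with SciPy. Several choices here are assumptions rather than details taken from this work: the coherence is estimated with Welch's method using 2 s segments, and the band-specific value is obtained by averaging the coherence over the frequencies of each band instead of filtering the signals into bands first. The function name `fc_matrices` is hypothetical.

```python
import numpy as np
from scipy.signal import coherence

BANDS = [(4, 8), (8, 10), (10, 13), (13, 30)]  # Hz, as listed in the text

def fc_matrices(eeg, fs, nperseg=None):
    """eeg: (C, T) array. Returns (4, C, C) band-averaged coherence matrices."""
    n_ch = eeg.shape[0]
    nperseg = nperseg or int(2 * fs)       # 2 s Welch segments (assumption)
    fc = np.ones((len(BANDS), n_ch, n_ch))  # coherence of a channel with itself is 1
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            f, cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=nperseg)
            for b, (lo, hi) in enumerate(BANDS):
                mask = (f >= lo) & (f < hi)
                # Coherence is symmetric, so fill both triangles
                fc[b, i, j] = fc[b, j, i] = cxy[mask].mean()
    return fc

# Synthetic 5-channel, 20 s example at 200 Hz with a shared 10 Hz rhythm
fs = 200
t = np.arange(20 * fs) / fs
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10 * t) + rng.standard_normal((5, t.size))
fc = fc_matrices(eeg, fs)
```

For a 21-channel recording, the same function would return an array of shape (4, 21, 21), matching the input size used for BrainNet CNN and ResNet.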

BrainNet CNN

BrainNet CNN is a network architecture that analyzes the FCs of different frequency bands [36]. This network has three basic convolutional blocks: edge-to-edge (E2E), edge-to-node (E2N), and node-to-graph (N2G), which are specially designed for FC matrix processing. The three blocks are convolutional layers with different kernels. E2N is a convolutional layer with kernel size (1, C) which converts the edges in FC matrices to nodes. N2G is a convolutional layer with kernel size (C, 1) which suppresses the output nodes of the E2N layer. Finally, E2E is the added-up output of convolutions with kernel size (1, C) and (C, 1). An illustration of E2E is given in Fig. 6. The structure of the BrainNet CNN is given in Table 2.
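The three operations can be written compactly for a single filter acting on a single \(C\times C\) matrix. The NumPy sketch below is a simplified illustration only: biases, multiple feature maps, and non-linearities are omitted, and the function names `e2e`, `e2n`, and `n2g` are introduced here for convenience.

```python
import numpy as np

def e2e(A, w_row, w_col):
    """Edge-to-edge: sum of a (1, C) and a (C, 1) convolution, output C x C."""
    row_resp = A @ w_row   # (C,): (1, C) kernel applied to every row
    col_resp = w_col @ A   # (C,): (C, 1) kernel applied to every column
    # out[i, j] = row_resp[i] + col_resp[j]
    return row_resp[:, None] + col_resp[None, :]

def e2n(A, w):
    """Edge-to-node: (1, C) kernel, one value per node."""
    return A @ w

def n2g(nodes, w):
    """Node-to-graph: (C, 1) kernel, one value for the whole graph."""
    return float(nodes @ w)

# Small example with C = 4
rng = np.random.default_rng(0)
A = rng.random((4, 4))
w_row, w_col = rng.random(4), rng.random(4)
out = e2e(A, w_row, w_col)
graph_value = n2g(e2n(out, w_row), w_col)
```

Each output entry of `e2e` at position (i, j) combines all edges in row i and column j, that is, all edges that share a node with the edge (i, j).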

Fig. 6

A schematic depiction of the E2E block in BrainNet CNN. The output of the block is the sum of two convolution results

Table 2 The structure of BrainNet CNN

ResNet

In the training of deep networks, backpropagation through many layers faces the vanishing gradient problem [40]. The residual module of the deep residual network reduces the influence of vanishing gradients by introducing a shortcut connection [37]. The deep residual network has already been validated on a large number of classification problems. Compared with deep neural networks without shortcut connections, the shortcut connection of the deep residual network allows raw input information to be sent directly to a later layer. Assuming that the input of the residual block is x, the expected output is H(x). The learning target of the residual block is then \(F(x)=H(x)-x\), which is called the residual, and the input x is added to the learned F(x) through the shortcut to give the output of the block (Fig. 7). This approach increases the training speed of the model, improves the training effect, and mitigates the vanishing gradient problem as the number of layers grows, without adding extra parameters or computations to the network. In this study, we employed the ResNet-18 deep residual network.

Fig. 7

A residual block with a shortcut in ResNet
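The role of the shortcut can be illustrated with a toy residual block. In the sketch below, dense layers replace the convolutions of ResNet-18 and batch normalization is omitted, so it is an assumption-laden simplification rather than the block used in this work; the names `relu` and `residual_block` are introduced only for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Toy residual block: output = relu(F(x) + x)."""
    f = W2 @ relu(W1 @ x)  # residual branch F(x)
    return relu(f + x)     # shortcut: add the input, then activate

# With zero weights the residual branch vanishes and the block
# passes a non-negative input through unchanged
x = np.array([0.5, 1.0, 2.0])
zero = np.zeros((3, 3))
y = residual_block(x, zero, zero)
```

This shows why the block only has to learn the residual \(F(x)=H(x)-x\): when the best mapping is close to the identity, the weights of the residual branch can stay close to zero.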

EEGNet

EEGNet is a universal solution to the classification of multi-channel EEG signals, which has been validated in the classification of other brain activity signals such as motor imagery and movement-related cortical potential [38]. EEGNet takes the original multi-channel EEG signals as the input instead of the FC matrices. Even though EEGNet has not been validated in the classification of early AD, in this work, we use it to test and explore the data augmentation performance. The structure of EEGNet is given in Table 3.

Table 3 The structure of EEGNet

Parameter Setting

In the training of these neural networks, the adaptive moment estimation (Adam) optimizer was used, with \(\beta _{1}=0.9\), \(\beta _{2}=0.99\), and a learning rate of 0.0001. ResNet and BrainNet CNN were trained for 100 epochs, and EEGNet was trained for 200 epochs. The mini-batch size was 50.
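For reference, one Adam update with the above settings can be written as follows. This is the standard Adam update rule written in NumPy; the value of the stabilizing constant `eps` is not reported in this work and is an assumption, as is the toy objective used in the example.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(theta) = theta^2, starting from theta = 1
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
grad = 2 * theta
theta, m, v = adam_step(theta, grad, m, v, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by approximately the learning rate (here from 1 to about 0.9999), independently of the gradient scale.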

Results

The experiments aim to explore the effects of data augmentation on the small AD dataset with the decomposition and recombination strategy using FC matrices as inputs and with three different neural networks as classifiers. In Table 4, the number of trials in the training and testing sets is given. In the training set, 10 trials are randomly selected from the original EEG signals of AD patients and controls to avoid the imbalance of the training set. Five hundred artificial trials are generated from the 10 original trials for each class. The rest of the original trials are used in the testing set. The chance level is calculated with the stratified dummy classifier in Python’s scikit-learn toolbox [41]. The training set consists of both original and artificial EEG signals. Artificial EEG signals in the training set are generated exclusively from the real EEG data of this set (Fig. 5). The original EEG signals in the training set are randomly selected ten times, and the classification is repeated as a cross-validation procedure.
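A chance level of this kind can be estimated as sketched below for the mild AD dataset. The split sizes follow from the description above (10 original trials per class for training, leaving 7 mild AD and 14 control trials for testing), but the averaging over 200 random seeds and the fitting of the dummy classifier on the original training labels only are assumptions made for this illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Labels: 1 = mild AD, 0 = control
y_train = np.array([1] * 10 + [0] * 10)
y_test = np.array([1] * 7 + [0] * 14)
# The dummy classifier ignores the features, so placeholders suffice
X_train = np.zeros((y_train.size, 1))
X_test = np.zeros((y_test.size, 1))

scores = []
for seed in range(200):  # number of repetitions is an arbitrary choice
    clf = DummyClassifier(strategy="stratified", random_state=seed)
    clf.fit(X_train, y_train)
    scores.append(clf.score(X_test, y_test))
chance = float(np.mean(scores))
```

Because the training set is balanced, the stratified dummy classifier predicts each class with probability 0.5, and the estimated chance level is therefore close to 50% regardless of the class proportions in the testing set.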

Table 4 Distribution of the number of trials

Feature Distribution

First, the feature distributions of the artificial data generated by data augmentation are assessed. To clearly illustrate this, the FC matrices of mild AD patients vs controls are depicted in Fig. 8 using the uniform manifold approximation and projection method (UMAP) [42, 43]. There are four FC matrices for each trial: 4–8 Hz, 8–10 Hz, 10–13 Hz, and 13–30 Hz. Since FC matrices are symmetric, their upper triangle is taken as the feature of said matrix. For each trial, we have \(4\times C\times (C-1)/2\) features. The UMAP model is first trained with features from 10 mild AD trials and 10 control trials. Then, 100 artificial mild AD trials and 100 artificial control trials generated with SEMD-MEMD, SEMD, or CEMD are transformed with the trained UMAP model. In the UMAP setting, the size of the local neighborhood used for manifold approximation is set to 10, and the effective minimum distance between embedded points is set to 1; the training epoch number for embedding optimization is 1000. The dimension of the features is reduced and projected onto a two-dimensional map with UMAP. Figure 8 shows that artificial data of the two classes generated with MEMD are more easily separable than those generated with SEMD or CEMD.
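The construction of the feature vector from the four FC matrices can be sketched as follows, using random symmetric matrices in place of real coherence values. The function name `fc_features` is hypothetical.

```python
import numpy as np

def fc_features(fc):
    """fc: (n_bands, C, C) symmetric matrices -> 1-D feature vector.

    Only the strict upper triangle (diagonal excluded) of each matrix is kept.
    """
    n_bands, n_ch, _ = fc.shape
    iu = np.triu_indices(n_ch, k=1)        # k=1 skips the diagonal
    return fc[:, iu[0], iu[1]].ravel()     # n_bands * C * (C - 1) / 2 values

# Random symmetric stand-ins for the four FC matrices of a 21-channel trial
fc = np.random.default_rng(0).random((4, 21, 21))
fc = (fc + fc.transpose(0, 2, 1)) / 2
features = fc_features(fc)
```

For \(C=21\), this yields \(4\times 21\times 20/2=840\) features per trial. If the umap-learn package were used for the projection, the settings described above would presumably correspond to `n_neighbors=10`, `min_dist=1`, and `n_epochs=1000`, although the implementation used in this work is not specified here.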

Fig. 8

Feature map of artificial mild AD patients vs controls, plotted with UMAP. For each class, 100 artificial samples are generated using MEMD a, SEMD b, and CEMD c. The obtained embedding is normalized with min-max normalization before visualization

Performance Analysis

The evolution of the classification accuracy of the classifiers during the training process is depicted in Fig. 9. The training set is augmented with SEMD-MEMD. For the mild AD dataset, EEGNet has the worst classification performance with an average accuracy of around 53%. The data augmentation deteriorates the performance of EEGNet compared to the case of not using artificial data. On the other hand, the classification accuracy for BrainNet CNN improves with data augmentation when the number of artificial trials is greater than 20, as the accuracy converges faster than without data augmentation. The ResNet performance also improves with data augmentation.

Fig. 9

The testing accuracy averaged across ten folds of the two datasets during the training process. A different number of artificial trials are generated, each one of them shown in a different color in the subplots. For each case, the upper panel contains the experiments with 0 to 50 artificial trials, and the lower panel contains the experiments with 0 to 500 artificial trials. The dashed line represents the 0 case, where no artificial trials are used

In Fig. 10, the trend of the accuracy of the classification is given. The accuracies of ResNet and BrainNet CNN in this figure are obtained after 100-epoch training, while the number of training epochs of EEGNet is 200. We note that data augmentation does not always help to improve the training of neural networks.

Finally, Fig. 11 shows the confusion matrices obtained with only real data (before) and with 10 artificial trials per class added (after). Adding 10 artificial trials per class to the 10 original trials doubles the number of samples in the training dataset (an increase of 100%). These confusion matrices are calculated using MATLAB’s “confusionmat” function [44]. The results were obtained by averaging over ten folds, and the final values were normalized by dividing by the sum of each row. The experiment was carried out using only a small number of artificial trials, as the results depicted in Fig. 10 indicated that this was a good value for almost all the models. Table 5 contains the accuracy, sensitivity, and precision calculated from these confusion matrices.

Fig. 10

Accuracy evolution when the number of artificial trials increases from 0 to 500. The trend of the accuracy is fitted with the power function \(f(x)=ax^b+c\). The dotted line represents the accuracy without data augmentation

Fig. 11

Comparison of the confusion matrices before and after data augmentation for the two datasets. The confusion matrices are averaged across ten folds and normalized by dividing by the sum of each row

Table 5 Performance measurement before and after data augmentation, calculated from the confusion matrices in Fig. 11 using the “confusionmat” function. The best result in each case is highlighted in bold
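The relation between a confusion matrix and the three reported measures can be sketched as follows. The sketch uses Python with NumPy instead of MATLAB’s “confusionmat” function, the patient class is assumed to be the positive class, and the counts in the example are hypothetical values chosen for illustration, not results of this work.

```python
import numpy as np

def confusion_metrics(cm):
    """cm[i, j]: number of trials of true class i predicted as class j.

    Class 0 = control (negative), class 1 = patient (positive).
    """
    tn, fp = cm[0]
    fn, tp = cm[1]
    accuracy = (tp + tn) / cm.sum()
    sensitivity = tp / (tp + fn)   # fraction of patients detected
    precision = tp / (tp + fp)     # fraction of patient predictions that are correct
    row_norm = cm / cm.sum(axis=1, keepdims=True)  # each row sums to 1
    return accuracy, sensitivity, precision, row_norm

# Hypothetical counts for a test set of 14 controls and 7 patients
cm = np.array([[12, 2],
               [1, 6]])
accuracy, sensitivity, precision, row_norm = confusion_metrics(cm)
```

With row normalization, the diagonal entry of the patient row equals the sensitivity, which is why the normalized matrices in Fig. 11 can be read directly in terms of per-class detection rates.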

Discussion

In this work, we proposed a decomposition and recombination system to enlarge the size of two AD datasets and explored the data augmentation performance on three different neural networks. This work is based on the following two assumptions:

  1. The AD dataset is a small dataset.

  2. Neural networks need a considerable amount of data to tune the parameters.

Most patients affected by AD are elderly people. In contrast to the EEG signal acquisition of healthy people, AD patients are easily exhausted, weak, or less willing in the process of acquiring EEG signals. Sometimes, the acquisition can even be interrupted for unexpected reasons such as the non-collaboration of the patients. Therefore, AD datasets are very valuable and are usually small in size. To protect the health of the patients and to facilitate data acquisition in experiments, a data augmentation method is needed to process small AD datasets.

When it comes to the second assumption, note that deep neural networks can accurately find the unknown relationship between the raw data and the corresponding labels because of their intrinsic nature and huge number of parameters. At the same time, these parameters can only be learned from the available data, and the higher the number of parameters, the higher the number of signals needed to train the model. Therefore, data augmentation on small AD datasets is again of great interest.

In addition to the decomposition and recombination strategy in data augmentation, generative adversarial networks (GANs) are also a universal solution for time series data augmentation. However, in these, both the generator and discriminator parameters require a certain amount of data to be tuned. For an AD dataset of limited size, this requirement on the amount of data is not met, and hence, GANs are not suitable in this case.

In the classification of mild AD, data augmentation has a positive effect on the training of ResNet. When the number of artificial trials increases, the average accuracy of ResNet increases from 72.38 to 77.62%, with a consistent performance. In the BrainNet CNN case, a positive outcome is also obtained in the classification performance when using data augmentation in the mild AD dataset. However, this effect is only positive for a small number of artificial trials in the MCI dataset; if the number of artificial trials increases above 30, the mean accuracy decreases. Finally, the EEGNet network is the one with the poorest results for the mild AD dataset, and artificial trials only have a moderate positive effect for the MCI dataset again when the number of artificial trials is small.

In Fig. 11, the confusion matrices before and after data augmentation are given. Both ResNet and BrainNet CNN obtain a consistent accuracy, sensitivity, and precision increase when 10 artificial trials per class are used. As expected, the improvement is more noticeable in the mild AD database, as the two classes (controls and patients) are more distant from each other when compared to the MCI case, in which the patients are closer to the control subjects.

Summarizing the above experiments, the proposed decomposition and recombination system helps the training of neural networks in small AD datasets, and it seems that just a factor of 2 is enough for that. Having more artificial data does not always provide a better result, as we have seen in our experiments. The effects of the data augmentation depend on two factors: (i) the type of neural networks and (ii) the data set. Determining the number of artificial trials is influenced by these two factors, and ascertaining how to obtain an optimal value requires further experiments.

One possible reason why the proposed data augmentation method does not always improve the accuracy is the different characteristics of the two datasets. In Fig. 9a, the accuracy of ResNet in the mild AD dataset converges as the number of training epochs increases, and the result is stable in the training, with a small variance around the mean accuracy. However, in Fig. 9d, the accuracy in the MCI dataset still fluctuates in a larger range, especially compared with the mild AD dataset. This means that the network is more difficult to fit for the MCI dataset or that perhaps the quality of the data is also worse in that case. Although data augmentation improves the accuracy in the MCI dataset only very slightly when the number of artificial trials is small, it still helps to train the ResNet: the number of training epochs needed for the accuracy to converge is smaller with data augmentation than without it, as shown in Fig. 9d. Similar fluctuations can be observed for the BrainNet CNN network in both datasets (Fig. 9b, e). This could explain why data augmentation is not helping in this case.

The proposed decomposition and recombination system has its own limitations. No pre-processing was used to remove artifacts or noise in the databases used in the experiments. Since the proposed method recombines all existing information in the data to enlarge the size of the training data, it is possible that artifacts or noise may also be replicated, which would negatively affect the results. Another aspect that can play a role is the decomposition method used. Here, we combine SEMD and MEMD, but other EMD-based methods have been proposed in the literature. Each method has different properties which impact the frequency mixing effect (overlapping of IMFs) and hence may influence the quality of the artificial frames. Moreover, the number of required artificial trials is unknown, as has been shown, and should be further investigated. More experiments are also needed to determine the number of epochs in the training phase, as our results indicate that the use of artificial trials may help to reduce the number of epochs in training and thus control possible overfitting. All of these aspects are now under consideration, and we expect to propose more reliable methods in future works.

Conclusion

In this paper, we proposed a decomposition and recombination system for data augmentation of the small AD data set as a way to solve the problem of insufficient data in neural network training.

This system consists of signal decomposition with SEMD-MEMD and a random recombination of the decomposed IMFs. The performance of this system is evaluated using three classifiers on two datasets. The main results show that the proposed system improves the accuracy of ResNet on the mild AD dataset with an increase of 5.24% and on the MCI dataset with an increase of 4.50%. Furthermore, BrainNet CNN results improve on the mild AD dataset with an increase of 2.38% and an increase of 0.75% on the MCI dataset. This work is expected to help the training process of detection methods for early diagnosis of Alzheimer’s disease.