Novelty detection-based approach for Alzheimer’s disease and mild cognitive impairment diagnosis from EEG

Alzheimer’s disease is diagnosed by means of daily activity assessment. Evaluation of EEG recordings is a supporting tool that can help the practitioner recognize the illness, especially in its early stages. This paper presents a new approach for detecting Alzheimer’s disease, and potentially mild cognitive impairment, from measured EEG records. The proposed method evaluates the amount of novelty in the EEG signal as a feature for EEG record classification. The novelty is measured from the parameters of adaptive filtration of the EEG signal. A linear neuron with gradient descent adaptation was used as the filter in a predictive setting. The extracted feature (novelty measure) is then classified to obtain the Alzheimer’s disease diagnosis. The proposed approach was cross-validated on a dataset containing EEG records of 59 patients suffering from Alzheimer’s disease, seven patients with mild cognitive impairment (MCI) and 102 controls. Cross-validation yielded 90.73% specificity and 89.51% sensitivity. The proposed method of feature extraction from EEG is completely new and can be used with any classifier for the diagnosis of Alzheimer’s disease from EEG records.

memory and other cognitive dysfunctions. AD is the result of brain damage that begins decades before the first clinical manifestations [1]. Currently, AD is the most common type of dementia. Previous studies have shown that this disorder is associated not only with regional brain abnormalities but also with changes in neuronal connectivity between anatomically distinct brain regions [2]. A global connectivity deficit in AD has been reported [3]. Changes in the functional organization of the brain in patients with AD can be observed in the resting condition [4]. The hypothesis of Alzheimer's disease as a disconnection syndrome assumes a functional or structural disconnection of larger parts of the brain rather than an isolated involvement of small areas [5]. In recent years, graph theory has been used to study anatomical and functional brain connectivity, which provides a better understanding of the relationships between different brain structures [6]. Recent studies support the hypothesis of a loss of global information integration in the AD brain due to the loss of long-distance connectivity [7]. Furthermore, an increase in theta and delta power, a decrease in beta power and a slowing of the alpha frequency have been reported in AD patients [8][9][10][11][12][13][14]. EEG studies of AD patients have shown a marked amplitude decrease of the alpha rhythm (8-13 Hz) and an increase in the power and spatial distribution of the slower delta (2-4 Hz) and theta (4-8 Hz) rhythms [15]. An association between slow-wave activity in the spectral analysis of EEG and whole-head MEG and the hippocampal volumes of AD and MCI subjects has been observed [16,17]. Another reported effect of AD on EEG is reduced complexity and perturbations or a decrease of EEG synchrony [14].
Mild cognitive impairment (MCI) is an intermediate stage between the expected cognitive decline of normal ageing and the more pronounced decline of dementia. It involves problems with memory, language, thinking, and judgement that are greater than typical age-related changes. However, the changes associated with MCI are not severe enough to interfere with day-to-day life and ordinary activities. Early detection of MCI may help prevent the transition to AD.
According to a previous study [18], an estimated 5.4 million Americans of all ages suffered from AD in 2016. By mid-century, the number of people living with AD in the USA is predicted to rise to 13.8 million. Research on EEG classification for AD (and MCI) is important because EEG is a tool for the detection of dementia in its early stages. The early detection of dementia onset is important for establishing effective treatment [19][20][21][22][23][24][25][26]. Early diagnosis and treatment could slow down the development of dementia [27]. For this reason, easily applicable diagnostic methods are still needed. EEG measurement and analysis appear to be a platform with good potential: EEG is easy to measure, non-invasive, and inexpensive.
The most common features used for the diagnosis of AD via EEG classification are frequency-domain descriptors. The review study [28] compares various classifiers using frequency-domain descriptors as proposed in [29]. The results of this study (Table 1) indicate that with such features it is possible to obtain up to 94% sensitivity and 85% specificity with known classifiers. Features obtained with deep convolutional neural networks have been used in a more recent study [30]. However, such features are difficult to interpret and computationally expensive to obtain. The method proposed in this study extracts features from EEG in a computationally inexpensive way that is entirely different from frequency analysis. Because the resulting feature is not derived from the spectrum, it may complement spectral features and so improve the performance of any classifier. Therefore, the goal of this study is not to test classifiers but to analyze the potential of the proposed feature extraction method. The proposed method is based on an adaptive novelty detection method introduced in [31], called error and learning-based novelty detection (ELBND). The suitability of this method for non-stationary data was demonstrated in [32]. Novelty can be considered a measure of entropy. Different entropy measures show a decline in patients with Alzheimer's disease [33,34]. This study is an extension of a previous conference paper [35]. In this study, we use a different dataset that also contains MCI patients and data measured on two different machines, in order to determine whether the approach is independent of the recording device. A different group of AD patients was used: the new patients were recorded on higher-quality EEG equipment and diagnosed in a specialized department of the University Hospital.
The ELBND method differs from similar methods in that it uses both the error of a predictive model and the increment of its adaptive weights. Other studies use only the prediction error [36,37] or only the increment of the adaptive weights [38]. An interesting advantage of the ELBND method is that the prediction model can be linear even if the signal is non-stationary and has nonlinear dynamics. The method proved useful for ECG signal analysis in [31]. That report showed that even an incorrectly chosen model can be sufficient for a successful search for perturbations with the ELBND method.

Participants
EEG data were obtained from 59 patients with moderate dementia (Mini-Mental State Exam (MMSE) score = 10-19) and seven patients with MCI. All patients were diagnosed according to the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA) Alzheimer's Criteria [39]. Clinical history was obtained from the patient and a caretaker. Information about co-morbidity at the time of diagnosis was requested from the general practitioner. A neurological and physical examination was performed in all patients. Multi-slice CT was used to assess hippocampal atrophy.
Blood levels of folate, vitamin B12, thyroid-stimulating hormone, calcium and glucose, a complete blood cell count, and renal and liver function tests were evaluated at the time of diagnosis. Serological tests for syphilis, Borrelia and HIV were performed when necessary. EEG was recorded for the differential diagnosis of AD, to distinguish it from Creutzfeldt-Jakob disease or transient epileptic amnesia. CSF 14-3-3 or total tau, phospho-tau and Aβ42 were measured in patients with rapidly progressive dementia. The Mini-Mental State Examination (MMSE) was used for cognitive screening. For MCI testing, the criteria discussed in "Revised criteria for mild cognitive impairment may compromise the diagnosis of Alzheimer disease dementia" were applied [40]. A diagnosis of MCI required a change in cognition recognized by the affected individual or observers; objective impairment in one or more cognitive domains measured by ACE; independence in functional activities assessed by the Functional Activities Questionnaire (FAQ); and absence of dementia according to the NINCDS-ADRDA Alzheimer's Criteria.
The control group of 102 age-matched, healthy subjects had no memory or other cognitive impairments. They did not meet the NINCDS-ADRDA Alzheimer's criteria and showed no signs of other neurodegenerative diseases. The average MMSE of the AD group was 14.9 (standard deviation = 2.3). The mean ages were 70.5 ± 4.9 years in the AD group, 67 ± 7.6 years in the MCI group and 72.2 ± 5.3 years in the control group. The structure of the groups was as follows: AD group, 28 men and 31 women; MCI group, 3 men and 4 women; and control group, 43 men and 59 women.
Informed consent was obtained from all subjects, and the study was approved by the local ethics committee (Ethics Committee, Faculty Hospital Hradec Kralove). For academic projects (where no new drugs are tested and the studies are not approved by the state drug control institute) there are no assigned study numbers.

EEG recordings and preprocessing
All recordings were performed under similar standard conditions. The subjects were in a comfortable position, on a bed, with their eyes closed. The length of the resting-state recording was 15 min. Hyperventilation, photostimulation and the alpha attenuation reaction were excluded from the calculation. The beginnings and ends of events such as eye opening, hyperventilation, and photostimulation were manually marked in the EEG record by the technician, and the sections between these marks were excluded during preprocessing. An experienced technician woke patients who showed signs of falling asleep.
The electrodes were positioned according to the 10-20 system, a method used to describe the standardized location of scalp electrodes. It ensures that the inter-electrode spacing is equal and that electrode placements are proportional to skull size and shape. The "10" and "20" indicate that the distances between adjacent electrodes are 10% or 20% of the total front-back or right-left distance of the skull. Most electrode names correspond to the cerebral lobe above which they are located. The letters Fp, F, T, C, P, and O stand for frontopolar, frontal, temporal, central, parietal and occipital. Frontopolar electrodes (Fp) are placed above the anterior part of the frontal lobe. Even numbers (2, 4, 6, 8) refer to the right hemisphere and odd numbers (1, 3, 5, 7) to the left hemisphere. The letter "z" refers to an electrode placed on the midline. The smaller the number, the closer the position is to the midline.
The recordings were made on two 21-channel digital EEG systems: a Walter EEG PL-231 (Germany) with a sampling frequency of 256 Hz and a TruScan 32 (Alien Technik Ltd., Czech Republic) with a sampling frequency of 128 Hz. The data recorded at 256 Hz were down-sampled to 128 Hz. Both groups of data were then detrended and filtered with a notch filter that removes 50 Hz.
The linear detrending subtracted the best-fit line, in the least-squares sense, from the evaluated segments of the EEG data. The analysis began with manual artifact removal: artifacts were rejected by an experienced neurophysiologist by visual inspection. The following artifacts were eliminated by manually selecting the affected samples: myogenic potentials, glossokinetic artifacts (important in AD patients), eye movements, ECG artifacts, pulse artifacts, respiration artifacts, skin artifacts and electrode artifacts. Afterwards, the data were grouped into non-overlapping segments of 1000 time samples (7.8125 s at 128 Hz).
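The detrending and segmentation steps described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `linear_detrend` and `split_segments` are hypothetical, dropping an incomplete trailing segment is an assumption, and down-sampling, notch filtering and artifact rejection are not covered.

```python
FS = 128            # sampling frequency after down-sampling (Hz)
SEGMENT_LEN = 1000  # samples per segment, as stated in the text


def linear_detrend(segment):
    """Subtract the least-squares best-fit line from one segment."""
    n = len(segment)
    t_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    # Closed-form slope of the least-squares line through the (t, y) pairs.
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = s_ty / s_tt if s_tt else 0.0
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(segment)]


def split_segments(signal, length=SEGMENT_LEN):
    """Cut a signal into non-overlapping segments.

    Assumption: an incomplete trailing segment is dropped.
    """
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, length)]


# Duration of one segment in seconds: 1000 samples / 128 Hz.
segment_seconds = SEGMENT_LEN / FS
```

A signal that is already a straight line is reduced to (numerically) zero by `linear_detrend`, which is a quick way to check the least-squares fit.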

Feature extraction
The features used for the data description were obtained from adaptive parameters of the predictive model and its error. As the predictor, we used a linear neural unit (LNU) [41,42], with Gradient Descent (GD) adaptation [38], also known as stochastic online back-propagation.
The LNU can be described by Eq. (1):
$$y(k) = \mathbf{w}^{T}(k)\,\mathbf{x}(k) \tag{1}$$
where y(k) is the prediction output, w(k) is the vector of adaptive weights (parameters) of the model and x(k) is the input vector of the model. The input vector for the prediction of every new sample is computed on individual EEG channels as follows:
$$\mathbf{x}(k) = [1,\ y_r(k-1),\ y_r(k-2),\ \dots,\ y_r(k-n)]^{T} \tag{2}$$
where y_r(k) denotes the measured EEG values. The input vector contains the history of the last n time samples and a bias (in this case bias = 1). The history size n = 6 was chosen experimentally. A larger n does not produce better results, but it increases the time complexity. The GD adaptation of the model in Eq. (1) can be written:
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \Delta\mathbf{w}(k) \tag{3}$$
where Δw(k) is the vector of adaptive weight increments:
$$\Delta\mathbf{w}(k) = \mu\, e(k)\, \frac{\partial y(k)}{\partial \mathbf{w}(k)} = \mu\, e(k)\, \mathbf{x}(k) \tag{4}$$
where μ is the learning rate and the error e(k) is calculated as follows:
$$e(k) = y_r(k) - y(k) \tag{5}$$
To improve the adaptation convergence, the measured EEG records are z-scored as follows:
$$y_r \leftarrow \frac{y_r - \bar{y}_r}{\sigma_{y_r}} \tag{6}$$
where ȳ_r is the mean value of y_r and σ_{y_r} is the standard deviation of y_r. With such a normalization, it is possible to achieve better stability of the weight update system [43] with a higher learning rate. To further improve the adaptation convergence, we normalize the learning rate μ [44,45]. For this we used a modification of the learning rate normalization from [38], calculated as follows:
$$\eta(k) = \frac{\mu}{\epsilon + \mathbf{x}^{T}(k)\,\mathbf{x}(k)} \tag{7}$$
where η is the normalized replacement for the learning rate μ and ε is a regularization term. The learning rate adaptation is evaluated before the prediction of every individual sample. This algorithm is also called normalized least-mean-squares (NLMS). We used the ELBND method proposed in [31], for the first time with the normalized learning rate η, to classify the measured EEGs. As the result of the estimation, a vector of coefficients describing novelty is obtained for every sample in the measured data according to the following equation:
$$\mathbf{c}(k) = \left| e(k)\, \Delta\mathbf{w}(k) \right| = \left| \eta(k)\, e^{2}(k)\, \mathbf{x}(k) \right| \tag{8}$$
If we replace η in Eq. (8) by Eq. (7), we obtain
$$\mathbf{c}(k) = \left| \frac{\mu\, e^{2}(k)\, \mathbf{x}(k)}{\epsilon + \mathbf{x}^{T}(k)\,\mathbf{x}(k)} \right| \tag{9}$$
where the regularization term ε = 1 and the absolute value is taken element-wise. For our work, we used only the largest coefficient of the c(k) vector to describe every sample:
$$c_{\max}(k) = \max\{\mathbf{c}(k)\} \tag{10}$$
The maximum was used because in some steps some weights change very little. In general, however, the weight increments are strongly correlated, so reducing the vector of increments to its maximum loses little useful information while decreasing the dimension of the output data. Taking the mean value is an alternative to the maximum; however, it does not produce better results in this case.
To make the data segments easily comparable, every segment was annotated with a single value. This further data reduction was achieved by calculating the standard deviation of c(k) coefficients for the whole EEG segment. The single value descriptor carries the important information about the novelty of the whole EEG segment. Such a simplification can be beneficial during the classification.
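The whole feature extraction chain of this section (z-scoring, one-step prediction with a linear unit, normalized learning rate, ELBND coefficients, per-sample maximum, per-segment standard deviation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the value μ = 1, the zero initial weights and the use of the population standard deviation are assumptions not given in the text, while n = 6 and a regularization term of 1 follow the text.

```python
import math
import random
import statistics


def zscore(values):
    """Normalize a segment to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # a constant segment (std = 0) is not handled
    return [(v - mean) / std for v in values]


def novelty_descriptor(segment, n=6, mu=1.0, eps=1.0):
    """Return one novelty value for an EEG segment of a single channel.

    mu and the zero initial weights are assumptions; n and eps follow the text.
    """
    y_r = zscore(segment)
    w = [0.0] * (n + 1)  # one bias weight plus n history weights
    c_max = []
    for k in range(n, len(y_r)):
        # Input vector: bias followed by the last n samples, most recent first.
        x = [1.0] + [y_r[k - i] for i in range(1, n + 1)]
        y = sum(wi * xi for wi, xi in zip(w, x))       # linear prediction
        e = y_r[k] - y                                  # prediction error
        eta = mu / (eps + sum(xi * xi for xi in x))     # normalized learning rate
        dw = [eta * e * xi for xi in x]                 # weight increments
        c_max.append(max(abs(e * d) for d in dw))       # largest ELBND coefficient
        w = [wi + d for wi, d in zip(w, dw)]            # gradient descent update
    # Single-value descriptor: spread of the per-sample novelty in the segment.
    return statistics.pstdev(c_max)


# Usage on a synthetic 1000-sample segment (a noisy sine, for illustration only).
random.seed(0)
demo_segment = [math.sin(2 * math.pi * k / 32) + 0.1 * random.gauss(0.0, 1.0)
                for k in range(1000)]
demo_value = novelty_descriptor(demo_segment)
```

Because the segment is z-scored first, the descriptor does not depend on the amplitude scale or offset of the recording, which matters when data from different devices are compared.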

Classification
The classification of positives (AD and MCI) and controls is a two-class problem. A very simple approach was used for this classification. First, the average values of the novelty descriptor for controls and for positives were calculated from the training data. The classification criterion (threshold) was placed exactly midway between these two mean values. A subject was then considered negative or positive according to the value of its novelty descriptor: if the value is higher than the criterion, the subject is classified as positive.
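The decision rule described above amounts to a one-dimensional midpoint threshold. A minimal sketch follows; the function names `fit_threshold` and `classify` and the numbers in the usage lines are hypothetical and serve only as an illustration.

```python
def fit_threshold(control_values, positive_values):
    """Place the criterion midway between the two training-class means."""
    mean_controls = sum(control_values) / len(control_values)
    mean_positives = sum(positive_values) / len(positive_values)
    return (mean_controls + mean_positives) / 2.0


def classify(value, threshold):
    """Return True (positive) if the novelty descriptor exceeds the criterion."""
    return value > threshold


# Illustrative numbers only: class means 1.5 and 4.5 give a criterion of 3.0.
threshold = fit_threshold([1.0, 2.0], [4.0, 5.0])
is_positive = classify(3.5, threshold)
```

Note that the midpoint between class means ignores class sizes and variances; this follows the description in the text and is not a tuned decision boundary.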

Cross-validation
For the method validation, we used exhaustive leave-p-out cross-validation (p = 3), as is common for this topic. Exhaustive leave-p-out means that we generated all possible combinations of 3 subjects from the data. For every combination, we used the chosen p subjects for testing and the remaining subjects for training. The validation results of all combinations were subsequently used to estimate specificity and sensitivity.
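The exhaustive leave-p-out procedure can be sketched as below. This is an illustration under assumptions: the function name `leave_p_out` is hypothetical, the midpoint threshold is refitted on each training split, the counts are pooled over all combinations before sensitivity and specificity are computed (the text does not specify how the results were aggregated), and the toy data are invented.

```python
from itertools import combinations
from math import comb
from statistics import fmean


def leave_p_out(values, labels, p=3):
    """Exhaustive leave-p-out with a midpoint-threshold classifier.

    labels: True for positives, False for controls.
    Assumes every training split still contains both classes.
    Returns (sensitivity, specificity) pooled over all test combinations.
    """
    tp = fn = tn = fp = 0
    n = len(values)
    for test_idx in combinations(range(n), p):
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        mean_neg = fmean([values[i] for i in train if not labels[i]])
        mean_pos = fmean([values[i] for i in train if labels[i]])
        threshold = (mean_neg + mean_pos) / 2.0
        for i in test_idx:
            predicted = values[i] > threshold
            if labels[i]:
                tp += predicted
                fn += not predicted
            else:
                fp += predicted
                tn += not predicted
    return tp / (tp + fn), tn / (tn + fp)


# Toy, perfectly separable data (illustration only).
toy_values = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5]
toy_labels = [False] * 5 + [True] * 5
sensitivity, specificity = leave_p_out(toy_values, toy_labels)

# Number of test combinations if all 59 + 7 + 102 = 168 subjects were pooled.
n_combinations = comb(168, 3)
```

The number of combinations grows roughly with the cube of the number of subjects, which stays tractable here only because the classifier is a single threshold.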

Novelty estimation in individual channels
First, we estimated the novelty descriptor in single EEG channels with our proposed method. The values are presented in Table 2. Some channels have lower standard deviations and larger differences in mean values among the groups we wanted to classify, i.e., some channels are more suitable for the classification than others. The best channels are those with the largest difference between the mean values of the normal group and the AD group, or between the normal group and the MCI group. Selecting the best channel was not a goal of this study.

Cross-validation of classification
The results of AD classification are presented in Table 3 and the MCI classification results in Table 4. As shown, the most accurate classification was based on channels T6, P4, and P3 for AD diagnosis and on channels T6 and T5 for MCI diagnosis. No channel had an accuracy below 72% in AD classification. In MCI classification, all channels were above 63%. The criteria used for the classification of AD patients are shown in Fig. 1 and those for MCI patients in Fig. 2.

Comparison of different machines
Because the data were obtained from two different machines, we also compared the novelty in individual channels between machines to verify that the validation results were not influenced by the source of the EEG data. The estimated values are presented in Table 5. They are given only for AD and MCI patients because all controls were measured with the same machine. The differences between the machines in the mean values and standard deviations of the novelty were small or absent. From this finding we conclude that the recording device does not influence the classification results.

Discussion
From the clinical point of view, the most important goal is the timely and inexpensive diagnosis of individuals at risk of developing AD. MCI is associated with a high risk of developing AD. Therefore, we kept this group of patients separate.
The lower age of the patients with MCI reflects the age dependence of AD, since MCI in many cases progresses to AD. The lower age of the MCI patients could reduce the sensitivity of the test in this group. If the average age of the MCI group were comparable with that of the other groups, we would expect the discriminatory value of the test to increase.
The most significant changes were found in the brain areas with the most pronounced neurohistological changes (temporal regions). Episodic memory is the function most commonly impaired early in AD, as a consequence of mesial temporal lobe atrophy (entorhinal cortex, hippocampus), which disables consolidation. Two MRI features are involved: mesial temporal lobe atrophy (particularly of the hippocampus, entorhinal cortex and perirhinal cortex) and temporoparietal cortical atrophy [46]. The first and most affected parts of the brain in typical Alzheimer's disease are therefore the temporoparietal regions. The neurohistological changes (such as amyloid plaques, neurofibrillary tangles, neuropil threads, and dystrophic neurites) cannot be verified from computed tomography data. We therefore proceeded from the typical localization of these changes, which is the basis of the diagnosis of Alzheimer's disease [47].
The hypothesis that both groups, the positives (AD + MCI) and the controls, have the same spectral power in the delta and theta bands was tested with a t-test. Welch's method was used for the power spectral analysis of the signal. The resulting p-values were corrected with a false discovery rate (FDR) procedure to control false discoveries due to the multiple comparisons problem. The p-values for all channels were under the FDR threshold of 0.05. This might indicate brain atrophy in the AD group.
The previously published results [35], obtained with a different dataset, show a better classification rate (sensitivity and specificity of 95%). This may have several reasons: a bigger and more balanced dataset in the previous study, the origin of all data from one measuring device, the absence of MCI patients, and classification over all channels. The goal of the present study was to analyze the novelty in separate channels, to evaluate the influence of the measuring device on the classification performance and to estimate the potential of the method.
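The multiple-comparison step of the spectral power test discussed in this section can be illustrated with a short sketch. The text does not state which FDR procedure was used; the Benjamini-Hochberg step-up procedure below is an assumption, the function name is hypothetical, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Benjamini-Hochberg step-up procedure at FDR level q (assumed procedure).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value lies under its step-up bound.
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that rank.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[i] = True
    return rejected


# Invented per-channel p-values, for illustration only.
example = benjamini_hochberg([0.001, 0.008, 0.039, 0.041])
```

In this example the third p-value exceeds its own bound but is still rejected, because a larger p-value further down the ranking passes its bound; this is the step-up behaviour that distinguishes the procedure from a simple per-test cut-off.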
The truism that EEG is nonspecific and cannot diagnose aetiology or localization well is often cited. However, in general medical practice, non-specificity is often not an issue, because most referrals in general neurology are individuals in whom the cause is clear, or reasonably suspected, on the basis of clinical history and laboratory chemistry. The questions from the clinician are whether the brain is involved and what the extent of the brain damage is. Novelty estimation may bring new information about the changes in brain dynamics during cognitive decline in patients with AD. It may become a suitable complement to the traditional qEEG methods.

Conclusion
The desynchronization of the EEG is the interruption of its rhythmical activity. It occurs with the activation of ascending cholinergic projections of the basal forebrain and brainstem and of projections from the raphe nuclei and locus coeruleus [48][49][50]. The rhythmical activity is interrupted both by direct effects on cortical neurons and by indirect effects on thalamic neurons. The cholinergic hypothesis, which was initially presented 20 years ago, suggests that a dysfunction of acetylcholine-containing neurons in the brain might substantially contribute to the cognitive decline observed in patients with Alzheimer's disease. Thus, the decreased cholinergic projections to the cerebral cortex in patients with cognitive decline make the desynchronization of EEG activity less probable. Unlike the EEG energies, where the maximum changes are present in the most atrophic parts of the brain, the novelty changes are more diffuse and probably reflect the effects of diffuse cholinergic projections on the cortical oscillatory activity. The distribution of changes was similar to that of the changes in complex noise characteristics [51].
The proposed method was able to mark measured EEGs with a single value that could be directly used for the AD diagnosis. The current work points to less complexity at smaller scales in AD group in frontal areas, while higher complexity at larger scales was observed across the brain areas and this higher complexity was significantly correlated with cognitive decline [52]. It is well known that EEG signals of AD patients are generally less synchronous than in age-matched control subjects [53]. Lower predictability (higher level of novelty) may reflect a higher complexity of EEG signal in patients with AD.
The presented results show that the proposed method has an accuracy comparable with other methods using different EEG features (Table 1). However, a sample of seven MCI patients is too small to draw any firm conclusion. The results nevertheless give hope that this methodology could be sensitive to MCI. To test this hypothesis, the group of patients with MCI needs to be expanded. In this paper, we also compared the values produced by the proposed method on data obtained from two different machines. The data from one of the machines were also resampled. According to the comparison, the results appear to be independent of the measuring device and of the resampling process.

Conflict of interest
The authors declare no competing interests.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.