Structural MRI Texture Analysis for Detecting Alzheimer’s Disease

Alzheimer's disease (AD) has the highest worldwide prevalence of all neurodegenerative disorders, no cure, and low diagnostic accuracy at its early stage, when treatments have some effect and can give patients some years of quality of life. This work aims to develop an automatic method to detect AD across 3 different stages, namely, control (CN), mild cognitive impairment (MCI), and AD itself, using structural magnetic resonance imaging (sMRI). A set of co-occurrence matrix and texture statistical measures (contrast, correlation, energy, homogeneity, entropy, variance, and standard deviation) was extracted from a two-level discrete wavelet transform decomposition of sMRI images. The discriminant capacity of the measures was analyzed, and the most discriminant ones were selected as features for feeding classical machine learning (cML) algorithms and a convolutional neural network (CNN). The cML algorithms achieved the following classification accuracies: 93.3% for AD vs CN, 87.7% for AD vs MCI, 88.2% for CN vs MCI, and 75.3% for All vs All. The CNN achieved the following classification accuracies: 82.2% for AD vs CN, 75.4% for AD vs MCI, 83.8% for CN vs MCI, and 64% for All vs All. In the evaluated cases, cML provided higher discrimination results than the CNN. For the All vs All comparison, the proposed method surpasses the discrimination accuracy of state-of-the-art methods that use structural MRI by 4%.


Introduction
Approximately 70% of all dementia cases worldwide are caused by Alzheimer's disease (AD), a progressive neurodegenerative illness. During its early stage, mild cognitive impairment (MCI), the condition is often asymptomatic. Even though several studies have been conducted, a cure has not yet been discovered [1]. In general, people aged 65 and older live 4 to 8 years after being diagnosed with AD; nonetheless, some can live up to 20 years with the disease. This extended duration significantly impacts public health, as a considerable part of that period is spent in a state of dependence and disability [2]. It is therefore imperative to find more precise and reliable means of diagnosing AD to minimize its impact.
AD has 3 stages: (1) pre-clinical AD, distinguished by the asymptomatic period that occurs between the initial brain lesions and the appearance of the first symptoms; (2) MCI, the pre-dementia state, in which individuals have cognitive deficits greater than those that naturally emerge with age but do not fit the criteria imposed for the diagnosis of dementia; and (3) dementia due to AD, in which the cognitive deficits are severe enough to interfere with daily life. Several works have addressed the automatic detection of AD. The classifier used in Thappa et al. [9] was the Support Vector Machine (SVM), fed with information from left and right hippocampal volumes and MMSE scores. The obtained discrimination accuracies were 99.2% for CN vs AD, 78.5% for CN vs MCI, and 91.3% for MCI vs AD.
Hon and Khan [10] used MRI images and extracted their entropy to characterize AD activity. Two convolutional neural network (CNN) architectures were used (VGG and Inception), and the discrimination accuracy reached was 96.5% for the CN vs AD comparison. Amini et al. [11] used functional MRI (fMRI) images and extracted the average and the standard deviation of cortical thickness, cortical parcel volume, white matter, and surface area. These features were used to feed both machine learning and CNN algorithms; the proposed CNN obtained a discrimination accuracy of 96.7% for the CN vs AD comparison.
Al-Khuzaie et al. [12] used MRI images and fed the proposed CNN with 2D image slices, achieving a discrimination accuracy of 99.3% for the CN vs AD comparison. Liu et al. [13] used MRI images to extract hippocampal features. The chosen classifier was a 3D densely connected CNN (3D DenseNet). The discrimination accuracies obtained were 88.9% for CN vs AD and 76.2% for CN vs MCI. Qiu et al. [14] used MRI images and fed a fully convolutional network with AD probability maps. A discrimination accuracy of 87.0% was obtained for CN vs AD.
Vaithinathan and Parthiban [15] also investigated MRI texture features combined with machine learning classifiers for AD detection. In this sense, the main purpose of the present work is to develop an artificial intelligence system that enables the detection of AD in its MCI and dementia (AD) stages, using sMRI texture features. The paper is structured as follows: Sect. 2 describes the MRI database used; Sect. 3 focuses on the image processing methodology and the classification process; Sect. 4 discusses the obtained results; lastly, Sect. 5 concludes the work.

Materials
The data used in this work were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership with the aim of testing whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer's disease.
Regarding the MRI scans, the overall acquisition time was about 45 min per subject per session. Each exam undergoes quality control so that, in case of subject motion or poor anatomic coverage, for example, the scan is considered unusable. The database, released in February 2021, consists of 89 subjects scanned longitudinally at 3 T with a 3-year follow-up, of whom 24 are healthy control subjects, 44 are MCI patients, and 21 are AD patients (patients diagnosed with dementia due to AD). The demographic data of the 3 groups are summarized in Table 1.

Methods
The proposed methodology is divided into 3 main steps: (1) preprocessing, (2) wavelet decomposition and feature extraction, and (3) feature selection and classification. Figure 1 summarizes the methodology implementation steps.

Preprocessing
The dataset was loaded into the FreeSurfer 7.1.1 software (freely available online at https://surfer.nmr.mgh.harvard.edu/) to decompose each subject's 3D data into 2D slices comprising 3 different anatomical planes, namely, coronal, sagittal, and axial, and then to execute the skull stripping process on the 2D slice MR images. An example of skull stripping is illustrated in Fig. 2.
The resulting 2D slice images were loaded into the Matlab Ⓡ 2019b software. These images were first filtered with a 3 × 3 median filter to remove noise [18]. Subsequently, their intensity values were adjusted to the full gray-level scale with the imadjust filter, according to [19]:

P_adj(m, n) = (P(m, n) − L) · (T − B) / (H − L) + B

where P(m, n) is the input image, P_adj(m, n) is the output image, m and n are the image pixel indices, H and L are the maximum and the minimum pixel levels in the original image, respectively, and T = 255 and B = 0 are the maximum and the minimum pixel levels in the desired image.
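The two preprocessing steps above can be sketched as follows. This is an assumed Python re-implementation (the original work used Matlab's median filtering and imadjust); `preprocess_slice` is a hypothetical helper name, and the linear stretch follows the equation given above with T = 255 and B = 0.

```python
import numpy as np
from scipy.ndimage import median_filter

def preprocess_slice(img, T=255, B=0):
    """Denoise a 2D slice with a 3x3 median filter, then linearly stretch
    its intensities to the range [B, T] (imadjust-like mapping)."""
    den = median_filter(img.astype(float), size=3)  # 3x3 median kernel
    H, L = den.max(), den.min()                     # original max/min levels
    if H == L:                                      # flat image: map to B
        return np.full_like(den, B)
    return (den - L) * (T - B) / (H - L) + B        # linear rescale

slice_img = np.random.default_rng(0).integers(0, 4096, (64, 64))
out = preprocess_slice(slice_img)
print(out.min(), out.max())  # 0.0 255.0
```

After this mapping, every slice occupies the same gray-level range regardless of the scanner's original dynamic range, which keeps the later texture measures comparable across images.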

Wavelet Decomposition
The discrete wavelet transform (DWT) was chosen to describe the input images because it maintains higher resolution at low-frequency bands [20]. It is obtained by restricting the scale (s) and translation (τ) parameters to a discrete lattice, with s = 2^(−m) and τ = n · 2^(−m), where m and n are integers. Hence, for a discrete-time signal x(n), the DWT yields an approximation component and a detail component, with coefficients c_{i,k} and d_{i,k}, respectively [21, 22]. These coefficients are given by

c_{i,k} = Σ_n x(n) G_i(n − 2^i k),  d_{i,k} = Σ_n x(n) H_i(n − 2^i k)

where the parameters i and k indicate the wavelet scale and translation factors, respectively. Besides that, G_i characterizes the coefficients of the low-pass filter and H_i the coefficients of the high-pass filter. Every wavelet type and family differs with regard to these filters [21, 23].
Since images are two-dimensional, the DWT is applied to images both vertically and horizontally. The result is four images (subbands) with half the width and half the height: one is a decimated copy of the image (LL), and the 3 remaining contain the details: horizontal (HL), vertical (LH), and diagonal (HH). At each subsequent step of decomposition, the LL subband is replaced by four smaller subbands, so the total number of subbands increases by 3 (see Fig. 3). In this work, for all participants, in each plane, every image was decomposed by the DWT down to level 2, producing in this way 8 images, as illustrated in Fig. 3.
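A dependency-free sketch of one level of this 2D decomposition is shown below, using the Haar wavelet (equivalent to Biorthogonal 1.1, one of the wavelets later selected in this work). The function name and subband labels are illustrative; production code would typically use a wavelet library instead.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2D Haar DWT. Returns the LL, HL, LH, and HH
    subbands, each with half the width and half the height."""
    a = img.astype(float)
    # Filter along rows: pairwise average (low-pass) and difference
    # (high-pass), scaled by 1/sqrt(2) to preserve energy.
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    # Repeat the filtering along columns on both row-filtered images.
    LL = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)  # approximation
    LH = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)  # vertical detail
    HL = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)  # horizontal detail
    HH = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)  # diagonal detail
    return LL, HL, LH, HH

img = np.random.default_rng(1).random((64, 64))
LL1, HL1, LH1, HH1 = haar_dwt2(img)   # level 1: 4 subbands at 32 x 32
LL2, HL2, LH2, HH2 = haar_dwt2(LL1)   # level 2: 4 subbands at 16 x 16
# 8 subband images in total after the two-level decomposition
```

Because the Haar filters are orthonormal, the total energy of the four subbands equals the energy of the input image, which is a convenient sanity check for any DWT implementation.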

Features Extraction
For each of the 89 study participants, 243 images were used for feature extraction: the 27 original plane images (9 images for each of the 3 planes) and the 216 images resulting from the DWT decomposition of the plane images (8 per original image). From each image, 9 texture features were extracted: contrast, correlation, energy, homogeneity, entropy, line and column variances, and line and column standard deviations. Therefore, for each possible mother wavelet used in the DWT decomposition, 2187 features (729 per plane) were computed for each study participant.
The features were computed from the gray-level co-occurrence matrix (GLCM), a statistical method that considers the spatial relationship of pixels and is employed to describe the texture of an image [24]. Each element {i, j} of the GLCM, P_{i,j}, represents the frequency with which a pixel with gray level i is spatially related to a pixel with gray level j [24]. The formulas and descriptions of the features are summarized in Table 2. For each of the 3 planes (coronal, sagittal, and axial) of each study participant, each feature was averaged considering the 9 original images and the 72 images resulting from their DWT decompositions. This leads to 9 average features (1 value per feature) for each plane of each study participant. These average features were used in the selection of the mother wavelets and of the features, to improve the classification results. The averaging per plane was applied to decrease the data dimensionality and consequently improve the execution time of these selection processes.
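The GLCM-based features can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: the pixel offset (one step to the right) and the number of gray levels (8) are assumptions, and correlation and the line/column statistics are omitted for brevity.

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for offset (dy, dx)."""
    # Quantize the image to `levels` gray levels.
    q = np.minimum((img.astype(float) / img.max() * levels).astype(int),
                   levels - 1)
    P = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[q[y, x], q[y + dy, x + dx]] += 1  # count co-occurring pairs
    return P / P.sum()                          # normalize to probabilities

def texture_features(P):
    """Four of the GLCM statistics used in this work (Table 2)."""
    i, j = np.indices(P.shape)
    eps = np.finfo(float).eps
    return {
        "contrast":    np.sum((i - j) ** 2 * P),        # local variations
        "energy":      np.sum(P ** 2),                  # sum of squared elements
        "homogeneity": np.sum(P / (1 + np.abs(i - j))), # nearness to diagonal
        "entropy":     -np.sum(P * np.log2(P + eps)),   # randomness
    }

img = np.random.default_rng(2).integers(0, 256, (32, 32))
P = glcm(img)
feats = texture_features(P)
```

In practice, each of the 243 images per participant would pass through such a routine, and the resulting feature values would then be averaged per plane as described above.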

Wavelet Selection Process
The extracted features were used for binary classification within the pairs CN vs MCI, AD vs MCI, and CN vs AD, and for the multi-class classification All vs All. All classifications were performed using exclusively the information of each of the 3 planes (coronal, sagittal, and axial) and also using the information of the 3 planes together.
Since the values of each feature depend on the mother wavelet used in the DWT decomposition, a search was performed to find the five wavelets that yield the features with the greatest discriminant capacity, considering all study group pairs (CN vs MCI, AD vs MCI, CN vs AD, and All vs All) and all study planes (coronal, sagittal, axial, and 3 planes). The evaluated wavelet families were Haar, Daubechies (db), Symlets (sym), Coiflets (coif), Biorthogonal (bior), Reverse Biorthogonal (rbio), Meyer, and Fejér-Korovkin (fk). The average features were used for this purpose.
The average values of each feature were separated for each combination of study group pair, study plane, wavelet, feature, and subband (or full-band). Each combination that uses only 1 plane leads to 1 value per study participant, whereas each combination that uses the 3 planes together leads to 3 values per study participant. Within each combination, including all study participants, the average values were normalized using the z-score [25] and then submitted to the Kruskal-Wallis (KW) test [26]. The KW test was used to determine whether the null hypothesis that the data of the study groups come from the same distribution could be rejected: p-values lower than 0.05 indicate a significant difference between the distributions, in which case the null hypothesis is rejected [26]. It is worth mentioning that, for the multi-class study group All vs All, the p-values were corrected by the Bonferroni method [27]. Figure 4 shows the 15 cases with the highest number of average features that reject the null hypothesis and the corresponding wavelets. It is observed that the five wavelets with the highest number of significant features were Biorthogonal 1.1, Reverse Biorthogonal 1.1, Reverse Biorthogonal 1.3, Reverse Biorthogonal 1.5, and Reverse Biorthogonal 3.1. These wavelets were chosen for the feature selection and classification steps.
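The statistical screening above can be sketched as follows for a single feature. All data here are synthetic (group sizes match the 24/44/21 split of the database, but the values are random), and the number of tests used for the Bonferroni correction is an illustrative assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic average feature values for the three study groups (CN, MCI, AD),
# deliberately drawn with different means so the test should reject.
cn  = rng.normal(0.0, 1.0, 24)
mci = rng.normal(0.5, 1.0, 44)
ad  = rng.normal(1.5, 1.0, 21)

# z-score normalization over all participants in the combination
pooled = np.concatenate([cn, mci, ad])
z = stats.zscore(pooled)
zcn, zmci, zad = np.split(z, [len(cn), len(cn) + len(mci)])

# Kruskal-Wallis test of the null hypothesis of a common distribution
H, p = stats.kruskal(zcn, zmci, zad)

# Bonferroni correction for the multi-class case (9 features assumed here)
n_tests = 9
p_corrected = min(p * n_tests, 1.0)
significant = p_corrected < 0.05  # feature counted as discriminant
```

Repeating this for every combination of wavelet, feature, plane, and subband, and counting the significant cases, yields the ranking from which the five best wavelets were selected.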

Features Selection and Classification
As mentioned earlier, classification within each study group pair (CN vs MCI, AD vs MCI, CN vs AD, and All vs All) was carried out for each study plane (coronal, sagittal, axial, and 3 planes). To improve the execution time and the classification results, for each combination of study group pair and study plane, a search was carried out to find the features, computed through the five selected wavelets, that result in the highest classification accuracy. Once again, the average features were used for selection purposes. The non-normalized average values of each feature were separated for each combination of study group pair and study plane. Each combination initially had 369 features (9 features × 8 images resulting from DWT decomposition × 5 wavelets + 9 features × 1 original plane image) for each plane of each study participant included in the study group pair. Within each combination, including all study participants belonging to the corresponding study group pair, the average values were normalized using the z-score [25]. Then, for each combination, the normalized average values of all features were applied as inputs to a cascade of one F-score algorithm [28] and one classical machine learning (cML) algorithm to select, according to the maximum classification accuracy, the best set of features. The F-score algorithm individually assesses and ranks the features based on their F-score; the features with an F-score above the average are chosen as the relevant features [28].
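The F-score ranking can be sketched as follows, assuming the common two-class F-score definition of [28] (between-group separation of a feature's means divided by its within-group variances); the data and the discriminant feature are synthetic.

```python
import numpy as np

def f_score(X, y):
    """F-score of each column of X for binary labels y in {0, 1}.
    Higher values indicate features that better separate the two groups."""
    X0, X1 = X[y == 0], X[y == 1]
    m, m0, m1 = X.mean(0), X0.mean(0), X1.mean(0)
    num = (m0 - m) ** 2 + (m1 - m) ** 2          # between-group separation
    den = X0.var(0, ddof=1) + X1.var(0, ddof=1)  # within-group variances
    return num / den

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, (60, 5))   # 60 participants, 5 candidate features
y = np.repeat([0, 1], 30)           # two study groups
X[y == 1, 0] += 2.0                 # make feature 0 clearly discriminant

scores = f_score(X, y)
keep = scores > scores.mean()       # keep features scoring above the average
```

In the actual pipeline, the number of retained features is then swept (2 to 9 in unit steps, 10 upwards in steps of 5) and each candidate subset is scored by a cML classifier.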
The number of features selected by the F-score algorithm ranged from 2 to 9 in unit steps and from 10 to all in steps of 5. The cML algorithms were different configurations of decision trees, discriminant analysis, naive Bayes, support vector machines (SVM), k-nearest neighbors (KNN), and ensembles. In addition to the cML algorithms, a convolutional neural network (CNN) was also applied. For each combination of study group pair and study plane, the CNN was fed with the sets of selected features that, used as inputs to the cML algorithms, led to the best classification result. The classifiers and their configurations are described in Table 3. In all cases, in order to verify the generalization capacity of the classifiers, a leave-one-out cross-validation procedure was used, a well-known process that allows the use of the whole dataset for testing without leakage between the train and test sets [29].
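The leave-one-out evaluation scheme can be sketched as follows. The nearest-mean classifier used here is only a toy stand-in (it is not one of the cML configurations of Table 3), and the data are synthetic; the point is the cross-validation loop, in which each participant serves as the test set exactly once.

```python
import numpy as np

def loocv_accuracy(X, y, fit_predict):
    """Leave-one-out cross-validation: train on all samples but one,
    test on the held-out sample, and average over all samples."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i            # exclude participant i
        hits += fit_predict(X[mask], y[mask], X[i]) == y[i]
    return hits / len(y)

def nearest_mean(Xtr, ytr, x):
    """Toy classifier: assign x to the class whose training mean is closest."""
    classes = np.unique(ytr)
    d = [np.linalg.norm(x - Xtr[ytr == c].mean(0)) for c in classes]
    return classes[int(np.argmin(d))]

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(3, 1, (20, 3))])
y = np.repeat([0, 1], 20)
acc = loocv_accuracy(X, y, nearest_mean)
```

Because the held-out sample never influences training, this procedure gives an essentially unbiased accuracy estimate even on a small database such as the one used here.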

Results and Discussion
For each combination of study group pair and study plane, the highest classification accuracy achieved using the cML algorithms, and the corresponding number of selected features (ft), are shown in Table 4. The classification accuracy achieved employing the CNN, and the corresponding number of selected features (ft) and study plane, are shown in Table 5.
Scrutiny of Table 4 reveals that, for the study group pair CN vs AD, the highest classification accuracy achieved through the cML algorithms was 93.3%, using 35 features from the sagittal plane and also 115 features selected from the 3 planes, both with the bagged trees classifier. The lowest classification accuracy achieved through the cML algorithms was 77.8%, using the axial plane. For this study group pair, as indicated in Table 5, the highest classification accuracy achieved through the CNN algorithm was 82.2%, using the 115 features selected from the 3 planes.
For the pair AD vs MCI, it is observed from Table 4 that the highest classification accuracy achieved through the cML algorithms was 87.7% using 80, 95, and 140 features from the coronal plane and the quadratic SVM classifier. The lowest classification accuracy achieved through the cML algorithms was 78.5% using the sagittal plane. For this study group pair, as indicated in Table 5, the highest classification accuracy achieved through the CNN algorithm was 75.4% using the 95 features selected from the coronal plane.
Regarding the pair CN vs MCI, it is observed from Table 4 that the highest classification accuracy achieved through the cML algorithms was 88.2%, obtained with the fine KNN classifier. For this study group pair, as indicated in Table 5, the highest classification accuracy achieved through the CNN algorithm was 83.8%. Concerning the study group pair All vs All, as indicated in Table 4, the highest classification accuracy achieved through the cML algorithms was 75.3%, using 80, 95, 105, and 115 features selected from the coronal plane and the subspace KNN classifier. The lowest classification accuracy achieved through the cML algorithms was 65.2%, using the sagittal plane. It is observed from Table 5 that, for this study group pair, the highest classification accuracy achieved through the CNN algorithm was 64%, using the 80, 85, and 95 features selected from the coronal plane. The lowest classification results were obtained for this study group pair, which indicates that the multi-class classification is the setting in which the extracted features and the ML algorithms have the most difficulty discriminating between the groups.
Analyzing the results, it is observed that the CNN algorithm did not achieve classification accuracies higher than the cML algorithms in any of the four study group pairs. In fact, except for the pair CN vs MCI, the best result achieved using the CNN is worse than the worst result achieved using the cML algorithms. The overall poor performance of the CNN may be due to a non-optimal selection of the features applied to its inputs, since the features were selected by the F-score algorithm combined with the cML algorithms, not with the CNN.
The high classification accuracy obtained for the pair CN vs MCI is particularly important because, due to the lack of a cure for Alzheimer's disease, early detection plays a key role in medical intervention to reduce brain damage, preserve daily functioning for longer, and give the patient time to plan for the future. The pair CN vs AD, in turn, was the only one for which classification results above 80% were achieved in all study planes. This overall high performance was expected because CN and AD are the groups with the greatest anatomical differences in the brain [30].
Among the study planes, the coronal plane was the one in which the best overall classification accuracies were obtained. This result is supported by previous studies [31, 32] and can be justified by the fact that the coronal plane enables a clearer view of 3 of the brain structures most affected by AD, namely, the cerebral cortex, the ventricles, and the hippocampus. Consequently, the coronal plane can be said to allow the best visualization of the differences in the various anatomical regions of the 3 groups studied.
It is worth noting that the results presented and discussed above were obtained by using all study participants in the wavelet and feature selection steps. Although easily found in the literature, this is not the most rigorous way to select features because it introduces a risk of overfitting. The selection was performed in this way due to the small size of the database, but the risk was reduced by the cross-validation employed in the performance evaluation.
A comparison between the classification results obtained in the present work and those found in the literature, also using the ADNI image database, is depicted in Table 6. It is observed that not all state-of-the-art methods performed the three binary classifications made in the present work, most focusing on the pair CN vs AD. More importantly, only three of the state-of-the-art methods carried out the multi-class classification All vs All.
For the pair CN vs AD, compared with only sMRI-based methods, the 93% achieved in the present work is 14, 10, 7, and 7% higher than that obtained in Lebedev et al. [33], Qiu et al. [14], Zhang et al. [34], and Ruiz et al. [8], respectively, but 6% lower than that obtained in Thappa et al. [9]. Regarding the multi-class classification All vs All, the proposed method stands out for achieving the highest accuracy, outperforming the methods developed in Lebedev et al. [33], Zhang et al. [34], and Lee et al. [36] by 34, 23, and 4%, respectively.
Compared with diagnostic methods based on imaging techniques other than sMRI, the proposed method outperformed the methods developed in Liu et al. [35] and Cheng et al. [37] by 2 and 1%, respectively, but it is surpassed by 4% by the fMRI-based method developed in Amini et al. [11]. Although the above performance comparisons are evidence of the proposed method's ability to discriminate the different stages of AD, they should be analyzed carefully, since different works may use different numbers of subjects, or the same number but different subjects, even if the database is the same.
In addition to the ADNI database, the sMRI-based method developed in Qiu et al. [14] was also originally evaluated using other image databases, and these results are summarized in Table 7. It is observed that, for the pair CN vs AD, the classification accuracy achieved by applying the proposed method to the ADNI database also exceeds those obtained by applying the method developed in [14] to the AIBL, FHS, and NACC databases. Besides the different features computed from the images, a factor that may be contributing to the better overall performance of the proposed method is the feature selection, a procedure not performed in [14]. Although enriching, these comparisons need to be analyzed carefully because different image databases were employed in the studies. A comparison between the classification results obtained in the present work and those found in the literature using signal and biomarker techniques is summarized in Table 8.
It is observed that the proposed sMRI-based method did not present the best performance in any of the analyzed study group pairs. For the pair CN vs MCI, it outperformed the method developed in [38] by 11% but it is surpassed by the method introduced in [39] by 10%.
In the MCI vs AD case, the proposed method outscored the methods developed in [40,41], and [38] by 10, 9, and 5%, respectively, but is outperformed by the method elaborated in [39] by 6%. For CN vs AD, the proposed method outperformed both the methods developed in [40] and [41] by 10%, but is outscored by the method produced in [38] by 2%. In the multi-class All vs All case, the proposed method did not outperform the EEG-based methods developed in [38] and [39], being surpassed by 21% by the latter.

Conclusion
Alzheimer's disease is one of the neurodegenerative diseases with the highest prevalence, affecting millions of people worldwide. This work aimed to detect AD in the stages CN, MCI, and AD itself using sMRI. A set of co-occurrence matrix and texture statistical measures (contrast, correlation, energy, homogeneity, entropy, variance, and standard deviation) was extracted from a two-level DWT decomposition of sMRI images. The discriminant capacity of the measures was analyzed, and the most discriminant ones were selected as features for feeding classical machine learning algorithms and a CNN. The classical algorithms achieved the following classification accuracies: 93.3% for AD vs CN, 87.7% for AD vs MCI, 88.2% for CN vs MCI, and 75.3% for All vs All. The CNN achieved the following classification accuracies: 82.2% for AD vs CN, 75.4% for AD vs MCI, 83.8% for CN vs MCI, and 64% for All vs All. For the All vs All comparison, the proposed method outperformed by 4% the highest classification accuracy of the state-of-the-art sMRI-based methods.
The accuracies achieved for AD vs CN, AD vs MCI, and CN vs MCI indicate that the evaluated measures have a great ability to distinguish between these binary groups. However, despite surpassing the state-of-the-art, additional research should be conducted to improve the accuracy of the challenging multi-class classification All vs All. Despite the promising results, the database size was a limitation of the present study.
Author Contributions JS and PMR contributed to conceptualization; JS contributed to methodology; BCB and PMR contributed to validation; JS contributed to writing - original draft; BCB and PMR contributed to writing - review and editing; and PMR contributed to supervision. All authors read and approved the final manuscript.
Funding Open access funding provided by FCT|FCCN (b-on). This work received no funding support.
Data Availability Statement Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Conflicts of interest
No potential conflict of interest was reported by the authors.
Ethics approval and consent to participate Not applicable.

Consent for publication Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.