Early Diagnosis of Alzheimer’s Disease by Joint Feature Selection and Classification on Temporally Structured Support Vector Machine
- 13 Citations
- 3.4k Downloads
Abstract
The diagnosis of Alzheimer’s disease (AD) from neuroimaging data at the pre-clinical stage has been intensively investigated because of the immense social and economic cost. In the past decade, computational approaches on longitudinal image sequences have been actively investigated with special attention to Mild Cognitive Impairment (MCI), which is an intermediate stage between normal control (NC) and AD. However, current state-of-the-art diagnosis methods have limited power in clinical practice, due to the excessive requirements such as equal and immoderate number of scans in longitudinal imaging data. More critically, very few methods are specifically designed for the early alarm of AD uptake. To address these limitations, we propose a flexible spatial-temporal solution for early detection of AD by recognizing abnormal structure changes from longitudinal MR image sequence. Specifically, our method is leveraged by the non-reversible nature of AD progression. We employ temporally structured SVM to accurately alarm AD at early stage by enforcing the monotony on classification result to avoid unrealistic and inconsistent diagnosis result along time. Furthermore, in order to select best features which can well collaborate with the classifier, we present as joint feature selection and classification framework. The evaluation on more than 150 longitudinal subjects from ADNI dataset shows that our method is able to alarm the conversion of AD 12 months prior to the clinical diagnosis with at least 82.5 % accuracy. It is worth noting that our proposed method works on widely used MR images and does not have restriction on the number of scans in the longitudinal sequence, which is very attractive to real clinical practice.
Keywords
Joint Feature Selection Abnormal Structural Changes Longitudinal Imaging Data ADNI Dataset Mild Cognitive Impairment (MCI)1 Introduction
Alzheimer’s disease is an incurable neurodegenerative disease. Typical clinical symptoms include memory loss, disorientation, language and behavioral issues. The progression of AD is not reversible, however, there are treatment available to modify disease effect in the early stage of AD [1]. Thus, early diagnosis or prognosis of AD is of high value in clinical practice since it can save more time for treatment and then improve the life quality for not only patients but also their caregivers.
AD introduces both structural and functional loss that is known to have dynamically evolving morphological patterns [1, 2, 3, 12, 13, 14, 15]. In the past decade, longitudinal studies have been actively investigated for AD diagnosis with special attention to MCI [1, 4], which is an intermediate stage between NC and AD. For example, tensor-based morphometry is used in [4] to reveal brain atrophy patterns from 91 probable AD patients and 189 MCI subjects scanned at baseline, and after 6, 12, 18, and 24 months. Moreover, the trend of longitudinal cortical thickness is used as the morphological patterns in [5] to identify subjects which eventually convert to AD. However, current longitudinal AD diagnosis methods have very strong restriction on the longitudinal image sequence. For example, each subject recruited in [5] should at least 5 time-points in every six months, and should develop AD after at least 12 months after the baseline scan. For convenience, many longitudinal approaches assume the number of scans is equal, albeit implicitly. In real clinical setting, however, not all patients have a large or an equal number of imaging scans.
In order to accurately measure the tiny structural changes along time, current state-of-the-art computer assisted diagnosis methods have to wait until the patient has enough number of longitudinal scans. More critically, the prediction is short term, e.g., only 6 months before real onset of AD in [5]. Although promising results have been achieved in predicting whether the subject has progressed to AD or stays in MCI stage, the limitation of short-term prediction substantially hamper the deployment in clinical practice.
In light of this, we propose a flexible solution for early detection of AD by sequentially and consistently recognizing abnormal patterns of structure change from longitudinal MR image sequence. First, we present a novel temporally structured SVM (TS-SVM) which is trained based on a set of partial image sequences cut from the complete longitudinal data. Compared to conventional SVM, our TS-SVM has two major improvements to achieve early alarm and high accuracy in detecting AD progression: (1) Temporal consistency. We enforce monotonic constraint to avoid inconsistent detection results along time. Since convergent evidence suggests that AD progression is non-reversible [6, 7], we require the risk of AD progression should monotonically increase within each subject as more and more time-points are inspected. (2) Early detection. We employ sequential recognition to achieve best balance of early alarm and detection accuracy. In the training stage, we specifically train the classifiers by making the classification margin adaptive to the length of partial image sequence. Given the longitudinal image sequence of new subject with arbitrary number of scans, we sequentially examine the longitudinal imaging patterns from baseline and alarm the AD conversion as long as the detection of abnormal change is of high confidence. Thus, our proposed AD early detection method does not have requirement on the number of scans. Second, we further present a joint feature selection and classification framework, in order to make the selected best features are eventually optimal to work with the learned support vector machine. We have evaluated the performance of AD early detection on more than 150 longitudinal subjects from ADNI dataset. Our method achieved promising results by alarming AD onset 12 months prior to the clinical diagnosis with at least 82.5 % accuracy.
2 Methods
2.1 Temporally Structured SVM for Early Detection of AD
The goal of our method is to accurately predict AD converting as early as possible by longitudinally tracking the structure changes. Since magnetic resonance (MR) image is non-invasive and widely used in clinic practice, we present a novel temporally structured SVM on longitudinal MR image sequences.
Morphological Features.
Suppose we have \( N \) training subjects, each subject \( S_{n} \) has a MR image sequences \( \varvec{I}^{n} = \{ I_{t}^{n} |t = 1, \ldots ,T_{n} \} \) (\( n = 1, \ldots ,N \)) with \( T_{n} \) longitudinal scans. For each volumetric image \( I_{t}^{n} \), we first register the template image (http://qnl.bu.edu/obart/explore/AAL/) with 90 manually labeled ROIs (regions of interest) using hammer registration tool to the underlying image \( I_{t}^{n} \) and extract seven morphological features in each ROI which include tissue percentiles (volumetric percentiles of the ROI volume) of white matter (WM), gray matter (GM), cerebral-spinal fluid (CSF), and background, and the averaged voxel-wise Jacobian determinant in WM, and GM and CSF regions. Therefore, the image feature \( {\mathbf{f}}_{t}^{n} \) for each volumetric image \( I_{t}^{n} \) is a \( 90 \times 7 = 630 \) dimension feature vector.
Decomposition to Partial Image Sequences.
We can decompose the complete longitudinal image sequence \( \varvec{I}^{n} \) into \( \left( {T_{n} - 1} \right) \) partial image sequences \( \varvec{P}^{n} = \{ \varvec{P}^{n} (b)|b = 2, \ldots ,T_{n} \} \), where each \( \varvec{P}^{n} \left( b \right) = \{ I_{t}^{n} |t = 1, \ldots ,b\} \) is the partial image sequence with \( b \) time points from baseline to \( \left( {b - 1} \right) \)-th follow-up. For each \( \varvec{P}^{n} (b) \), we further extract longitudinal feature representations and form a column vector \( {\mathbf{h}}\left( {b,n} \right) = \left[ {\sum\nolimits_{t = 1}^{b} {{\mathbf{f}}_{t}^{n} /b,\left( {{\mathbf{f}}_{1}^{n} - {\mathbf{f}}_{b}^{n} } \right)} } \right]^{'} \), where the first half elements are the average of morphological features from baseline to last time point and the second half elements measure the longitudinal difference of morphological features from baseline to the last follow-up. It is apparent that each feature representation \( {\mathbf{h}}(b,n) \) describes both the spatial and temporal morphological patterns. As we will explain in Sect. 2.2, feature selection is of necessity to remove data redundancy from such high dimension (\( d = 1,260 \)).
Naive Way to Achieve Early Detection by Classic SVM.
Advantages of our TS-SVM (right) over the naïve SVM solution (left). In our TS-SVM method, we enforce the temporal monotony and consistency constraints on the extracted partial image sequences (shown in the middle))
Temporally Structured SVM on Longitudinal MR Image Sequences.
Compared to the objective function of naïve SVM in Eq. (1), two new constraints (C 1 and C 2 ) are used. (1) we first turn the inter-class margins \( \delta_{x} \) and \( \delta_{y} \) in Eq. (1) from scalar values into the monotonically increasing functions of \( b \) (the length of partial image sequence). The constraint C 1 is mainly used to achieve early detection, i.e., we require the probability of making accurate classification should increase as more time points are available. (2) The second constraint C 2 takes advantage of the non-reversible nature of AD progression. Suppose \( {\mathbf{y}}_{a,q} \) and \( {\mathbf{y}}_{b,q} \) are the morphological features from the same MCI-C subject but \( {\mathbf{y}}_{b,q} \) is extracted at the later time points after \( {\mathbf{y}}_{a,q} \) (i.e., a < b). Then we require the probability of the underlying MCI-C subject being converted to AD should higher at later time point b than at earlier time point a, i.e., \( {\mathbf{w}}_{{\mathbf{y}}}^{T} {\mathbf{y}}_{b,q} > {\mathbf{w}}_{{\mathbf{y}}}^{T} {\mathbf{y}}_{a,q} \) since AD conversion is irreversible. Furthermore, the intra-class margin \( \tau_{{\mathbf{y}}} \) is a monotonically increasing function of l (\( l = b - a \) is the length difference between two partial image sequences). Intuitively, the bigger the gap between two time points is, the larger the increase of AD conversion risk becomes. It is worth noting that the constraint C 2 is not applicable to MCI-NC subjects since the MCI-NC subject might convert to AD as more and more follow-ups will be scanned in future. Thus it is unreasonable to assume the MCI-NC subject can keep staying at MCI stage. As shown in the right of Fig. 1, for particular MCI-C subject, not only the probability score of AD conversion but also the difference between the probability scores of converting to AD and staying MCI monotonically increase as the partial image sequence becomes longer and longer. Thus, our TS-SVM can detect AD onset at early stage with high confidence. It is worth noting that we set \( \delta_{{\mathbf{x}}} \left( b \right) = b \), \( \delta_{{\mathbf{y}}} \left( b \right) = b \), \( \tau_{{\mathbf{y}}} \left( l \right) = l \) in all experiments.
2.2 Joint Feature Selection and Classification on TS-SVM
2.3 Optimization
3 Experiments
In the following experiments, we select 70 MCI-C subjects from ADNI dataset which have AD onset in the middle of longitudinal image sequence and 81 MCI-NC subject which stay in MCI stage until the last scan in the latest ADNI dataset. For all subjects, 95.3 % have 4 follow-ups every 6 months, and the remaining 4.7 % having more than 4 follow-ups. Specifically, for 70 MCI-C subjects, 11.1 % are diagnosed AD at 6 months, 31.8 % at 12 months, 25.3 % are diagnosed AD at 18 months after baseline scan, while the remaining 31.8 % are diagnosed AD more than 24 months after baseline scan. We compare our proposed TS-SVM based early detection method with standard SVM based method. Furthermore, we evaluate the importance of feature selection in both TS-SVM and standard SVM method. Thus, we compare the classification performance for four method in total, denoted by SVM, SVM+FS, TS-SVM, and TS-SVM+FS, respectively. In all experiments, we split the data into 10 non-overlap folders and report the averaged classification accuracy after 10-fold cross validation. The parameters are tuned using grid search strategy only in the training dataset.
Performance of AD Early Detection.
Accuracy of AD early detection at 6 months, and 0 month before AD onset for the MCI-C subjects converting to AD 12 months after baseline scan.
| Method | 18 months | 12 months | 6 months | 0 month | ||||
|---|---|---|---|---|---|---|---|---|
| ACC | AUC | ACC | AUC | ACC | AUC | ACC | AUC | |
| SVM | - | - | - | - | 0.7110 | 0.7612 | 0.7345 | 0.7937 |
| SVM+FS | - | - | - | - | 0.7557 | 0.7862 | 0.7735 | 0.8237 |
| TS-SVM | - | - | - | - | 0.8816 | 0.9327 | 0.8975 | 0.9431 |
| TS-SVM+FS | - | - | - | - | 0.9025 | 0.9649 | 0.9075 | 0.9776 |
Accuracy of AD early detection at 12 months, 6 months, and 0 month before AD onset for the MCI-C subjects converting to AD 18 months after baseline scan.
| Method | 18 months | 12 months | 6 months | 0 month | ||||
|---|---|---|---|---|---|---|---|---|
| ACC | AUC | ACC | AUC | ACC | AUC | ACC | AUC | |
| SVM | - | - | 0.7325 | 0.7822 | 0.7455 | 0.7917 | 0.7535 | 0.8223 |
| SVM+FS | - | - | 0.7537 | 0.7912 | 0.7685 | 0.8123 | 0.7725 | 0.8314 |
| TS-SVM | - | - | 0.8425 | 0.8851 | 0.8593 | 0.9042 | 0.8635 | 0.9128 |
| TS-SVM+FS | - | - | 0.8475 | 0.8932 | 0.8720 | 0.9277 | 0.8812 | 0.9216 |
Accuracy of AD early detection at 18 months, 12 months, 6 months, and 0 month before AD onset for the MCI-C subjects converting to AD 24 months after baseline scan.
| Method | 18 months | 12 months | 6 months | 0 month | ||||
|---|---|---|---|---|---|---|---|---|
| ACC | AUC | ACC | AUC | ACC | AUC | ACC | AUC | |
| SVM | 0.6016 | 0.6542 | 0.6025 | 0.6677 | 0.6325 | 0.6712 | 0.6515 | 0.6931 |
| SVM+FS | 0.6557 | 0.6862 | 0.6735 | 0.6127 | 0.6675 | 0.6983 | 0.6464 | 0.6926 |
| TS-SVM | 0.7345 | 0.7734 | 0.7675 | 0.8116 | 0.7805 | 0.8334 | 0.7875 | 0.8503 |
| TS-SVM+FS | 0.7653 | 0.7983 | 0.8125 | 0.8434 | 0.8345 | 0.8672 | 0.8431 | 0.8894 |
Critical Brain Regions Related with AD Progression.
The top 20 critical brain regions which contributed in AD early detection.
4 Conclusion
In this paper, we present a novel early AD diagnosis method using temporally structural SVM. In order to avoid inconsistent and unrealistic classification results, we propose the monotony on the output of SVM since the AD progression is generally non-reversible. In order to achieve early alarm of AD onset, we propose to adjust the classification margin such that the confidence of detecting AD progression becomes high as more and more follow-up scans are examined. Furthermore, we jointly perform feature selection and training of TS-SVM, in order to make the selected features can work well with the trained classifiers.
References
- 1.Thompson, P.M., Hayashi, K.M., Dutton, R.A., Chiang, M.C., Leow, A.D., Sowell, E.R., De Zubicaray, G., Becker, J.T., Lopez, O.L., Aizenstein, H.J., Toga, A.W.: Tracking Alzheimer’s disease. Ann. NY Acad. Sci. 1097, 198–214 (2007)CrossRefGoogle Scholar
- 2.aël Chetelat, G., Baron, J.-C.: Early diagnosis of Alzheimer’s disease: contribution of structural neuroimaging. NeuroImage 18, 525–541 (2003)CrossRefGoogle Scholar
- 3.Reisberg, B., Ferris, S.H., Kluger, A., Franssen, E., Wegiel, J., de Leon, M.J.: Mild cognitive impairment (MCI): a historical perspective. Int. Psychogeriatr. 20, 18–31 (2008)Google Scholar
- 4.Hua, X., Lee, S., Hibar, D.P., Yanovsky, I., Leow, A.D., Toga Jr., A.W., Jack, C.R., Bernstein, M.A., Reiman, E.M., Harvey, D.J., Kornak, J., Schuff, N., Alexander, G.E., Weiner, M.W., Thompson, P.M.: Mapping Alzheimer’s disease progression in 1309 MRI scans: power estimates for different inter-scan intervals. NeuroImage 51, 63–75 (2010)CrossRefGoogle Scholar
- 5.Li, Y., Wang, Y., Wu, G., Shi, F., Zhou, L., Lin, W., Shen, D.: Discriminant analysis of longitudinal cortical thickness changes in Alzheimer’s disease using dynamic and network features. Neurobiol. Aging 33, 427.e415–430 (2012)Google Scholar
- 6.Hua, X., Gutman, B., Boyle, C.P., Rajagopalan, P., Leow, A.D., Yanovsky, I., Kumar, A.R., Toga Jr., A.W., Jack, C.R., Schuff, N., Alexander, G.E., Chen, K., Reiman, E.M., Weiner, M.W., Thompson, P.M.: Accurate measurement of brain changes in longitudinal MRI scans using tensor-based morphometry. NeuroImage 57, 5–14 (2011)CrossRefGoogle Scholar
- 7.Filley, C.: Alzheimer’s disease: it’s irreversible but not untreatable. Geriatrics 50, 18–23 (1995)Google Scholar
- 8.Boyd, S., et al.: Distributed optimization and statistical learning via the ADMM. Found. Trends Mach. Learn. 3, 1–122 (2011)CrossRefGoogle Scholar
- 9.Nie, F., Huang, Y., Wang, X., Huang, H.: New primal SVM solver with linear computational cost for big data classifications. In: ICML (2014)Google Scholar
- 10.Hoesen, G.W.V., Parvizi, J., Chu, C.-C.: Orbitofrontal cortex pathology in Alzheimer’s disease. Cereb. Cortex 10, 243–251 (2000)CrossRefGoogle Scholar
- 11.Risacher, S., Saykin, A.: Neuroimaging biomarkers of neurodegenerative diseases and dementia. Semin. Neurol. 33, 386–416 (2013)CrossRefGoogle Scholar
- 12.Antila, K., Lötjönen, J., Thurfjell, L., et al.: The PredictAD project: development of novel biomarkers and analysis software for early diagnosis of the Alzheimer’s disease. Interface Focus 3(2012)Google Scholar
- 13.Lorenzi, M., Ziegler, G., Alexander, D.C., Ourselin, S.: Efficient Gaussian process-based modelling and prediction of image time series. In: Ourselin, S., Alexander, D.C., Westin, C.-F., Cardoso, M. (eds.) IPMI 2015. LNCS, vol. 9123, pp. 626–637. Springer, Heidelberg (2015)CrossRefGoogle Scholar
- 14.Young, A.L., et al.: A data-driven model of bio-marker changes in sporadic Alzheimer’s disease. Brain 25, 64–77 (2014)Google Scholar
- 15.Fonteijin, H.M., et al.: An event-based model for disease progression and its application in familial Alzheimer’s disease and Huntington’s disease. NeuroImage 60, 1880–1889 (2012)CrossRefGoogle Scholar
- 16.Zhu, Y., Lucey, S.: Convolutional sparse coding for trajectory reconstruction. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 529–540 (2015)CrossRefGoogle Scholar

