Abstract
Neuroimaging-based diagnosis of Alzheimer’s disease (AD) and mild cognitive impairment (MCI) has attracted growing attention as the prevalence of these diseases increases. However, neuroimaging data are high-dimensional while only a small number of samples is available, which makes it challenging to build a robust computer-aided diagnosis system. Machine learning techniques have been considered a useful tool in this respect and, among various methods, sparse regression has shown its validity in the literature. To the best of our knowledge, however, existing sparse regression methods mostly select features based on the optimal regression coefficients in a single step. We argue that, because the training feature vectors are composed of both informative and uninformative or less informative features, the resulting optimal regression coefficients are inevitably affected by the uninformative or less informative features. To address this, we first propose a novel deep architecture that recursively discards uninformative features by performing sparse multi-task learning in a hierarchical fashion. We further hypothesize that the optimal regression coefficients reflect the relative importance of features in representing the target response variables. Accordingly, we use the optimal regression coefficients learned in one hierarchy as feature weighting factors in the following hierarchy, and formulate a weighted sparse multi-task learning method. Lastly, we take into account the distributional characteristics of the samples in each class and use clustering-induced subclass label vectors as target response values in our sparse regression model. In our experiments on the ADNI cohort, we performed both binary and multi-class classification tasks in AD/MCI diagnosis, and the proposed method outperformed the state-of-the-art methods with which it was compared.
Notes
In a least squares regression framework, one task corresponds to finding the optimal regression coefficients that represent the values of one target response variable. Considering multiple target response variables simultaneously is therefore regarded as multi-task learning (Argyriou et al. 2008).
In this work, we define uninformative and less informative features in terms of their optimal regression coefficients. Specifically, features whose regression coefficients are zero or close to zero are regarded as uninformative or less informative, respectively, in representing the target response variables.
Available at ‘http://www.loni.ucla.edu/ADNI’.
Although the ADNI database contains more than 800 subjects in total, only 202 subjects have baseline data that include all of the MRI, PET, and CSF modalities.
Refer to ‘http://www.adniinfo.org’ for more details.
Available at ‘http://mipav.cit.nih.gov/clickwrap.php’.
Available at ‘http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/’.
In our experiments on the ADNI cohort, we have one sample per subject.
\(\mathbb {F}^{(0)}\) denotes the original full feature set.
Initially, we set the current best accuracy to zero.
Available at ‘http://www.public.asu.edu/~jye02/Software/SLEP/index.htm’.
Available at ‘http://www.csie.ntu.edu.tw/~cjlin/libsvm/’.
In this work, we use a negative Euclidean distance for similarity computation.
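Notes 1 and 2 can be made concrete with a small sketch. It assumes the regression coefficients are stored as a matrix with one row per feature and one column per task (target response variable), and it summarizes each feature by the Euclidean norm of its row. The function name `categorize_features`, the row-norm summary, and the threshold `tol` that decides what counts as "close to zero" are illustrative assumptions, not values taken from this work.

```python
import numpy as np


def categorize_features(W, tol=1e-6):
    """Group features by the magnitude of their regression coefficients.

    W   : array of shape (n_features, n_tasks), one column per task.
    tol : hypothetical threshold for "close to zero" (an assumption).
    """
    # One magnitude per feature: the l2-norm of its coefficients across tasks.
    row_norms = np.linalg.norm(W, axis=1)
    return {
        # Exactly zero coefficients for every task.
        "uninformative": np.flatnonzero(row_norms == 0),
        # Nonzero but below the threshold.
        "less_informative": np.flatnonzero((row_norms > 0) & (row_norms <= tol)),
        # Everything else is kept.
        "informative": np.flatnonzero(row_norms > tol),
    }


# Toy coefficient matrix: 4 features, 2 tasks.
W = np.array([[0.8, -0.3],
              [0.0, 0.0],
              [1e-9, 0.0],
              [0.05, 0.02]])
groups = categorize_features(W)
print(groups)
```

In a hierarchical scheme of the kind the abstract outlines, only the features in the "informative" group would be passed on to the next hierarchy; how the threshold is chosen in practice is left open here.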
References
Akobeng AK (2007) Understanding diagnostic tests 1: sensitivity, specificity and predictive values. Acta Paediatr 96(3):338–341
Alikhanian H, Crawford JD, DeSouza JFX, Cheyne D, Blohm G (2013) Adaptive cluster analysis approach for functional localization using magnetoencephalography. Front Neurosci 7(73). doi:10.3389/fnins.2013.00073
Alzheimer’s Association (2012) 2012 Alzheimer’s disease facts and figures. Alzheimer’s Dementia 8(2):131–168
Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243–272
Bokde ALW, Lopez-Bayo P, Meindl T, Pechler S, Born C, Faltraco F, Teipel SJ, Möller HJ, Hampel H (2006) Functional connectivity of the fusiform gyrus during a face-matching task in subjects with mild cognitive impairment. Brain 129(5):1113–1124
Braak H, Braak E (1991) Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol 82(4):239–259
de Brecht M, Yamagishi N (2012) Combining sparseness and smoothness improves classification accuracy and interpretability. NeuroImage 60(2):1550–1561
Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Mining Knowl Discov 2(2):121–167
Burton EJ, Barber R, Mukaetova-Ladinska EB, Robson J, Perry RH, Jaros E, Kalaria RN, O’Brien JT (2009) Medial temporal lobe atrophy on MRI differentiates Alzheimer’s disease from dementia with lewy bodies and vascular cognitive impairment: a prospective study with pathological verification of diagnosis. Brain 132(1):195–203
Busse A, Angermeyer MC, Riedel-Heller SG (2006) Progression of mild cognitive impairment to dementia: a challenge to current thinking. Br J Psychiatry 189:399–404
Cui Y, Liu B, Luo S, Zhen X, Fan M, Liu T, Zhu W, Park M, Jiang T, Jin JS; the Alzheimer’s Disease Neuroimaging Initiative (2011) Identification of conversion from mild cognitive impairment to Alzheimer’s disease using multivariate predictors. PLoS ONE 6(7):e21896
Davatzikos C, Bhatt P, Shaw LM, Batmanghelich KN, Trojanowski JQ (2011) Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification. Neurobiol Aging 32(12):2322.e19–2322.e27
Desikan R, Cabral H, Hess C, Dillon W, Salat D, Buckner R, Fischl B; the Alzheimer’s Disease Neuroimaging Initiative (2009) Automated MRI measures identify individuals with mild cognitive impairment and Alzheimer’s disease. Brain 132:2048–2057
Devanand DP, Pradhaban G, Liu X, Khandji A, De Santi S, Segal S, Rusinek H, Pelton GH, Hoing LS, Mayeux R, Stern Y, Tabert MH, de Leon JJ (2007) Hippocampal and entorhinal atrophy in mild cognitive impairment. Neurology 68:828–836
DiFrancesco M, Holland S, Szaflarski J (2008) Simultaneous EEG/functional magnetic resonance imaging at 4 tesla: correlates of brain activity to spontaneous alpha rhythm during relaxation. J Clin Neurophysiol 25(5):255–264
Dueck D, Frey B (2007) Non-metric affinity propagation for unsupervised image categorization. In: 2007 IEEE international conference on computer vision (ICCV), pp 1–8
Ewers M, Walsh C, Trojanowski JQ, Shaw LM, Petersen RC, Jr., Feldman HH, Bokde AL, Alexander GE, Scheltens P, Vellas B, Dubois B, Weiner M, Hampel H (2012) Prediction of conversion from mild cognitive impairment to Alzheimer’s disease dementia based upon biomarkers and neuropsychological test performance. Neurobiol Aging 33(7):1203–1214.e2
Fazli S, Danóczy M, Schelldorfer J, Müller KR (2011) \(\ell_{1}\)-penalized linear mixed-effects models for high dimensional data with application to BCI. NeuroImage 56(4):2100–2108
Fotenos A, Snyder A, Girton L, Morris J, Buckner R (2005) Normative estimates of cross-sectional and longitudinal brain volume decline in aging and AD. Neurology, pp 1032–1039
Francis PT, Ramírez MJ, Lai MK (2010) Neurochemical basis for symptomatic treatment of Alzheimer’s disease. Neuropharmacology 59(4–5):221–229
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976
Frisoni GB, Ganzola R, Canu E, Rüb U, Pizzini FB, Alessandrini F, Zoccatelli G, Beltramello A, Caltagirone C, Thompson PM (2008) Mapping local hippocampal changes in Alzheimer’s disease and normal ageing with MRI at 3 Tesla. Brain 131(12):3266–3276
Gönen M, Alpaydin E (2011) Multiple kernel learning algorithms. J Mach Learn Res 12:2211–2268
Henze N, Zirkler B (1990) A class of invariant consistent tests for multivariate normality. Commun Stat Theory Methods 19(10):3595–3617
Hinrichs C, Singh V, Xu G, Johnson SC (2011) Predictive markers for AD in a multi-modality framework: an analysis of MCI progression in the ADNI population. NeuroImage 55(2):574–589
Joie RL, Perrotin A, Barré L, Hommet C, Mézenge F, Ibazizene M, Camus V, Abbas A, Landeau B, Guilloteau D, de La Sayette V, Eustache F, Desgranges B, Chételat G (2012) Region-specific hierarchy between atrophy, hypometabolism, and beta-amyloid (A\(\beta\)) load in Alzheimer’s disease dementia. J Neurosci 32:16265–16273
Kabani N, MacDonald D, Holmes C, Evans A (1998) A 3D atlas of the human brain. NeuroImage 7(4):S717
Karas G, Scheltens P, Rombouts S, van Schijndel R, Klein M, Jones B, van der Flier W, Vrenken H, Barkhof F (2007) Precuneus atrophy in early-onset Alzheimer’s disease: a morphometric structural MRI study. Neuroradiology 49(12):967–976
Kohannim O, Hua X, Hibar DP, Lee S, Chou YY, Toga AW Jr, Jack CR, Weiner MW, Thompson PM (2010) Boosting power for clinical trials using classifiers based on multiple biomarkers. Neurobiol Aging 31(8):1429–1442
Lee ACH, Buckley MJ, Gaffan D, Emery T, Hodges JR, Graham KS (2006) Differentiating the roles of the hippocampus and perirhinal cortex in processes beyond long-term declarative memory: a double dissociation in dementia. J Neurosci 26(19):5198–5203
Li Y, Wang Y, Wu G, Shi F, Zhou L, Lin W, Shen D (2012) Discriminant analysis of longitudinal cortical thickness changes in Alzheimer’s disease using dynamic and network features. Neurobiol Aging 33(2):427.e15–427.e30
Liu F, Wee CY, Chen H, Shen D (2013) Inter-modality relationship constrained multi-task feature selection for AD/MCI classification. In: Mori K, Sakuma I, Sato Y, Barillot C, Navab N (eds) Medical image computing and computer-assisted intervention (MICCAI), Lecture Notes in Computer Science, vol 8149. Springer, Berlin, pp 308–315
Liu M, Zhang D, Shen D (2012) Ensemble sparse classification of Alzheimer’s disease. NeuroImage 60(2):1106–1116
Loewenstein DA, Greig MT, Schinka JA, Barker W, Shen Q, Potter E, Raj A, Brooks L, Varon D, Schoenberg M, Banko J, Potter H, Duara R (2012) An investigation of PreMCI: subtypes and longitudinal outcomes. Alzheimer’s Dementia 8(3):172–179
Lu Z, Carreira-Perpinan M (2008) Constrained spectral clustering through affinity propagation. In: 2008 IEEE conference on computer vision and pattern recognition (CVPR), pp 1–8
Mark RE, Sitskoorn MM (2013) Are subjective cognitive complaints relevant in preclinical Alzheimer’s disease? A review and guidelines for healthcare professionals. Rev Clin Gerontol 23:61–74
Milgram J, Cheriet M, Sabourin R (2006) “One against one” or “one against all”: which one is better for handwriting recognition with SVMs? In: Lorette G (ed) Tenth international workshop on frontiers in handwriting recognition, Suvisoft
Mosconi L (2005) Brain glucose metabolism in the early and specific diagnosis of Alzheimer’s disease. Eur J Nucl Med Mol Imaging 32(4):486–510
Nie F, Huang H, Cai X, Ding CH (2010) Efficient and robust feature selection via joint \(\ell _{2,1}\)-norms minimization. In: Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A (eds) Advances in neural information processing systems, vol 23, pp 1813–1821
Nobili F, Mazzei D, Dessi B, Morbelli S, Brugnolo A, Barbieri P, Girtler N, Sambuceti G, Rodriguez G, Pagani M (2010) Unawareness of memory deficit in amnestic MCI: FDG-PET findings. J Alzheimer’s Dis 22(3):993–1003
Noppeney U, Penny WD, Price CJ, Flandin G, Friston KJ (2006) Identification of degenerate neuronal systems based on intersubject variability. NeuroImage 30(3):885–890
Perrin RJ, Fagan AM, Holtzman DM (2009) Multimodal techniques for diagnosis and prognosis of Alzheimer’s disease. Nature 461:916–922
Roth V (2004) The generalized LASSO. IEEE Trans Neural Netw 15(1):16–28
Schroeter ML, Stein T, Maslowski N, Neumann J (2009) Neural correlates of Alzheimer’s disease and mild cognitive impairment: a systematic and quantitative meta-analysis involving 1351 patients. NeuroImage 47(4):1196–1206
Shen D, Davatzikos C (2002) HAMMER: hierarchical attribute matching mechanism for elastic registration. IEEE Trans Med Imaging 21(11):1421–1439
Shi F, Wang L, Gilmore J, Lin W, Shen D (2011) Learning-based meta-algorithm for MRI brain extraction. In: Fichtinger G, Martel A, Peters T (eds) Medical image computing and computer-assisted intervention (MICCAI), Lecture Notes in Computer Science, vol 6893, pp 313–321
Singh V, Chertkow H, Lerch JP, Evans AC, Dorr AE, Kabani NJ (2006) Spatial patterns of cortical thinning in mild cognitive impairment and Alzheimer’s disease. Brain 129(11):2885–2893
Sled JG, Zijdenbos AP, Evans AC (1998) A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans Med Imaging 17(1):87–97
Suk HI, Lee SW (2013) A novel Bayesian framework for discriminative feature extraction in brain–computer interfaces. IEEE Trans Pattern Anal Mach Intell 35(2):286–299
Suk HI, Lee SW, Shen D (2014) Subclass-based multi-task learning for Alzheimer’s disease diagnosis. Front Aging Neurosci 6(168)
Suk HI, Lee SW, Shen D (2015) Latent feature representation with stacked auto-encoder for AD/MCI diagnosis. Brain Struct Funct 220(2):841–859
Suk HI, Wee CY, Shen D (2013) Discriminative group sparse representation for mild cognitive impairment classification. Mach Learn Med Imaging Lect Notes Comput Sci 8184:131–138
Thung KH, Wee CY, Yap PT, Shen D (2014) Neurodegenerative disease diagnosis using incomplete multi-modality data via matrix shrinkage and completion. NeuroImage 91:386–400
Tibshirani R (1994) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B 58:267–288
Varoquaux G, Gramfort A, Poline JB, Thirion B (2010) Brain covariance selection: better individual functional connectivity models using population prior. In: Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A (eds) Advances in neural information processing systems, vol 23, pp 2334–2342
Visser PJ, Verhey FRJ, Hofman PAM, Scheltens P, Jolles J (2002) Medial temporal lobe atrophy predicts Alzheimer’s disease in patients with minor cognitive impairment. J Neurol Neurosurg Psychiatry 72(4):491–497
Walhovd K, Fjell A, Brewer J, McEvoy L, Fennema-Notestine C Jr, Hagler DJ, Jennings R, Karow D, Dale A; the Alzheimer’s Disease Neuroimaging Initiative (2010) Combining MR imaging, positron-emission tomography, and CSF biomarkers in the diagnosis and prognosis of Alzheimer disease. Am J Neuroradiol 31:347–354
Wan J, Zhang Z, Yan J, Li T, Rao B, Fang S, Kim S, Risacher S, Saykin A, Shen L (2012) Sparse Bayesian multi-task learning for predicting cognitive outcomes from neuroimaging measures in Alzheimer’s disease. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR), pp 940–947
Wang H, Nie F, Huang H, Risacher S, Ding C, Saykin A, Shen L (2011) Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance. In: 2011 IEEE international conference on computer vision (ICCV), pp 557–562
Wang Q, Chen L, Yap PT, Wu G, Shen D (2010) Groupwise registration based on hierarchical image clustering and atlas synthesis. Hum Brain Mapp 31(8):1128–1140
Wang Y, Nie J, Yap PT, Li G, Shi F, Geng X, Guo L, Shen D; for the Alzheimer’s Disease Neuroimaging Initiative (2014) Knowledge-guided robust MRI brain extraction for diverse large-scale neuroimaging studies on humans and non-human primates. PLoS ONE 9(1):e77810. doi:10.1371/journal.pone.0077810
Wei Q, Dunbrack Jr, Lehmann RL (2013) The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLoS ONE 8(7):e67863
West M (2003) Bayesian factor regression models in the “large p, small n” paradigm. In: Bayesian statistics, pp 723–732
Westman E, Muehlboeck JS, Simmons A (2012) Combining MRI and CSF measures for classification of Alzheimer’s disease and prediction of mild cognitive impairment conversion. NeuroImage 62(1):229–238
Xiang S, Yuan L, Fan W, Wang Y, Thompson PM, Ye J (2014) Bi-level multi-source learning for heterogeneous block-wise missing data. NeuroImage 102(1):192–206
Yao Z, Hu B, Liang C, Zhao L, Jackson M; the Alzheimer’s Disease Neuroimaging Initiative (2012) A longitudinal study of atrophy in amnestic mild cognitive impairment and normal aging revealed by cortical thickness. PLoS ONE 7(11):e48973
Yuan L, Wang Y, Thompson PM, Narayan VA, Ye J (2012) Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. NeuroImage 61(3):622–632
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B 68(1):49–67
Zhang D, Shen D (2012) Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease. NeuroImage 59(2):895–907
Zhang D, Shen D (2012) Predicting future clinical changes of MCI patients using longitudinal and multimodal biomarkers. PLoS ONE 7(3):e33182
Zhang D, Wang Y, Zhou L, Yuan H, Shen D (2011) Multimodal classification of Alzheimer’s disease and mild cognitive impairment. NeuroImage 55(3):856–867
Zhang Y, Brady M, Smith S (2001) Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 20(1):45–57
Zhou J, Liu J, Narayan VA, Ye J (2013) Modeling disease progression via multi-task learning. NeuroImage 78:233–248
Zhu X, Suk HI, Shen D (2014) Matrix-similarity based loss function and feature selection for Alzheimer’s disease diagnosis. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR)
Acknowledgments
This work was supported in part by NIH grants EB006733, EB008374, EB009634, AG041721, MH100217, and AG042599, and by the ICT R&D program of MSIP/IITP [B0101-15-0307, Basic Software Research in Human-level Lifelong Machine Learning (Machine Learning Center)].
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical standard
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://www.loni.ucla.edu/ADNI). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete list of ADNI investigators is available at http://adni.loni.ucla.edu/wpcontent/uploads/how_to_apply/ADNI_Authorship_List.
Appendix: Affinity propagation
Here, we briefly review affinity propagation (Frey and Dueck 2007), by which we find subclasses in each original class. Let \(S_{ij}^{(h)}\) \((i,j=1,2,\dots ,N)\) denote the pairwise similarity (see Note 13) between samples i and j among the N samples in \(\tilde{\mathbf {X}}^{(h)}\). The affinity propagation algorithm operates on the similarity matrix \({\mathbf {S}}^{(h)}=[S_{ij}^{(h)}]\in \mathbb {R}^{N\times N}\) and attempts to find ‘exemplars’ that maximize the overall sum of similarities between all exemplars and their member samples. To this end, the algorithm defines two types of messages exchanged among samples, namely responsibility and availability. The responsibility \(R_{ij}^{(h)}\) represents the accumulated evidence for how well suited sample j is to serve as the exemplar for sample i, and the availability \(A_{ij}^{(h)}\) reflects the accumulated evidence for how appropriate it would be for sample i to choose sample j as its exemplar. Using these messages, the exemplar of sample i is determined as the sample that maximizes the following objective function:
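In the standard formulation of Frey and Dueck (2007), written in the notation of this appendix, the exemplar rule reads as follows. The symbol \(\hat{c}_{i}^{(h)}\) for the index of the exemplar chosen by sample i is introduced here only for illustration, and the form is assumed to match the one used in this work.

\[
\hat{c}_{i}^{(h)} = \mathop{\mathrm{argmax}}_{j\in \{1,\dots ,N\}} \left[ A_{ij}^{(h)} + R_{ij}^{(h)} \right]
\]

When \(\hat{c}_{i}^{(h)}=i\), sample i is itself an exemplar; otherwise sample i is a member of the subclass represented by sample \(\hat{c}_{i}^{(h)}\).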
In Algorithm 1, both \({\mathbf {R}}^{(h)}=[R_{ij}^{(h)}]\) and \({\mathbf {A}}^{(h)}=[A_{ij}^{(h)}]\) are initially set to zero matrices, and their values are then iteratively updated as below until convergence:
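The message passing can be sketched in NumPy using the standard update rules of Frey and Dueck (2007): \(R_{ij} \leftarrow S_{ij} - \max_{k\ne j}(A_{ik}+S_{ik})\), \(A_{ij} \leftarrow \min \{0, R_{jj} + \sum_{k\notin \{i,j\}} \max (0,R_{kj})\}\) for \(i\ne j\), and \(A_{jj} \leftarrow \sum_{k\ne j} \max (0,R_{kj})\). This is a minimal sketch rather than the implementation used in this work. The function names, the damping factor, the stopping rule, and the choice of the median similarity as the self-similarity (preference) on the diagonal are all assumptions drawn from common practice.

```python
import numpy as np


def negative_euclidean_similarity(X):
    """Pairwise negative Euclidean distances between the rows of X.

    The diagonal (self-similarity, or 'preference') is set to the median of
    the off-diagonal similarities; this is an assumed, commonly used choice.
    """
    diff = X[:, None, :] - X[None, :, :]
    S = -np.sqrt((diff ** 2).sum(axis=2))
    off_diagonal = S[~np.eye(len(X), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diagonal))
    return S


def affinity_propagation(S, damping=0.5, max_iter=500, stable_iter=30):
    """Return, for each sample i, the index j maximizing A_ij + R_ij."""
    N = S.shape[0]
    R = np.zeros((N, N))  # responsibilities, initialized to zero
    A = np.zeros((N, N))  # availabilities, initialized to zero
    idx = np.arange(N)
    previous, stable = None, 0
    for _ in range(max_iter):
        # Responsibility: R_ij = S_ij - max_{k != j} (A_ik + S_ik).
        AS = A + S
        best = AS.argmax(axis=1)
        first_max = AS[idx, best]
        AS[idx, best] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[idx, best] = S[idx, best] - second_max
        R = damping * R + (1.0 - damping) * R_new

        # Availability: clip responsibilities at zero, except on the diagonal.
        Rp = np.maximum(R, 0.0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp  # drops sample i's own term
        self_availability = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0.0)        # min(0, .) for i != j
        A_new[idx, idx] = self_availability   # A_jj is not clipped
        A = damping * A + (1.0 - damping) * A_new

        # Stop once the exemplar choices have not changed for a while.
        labels = (A + R).argmax(axis=1)
        if previous is not None and np.array_equal(labels, previous):
            stable += 1
            if stable >= stable_iter:
                break
        else:
            stable = 0
        previous = labels
    return labels


# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
exemplars = affinity_propagation(negative_euclidean_similarity(X))
print(exemplars)
```

Samples that share an exemplar index form one subclass. The self-similarity on the diagonal controls how many exemplars emerge, with lower values yielding fewer subclasses, so it would need tuning for real data.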
Cite this article
Suk, HI., Lee, SW., Shen, D. et al. Deep sparse multi-task learning for feature selection in Alzheimer’s disease diagnosis. Brain Struct Funct 221, 2569–2587 (2016). https://doi.org/10.1007/s00429-015-1059-y
DOI: https://doi.org/10.1007/s00429-015-1059-y