Cognitive Assessment Prediction in Alzheimer’s Disease by Multi-Layer Multi-Target Regression

  • Original Article
  • Published in: Neuroinformatics

Abstract

Accurate and automatic prediction of cognitive assessment scores from multiple neuroimaging biomarkers is crucial for early detection of Alzheimer’s disease. The major challenges arise from the nonlinear relationship between biomarkers and assessment scores and the inter-correlation among the scores, which have not yet been well addressed. In this paper, we propose multi-layer multi-target regression (MMR), which simultaneously models intrinsic inter-target correlations and nonlinear input-output relationships in a general compositional framework. Specifically, by kernelized dictionary learning, the MMR can effectively handle the highly nonlinear relationship between biomarkers and assessment scores; by robust low-rank linear learning via matrix elastic nets, the MMR can explicitly encode inter-correlations among multiple assessment scores; moreover, the MMR is flexible and can work with the non-smooth $\ell_{2,1}$-norm loss function, which enables calibration of multiple targets with disparate noise levels for more robust parameter estimation. The MMR can be efficiently solved by an alternating optimization algorithm via gradient descent with guaranteed convergence. The MMR has been evaluated by extensive experiments on the ADNI database with MRI data, and produced high accuracy surpassing previous regression models, which demonstrates its effectiveness as a new multi-target regression model for clinical multivariate prediction.
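As a small illustration of the $\ell_{2,1}$-norm loss mentioned above, the following NumPy sketch (hypothetical code, not the authors' implementation; summing over per-target residual columns is one common convention) shows how each target contributes its residual norm rather than its squared norm, which tempers the influence of targets with disparate noise levels:

```python
import numpy as np

# Hypothetical sketch, not the authors' implementation: the l2,1-norm
# loss sums the l2 norms of the per-target residual columns, so a
# noisy target contributes its residual norm, not its squared norm.
def l21_norm(M):
    """Sum of the l2 norms of the columns of M."""
    return np.sum(np.sqrt(np.sum(M ** 2, axis=0)))

def l21_loss(X, W, Y):
    """l2,1-norm loss of a linear multi-target model Y ~ X @ W."""
    return l21_norm(X @ W - Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))   # 100 subjects, 5 MRI biomarkers
W = rng.standard_normal((5, 3))     # weights for 3 assessment scores
Y = X @ W                           # noiseless targets for the check
print(l21_loss(X, W, Y))            # residual is exactly zero here
```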



Acknowledgements

This work was partially supported by the following grants: NSF-DBI 1356628, NSF-IIS 1633753, NIH R01 AG049371.


Corresponding author

Correspondence to Heng Huang.

Appendix

Proof

By the definition of the nuclear norm and the singular value decomposition $S = U{\Sigma} V^{\top}$, we can rewrite it in terms of traces as follows

$$\begin{array}{@{}rcl@{}} \|S\|_{*} & =& tr(\sqrt{S^{\top} S}) = tr(\sqrt{(U{\Sigma} V^{\top})^{\top}(U{\Sigma} V^{\top})})\\ & =& tr(\sqrt{V{\Sigma}^{\top} U^{\top} U {\Sigma} V^{\top}}) = tr(\sqrt{V{\Sigma}^{\top} {\Sigma} V^{\top}})\\ & =& tr(\sqrt{V{\Sigma} V^{\top} V{\Sigma} V^{\top}})\\ & =& tr(V {\Sigma} V^{\top})\\ & =& tr({\Sigma}) \end{array} $$
(20)
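The identity in (20) can be checked numerically. The sketch below (illustrative only, not from the paper) compares $tr(\sqrt{S^{\top} S})$, computed via an eigendecomposition of $S^{\top} S$, with the sum of the singular values of $S$:

```python
import numpy as np

# Illustrative check (not from the paper) that the nuclear norm
# tr(sqrt(S^T S)) equals the sum of the singular values of S.
rng = np.random.default_rng(1)
S = rng.standard_normal((4, 3))

# Sum of singular values, directly from the SVD.
nuclear_svd = np.linalg.svd(S, compute_uv=False).sum()

# tr(sqrt(S^T S)) via the eigendecomposition of the symmetric
# positive semi-definite matrix S^T S; clip guards against tiny
# negative eigenvalues caused by floating-point roundoff.
evals = np.linalg.eigvalsh(S.T @ S)
trace_sqrt = np.sum(np.sqrt(np.clip(evals, 0.0, None)))

assert np.isclose(nuclear_svd, trace_sqrt)
```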

Therefore, the nuclear norm of S can also be defined as the sum of the singular values of S. From (13), we have

$$ \partial S=\partial U{\Sigma} V^{\top}+U\partial{\Sigma} V^{\top}+U{\Sigma}\partial V^{\top}, $$
(21)

which gives rise to

$$ U\partial{\Sigma} V^{\top}=\partial S-\partial U{\Sigma} V^{\top}-U{\Sigma}\partial V^{\top}. $$
(22)

Multiplying both sides of (22) by $U^{\top}$ on the left and $V$ on the right, we have

$$ U^{\top} U\partial{\Sigma} V^{\top} V =U^{\top}\partial SV-U^{\top}\partial U{\Sigma} V^{\top} V -U^{\top} U{\Sigma}\partial V^{\top} V $$
(23)

Since $U$ and $V$ are both orthogonal matrices, we obtain

$$ \partial{\Sigma} =U^{\top}\partial SV-U^{\top}\partial U{\Sigma} - {\Sigma}\partial V^{\top} V. $$
(24)

Note that we have the fact that

$$ 0 = \partial I = \partial (U^{\top} U) = \partial U^{\top} U + U^{\top} \partial U, $$
(25)

where I is an identity matrix, and therefore $U^{\top}\partial U$ is an antisymmetric matrix. We have

$$\begin{array}{@{}rcl@{}} tr(U^{\top} \partial U {\Sigma}) & =& tr((U^{\top} \partial U {\Sigma})^{\top}) = tr({\Sigma}^{\top} \partial U^{\top} U)\\ &=& - tr({\Sigma} U^{\top} \partial U) = - tr(U^{\top} \partial U {\Sigma}) \end{array} $$
(26)

which indicates that $tr(U^{\top}\partial U{\Sigma}) = 0$. Similarly, we also have $tr({\Sigma}\partial V^{\top} V) = 0$. Therefore, we achieve

$$ tr(\partial{\Sigma}) =tr(U^{\top}\partial SV) $$
(27)

By taking the derivative of $\|S\|_{*}$ with respect to $S$, we obtain

$$ \frac{\partial \|S\|_{*}}{\partial S} =\frac{ tr(\partial{\Sigma})}{\partial S}=\frac{ tr(U^{\top}\partial SV)}{\partial S}= U V^{\top} $$
(28)

which closes the proof. □
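The result in (28) can be verified with finite differences. The sketch below (illustrative, not the authors' code) compares $UV^{\top}$ against a central-difference approximation of the gradient, assuming $S$ has distinct nonzero singular values so that the nuclear norm is differentiable at $S$:

```python
import numpy as np

# Finite-difference check (illustrative, not from the paper) of the
# result d||S||_* / dS = U V^T, valid where S has distinct nonzero
# singular values so the nuclear norm is differentiable at S.
def nuclear_norm(S):
    return np.linalg.svd(S, compute_uv=False).sum()

rng = np.random.default_rng(2)
S = rng.standard_normal((4, 3))
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
grad_analytic = U @ Vt

# Central differences on every entry of S.
eps = 1e-6
grad_fd = np.zeros_like(S)
for i in range(S.shape[0]):
    for j in range(S.shape[1]):
        E = np.zeros_like(S)
        E[i, j] = eps
        grad_fd[i, j] = (nuclear_norm(S + E) - nuclear_norm(S - E)) / (2 * eps)

assert np.allclose(grad_analytic, grad_fd, atol=1e-5)
```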


About this article


Cite this article

Wang, X., Zhen, X., Li, Q. et al. Cognitive Assessment Prediction in Alzheimer’s Disease by Multi-Layer Multi-Target Regression. Neuroinform 16, 285–294 (2018). https://doi.org/10.1007/s12021-018-9381-1
