Abstract
Accurate and automatic prediction of cognitive assessment scores from multiple neuroimaging biomarkers is crucial for early detection of Alzheimer’s disease. The major challenges arise from the nonlinear relationship between biomarkers and assessment scores and from the inter-correlations among the scores, neither of which has yet been well addressed. In this paper, we propose multi-layer multi-target regression (MMR), which simultaneously models intrinsic inter-target correlations and nonlinear input-output relationships in a general compositional framework. Specifically, by kernelized dictionary learning, the MMR can effectively handle the highly nonlinear relationship between biomarkers and assessment scores; by robust low-rank linear learning via matrix elastic nets, the MMR can explicitly encode inter-correlations among multiple assessment scores; moreover, the MMR is flexible and can work with the non-smooth ℓ2,1-norm loss function, which enables calibration of multiple targets with disparate noise levels for more robust parameter estimation. The MMR can be efficiently solved by an alternating optimization algorithm via gradient descent with guaranteed convergence. The MMR has been evaluated in extensive experiments on the ADNI database with MRI data and achieved higher accuracy than previous regression models, which demonstrates its effectiveness as a new multi-target regression model for clinical multivariate prediction.
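To make two of the ingredients named above concrete, the following is a minimal sketch of an ℓ2,1-norm loss and a matrix elastic net penalty. It is not the MMR objective itself, which is not reproduced here. It assumes that targets are stored as columns of an n × q matrix, that the ℓ2,1 loss sums the Euclidean norms of the per-target residual columns, and that the matrix elastic net combines the nuclear norm with the squared Frobenius norm. The names l21_loss, matrix_elastic_net, lam_nuc and lam_fro are hypothetical.

```python
import numpy as np


def l21_loss(Y, Y_hat):
    """l2,1-norm loss (assumed convention): sum over targets (columns)
    of the Euclidean norm of that target's residual vector."""
    residual = Y - Y_hat
    return float(np.sum(np.linalg.norm(residual, axis=0)))


def matrix_elastic_net(S, lam_nuc=1.0, lam_fro=1.0):
    """Matrix elastic net penalty (assumed form):
    lam_nuc * nuclear norm of S + lam_fro * squared Frobenius norm of S."""
    singular_values = np.linalg.svd(S, compute_uv=False)
    return float(lam_nuc * singular_values.sum() + lam_fro * np.sum(S ** 2))


# Toy data: three targets whose noise levels differ by orders of magnitude.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 3))
Y_hat = Y + rng.normal(size=(50, 3)) * np.array([0.1, 1.0, 5.0])

# Each target contributes the norm of its residual, not the squared norm,
# so the noisiest target dominates less than under a squared loss.
per_target = np.linalg.norm(Y - Y_hat, axis=0)
print("per-target residual norms:", per_target)
print("l2,1 loss:", l21_loss(Y, Y_hat))
print("penalty of a rank-1 matrix:", matrix_elastic_net(np.outer(Y[:, 0], np.ones(3))))
```

Because each target enters through an unsquared norm, the loss is non-smooth wherever a target's residual is exactly zero, which is consistent with the abstract's description of the ℓ2,1-norm loss as non-smooth.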
Acknowledgements
This work was partially supported by the following grants: NSF-DBI 1356628, NSF-IIS 1633753, NIH R01 AG049371.
Appendix
Proof
By the definition of the nuclear norm, we can re-write it in terms of traces as follows
Therefore, the nuclear norm of S can also be defined as the sum of the singular values of S. From (13), we have
which gives rise to
Left-multiplying both sides of (22) by U⊤, we have
Since U is an orthogonal matrix, we obtain
Note that
where I is an identity matrix, and therefore U⊤∂U is an antisymmetric matrix. We have
which indicates that tr(U⊤∂UΣ) = 0. Similarly, we also have tr(Σ∂V⊤V) = 0. Therefore, we obtain
By taking the derivative of ||S||∗ w.r.t. S, we obtain
which completes the proof. □
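The conclusion of this argument can be checked numerically. The sketch below assumes that (13) refers to the singular value decomposition S = UΣV⊤ and that the final result is the standard identity ∂||S||∗/∂S = UV⊤, which follows from ∂||S||∗ = tr(∂Σ) = tr(U⊤∂SV) once the two trace terms above vanish. It compares UV⊤ with a central finite-difference gradient of the nuclear norm on a random full-rank matrix. The function names are hypothetical.

```python
import numpy as np


def nuclear_norm(S):
    """Nuclear norm: sum of the singular values of S."""
    return np.linalg.svd(S, compute_uv=False).sum()


def nuclear_norm_grad(S):
    """Candidate gradient U V^T from the thin SVD S = U Sigma V^T."""
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    return U @ Vt


def finite_difference_grad(f, S, eps=1e-6):
    """Entry-wise central finite-difference approximation of df/dS."""
    G = np.zeros_like(S)
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            E = np.zeros_like(S)
            E[i, j] = eps
            G[i, j] = (f(S + E) - f(S - E)) / (2 * eps)
    return G


# A random Gaussian matrix is full rank with distinct singular values
# almost surely, so the nuclear norm is differentiable at this point.
rng = np.random.default_rng(0)
S = rng.normal(size=(5, 3))

analytic = nuclear_norm_grad(S)
numeric = finite_difference_grad(nuclear_norm, S)
max_abs_diff = np.max(np.abs(analytic - numeric))
print("max |U V^T - finite difference| =", max_abs_diff)
```

The check is only meaningful where S has full rank. When S is rank-deficient the nuclear norm is not differentiable, and UV⊤ is one element of its subdifferential rather than a gradient.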
Cite this article
Wang, X., Zhen, X., Li, Q. et al. Cognitive Assessment Prediction in Alzheimer’s Disease by Multi-Layer Multi-Target Regression. Neuroinform 16, 285–294 (2018). https://doi.org/10.1007/s12021-018-9381-1
DOI: https://doi.org/10.1007/s12021-018-9381-1