Volume 16, Issue 3–4, pp 285–294

Cognitive Assessment Prediction in Alzheimer’s Disease by Multi-Layer Multi-Target Regression

  • Xiaoqian Wang
  • Xiantong Zhen
  • Quanzheng Li
  • Dinggang Shen
  • Heng Huang
Original Article


Accurate and automatic prediction of cognitive assessment scores from multiple neuroimaging biomarkers is crucial for early detection of Alzheimer’s disease. The major challenges arise from the nonlinear relationship between biomarkers and assessment scores and from the inter-correlation among the scores, neither of which has yet been well addressed. In this paper, we propose multi-layer multi-target regression (MMR), which simultaneously models intrinsic inter-target correlations and nonlinear input-output relationships in a general compositional framework. Specifically, by kernelized dictionary learning, the MMR can effectively handle the highly nonlinear relationship between biomarkers and assessment scores; by robust low-rank linear learning via matrix elastic nets, the MMR can explicitly encode inter-correlations among multiple assessment scores; moreover, the MMR is flexible and can work with the non-smooth ℓ₂,₁-norm loss function, which enables calibration of multiple targets with disparate noise levels for more robust parameter estimation. The MMR can be solved efficiently by an alternating optimization algorithm via gradient descent with guaranteed convergence. The MMR has been evaluated in extensive experiments on the ADNI database with MRI data and achieved accuracy surpassing that of previous regression models, demonstrating its effectiveness as a new multi-target regression model for clinical multivariate prediction.
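Two of the ingredients named above, the ℓ₂,₁-norm loss and the matrix elastic net penalty, can be illustrated with a minimal NumPy sketch. This is not the MMR implementation: the abstract does not state the full objective, so the sketch assumes the common convention in which the ℓ₂,₁ loss sums the Euclidean norms of the per-target residual columns, and in which the matrix elastic net combines the nuclear norm with the squared Frobenius norm. The function names and the weights `lam_nuc` and `lam_fro` are hypothetical, and the kernelized dictionary layer and the alternating optimization are not reproduced.

```python
import numpy as np


def l21_loss(residual):
    """Assumed l2,1 loss: sum over targets (columns) of each column's Euclidean norm."""
    return float(np.sum(np.linalg.norm(residual, axis=0)))


def matrix_elastic_net(W, lam_nuc, lam_fro):
    """Assumed matrix elastic net: weighted nuclear norm plus weighted squared Frobenius norm."""
    singular_values = np.linalg.svd(W, compute_uv=False)
    nuclear_norm = singular_values.sum()   # encourages a low-rank W
    frobenius_sq = np.sum(W ** 2)          # ridge-like shrinkage
    return float(lam_nuc * nuclear_norm + lam_fro * frobenius_sq)


# Toy residual matrix: 2 samples (rows) by 2 targets (columns).
# The column norms are 5 and 2, so each target contributes its own norm
# instead of a squared error dominated by the noisier target.
R = np.array([[3.0, 0.0],
              [4.0, 2.0]])
print(l21_loss(R))

# Toy coefficient matrix with singular values 4 and 3.
W = np.diag([3.0, 4.0])
print(matrix_elastic_net(W, lam_nuc=1.0, lam_fro=0.5))
```

Because each residual column enters through its unsquared norm, a target with a large noise level does not dominate the loss the way it would under a squared Frobenius loss, which is the calibration effect the abstract refers to.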


Multi-target regression; Robust low-rank learning; Calibration; Nonlinear regression; Alzheimer’s disease



This work was partially supported by the following grants: NSF-DBI 1356628, NSF-IIS 1633753, NIH R01 AG049371.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Xiaoqian Wang (1)
  • Xiantong Zhen (1)
  • Quanzheng Li (2)
  • Dinggang Shen (3)
  • Heng Huang (1)

  1. Department of Electrical, Computer Engineering, University of Pittsburgh, Pennsylvania, USA
  2. Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, USA
  3. Radiology and BRIC, UNC-CH School of Medicine, Chapel Hill, USA
