Volume 11, Issue 2, pp 227–247

Penalized Likelihood Phenotyping: Unifying Voxelwise Analyses and Multi-Voxel Pattern Analyses in Neuroimaging

  • Nagesh Adluru (Email author)
  • Bret M. Hanlon
  • Antoine Lutz
  • Janet E. Lainhart
  • Andrew L. Alexander
  • Richard J. Davidson
Original Article


Neuroimage phenotyping for psychiatric and neurological disorders is commonly performed using voxelwise analyses, also known as voxel-based analyses or voxel-based morphometry (VBM). A typical voxelwise analysis treats the measurement at each voxel (e.g., fractional anisotropy or gray matter probability) as an outcome measure and studies the effects of possible explanatory variables (e.g., age or group) in a linear regression setting. Each voxel is treated independently until the stage of correction for multiple comparisons. Recently, multi-voxel pattern analyses (MVPA), such as classification, have arisen as an alternative to VBM. The main advantage of MVPA over VBM is that MVPA employs multivariate methods, which can account for interactions among voxels when identifying significant patterns. MVPA also provides ways to perform computer-aided diagnosis and prognosis at the individual-subject level. However, compared to VBM, the results of MVPA are often more difficult to interpret and are prone to arbitrary conclusions. In this paper, we first use penalized likelihood modeling to provide a unified framework for understanding both VBM and MVPA. We then use statistical learning theory to provide practical methods for interpreting the results of MVPA beyond commonly used performance metrics, such as leave-one-out cross-validation accuracy and area under the receiver operating characteristic (ROC) curve. Additionally, we demonstrate that MVPA poses challenges when trying to obtain image phenotyping information in the form of statistical parametric maps (SPMs), which are commonly obtained from VBM, and we provide a bootstrap strategy as a potential solution for generating SPMs using MVPA. This technique also allows us to maximize the use of available training data. We illustrate the empirical performance of the proposed framework using two different neuroimaging studies that pose different levels of challenge for classification using MVPA.
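The contrast drawn in the abstract can be illustrated with a minimal sketch on synthetic data. It is not the authors' implementation. The function names (`make_toy_data`, `voxelwise_t`, `ridge_logistic`, `bootstrap_weight_map`), the ridge penalty, the gradient-descent solver, and the mean-over-standard-deviation bootstrap summary are all assumptions chosen for brevity. The first function fits an independent linear regression at every voxel, as in VBM. The second fits one penalized-likelihood classifier over all voxels jointly, as in MVPA. The third resamples subjects with replacement to turn the classifier weights into a voxelwise map.

```python
import numpy as np

def make_toy_data(n=40, V=20, effect=1.5, seed=0):
    """Synthetic 'images': n subjects, V voxels, group effect in the first 3 voxels."""
    rng = np.random.default_rng(seed)
    group = np.repeat([0.0, 1.0], n // 2)
    Y = rng.standard_normal((n, V))
    Y[:, :3] += effect * group[:, None]
    X = np.column_stack([np.ones(n), group])  # design matrix: intercept, group
    return Y, X, group

def voxelwise_t(Y, X, contrast):
    """VBM-style analysis: ordinary least squares at each voxel, returning t-statistics."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ Y                      # (p, V): one fit per voxel
    resid = Y - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)   # residual variance per voxel
    se = np.sqrt(sigma2 * (contrast @ XtX_inv @ contrast))
    return (contrast @ beta) / se

def ridge_logistic(Y, labels, lam=1.0, lr=0.1, n_iter=500):
    """MVPA-style analysis: minimise mean logistic loss + (lam / 2) * ||w||^2.

    All voxels enter one model, so the weights are estimated jointly.
    Plain gradient descent is used here only to keep the sketch dependency-free.
    """
    n, V = Y.shape
    w, b = np.zeros(V), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Y @ w + b)))
        w -= lr * (Y.T @ (prob - labels) / n + lam * w)
        b -= lr * (prob - labels).mean()
    return w, b

def bootstrap_weight_map(Y, labels, n_boot=50, seed=0, **fit_kwargs):
    """Resample subjects with replacement, refit, and summarise each voxel's weight
    as bootstrap mean divided by bootstrap standard deviation (an assumed summary)."""
    rng = np.random.default_rng(seed)
    n, V = Y.shape
    W = np.empty((n_boot, V))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        W[i], _ = ridge_logistic(Y[idx], labels[idx], **fit_kwargs)
    return W.mean(axis=0) / (W.std(axis=0) + 1e-12)

Y, X, group = make_toy_data()
t_map = voxelwise_t(Y, X, contrast=np.array([0.0, 1.0]))
z_map = bootstrap_weight_map(Y, group)
print("voxelwise t, first 5 voxels:     ", np.round(t_map[:5], 2))
print("bootstrap weight map, first 5:   ", np.round(z_map[:5], 2))
```

In this sketch both analyses yield one number per voxel, but only the first comes with a standard parametric null distribution. The bootstrap ratio is a descriptive stability measure under the stated assumptions, not a calibrated test statistic.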


Keywords: Classification; Regression; Voxel-based morphometry; Multi-voxel pattern analysis; Generalization risk; Image phenotyping; Penalized likelihood; Linear models



We are thankful to Kristen Zygmunt and P. Thomas Fletcher at the University of Utah for data organization and eddy correction of the diffusion tensor imaging data of the autism study. We also thank Molly DuBray Prigge and Alyson Froehlich for providing us with the subject demographic and assessment information for the autism study. We are extremely thankful to Brianna Schuyller, Amelia Cayo and David Bachubber at the University of Wisconsin-Madison for assisting us with the sample characteristics of the meditation study.

This work was supported by the NIMH R01 MH080826 (JEL) and R01 MH084795 (JEL) (University of Utah), the NIH Mental Retardation/Developmental Disabilities Research Center (MRDDRC Waisman Center), NIMH 62015 (ALA), the Autism Society of Southwestern Wisconsin, the NCCAM P01 AT004952-04 (RJD and AL) and the Waisman Core grant P30HD003352-45 (RJD). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Mental Health, the National Institutes of Health or the Waisman Center.


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Nagesh Adluru (1, Email author)
  • Bret M. Hanlon (1)
  • Antoine Lutz (1)
  • Janet E. Lainhart (2)
  • Andrew L. Alexander (1)
  • Richard J. Davidson (1)

  1. University of Wisconsin-Madison, Madison, USA
  2. University of Utah, Salt Lake City, USA
