A Likelihood-Free Approach for Characterizing Heterogeneous Diseases in Large-Scale Studies

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10265)


We propose a non-parametric approach for characterizing heterogeneous diseases in large-scale studies. We target diseases where multiple types of pathology present simultaneously in each subject and a more severe disease manifests as a higher level of tissue destruction. For each subject, we model the collection of local image descriptors as samples generated by an unknown subject-specific probability density. Instead of approximating the probability density via a parametric family, we propose to sidestep parametric inference by directly estimating the divergence between subject densities. Our method maps the collection of local image descriptors to a signature vector that is used to predict a clinical measurement. By studying the divergences, we can interpret the prediction of the clinical variable at both the population and individual levels. We illustrate an application of this method on simulated data as well as on a large-scale lung CT study of Chronic Obstructive Pulmonary Disease (COPD). Our approach outperforms classical methods on both simulated and COPD data and achieves state-of-the-art prediction of an important physiologic measure of airflow (the forced expiratory volume in one second, FEV1).
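The idea of estimating a divergence between two subjects directly from their samples, without fitting a density, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which divergence or estimator is used, so the sketch assumes a standard k-nearest-neighbor estimator of the Kullback-Leibler divergence. The function names, the choice `k=5`, and the toy Gaussian "subjects" are all hypothetical.

```python
import numpy as np


def knn_distance(queries, references, k, exclude_self=False):
    """Distance from each query point to its k-th nearest reference point."""
    # Pairwise Euclidean distances, shape (len(queries), len(references)).
    diffs = queries[:, None, :] - references[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)
    # If queries and references are the same set, column 0 holds the
    # zero self-distance, so the k-th true neighbor sits at column k.
    idx = k if exclude_self else k - 1
    return dists[:, idx]


def knn_kl_divergence(x, y, k=5):
    """k-NN estimate of KL(p || q) from samples x ~ p and y ~ q.

    Assumed estimator (not taken from the paper):
        d * mean(log(nu_k / rho_k)) + log(m / (n - 1))
    where rho_k is the k-NN distance within x and nu_k is the
    k-NN distance from each point of x to the set y.
    """
    n, d = x.shape
    m = y.shape[0]
    rho = knn_distance(x, x, k, exclude_self=True)
    nu = knn_distance(x, y, k)
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))


# Toy data: each "subject" is a bag of 2-D local descriptors.
rng = np.random.default_rng(0)
subject_a = rng.normal(loc=0.0, scale=1.0, size=(400, 2))
subject_b = rng.normal(loc=0.0, scale=1.0, size=(400, 2))  # same density as A
subject_c = rng.normal(loc=2.0, scale=1.0, size=(400, 2))  # shifted density

div_ab = knn_kl_divergence(subject_a, subject_b)
div_ac = knn_kl_divergence(subject_a, subject_c)
print(f"estimated KL(A || B) = {div_ab:.3f}")
print(f"estimated KL(A || C) = {div_ac:.3f}")
```

In this toy setting the estimate between two subjects drawn from the same density should be close to zero, while the estimate against the shifted subject should be clearly larger. A matrix of such pairwise estimates across a cohort is one plausible starting point for a per-subject signature vector, although the mapping actually used by the method is not described in the abstract.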


Chronic Obstructive Pulmonary Disease, Locally Linear Embedding, Neighbor Graph, Histogram Feature, Local Image Descriptor



This work was supported in part by NLM Training grant T15LM007059, NIH NIBIB NAMIC U54-EB005149, NIH NCRR NAC P41-RR13218 and NIH NIBIB NAC P41-EB015902, NHLBI R01HL089856, R01HL089897, K08HL097029, R01HL113264, 5K25HL104085, 5R01HL116931, and 5R01HL116473. The COPDGene study (NCT00608764) is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, GlaxoSmithKline and Sunovion.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
  2. Intelligence Systems Program, University of Pittsburgh, Pittsburgh, USA
  3. Brigham and Women’s Hospital, Boston, USA
