A Likelihood-Free Approach for Characterizing Heterogeneous Diseases in Large-Scale Studies

  • Conference paper
Information Processing in Medical Imaging (IPMI 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10265)

Abstract

We propose a non-parametric approach for characterizing heterogeneous diseases in large-scale studies. We target diseases where multiple types of pathology present simultaneously in each subject and a more severe disease manifests as a higher level of tissue destruction. For each subject, we model the collection of local image descriptors as samples generated by an unknown subject-specific probability density. Instead of approximating the probability density via a parametric family, we propose to sidestep parametric inference by directly estimating the divergence between subject densities. Our method maps the collection of local image descriptors to a signature vector that is used to predict a clinical measurement. By carefully studying the divergences, we are able to interpret the prediction of the clinical variable at both the population and individual levels. We illustrate an application of this method on simulated data as well as on a large-scale lung CT study of Chronic Obstructive Pulmonary Disease (COPD). Our approach outperforms classical methods on both simulated and COPD data and achieves state-of-the-art prediction of an important physiologic measure of airflow (the forced expiratory volume in one second, FEV1).

References

  1. Alexander, D.H., Novembre, J., Lange, K.: Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9), 1655–1664 (2009)

  2. Batmanghelich, N.K., Saeedi, A., Cho, M., Estepar, R.S.J., Golland, P.: Generative method to discover genetically driven image biomarkers. Int. Conf. Inf. Process. Med. Imaging 17(1), 30–42 (2015)

  3. Binder, P., Batmanghelich, N.K., Estepar, R.S.J., Golland, P.: Unsupervised discovery of emphysema subtypes in a large clinical cohort. In: Wang, L., Adeli, E., Wang, Q., Shi, Y., Suk, H.-I. (eds.) MLMI 2016. LNCS, vol. 10019, pp. 180–187. Springer, Cham (2016). doi:10.1007/978-3-319-47157-0_22

  4. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  5. Depeursinge, A., Chin, A.S., Leung, A.N., Terrone, D., Bristow, M., Rosen, G., Rubin, D.L.: Automated classification of usual interstitial pneumonia using regional volumetric texture analysis in high-resolution computed tomography. Invest. Radiol. 50(4), 261–267 (2015)

  6. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)

  7. Gao, W., Oh, S., Viswanath, P.: Breaking the bandwidth barrier: geometrical adaptive entropy estimation (2016). http://arxiv.org/abs/1609.02208

  8. Holzer, M., Donner, R.: Over-segmentation of 3D medical image volumes based on monogenic cues. In: CVWW, pp. 35–42 (2014). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.707.2473&rep=rep1&type=pdf

  9. Lauritzen, S.L., Barndorff-Nielsen, O.E., Kass, R.E., Lauritzen, S.L., Rao, C.R.: Chapter 4: Statistical Manifolds, pp. 163–216. Institute of Mathematical Statistics (1987). http://projecteuclid.org/euclid.lnms/1215467061

  10. Liu, K., Skibbe, H., Schmidt, T., Blein, T., Palme, K., Brox, T., Ronneberger, O.: Rotation-invariant HOG descriptors using fourier analysis in polar and spherical coordinates. Int. J. Comput. Vis. 106(3), 342–364 (2014)

  11. Loader, C.R.: Local likelihood density estimation. Ann. Stat. 24(4), 1602–1618 (1996)

  12. Mendoza, C.S., et al.: Emphysema quantification in a multi-scanner HRCT cohort using local intensity distributions. In: 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI), pp. 474–477. IEEE (2012)

  13. Muja, M., Lowe, D.G.: Scalable nearest neighbour algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)

  14. Póczos, B., Schneider, J.G.: On the estimation of alpha-divergences. In: AISTATS, pp. 609–617 (2011)

  15. Póczos, B., Xiong, L., Schneider, J.: Nonparametric divergence estimation with applications to machine learning on distributions. In: Uncertainty in Artificial Intelligence (2011)

  16. Regan, E.A., Hokanson, J.E., Murphy, J.R., Make, B., Lynch, D.A., Beaty, T.H., Curran-Everett, D., Silverman, E.K., Crapo, J.D.: Genetic epidemiology of COPD (COPDGene) study design. COPD: J. Chronic Obstructive Pulm. Dis. 7(1), 32–43 (2011)

  17. Satoh, K., Kobayashi, T., Misao, T., Hitani, Y., Yamamoto, Y., Nishiyama, Y., Ohkawa, M.: CT assessment of subtypes of pulmonary emphysema in smokers. CHEST J. 120(3), 725–729 (2001)

  18. Sorensen, L., Shaker, S.B., De Bruijne, M.: Quantitative analysis of pulmonary emphysema using local binary patterns. IEEE Trans. Med. Imaging 29(2), 559–569 (2010)

  19. Shapiro, S.D.: Evolving concepts in the pathogenesis of chronic obstructive pulmonary disease. Clin. Chest Med. 21(4), 621–632 (2000)

  20. Song, L., Siddiqi, S.M., Gordon, G., Smola, A.: Hilbert space embeddings of hidden Markov models. In: The 27th International Conference on Machine Learning (ICML2010), pp. 991–998 (2010)

  21. Sorensen, L., Nielsen, M., Lo, P., Ashraf, H., Pedersen, J.H., De Bruijne, M.: Texture-based analysis of COPD: a data-driven approach. IEEE Trans. Med. Imaging 31(1), 70–78 (2012)

  22. Vogl, W.-D., Prosch, H., Müller-Mang, C., Schmidt-Erfurth, U., Langs, G.: Longitudinal alignment of disease progression in fibrosing interstitial lung disease. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 97–104. Springer, Cham (2014). doi:10.1007/978-3-319-10470-6_13

  23. Zhang, Q., Goncalves, B.: Why should I trust you? Explaining the predictions of any classifier, p. 4503. ACM (2015)

  24. Zhang, Z., Wang, J.: MLLE: modified locally linear embedding using multiple weights. In: Advances in Neural Information Processing Systems, pp. 1593–1600 (2006)

Acknowledgements

This work was supported in part by NLM Training grant T15LM007059, NIH NIBIB NAMIC U54-EB005149, NIH NCRR NAC P41-RR13218 and NIH NIBIB NAC P41-EB015902, NHLBI R01HL089856, R01HL089897, K08HL097029, R01HL113264, 5K25HL104085, 5R01HL116931, and 5R01HL116473. The COPDGene study (NCT00608764) is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, GlaxoSmithKline and Sunovion.

Author information

Corresponding author

Correspondence to Kayhan N. Batmanghelich.

A Appendix: Non-parametric Inference

In this section, we first show that the unnormalized density f(x) has a closed form under a locally constant approximation. Then, we show why the second-order approximation is computationally expensive for our problem. Finally, we provide more detail on the approximation of the KL and HE divergences.

Assuming a locally constant function, \(f(x) = \exp (a_0)\), we can compute a closed-form solution for \(a_0\) by differentiating Eq. 4 with respect to \(a_0\) and setting the derivative to zero:

$$\begin{aligned} \frac{d \mathcal {L}_x (f_i)}{da_0} = \sum _{ v \in S_i } { w \left( \frac{x - \psi (v)}{h} \right) } - | S_i | \int { w \left( \frac{y - x}{h} \right) e^{a_0} dy} = 0 \end{aligned}$$

If we set \(h \equiv \rho _{k, S_i } (x) \) and use the step window function (\(w(x) = \mathbb {I}(\Vert x \Vert \le 1)\)), the first term of the derivative becomes exactly k, and the integral in the second term reduces to \(e^{a_0}\) times the volume of a d-dimensional hypersphere of radius h, which is \(C_d h^d\). Solving for \(e^{a_0}\), we arrive at Eq. 5. For the Gaussian window function, the first term becomes a weighted sum over the points in the vicinity of x, and the second term has the same closed form as the normalizer of the Gaussian distribution.
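The step-window case can be sketched in a few lines of Python. This is a minimal illustration, assuming that Eq. 5 is the k-nearest-neighbour estimate obtained by solving the stationarity condition above, \(f(x) = k / (|S_i| \, C_d \, \rho_k^d)\). The function names and the brute-force neighbour search are assumptions made for illustration, not the authors' implementation.

```python
import math

def unit_ball_volume(d):
    """Volume C_d of the unit ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def knn_radius(x, samples, k):
    """Distance rho_k from x to its k-th nearest neighbour in samples.

    Brute-force search, used here only to keep the sketch self-contained.
    """
    dists = sorted(math.dist(x, s) for s in samples)
    return dists[k - 1]

def knn_density(x, samples, k):
    """Locally constant estimate f(x) = k / (n * C_d * h^d) with h = rho_k."""
    d = len(x)
    h = knn_radius(x, samples, k)
    return k / (len(samples) * unit_ball_volume(d) * h ** d)

# Four 1-D descriptors; the query point sits between the middle two.
samples = [(0.0,), (1.0,), (2.0,), (3.0,)]
estimate = knn_density((1.5,), samples, k=2)
```

In practice the brute-force search would be replaced by an approximate nearest-neighbour index, since the radius has to be evaluated for every descriptor of every subject.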

If we set h to a constant and use the Gaussian window function with a second-order polynomial, i.e., \(\log f(u) |_x \approx a_0 + (u -x)^T a_1 + (u-x)^T a_2 (u-x)\), the local parameters have closed forms [7, 11]:

$$\begin{aligned}&a_0 = \log (A_0 ) - \frac{ \Vert A_1 \Vert ^2}{A_0^2} -( d \log \sqrt{2\pi } + (d+1) \log n ), a_1 = \frac{1}{ h A_0 } A_1, \\&a_2 = \frac{1}{ 2h^2 } I_{d \times d} - \frac{A_0}{ 2h^2 } \left( A_2 - A_1 A_1^T \right) ^{-1} \end{aligned}$$

where \(\alpha _v (x) \equiv \text {exp}\left( - \frac{ \Vert \psi (v) - x \Vert ^2 }{ 2 h^2 } \right) \), \(D(x,v) \equiv \frac{1}{h} ( \psi (v) - x ) \), \(A_0 \equiv \sum _{v \in S_i} { \alpha _v (x) }\), \(A_1 \equiv \sum _{v \in S_i} { \alpha _v (x) D(x,v) }\), and \(A_2 \equiv \sum _{v \in S_i} { \alpha _v (x) D(x,v) D(x,v)^T }\). Computing \(a_2\) requires inverting a \(d\times d\) matrix, an \(O(d^3)\) operation that must be repeated for every patch, which makes the second-order approximation computationally prohibitive for our problem.
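To make the cost concrete, the following Python sketch accumulates the Gaussian-weighted moments \(A_0\), \(A_1\) and \(A_2\) for a single query point. It only illustrates the definitions above; the function name is hypothetical, and the sketch deliberately stops before forming \(a_2\), which would add a \(d\times d\) matrix inversion on top of the \(O(|S_i| d^2)\) accumulation shown here.

```python
import math

def local_moments(x, samples, h):
    """Return (A0, A1, A2) for query x, descriptors `samples`, bandwidth h.

    alpha_v = exp(-||psi(v) - x||^2 / (2 h^2)) and D = (psi(v) - x) / h,
    so alpha_v = exp(-0.5 * ||D||^2).
    """
    d = len(x)
    A0 = 0.0
    A1 = [0.0] * d
    A2 = [[0.0] * d for _ in range(d)]
    for s in samples:
        D = [(si - xi) / h for si, xi in zip(s, x)]
        alpha = math.exp(-0.5 * sum(c * c for c in D))
        A0 += alpha
        for i in range(d):
            A1[i] += alpha * D[i]
            for j in range(d):
                # Outer product D D^T: d^2 work per descriptor.
                A2[i][j] += alpha * D[i] * D[j]
    return A0, A1, A2

# Two descriptors placed symmetrically around the query point.
A0, A1, A2 = local_moments((0.0, 0.0), [(1.0, 0.0), (-1.0, 0.0)], h=1.0)
```

For the symmetric example, the first-order moment cancels and only the (0, 0) entry of \(A_2\) is non-zero, which is a quick sanity check on the accumulation.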

The estimator of the KL divergence is obtained by straightforward substitution of Eq. 5. Our estimator for HE was proposed by Póczos et al. [14]; it is also based on substitution. The minor adjustment (the term behind the summation in Eq. 6) ensures that the estimator is unbiased.
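As an illustration of the substitution idea, the sketch below plugs k-nearest-neighbour density estimates for two subjects into the definition of the KL divergence. It assumes that Eq. 5 is the k-nearest-neighbour density estimate, in which case the unit-ball volume and k cancel in the log-ratio. This is the standard plug-in form and may differ in detail from the paper's estimator; the function names are hypothetical, and the HE estimator of [14] is not reproduced here.

```python
import math

def kth_distance(x, samples, k):
    """Distance from x to its k-th nearest neighbour in samples."""
    return sorted(math.dist(x, s) for s in samples)[k - 1]

def knn_kl_divergence(P, Q, k=1):
    """Plug-in estimate of KL(p || q) from descriptor sets P ~ p and Q ~ q.

    With p_hat(x) = k / ((n - 1) C_d rho^d) and q_hat(x) = k / (m C_d nu^d),
    log(p_hat / q_hat) = d * log(nu / rho) + log(m / (n - 1)).
    Duplicate points (rho = 0 or nu = 0) are not handled in this sketch.
    """
    n, m, d = len(P), len(Q), len(P[0])
    total = 0.0
    for i, x in enumerate(P):
        others = P[:i] + P[i + 1:]       # leave x out of its own sample
        rho = kth_distance(x, others, k)  # k-NN radius within P
        nu = kth_distance(x, Q, k)        # k-NN radius within Q
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))

# Toy 1-D example with two small descriptor sets.
P = [(0.0,), (1.0,), (2.0,)]
Q = [(0.5,), (1.5,), (2.5,), (3.5,)]
kl = knn_kl_divergence(P, Q, k=1)
```

Evaluating such an estimate for every pair of subjects yields a matrix of pairwise divergences, which is the kind of quantity the method studies in place of a parametric likelihood.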

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Schabdach, J., Wells, W.M., Cho, M., Batmanghelich, K.N. (2017). A Likelihood-Free Approach for Characterizing Heterogeneous Diseases in Large-Scale Studies. In: Niethammer, M., et al. Information Processing in Medical Imaging. IPMI 2017. Lecture Notes in Computer Science, vol 10265. Springer, Cham. https://doi.org/10.1007/978-3-319-59050-9_14

  • DOI: https://doi.org/10.1007/978-3-319-59050-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59049-3

  • Online ISBN: 978-3-319-59050-9

  • eBook Packages: Computer Science, Computer Science (R0)
