longSil: an Evaluation Metric to Assess Quality of Clustering Longitudinal Clinical Data

  • Research Article
Journal of Healthcare Informatics Research

Abstract

Longitudinal disease subtyping is an important problem within the broader scope of computational phenotyping. In this article, we discuss several data-driven, unsupervised methods for obtaining disease subtypes from longitudinal clinical data. The methods are analyzed in the context of chronic kidney disease, one of the leading health problems both in the USA and worldwide. To provide a quantitative comparison of the different methods, we propose a novel evaluation metric that measures the tightness of each cluster and the degree of separation between the clusters produced by each method. Comparative results for two large clinical datasets are provided, along with key insights made possible by the proposed evaluation metric.
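To make "cluster tightness" and "degree of separation" concrete, the sketch below computes the classic silhouette coefficient, which the name longSil suggests as a starting point. This is not the longSil metric itself, whose definition is not given in this preview. The function name `silhouette_scores`, the Euclidean distance, and the toy data are all assumptions chosen for illustration; each row of `X` could stand for a fixed-length vector summarizing one patient's trajectory.

```python
import numpy as np

def silhouette_scores(X, labels):
    """Per-sample silhouette s(i) = (b - a) / max(a, b); needs >= 2 clusters.

    Illustrative only: plain Euclidean distance on fixed-length vectors,
    not the longitudinal distance a metric such as longSil would need.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the own cluster (tightness).
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to any other cluster (separation).
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores

# Toy data (hypothetical): two tight groups that are far apart.
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_scores(X, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_scores(X, [0, 1, 0, 1])   # labels cut across the groups
print(good.mean(), bad.mean())
```

Values near 1 indicate tight, well-separated clusters, and negative values indicate samples that sit closer to another cluster than to their own. That is why the labeling that matches the two groups scores high while the labeling that cuts across them scores below zero.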

Author information

Corresponding author

Correspondence to Duc Thanh Anh Luong.

Ethics declarations

The authors declare that they have no conflict of interest.

Cite this article

Luong, D.T.A., Singh, P., Ramezani, M. et al. longSil: an Evaluation Metric to Assess Quality of Clustering Longitudinal Clinical Data. J Healthc Inform Res 3, 441–459 (2019). https://doi.org/10.1007/s41666-019-00058-z

  • DOI: https://doi.org/10.1007/s41666-019-00058-z
