Cluster Analysis of Obesity Disease Based on Comorbidities Extracted from Clinical Notes

  • Ruth Reátegui
  • Sylvie Ratté
  • Estefanía Bautista-Valarezo
  • Víctor Duque
Transactional Processing Systems
Part of the following topical collections:
  1. Health Information Systems & Technologies


Clinical notes provide a comprehensive overall impression of a patient’s health. However, automatically extracting information from these notes is challenging because of their narrative style. In this context, our goal was to identify clusters of patients based on fourteen comorbidities related to obesity, automatically extracted with the cTAKES tool from the i2b2 Obesity Challenge data. The results were then compared with clusters obtained from the experts’ annotated data. The sparse K-means algorithm was used in both experiments at two levels: at the first level, three clusters were found, and at the second, new clusters were found by applying the same algorithm to each of the clusters from the first level. The results show that three types of clusters could be identified based on the number of comorbidities and the percentage of patients suffering from them. Diabetes, hypercholesterolemia, atherosclerotic cardiovascular diseases, congestive heart failure, obstructive sleep apnea, and depression were the diseases with the highest weights contributing to the cluster distribution.
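The two-level procedure described in the abstract can be illustrated with a minimal sketch, assuming patients are encoded as binary comorbidity vectors (1 = comorbidity present). It follows the general sparse K-means framework of Witten and Tibshirani: alternate ordinary K-means on feature-weighted data with a soft-thresholded update of the feature weights, which are computed from each feature's between-cluster sum of squares. The function names, the bound `s` on the L1 norm of the weights, the simplified stopping rule, the reuse of the same `k` at the second level, and the toy data are all assumptions made for illustration; none of them is taken from the authors' implementation.

```python
import math
import random


def kmeans(rows, k, rng, n_iter=50):
    """Plain Lloyd's K-means on a list of equal-length numeric rows."""
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        # Assign each row to its nearest centre (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(r, centers[c])))
            for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centre to the mean of its members; empty clusters keep theirs.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bcss_per_feature(rows, labels, k):
    """Between-cluster sum of squares per feature: total SS minus within-cluster SS."""
    def ss(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    out = []
    for col in zip(*rows):
        groups = [[v for v, lab in zip(col, labels) if lab == c] for c in range(k)]
        out.append(ss(col) - sum(ss(g) for g in groups if g))
    return out


def update_weights(a, s):
    """Soft-threshold a, scale to unit L2 norm, and keep the L1 norm at or below s."""
    def normalised(delta):
        soft = [max(x - delta, 0.0) for x in a]
        norm = math.sqrt(sum(v * v for v in soft))
        return [v / norm for v in soft] if norm > 0 else soft

    w = normalised(0.0)
    if sum(w) <= s:
        return w
    lo, hi = 0.0, max(a)
    for _ in range(60):  # binary search for the threshold delta
        mid = (lo + hi) / 2
        if sum(normalised(mid)) > s:
            lo = mid
        else:
            hi = mid
    return normalised(hi)


def sparse_kmeans(rows, k, s, seed=0, n_outer=10):
    """Alternate K-means on weighted features with the weight update."""
    rng = random.Random(seed)
    p = len(rows[0])
    weights = [1 / math.sqrt(p)] * p
    labels = [0] * len(rows)
    for _ in range(n_outer):
        scaled = [[x * math.sqrt(w) for x, w in zip(r, weights)] for r in rows]
        labels = kmeans(scaled, k, rng)
        new_weights = update_weights(bcss_per_feature(rows, labels, k), s)
        change = sum(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if change < 1e-4:  # simplified stopping rule (assumption)
            break
    return labels, weights


def two_level_clusters(rows, k, s, seed=0):
    """Cluster all rows, then re-run the same algorithm inside each first-level cluster."""
    top_labels, top_weights = sparse_kmeans(rows, k, s, seed)
    sub_labels = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(top_labels) if lab == c]
        if len(idx) > k:  # clusters too small to split are left as they are
            labels_c, _ = sparse_kmeans([rows[i] for i in idx], k, s, seed)
            sub_labels[c] = dict(zip(idx, labels_c))
    return top_labels, top_weights, sub_labels


if __name__ == "__main__":
    # Synthetic toy data, NOT from the i2b2 corpus: 8 patients x 4 hypothetical comorbidities.
    toy = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0],
           [0, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
    top, weights, sub = two_level_clusters(toy, k=2, s=1.5)
    print("first-level labels:", top)
    print("feature weights:", [round(w, 3) for w in weights])
    print("second-level labels by patient index:", sub)
```

In this sketch, features whose weight is driven to zero do not influence the cluster assignment, which is how a weight vector can indicate which comorbidities contribute most to the cluster distribution. Smaller values of `s` give sparser weights; for the constraints to be satisfiable, `s` should lie between 1 and the square root of the number of features.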


Keywords: Obesity; Clinical notes; cTAKES; Cluster analysis


Compliance with Ethical Standards

Conflicts of Interest

The authors declare they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants performed by any of the authors.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. École de Technologie Supérieure, Montreal, Canada
  2. Universidad Técnica Particular de Loja, Loja, Ecuador
