Extracting Information from Electronic Medical Records to Identify the Obesity Status of a Patient Based on Comorbidities and Bodyweight Measures

  • Patient Facing Systems
Journal of Medical Systems

Abstract

Obesity is a chronic disease with an increasing impact on the world’s population. In this work, we present a method for identifying obesity automatically using text mining techniques and information related to body weight measures and obesity comorbidities. We used a dataset of 3015 de-identified medical records labeled for two classification problems. The first distinguishes between obesity, overweight, normal weight, and underweight. The second differentiates between obesity types: super obesity, morbid obesity, severe obesity, and moderate obesity. We represented the records with a Bag of Words approach, using unigram and bigram features. We implemented two approaches: a hierarchical method and a nonhierarchical one. We used Support Vector Machine and Naïve Bayes classifiers with ten-fold cross-validation to evaluate and compare performance. Our results indicate that the hierarchical approach does not work as well as the nonhierarchical one. In general, Support Vector Machine performs better than Naïve Bayes on both classification problems. We also observed that the bigram representation improves performance compared with the unigram representation.
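
As a rough illustration of the Bag of Words representation with unigram and bigram features described in the abstract, the following Python sketch counts both kinds of n-grams in a short text. The function names (`ngrams`, `bag_of_words`) and the sample sentence are hypothetical and not taken from the paper; the authors' actual feature extraction and weighting may differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, max_n=2):
    """Count all n-grams of length 1..max_n in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical example sentence, not drawn from the study's records.
features = bag_of_words("patient with morbid obesity and sleep apnea")
print(features["morbid obesity"])  # count of one bigram feature
print(features["obesity"])         # count of one unigram feature
```

A bigram such as "morbid obesity" keeps two adjacent words together as a single feature, which a unigram-only representation would split into two unrelated counts.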

Notes

  1. Asthma, atherosclerotic cardiovascular disease, congestive heart failure, depression, diabetes mellitus, gallstones/cholecystectomy, gastroesophageal reflux disease, gout, hypercholesterolemia, hypertension, hypertriglyceridemia, obstructive sleep apnea, osteoarthritis, peripheral vascular disease, and venous insufficiency.

  2. Words were converted to lower case, and non-alphanumeric characters and stop words (e.g., a, the, on) were removed.

  3. Qt Designer is a Qt tool for designing and building graphical user interfaces with widgets. http://doc.qt.io/qt-5/qtdesigner-manual.html

  4. Weka is open-source software for data mining tasks. http://www.cs.waikato.ac.nz/ml/weka/
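
The preprocessing described in note 2 could be sketched in Python as follows. This is a minimal sketch under stated assumptions: the stop-word list is a tiny illustrative one (the list actually used is not given), the function name `preprocess` is hypothetical, and non-alphanumeric characters are replaced by spaces rather than deleted outright.

```python
import re

# Tiny illustrative stop-word list; the list used in the study is not given.
STOP_WORDS = {"a", "the", "on", "of", "and", "with"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, replace non-alphanumeric runs with spaces, drop stop words."""
    lowered = text.lower()
    # Assumption: ASCII letters and digits only; accented letters would be
    # treated as separators by this pattern.
    cleaned = re.sub(r"[^a-z0-9]+", " ", lowered)
    return [tok for tok in cleaned.split() if tok not in stop_words]


tokens = preprocess("The patient, on insulin, has a BMI of 41.2!")
print(tokens)
```

Note that this pattern also splits decimal values such as "41.2" into two tokens; a pipeline that relies on body weight measures would likely need to extract such numbers before this step.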

Acknowledgments

The authors thank the informatics division of the Hospital Guillermo Grant Benavente (HGGB), Universidad de Concepción, and Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT Grant N°11121463) for supporting this research.

Author information

Corresponding author

Correspondence to Christopher A. Flores.

Additional information

This article is part of the Topical Collection on Patient Facing Systems

About this article

Cite this article

Figueroa, R.L., Flores, C.A. Extracting Information from Electronic Medical Records to Identify the Obesity Status of a Patient Based on Comorbidities and Bodyweight Measures. J Med Syst 40, 191 (2016). https://doi.org/10.1007/s10916-016-0548-8

  • DOI: https://doi.org/10.1007/s10916-016-0548-8
