Abstract
Obesity is a chronic disease with an increasing impact on the world’s population. In this work, we present a method for automatically identifying obesity using text mining techniques and information related to body weight measures and obesity comorbidities. We used a dataset of 3015 de-identified medical records labeled for two classification problems. The first classification problem distinguishes among obesity, overweight, normal weight, and underweight. The second differentiates among obesity types: super obesity, morbid obesity, severe obesity, and moderate obesity. We represented the records with a Bag of Words approach, using unigram and bigram features. We implemented two approaches: a hierarchical method and a nonhierarchical one. We used Support Vector Machine (SVM) and Naïve Bayes classifiers with ten-fold cross-validation to evaluate and compare performance. Our results indicate that the hierarchical approach performs worse than the nonhierarchical one. In general, SVM outperforms Naïve Bayes on both classification problems. We also observed that the bigram representation improves performance compared with the unigram representation.
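The evaluation pipeline outlined in the abstract (bag-of-words features built from unigrams and bigrams, a classifier, ten-fold cross-validation) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation: it covers only the Naïve Bayes branch (a multinomial model with add-one smoothing, which is an assumption), omits the SVM and the hierarchical variant, and uses a simple shuffled, unstratified fold split. The names `bag_of_ngrams`, `MultinomialNB`, and `k_fold_accuracy` are illustrative, and the toy notes are invented, not drawn from the paper's dataset.

```python
import math
import random
from collections import Counter


def bag_of_ngrams(tokens, use_bigrams=True):
    """Bag-of-words counts over unigrams and, optionally, adjacent-word bigrams."""
    feats = Counter(tokens)
    if use_bigrams:
        feats.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        # Log prior: fraction of training records carrying each label.
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        # Summed feature counts per class.
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, feats):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            return self.log_prior[c] + sum(
                n * math.log((self.counts[c][f] + 1) / denom)
                for f, n in feats.items()
                if f in self.vocab  # features never seen in training are ignored
            )

        return max(self.classes, key=log_posterior)


def k_fold_accuracy(X, y, k=10, seed=0):
    """Per-fold accuracy of MultinomialNB under shuffled k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # requires len(X) >= k
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = MultinomialNB().fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model.predict_one(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return scores


if __name__ == "__main__":
    # Hypothetical toy notes and labels, for illustration only.
    notes = ["patient morbid obesity sleep apnea", "normal weight no complaints"] * 10
    labels = ["obesity", "normal weight"] * 10
    X = [bag_of_ngrams(n.split()) for n in notes]
    print(k_fold_accuracy(X, labels, k=10))
```

Passing `use_bigrams=False` gives the unigram-only representation, so the two feature sets can be compared under the same folds. In scikit-learn, a comparable setup would pair `CountVectorizer(ngram_range=(1, 2))` with `MultinomialNB` or `LinearSVC` and score it with `cross_val_score(..., cv=10)`; which toolkit and settings the authors actually used is not stated in this section.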
Notes
Asthma, atherosclerotic cardiovascular disease, congestive heart failure, depression, diabetes mellitus, gallstones/cholecystectomy, gastroesophageal reflux disease, gout, hypercholesterolemia, hypertension, hypertriglyceridemia, obstructive sleep apnea, osteoarthritis, peripheral vascular disease, and venous insufficiency.
Words were converted to lowercase, and non-alphanumeric characters and stop words (e.g., a, the, on) were removed.
Qt Designer is a Qt tool for designing and building graphical user interfaces from widgets. http://doc.qt.io/qt-5/qtdesigner-manual.html
Weka is open-source software for data mining tasks. http://www.cs.waikato.ac.nz/ml/weka/
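The text preprocessing described in the notes (lowercasing, then removing non-alphanumeric characters and stop words) can be sketched as follows. The stop-word list here is a hypothetical, abbreviated one (the note names only "a", "the", and "on"), and the exact character rule is an assumption: any run of characters that are not Unicode letters or digits becomes a token boundary, so accented letters such as "ó" are kept as part of words.

```python
import re

# Hypothetical, abbreviated stop-word list; the note itself names only "a", "the", "on".
STOP_WORDS = {"a", "an", "the", "on", "of", "and", "is", "with", "in", "to"}


def preprocess(text: str) -> list[str]:
    """Lowercase, drop non-alphanumeric characters, and remove stop words."""
    lowered = text.lower()
    # [\W_]+ matches runs of non-letter, non-digit characters (including "_");
    # each run is replaced by a single space, acting as a token boundary.
    cleaned = re.sub(r"[\W_]+", " ", lowered)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


print(preprocess("The patient, a 45-y/o male, is on Metformin; BMI=41.2."))
# → ['patient', '45', 'y', 'o', 'male', 'metformin', 'bmi', '41', '2']
```

One side effect of this rule is visible in the output: decimal values are split (`41.2` becomes `41` and `2`), so any numeric body weight measure whose exact value matters would have to be read from the raw text rather than from these tokens.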
Acknowledgments
The authors thank the informatics division of the Hospital Guillermo Grant Benavente (HGGB), the Universidad de Concepción, and the Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT Grant N°11121463) for supporting this research.
Additional information
This article is part of the Topical Collection on Patient Facing Systems
About this article
Cite this article
Figueroa, R.L., Flores, C.A. Extracting Information from Electronic Medical Records to Identify the Obesity Status of a Patient Based on Comorbidities and Bodyweight Measures. J Med Syst 40, 191 (2016). https://doi.org/10.1007/s10916-016-0548-8
DOI: https://doi.org/10.1007/s10916-016-0548-8