Automatic Text Classification for Label Imputation of Medical Diagnosis Notes Based on Random Forest

Yang, Bokai; Dai, Guangzhe; Yang, Yujie; Tang, Darong; Li, Qi; Lin, Denan; Zheng, Jing; Cai, Yunpeng

doi:10.1007/978-3-030-01078-2_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11148)

Included in the following conference series:

International Conference on Health Information Science

Abstract

Electronic medical records (EMRs) contain a wealth of patient information that is of great value for data mining in various clinical applications. However, missing information, including missing labels, is pervasive in real-world EMRs and creates many obstacles for processing medical text content. The aim of this study is to adopt automatic text classification technologies to recover missing medical text labels for EMRs and support downstream analyses. A combination of word-embedding technology and random forest classifiers is applied to identify multiple medical note labels, including disease types and examination types, from short texts of medical imaging diagnosis notes. The results show that the average binary classification accuracy is 91%. Our results indicate that applying advanced NLP techniques to EMRs can achieve high classification accuracy.
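As a rough illustration of the kind of pipeline the abstract describes, the sketch below turns each tokenized note into a feature vector by averaging word vectors, then trains one binary random forest per label. It is a minimal sketch under stated assumptions, not the authors' implementation: the notes, the label names (`lung_disease`, `ct_exam`), the 16-dimensional random stand-in embeddings, the averaging step, and all hyperparameters are hypothetical. In practice the vectors would come from a trained word-embedding model rather than a random generator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy tokenized notes with hypothetical labels (illustrative only, not data from the paper).
notes = [
    (["chest", "ct", "nodule", "lung"], {"lung_disease": 1, "ct_exam": 1}),
    (["chest", "xray", "clear", "lung"], {"lung_disease": 0, "ct_exam": 0}),
    (["abdomen", "ct", "liver", "lesion"], {"lung_disease": 0, "ct_exam": 1}),
    (["chest", "xray", "nodule", "lung"], {"lung_disease": 1, "ct_exam": 0}),
    (["abdomen", "ultrasound", "liver", "normal"], {"lung_disease": 0, "ct_exam": 0}),
    (["chest", "ct", "clear", "lung"], {"lung_disease": 0, "ct_exam": 1}),
]
label_names = ["lung_disease", "ct_exam"]
DIM = 16

# Stand-in embeddings: one fixed random vector per word.
# Assumption: a real system would load vectors from a trained word-embedding model.
rng = np.random.default_rng(0)
vocab = sorted({word for tokens, _ in notes for word in tokens})
embedding = {word: rng.normal(size=DIM) for word in vocab}


def note_vector(tokens):
    """Average the vectors of known words; return zeros if none are known."""
    vecs = [embedding[w] for w in tokens if w in embedding]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)


X = np.vstack([note_vector(tokens) for tokens, _ in notes])

# One binary random forest per label (a one-vs-rest style setup).
classifiers = {}
for name in label_names:
    y = np.array([labels[name] for _, labels in notes])
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X, y)
    classifiers[name] = clf


def impute_labels(tokens):
    """Predict a 0/1 value for every label of a note whose labels are missing."""
    x = note_vector(tokens).reshape(1, -1)
    return {name: int(clf.predict(x)[0]) for name, clf in classifiers.items()}


predicted = impute_labels(["chest", "ct", "nodule", "lung"])
print(predicted)
```

Training a separate binary classifier per label matches the abstract's framing of accuracy as an average over binary classification tasks. A single multi-label model would be an equally plausible design.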

Acknowledgment

This work is supported in part by NSFC funding 11471313, the Science and Technology Planning Project of Guangdong Province (2015B010129012), and Shenzhen Basic Research Funding JCYJ20150630114942270.

Author information

Authors and Affiliations

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
Bokai Yang, Guangzhe Dai, Yujie Yang, Darong Tang, Qi Li & Yunpeng Cai
Northeastern University, Shenyang, 110819, China
Bokai Yang
Shenzhen Medical Information Center, Shenzhen, 518000, China
Denan Lin & Jing Zheng

Corresponding authors

Correspondence to Jing Zheng or Yunpeng Cai.

Editor information

Editors and Affiliations

Victoria University, Footscray, VIC, Australia
Siuly Siuly
James Cook University, Cairns, QLD, Australia
Ickjai Lee
Vrije University of Amsterdam, Amsterdam, The Netherlands
Zhisheng Huang
Swinburne University of Technology, Hawthorn, VIC, Australia
Rui Zhou
Victoria University, Footscray, VIC, Australia
Hua Wang
James Cook University, Cairns, QLD, Australia
Wei Xiang

About this paper

Cite this paper

Yang, B. et al. (2018). Automatic Text Classification for Label Imputation of Medical Diagnosis Notes Based on Random Forest. In: Siuly, S., Lee, I., Huang, Z., Zhou, R., Wang, H., Xiang, W. (eds) Health Information Science. HIS 2018. Lecture Notes in Computer Science, vol 11148. Springer, Cham. https://doi.org/10.1007/978-3-030-01078-2_8

DOI: https://doi.org/10.1007/978-3-030-01078-2_8
Published: 22 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01077-5
Online ISBN: 978-3-030-01078-2
eBook Packages: Computer Science, Computer Science (R0)
