Abstract
A hospital information system stores all clinical information, the major part of which consists of electronic patient records written by doctors, nurses and other medical staff. Since these records are written by medical experts, they are rich in knowledge about medical decision making. This paper proposes an approach to extracting clinical knowledge from the texts of clinical records. The method consists of three steps. First, discharge summaries, which cover all clinical processes during hospitalization, are extracted from the hospital information system. Second, morphological and correspondence analysis generates a term matrix from the text data. Finally, machine learning methods are applied to the term matrix in order to acquire classification knowledge. We compared several machine learning methods using discharge summaries stored in a hospital information system. The experimental results show that random forest is the best classifier, compared with deep learning, SVM and decision tree, and that it achieves more than 90% classification accuracy.
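The second step, turning text into a term matrix, can be illustrated with a minimal sketch. It assumes the morphological analysis has already split each summary into terms; the function name `build_term_matrix` and the toy documents are hypothetical and are not taken from the paper, which relied on R packages. Correspondence analysis and classifier training are not shown.

```python
from collections import Counter


def build_term_matrix(tokenized_docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    # Fixed, sorted vocabulary so that column order is reproducible.
    vocabulary = sorted({term for doc in tokenized_docs for term in doc})
    index = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for doc in tokenized_docs:
        row = [0] * len(vocabulary)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical, already-tokenized summaries (toy data, not from the paper).
docs = [
    ["fever", "cough", "pneumonia", "antibiotics"],
    ["chest", "pain", "infarction", "catheter"],
    ["fever", "cough", "cough", "antibiotics"],
]
vocabulary, matrix = build_term_matrix(docs)
```

Each row of `matrix` is one document and each column one term; a matrix of this shape is the kind of input that correspondence analysis or a classifier would then consume.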
This research is supported by Grant-in-Aid for Scientific Research (B) 18H03289 from the Japan Society for the Promotion of Science (JSPS).
Notes
- 1.
The method can also generate \(p\)-dimensional coordinates \((p \ge 3)\). However, higher-dimensional coordinates did not give better performance in the experiments below.
- 2.
Darch has been removed from the R package repository. Please check the GitHub repository: https://github.com/maddin79/darch.
- 3.
2-fold cross-validation was selected because its estimator gives the lowest estimate of parameters such as accuracy, and the estimation of bias is minimized.
- 4.
DPC codes form a three-level hierarchical system, and each DPC code is defined as a tree. The first level denotes the type of disease, the second level gives the primary selected therapy, and the third level shows the additional therapy. Thus, in the tables, the characteristics of the codes are used to represent similarities.
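The 2-fold cross-validation mentioned in Note 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the nearest-centroid classifier is a stand-in chosen to keep the example self-contained (it is not one of the classifiers compared in the paper), and the function names, the number of repeats and the toy data are all hypothetical.

```python
import random


def nearest_centroid_fit(X, y):
    """Compute one mean vector (centroid) per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def repeated_two_fold_accuracy(X, y, repeats=20, seed=0):
    """Average test accuracy over repeated random 2-fold splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)
        half = len(indices) // 2
        folds = (indices[:half], indices[half:])
        # Each half serves once as training set and once as test set.
        for train, test in (folds, folds[::-1]):
            centroids = nearest_centroid_fit([X[i] for i in train],
                                             [y[i] for i in train])
            correct = sum(nearest_centroid_predict(centroids, X[i]) == y[i]
                          for i in test)
            accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy data: two well-separated classes (hypothetical, not from the paper).
X = [[float(i % 3), float(i % 2)] for i in range(10)] + \
    [[10.0 + i % 3, 10.0 + i % 2] for i in range(10)]
y = ["A"] * 10 + ["B"] * 10
mean_accuracy = repeated_two_fold_accuracy(X, y)
```

Because each split trains on only half of the data, the resulting accuracy estimate tends to be conservative, which matches the motivation given in the note.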
Copyright information
© 2021 IFIP International Federation for Information Processing
About this paper
Cite this paper
Tsumoto, S., Kimura, T., Hirano, S. (2021). From Texts to Classification Knowledge. In: Shi, Z., Chakraborty, M., Kar, S. (eds) Intelligence Science III. ICIS 2021. IFIP Advances in Information and Communication Technology, vol 623. Springer, Cham. https://doi.org/10.1007/978-3-030-74826-5_15
DOI: https://doi.org/10.1007/978-3-030-74826-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74825-8
Online ISBN: 978-3-030-74826-5
eBook Packages: Computer Science, Computer Science (R0)