From Texts to Classification Knowledge

Tsumoto, Shusaku; Kimura, Tomohiro; Hirano, Shoji

doi:10.1007/978-3-030-74826-5_15

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 623)

Included in the following conference series:

International Conference on Intelligence Science

Abstract

A hospital information system stores all clinical information, the major part of which consists of electronic patient records written by doctors, nurses, and other medical staff. Because these records are written by medical experts, they are rich in knowledge about medical decision making. This paper proposes an approach for extracting clinical knowledge from the texts of clinical records. The method consists of three steps. First, discharge summaries, which cover all clinical processes during hospitalization, are extracted from the hospital information system. Second, morphological analysis and correspondence analysis generate a term matrix from the text data. Finally, machine learning methods are applied to the term matrix to acquire classification knowledge. We compared several machine learning methods on discharge summaries stored in a hospital information system. The experimental results show that random forest is the best classifier compared with deep learning, SVM, and decision tree, and that it achieves more than 90% classification accuracy.
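As a rough illustration of the second step, the sketch below builds a plain term-count matrix from documents that are assumed to be tokenized already. It is not the authors' implementation: the morphological analysis and the correspondence analysis are both omitted, and the function name `build_term_matrix`, the `min_doc_freq` parameter, and the sample tokens are hypothetical.

```python
from collections import Counter


def build_term_matrix(tokenized_docs, min_doc_freq=1):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))

    # Keep sufficiently frequent terms, sorted so the column order is stable.
    vocabulary = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    column = {term: j for j, term in enumerate(vocabulary)}

    matrix = []
    for tokens in tokenized_docs:
        row = [0] * len(vocabulary)
        for term in tokens:
            j = column.get(term)
            if j is not None:  # terms filtered out by min_doc_freq are skipped
                row[j] += 1
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical, already-tokenized summaries; real input would come from a
# morphological analyzer applied to the discharge summaries.
docs = [
    ["fever", "cough", "pneumonia", "antibiotics"],
    ["fracture", "surgery", "rehabilitation"],
    ["cough", "fever", "fever"],
]
vocabulary, matrix = build_term_matrix(docs)
```

Each row of `matrix` could then be paired with a class label and passed to a classifier.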

This research is supported by Grant-in-Aid for Scientific Research (B) 18H03289 from the Japan Society for the Promotion of Science (JSPS).

Notes

1. The method can also generate \(p\)-dimensional coordinates (\(p\ge 3\)). However, higher-dimensional coordinates did not give better performance than that obtained in the experiments below.
2. The darch package has been removed from the R package repository. Please check GitHub: https://github.com/maddin79/darch.
3. 2-fold cross-validation was selected because its estimator gives the lowest estimate of parameters such as accuracy, and the estimated bias will be minimized.
4. DPC codes form a three-level hierarchical system, and each DPC code is defined as a tree. The first level denotes the type of disease, the second level gives the primary selected therapy, and the third level shows the additional therapy. Thus, in the tables, the characteristics of the codes are used to represent similarities.
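The 2-fold cross-validation mentioned in note 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental setup: the majority-class learner is only a placeholder for the classifiers compared in the paper, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def two_fold_split(n_samples, seed=0):
    """Shuffle the sample indices and cut them into two halves."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    half = n_samples // 2
    return indices[:half], indices[half:]


def fit_majority(train_X, train_y):
    """Placeholder learner: remember the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, test_X):
    """Predict the remembered label for every test sample."""
    return [model for _ in test_X]


def two_fold_cv_accuracy(X, y, fit, predict, seed=0):
    """Train on one half, test on the other, swap roles, and pool the accuracy."""
    fold_a, fold_b = two_fold_split(len(X), seed)
    correct = 0
    for train_idx, test_idx in ((fold_a, fold_b), (fold_b, fold_a)):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predicted = predict(model, [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(predicted, test_idx))
    # Every sample is tested exactly once, so divide by the total sample count.
    return correct / len(X)


# Toy data (hypothetical): ten samples that all share one class label.
X = [[i] for i in range(10)]
y = ["A"] * 10
accuracy = two_fold_cv_accuracy(X, y, fit_majority, predict_majority)
```

Any learner exposing a comparable `fit`/`predict` pair could be substituted for the placeholder, and the split could be repeated with different seeds to average the estimate.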

Author information

Authors and Affiliations

Department of Medical Informatics, Faculty of Medicine, Shimane University, Matsue, Japan
Shusaku Tsumoto & Shoji Hirano
Medical Services Division, Faculty of Medicine, Shimane University, Matsue, Japan
Tomohiro Kimura

Corresponding author

Correspondence to Shusaku Tsumoto.

Editor information

Editors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Zhongzhi Shi
Jadavpur University, Kolkata, India
Mihir Chakraborty
Department of Mathematics, National Institute of Technology Durgapur, Durgapur, India
Samarjit Kar

About this paper

Cite this paper

Tsumoto, S., Kimura, T., Hirano, S. (2021). From Texts to Classification Knowledge. In: Shi, Z., Chakraborty, M., Kar, S. (eds) Intelligence Science III. ICIS 2021. IFIP Advances in Information and Communication Technology, vol 623. Springer, Cham. https://doi.org/10.1007/978-3-030-74826-5_15

DOI: https://doi.org/10.1007/978-3-030-74826-5_15
Published: 15 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74825-8
Online ISBN: 978-3-030-74826-5
eBook Packages: Computer Science, Computer Science (R0)
