From Texts to Classification Knowledge

  • Conference paper
Intelligence Science III (ICIS 2021)

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 623)

Abstract

A hospital information system stores all clinical information, a major part of which consists of electronic patient records written by doctors, nurses and other medical staff. Because these records are written by medical experts, they are rich in knowledge about medical decision making. This paper proposes an approach to extracting clinical knowledge from the texts of clinical records. The method consists of three steps. First, discharge summaries, which cover all clinical processes during hospitalization, are extracted from the hospital information system. Second, morphological analysis and correspondence analysis generate a term matrix from the text data. Finally, machine learning methods are applied to the term matrix to acquire classification knowledge. We compared several machine learning methods using discharge summaries stored in a hospital information system. The experimental results show that random forest is the best classifier compared with deep learning, SVM and decision tree, achieving more than 90% classification accuracy.
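The term-matrix step of the method can be illustrated with a minimal sketch. It assumes the texts have already been split into tokens by a morphological analyzer; the function name `build_term_matrix`, the `min_count` parameter and the toy token lists are hypothetical and are not taken from the paper. The sketch only counts terms per document; it does not perform correspondence analysis or classification.

```python
from collections import Counter


def build_term_matrix(tokenized_docs, min_count=1):
    """Return (vocab, matrix) where matrix[i][j] counts term vocab[j] in document i.

    tokenized_docs: list of token lists (assumed output of morphological analysis).
    min_count: hypothetical filter that drops terms seen fewer times in total.
    """
    # Total frequency of each term over the whole collection.
    totals = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = sorted(t for t, c in totals.items() if c >= min_count)
    index = {t: j for j, t in enumerate(vocab)}

    matrix = []
    for doc in tokenized_docs:
        row = [0] * len(vocab)
        for tok in doc:
            j = index.get(tok)
            if j is not None:  # skip terms removed by the min_count filter
                row[j] += 1
        matrix.append(row)
    return vocab, matrix


# Made-up token lists standing in for analyzed discharge summaries.
docs = [
    ["fever", "cough", "antibiotics"],
    ["fracture", "surgery", "rehabilitation"],
    ["fever", "antibiotics", "antibiotics"],
]
vocab, matrix = build_term_matrix(docs)
```

Each row of `matrix` is one document and each column one term, so the rows can be passed, together with class labels, to whichever classifier is being compared.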

This research is supported by Grant-in-Aid for Scientific Research (B) 18H03289 from the Japan Society for the Promotion of Science (JSPS).

Notes

  1. The method can also generate \(p\)-dimensional coordinates \((p\ge 3)\). However, higher-dimensional coordinates did not improve performance in the experiments below.

  2. Darch has been removed from the R package repository. Please check its GitHub repository: https://github.com/maddin79/darch.

  3. 2-fold cross-validation was selected because its estimator gives the lowest estimate of parameters such as accuracy, and the estimation of bias is minimized.

  4. DPC codes form a three-level hierarchical system, and each DPC code is defined as a tree. The first level denotes the type of disease, the second level gives the primary selected therapy, and the third level shows the additional therapy. Thus, in the tables, characteristics of codes are used to represent similarities.
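The 2-fold cross-validation mentioned in the notes can be sketched as follows. This is a minimal illustration under assumptions, not the authors' evaluation code: the function names, the `repeats` and `seed` parameters, and the majority-class placeholder classifier are all hypothetical, and any classifier exposing a fit function and a predict function could be substituted.

```python
import random
from collections import Counter


def two_fold_splits(n_samples, repeats=1, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated 2-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        half = n_samples // 2
        first, second = idx[:half], idx[half:]
        # Each half serves once as the test set and once as the training set.
        yield second, first
        yield first, second


def cross_validated_accuracy(X, y, fit, predict, repeats=1, seed=0):
    """Average accuracy over all folds; expects at least two samples."""
    scores = []
    for train, test in two_fold_splits(len(X), repeats, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Placeholder classifier (an assumption for illustration): always predicts
# the most frequent training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

For example, `cross_validated_accuracy(X, y, fit_majority, predict_majority, repeats=10)` averages accuracy over 20 folds. Repeating the split with different shuffles reduces the dependence of the estimate on one particular partition.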

References

  1. Amisha, P.M., Pathania, M., Rathaur, V.K.: Overview of artificial intelligence in medicine. J. Family Med. Primary Care 8(7), 2328–2331 (2019)

  2. Ishida, M.: Rmecab. http://rmecab.jp/wiki/index.php?RMeCabFunctions (2016)

  3. Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A.: Kernlab - an S4 package for kernel methods in R. J. Stat. Softw. 11(9), 1–20 (2004). http://www.jstatsoft.org/v11/i09/

  4. Kim, J.H.: Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Comput. Stat. Data Anal. 53(11), 3735–3745 (2009). https://doi.org/10.1016/j.csda.2009.04.009

  5. Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3), 18–22 (2002). http://CRAN.R-project.org/doc/Rnews/

  6. Mares, M.A., Wang, S., Guo, Y.: Combining multiple feature selection methods and deep learning for high-dimensional data. Trans. Mach. Learn. Data Mining 9, 27–45 (2016)

  7. Nezhad, M.Z., Zhu, D., Li, X., Yang, K., Levy, P.: SAFS: a deep feature selection approach for precision medicine. CoRR abs/1704.05960 (2017). http://arxiv.org/abs/1704.05960

  8. Persidis, A., Persidis, A.: Medical expert systems: an overview. J. Manage. Med. 5(3), 27–34 (1991). https://doi.org/10.1108/EUM0000000001316

  9. Riaño, D., Wilk, S., ten Teije, A. (eds.): AIME 2019. LNCS (LNAI), vol. 11526. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21642-9

  10. Shortliffe, E.: Medical expert systems-knowledge tools for physicians. W. J. Med. 145(6), 830–839 (1986)

  11. Therneau, T.M., Atkinson, E.J.: An Introduction to Recursive Partitioning Using the RPART Routines (2015). https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf

  12. Tsumoto, S.: Automated induction of medical expert system rules from clinical databases based on rough set theory. Inf. Sci. 112, 67–84 (1998)

  13. Tsumoto, S., Hirano, S.: Incremental induction of medical diagnostic rules based on incremental sampling scheme and subrule layers. Fundam. Informaticae 127(1–4), 209–223 (2013). https://doi.org/10.3233/FI-2013-905

  14. Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S, 4th edn. Springer, New York (2002). http://www.stats.ox.ac.uk/pub/MASS4, ISBN 0-387-95457-0

Author information

Corresponding author

Correspondence to Shusaku Tsumoto.

Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper

Cite this paper

Tsumoto, S., Kimura, T., Hirano, S. (2021). From Texts to Classification Knowledge. In: Shi, Z., Chakraborty, M., Kar, S. (eds) Intelligence Science III. ICIS 2021. IFIP Advances in Information and Communication Technology, vol 623. Springer, Cham. https://doi.org/10.1007/978-3-030-74826-5_15

  • DOI: https://doi.org/10.1007/978-3-030-74826-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74825-8

  • Online ISBN: 978-3-030-74826-5

  • eBook Packages: Computer Science, Computer Science (R0)
