Chapter in: Interactive Knowledge Discovery and Data Mining in Biomedical Informatics

Lecture Notes in Computer Science, Volume 8401, pp. 271-300

Biomedical Text Mining: State-of-the-Art, Open Problems and Future Challenges

  • Andreas Holzinger (Research Unit Human-Computer Interaction, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz)
  • Johannes Schantl (Research Unit Human-Computer Interaction, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz)
  • Miriam Schroettner (Research Unit Human-Computer Interaction, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz)
  • Christin Seifert (Media Informatics, University of Passau)
  • Karin Verspoor (Department of Computing & Information Systems, University of Melbourne; Health and Biomedical Informatics Centre, University of Melbourne)

Abstract

Text is a very important type of data within the biomedical domain. Patient records, for example, contain large amounts of text entered in a non-standardized format, which poses many challenges for processing such data. For clinicians, the written text of the medical findings, rather than images or multimedia data, is still the basis for decision making. However, the steadily increasing volume of unstructured information calls for machine learning approaches to data mining, i.e. text mining. This paper provides a concise overview of selected text mining methods, focusing on statistical methods, namely Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation, Hierarchical Latent Dirichlet Allocation, Principal Component Analysis, and Support Vector Machines, along with some examples from the biomedical domain. Finally, we outline open problems and future challenges, particularly from the clinical domain, that we expect to stimulate future research.
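Latent Semantic Analysis (LSA), the first method named above, can be illustrated as a truncated singular value decomposition (SVD) of a term-document matrix. The sketch below is a minimal illustration under stated assumptions, not the chapter's implementation. It assumes NumPy, raw term counts (no tf-idf weighting), a hypothetical four-document toy corpus that is not data from the chapter, and a hypothetical choice of k = 2 latent dimensions. Documents are compared by cosine similarity in the reduced space.

```python
import numpy as np

# Hypothetical toy corpus (illustration only, not data from the chapter):
# two cardiology-style snippets and two orthopedics-style snippets.
docs = [
    "chest pain dyspnea patient",
    "chest pain patient admitted",
    "knee meniscus tear mri",
    "knee meniscus injury sports",
]

def term_document_matrix(docs):
    """Build a raw-count term-document matrix X (terms x documents)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            X[index[w], j] += 1
    return X, vocab

def lsa_document_vectors(X, k):
    """Factor X = U S V^T and keep only the k largest singular values.
    Returns one k-dimensional row vector per document (S_k V_k^T, transposed)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

X, vocab = term_document_matrix(docs)
D = lsa_document_vectors(X, k=2)  # k = 2 is an assumed, illustrative choice
sim = np.array([[cosine(D[i], D[j]) for j in range(len(docs))]
                for i in range(len(docs))])
print(np.round(sim, 2))
```

In this toy setting, the two chest-pain notes map to nearly the same latent direction, and so do the two knee notes, while the cross-topic similarities are close to zero. Real clinical corpora would typically call for term weighting, stop-word handling, and a larger, empirically tuned k.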

Keywords

Text Mining, Natural Language Processing, Unstructured Information, Big Data, Knowledge Discovery, Statistical Models, Text Classification, LSA, PLSA, LDA, hLDA, PCA, SVM