Ontology-Based Information Extraction: Identifying Eligible Patients for Clinical Trials in Neurology

  • Original Article
Journal on Data Semantics

Abstract

In this paper, we present a case study on the use of ontologies within a system for identifying patients who are eligible for clinical trials. The main purpose of this clinical research data warehouse (CRDW) is to support patient recruitment based on routine data from the clinical information system. In contrast to most other systems built for similar purposes, the CRDW also uses information extracted from clinical documents such as admission reports, radiological findings, and discharge letters. The so-called linguistic pipeline of the CRDW recognizes negated and coordinated phrases. It is supported by clinical application ontologies, which enable the identification of main terms and their properties, as well as semantic search with synonyms, hypernyms, and syntactic variants. We discuss questions related to designing the ontologies and filling them with content. The CRDW is currently being tested at several departments of the Charité—Universitätsmedizin Berlin and the Vivantes—Netzwerk für Gesundheit GmbH. We also provide a thorough evaluation of the deployed systems based on real data related to clinical trials conducted by our neurology departments.
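To make the idea of ontology-supported search with negation handling concrete, the following minimal Python sketch may help. It is not the CRDW implementation: the toy ontology, the English cue words, and the names `ONTOLOGY`, `expand`, and `concept_status` are all assumptions made for illustration. A query concept is expanded to its synonyms and to the terms of all concepts below it in the hypernym hierarchy, and a mention counts as negated when a cue word precedes it in the same sentence.

```python
import re

# Hypothetical toy ontology (illustration only): each concept lists its
# synonyms and its hypernym (parent concept), or None for a root concept.
ONTOLOGY = {
    "stroke": {"synonyms": {"stroke", "apoplexy"}, "hypernym": None},
    "ischemic stroke": {
        "synonyms": {"ischemic stroke", "cerebral infarction"},
        "hypernym": "stroke",
    },
    "hemorrhagic stroke": {
        "synonyms": {"hemorrhagic stroke", "intracerebral hemorrhage"},
        "hypernym": "stroke",
    },
}

# Assumed, deliberately crude negation cues; a real pipeline would be
# far more careful about the scope of a negation.
NEGATION = re.compile(r"\b(no|not|without)\b")


def expand(concept):
    """Collect the synonyms of a concept and of all concepts below it."""
    terms = set(ONTOLOGY[concept]["synonyms"])
    for name, entry in ONTOLOGY.items():
        if entry["hypernym"] == concept:
            terms |= expand(name)
    return terms


def concept_status(text, concept):
    """Return 'affirmed', 'negated', or 'absent' for a concept in a text."""
    terms = expand(concept)
    status = "absent"
    for sentence in re.split(r"[.!?]+", text.lower()):
        for term in terms:
            match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
            if match is None:
                continue
            # A cue word anywhere before the mention negates it.
            if NEGATION.search(sentence[: match.start()]):
                status = "negated"
            else:
                return "affirmed"
    return status


print(concept_status("Patient admitted with cerebral infarction.", "stroke"))
print(concept_status("No evidence of ischemic stroke on MRI.", "stroke"))
```

Querying the hypernym `stroke` finds the first report through the synonym "cerebral infarction" of a narrower concept, while the second report is flagged as negated, so a recruitment query would not treat it as a positive finding.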

Acknowledgments

The project was partially funded by TSB Technologiestiftung Berlin, Zukunftsfonds Berlin, and co-financed by the European Union—European Fund for Regional Development. We would like to thank all computer scientists, linguists, ontologists, physicians, study nurses, and administrative staff of all three partners, who participated in the organization of the project and development of the software.

Author information

Correspondence to Peter Geibel.

About this article

Cite this article

Geibel, P., Trautwein, M., Erdur, H. et al. Ontology-Based Information Extraction: Identifying Eligible Patients for Clinical Trials in Neurology. J Data Semant 4, 133–147 (2015). https://doi.org/10.1007/s13740-014-0037-5

  • DOI: https://doi.org/10.1007/s13740-014-0037-5