Encyclopedia of Database Systems

Living Edition
Editors: Ling Liu, M. Tamer Özsu

Interoperation of NLP-Based Systems with Clinical Databases

  • Yves A. Lussier
  • Matthew G. Crowson
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7993-3_208-2



Natural language processing (NLP) is the automation of processes that interpret and understand meaning in human communication. In the life sciences, NLP assists in the large-scale storage and retrieval of specific “bundles” of clinical data embedded in patient charts, which are commonly written as “free text.” Both expert-system-based and statistical NLP systems have been in use in biomedicine for over three decades, and some have shown an expert-like level of accuracy [1,3,6]. With the advent of electronic medical records, the sheer volume of data necessitates automated means of analysis to support patient care and research.

Key Points

NLP commonly relies on indexing/tokenization, the process of breaking text strings down into data bundles. These bundles then need to be interpreted, which can be accomplished by mapping them to a clinical ontology. Clinical ontologies provide a means of disambiguating and organizing the mapped concepts to permit more efficient computation. See Fig. 1 below.
Fig. 1 NLP system
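
The tokenization and ontology-mapping steps described above can be illustrated with a minimal sketch. It is not the pipeline of any system cited in this entry: the `TOY_ONTOLOGY` dictionary, its concept codes, and the function names are hypothetical placeholders, and the mapping is a simple longest-phrase lookup rather than a full parser.

```python
import re

# Hypothetical toy ontology: surface phrase -> (concept code, preferred name).
# The codes are made-up placeholders, not identifiers from a real terminology.
TOY_ONTOLOGY = {
    "heart attack": ("C0001", "myocardial infarction"),
    "myocardial infarction": ("C0001", "myocardial infarction"),
    "shortness of breath": ("C0002", "dyspnea"),
    "dyspnea": ("C0002", "dyspnea"),
}
MAX_PHRASE_LEN = max(len(phrase.split()) for phrase in TOY_ONTOLOGY)


def tokenize(text):
    """Break a free-text string into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_to_concepts(tokens):
    """Map token sequences to ontology concepts, preferring the longest match."""
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in TOY_ONTOLOGY:
                code, name = TOY_ONTOLOGY[phrase]
                concepts.append({"text": phrase, "code": code, "name": name})
                i += n
                break
        else:
            i += 1  # no concept starts at this token; move on
    return concepts


note = "Patient reports shortness of breath; history of heart attack."
concepts = map_to_concepts(tokenize(note))
```

Because synonyms such as "heart attack" and "myocardial infarction" share one code in this sketch, later computation can work on the code instead of the surface wording, which is the disambiguating role the ontology plays in the text above.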

Once tokenization has occurred, the data can be stored in a variety of ways. Figure 2a shows how a relational database requires an “unbundling process” to fit the data into its simplified, tabular storage structure. Although this is efficient for both storage and retrieval, data loss occurs because simplifying assumptions are made. As a consequence, queries at retrieval time can only be directed at the level of complexity that was stored. In contrast, in Fig. 2b the tokenized data is not forced into a fixed structure but is stored whole, i.e., in an XML database. XML databases and ontology anchoring [5] are important components of modern high-performance NLP systems. Retaining the complexity of the data permits more nuanced and complex queries, because the tokenized data can be retrieved in its entirety. This richer model is more computationally intensive; however, recent advances in processing power have made this concern largely moot. The main advantage of this model over the “lossy relational database model” is that it facilitates better storage and use of high-throughput biomedical data, since researchers cannot always anticipate a priori which clinical questions will be posed.
Fig. 2

Two models of NLP-directed output
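
The contrast between the two storage models can be made concrete with a small sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `finding` record, the two-column table, the XML element names, and the placeholder concept code do not come from the systems cited in this entry, and a relational schema could of course be designed to hold more detail than this deliberately simplified one.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical NLP output for one sentence: a finding with nested modifiers.
finding = {
    "patient_id": 1,
    "code": "C0002",  # placeholder concept code, not a real terminology ID
    "name": "dyspnea",
    "modifiers": {"certainty": "negated", "status": "acute"},
}

# (a) Simplified relational model: "unbundle" the finding into a fixed table.
# Only the columns chosen in advance survive; the modifiers are dropped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finding (patient_id INTEGER, code TEXT)")
conn.execute(
    "INSERT INTO finding VALUES (?, ?)",
    (finding["patient_id"], finding["code"]),
)
rows = conn.execute(
    "SELECT patient_id FROM finding WHERE code = ?", ("C0002",)
).fetchall()

# (b) XML model: store the whole nested structure, modifiers included.
root = ET.Element("report", patient_id=str(finding["patient_id"]))
node = ET.SubElement(root, "finding", code=finding["code"], name=finding["name"])
for key, value in finding["modifiers"].items():
    ET.SubElement(node, key).text = value

# A query that was not anticipated when the data was stored:
# which findings were explicitly negated?
negated = root.findall(".//finding[certainty='negated']")
```

In this sketch the relational query returns patient 1 for the dyspnea code even though the note negated the finding, because the certainty modifier was never stored. The XML version can still answer the negation question because the full structure was retained.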


Recommended Reading

  1. Chapman WW, Dowling JN, Wagner MM. Classification of emergency department chief complaints into 7 syndromes: a retrospective analysis of 527,228 patients. Ann Emerg Med. 2005;46(5):445–55. Epub 2005 Jul 14.
  2. Chen ES, Hripcsak G, Friedman C. Disseminating natural language processed clinical narratives. AMIA annual symposium proceedings. 2006. p. 126–30.
  3. Collier N, Nazarenko A, Baud R, Ruch P. Recent advances in natural language processing for biomedical applications. Int J Med Inform. 2006;75(6):413–7.
  4. Friedman C, Hripcsak G, Shagina L, Liu H. Representing information in patient reports using natural language processing and the extensible markup language. J Am Med Inform Assoc. 1999;6(1):76–87.
  5. Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc. 2004;11(5):392–402. Epub 2004 Jun 7.
  6. Hripcsak G, Friedman C, Alderson PO, DuMouchel W, Johnson SB, Clayton PD. Unlocking clinical data from narrative reports: a study of natural language processing. Ann Intern Med. 1995;122(9):681–8.
  7. Johnson SB, Campbell DA, Krauthammer M, Tulipano PK, Mendonça EA, Friedman C, Hripcsak G. A native XML database design for clinical document research. AMIA annual symposium proceedings. 2003. p. 883.

Copyright information

© Springer Science+Business Media LLC 2016

Authors and Affiliations

  1. University of Chicago, Chicago, USA

Section editors and affiliations

  • Vipul Kashyap (1)
  1. Director, Clinical Programs, CIGNA Healthcare, Bloomfield, USA