Natural Language Processing in Health Care and Biomedicine

Friedman, Carol; Elhadad, Noémie

doi:10.1007/978-1-4471-4474-8_8

Abstract

After reading this chapter, you should know the answers to these questions:

Why is natural language processing (NLP) important?
What are the potential uses for NLP in the biomedical and health domains?
What forms of knowledge are used in NLP?
What are the principal techniques of NLP?
What are the challenges for NLP in the clinical, biological, and health consumer domains?

This chapter is adapted from an earlier version in the third edition authored by Carol Friedman and Stephen B. Johnson.

Notes

1. Unless stated otherwise, the general domain and the topics of text materials discussed in this chapter refer to biomedicine and health.
2. http://www.ncbi.nlm.nih.gov/pmc (Accessed 4/26/13).
3. http://www.dbmi.pitt.edu/nlpfront (Accessed 4/26/13).
4. orbit.nlm.nih.gov (Accessed 4/19/13).
5. www.nltk.org (Accessed 4/18/13).
6. www.alias-i.com/lingpipe/ (Accessed 4/18/13).
7. http://incubator.apache.org/opennlp/ (Accessed 4/19/13).
8. http://uima.apache.org/index.html (Accessed 4/19/13).

Author information

Authors and Affiliations

Department of Biomedical Informatics, Columbia University, 622 West 168th Street, VC Bldg 5, New York, 10032, NY, USA
Carol Friedman PhD & Noémie Elhadad PhD

Corresponding author

Correspondence to Carol Friedman PhD.

Editor information

Editors and Affiliations

Departments of Biomedical Informatics at Columbia University and Arizona State University, New York, New York, USA
Edward H. Shortliffe
Bethesda, Maryland, USA
James J. Cimino

About this chapter

Cite this chapter

Friedman, C., Elhadad, N. (2014). Natural Language Processing in Health Care and Biomedicine. In: Shortliffe, E., Cimino, J. (eds) Biomedical Informatics. Springer, London. https://doi.org/10.1007/978-1-4471-4474-8_8

DOI: https://doi.org/10.1007/978-1-4471-4474-8_8
Published: 07 November 2013
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4473-1
Online ISBN: 978-1-4471-4474-8
eBook Packages: Medicine, Medicine (R0)
