Abstract
After reading this chapter, you should know the answers to these questions:
- Why is natural language processing important?
- What are the potential uses for natural language processing (NLP) in the biomedical and health domains?
- What forms of knowledge are used in NLP?
- What are the principal techniques of NLP?
- What are challenges for NLP in the clinical, biological, and health consumer domains?
This chapter is adapted from an earlier version in the third edition authored by Carol Friedman and Stephen B. Johnson.
Notes
1. Unless stated otherwise, the general domain and the topics of text materials discussed in this chapter refer to biomedicine and health.
2. http://www.ncbi.nlm.nih.gov/pmc (Accessed 4/26/13).
3. http://www.dbmi.pitt.edu/nlpfront. (Accessed 4/26/13).
4. orbit.nlm.nih.gov (Accessed 4/19/13).
5. www.nltk.org (Accessed 4/18/13).
6. www.alias-i.com/lingpipe/ (Accessed 4/18/13).
7. http://incubator.apache.org/opennlp/ (Accessed 4/19/13).
8. http://uima.apache.org/index.html (Accessed 4/19/13).
Copyright information
© 2014 Springer-Verlag London
About this chapter
Cite this chapter
Friedman, C., Elhadad, N. (2014). Natural Language Processing in Health Care and Biomedicine. In: Shortliffe, E., Cimino, J. (eds) Biomedical Informatics. Springer, London. https://doi.org/10.1007/978-1-4471-4474-8_8
DOI: https://doi.org/10.1007/978-1-4471-4474-8_8
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4473-1
Online ISBN: 978-1-4471-4474-8
eBook Packages: Medicine (R0)