Abstract
Narrative text is an important component of communication in health care. It includes patient-specific information in health record reports and notes, as well as general biomedical knowledge in papers, textbooks and web resources. Retrieval of information from such sources can be accomplished with keyword indexing, but this approach fails to distinguish between texts that discuss a topic and those that merely mention it. Natural language processing (NLP) techniques seek to analyze narrative text to make these distinctions and even to find instances where a concept is discussed but not explicitly mentioned. Earlier grammar-based methods parsed sentences into their structure to infer semantics. Machine-learning methods are now being applied that can infer semantics through techniques such as statistical pattern recognition. Regular expressions, rule bases, neural networks and word embeddings are among the approaches that are improving the ability of automated systems to carry out successful language understanding, information extraction and question answering.
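The limitation of keyword indexing noted above can be illustrated with a minimal sketch. It is not the chapter's method: the example notes, the short list of negation cues and the five-word window are all assumptions chosen for illustration, and negation is used here only as one simple case in which a text mentions a term without asserting it.

```python
import re

# Hypothetical example notes; not drawn from any real record.
NOTES = [
    "Chest X-ray shows right lower lobe pneumonia.",
    "No evidence of pneumonia or effusion.",
    "Patient denies cough; lungs clear.",
]

# Assumed, deliberately tiny list of negation cues.
NEGATION_CUES = r"\b(?:no|denies|without|negative for)\b"


def keyword_hits(notes, term):
    """Return notes that contain the term anywhere (plain keyword matching)."""
    return [n for n in notes if term.lower() in n.lower()]


def affirmed_hits(notes, term, window=5):
    """Return keyword hits in which no negation cue appears within
    `window` words before the term."""
    pattern = re.compile(
        NEGATION_CUES
        + r"(?:\W+\w+){0," + str(window) + r"}?\W+"
        + re.escape(term),
        re.IGNORECASE,
    )
    return [n for n in keyword_hits(notes, term) if not pattern.search(n)]


if __name__ == "__main__":
    print(keyword_hits(NOTES, "pneumonia"))   # both notes that contain the word
    print(affirmed_hits(NOTES, "pneumonia"))  # only the note that asserts it
```

Keyword matching returns both notes that contain "pneumonia", while the regular-expression filter keeps only the one that asserts the finding. A rule this simple will miss many phrasings, which is part of the motivation for the rule-based and machine-learning approaches the abstract lists.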
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Demner-Fushman, D., Elhadad, N., Friedman, C. (2021). Natural Language Processing for Health-Related Texts. In: Shortliffe, E.H., Cimino, J.J. (eds) Biomedical Informatics. Springer, Cham. https://doi.org/10.1007/978-3-030-58721-5_8
DOI: https://doi.org/10.1007/978-3-030-58721-5_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58720-8
Online ISBN: 978-3-030-58721-5
eBook Packages: Medicine, Medicine (R0)