Natural Language Processing – Overview and History

Connolly, Brian; Miller, Timothy; Ni, Yizhao; Cohen, Kevin B.; Savova, Guergana; Dexheimer, Judith W.; Pestian, John

doi:10.1007/978-981-10-1104-7_11

Part of the book series: Translational Bioinformatics (TRBIO, volume 10)

Abstract

In this chapter, we introduce the topic of Natural Language Processing (NLP) in the clinical domain. NLP has shown increasing promise in tasks ranging from the assembly of patient cohorts to the identification of mental disorders. The chapter begins by discussing why NLP is needed to analyze electronic health records (EHRs). Subsequent sections place clinical NLP research in a wider historical context by reviewing various approaches to NLP over time. The focus then turns to available NLP-related data resources and the methods of generating such resources. The development of NLP systems and their evaluation are then examined. The chapter concludes by describing current and future challenges in clinical NLP.

Notes

1. Mathematically, the average inner-unit disagreement for analysis unit u is given by

\[ D_u = \frac{1}{m_u (m_u - 1)} \sum_{j=1}^{m_u} \sum_{\substack{j'=1 \\ j' \neq j}}^{m_u} d(a_{ju}, a_{j'u}) \]

where m_u is the number of annotators for u, d(x, y) is an arbitrarily defined function that quantifies the dissimilarity between two values x and y, and a_jk is the value assigned to the kth unit by the jth annotator. D_o is then defined as

\[ D_o = \sum_{u=1}^{N} \frac{m_u}{n} D_u \]

where there are n pairable values over m_u annotators and N analysis units. Note that each D_u is weighted by the fraction of total annotations contained in analysis unit u.

D_e is calculated by averaging over all annotated pairs:

\[ D_e = \frac{1}{n (n - 1)} \sum_{u=1}^{N} \sum_{i=1}^{m} \sum_{u'=1}^{N} \sum_{i'=1}^{m} d(a_{iu}, a_{i'u'}) \]

where (i, u) ≠ (i', u') and m is the number of annotators. Random chance disagreement is thereby defined as the average disagreement over all pairs, regardless of their analysis unit or annotator.
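The quantities in this note can be illustrated with a short Python sketch. It assumes that D_o and D_e are combined in the usual way for Krippendorff's alpha, alpha = 1 - D_o / D_e, which the note itself does not state, and it uses a simple nominal distance (0 for matching labels, 1 otherwise) as one possible choice of d(x, y). The function names and the toy data are hypothetical and serve only as an illustration.

```python
from itertools import permutations


def nominal_distance(x, y):
    """One possible d(x, y): 0 if the two labels match, 1 otherwise (an assumption)."""
    return 0.0 if x == y else 1.0


def disagreement(units, d=nominal_distance):
    """Return (D_o, D_e) for a list of analysis units.

    Each unit is the list of values assigned to it by the annotators who
    labelled it; annotators who skipped a unit are simply absent from its list.
    """
    # Only units with at least two annotations contain pairable values.
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)  # total number of pairable values

    # Observed disagreement: per-unit averages D_u, each weighted by m_u / n.
    d_o = 0.0
    for u in units:
        m_u = len(u)
        d_u = sum(d(x, y) for x, y in permutations(u, 2)) / (m_u * (m_u - 1))
        d_o += (m_u / n) * d_u

    # Expected disagreement: average over all ordered pairs of distinct
    # annotations, regardless of analysis unit or annotator.
    pooled = [value for u in units for value in u]
    d_e = sum(d(x, y) for x, y in permutations(pooled, 2)) / (n * (n - 1))
    return d_o, d_e


def krippendorff_alpha(units, d=nominal_distance):
    """alpha = 1 - D_o / D_e (assumed combination; undefined when D_e is 0)."""
    d_o, d_e = disagreement(units, d)
    return 1.0 - d_o / d_e


# Toy example: three units, each labelled by two annotators.
toy_units = [["a", "a"], ["b", "b"], ["a", "b"]]
toy_d_o, toy_d_e = disagreement(toy_units)
toy_alpha = krippendorff_alpha(toy_units)
```

In the toy data the annotators disagree on one of three units, so D_o is 1/3, while pooling all six labels gives D_e of 0.6. Note that the sketch does not guard against D_e being zero, which happens when every annotation carries the same value.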

Author information

Authors and Affiliations

Department of Pediatrics, Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati College of Medicine, 3333 Burnet Avenue, Cincinnati, OH, 45229-3039, USA
Brian Connolly Ph.D.
Department of Pediatrics, Harvard Medical School, Boston Children’s Hospital, 300 Longwood Avenue Enders 138, Boston, MA, 02115, USA
Timothy Miller Ph.D.
Department of Pediatrics and Biomedical Informatics, Division of Biomedical Informatics, Children’s Hospital Medical Center, University of Cincinnati College of Medicine, 3333 Burnet Avenue, ML-7024, Cincinnati, OH, 45229-3039, USA
Yizhao Ni Ph.D.
University of Colorado School of Medicine, 13001 E 17th Pl, RC-1 S. Room L18-6102, Aurora, CO, 80045, USA
Kevin B. Cohen Ph.D.
Children’s Hospital Boston and Harvard Medical School, 300 Longwood Avenue, Enders 138, Boston, MA, 02115, USA
Guergana Savova Ph.D.
Departments of Pediatrics and Biomedical Informatics, Divisions of Emergency Medicine and Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati College of Medicine, 3333 Burnet Ave, ML-2008, Cincinnati, OH, 45229, USA
Judith W. Dexheimer Ph.D.
Department of Pediatrics and Biomedical Informatics, Division of Emergency Medicine, Cincinnati Children’s Hospital Medical Center, University of Cincinnati College of Medicine, 3333 Burnet Ave, ML-2008, Cincinnati, OH, 45229-3039, USA
John Pestian Ph.D., M.B.A.

Corresponding author

Correspondence to Brian Connolly Ph.D.

Editor information

Editors and Affiliations

Children's Hospital Research Foundation, Cincinnati, Ohio, USA
John J. Hutton

About this chapter

Cite this chapter

Connolly, B. et al. (2016). Natural Language Processing – Overview and History. In: Hutton, J. (eds) Pediatric Biomedical Informatics. Translational Bioinformatics, vol 10. Springer, Singapore. https://doi.org/10.1007/978-981-10-1104-7_11

DOI: https://doi.org/10.1007/978-981-10-1104-7_11
Published: 09 October 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-1102-3
Online ISBN: 978-981-10-1104-7
eBook Packages: Biomedical and Life Sciences (R0)
