Abstract
We discuss biomedical natural language processing (NLP) applications that span a wide range of use cases in translational and health services research. Our use cases fall into four categories of applications: (1) Information Extraction (IE), (2) Document Classification, (3) Patient Classification, and (4) Sentiment Analysis. We show how the extracted information can be used for (a) Phenotype identification, (b) Comparative effectiveness studies, (c) Cohort identification, (d) Meaningful Use, and (e) Linking patients’ phenotype and genotype. In addition, we discuss the use of NLP components for de-identification of large collections of patient notes. We review the literature for examples of pediatric NLP applications and show that selected adult clinical NLP applications can be transferred to the pediatric population.
Keywords
- Natural Language Processing
- Electronic Health Record
- Sentiment Analysis
- Unified Medical Language System
- Protected Health Information
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Acknowledgements
Dr. Savova’s work was supported in part by NIH grants U54LM008748 and 1U01HG006828. Drs. Deleger’s and Solti’s work was supported in part by NIH grant 5R00LM010227.
Copyright information
© 2012 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Savova, G.K., Deleger, L., Solti, I., Pestian, J., Dexheimer, J.W. (2012). Natural Language Processing: Applications in Pediatric Research. In: Hutton, J. (eds) Pediatric Biomedical Informatics. Translational Bioinformatics, vol 2. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5149-1_10
DOI: https://doi.org/10.1007/978-94-007-5149-1_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-5148-4
Online ISBN: 978-94-007-5149-1
eBook Packages: Biomedical and Life Sciences (R0)