Natural Language Processing for Understanding Contraceptive Use at the VA

  • Matthew Scotch
  • Cynthia Brandt
  • Sylvia Leung
  • Julie Womack
Part of the Annals of Information Systems book series (AOIS, volume 19)


Objective: To evaluate the potential of Natural Language Processing (NLP) for understanding contraceptive use among female Veterans seeking care at Veterans Administration (VA) healthcare facilities.

Design: Retrospective chart review of a subset of female Veterans enrolled in the Women Veterans Cohort Study (WVCS) who sought care at the VA Connecticut Healthcare facility (in West Haven, CT) in 2009 and completed a survey that included self-reported contraceptive use. In addition, only notes that had been annotated for contraceptive use in a prior study of 227 WVCS participants were selected.

Methods: A biomedical ontology of contraceptive terms and concepts was created that included both permanent methods (e.g. hysterectomy) and non-permanent methods (e.g. oral contraceptives). The new ontology, along with a section of the VA’s National Drug File, was used as the knowledge base for information extraction from the free-text medical records. Included were 208 annotated notes across 39 patients. The General Architecture for Text Engineering (GATE), an open-source application for the development of NLP pipelines, was used. The ontology was added to GATE along with a processing resource that was developed to create an ontology-aware information extraction plugin for the pipeline. In addition, prior resources developed for negation of concepts (e.g. "The patient denies using an emergency contraceptive") were utilized.
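
As a rough illustration of ontology-driven extraction with negation handling, the sketch below matches terms from a small lexicon and flags a mention as negated when a trigger word appears shortly before it. This is not the GATE plugin or the negation resources used in the study: the lexicon, the trigger list, the window size, and the function name `extract_concepts` are all assumptions made for this example.

```python
import re

# Hypothetical mini-lexicon standing in for the contraceptive ontology.
CONTRACEPTIVE_TERMS = {
    "emergency contraceptive": "non-permanent",
    "oral contraceptive": "non-permanent",
    "hysterectomy": "permanent",
}

# Hypothetical negation triggers; real negation resources are far richer.
NEGATION_TRIGGERS = ("denies", "no history of", "not using", "declined")

# Assumed number of tokens before a term in which a trigger negates it.
WINDOW = 6


def extract_concepts(sentence):
    """Return lexicon matches in a sentence, each flagged as negated or not."""
    text = sentence.lower()
    results = []
    for term, kind in CONTRACEPTIVE_TERMS.items():
        # Allow a simple plural "s" after the term.
        pattern = r"\b" + re.escape(term) + r"s?\b"
        for match in re.finditer(pattern, text):
            preceding = " ".join(text[:match.start()].split()[-WINDOW:])
            negated = any(
                re.search(r"\b" + re.escape(trigger) + r"\b", preceding)
                for trigger in NEGATION_TRIGGERS
            )
            results.append({"term": term, "type": kind, "negated": negated})
    return results


if __name__ == "__main__":
    print(extract_concepts("The patient denies using an emergency contraceptive"))
```

A fixed token window is a deliberately crude scoping rule; it is used here only to keep the example short.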

The NLP pipeline extracted contraceptives currently used by the patient, ones not currently used (prior use or recommended use by the clinician), and ones whose use was negated. A Boolean patient-by-concept matrix was produced as input to a decision tree classifier. Tenfold cross-validation created iterations of training and testing sets to estimate active versus inactive contraceptive use. Responses to self-reported contraceptive use on the prior survey were used as the gold standard.
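
The data preparation described above can be sketched as follows: build one Boolean row per patient over a fixed concept vocabulary, then partition the patients into ten folds for cross-validation. The patient identifiers, concept names, and helper names (`boolean_matrix`, `k_fold_splits`) are hypothetical, and the decision tree itself is left out; any classifier could be trained on the resulting rows.

```python
import random


def boolean_matrix(patient_concepts, vocabulary):
    """Map each patient to a Boolean row: True where a concept was extracted."""
    return {
        patient: [concept in concepts for concept in vocabulary]
        for patient, concepts in patient_concepts.items()
    }


def k_fold_splits(ids, k=10, seed=0):
    """Yield (train_ids, test_ids) pairs; each id is tested exactly once."""
    shuffled = sorted(ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, folds[i]


if __name__ == "__main__":
    # Hypothetical vocabulary and extraction output for two patients.
    vocabulary = ["oral_contraceptive", "iud", "hysterectomy"]
    extracted = {"P01": {"iud"}, "P02": {"oral_contraceptive", "hysterectomy"}}
    print(boolean_matrix(extracted, vocabulary))

    # 39 placeholder patient ids, matching the cohort size reported above.
    patients = ["P%02d" % n for n in range(1, 40)]
    for train, test in k_fold_splits(patients, k=10):
        print(len(train), len(test))
```

With 39 patients and ten folds, each test fold holds only three or four patients, so per-fold estimates are noisy and are usually pooled across folds.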

Results: The use of manual annotation, development of a biomedical ontology, and creation of a natural language processing pipeline achieved high precision (0.83) and recall (0.84). The weighted F-measure was 0.83.
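
For readers unfamiliar with weighted metrics, the sketch below computes per-class precision, recall, and F-measure and averages them by class support. It assumes support-weighted averaging, which is a common toolkit convention but is not confirmed by the text above. The labels in the example are toy data and do not reproduce the study's results.

```python
def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F-measure over all classes."""
    total = len(y_true)
    scores = {"precision": 0.0, "recall": 0.0, "f_measure": 0.0}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f_measure = 2 * precision * recall / denominator if denominator else 0.0
        weight = (tp + fn) / total  # share of gold-standard cases in this class
        scores["precision"] += weight * precision
        scores["recall"] += weight * recall
        scores["f_measure"] += weight * f_measure
    return scores


if __name__ == "__main__":
    # Toy gold-standard and predicted labels (hypothetical).
    gold = ["active", "active", "active", "inactive", "inactive"]
    predicted = ["active", "active", "inactive", "inactive", "active"]
    print(weighted_metrics(gold, predicted))
```

Because each class's F-measure is computed before averaging, the weighted F-measure need not equal the harmonic mean of the weighted precision and weighted recall.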

Conclusion: Our combined approach utilized annotation of concepts, a biomedical ontology of contraceptives, and a natural language processing pipeline for information extraction. Our results highlight the potential for biomedical informatics to support research of contraceptive use among female Veterans at the VA. Additional research needs to be done that evaluates the accuracy of contraceptive information in the VA’s Electronic Health Record (EHR) with the consideration of both free text and semi-structured data such as pharmacy records.


Keywords: Natural language processing; Contraceptive agents; Veterans; Medical informatics



This work was supported in part by a Veterans Affairs Health Services Research & Development (HSR&D) grant HIR 09-007 and is a translational use case project within the VA-funded Consortium for Healthcare Informatics Research (CHIR). In addition, this work is supported in part by VA grant DHI 07-065-1 to CB. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs. The authors would like to thank colleagues from the Tampa VA, specifically Dr. James McCart, Mr. Jay Jarman, and Dr. Stephen Luther for providing GATE plugins. The authors would also like to thank Ms. Harini Bathulapalli for her database work and Mr. Brett South for providing code to export Knowtator annotations. Finally, the authors would like to thank Dr. Jyotishman Pathak for his feedback on the project.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Matthew Scotch (1, 2)
  • Cynthia Brandt (1, 3)
  • Sylvia Leung (4)
  • Julie Womack (1)

  1. VA Connecticut Healthcare System, West Haven, USA
  2. Department of Biomedical Informatics, Arizona State University, Scottsdale, USA
  3. Yale Center for Medical Informatics, New Haven, USA
  4. VA Palo Alto Health Care System, Palo Alto, USA
