Syntactic-Semantic Frames for Clinical Cohort Identification Queries

  • Dina Demner-Fushman
  • Swapna Abhyankar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7348)


Large sets of electronic health record data are increasingly used in retrospective clinical studies and comparative effectiveness research. The desired patient cohort characteristics for such studies are best expressed as free-text descriptions. We present a syntactic-semantic approach to structuring these descriptions as frames. We developed the approach on 60 training topics (descriptions) and evaluated it on 35 test topics provided within the 2011 TREC Medical Record evaluation. We evaluated the accuracy of the frames as well as the modifications needed to achieve near-perfect precision in identifying the top 10 eligible patients. Our automatic approach accurately captured 34 test descriptions; 25 automatic frames needed no modifications for finding eligible patients. Further evaluations of the overall average retrieval effectiveness showed that frames are not needed for simple descriptions containing one or two key terms. However, our training results suggest that frames are needed for more complex real-life cohort selection tasks.


Test Question; Clinical Cohort; Electronic Health Record Data; Information Retrieval Method; Query Frame
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Dina Demner-Fushman (1)
  • Swapna Abhyankar (1)
  1. National Library of Medicine, Bethesda, USA
