Information Retrieval, Volume 10, Issue 2, pp 173–202

Knowledge-based query expansion to support scenario-specific retrieval of medical free text



In retrieving medical free text, users are often interested in answers pertinent to certain scenarios that correspond to common tasks performed in medical practice, e.g., treatment or diagnosis of a disease. A major challenge in handling such queries is that scenario terms in the query (e.g., treatment) are often too general to match specialized terms in relevant documents (e.g., chemotherapy). In this paper, we propose a knowledge-based query expansion method that exploits the UMLS knowledge source to append to the original query additional terms that are specifically relevant to the query's scenario(s). We compared the proposed method with traditional statistical expansion, which adds terms that are statistically correlated with the query but not necessarily scenario-specific. Our study on two standard testbeds shows that the knowledge-based method, by providing scenario-specific expansion, yields notable improvements over the statistical method in terms of average precision-recall. On the OHSUMED testbed, for example, the improvement is more than 5% averaged over all scenario-specific queries studied and about 10% for queries that mention certain scenarios, such as treatment of a disease and differential diagnosis of a symptom/disease.
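The idea of scenario-specific expansion can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary `TOY_KNOWLEDGE`, the tuple `SCENARIO_TERMS`, and the function `expand_query` are hypothetical names, and the hand-written entries merely stand in for relations that the proposed method would draw from the UMLS.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (concept, scenario) -> scenario-specific terms.
# These entries are illustrative only; they are not taken from the UMLS.
TOY_KNOWLEDGE: Dict[Tuple[str, str], List[str]] = {
    ("lung cancer", "treatment"): ["chemotherapy", "radiotherapy", "lobectomy"],
    ("lung cancer", "diagnosis"): ["bronchoscopy", "biopsy"],
}

# Scenario words recognized in a query (assumed, simplified list).
SCENARIO_TERMS = ("treatment", "diagnosis")


def expand_query(query: str) -> List[str]:
    """Append terms that are specific to the query's concept and scenario."""
    normalized = query.lower()
    terms = normalized.split()
    # Detect which scenarios the query mentions.
    scenarios = [s for s in SCENARIO_TERMS if s in terms]
    expansions: List[str] = []
    for (concept, scenario), related in TOY_KNOWLEDGE.items():
        # Expand only when both the concept and its scenario occur in the query.
        if scenario in scenarios and concept in normalized:
            for term in related:
                if term not in expansions:
                    expansions.append(term)
    return terms + expansions


if __name__ == "__main__":
    print(expand_query("lung cancer treatment"))
```

In this sketch a query about treatment receives only treatment-related terms, whereas a purely statistical expansion could also add correlated but scenario-irrelevant terms.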


Medical information retrieval; Scenario-specific information retrieval; Knowledge-based information retrieval; Query expansion; Knowledge-based systems



This research is supported in part by NIC/NIH Grant #4442511-33780. We are grateful to Dr. Andrew Chen and Wei Liu from the UCLA School of Medicine for providing their domain knowledge during the knowledge acquisition task. We thank Dr. Blaine Kristo and his colleagues for their efforts in evaluating the relevance of expansion terms. We also thank our colleagues in the UCLA Department of Radiological Sciences, Dr. Hooshang Kangarloo, Dr. Denise Aberle, and Dr. Suzie El-Saden, for stimulating discussions and insightful comments. Finally, we thank Nancy Wilczynski and Dr. Brian Haynes from the Health Information Research Unit (HIRU) at McMaster University for sharing their valuable dataset for our experimental verification of the effectiveness of the knowledge-based expansion methodology.



Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  1. Department of Computer Science, UCLA, Los Angeles, USA
