
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3761)


Abstract

Searching specialized collections, such as biomedical literature, typically requires intimate knowledge of a specialized terminology. Hence, it can be a disappointing experience: not knowing the right terms to use and being unaware of synonyms or variations in terminology might result in low recall scores. We study the role of a thesaurus in the biomedical information retrieval process. We start by giving a description of vocabulary mismatch problems between natural language queries and relevant documents in biomedical literature search; we provide a detailed case study and observe the impact of vocabulary mismatch problems on retrieval effectiveness. Additionally, we analyze the associated MeSH thesaurus terms used to index the documents in the collection. Based on our observations, we propose a method for exploiting the MeSH thesaurus to improve retrieval effectiveness and, more specifically, to increase recall. We carry out a series of thesaurus-based retrieval experiments that show substantial performance improvements. We conclude with a detailed analysis of the retrieval results.
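To make the vocabulary mismatch problem concrete, the following is a minimal sketch of thesaurus-based query expansion and its effect on recall. It is not the method proposed in the paper, which the abstract does not specify. The thesaurus entries, documents, relevance judgments, and function names (`expand_query`, `retrieve`, `recall`) are all hypothetical and chosen only for illustration; the entries are not taken from MeSH.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy thesaurus: preferred term -> synonymous entry terms.
# Illustrative only; these entries are not copied from MeSH.
THESAURUS: Dict[str, Set[str]] = {
    "neoplasms": {"cancer", "tumor", "tumour"},
    "myocardial infarction": {"heart attack"},
}

# Hypothetical document collection and relevance judgments for one topic.
DOCS: Dict[str, str] = {
    "d1": "Treatment options for lung cancer patients",
    "d2": "Neoplasms of the lung: a review",
    "d3": "Tumour growth in murine models",
    "d4": "Protein folding dynamics",
}
RELEVANT: Set[str] = {"d1", "d2", "d3"}


def expand_query(terms: Iterable[str], thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Add every term from any synonym group that the query touches."""
    expanded = {t.lower() for t in terms}
    for preferred, synonyms in thesaurus.items():
        group = {preferred} | synonyms
        if expanded & group:
            expanded |= group
    return expanded


def retrieve(query_terms: Iterable[str], docs: Dict[str, str]) -> Set[str]:
    """Boolean OR retrieval: a document matches if it contains any query term."""
    terms = list(query_terms)
    return {doc_id for doc_id, text in docs.items()
            if any(t in text.lower() for t in terms)}


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


plain_hits = retrieve({"cancer"}, DOCS)
expanded_hits = retrieve(expand_query(["cancer"], THESAURUS), DOCS)

print(sorted(plain_hits), recall(plain_hits, RELEVANT))
print(sorted(expanded_hits), recall(expanded_hits, RELEVANT))
```

In this toy setting the unexpanded query "cancer" matches only the document that uses that exact word, while the expanded query also reaches the documents that say "neoplasms" or "tumour". Substring matching and unweighted expansion are simplifications; a real system would tokenize, weight the added terms, and guard against the precision loss that expansion can cause.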




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

IJzereef, L., Kamps, J., de Rijke, M. (2005). Biomedical Retrieval: How Can a Thesaurus Help?. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE. OTM 2005. Lecture Notes in Computer Science, vol 3761. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575801_31


  • DOI: https://doi.org/10.1007/11575801_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29738-3

  • Online ISBN: 978-3-540-32120-0

  • eBook Packages: Computer Science, Computer Science (R0)
