Modeling Text Retrieval in Biomedicine

  • Chapter
Medical Informatics

Part of the book series: Integrated Series in Information Systems (ISIS, volume 8)

Chapter Overview

Given the amount of literature relevant to many of the areas of biomedicine, researchers are forced to use methods other than simply reading all the literature on a topic. Necessarily one must fall back on some kind of search engine. While the Google PageRank algorithm works well for finding popular web sites, it seems clear one must take a different approach in searching for information needed at the cutting edge of research. Information which is key to solving a particular problem may never have been looked at by many people in the past, yet it may be crucial to present progress. What has worked well to meet this need is to rank documents by their probable relevance to a piece of text describing the information need (a query). Here we will describe a general model for how this is done and how this model has been realized in both the vector and language modeling approaches to document retrieval. This approach is quite broad and applicable to much more than biomedicine. We will also present three example document retrieval systems that are designed to take advantage of specific information resources in biomedicine in an attempt to improve on the general model. Current challenges and future prospects are also discussed.
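The idea of ranking documents by their probable relevance to a query can be illustrated with a minimal sketch of the two families of methods named above. Everything in it is an assumption made for illustration and is not taken from the chapter: the whitespace tokenizer, the tf × log(N/df) weighting with cosine similarity for the vector approach, the query-likelihood score with Jelinek-Mercer smoothing (mixing weight `lam`) for the language modeling approach, and the function names themselves.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real systems do much more.
    return text.lower().split()


def cosine(u, v):
    # Cosine similarity between two sparse {term: weight} vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_vector(query, docs):
    # Vector approach (illustrative): tf * log(N/df) weights, cosine ranking.
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    df = Counter(t for c in counts for t in c)
    idf = {t: math.log(n_docs / df[t]) for t in df}
    doc_vecs = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [cosine(q_vec, v) for v in doc_vecs]
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def rank_language_model(query, docs, lam=0.5):
    # Language modeling approach (illustrative): log query likelihood under a
    # document model smoothed with the collection model (Jelinek-Mercer).
    counts = [Counter(tokenize(d)) for d in docs]
    collection = Counter()
    for c in counts:
        collection.update(c)
    total = sum(collection.values())
    scores = []
    for c in counts:
        doc_len = sum(c.values())
        score = 0.0
        for t in tokenize(query):
            p_coll = collection[t] / total
            if p_coll == 0.0:
                continue  # term never seen in the collection: ignored here
            p_doc = c[t] / doc_len if doc_len else 0.0
            score += math.log(lam * p_doc + (1.0 - lam) * p_coll)
        scores.append(score)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "gene expression in yeast cells",
        "protein folding and protein structure",
        "clinical trial of aspirin dosage",
    ]
    print(rank_vector("protein structure", docs))
    print(rank_language_model("protein structure", docs))
```

Both functions return document indices ordered from most to least relevant under their respective score. Neither uses link popularity, which is the contrast with PageRank drawn above.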



Suggested Readings

  • van Rijsbergen, C. J. (1979). Information Retrieval, Second Edition, London: Butterworths.

  • Salton, G. (1989). Automatic Text Processing, New York: Addison-Wesley.

  • Sparck Jones, K. and Willett, P. (Eds.). (1997). Readings in Information Retrieval, San Francisco: Morgan Kaufmann.

  • Witten, I. H., Moffat, A., and Bell, T. C. (1999). Managing Gigabytes, Second Edition, San Francisco: Morgan Kaufmann.

  • Baeza-Yates, R. and Ribeiro-Neto, B. (1999). Modern Information Retrieval, New York: Addison-Wesley.

  • Belew, R. K. (2000). Finding Out About, Cambridge: Cambridge University Press.


Copyright information

© 2005 Springer Science+Business Media, Inc.

About this chapter

Cite this chapter

Wilbur, W.J. (2005). Modeling Text Retrieval in Biomedicine. In: Chen, H., Fuller, S.S., Friedman, C., Hersh, W. (eds) Medical Informatics. Integrated Series in Information Systems, vol 8. Springer, Boston, MA. https://doi.org/10.1007/0-387-25739-X_10


  • DOI: https://doi.org/10.1007/0-387-25739-X_10

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-24381-8

  • Online ISBN: 978-0-387-25739-6

  • eBook Packages: Medicine, Medicine (R0)
