Chapter Overview
The literature relevant to many areas of biomedicine is so large that researchers cannot simply read everything on a topic; they must rely on some kind of search engine. While the Google PageRank algorithm works well for finding popular web sites, searching for information needed at the cutting edge of research calls for a different approach. Information that is key to solving a particular problem may have been looked at by few people in the past, yet it may be crucial to present progress. What has worked well to meet this need is ranking documents by their probable relevance to a piece of text describing the information need (a query). Here we describe a general model for how this is done and how the model has been realized in both the vector and language modeling approaches to document retrieval. This approach is quite broad and applies to much more than biomedicine. We also present three example document retrieval systems designed to take advantage of specific information resources in biomedicine in an attempt to improve on the general model. Current challenges and future prospects are also discussed.
Suggested Readings
van Rijsbergen, C. J. (1979). Information Retrieval, Second Edition, London: Butterworths.
Salton, G. (1989). Automatic Text Processing, New York: Addison-Wesley.
Sparck Jones, K. and Willett, P. (Eds.). (1997). Readings in Information Retrieval, San Francisco: Morgan Kaufmann.
Witten, I. H., Moffat, A., and Bell, T. C. (1999). Managing Gigabytes, Second Edition, San Francisco: Morgan Kaufmann.
Baeza-Yates, R. and Ribeiro-Neto, B. (1999). Modern Information Retrieval, New York: Addison-Wesley.
Belew, Richard K. (2000). Finding Out About, Cambridge: Cambridge University Press.
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Wilbur, W.J. (2005). Modeling Text Retrieval in Biomedicine. In: Chen, H., Fuller, S.S., Friedman, C., Hersh, W. (eds) Medical Informatics. Integrated Series in Information Systems, vol 8. Springer, Boston, MA. https://doi.org/10.1007/0-387-25739-X_10
DOI: https://doi.org/10.1007/0-387-25739-X_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24381-8
Online ISBN: 978-0-387-25739-6
eBook Packages: Medicine (R0)