BioMedical Information Retrieval: The BioTracer Approach
With the large amount of biomedical information available today, a good search tool is vital. Such a tool should not only retrieve the sought information but also filter out irrelevant documents and rank the relevant ones highest. Focusing on biomedical information, the main goal of this work has been to investigate how to improve a system's ability to find and rank relevant documents. To achieve this, we apply a series of information retrieval (IR) techniques to biomedical search and combine them in an optimal manner. These techniques include extending and using well-established IR similarity models, such as the Vector Space Model (VSM) and BM25, together with their underlying scoring schemes, and allowing users to affect the ranking according to their own view of relevance. The techniques have been implemented and tested in a proof-of-concept prototype called BioTracer, which extends a Java-based open source search engine library. The results from our experiments using the TREC 2004 Genomics Track collection seem promising. Our investigation has also revealed that involving the user in the search has positive effects on the ranking of search results, and that the approaches used in BioTracer can be used to meet the user's information needs.
Keywords: Biomedical information retrieval evaluation
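To make the BM25 similarity model named in the abstract concrete, the following is a minimal sketch of standard Okapi BM25 scoring over a toy collection. It is not BioTracer's implementation and does not include the extensions or user-driven weighting the paper describes. The function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the three example documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    Hypothetical helper: plain Okapi BM25 with a smoothed IDF,
    idf = ln(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy collection (illustrative only, not taken from the TREC collection).
docs = [
    "brca1 mutation breast cancer risk".split(),
    "protein folding in yeast".split(),
    "breast cancer screening guidelines".split(),
]
scores = bm25_scores(["brca1", "cancer"], docs)
# Document indices ordered from highest to lowest score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this sketch, a document that matches both query terms outranks one that matches only the more common term, and a document with no matching terms scores zero. A user-influenced ranking of the kind the abstract mentions could, under the same assumptions, be approximated by multiplying per-term or per-field contributions by user-supplied weights.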