Applying Lemur Query Expansion Techniques in Biomedical Information Retrieval

  • A. R. Rivas
  • L. Borrajo
  • E. L. Iglesias
  • R. Romero
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 151)


The increase in the amount of available biomedical information has resulted in a higher demand on biomedical information retrieval systems. However, traditional information retrieval systems do not achieve the desired performance in this area. Query expansion techniques have improved the effectiveness of ranked retrieval by automatically adding terms to a query. In this work we test several automatic query expansion techniques using the Lemur Language Modelling Toolkit, with the objective of evaluating how they perform when applied to biomedical information retrieval. In the first step of the retrieval process, indexing, we compare several stemming techniques and stopword lists. In the second step, matching, we compare the well-known Okapi and TF-IDF BM25 weighting algorithms. The best results are obtained with the combination of the Krovetz stemmer, the SMART stopword list and TF-IDF. Moreover, we analyze document retrieval based on the Abstract, Title and MeSH fields, and conclude that searching these fields together seems more effective than searching each of them individually. We also show that using feedback in document retrieval improves retrieval results. The corpus used in the experiments was extracted from the Cystic Fibrosis (CF) biomedical text corpus.
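The matching and feedback steps summarised above can be illustrated with a small, self-contained sketch. It is not Lemur's implementation: it scores documents with a textbook Okapi BM25 formula (using the non-negative IDF variant log(1 + (N - df + 0.5)/(df + 0.5)) and the common defaults k1 = 1.2 and b = 0.75, both assumptions here) and expands the query by blind (pseudo-) relevance feedback, treating the top-ranked documents as relevant and adding their most frequent new terms with a reduced weight, in a simplified Rocchio-like way. The stopword set, the toy documents, and the names `BM25Index`, `expand_query`, `fb_docs`, `fb_terms` and `beta` are hypothetical, and no stemming (Krovetz or otherwise) is applied.

```python
import math
from collections import Counter

# Hypothetical toy stopword set standing in for a real list such as SMART.
STOPWORDS = {"a", "and", "for", "in", "is", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase, split on whitespace and drop stopwords (no stemming here)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


class BM25Index:
    """Minimal in-memory Okapi BM25 index (a sketch, not Lemur's code)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = [Counter(tokenize(d)) for d in docs]    # term frequencies per doc
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        self.n = len(self.docs)
        self.df = Counter(t for d in self.docs for t in d)  # document frequency
        self.k1, self.b = k1, b

    def idf(self, term):
        # Non-negative BM25 IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query_weights, i):
        doc, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for term, qw in query_weights.items():
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            # Saturating term frequency with document-length normalisation.
            tf_part = tf * (self.k1 + 1) / (
                tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl))
            total += qw * self.idf(term) * tf_part
        return total

    def search(self, query_weights, k=10):
        """Return indices of the top-k documents with a positive score."""
        scored = [(self.score(query_weights, i), i) for i in range(self.n)]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda x: (-x[0], x[1]))
        return [i for _, i in scored[:k]]


def expand_query(index, query, fb_docs=2, fb_terms=3, beta=0.5):
    """Blind relevance feedback: assume the top fb_docs are relevant and add the
    fb_terms new terms with the highest mean frequency in them, scaled by beta."""
    weights = Counter(tokenize(query))
    top = index.search(weights, k=fb_docs)
    centroid = Counter()
    for i in top:
        for term, tf in index.docs[i].items():
            centroid[term] += tf / len(top)
    new_terms = [t for t, _ in centroid.most_common() if t not in weights][:fb_terms]
    expanded = dict(weights)
    for t in new_terms:
        expanded[t] = beta * centroid[t]
    return expanded


# Hypothetical toy collection (not taken from the CF corpus).
DOCS = [
    "cystic fibrosis lung infection pseudomonas",
    "cystic fibrosis sweat chloride test diagnosis",
    "pseudomonas aeruginosa lung infection treatment",
    "diabetes insulin therapy",
]
index = BM25Index(DOCS)
query = expand_query(index, "fibrosis infection")
print(query)
print(index.search(query, k=3))
```

With this toy collection, the expansion adds terms such as `lung` and `pseudomonas` taken from the two feedback documents. Replacing the BM25 term weight in `score` with a plain TF-IDF weight would be one way to reproduce, in miniature, the Okapi versus TF-IDF comparison the abstract describes.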


Keywords: Query expansion, Biomedical information retrieval, Lemur, MEDLINE





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • A. R. Rivas (1)
  • L. Borrajo (1)
  • E. L. Iglesias (1)
  • R. Romero (1)

  1. Computer Science Dept., Univ. of Vigo, Vigo, Spain
