Information Retrieval, Volume 12, Issue 1, pp 51–68

An empirical study of gene synonym query expansion in biomedical information retrieval

Article

Abstract

Because gene synonyms are used heavily in biomedical text, researchers have tried many synonym-based query expansion techniques to improve performance in biomedical information retrieval. However, mixed results have been reported. The main challenge is that it is not trivial to assign appropriate weights to the gene synonyms added to the expanded query: underweighting synonyms brings little benefit, while overweighting unreliable synonyms can hurt performance significantly. So far, there has been no systematic evaluation of synonym query expansion strategies for biomedical text. In this work, we propose two strategies that extend a standard language modeling approach to gene synonym query expansion, and we conduct a systematic evaluation of these methods on all the available TREC biomedical text collections for ad hoc document retrieval. Our experimental results show that synonym expansion can significantly improve retrieval accuracy. However, different query types require different synonym expansion methods, and appropriate weighting of gene names and synonym terms is critical for improving performance.
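
The weighting trade-off described in the abstract can be illustrated with a minimal sketch. This is not one of the paper's two strategies, which the abstract does not specify; it is a generic interpolation of an original query model with a synonym model, scored by Dirichlet-smoothed query likelihood. The function names, the `synonym_weight` and `mu` parameters, and the gene names `genex` and `gx1` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def expanded_query_model(query_terms, synonyms, synonym_weight):
    """Mix a uniform model over the original terms with a uniform model
    over the synonyms. synonym_weight is the total probability mass
    given to the synonyms (an assumed, hand-set parameter)."""
    if not 0.0 <= synonym_weight <= 1.0:
        raise ValueError("synonym_weight must lie in [0, 1]")
    if not synonyms:
        synonym_weight = 0.0  # nothing to expand with
    model = Counter()
    for term in query_terms:
        model[term] += (1.0 - synonym_weight) / len(query_terms)
    for syn in synonyms:
        model[syn] += synonym_weight / len(synonyms)
    return dict(model)


def score(query_model, doc_tokens, collection_counts, collection_len, mu=10.0):
    """Rank score: sum over query terms of p(w|Q) * log p(w|D),
    with p(w|D) smoothed by a Dirichlet prior of strength mu."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for term, q_prob in query_model.items():
        p_coll = collection_counts.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection; skip it
        p_doc = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        total += q_prob * math.log(p_doc)
    return total


# Toy collection; "genex" and its alias "gx1" are hypothetical gene names.
docs = {
    "d_name": "genex mutation in breast cancer".split(),
    "d_alias": "gx1 mutation in breast cancer".split(),
    "d_other": "protein folding in yeast cells".split(),
}
collection_counts = Counter(tok for toks in docs.values() for tok in toks)
collection_len = sum(collection_counts.values())

for weight in (0.0, 0.3, 0.9):
    q_model = expanded_query_model(["genex"], ["gx1"], weight)
    ranking = sorted(
        docs,
        key=lambda d: score(q_model, docs[d], collection_counts, collection_len),
        reverse=True,
    )
    print(weight, ranking)
```

In this toy setting, a weight of zero leaves the alias-only document indistinguishable from the unrelated one, a moderate weight lifts it above the unrelated document while keeping the document with the official name on top, and a very large weight lets the alias dominate the original gene name.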

Keywords

Biomedical information retrieval; Synonym query expansion; Language modeling

Notes

Acknowledgments

This material is based in part upon work supported by the National Science Foundation under award number 0425852 and work supported by NIH/NLM grant 1 R01 LM009153-01.

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
  2. Electrical and Computer Engineering, University of Delaware, Newark, USA
