Journal of Intelligent Information Systems, Volume 34, Issue 3, pp 249–274

Passage extraction and result combination for genomics information retrieval

Article

Abstract

In this paper, we first propose algorithms for passage extraction to build indices for the purpose of generating more accurate passages as query answers. Second, we propose a basic result combination method and an improved result combination method to combine the retrieved results from different indices for the purpose of selecting and merging relevant passages as outputs. For passage extraction, three new algorithms are proposed, namely paragraphParsed, sentenceParsed and wordSentenceParsed. For result combination, a novel method is proposed, in which we use factor analysis to generate a better baseline result for combination by finding hidden common factors that can be used to estimate the importance of keywords and keyword associations. Finally, we report experimental results that confirm the effectiveness and superiority of the factor-analysis-based method for result combination. Our proposed approaches achieve excellent results on the TREC 2006 and 2007 Genomics data sets, suggesting a promising avenue for constructing high-performance information retrieval systems in biomedicine.

Keywords

Information retrieval, Passage extraction, Result combination, Linear regression, Factor analysis, Genomics

Notes

Acknowledgements

This research is supported in part by the research grant from the Natural Sciences & Engineering Research Council (NSERC) of Canada and the Early Researcher Award/Premier’s Research Excellence Award. We would like to thank Ming Zhong and Luo Si for their contributions at the early stage of this project. The authors are also grateful to the anonymous reviewers for their constructive comments, which have helped improve the quality of the paper.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Computer Science & Engineering, York University, Toronto, Canada
  2. School of Information Technology, York University, Toronto, Canada
