Utilizing Passage-Based Language Models for Document Retrieval

  • Michael Bendersky
  • Oren Kurland
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)

Abstract

We show that several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. We use language models to instantiate specific algorithms, and propose a passage language model that integrates information from the ambient document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we propose yield passage language models that are more effective than the standard passage model for basic document retrieval and for constructing and utilizing passage-based relevance models; the latter outperform a document-based relevance model. We also show that the homogeneity measures are effective means for integrating document-query and passage-query similarity information for document retrieval.
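
As an illustration of the idea in the abstract, the following is a minimal sketch of a passage language model that borrows probability mass from its ambient document in proportion to an estimated homogeneity value. It is not the authors' formulation: the linear interpolation, the entropy-based homogeneity estimate, and the names `mle`, `entropy_homogeneity`, and `passage_model` are all assumptions made for this example, and corpus-level smoothing is omitted.

```python
from collections import Counter
import math


def mle(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def entropy_homogeneity(doc_tokens):
    """Hypothetical homogeneity estimate in [0, 1].

    Assumption for this sketch: a document whose term distribution has low
    entropy is treated as homogeneous. The entropy is normalized by its
    maximum possible value, log(number of distinct terms).
    """
    model = mle(doc_tokens)
    if len(model) < 2:
        return 1.0  # a single distinct term: maximally homogeneous
    entropy = -sum(p * math.log(p) for p in model.values())
    return 1.0 - entropy / math.log(len(model))


def passage_model(passage_tokens, doc_tokens, homogeneity):
    """Interpolate passage and document models (assumed linear mixture).

    homogeneity = 0 yields the plain passage model;
    homogeneity = 1 yields the plain document model.
    """
    p_psg = mle(passage_tokens)
    p_doc = mle(doc_tokens)
    vocab = set(p_psg) | set(p_doc)
    return {
        term: (1.0 - homogeneity) * p_psg.get(term, 0.0)
        + homogeneity * p_doc.get(term, 0.0)
        for term in vocab
    }


doc = "a a b c".split()
passage = "a a".split()
model = passage_model(passage, doc, 0.5)
```

With these toy inputs, the passage assigns all of its mass to `a`, while the mixture also gives `b` and `c` some probability taken from the document; the more homogeneous the document is estimated to be, the more the passage model leans on it.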

Keywords

passage-based document retrieval; document homogeneity; passage language model; passage-based relevance model

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Michael Bendersky (1)
  • Oren Kurland (2)
  1. Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst
  2. Faculty of Industrial Eng. & Mgmt., Technion, Israel
