
Information Retrieval, Volume 14, Issue 4, pp 355–389

Extending the language modeling framework for sentence retrieval to include local context

  • Ronald T. Fernández
  • David E. Losada
  • Leif A. Azzopardi
Article

Abstract

Employing effective methods of sentence retrieval is essential for many tasks in Information Retrieval, such as summarization, novelty detection and question answering. The best performing sentence retrieval techniques attempt to perform matching directly between the sentences and the query. However, in this paper, we posit that the local context of a sentence can provide crucial additional evidence to further improve sentence retrieval. Using a Language Modeling Framework, we propose a novel reformulation of the sentence retrieval problem that extends previous approaches so that the local context is seamlessly incorporated within the retrieval models. In a series of comprehensive experiments, we show that localized smoothing and the prior importance of a sentence can improve retrieval effectiveness. The proposed models significantly and substantially outperform the state of the art and other competitive sentence retrieval baselines on recall-oriented measures, while remaining competitive on precision-oriented measures. This research demonstrates that local context plays an important role in estimating the relevance of a sentence, and that existing sentence retrieval language models can be extended to utilize this evidence effectively.
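To make the idea of localized smoothing concrete, the following is a minimal sketch of one way a query-likelihood score for a sentence could mix in evidence from its local context. It is not the formulation used in the article, which the abstract does not spell out. The function name `sentence_score`, the mixture weights `lam` and `beta`, the optional `log_prior` term, and the choice of a simple linear interpolation between sentence, context and collection models are all assumptions made for illustration.

```python
import math
from collections import Counter


def sentence_score(query, sentence, context, collection,
                   lam=0.6, beta=0.2, log_prior=0.0):
    """Log query likelihood of a sentence under a three-way mixture.

    Hypothetical illustration only: the term probability is a linear
    interpolation of the sentence model (weight lam), the local context
    model (weight beta) and the collection model (remaining weight).
    log_prior stands in for a sentence-importance prior.
    All text arguments are lists of tokens.
    """
    if lam < 0 or beta < 0 or lam + beta > 1:
        raise ValueError("lam and beta must be non-negative and sum to at most 1")
    gamma = 1.0 - lam - beta  # weight left for the collection model

    s_tf, c_tf, g_tf = Counter(sentence), Counter(context), Counter(collection)
    s_len, c_len, g_len = len(sentence), len(context), len(collection)

    score = log_prior
    for term in query:
        # Maximum-likelihood estimates; an empty text contributes 0.
        p_s = s_tf[term] / s_len if s_len else 0.0
        p_c = c_tf[term] / c_len if c_len else 0.0
        p_g = g_tf[term] / g_len if g_len else 0.0
        p = lam * p_s + beta * p_c + gamma * p_g
        if p == 0.0:
            return float("-inf")  # term unseen everywhere
        score += math.log(p)
    return score


# Toy data: the same sentence appears in two different contexts.
sentence = ["it", "was", "launched", "in", "1990"]
context_a = ["the", "hubble", "telescope", "orbits", "earth"] + sentence
context_b = ["the", "ship", "sailed"] + sentence
collection = context_a + context_b
query = ["hubble", "launched"]

score_a = sentence_score(query, sentence, context_a, collection)
score_b = sentence_score(query, sentence, context_b, collection)
```

In this toy setting the sentence never mentions "hubble", so a sentence-only model cannot tell the two occurrences apart. Giving the context model a non-zero weight lets the occurrence that follows a mention of the telescope rank higher, which is the kind of additional evidence the abstract argues for.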

Keywords

Information retrieval, Sentence retrieval, Language models, Context

Notes

Acknowledgments

We thank the anonymous reviewers for their useful comments and suggestions, which have been incorporated into this article. This research was funded by Ministerio de Ciencia e Innovación under project TIN2010-18552-C03-03, and by Xunta de Galicia under projects 07SIN005206PR and 2008/068 (co-funded by FEDER).


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Ronald T. Fernández (1), corresponding author
  • David E. Losada (1)
  • Leif A. Azzopardi (2)
  1. Grupo de Sistemas Inteligentes, Universidad de Santiago de Compostela, Santiago de Compostela, Spain
  2. Department of Computing Science, University of Glasgow, Glasgow, UK
