Window-Based Method for Information Retrieval

  • Qianli Jin
  • Jun Zhao
  • Bo Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)

Abstract

In this paper, a series of window-based methods is proposed for information retrieval. Compared with the traditional tf-idf model, our approaches rest on two key notions. First, the closer together the query words occur in a document, the larger the similarity value between the query and the document. Second, some query words, such as named entities and base noun phrases (baseNPs), which we call “Core Words”, are much more important than other words and should receive special weights. We implement these notions in three models: the Simple Window-based Model, the Dynamic Window-based Model, and the Core Window-based Model. These models compute the similarity between a query and a document from the importance and distribution of the query words in the document. The algorithms are tested on TREC data. The experiments indicate that our window-based methods outperform most traditional methods, such as tf-idf and Okapi BM25, and that the Core Window-based Model is the best and most robust model across various queries.
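
The two notions above can be illustrated with a minimal sketch. It is not the scoring formula of any of the three models, which this abstract does not specify. It assumes a fixed window size, a score equal to the summed weights of the distinct query words inside the best window, and a constant boost for Core Words; the names `make_weights`, `window_score`, `window_size`, and `core_boost` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def make_weights(query_terms: Iterable[str], core_words: Set[str],
                 core_boost: float = 2.0) -> Dict[str, float]:
    """Assign each query term a weight; Core Words get an assumed constant boost."""
    return {t: (core_boost if t in core_words else 1.0) for t in query_terms}


def window_score(doc_tokens: List[str], query_weights: Dict[str, float],
                 window_size: int = 5) -> float:
    """Slide a fixed-size window over the document and return the best window score.

    A window's score is the sum of the weights of the distinct query terms it
    contains, so query words that occur close together score higher than the
    same words scattered across the document.
    """
    best = 0.0
    n = len(doc_tokens)
    # Always examine at least one window, even for very short documents.
    for start in range(max(1, n - window_size + 1)):
        window = set(doc_tokens[start:start + window_size])
        score = sum(w for term, w in query_weights.items() if term in window)
        best = max(best, score)
    return best


weights = make_weights(["window", "retrieval", "model"], core_words={"retrieval"})
close = "the window retrieval model works".split()
far = "window a b c d retrieval e f g h model".split()
print(window_score(close, weights, window_size=3))
print(window_score(far, weights, window_size=3))
```

In this toy setup the document with adjacent query words scores higher than the one where the same words are spread out, and the boosted term dominates when only one query word fits in a window.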

Keywords

Information Retrieval, Window-based Method, Named Entity

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Qianli Jin (1)
  • Jun Zhao (1)
  • Bo Xu (1)
  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
