Window-Based Method for Information Retrieval
In this paper, a series of window-based methods is proposed for information retrieval. Compared with the traditional tf-idf model, our approaches are based on two new key notions. The first is that the closer together the query words appear in a document, the larger the similarity value between the query and the document. The second is that some query words, such as named entities and base noun phrases (baseNP), which we call "Core Words", are much more important than other words and should receive special weights. We implement these notions in three models: the Simple Window-based Model, the Dynamic Window-based Model and the Core Window-based Model. Our models compute similarities between queries and documents based on the importance and distribution of query words in the documents. TREC data are used to test the algorithms. The experiments indicate that our window-based methods outperform most traditional methods, such as tf-idf and Okapi BM25, and that the Core Window-based Model is the best and most robust model across various queries.
Keywords: Information Retrieval, Window-based Method, Named Entity
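The two notions in the abstract (proximity of query words, and extra weight for Core Words) can be illustrated with a minimal sketch. This is not the paper's Simple, Dynamic or Core Window-based Model, whose formulas are not given here. The function names `min_window_span` and `window_score`, the `core_weight` parameter and the scoring formula (summed term weight times the fraction of the smallest covering window occupied by query terms) are all assumptions made for illustration.

```python
from collections import Counter


def min_window_span(doc_tokens, query_terms):
    """Return (span, present): the length of the smallest window of
    doc_tokens that contains every query term occurring in the document,
    and the set of those terms. span is None if no query term occurs."""
    present = set(query_terms) & set(doc_tokens)
    if not present:
        return None, present
    counts = Counter()
    covered = 0              # number of distinct present terms inside the window
    best = len(doc_tokens)
    left = 0
    for right, tok in enumerate(doc_tokens):
        if tok in present:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers all present terms.
        while covered == len(present):
            best = min(best, right - left + 1)
            left_tok = doc_tokens[left]
            if left_tok in present:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return best, present


def window_score(doc_tokens, query_terms, core_words=frozenset(), core_weight=2.0):
    """Illustrative score (an assumption, not the paper's formula):
    matched terms are weighted (core words count core_weight, others 1.0)
    and the total is scaled by how densely they fill the smallest window."""
    span, present = min_window_span(doc_tokens, query_terms)
    if span is None:
        return 0.0
    weight = sum(core_weight if t in core_words else 1.0 for t in present)
    density = len(present) / span   # equals 1.0 when the terms are adjacent
    return weight * density
```

Under these assumptions, a document in which the query words sit close together scores higher than one in which they are far apart, and marking a matched term as a core word raises the score further. Calling `window_score(tokens, ["window", "retrieval"], core_words={"retrieval"})` on a tokenized document is one way to try it.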