Abstract
We have applied the well-known Robertson-Sparck Jones weighting to sets of indexing features that are not word-based. Our features describe the co-occurrence of words within a window of predefined size. The experiments were designed to analyse the value of features that go beyond word-based features, while all retrieval methods used can be motivated strictly within the probabilistic framework. Among the several implications of our experiments for weighted retrieval is the surprising result that features describing the co-occurrence of words in sentence-size or paragraph-size windows are significantly better descriptors than purely word-based indexing features.
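The idea summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact feature definition, so the sketch assumes that a feature is an unordered pair of distinct words falling inside a window of a fixed number of tokens (the paper speaks of sentence-size and paragraph-size windows). The function names `cooccurrence_features`, `rsj_weight` and `score` are hypothetical. The weight is the standard Robertson-Sparck Jones relevance weight with the customary 0.5 smoothing.

```python
import math


def cooccurrence_features(tokens, window):
    """Return unordered pairs of distinct words that fit inside a window
    of `window` consecutive tokens (assumed feature definition)."""
    features = set()
    for i, left in enumerate(tokens):
        # Tokens i+1 .. i+window-1 share a window with token i.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                features.add(tuple(sorted((left, right))))
    return features


def rsj_weight(r, n, R, N):
    """Robertson-Sparck Jones weight with 0.5 smoothing.

    r: relevant documents containing the feature
    n: documents containing the feature
    R: relevant documents
    N: documents in the collection
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


def score(query_features, doc_features, weights):
    """Sum the weights of the query features that occur in the document."""
    return sum(weights.get(f, 0.0) for f in query_features & doc_features)


# Toy usage: a pair feature found in 2 of 10 documents, no relevance data.
doc = "probabilistic retrieval with word co-occurrence features".split()
doc_features = cooccurrence_features(doc, window=3)
weights = {("probabilistic", "retrieval"): rsj_weight(r=0, n=2, R=0, N=10)}
print(score({("probabilistic", "retrieval")}, doc_features, weights))
```

Because the weighting only needs document counts per feature, the same formula applies unchanged whether the features are single words or word pairs; only the feature extraction step differs.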
Cite this article
Mittendorf, E., Mateev, B. & Schäuble, P. Using the Co-occurrence of Words for Retrieval Weighting. Information Retrieval 3, 243–251 (2000). https://doi.org/10.1023/A:1026520926673
DOI: https://doi.org/10.1023/A:1026520926673