
Information Retrieval, Volume 3, Issue 3, pp 243–251

Using the Co-occurrence of Words for Retrieval Weighting

  • Elke Mittendorf
  • Bojidar Mateev
  • Peter Schäuble

Abstract

We have applied the well-known Robertson-Sparck Jones weighting to sets of indexing features that differ from word-based features. Our features describe the co-occurrence of words within a window of predefined size. The experiments were designed to analyse the value of features that go beyond word-based features, while all retrieval methods used can be motivated strictly within the probabilistic framework. Among the several implications of our experiments for weighted retrieval is the surprising result that features describing the co-occurrence of words in sentence-size or paragraph-size windows are significantly better descriptors than purely word-based indexing features.
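As a rough illustration of the two ingredients named in the abstract, the following minimal sketch extracts word-pair features from a sliding window and computes a Robertson-Sparck Jones relevance weight for one feature. It is not the authors' implementation: the function names, the unordered-pair representation, the token-count window definition, and the 0.5 smoothing correction (the commonly used form of the weight) are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math


def cooccurrence_features(tokens, window):
    """Return unordered word pairs whose positions are fewer than `window` tokens apart.

    Assumption: the window is measured in tokens; the paper's exact
    window definition is not given in the abstract.
    """
    features = set()
    for i, left in enumerate(tokens):
        # Pair the word at position i with the next (window - 1) words.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                features.add(tuple(sorted((left, right))))
    return features


def rsj_weight(r, R, n, N):
    """Robertson-Sparck Jones weight with the usual 0.5 correction.

    r: relevant documents containing the feature
    R: relevant documents in total
    n: documents containing the feature
    N: documents in the collection
    """
    relevant_odds = (r + 0.5) / (R - r + 0.5)
    non_relevant_odds = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(relevant_odds / non_relevant_odds)


if __name__ == "__main__":
    tokens = "probabilistic term weighting for retrieval".split()
    print(sorted(cooccurrence_features(tokens, window=2)))
    print(rsj_weight(r=8, R=10, n=20, N=1000))
```

In this sketch a feature is simply a pair of words, so the same weighting function can be applied to single words or to pairs; only the counts `r` and `n` change with the feature set.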

probabilistic term weighting, word co-occurrences, term phrase weighting, retrieval, routing



Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Elke Mittendorf (1)
  • Bojidar Mateev (2)
  • Peter Schäuble (3)
  1. Zürich, Switzerland
  2. Eurospider Information Technology AG, Zürich, Switzerland
  3. Eurospider Information Technology AG, Zürich, Switzerland
