A Document Retrieval Model Based on Term Frequency Ranks

  • Conference paper
SIGIR ’94

Abstract

This paper introduces a new full-text document retrieval model that is based on comparing occurrence frequency rank numbers of terms in queries and documents.

More precisely, to compute the similarity between a query and a document, this new model first ranks the terms in the query and in the document by decreasing occurrence frequency. Next, for each term, it computes a local similarity between the query and the document by calculating a weighted difference between the term’s rank number in the query and its rank number in the document. Finally, it combines all those local similarities into one global similarity between the query and the document.
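
The three steps described above can be illustrated with a minimal sketch. The abstract does not give the actual weighting or the way local similarities are combined, so everything specific here is an assumption made for illustration: the function names, the tie handling (terms with equal frequency share a rank), the local formula (a 1/rank weight divided by one plus the absolute rank difference), and the use of a plain sum as the global similarity. It should not be read as the paper's definition.

```python
from collections import Counter


def frequency_ranks(terms):
    """Map each term to its rank number (1 = most frequent).

    Assumption: terms with the same occurrence frequency share a rank.
    """
    counts = Counter(terms)
    # Distinct frequencies, in decreasing order, define the rank numbers.
    distinct = sorted(set(counts.values()), reverse=True)
    rank_of_count = {count: i + 1 for i, count in enumerate(distinct)}
    return {term: rank_of_count[count] for term, count in counts.items()}


def local_similarity(query_rank, doc_rank):
    """Hypothetical local similarity for one term.

    Assumption: the weight 1/query_rank favours frequent query terms, and
    the score shrinks as the two rank numbers move apart.
    """
    weight = 1.0 / query_rank
    return weight / (1.0 + abs(query_rank - doc_rank))


def global_similarity(query_terms, doc_terms):
    """Hypothetical global similarity: sum of local similarities
    over the terms that occur in both the query and the document."""
    query_ranks = frequency_ranks(query_terms)
    doc_ranks = frequency_ranks(doc_terms)
    shared = query_ranks.keys() & doc_ranks.keys()
    return sum(local_similarity(query_ranks[t], doc_ranks[t]) for t in shared)


query = ["rank", "rank", "term"]
document = ["rank", "rank", "rank", "term", "model"]
print(global_similarity(query, document))
```

In this toy example both shared terms hold the same rank number in the query and in the document, so each contributes its full weight. A document that uses the same terms with a different frequency ordering would score lower under these assumed formulas.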

In this paper we also demonstrate that the effectiveness of this new full-text document retrieval model is comparable with that of the standard vector-space retrieval model.

On temporary leave from Philips Research Laboratories, Eindhoven, The Netherlands.

Copyright information

© 1994 Springer-Verlag London Limited

About this paper

Cite this paper

Aalbersberg, I.J. (1994). A Document Retrieval Model Based on Term Frequency Ranks. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_17

  • DOI: https://doi.org/10.1007/978-1-4471-2099-5_17

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-19889-5

  • Online ISBN: 978-1-4471-2099-5

  • eBook Packages: Springer Book Archive
