A Document Retrieval Model Based on Term Frequency Ranks

Aalbersberg, IJsbrand Jan

doi:10.1007/978-1-4471-2099-5_17

IJsbrand Jan Aalbersberg³


Abstract

This paper introduces a new full-text document retrieval model that is based on comparing occurrence frequency rank numbers of terms in queries and documents.

More precisely, to compute the similarity between a query and a document, this new model first ranks the terms in the query and in the document by decreasing occurrence frequency. Next, for each term, it computes a local similarity between the query and the document by calculating a weighted difference between the term’s rank number in the query and its rank number in the document. Finally, it collects all those local similarities and combines them into one global similarity between the query and the document.
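The three steps above can be sketched in Python. The abstract does not give the paper's actual weighting or combination formulas, so everything specific here is an assumption made for illustration: the dense ranking of ties, the hypothetical `local_similarity` function 1 / (1 + weight · |rank difference|), the `weight` parameter, and the averaging over query terms in `global_similarity`. It should be read as one possible instance of a rank-based similarity, not as the model evaluated in the paper.

```python
from collections import Counter


def frequency_ranks(tokens):
    """Map each term to its frequency rank (1 = most frequent).

    Assumption: terms with equal counts share a rank (dense ranking).
    """
    counts = Counter(tokens)
    distinct_counts = sorted(set(counts.values()), reverse=True)
    rank_of_count = {c: i + 1 for i, c in enumerate(distinct_counts)}
    return {term: rank_of_count[c] for term, c in counts.items()}


def local_similarity(query_rank, doc_rank, weight=1.0):
    """Hypothetical local similarity for one term.

    Equals 1.0 when the ranks match and decays as the weighted
    rank difference grows. The paper's own formula is not given
    in the abstract.
    """
    return 1.0 / (1.0 + weight * abs(query_rank - doc_rank))


def global_similarity(query_tokens, doc_tokens, weight=1.0):
    """Combine local similarities into a single score in [0, 1].

    Assumption: average over the distinct query terms; a query term
    missing from the document contributes 0.
    """
    query_ranks = frequency_ranks(query_tokens)
    doc_ranks = frequency_ranks(doc_tokens)
    if not query_ranks:
        return 0.0
    total = 0.0
    for term, q_rank in query_ranks.items():
        if term in doc_ranks:
            total += local_similarity(q_rank, doc_ranks[term], weight)
    return total / len(query_ranks)


if __name__ == "__main__":
    query = ["rank", "rank", "term"]
    doc_a = ["rank", "rank", "rank", "term", "term", "model"]
    doc_b = ["term", "term", "rank", "model"]
    print(global_similarity(query, doc_a))
    print(global_similarity(query, doc_b))
```

In this sketch `doc_a` orders "rank" and "term" the same way the query does, so it scores higher than `doc_b`, where the order is reversed. Only rank positions enter the score; the raw counts matter only through the ordering they induce.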

In this paper we also demonstrate that the effectiveness of this new full-text document retrieval model is comparable with that of the standard vector-space retrieval model.

³ On temporary leave from Philips Research Laboratories, Eindhoven, The Netherlands.

Author information

Authors and Affiliations

Philips Laboratories, Briarcliff Manor, NY, USA
IJsbrand Jan Aalbersberg

Editor information

Editors and Affiliations

Department of Computer Science, University of Massachusetts, 01003, Amherst, MA, USA
Bruce W. Croft
Department of Computer Science, University of Glasgow, G12 8RZ, 8–17 Lilybank Gardens, Glasgow, Scotland
C. J. van Rijsbergen

About this paper

Cite this paper

Aalbersberg, I.J. (1994). A Document Retrieval Model Based on Term Frequency Ranks. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_17

DOI: https://doi.org/10.1007/978-1-4471-2099-5_17
Publisher Name: Springer, London
Print ISBN: 978-3-540-19889-5
Online ISBN: 978-1-4471-2099-5
eBook Packages: Springer Book Archive
