Static Pruning of Terms in Inverted Files

  • Roi Blanco
  • Álvaro Barreiro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4425)

Abstract

This paper addresses the problem of identifying collection-dependent stop-words in order to reduce the size of inverted files. We present four methods to automatically recognise stop-words, analyse the tradeoff between efficiency and effectiveness, and compare them with a previous pruning approach. The experiments allow us to conclude that, in some situations, stop-word pruning is competitive with other inverted file reduction techniques.
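
To make the idea of term-level static pruning concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the four stop-word recognition methods, so the sketch assumes a simple stand-in criterion: terms are ranked by inverse document frequency (idf), and the lowest-scoring fraction is treated as collection-dependent stop-words whose posting lists are removed entirely. The function names, the toy collection, and the `fraction` parameter are all hypothetical.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted file: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def idf_scores(index, n_docs):
    """Score each term by idf; a low score means the term occurs in many documents."""
    return {term: math.log(n_docs / len(postings))
            for term, postings in index.items()}


def prune_terms(index, n_docs, fraction):
    """Remove the posting lists of the lowest-idf `fraction` of terms.

    Assumption: idf stands in for the paper's (unnamed) stop-word criteria.
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = idf_scores(index, n_docs)
    ranked = sorted(index, key=lambda term: (scores[term], term))
    stop_words = set(ranked[:int(len(ranked) * fraction)])
    pruned = {term: postings for term, postings in index.items()
              if term not in stop_words}
    return pruned, stop_words


def postings_count(index):
    """Number of (term, document) entries, a rough proxy for index size."""
    return sum(len(postings) for postings in index.values())


docs = [
    "the index stores the postings",
    "the query reads the index",
    "pruning the index saves space",
]
index = build_index(docs)
pruned, stop_words = prune_terms(index, len(docs), fraction=0.25)
print(sorted(stop_words))
print(postings_count(index), "->", postings_count(pruned))
```

In this toy collection the terms that appear in every document are the ones removed, which illustrates why the resulting stop-word list depends on the collection rather than on a fixed general-purpose list. Removing whole posting lists trades retrieval effectiveness on queries containing those terms for a smaller index, which is the tradeoff the abstract refers to.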

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Roi Blanco (1)
  • Álvaro Barreiro (1)
  1. IRLab, Computer Science Department, University of Coruña, Spain
