Encyclopedia of Database Systems

2018 Edition | Editors: Ling Liu, M. Tamer Özsu

Stoplists

  • Edie Rasmussen
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_955

Synonyms

Negative dictionary; Stopwords

Definition

Stoplists are lists of words, commonly called stopwords, that are not indexed in an information retrieval system and/or are not available for use as query terms. A stoplist can be created by sorting the terms in a document collection by frequency of occurrence and designating some number of high-frequency terms as stopwords, or alternatively by adopting one of the published stopword lists. Stoplists may be generic or domain specific, and they are necessarily language specific. When a stoplist is used for indexing, each word in a document is checked against the stoplist as the document is added to the system (for example, through dictionary lookup or hashing), and words that match are eliminated from further processing. In some systems, stopwords are indexed, but the stoplist is used to remove them from processing when they appear as query terms.
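
The two steps described above, deriving a stoplist from term frequencies and checking each word against it during indexing, can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names (tokenize, build_stoplist, index_terms), the toy documents, and the cutoff of two terms are all assumptions made for the example, and a real system would use a more careful tokenizer and a much larger collection.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_stoplist(documents, size):
    """Return the `size` most frequent terms in the collection as a frozenset."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    # Sort by descending frequency; ties are broken alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return frozenset(term for term, _ in ranked[:size])


def index_terms(text, stoplist):
    """Tokenize a document and drop every token found in the stoplist.

    Membership in a frozenset is a hash lookup, one of the lookup
    strategies mentioned in the definition.
    """
    return [tok for tok in tokenize(text) if tok not in stoplist]


# Hypothetical toy collection, used only for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a bird flew over the house",
]
stoplist = build_stoplist(docs, size=2)   # the two most frequent terms
kept = index_terms("The cat sat on the mat", stoplist)
```

In this toy collection the two highest-ranked terms are "the" and "on", so only "cat", "sat", and "mat" survive for indexing. The same filter could be applied to query terms instead of documents in a system that indexes stopwords but ignores them at query time.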

Key Points

Hans Peter Luhn, in pioneering work on automatic abstracting, put forward...

Recommended Reading

  1. Dialog online courses: glossary of search terms. Available at: http://training.dialog.com/onlinecourses/glossary/glossary_life.html.
  2. Flood BJ. Historical note: the start of a stop list at Biological Abstracts. J Am Soc Inf Sci. 1999;50(12):1066.
  3. Fox C. Lexical analysis and stoplists. In: Frakes WB, Baeza-Yates R, editors. Information retrieval: data structures and algorithms. Englewood Cliffs: Prentice-Hall; 1992. p. 102–30.
  4. Google Web Search Help Center. Search basics: use of common words. Available at: http://www.google.com/support/bin/answer.py?answer=981.
  5. Korfhage RR. Information storage and retrieval. New York: Wiley Computer Pub; 1997.
  6. Luhn HP. The automatic creation of literature abstracts. IBM J Res Dev. 1958;2(2):157–65.
  7. Luhn HP. Keyword-in-context index for technical literature. Am Doc. 1960;11(4):288–95.
  8. Manning CD, Raghavan P, Schütze H. Introduction to information retrieval. Cambridge: Cambridge University Press; 2008.
  9. Parkins PV. Approaches to vocabulary management in permuted-title indexing of Biological Abstracts. In: Proceedings of the 26th Annual Meeting on American Documentation Institute; 1963. p. 27–9.
  10. Witten IH, Moffat A, Bell TC. Managing gigabytes: compressing and indexing documents and images. 2nd ed. San Francisco: Morgan Kaufmann; 1999.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Library, Archival and Information Studies, The University of British Columbia, Vancouver, Canada

Section editors and affiliations

  • Edie Rasmussen (1)

  1. Library, Archival & Inf. Studies, The Univ. of British Columbia, Vancouver, Canada