Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Signature Files

  • Mario A. Nascimento
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1138

Definition

A signature file is a data structure that supports fast search over text data. It is typically very compact and aims to minimize disk accesses at query time. Query processing proceeds in two stages: filtering, in which false negatives are guaranteed not to occur but false positives may, and query refinement, in which the false positives are removed.
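
The two stages can be illustrated with a minimal Python sketch. It assumes superimposed coding, one common way of building signatures that this entry does not itself specify: each word is hashed to a few bit positions, and a block's signature is the bitwise OR of its word signatures. The names `SIG_BITS`, `BITS_PER_WORD`, `word_signature`, `block_signature`, and `search` are hypothetical and chosen only for illustration.

```python
import hashlib

SIG_BITS = 64      # signature width in bits (illustrative assumption)
BITS_PER_WORD = 3  # bit positions hashed per word (illustrative assumption)


def word_signature(word: str) -> int:
    """Hash a word to a bit mask with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def block_signature(text: str) -> int:
    """Superimpose (bitwise OR) the signatures of all words in a text block."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def search(term: str, blocks: list[str], signatures: list[int]) -> list[int]:
    """Return the indices of the blocks that really contain `term`."""
    query = word_signature(term)
    # Stage 1, filtering: linear scan of the signatures. A block that contains
    # the term always has every query bit set, so no false negatives occur;
    # unrelated words may set the same bits, so false positives are possible.
    candidates = [i for i, s in enumerate(signatures) if (s & query) == query]
    # Stage 2, refinement: inspect the candidate text to drop false positives.
    return [i for i in candidates if term.lower() in blocks[i].lower().split()]


blocks = [
    "signature files allow fast search",
    "inverted files are a standard index",
]
signatures = [block_signature(b) for b in blocks]
hits = search("signature", blocks, signatures)
```

In this sketch only the list of signatures is scanned during filtering; the text of a block is read only if its signature matches, which is where the savings in disk access would come from.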

Historical Background

Efficient and effective text indexing is a well-known and long-standing problem in information retrieval. While inverted files are the de facto standard for text indexing, in the early days their storage overhead was not acceptable for larger datasets. In addition, accessing an inverted file on disk may require a relatively large number of (expensive) disk seeks. The main motivation for signature files is to allow fast filtering of text by linearly scanning the signature file to find text segments that may contain the queried term(s). Given that the segments found may be false positives, a refinement step is...

Recommended Reading

  1. Baeza-Yates RA, Ribeiro-Neto BA. Modern information retrieval. New York: ACM Press/Addison-Wesley; 1999.
  2. Faloutsos C. Access methods for text. ACM Comput Surv. 1985;17(1):49–74.
  3. Zobel J, Moffat A, Kotagiri R. Inverted files versus signature files for text indexing. ACM Trans Database Syst. 1998;23(4):453–90.
  4. Deppisch U. S-tree: a dynamic balanced signature index for office retrieval. In: Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1986. p. 77–87.
  5. Frakes WB, Baeza-Yates RA. Information retrieval: data structures & algorithms. Upper Saddle River: Prentice-Hall; 1992.
  6. Witten IH, Moffat A, Bell TC. Managing gigabytes: compressing and indexing documents and images. 2nd ed. San Francisco: Morgan Kaufmann; 1999.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computing Science, University of Alberta, Edmonton, Canada

Section editors and affiliations

  • Mario A. Nascimento (1)
  1. Dept. of Computing Science, Univ. of Alberta, Edmonton, Canada