Computational Mathematics and Modeling, Volume 13, Issue 3, pp 314–326

Using Signature Hashing for Approximate String Matching

  • L. M. Boitsov

Abstract

The objective is to demonstrate the expediency of using compressed inverted files with a signature-hashing dictionary for approximate string matching. A comparison of different types of dictionaries is performed and a method based on keyword signature hashing is described. In conclusion, we report the comparative characteristics (search speed, index size, indexing speed) for our search system using compressed inverted files with keyword signature hashing and the Glimpse search freeware.
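The general idea of a signature-hashing dictionary can be illustrated with a minimal sketch. This is not the article's actual scheme, whose details are not given in the abstract. The assumptions here are hypothetical: an 8-bit signature, a character-to-bit mapping of `ord(ch) % m`, and the function names `signature`, `build_index`, and `search`. Each keyword is hashed to a bit signature recording which character classes it contains. A single edit operation changes at most two signature bits, so a query with up to `k` errors only needs to probe buckets whose signatures differ from the query signature in at most `2k` bits, and each candidate is then verified with an exact edit-distance computation. Compression of the inverted file is not shown.

```python
from collections import defaultdict
from itertools import combinations

M = 8  # signature width in bits (an assumed value, chosen for illustration)


def signature(word: str, m: int = M) -> int:
    """Set one bit per character class present in the word (assumed mapping)."""
    sig = 0
    for ch in word:
        sig |= 1 << (ord(ch) % m)
    return sig


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def build_index(words, m: int = M):
    """Group dictionary words into buckets keyed by their signature."""
    buckets = defaultdict(list)
    for w in words:
        buckets[signature(w, m)].append(w)
    return buckets


def search(buckets, query: str, k: int = 1, m: int = M):
    """Return dictionary words within edit distance k of the query."""
    base = signature(query, m)
    hits = set()
    # One edit flips at most 2 signature bits, so k edits flip at most 2k.
    for r in range(2 * k + 1):
        for bits in combinations(range(m), r):
            mask = 0
            for b in bits:
                mask |= 1 << b
            # .get avoids creating empty buckets in the defaultdict
            for w in buckets.get(base ^ mask, ()):
                if edit_distance(query, w) <= k:
                    hits.add(w)
    return sorted(hits)
```

Calling `search(build_index(words), "hashing", k=1)` probes only the buckets near the query signature instead of scanning the whole dictionary. The signature width trades bucket selectivity against the number of buckets to probe.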

Copyright information

© Plenum Publishing Corporation 2002
