
Efficient similarity search in sequence databases

  • Conference paper
Foundations of Data Organization and Algorithms (FODO 1993)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 730)

Abstract

We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which implies that the Fourier transform preserves Euclidean distance: the distance between two sequences is the same in the time domain as in the frequency domain. Because discarding coefficients can only decrease this distance, searching on a truncated representation never misses a qualifying sequence. Having thus mapped sequences to a lower-dimensional space by using only the first few Fourier coefficients, we use R*-trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
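The filter-and-refine idea in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`dft_features`, `feature_distance`, `range_query`) are hypothetical, a linear scan over the k-coefficient feature vectors stands in for the R*-tree, and the data are synthetic random walks. The orthonormal DFT (`norm="ortho"`) makes Parseval's theorem hold exactly, so keeping only the first k coefficients can only underestimate distances. The filter step therefore never dismisses a true match, and a refinement step with the exact distance removes false alarms.

```python
import numpy as np

def dft_features(x, k):
    """First k coefficients of the orthonormal DFT of a 1-D sequence x."""
    # norm="ortho" divides by sqrt(n), making the DFT unitary, so Parseval's
    # theorem holds exactly: Euclidean distances are identical in both domains.
    return np.fft.fft(np.asarray(x, dtype=float), norm="ortho")[:k]

def feature_distance(a, b):
    """Euclidean distance between two complex feature vectors."""
    return float(np.linalg.norm(a - b))

def range_query(db, q, eps, k=3):
    """Indices of sequences in db within Euclidean distance eps of q."""
    q_feat = dft_features(q, k)
    # Filter step (hypothetical stand-in for the R*-tree search): a linear scan
    # over k-coefficient features. Truncation can only shrink distances, so no
    # true match is dismissed here.
    candidates = [i for i, x in enumerate(db)
                  if feature_distance(dft_features(x, k), q_feat) <= eps]
    # Refinement step: discard false alarms using the exact time-domain distance.
    return [i for i in candidates if np.linalg.norm(db[i] - q) <= eps]

# Synthetic data (an assumption): random walks, whose energy sits mostly
# in the low frequencies.
rng = np.random.default_rng(0)
db = [np.cumsum(rng.standard_normal(128)) for _ in range(200)]
q = db[17] + 0.1 * rng.standard_normal(128)  # a slightly noisy copy of sequence 17

x, y = db[0], db[1]
full = feature_distance(dft_features(x, 128), dft_features(y, 128))
trunc = feature_distance(dft_features(x, 3), dft_features(y, 3))
print(np.isclose(full, np.linalg.norm(x - y)))  # Parseval: True
print(trunc <= np.linalg.norm(x - y))           # lower bound: True
print(17 in range_query(db, q, eps=5.0))        # True
```

In the method the abstract describes, the linear filter scan would be replaced by an R*-tree over these low-dimensional feature points (each complex coefficient contributing a real and an imaginary dimension); the refinement step is unchanged.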

On sabbatical from the Dept. of Computer Science, University of Maryland, College Park. This research was partially funded by the Systems Research Center (SRC) at the University of Maryland, and by the National Science Foundation under Grant IRI-8958546 (PYI), with matching funds from EMPRESS Software Inc. and Thinking Machines Inc.




Editor information

David B. Lomet


Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Agrawal, R., Faloutsos, C., Swami, A. (1993). Efficient similarity search in sequence databases. In: Lomet, D.B. (eds) Foundations of Data Organization and Algorithms. FODO 1993. Lecture Notes in Computer Science, vol 730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57301-1_5


  • DOI: https://doi.org/10.1007/3-540-57301-1_5


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57301-2

  • Online ISBN: 978-3-540-48047-1

  • eBook Packages: Springer Book Archive
