Efficient similarity search in sequence databases

  • Rakesh Agrawal
  • Christos Faloutsos
  • Arun Swami
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 730)

Abstract

We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain. The crucial observation is that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which implies that the Fourier transform preserves Euclidean distance: the distance between two sequences is the same in the time and frequency domains. Having thus mapped sequences to a lower-dimensional space by using only the first few Fourier coefficients, we use R*-trees to index the sequences and efficiently answer similarity queries. We provide experimental results showing that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
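The pipeline in the abstract (DFT features, Parseval's theorem, filter-then-verify range queries) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper. The DFT is scaled by 1/sqrt(n) so that Parseval's theorem holds exactly. A linear scan over precomputed feature vectors stands in for the R*-tree. The random-walk data and the values of `k` and `eps` are illustrative only. The function names `dft`, `features`, `build_index`, `range_query` and `random_walk` are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Unitary DFT: X_f = (1/sqrt(n)) * sum_t x_t * exp(-2j*pi*f*t/n).

    With the 1/sqrt(n) scaling, Parseval's theorem holds exactly
    (sum |x_t|^2 == sum |X_f|^2), so Euclidean distance is preserved.
    This is O(n^2); a real system would use an FFT instead.
    """
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def euclidean(a, b):
    """Euclidean distance between equal-length real or complex vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def features(x, k):
    """Index key for a sequence: its first k DFT coefficients."""
    return dft(x)[:k]


def build_index(db, k):
    """Precompute feature vectors (a stand-in for inserting into an R*-tree)."""
    return [features(x, k) for x in db]


def range_query(db, index, query, eps, k):
    """Return (candidates, answers) for 'all x with D(x, query) <= eps'.

    Filter: compare truncated feature vectors. Dropping coefficients can only
    shrink the distance, so no true match is filtered out (no false dismissals).
    Refine: discard false alarms using the exact time-domain distance.
    """
    q_feat = features(query, k)
    candidates = [i for i, f in enumerate(index) if euclidean(f, q_feat) <= eps]
    answers = [i for i in candidates if euclidean(db[i], query) <= eps]
    return candidates, answers


def random_walk(n, rng):
    """Synthetic sequence: cumulative sum of Gaussian steps (illustrative data)."""
    x, out = 0.0, []
    for _ in range(n):
        x += rng.gauss(0, 1)
        out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, eps = 32, 3, 5.0
    db = [random_walk(n, rng) for _ in range(200)]
    index = build_index(db, k)
    # Query: a slightly perturbed copy of db[0], so at least one match exists.
    query = [v + rng.gauss(0, 0.1) for v in db[0]]
    candidates, answers = range_query(db, index, query, eps, k)
    exact = [i for i, x in enumerate(db) if euclidean(x, query) <= eps]
    print(len(candidates), "candidates,", len(answers), "answers")
    assert answers == exact  # the filter step missed no true match
```

Keeping only the first `k` coefficients drops non-negative terms from the sum of squared differences, so the feature-space distance is a lower bound on the true distance. The filter step may therefore return false alarms but never dismisses a true match, and the refine step removes the false alarms. In the indexed setting, each complex coefficient contributes two real dimensions (real and imaginary parts) to the multidimensional index that replaces the linear scan used here.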

Keywords

Discrete Fourier Transform, Fourier Coefficient, Range Query, Pink Noise, False Dismissal

Copyright information

© Springer-Verlag Berlin Heidelberg 1993

Authors and Affiliations

  • Rakesh Agrawal (1)
  • Christos Faloutsos (1)
  • Arun Swami (1)

  1. IBM Almaden Research Center, San Jose