Abstract
We propose an indexing method for processing similarity queries on time sequences. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which implies that the Fourier transform preserves Euclidean distance: the distance between two sequences is the same in the time domain and in the frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R*-trees to index the sequences and efficiently answer similarity queries. We provide experimental results showing that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of the sequences.
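The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes a unitary DFT (scaled by 1/sqrt(n)) so that Parseval's theorem gives exact distance preservation, and it replaces the R*-tree with a plain linear scan over the truncated feature vectors, so it shows the filter-and-refine logic rather than the index itself. The function names `dft`, `features`, and `range_query` are illustrative and do not come from the paper.

```python
import cmath
import math


def dft(x, k=None):
    """First k coefficients of the unitary DFT of x (all n if k is None).

    X_f = (1/sqrt(n)) * sum_t x_t * exp(-2*pi*j*f*t/n)
    """
    n = len(x)
    k = n if k is None else k
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(k)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length real or complex sequences."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def features(x, k):
    """Feature vector used as the index key: the first k Fourier coefficients."""
    return dft(x, k)


def range_query(database, query, eps, k=2):
    """Return indices of sequences within Euclidean distance eps of query.

    Filter: compare truncated feature vectors. Dropping coefficients can only
    shrink the distance, so no true match is discarded. A linear scan stands
    in for the R*-tree here (assumption of this sketch).
    Refine: check the true time-domain distance to remove false alarms.
    """
    tol = 1e-9  # guards the filter against floating-point rounding
    qf = features(query, k)
    candidates = [
        i for i, s in enumerate(database)
        if euclidean(features(s, k), qf) <= eps + tol
    ]
    return [i for i in candidates if euclidean(database[i], query) <= eps]
```

Because the feature-space distance never exceeds the true distance, the filter step can return false alarms but no false dismissals, which is why the refine step only needs to examine the candidates.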
On sabbatical from the Dept. of Computer Science, University of Maryland, College Park. This research was partially funded by the Systems Research Center (SRC) at the University of Maryland, and by the National Science Foundation under Grant IRI-8958546 (PYI), with matching funds from EMPRESS Software Inc. and Thinking Machines Inc.
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agrawal, R., Faloutsos, C., Swami, A. (1993). Efficient similarity search in sequence databases. In: Lomet, D.B. (eds) Foundations of Data Organization and Algorithms. FODO 1993. Lecture Notes in Computer Science, vol 730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57301-1_5
DOI: https://doi.org/10.1007/3-540-57301-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57301-2
Online ISBN: 978-3-540-48047-1
eBook Packages: Springer Book Archive