
The VLDB Journal, Volume 15, Issue 1, pp. 84–98

Online summarization of dynamic time series data

  • Umit Y. Ogras
  • Hakan Ferhatosmanoglu
Regular Paper

Abstract

Managing large-scale time series databases has recently attracted significant attention in the database community. Related fundamental problems, such as dimensionality reduction, transformation, pattern mining, and similarity search, have been studied extensively. Although time series data are dynamic by nature, as in data streams, current solutions to these problems mostly target static time series databases. In this paper, we first propose a framework for online summary generation for large-scale, dynamic time series data such as data streams. We then propose online transform-based summarization techniques over data streams that can be updated in constant time and space. We present both exact and approximate versions of the proposed techniques and provide error bounds for the approximate case. One of our main contributions is an extensive performance analysis: our experiments evaluate the quality of the online summaries for point, range, and k-nearest-neighbor (kNN) queries using real-life dynamic data sets of substantial size.
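The abstract states that the summaries can be updated in constant time per arriving value. As a minimal sketch of what such an incremental update can look like, the Python below assumes the transform is the discrete Fourier transform (DFT) over a sliding window of length `n` and keeps only its first `k` coefficients. The names `SlidingDFTSummary`, `dft_prefix`, and `lower_bound_distance` are hypothetical, and the paper's exact and approximate techniques may use other transforms, update rules, or error handling. The sketch also buffers the raw window, only to know which value expires next, so it does not claim to reproduce the space bound stated in the abstract. The lower-bound helper shows, again as an assumption, how a coefficient prefix can filter candidates for range or kNN queries without false dismissals.

```python
import cmath
from collections import deque


def dft_prefix(window, k):
    """Directly compute the first k (unnormalized) DFT coefficients: O(n*k)."""
    n = len(window)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(window))
        for f in range(k)
    ]


class SlidingDFTSummary:
    """Hypothetical helper: first k DFT coefficients of the last n stream values.

    Each arrival updates every retained coefficient in O(1) time, i.e. O(k)
    per value, independent of the window length n.
    """

    def __init__(self, initial_window, k):
        self.n = len(initial_window)
        if not 0 < k <= self.n:
            raise ValueError("need 0 < k <= window length")
        # Raw window is buffered only to know which value expires next.
        self.window = deque(initial_window)
        self.coeffs = dft_prefix(list(initial_window), k)
        # Rotation factors e^{+2*pi*i*f/n}, computed once.
        self.twiddle = [cmath.exp(2j * cmath.pi * f / self.n) for f in range(k)]

    def append(self, x_new):
        """Slide the window by one value and update the coefficients."""
        x_old = self.window.popleft()
        self.window.append(x_new)
        delta = x_new - x_old
        # X_f(t+1) = e^{2*pi*i*f/n} * (X_f(t) - x_old + x_new)
        self.coeffs = [w * (c + delta) for w, c in zip(self.twiddle, self.coeffs)]


def lower_bound_distance(coeffs_a, coeffs_b, n):
    """Lower bound on the Euclidean distance between two length-n windows.

    By Parseval's theorem, sum_t |a_t - b_t|^2 = (1/n) * sum_f |A_f - B_f|^2
    over all n coefficients; keeping only the first k terms can only shrink it.
    """
    total = sum(abs(p - q) ** 2 for p, q in zip(coeffs_a, coeffs_b))
    return (total / n) ** 0.5
```

Feeding a stream through `append` keeps `coeffs` equal, up to floating-point round-off, to a direct DFT of the current window. Because round-off accumulates over long streams, a practical system might periodically recompute the coefficients from the buffered window.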

Keywords

Dimensionality reduction · Transformation-based summarization · Data streams · Time-series data



Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
  2. Department of Computer Science and Engineering, The Ohio State University, Columbus, USA
