The VLDB Journal, Volume 13, Issue 3, pp 222–239

Adaptive, unsupervised stream mining

  • Spiros Papadimitriou
  • Anthony Brockwell
  • Christos Faloutsos
Article

Abstract.

Sensor devices and embedded processors are becoming widespread, especially in measurement/monitoring applications. Their limited resources (CPU, memory and/or communication bandwidth, and power) pose some interesting challenges. We need concise, expressive models that represent the important features of the data and lend themselves to efficient estimation. In particular, under these severe constraints, we want models and estimation methods that (a) require little memory and a single pass over the data, (b) can adapt and handle arbitrary periodic components, and (c) can deal with various types of noise. We propose AWSOM (Arbitrary Window Stream mOdeling Method), which allows sensors in remote or hostile environments to efficiently and effectively discover interesting patterns and trends. This can be done automatically, i.e., with no prior inspection of the data and no user intervention or expert tuning before or during data gathering. Our algorithms require limited resources and can thus be incorporated into sensors, possibly alongside a distributed query processing engine [10,6,27]. Updates are performed in constant time with respect to stream size, using logarithmic space. Existing forecasting methods (SARIMA, GARCH, etc.) and “traditional” Fourier and wavelet analysis fall short on one or more of these requirements. To the best of our knowledge, AWSOM is the first framework that combines all of the above characteristics. Experiments on real and synthetic datasets demonstrate that AWSOM discovers meaningful patterns over long time periods. Thus, the patterns can also be used to make long-range forecasts, which are notoriously difficult to perform. In fact, AWSOM outperforms manually set up autoregressive models, both in long-term pattern detection and modeling and, by at least 10×, in resource consumption.
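
To make the resource claim concrete, the following is a minimal sketch of one standard way to process a stream in a single pass with logarithmic space: an incremental Haar wavelet decomposition that keeps one pending value per level. It is an illustration only and is not taken from the paper; the abstract does not describe the algorithm's internals. The class name `StreamingHaar` and its `update` method and `pending` attribute are hypothetical, and the update cost shown here is constant only in the amortized sense.

```python
import math


class StreamingHaar:
    """Single-pass Haar decomposition (illustrative, not the paper's algorithm).

    Keeps at most one pending smooth coefficient per level, so memory grows
    as O(log N) after N values have been consumed.
    """

    def __init__(self):
        # pending[l] holds a level-l smooth coefficient waiting for its
        # sibling, or None when no coefficient is waiting at that level.
        self.pending = []
        self.count = 0

    def update(self, x):
        """Consume one value and return the (level, detail) pairs completed."""
        self.count += 1
        emitted = []
        level = 0
        smooth = float(x)
        while True:
            if level == len(self.pending):
                self.pending.append(None)
            if self.pending[level] is None:
                # No sibling yet: park the coefficient and stop.
                self.pending[level] = smooth
                break
            # A sibling is waiting: combine the pair with the orthonormal
            # Haar filters and carry the smooth part up one level.
            left = self.pending[level]
            self.pending[level] = None
            detail = (left - smooth) / math.sqrt(2)
            smooth = (left + smooth) / math.sqrt(2)
            emitted.append((level + 1, detail))
            level += 1
        return emitted


if __name__ == "__main__":
    haar = StreamingHaar()
    for value in [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]:
        for level, detail in haar.update(value):
            print(f"t={haar.count} level={level} detail={detail:.4f}")
    print("levels kept in memory:", len(haar.pending))
```

Each arriving value triggers at most one combination per level, and N values trigger fewer than N combinations in total, which is where the amortized constant update cost comes from. A model fitted over such coefficients is a separate step that this sketch does not attempt.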

Keywords

Query Processing; Synthetic Dataset; Forecast Method; Periodic Component; Processing Engine
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Akay M (1997) Time frequency and wavelets in biomedical signal processing. Wiley, New York
  2. Arasu A, Babcock B, Babu S, McAlister J, Widom J (2002) Characterizing memory requirements for queries over continuous data streams. In: Proc PODS
  3. Babcock B, Olston C (2003) Distributed top-k monitoring. In: Proc SIGMOD
  4. Beran J (1994) Statistics for long-memory processes. Chapman & Hall, London
  5. Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. J Econometr 31:307-327
  6. Bonnet P, Gehrke JE, Seshadri P (2001) Towards sensor database systems. In: Proc MDM
  7. Brockwell PJ, Davis RA (1991) Time series: theory and methods. Springer series in statistics, 2nd edn. Springer, Berlin Heidelberg New York
  8. Bulut A, Singh AK (2003) SWAT: Hierarchical stream summarization in large networks. In: Proc 19th ICDE
  9. Carley LR, Ganger GR, Nagle D (2000) MEMS-based integrated-circuit mass-storage systems. Commun ACM 43(11):72-80
  10. Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Seidman G, Stonebraker M, Tatbul N, Zdonik SB (2002) Monitoring streams - a new class of data management applications. In: Proc VLDB
  11. Chen Y, Dong G, Han J, Wah BW, Wang J (2002) Multi-dimensional regression analysis of time-series data streams. In: Proc VLDB
  12. Considine J, Li F, Kollios G, Byers JW (2004) Approximate aggregation techniques for sensor databases. In: Proc ICDE
  13. Das G, Lin L-I, Mannila H, Renganathan G, Smyth P (1998) Rule discovery from time series. In: Proc KDD
  14. Datar M, Gionis A, Indyk P, Motwani R (2002) Maintaining stream statistics over sliding windows. In: Proc SODA
  15. DeGroot MH, Schervish MJ (2002) Probability and statistics, 3rd edn. Addison-Wesley, Reading, MA
  16. Dobra A, Garofalakis MN, Gehrke J, Rastogi R (2002) Processing complex aggregate queries over data streams. In: Proc SIGMOD
  17. Ergün F, Muthukrishnan S, Sahinalp SC (2004) Sublinear methods for detecting periodic trends in datastreams. In: Proc LATIN
  18. Faloutsos C (1996) Searching multimedia databases by content. Kluwer, Dordrecht
  19. Garofalakis MN, Gibbons PB (2002) Wavelet synopses with error guarantees. In: Proc SIGMOD
  20. Gehrke J, Korn F, Srivastava D (2001) On computing correlated aggregates over continual data streams. In: Proc SIGMOD
  21. Gencay R, Selcuk F, Whitcher B (2001) An introduction to wavelets and other filtering methods in finance and economics. Academic, New York
  22. Gilbert AC, Kotidis Y, Muthukrishnan S, Strauss M (2001) Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In: Proc VLDB
  23. Guha S, Koudas N (2002) Approximating a data stream for querying and estimation: algorithms and performance evaluation. In: Proc ICDE
  24. Hill J, Szewczyk R, Woo A, Hollar S, Culler D, Pister K (2000) System architecture directions for networked sensors. In: Proc ASPLOS-IX
  25. Indyk P, Koudas N, Muthukrishnan S (2000) Identifying representative trends in massive time series data sets using sketches. In: Proc VLDB
  26. Leland W, Taqqu M, Willinger W, Wilson D (1994) On the self-similar nature of Ethernet traffic. IEEE Trans Network 2(1):1-15
  27. Madden SR, Shah MA, Hellerstein JM, Raman V (2002) Continuously adaptive continuous queries over streams. In: Proc SIGMOD
  28. Olston C, Jiang J, Widom J (2003) Adaptive filters for continuous queries over distributed data streams. In: Proc SIGMOD
  29. Palpanas T, Vlachos M, Keogh EJ, Gunopulos D, Truppel W (2004) Online amnesic approximation of streaming time series. In: Proc ICDE
  30. Percival DB, Walden AT (2000) Wavelet methods for time series analysis. Cambridge University Press, Cambridge, UK
  31. Riedel E, Faloutsos C, Ganger GR, Nagle D (2000) Data mining on an OLTP system (nearly) for free. In: Proc SIGMOD
  32. Tao Y, Faloutsos C, Papadias D, Liu B (2004) Prediction and indexing of moving objects with unknown motion patterns. In: Proc SIGMOD
  33. Weigend AS, Gershenfeld NA (1994) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading, MA
  34. Yi B-K, Sidiropoulos N, Johnson T, Jagadish H, Faloutsos C, Biliris A (2000) Online data mining for co-evolving time sequences. In: Proc ICDE
  35. Young P (1984) Recursive estimation and time-series analysis: an introduction. Springer, Berlin Heidelberg New York
  36. Zhang D, Gunopulos D, Tsotras VJ, Seeger B (2002) Temporal aggregation over data streams using multiple granularities. In: Proc EDBT
  37. Zhu Y, Shasha D (2002) StatStream: statistical monitoring of thousands of data streams in real time. In: Proc VLDB
  38. Zhu Y, Shasha D (2003) Efficient elastic burst detection in data streams. In: Proc KDD
  39. Zuidwijk R, de Zeeuw P (1998) Fast algorithm for directional time-scale analysis using wavelets. In: Proc SPIE, Wavelet Applications in Signal and Image Processing VI, vol 3458

Copyright information

© Springer-Verlag Berlin/Heidelberg 2004

Authors and Affiliations

  • Spiros Papadimitriou (1), email author
  • Anthony Brockwell (2)
  • Christos Faloutsos (1)

  1. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
  2. Department of Statistics, Carnegie Mellon University, Pittsburgh, USA