
World Wide Web, Volume 18, Issue 2, pp 359–401

Indexable online time series segmentation with error bound guarantee

  • Jianzhong Qi
  • Rui Zhang
  • Kotagiri Ramamohanarao
  • Hongzhi Wang
  • Zeyi Wen
  • Dan Wu
Article

Abstract

The volume of time series stream data is growing rapidly across many applications. Segmentation and approximation are a common way to reduce the storage, transmission, and processing costs of such data. In this paper, we propose a novel online segmentation algorithm that approximates a time series with a set of candidate functions of different types (polynomials of different orders, exponential functions, etc.) and adaptively chooses the most compact one as the pattern of the time series changes. We call this the Adaptive Approximation (AA) algorithm. Given an error bound on each data point, the AA algorithm incrementally narrows the feasible coefficient spaces (FCS) of the candidate functions in their coefficient coordinate systems so that each segment is as long as possible. We propose the FCS algorithm to compute the feasible coefficient spaces incrementally. We further propose a mapping-based index for similarity searches on the approximated time series. Experimental results show that the AA algorithm generates more compact approximations with lower average errors than the state-of-the-art algorithm, and that our indexing method processes similarity searches on the approximated time series efficiently.
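
To make the idea of narrowing a feasible coefficient space concrete, the following Python sketch covers only the simplest case and is not the FCS or AA algorithm from the paper. It assumes unit-spaced timestamps, a single candidate function (a straight line anchored at the first point of each segment, so the slope is the only free coefficient), and a hypothetical function name `segment_with_error_bound`. Each new point intersects the interval of admissible slopes with its own constraint, and the segment is closed when the interval becomes empty.

```python
from typing import List, Tuple

Segment = Tuple[int, int, float, float]  # (start index, end index, anchor value, slope)


def segment_with_error_bound(values: List[float], eps: float) -> List[Segment]:
    """Greedy online segmentation with a per-point error bound eps.

    Illustrative assumption: each segment is the line
        v(t) = v0 + a * (t - t0)
    anchored at its first point (t0, v0). The feasible coefficient space
    is then a one-dimensional interval [lo, hi] of slopes a.
    """
    segments: List[Segment] = []
    n = len(values)
    start = 0
    while start < n:
        v0 = values[start]
        lo, hi = float("-inf"), float("inf")  # no constraint on the slope yet
        end = start
        for i in range(start + 1, n):
            dt = i - start
            # Point i requires |v0 + a * dt - values[i]| <= eps,
            # which bounds the slope a from both sides.
            new_lo = max(lo, (values[i] - eps - v0) / dt)
            new_hi = min(hi, (values[i] + eps - v0) / dt)
            if new_lo > new_hi:
                break  # feasible space is empty: point i starts a new segment
            lo, hi = new_lo, new_hi
            end = i
        # Any slope in [lo, hi] meets the bound; take the midpoint.
        slope = 0.0 if end == start else (lo + hi) / 2.0
        segments.append((start, end, v0, slope))
        start = end + 1
    return segments


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    for seg in segment_with_error_bound(data, eps=0.5):
        print(seg)
```

Anchoring the line at the first point keeps the feasible space one-dimensional, which is easy to follow but generally produces shorter segments than letting all coefficients vary. The abstract describes tracking such spaces for several candidate function types at once and choosing the most compact representation, which this sketch does not attempt.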

Keywords

Time series, Approximation, Indexing, Similarity search


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Jianzhong Qi (1, corresponding author)
  • Rui Zhang (1)
  • Kotagiri Ramamohanarao (1)
  • Hongzhi Wang (2)
  • Zeyi Wen (1)
  • Dan Wu (2)

  1. Department of Computing and Information Systems, University of Melbourne, Melbourne, Australia
  2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
