The VLDB Journal, Volume 23, Issue 6, pp. 915–937

Maximum error-bounded Piecewise Linear Representation for online stream approximation

  • Qing Xie
  • Chaoyi Pang
  • Xiaofang Zhou
  • Xiangliang Zhang
  • Ke Deng
Regular Paper

Abstract

Given a time series data stream, generating an error-bounded Piecewise Linear Representation (error-bounded PLR) means constructing a number of consecutive line segments that approximate the stream such that the approximation error does not exceed a prescribed error bound. In this work, we use the error bound in the \(L_\infty \) norm as the approximation criterion, which constrains the approximation error at each individual data point, and we aim to design algorithms that generate the minimal number of segments. In the literature, optimal approximation algorithms have been designed effectively in a transformed space rather than in the time-value space, while optimal solutions in the original time domain (i.e., the time-value space) are still lacking. In this article, we propose two linear-time algorithms, named OptimalPLR and GreedyPLR, that construct an error-bounded PLR for a data stream in the time domain. OptimalPLR is an optimal algorithm that generates the minimal number of line segments for the stream approximation, and GreedyPLR is an alternative solution for settings that demand high efficiency or have constrained resources. To evaluate the advantages of OptimalPLR, we theoretically analyze and compare it with the state-of-the-art optimal solution in the transformed space, which also achieves linear complexity. We prove the theoretical equivalence between the time-value space and this transformed space, and we also show that OptimalPLR processes streams more efficiently in practice. Extensive empirical evaluation supports the effectiveness and efficiency of the proposed algorithms.
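To make the \(L_\infty \) criterion concrete: writing \(\delta\) for the prescribed error bound (our notation, not necessarily the paper's), a segment with line \(x = a t + b\) is valid only if \(|x_i - (a t_i + b)| \le \delta\) for every data point \((t_i, x_i)\) it covers. The Python sketch below illustrates that criterion with a simple greedy online segmentation. It is not the paper's OptimalPLR or GreedyPLR; it is a simpler anchored-slope scheme, assumed here purely for illustration, that starts each segment exactly at its first data point and keeps extending the segment while the interval of feasible slopes stays non-empty. Because anchoring fixes each segment's starting value, it may produce more segments than an optimal algorithm. The names `Segment`, `greedy_anchored_plr`, and `max_abs_error` are hypothetical, and timestamps are assumed to be strictly increasing.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    t_start: float
    t_end: float
    slope: float
    intercept: float  # value of the line at t_start (equals the anchor data value)

    def value_at(self, t: float) -> float:
        return self.intercept + self.slope * (t - self.t_start)


def greedy_anchored_plr(points: Sequence[Tuple[float, float]], delta: float) -> List[Segment]:
    """Hypothetical greedy sketch (not the paper's algorithm): anchor each segment at
    its first point and extend it while some slope keeps every covered point within
    +/- delta. Assumes strictly increasing timestamps."""
    segments: List[Segment] = []
    i, n = 0, len(points)
    while i < n:
        t0, x0 = points[i]
        lo, hi = float("-inf"), float("inf")  # feasible slope interval so far
        j = i + 1
        while j < n:
            t, x = points[j]
            dt = t - t0
            # Slopes s with |x0 + s*dt - x| <= delta form [(x-delta-x0)/dt, (x+delta-x0)/dt].
            new_lo = max(lo, (x - delta - x0) / dt)
            new_hi = min(hi, (x + delta - x0) / dt)
            if new_lo > new_hi:
                break  # no single slope fits points i..j; close the segment at j-1
            lo, hi = new_lo, new_hi
            j += 1
        # Any slope in [lo, hi] is valid; use the midpoint (0.0 for a lone last point).
        slope = 0.0 if j == i + 1 else (lo + hi) / 2
        segments.append(Segment(t0, points[j - 1][0], slope, x0))
        i = j  # the point that broke the segment anchors the next one
    return segments


def max_abs_error(points: Sequence[Tuple[float, float]], segments: List[Segment]) -> float:
    """L-infinity error of the piecewise approximation over all data points."""
    worst, k = 0.0, 0
    for t, x in points:
        while segments[k].t_end < t:  # move to the segment covering time t
            k += 1
        worst = max(worst, abs(segments[k].value_at(t) - x))
    return worst


if __name__ == "__main__":
    stream = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 0.0), (5, -3.0)]
    segs = greedy_anchored_plr(stream, delta=0.5)
    print(len(segs))  # 2
    print(max_abs_error(stream, segs) <= 0.5 + 1e-9)  # True
```

Each point is examined at most twice (once while extending a segment and once more if it becomes the next anchor), so this sketch runs in linear time, but unlike OptimalPLR it offers no guarantee that the number of segments is minimal.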

Keywords

Stream approximation · Error bound · Piecewise Linear Representation

Notes

Acknowledgments

This research is partially supported by the Natural Science Foundation of China (Grant No. 61232006) and the Australian Research Council (Grant Nos. DP140103171 and DP130103051).

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Qing Xie (1)
  • Chaoyi Pang (2, 3, 4)
  • Xiaofang Zhou (5, 6)
  • Xiangliang Zhang (1)
  • Ke Deng (7)

  1. Division of CEMSE, KAUST, Thuwal, Saudi Arabia
  2. AEHRC, CSIRO, Brisbane, Australia
  3. Zhejiang University (NIT), Ningbo, China
  4. Hebei Academy of Sciences, Hebei, China
  5. School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
  6. School of Computer Science and Technology, Soochow University, Suzhou, China
  7. Huawei Noah’s Ark Research Lab, Hong Kong, China

Personalised recommendations