A non-parametric symbolic approximate representation for long time series

He, Xiaoxu; Shao, Chenxi; Xiong, Yan

doi:10.1007/s10044-014-0395-5

Theoretical Advances
Published: 08 August 2014

Volume 19, pages 111–127, (2016)

Pattern Analysis and Applications

Xiaoxu He¹,², Chenxi Shao¹ & Yan Xiong¹

714 Accesses
20 Citations

Abstract

For long time series, it is crucial to design low-dimensional representations that preserve the fundamental characteristics of a series. However, most approximate representations require the user to set several input parameters. The main drawback of such parameter-laden algorithms is that incorrect settings may prevent an algorithm from achieving its best performance, that is, from reducing dimensionality while retaining shape information. This is especially likely when choosing suitable parameter values is not easy for the user. In this paper, we introduce a new approximate representation of time series, the non-parametric symbolic approximate representation (NSAR), which is based on multi-scale analysis, the approximation coefficients of the discrete wavelet transform (DWT), and key points. The novelty of the proposed representation is twofold. First, it uses a hierarchical mechanism to retain the shape information of the original time series. Second, it is symbolic: it employs key points and encodes the approximation coefficients, so it can greatly reduce the dimensionality of the original time series and potentially allows text-based retrieval techniques to be applied. The proposed representation is fast and automatic, and requires no parameter tuning by the user. To show the efficacy of the new representation, we performed experiments on real and synthetic data. Experimental results show that NSAR preserves more of the fundamental characteristics of a series than symbolic aggregate approximation (SAX) at the same compression ratio, automatically determines the optimal decomposition level for the DWT, and outperforms SAX in best-match queries.
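
To make the multi-scale idea concrete, the following Python sketch computes the approximation coefficients of a Haar DWT at every decomposition level and measures how much shape information each coarser level loses. It is an illustration under stated assumptions, not the NSAR algorithm: the Haar wavelet, the orthonormal scaling, the power-of-two length requirement and the helper names (haar_approx_levels, reconstruct_from_approx, approximation_errors) are hypothetical choices made for this example.

```python
import math


def haar_approx_levels(x):
    """Haar approximation coefficients at every level (illustrative sketch).

    levels[0] is the input; levels[j] has len(x) / 2**j coefficients.
    Assumes len(x) is a positive power of two (a limitation of this sketch).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    levels = [list(map(float, x))]
    current = levels[0]
    while len(current) > 1:
        # Orthonormal Haar scaling step: each pair (a, b) -> (a + b) / sqrt(2).
        current = [(current[i] + current[i + 1]) / math.sqrt(2)
                   for i in range(0, len(current), 2)]
        levels.append(current)
    return levels


def reconstruct_from_approx(coeffs, n):
    """Series implied by one level's coefficients with all details set to zero.

    With orthonormal Haar, a level-j coefficient equals the block mean times
    sqrt(2)**j, so the reconstruction is piecewise constant (block means).
    """
    width = n // len(coeffs)              # block length 2**j
    scale = math.sqrt(width)              # sqrt(2)**j == sqrt(2**j)
    return [c / scale for c in coeffs for _ in range(width)]


def approximation_errors(x):
    """Euclidean reconstruction error of every decomposition level."""
    n = len(x)
    errors = []
    for coeffs in haar_approx_levels(x):
        rec = reconstruct_from_approx(coeffs, n)
        errors.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, rec))))
    return errors


x = [1, 3, 2, 4, 6, 8, 7, 5]
for level, err in enumerate(approximation_errors(x)):
    print(f"level {level}: {len(x) >> level} coefficients, error {err:.3f}")
```

For x = [1, 3, 2, 4, 6, 8, 7, 5], the reconstruction error grows from 0 at level 0 to √8, √10 and √42 at levels 1 to 3, so any method that picks the decomposition level automatically would have to trade this loss against the shorter coefficient vector.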
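
NSAR is evaluated against SAX. As a point of reference, the sketch below implements the SAX pipeline as it is usually described in the literature: z-normalise the series, reduce it with piecewise aggregate approximation (PAA) to w segments, and map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The segment count w and the alphabet size are exactly the kind of user-set parameters discussed in the abstract. The function names are hypothetical, and the sketch assumes the series length is divisible by w.

```python
from bisect import bisect_right
from statistics import NormalDist


def znormalize(x):
    """Rescale to zero mean and unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    if std == 0:                      # constant series: nothing to rescale
        return [0.0] * n
    return [(v - mean) / std for v in x]


def paa(x, w):
    """Piecewise aggregate approximation: mean of each of w equal segments."""
    n = len(x)
    if n % w:
        raise ValueError("this sketch assumes len(x) is divisible by w")
    seg = n // w
    return [sum(x[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def sax(x, w, alphabet_size):
    """SAX word of length w over the first alphabet_size lowercase letters."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    # Breakpoints split N(0, 1) into alphabet_size equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    # bisect_right counts the breakpoints <= v, i.e. the index of v's region.
    return "".join(letters[bisect_right(breakpoints, v)]
                   for v in paa(znormalize(x), w))


series = [0, 1, 2, 3, 4, 5, 6, 7]
print(sax(series, 4, 4))   # abcd
```

Reducing n points to w symbols shrinks the series by a factor of n/w, which is one common way to express a compression ratio; the exact definition used in the paper is not given here.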
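
The abstract does not define the key points that NSAR uses. Purely as a hypothetical illustration, the sketch below treats the endpoints and local extrema of a sequence (points where the sign of the first difference changes) as key points, then encodes each segment between consecutive key points as a rising ('u'), falling ('d') or flat ('f') symbol. Both the key-point definition and the three-letter alphabet are assumptions, chosen here because they need no user-set parameters.

```python
def key_points(x):
    """Indices of the endpoints and local extrema of x (hypothetical definition).

    A point is kept when the sign of the first difference changes there;
    zero differences (flat runs) are skipped, so a flat extremum is reported
    at its last index.
    """
    n = len(x)
    if n <= 2:
        return list(range(n))
    keys = [0]
    prev_sign = 0
    for i in range(1, n):
        diff = x[i] - x[i - 1]
        sign = (diff > 0) - (diff < 0)
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            keys.append(i - 1)        # x[i - 1] is a peak or a trough
        prev_sign = sign
    keys.append(n - 1)
    return keys


def trend_word(x, keys):
    """One symbol per segment between consecutive key points (hypothetical)."""
    symbols = []
    for a, b in zip(keys, keys[1:]):
        if x[b] > x[a]:
            symbols.append("u")       # rising segment
        elif x[b] < x[a]:
            symbols.append("d")       # falling segment
        else:
            symbols.append("f")       # flat segment
    return "".join(symbols)


x = [0, 2, 4, 3, 1, 2, 5, 5, 4]
keys = key_points(x)
print(keys, trend_word(x, keys))      # [0, 2, 4, 7, 8] udud
```

Here the flat run at indices 6 to 7 is reported at index 7, and nine samples collapse to a four-letter word.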
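
A best-match query returns the database series closest to a query series. The sketch below shows the common filter-and-refine pattern with a lower-bounding representation. It uses PAA as a stand-in because its bound is well known: sqrt(n/w) times the Euclidean distance between the PAA vectors never exceeds the true Euclidean distance. Candidates are visited in order of increasing lower bound, and the scan stops once no remaining candidate can beat the best exact distance found so far. The choice of PAA, Euclidean distance and all function names are assumptions; the abstract does not state which distance NSAR uses.

```python
import math


def paa(x, w):
    """Mean of each of w equal-length segments (len(x) divisible by w)."""
    n = len(x)
    if n % w:
        raise ValueError("this sketch assumes len(x) is divisible by w")
    seg = n // w
    return [sum(x[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def paa_lower_bound(pq, pc, n):
    """sqrt(n / w) * ||PAA(q) - PAA(c)||, which never exceeds ||q - c||."""
    return math.sqrt(n / len(pq)) * euclidean(pq, pc)


def best_match(query, database, w):
    """Exact 1-nearest-neighbour search with lower-bound pruning.

    Returns (index, distance, number_of_exact_distance_computations).
    """
    n = len(query)
    pq = paa(query, w)
    # Filter step: cheap lower bounds, visited from most to least promising.
    candidates = sorted((paa_lower_bound(pq, paa(c, w), n), i)
                        for i, c in enumerate(database))
    best_i, best_d, computed = -1, math.inf, 0
    for lb, i in candidates:
        if lb >= best_d:
            break                     # no remaining candidate can do better
        # Refine step: exact distance only for surviving candidates.
        d = euclidean(query, database[i])
        computed += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, computed


database = [[0, 0, 0, 0], [1, 2, 3, 4], [4, 3, 2, 1]]
print(best_match([1, 2, 3, 5], database, w=2))   # (1, 1.0, 1)
```

In practice the reduced database vectors would be computed once and indexed rather than recomputed for every query; a tighter lower bound prunes more candidates.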

Acknowledgments

The authors are grateful to the anonymous referees and to Prof. Eamonn Keogh for providing datasets. This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61174144, 61232018 and 60874065.

Author information

Authors and Affiliations

School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230027, Anhui, People's Republic of China
Xiaoxu He, Chenxi Shao & Yan Xiong
School of Computer Science and Technology, Tianjin Polytechnic University, Tianjin, 300387, People's Republic of China
Xiaoxu He

Corresponding author

Correspondence to Xiaoxu He.

About this article

Cite this article

He, X., Shao, C. & Xiong, Y. A non-parametric symbolic approximate representation for long time series. Pattern Anal Applic 19, 111–127 (2016). https://doi.org/10.1007/s10044-014-0395-5

Received: 17 October 2013
Accepted: 22 July 2014
Published: 08 August 2014
Issue Date: February 2016
DOI: https://doi.org/10.1007/s10044-014-0395-5
