Abstract
Symbolic Aggregate approXimation (SAX) is a very popular symbolic dimensionality reduction technique for time series data, with several advantages over other dimensionality reduction techniques. One major advantage is its efficiency, as it uses precomputed distances. The other is that in SAX the distance measure defined on the reduced space lower-bounds the distance measure defined on the original space, which enables SAX to return exact results in query-by-content tasks. Yet SAX has an inherent drawback: it cannot capture segment trend information. Several researchers have attempted to enhance SAX by proposing modifications that include trend information. However, this comes at the expense of giving up one or more of the advantages of SAX. In this paper we investigate three modifications of SAX that add trend-capturing ability to it. These modifications retain the features of SAX in terms of simplicity and efficiency, as well as the exact results it returns. They are simple procedures based on a different segmentation of the time series from that used in classic-SAX. We test the performance of these three modifications on 45 time series datasets of different sizes, dimensions, and nature, on a classification task, and we compare it to that of classic-SAX. The results show that one of these modifications outperforms classic-SAX and that another gives slightly better results than classic-SAX.
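To make the two advantages named in the abstract concrete, the following is a minimal sketch of classic-SAX only, written from the standard formulation in the SAX literature rather than from this chapter. It does not implement any of the three modifications, which the abstract does not specify. All function names (`znorm`, `paa`, `breakpoints`, `sax`, `dist_table`, `mindist`, `euclidean`) are illustrative assumptions, and the sketch assumes the series length is divisible by the number of segments.

```python
import math
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalize a series (zero mean, unit variance); constant series map to zeros."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    n = len(series)
    if n % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by n_segments")
    w = n // n_segments
    return [mean(series[i * w:(i + 1) * w]) for i in range(n_segments)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal curve into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size):
    """Return the SAX word as a list of integer symbols in 0..alphabet_size-1."""
    cuts = breakpoints(alphabet_size)
    # A symbol is the number of breakpoints at or below the segment mean.
    return [sum(1 for c in cuts if v >= c) for v in paa(znorm(series), n_segments)]


def dist_table(alphabet_size):
    """Precomputed symbol-to-symbol distances; adjacent symbols have distance 0."""
    cuts = breakpoints(alphabet_size)
    table = [[0.0] * alphabet_size for _ in range(alphabet_size)]
    for r in range(alphabet_size):
        for c in range(alphabet_size):
            if abs(r - c) > 1:
                table[r][c] = cuts[max(r, c) - 1] - cuts[min(r, c)]
    return table


def mindist(word_a, word_b, series_length, table):
    """Distance on SAX words, computed only by table lookups."""
    w = len(word_a)
    total = sum(table[a][b] ** 2 for a, b in zip(word_a, word_b))
    return math.sqrt(series_length / w) * math.sqrt(total)


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Calling `dist_table` once per alphabet size and reusing the table in every `mindist` call corresponds to the precomputed distances mentioned above. Comparing `mindist` on two SAX words with `euclidean` on the corresponding z-normalized series illustrates the lower-bounding property: the former should never exceed the latter.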
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Muhammad Fuad, M.M. (2020). Modifying the Symbolic Aggregate Approximation Method to Capture Segment Trend Information. In: Torra, V., Narukawa, Y., Nin, J., Agell, N. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2020. Lecture Notes in Computer Science, vol 12256. Springer, Cham. https://doi.org/10.1007/978-3-030-57524-3_19
DOI: https://doi.org/10.1007/978-3-030-57524-3_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57523-6
Online ISBN: 978-3-030-57524-3
eBook Packages: Computer Science (R0)