Abstract
Cognitive Internet of Things (CIoT) is a new subfield of the Internet of Things (IoT) that aims to integrate cognition into IoT architecture and design. Many CIoT applications need techniques that extract machine-understandable concepts from raw sensory data in order to provide value-added insights about CIoT devices and their users. Time series classification, which is used for this concept extraction, poses challenges in many application domains, and dimensionality reduction has been suggested as an effective way to shrink the time series involved. The most common approach to time series classification is symbolic aggregate approximation (SAX). Its main drawback is that it does not select the most significant point from each segment during the piecewise aggregate approximation (PAA) stage, a problem that becomes harder to manage when the data are heterogeneous and massive. This research therefore presents a novel technique for selecting the most significant point from a segment during the PAA stage of SAX. The proposed technique chooses the maximum informative point as the most significant point, using a probabilistic interpretation of the sensory data with an appropriate copula design. The copula is selected by the minimum Akaike information criterion (AIC) value. The modified SAX then uses these maximum informative points instead of the mean, maximum, or extreme data point of each segment during the PAA stage. Experimental evaluation on an environmental dataset indicates that the proposed method is more accurate and more computationally efficient than classic SAX. As a further validation step, the entropy of the information points (i-values) is computed for each dataset to verify the successful transformation of normal data points into information points.
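For orientation, the classic SAX baseline that the abstract compares against can be sketched as follows. This is a minimal illustration and not the authors' implementation. The function names, the `representative` parameter, and the equal-width segmentation are assumptions made for this sketch. The copula-based choice of the maximum informative point is not reproduced, because the abstract does not specify it. The sketch only shows the PAA step where a different per-segment selector would plug in.

```python
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit variance (all zeros if constant)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments, representative=mean):
    """Reduce a series to one value per segment.

    `representative` is a hypothetical hook: classic SAX uses the mean,
    while variants swap in another per-segment selector (e.g. max).
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    values = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        values.append(representative(series[start:end]))
    return values


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size, representative=mean):
    """Convert a numeric series into a SAX word over 'a', 'b', 'c', ..."""
    cuts = breakpoints(alphabet_size)
    word = []
    for value in paa(znormalize(series), n_segments, representative):
        # The symbol index is the number of breakpoints at or below the value.
        index = sum(1 for c in cuts if value >= c)
        word.append(chr(ord("a") + index))
    return "".join(word)
```

For example, `sax([1, 2, 3, 4, 5, 6, 7, 8], 4, 4)` yields the word `abcd`. Passing `representative=max` replaces the segment mean with the segment maximum, which is one of the alternatives the abstract mentions.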
Data availability
The original dataset will be made available on request.
Acknowledgements
We thank the Editor-in-Chief, the Editor, and the reviewers who gave their time to provide valuable suggestions and comments on this research. We are also grateful to everyone who directly or indirectly helped improve this work.
Funding
Not applicable.
The authors have no relevant financial interests to disclose.
Non-financial interests.
The authors have no relevant non-financial interests to disclose.
Author information
Contributions
Vidyapati Jha and Priyanka Tripathi contributed equally to this work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jha, V., Tripathi, P. Probabilistic SAX: A Cognitively-Inspired Method for Time Series Classification in Cognitive IoT Sensor Network. Mobile Netw Appl (2024). https://doi.org/10.1007/s11036-024-02322-y
DOI: https://doi.org/10.1007/s11036-024-02322-y