Applying nature-inspired optimization algorithms for selecting important timestamps to reduce time series dimensionality

Muhammad Fuad, Muhammad Marwan

doi:10.1007/s12530-017-9207-7

Original Paper
Published: 02 November 2017

Evolving Systems, Volume 10, pages 13–28 (2019)

Muhammad Marwan Muhammad Fuad¹

Abstract

Time series account for a major part of the data available today. Time series mining covers tasks such as classification, clustering, query-by-content, and prediction. Performing these tasks on raw time series is inefficient because the data are high-dimensional by nature. Instead, time series are first pre-processed before data mining tasks are performed on them. In general, there are two main approaches to reducing time series dimensionality. The first, which we call landmark methods, is based on finding characteristic features in the target time series. The second is based on data transformations, which map the time series from the original space into a reduced space where they can be managed more efficiently. The method we present in this paper takes a third approach: it projects a time series onto a lower-dimensional space by selecting important points in the time series. The novelty of our method is that these points are chosen not according to a geometric criterion, which is subjective in most cases, but through an optimization process. The other important characteristic of our method is that the important points are selected at the dataset level rather than at the level of a single time series. The direct advantage of this strategy is that the distance defined on the low-dimensional space lower-bounds the original distance applied to the raw data, which enables us to apply the popular GEMINI algorithm. The promising results of our experiments, conducted on a wide variety of time series datasets with different optimizers and applied to the two major data mining tasks, validate our new method.
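The lower-bounding property described in the abstract can be illustrated with a minimal sketch. It assumes the Euclidean distance, which the abstract does not name, and it uses a hypothetical, hand-picked list of timestamps in place of the ones an optimizer would select. The abstract does not specify the objective function, so the sketch does not attempt to reproduce the optimization step. All function and variable names (`project`, `range_query`, `timestamps`) are illustrative. Because the same timestamps are kept for every series in the dataset, the reduced distance sums a subset of the non-negative squared differences and so cannot exceed the full distance, which is what allows a GEMINI-style filter-and-refine query without false dismissals.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project(series, timestamps):
    """Keep only the values at the selected timestamps (indices)."""
    return [series[t] for t in timestamps]


def range_query(query, dataset, timestamps, radius):
    """GEMINI-style filter-and-refine range query (illustrative sketch).

    Returns (matches, n_refined): the indices of series within `radius` of
    the query under the full distance, and the number of full-length
    distance computations that were needed.
    """
    q_low = project(query, timestamps)
    matches, n_refined = [], 0
    for i, s in enumerate(dataset):
        # Filter: the reduced distance never exceeds the full distance, so a
        # series whose reduced distance is already > radius can be skipped.
        if euclidean(q_low, project(s, timestamps)) > radius:
            continue
        # Refine: confirm the surviving candidate with the full distance.
        n_refined += 1
        if euclidean(query, s) <= radius:
            matches.append(i)
    return matches, n_refined


# Synthetic data for illustration only. The timestamps below are an
# assumed fixed choice shared by the whole dataset, not optimizer output.
rng = random.Random(0)
dataset = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(64)]
timestamps = [3, 10, 17, 25, 33, 41, 50, 58]

matches, n_refined = range_query(query, dataset, timestamps, radius=10.0)
brute_force = [i for i, s in enumerate(dataset) if euclidean(query, s) <= 10.0]
```

In this sketch `matches` agrees with `brute_force` for any choice of timestamps, since the filter step can only discard series that the full distance would also reject. How many full-length computations the filter saves depends on how well the selected timestamps preserve distances, which is the quantity an optimizer would be expected to improve.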

Author information

Authors and Affiliations

Aarhus University, MOMA, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
Muhammad Marwan Muhammad Fuad

Corresponding author

Correspondence to Muhammad Marwan Muhammad Fuad.

About this article

Cite this article

Muhammad Fuad, M.M. Applying nature-inspired optimization algorithms for selecting important timestamps to reduce time series dimensionality. Evolving Systems 10, 13–28 (2019). https://doi.org/10.1007/s12530-017-9207-7

Received: 24 June 2017
Accepted: 22 October 2017
Published: 02 November 2017
Issue Date: 01 March 2019
DOI: https://doi.org/10.1007/s12530-017-9207-7
