Applying nature-inspired optimization algorithms for selecting important timestamps to reduce time series dimensionality

  • Original Paper
Evolving Systems

Abstract

Time series account for a major part of the data available today. Time series mining covers several tasks, such as classification, clustering, query-by-content, and prediction. Performing these tasks on raw time series is inefficient because the data are high-dimensional by nature, so time series are usually pre-processed with a dimensionality reduction technique first. In general, there are two main approaches to reducing time series dimensionality. The first, which we call landmark methods, is based on finding characteristic features in the target time series. The second is based on data transformations, which map the time series from the original space into a reduced space where they can be managed more efficiently. The method we present in this paper applies a third approach: it projects a time series onto a lower-dimensional space by selecting important points in the time series. The novelty of our method is that these points are not chosen according to a geometric criterion, which is subjective in most cases, but through an optimization process. The other important characteristic of our method is that the important points are selected at the dataset level rather than for each time series individually. The direct advantage of this strategy is that the distance defined on the low-dimensional space lower-bounds the original distance applied to the raw data, which enables us to apply the popular GEMINI algorithm. The promising results of our experiments on a wide variety of time series datasets, using different optimizers and applied to the two major data mining tasks, validate our new method.
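To make the lower-bounding claim concrete, here is a minimal sketch. It assumes the distance is Euclidean and that the same timestamps are kept for every series in the dataset. Under those assumptions the reduced distance sums a subset of the non-negative squared differences, so it can never exceed the raw distance, and a GEMINI-style filter-and-refine range query returns no false dismissals. The function names, the synthetic data, the value of `epsilon`, and the randomly drawn `timestamps` are all hypothetical; in particular, the timestamps here are not produced by any optimizer, and the paper's fitness function is not reproduced.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project(series, timestamps):
    """Keep only the values at the selected timestamps.

    The same index set is used for every series (dataset-level selection).
    """
    return [series[t] for t in timestamps]


def range_query(query, dataset, timestamps, epsilon):
    """GEMINI-style filter-and-refine range query (illustrative sketch).

    Filter: skip any series whose reduced distance already exceeds epsilon.
    Refine: compute the raw distance only for the series that survive.
    Returns the matching indices and the number of raw distance computations.
    """
    reduced_query = project(query, timestamps)
    answers = []
    raw_computations = 0
    for idx, series in enumerate(dataset):
        if euclidean(reduced_query, project(series, timestamps)) > epsilon:
            continue  # safe to prune: reduced distance <= raw distance
        raw_computations += 1
        if euclidean(query, series) <= epsilon:
            answers.append(idx)
    return answers, raw_computations


# Synthetic data, for illustration only.
rng = random.Random(0)
length, n_series = 64, 200
dataset = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(n_series)]
query = [rng.gauss(0, 1) for _ in range(length)]

# Hypothetical selection: 8 arbitrary timestamps shared by all series.
timestamps = sorted(rng.sample(range(length), 8))
epsilon = 10.0

answers, raw_computations = range_query(query, dataset, timestamps, epsilon)

# Reference answer from a sequential scan over the raw data.
brute_force = [i for i, s in enumerate(dataset) if euclidean(query, s) <= epsilon]
```

In a real system the projections of the dataset would be computed once and indexed rather than recomputed per query. How much pruning the filter step achieves depends on which timestamps are kept, which is the quantity the optimization process in the paper is meant to choose.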



Author information


Correspondence to Muhammad Marwan Muhammad Fuad.


About this article


Cite this article

Muhammad Fuad, M.M. Applying nature-inspired optimization algorithms for selecting important timestamps to reduce time series dimensionality. Evolving Systems 10, 13–28 (2019). https://doi.org/10.1007/s12530-017-9207-7


  • DOI: https://doi.org/10.1007/s12530-017-9207-7
