
Knowledge and Information Systems, Volume 60, Issue 2, pp 1135–1161

Introducing time series chains: a new primitive for time series data mining

  • Yan Zhu
  • Makoto Imamura
  • Daniel Nikovski
  • Eamonn Keogh
Regular Paper

Abstract

Time series motifs were introduced in 2002 and have since become a fundamental tool for time series analytics, finding diverse uses in dozens of domains. In this work, we introduce Time Series Chains, which are related to, but distinct from, time series motifs. Informally, a time series chain is a temporally ordered set of subsequence patterns such that each pattern is similar to the pattern that preceded it, while the first and last patterns can be arbitrarily dissimilar. In the discrete space, this is similar to extracting the text chain “data, date, cate, cade, code” from a text stream. The first and last words have nothing in common, yet they are connected by a chain of words in which each word differs only slightly from its neighbor. Time series chains can capture the evolution of systems and help predict the future. As such, they potentially have implications for prognostics. In this work, we introduce two robust definitions of time series chains and scalable algorithms that allow us to discover them in massive complex datasets.
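
The word-chain analogy in the abstract can be checked with a few lines of code. The sketch below is only an illustration of the informal idea, not the paper's algorithm: it assumes Hamming distance as the discrete stand-in for subsequence similarity, and the helper names `hamming` and `is_chain` and the `max_step` threshold are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_chain(words, max_step=1):
    """Return True if each word is within max_step of its predecessor.

    max_step is an illustrative threshold, not a value from the paper.
    """
    return all(hamming(p, q) <= max_step for p, q in zip(words, words[1:]))


# The text chain quoted in the abstract.
chain = ["data", "date", "cate", "cade", "code"]

# Distance between each pair of consecutive words.
steps = [hamming(p, q) for p, q in zip(chain, chain[1:])]

# Distance between the first and last words.
end_to_end = hamming(chain[0], chain[-1])

print(steps, end_to_end, is_chain(chain))
```

Every consecutive pair differs in a single letter, while the first and last words differ in all four positions, which is the drift that a chain is meant to capture.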

Keywords

Time series, Motifs, Prognostics, Link analysis

Notes

Acknowledgements

We would like to acknowledge funding from MERL and from NSF IIS-1161997 II and NSF IIS-1510741. We especially want to thank Dr. John Michael Criley and Dr. Gregory Mason for their invaluable advice on the hemodynamics domain, and Dr. Matsubara for providing the GoogleTrend data.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. University of California, Riverside, Riverside, USA
  2. Tokai University, Tokyo, Japan
  3. Mitsubishi Electric Research Laboratories, Cambridge, USA
