Knowledge and Information Systems, Volume 54, Issue 1, pp 203–236

Exploiting a novel algorithm and GPUs to break the ten quadrillion pairwise comparisons barrier for time series motifs and joins

  • Yan Zhu
  • Zachary Zimmerman
  • Nader Shakibay Senobari
  • Chin-Chia Michael Yeh
  • Gareth Funning
  • Abdullah Mueen
  • Philip Brisk
  • Eamonn Keogh
Regular Paper


Time series motifs are approximately repeated subsequences found within a longer time series. They have been in the literature since 2002, but recently they have begun to receive significant attention in research and industrial communities. This is perhaps due to the growing realization that they implicitly offer solutions to a host of time series problems, including rule discovery, anomaly detection, density estimation, semantic segmentation, summarization, etc. Recent work has improved the scalability so exact motifs can be computed on datasets with up to a million data points in tenable time. However, in some domains, for example seismology or climatology, there is an immediate need to address even larger datasets. In this work, we demonstrate that a combination of a novel algorithm and a high-performance GPU allows us to significantly improve the scalability of motif discovery. We demonstrate the scalability of our ideas by finding the full set of exact motifs on a dataset with one hundred and forty-three million subsequences, which is by far the largest dataset ever mined for time series motifs/joins; it requires ten quadrillion pairwise comparisons. Furthermore, we demonstrate that our algorithm can produce actionable insights into seismology and ethology.
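To make the scale in the abstract concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It counts the unordered pairs among roughly 143 million subsequences, which comes to about 1.02 × 10^16, or ten quadrillion. It also shows a naive brute-force search for the closest pair of subsequences on a toy series. The function names, the choice of z-normalized Euclidean distance, and the rule for skipping overlapping subsequences are illustrative assumptions, not details taken from the abstract.

```python
import math


def pairwise_comparisons(n):
    """Number of unordered pairs among n subsequences: n(n-1)/2."""
    return n * (n - 1) // 2


def znorm(x):
    """Z-normalize a subsequence (assumed distance preprocessing)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return [0.0] * len(x)  # constant subsequence: avoid dividing by zero
    return [(v - mean) / std for v in x]


def brute_force_motif(ts, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    length-m subsequences. Hypothetical helper; quadratic in len(ts),
    so it is only usable on tiny inputs."""
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    best = (math.inf, -1, -1)
    for i in range(n):
        # Start at i + m so overlapping (trivial) matches are skipped.
        for j in range(i + m, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < best[0]:
                best = (d, i, j)
    return best


if __name__ == "__main__":
    print(pairwise_comparisons(143_000_000))  # 10224499928500000

    # Toy series with the shape [0, 1, 3, 1, 0] planted at offsets 0 and 11.
    toy = [0, 1, 3, 1, 0, 5, 2, 7, 4, 9, 6, 0, 1, 3, 1, 0, 8, 3]
    print(brute_force_motif(toy, 5))
```

The nested loop is the point of the sketch: the number of distance computations grows quadratically with the number of subsequences, which is why a dataset of this size calls for a faster algorithm and GPU hardware.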


Keywords: Time series, Joins, Motifs, GPUs



This research was funded by NSF IIS-1161997 II, NSF IIS-1510741, NSF CCF-1528181, NSF CCF-1527127 and USGS Earthquake Hazard Program Award G16AP00034. We gratefully acknowledge all the donors of the datasets.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2017

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of California, Riverside, USA
  2. Department of Earth Sciences, University of California, Riverside, USA
  3. Department of Computer Science, University of New Mexico, Albuquerque, USA
