Abstract
Time series motifs are approximately repeated subsequences found within a longer time series. They have been in the literature since 2002, but recently they have begun to receive significant attention in research and industrial communities. This is perhaps due to the growing realization that they implicitly offer solutions to a host of time series problems, including rule discovery, anomaly detection, density estimation, semantic segmentation, and summarization. Recent work has improved scalability to the point where exact motifs can be computed on datasets with up to a million data points in tenable time. However, in some domains, for example seismology or climatology, there is an immediate need to address even larger datasets. In this work, we demonstrate that a combination of a novel algorithm and a high-performance GPU allows us to significantly improve the scalability of motif discovery. We demonstrate the scalability of our ideas by finding the full set of exact motifs on a dataset with one hundred and forty-three million subsequences, which is by far the largest dataset ever mined for time series motifs/joins; it requires ten quadrillion pairwise comparisons. Furthermore, we demonstrate that our algorithm can produce actionable insights into seismology and ethology.
Notes
10,224,499,928,500,000 comparisons × 0.000001 s per comparison is approximately 324 years.
A small earthquake of that magnitude would only be felt by attentive humans in the immediate vicinity of the epicenter.
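The 324-year figure in the first note can be reproduced with a few lines of arithmetic. The sketch below assumes that the comparison count is the number of unordered pairs among the 143,000,000 subsequences mentioned in the abstract, that each comparison costs 0.000001 s as in the note, and that a year is 365.25 days; the variable names are illustrative only.

```python
# Assumption: n is the subsequence count stated in the abstract.
n = 143_000_000

# Number of unordered pairs of subsequences: n * (n - 1) / 2.
pairs = n * (n - 1) // 2

# Assumption: one comparison takes 0.000001 s, as in the note.
seconds_per_comparison = 0.000001
total_seconds = pairs * seconds_per_comparison

# Assumption: a year of 365.25 days.
seconds_per_year = 365.25 * 24 * 3600
years = total_seconds / seconds_per_year

print(pairs)         # 10224499928500000
print(round(years))  # 324
```

Under these assumptions the pair count matches the figure in the note exactly, which suggests the "ten quadrillion" in the abstract counts each pair of subsequences once.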
Acknowledgements
This research was funded by NSF IIS-1161997 II, NSF IIS-1510741, NSF CCF-1528181, NSF CCF-1527127 and USGS Earthquake Hazard Program Award G16AP00034. We gratefully acknowledge all the donors of the datasets.
Additional information
Yan Zhu and Zachary Zimmerman contributed equally, and should be considered joint first authors.
About this article
Cite this article
Zhu, Y., Zimmerman, Z., Shakibay Senobari, N. et al. Exploiting a novel algorithm and GPUs to break the ten quadrillion pairwise comparisons barrier for time series motifs and joins. Knowl Inf Syst 54, 203–236 (2018). https://doi.org/10.1007/s10115-017-1138-x
DOI: https://doi.org/10.1007/s10115-017-1138-x