Exploiting a novel algorithm and GPUs to break the ten quadrillion pairwise comparisons barrier for time series motifs and joins
Time series motifs are approximately repeated subsequences found within a longer time series. They have been in the literature since 2002, but have only recently begun to receive significant attention in the research and industrial communities. This is perhaps due to the growing realization that they implicitly offer solutions to a host of time series problems, including rule discovery, anomaly detection, density estimation, semantic segmentation, and summarization. Recent work has improved scalability to the point that exact motifs can be computed on datasets with up to a million data points in tenable time. However, in some domains, for example seismology or climatology, there is an immediate need to address even larger datasets. In this work, we show that a combination of a novel algorithm and a high-performance GPU significantly improves the scalability of motif discovery. We demonstrate this by finding the full set of exact motifs on a dataset with one hundred and forty-three million subsequences, which is by far the largest dataset ever mined for time series motifs/joins and requires ten quadrillion pairwise comparisons. Furthermore, we show that our algorithm can produce actionable insights into seismology and ethology.
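To make the problem and its scale concrete, the following is a minimal sketch of exact motif discovery by brute force: every pair of length-`m` subsequences is compared under z-normalized Euclidean distance, and the closest pair is the top motif. This is a naive quadratic baseline, not the authors' novel algorithm or their GPU implementation. The function names, the all-zeros convention for flat subsequences, and the exclusion-zone width of `m // 4` are assumptions for illustration. The sketch also counts unordered pairs as n(n-1)/2. For roughly 143 million subsequences this gives about 1.02 × 10^16, which is consistent with the ten quadrillion comparisons quoted above.

```python
import math


def znorm(x):
    """Z-normalize a subsequence to zero mean and unit (population) variance."""
    m = len(x)
    mu = sum(x) / m
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / m)
    if sd == 0:
        # Assumption: a flat subsequence maps to all zeros.
        return [0.0] * m
    return [(v - mu) / sd for v in x]


def naive_matrix_profile(ts, m):
    """Brute-force O(n^2 * m) matrix profile.

    For every length-m subsequence, record the z-normalized Euclidean
    distance to its nearest non-trivial neighbor and that neighbor's index.
    """
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    # Assumption: pairs fewer than excl = max(1, m // 4) positions apart
    # are trivial matches and are skipped.
    excl = max(1, m // 4)
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is compared once
            if j - i < excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            # Euclidean distance is symmetric, so update both endpoints.
            if d < profile[i]:
                profile[i], index[i] = d, j
            if d < profile[j]:
                profile[j], index[j] = d, i
    return profile, index


def top_motif(ts, m):
    """Return (i, j, distance) for the closest pair of subsequences."""
    profile, index = naive_matrix_profile(ts, m)
    i = min(range(len(profile)), key=profile.__getitem__)
    return i, index[i], profile[i]


def pairwise_comparisons(n_subsequences):
    """Number of unordered subsequence pairs: n * (n - 1) / 2."""
    return n_subsequences * (n_subsequences - 1) // 2


if __name__ == "__main__":
    # Toy series: the pattern [1, 2, 3, 0, 6, 4] occurs at offsets 8 and 17.
    toy = [0, 5, 1, 9, 2, 7, 3, 8, 1, 2, 3, 0, 6, 4, 9, 11,
           2, 1, 2, 3, 0, 6, 4, 5, 7]
    print(top_motif(toy, 6))  # → (8, 17, 0.0)
    # Roughly 143 million subsequences gives about 1.02e16 pairs.
    print(f"{pairwise_comparisons(143_000_000):.3e}")
```

The inner loop performs one distance computation per unordered pair, each costing O(m). That is tractable for a toy series, but at ten quadrillion pairs it motivates both a faster exact algorithm and massively parallel GPU hardware.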
Keywords: Time series · Joins · Motifs · GPUs
This research was funded by NSF IIS-1161997 II, NSF IIS-1510741, NSF CCF-1528181, NSF CCF-1527127 and USGS Earthquake Hazard Program Award G16AP00034. We gratefully acknowledge all the donors of the datasets.