The Journal of Supercomputing, Volume 72, Issue 10, pp 3801–3825

NSPRING: the SPRING extension for subsequence matching of time series supporting normalization

  • Xueyuan Gong
  • Simon Fong (Email author)
  • Jonathan H. Chan
  • Sabah Mohammed
Article

Abstract

Mining sequences and patterns in time series data streams is fast becoming common practice. Rapid progress in data collection and web technologies has produced tremendous growth in streaming data, in various complex forms, that must be analyzed in real time. Traditional data mining methods, which typically require the data to be scanned repeatedly, are not feasible for stream data applications. Newer techniques such as SPRING address these challenges by identifying sequences of patterns on time series streams, reducing the complexity to linear in both time and space. Unfortunately, SPRING does not support data normalization, which makes it inapplicable to most data sets. In this paper, we propose NSPRING, an approach based on SPRING that retains its advantages, namely low time and space complexity, while also supporting normalization. Furthermore, NSPRING retains mining accuracy similar to that of SPRING.
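
To illustrate why normalization matters for matching under DTW, the following minimal sketch compares a query against a window that has the same shape but a different offset and amplitude. It is not the SPRING or NSPRING algorithm: it uses a plain whole-sequence DTW on a fixed window, and the function names `z_normalize` and `dtw`, the squared point-wise cost, and the sample values are all assumptions made for illustration.

```python
import math


def z_normalize(xs):
    """Rescale a sequence to zero mean and unit standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        return [0.0] * n  # constant sequence: nothing to rescale
    return [(x - mean) / std for x in xs]


def dtw(a, b):
    """Whole-sequence DTW distance with squared point-wise cost.

    Keeps only two rows of the cost matrix, so memory is O(len(b)).
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0: only the empty alignment costs 0
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


# Hypothetical data: the window is the query scaled by 2 and shifted by 10.
query = [0.0, 1.0, 2.0, 1.0, 0.0]
window = [10.0, 12.0, 14.0, 12.0, 10.0]

raw_distance = dtw(query, window)
normalized_distance = dtw(z_normalize(query), z_normalize(window))

print("raw:", raw_distance)
print("normalized:", normalized_distance)
```

Without normalization the two sequences look far apart even though their shapes are identical; after z-normalization the distance is close to zero. The sketch normalizes a window whose boundaries are already known and does not attempt to do so incrementally on a stream.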

Keywords

Data streams, Subsequence matching, Normalization, SPRING, NSPRING, UCR-DTW, DTW

Notes

Acknowledgments

The authors are thankful for the financial support from the research grant “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF),” Grant no. MYRG2015-00128-FST, offered by the University of Macau, FST, and RDAO. We would also like to acknowledge the kind assistance of Ms. Katlin Kreamer-Tonin for proofreading this paper.

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Xueyuan Gong (1)
  • Simon Fong (1, Email author)
  • Jonathan H. Chan (2)
  • Sabah Mohammed (3)

  1. Department of Computer and Information Science, University of Macau, Macau, China
  2. School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
  3. Department of Computer Science, Lakehead University, Thunder Bay, Canada
