Mining Time Series Data

  • Chotirat Ann Ratanamahatana
  • Jessica Lin
  • Dimitrios Gunopulos
  • Eamonn Keogh
  • Michail Vlachos
  • Gautam Das
Summary

Much of the world’s supply of data is in the form of time series. In the last decade, there has been an explosion of interest in mining time series data. A number of new algorithms have been introduced to classify, cluster, segment, index, discover rules in, and detect anomalies/novelties in time series. While the algorithms used to solve these problems employ a multitude of different techniques, they all have one common factor: they require some high-level representation of the data rather than the original raw data. These high-level representations are necessary as a feature extraction step, or simply to make the storage, transmission, and computation of massive datasets feasible. A multitude of representations have been proposed in the literature, including spectral transforms, wavelet transforms, piecewise polynomials, eigenfunctions, and symbolic mappings. This chapter gives a high-level survey of time series Data Mining tasks, with an emphasis on time series representations.
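
As a rough illustration of what a high-level representation can look like, the sketch below reduces a raw series to the mean of each equal-width frame, i.e. a piecewise constant approximation (the simplest case of a piecewise polynomial). The function name `paa`, the parameter `segments`, and the sample values are assumptions made for this example and are not taken from the chapter.

```python
def paa(series, segments):
    """Piecewise constant approximation: the mean of each equal-width frame.

    Illustrative sketch only; it assumes the series length is an exact
    multiple of `segments`.
    """
    n = len(series)
    if segments <= 0 or n == 0 or n % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    width = n // segments
    # Replace each frame of `width` raw points with a single average value.
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


# Hypothetical raw series of 8 points, reduced to 2 values.
raw = [1.0, 3.0, 2.0, 4.0, 10.0, 12.0, 11.0, 9.0]
reduced = paa(raw, 2)
print(reduced)
```

The reduced sequence is shorter than the raw one by a factor of `len(raw) / segments`, which is the kind of saving in storage and computation that the summary refers to. Other representations named above (spectral, wavelet, symbolic) trade off fidelity and compactness differently.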

Key words

Data Mining; Time Series; Representations; Classification; Clustering; Time Series Similarity Measures

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Chotirat Ann Ratanamahatana (1)
  • Jessica Lin (1)
  • Dimitrios Gunopulos (1)
  • Eamonn Keogh (1)
  • Michail Vlachos (2)
  • Gautam Das (3)

  1. University of California, Riverside, USA
  2. IBM T.J. Watson Research Center, New York, USA
  3. University of Texas, Arlington, USA