Sadhana

, Volume 31, Issue 2, pp 173–198 | Cite as

A survey of temporal data mining

  • Srivatsan Laxman
  • P. S. Sastry
Article

Abstract

Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.

Keywords

Temporal data mining ordered data streams temporal interdependency pattern discovery 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agrawal R, Srikant R 1994 Fast algorithms for mining association rules in large databases. InProc. 20th Int. Conf. on Very Large Data Bases, pp 487–499Google Scholar
  2. Agrawal R, Srikant R 1995 Mining sequential patterns. InProc. 11th Int. Conf. on Data Engineering, (Washington, DC: IEEE Comput. Soc.)Google Scholar
  3. Agrawal R, Imielinski T, Swami 1993 A Mining association rules between sets of items in large databases. InProc. ACM SIGMOD Conf. on Management of Data, pp 207–216Google Scholar
  4. Agrawal R, Lin K I, Sawhney H S, Shim K 1995a Fast similarity search in the presence of noise, scaling and translation in time series databases. InProc. 21st Int. Conf. on Very Large Data Bases (VLDB95), pp 490–501Google Scholar
  5. Agrawal R, Psaila G, Wimmers E L, Zait M 1995b Querying shapes of histories. InProc. 21st Int. Conf. on Very Large Databases, Zurich, SwitzerlandGoogle Scholar
  6. Alon J, Sclaroff S, Kollios G, Pavlovic V 2003 Discovering clusters in motion time series data. InProc. 2003 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition, pp I-375–I-381, Madison, WisconsinGoogle Scholar
  7. Alur R, Dill D L 1994 A theory of timed automata.Theor. Comput. Sci. 126: 183–235MATHCrossRefMathSciNetGoogle Scholar
  8. Atallah M J, Gwadera R, Szpankowski W 2004 Detection of significant sets of episodes in event sequences. InProc. 4th IEEE Int. Conf. on Data Mining (ICDM 2004), pp 3–10, Brighton, UKGoogle Scholar
  9. Baeza-Yates R A 1991 Searching subsequences.Theor. Comput. Sci. 78: 363–376MATHCrossRefMathSciNetGoogle Scholar
  10. Baldi P, Chauvin Y, Hunkapiller T, McClure M 1994 Hidden Markov models of biological primary sequence information.Proc. Nat. Acad. Sci. USA 91: 1059–1063CrossRefGoogle Scholar
  11. Bender E A, Kochman F 1993 The distribution of subword counts is usually normal.Eur. J. Combinatorics 14: 265–275MATHCrossRefMathSciNetGoogle Scholar
  12. Berberidis C, Vlahavas I P, Aref W G, Atallah M J, Elmagarmid A K 2002 On the discovery of weak periodicities in large time series. InLecture notes in computer science, Proc. 6th Eur. Conf. on Principles of Data Mining and Knowledge Discovery, vol. 2431, pp 51–61Google Scholar
  13. Bettini C, Wang X S, Jajodia S, Lin J L 1998 Discovering frequent event patterns with multiple granularities in time sequences.IEEE Trans. Knowledge Data Eng. 10: 222–237CrossRefGoogle Scholar
  14. Box G E P, Jenkins G M, Reinsel G C 1994Time series analysis: Forecasting and control (Singapore: Pearson Education Inc.)MATHGoogle Scholar
  15. Cadez I, Heckerman D, Meek C, Smyth P, White S 2000 Model-based clustering and visualisation of navigation patterns on a web site. Technical Report CA 92717-3425, Dept. of Information and Computer Science, University of California, Irvine, CAGoogle Scholar
  16. Cao H, Cheung D W, Mamoulis N 2004 Discovering partial periodic patterns in discrete data sequences. InProc. 8th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD ’04), Sydney, pp 653–658Google Scholar
  17. Casas-Garriga G 2003 Discovering unbounded episodes in sequential data. InProc. 7th Eur. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD’03), Cavtat-Dubvrovnik, Croatia, pp 83–94Google Scholar
  18. Chang S F, Chen W, Men J, Sundaram H, Zhong D 1998 A fully automated content based video search engine supporting spatio-temporal queries.IEEE Trans. Circuits Syst. Video Technol. 8(5): 602–615CrossRefGoogle Scholar
  19. Chatfield C 1996The analysis of time series 5th edn (New York, NY: Chapman and Hall)MATHGoogle Scholar
  20. Chudova D, Smyth P 2002 Pattern discovery in sequences under a Markovian assumption. InProc. Eigth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, CanadaGoogle Scholar
  21. Cohen J 2004 Bioinformatics — an introduction for computer scientists.ACM Comput. Surv. 36(2): 122–158CrossRefGoogle Scholar
  22. Corpet F 1988 Multiple sequence alignment with hierarchical clustering.Nucleic Acids Research, 16: 10881–10890CrossRefGoogle Scholar
  23. Darrell T, Pentland A 1993 Space-time gestures. InProc. 1993 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’93), pp 335–340Google Scholar
  24. Dietterich T G, Michalski R S 1985 Discovering patterns in sequences of events.Artif. Intell. 25: 187–232CrossRefGoogle Scholar
  25. Duda R O, Hart P E, Stork D G 1997Pattern classification and scene analysis (New York: Wiley)Google Scholar
  26. Durbin R, Eddy S, Krogh A, Mitchison G 1998Biological sequence analysis (Cambridge: University Press)MATHGoogle Scholar
  27. Ewens W J, Grant G R 2001Statistical methods in bioinformatics: An introduction (New York: Springer-Verlag)MATHGoogle Scholar
  28. Fadili M J, Ruan S, Bloyet D, Mazoyer B 2000 A multistep unsupervised fuzzy clustering analysis of fMRI time series.Human Brain Mapping 10: 160–178CrossRefGoogle Scholar
  29. Flajolet P, Guivarc’h Y, Szpankowski W, Vallee B 2001 Hidden pattern statistics. InLecture notes in computer science; Proc. 28th Int. Colloq. on Automata, Languages and Programming (London: Springer-Verlag) vol. 2076, pp 152–165Google Scholar
  30. Frenkel K A 1991 The human genome project and informatics.Commun. ACM 34(11): 40–51CrossRefGoogle Scholar
  31. Garofalakis M, Rastogi R, Shim K 2002 Mining sequential patterns with regular expression constraints.IEEE Trans. Knowledge Data Eng. 14: 530–552CrossRefGoogle Scholar
  32. Ghias A, Logan J, Chamberlin D, Smith B C 1995 Query by humming — musical information retrieval in an audio database. InProc. ACM Multimedia 95, San Fransisco, CAGoogle Scholar
  33. Gold B, Morgan N 2000Speech and audio signal processing: Processing and perception of speech and music (New York: John Wiley & Sons)Google Scholar
  34. Gray R M, Buzo A, Gray Jr. A H, Matsuyama Y 1980 Distortion measures for speech processing.IEEE Trans. Acoust., Speech Signal Process. 28: 367–376MATHCrossRefGoogle Scholar
  35. Gusfield D 1997A lgorithms on strings, trees and subsequences (New York: University of Cambridge Press)Google Scholar
  36. Gwadera R, Atallah M J, Szpankowski W 2003 Reliable detection of episodes in event sequences. InProc. 3rd IEEE Int. Conf. on Data Mining (ICDM 2003), pp 67–74Google Scholar
  37. Gwadera R, Atallah M J, Szpankowski W 2005 Markov models for identification of significant episodes. InProc. 2005 SIAM Int. Conf. on Data Mining (SDM-05), Newport Beach, CaliforniaGoogle Scholar
  38. Han J, Kamber M 2001Data mining: Concepts and techniques (San Fransisco, CA: Morgan Kauffmann)Google Scholar
  39. Han J, Gong W, Yin Y 1998 Mining segment-wise periodic patterns in time-related databases. InProc. 4th Int. Conf. on Knowledge Discovery and Data Mining (KDD’98), New York, pp 214–218Google Scholar
  40. Han J, Dong G, Yin Y 1999 Efficient mining of partial periodic patterns in time series database. InProc. 15th Int. Conf. on Data Engineering, (ICDE’99), Sydney, pp 106–115Google Scholar
  41. Hand D, Mannila H, Smyth P 2001Principles of data mining (Cambridge, MA: MIT Press)Google Scholar
  42. Haselsteiner E, Pfurtscheller G 2000 Using time-dependent neural networks for EEG classification.IEEE Trans. Rehab. Eng. 8: 457–463CrossRefGoogle Scholar
  43. Hastie T, Tibshirani R, Friedman J 2001The elements of statistical learning: Data mining, inference and prediction (New York: Springer-Verlag)MATHGoogle Scholar
  44. Haykin S 1992Neural networks: A comprehensive foundation (New York: Macmillan)Google Scholar
  45. Hirao M, Inenaga S, Shinohara A, Takeda M, Arikawa S 2001 A practical algorithm to find the best episode patterns.Lecture notes in computer science; Proc. 4th Int. Conf. on Discovery Science (DS 2001) Washington, DC, vol. 2226, pp 435–441, 25–28Google Scholar
  46. Juang B H, Rabiner L 1993Fundamentals of speech recognition. (Englewood Cliffs, NJ: Prentice Hall)Google Scholar
  47. Kalpakis K, Puttagunta D G V 2001 Distance measures for effective clustering of ARIMA time series. In2001 IEEE Int. Conf. on Data Mining (ICDM01), San Jose, CAGoogle Scholar
  48. Keogh E J, Pazzani M J 2000 Scaling up dynamic time warping for datamining applications. InProc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data mining, Boston, MA, pp 285–289, 20–23Google Scholar
  49. Koskela T, Lehtokangas M, Saarinen J, Kaski K 1996 Time series prediction with multilayer perceptron, FIR and Elman neural networks. InProc. World Congress on Neural Networks, pp 491–496Google Scholar
  50. Kruskal J B 1983 An overview of sequence comparison: Time warps, string edits and macromolecules.SIAM Rev. 21:201–237CrossRefMathSciNetGoogle Scholar
  51. Kundu A, He Y, Bahl P 1988 Word recognition and word hypothesis generation for handwritten script: A Hidden Markov Model based approach. InProc. 1988 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’88), pp 457–462Google Scholar
  52. Law M H, Kwok J T 2000 Rival penalized competitive learning for model-based sequence clustering. InProc. IEEE Int. Conf. on Pattern Recognition (ICPR00), Barcelona, SpainGoogle Scholar
  53. Laxman S, Sastry P S, Unnikrishnan K P 2002 Generalized frequent episodes in event sequences.Temporal Data Mining Workshop Notes, SIGKDD, (eds) K P Unnikrishnan, R Uthurusamy, Edmonton, Alberta, CanadaGoogle Scholar
  54. Laxman S, Sastry P S, Unnikrishnan K P 2004a Fast algorithms for frequent episode discovery in event sequences. Technical Report CL-2004-04/MSR, GM R&D Center, WarrenGoogle Scholar
  55. Laxman S, Sastry P S, Unnikrishnan K P 2004b Fast algorithms for frequent episode discovery in event sequences. InProc. 3rd Workshop on Mining Temporal and Sequential Data, Seattle, WAGoogle Scholar
  56. Laxman S, Sastry P S, Unnikrishnan K P 2005 Discovering frequent episodes and learning hidden markov models: A formal connection.IEEE Trans. Knowledge Data Eng. 17: 1505–1517CrossRefGoogle Scholar
  57. Lee C-H, Chen M-S, Lin C-R 2003 Progressive pattern miner: An efficient algorithm for mining general temporal association rules.IEEE Trans. Knowledge Data Eng. 15: 1004–1017CrossRefGoogle Scholar
  58. Levenshtein VI 1966 Binary codes capable of correcting deletions, insertions and reversals.Sov. Phys. Dokl. 10: 707–710MathSciNetGoogle Scholar
  59. Lin M-Y, Lee S-Y 2003 Improving the efficiency of interactive sequential pattern mining by incremental pattern discovery. InProc. IEEE 36th Annu. Hawaii Int. Conf. on System Sciences (HICSS03), Big Island, HawaiiGoogle Scholar
  60. Ma S, Hellerstein J L 2001 Mining partially periodic event patterns with unknown periods. InProc. 17th Int. Conf. on Data Eng. (ICDE’01), pp 205–214Google Scholar
  61. Mannila H, Rusakov D 2001 Decomposition of event sequences into independent components. InFirst SIAM Int. Conf. on Data Mining, Chicago, ILGoogle Scholar
  62. Mannila H, Toivonen H, Verkamo A I 1997 Discovery of frequent episodes in event sequences.Data Mining Knowledge Discovery 1: 259–289CrossRefGoogle Scholar
  63. Meger N, Rigotti C 2004 Constraint-based mining of episode rules and optimal window sizes. InProc. 8th Eur. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD ’04), Pisa, ItalyGoogle Scholar
  64. Miller R T, Christoffels A G, Gopalakrishnan C, Burke J, Ptitsyn A A, Broveak T R, Hide W A 1999 A comprehensive approach to clustering of expressed human gene sequence: The sequence tag alignment and consensus knowledge base.Genome Res. 9: 1143–1155CrossRefGoogle Scholar
  65. Miller W, Schwartz S, Hardison R C 1994 A point of contact between computer science and molecular biology.IEEE Comput. Sci. Eng. 1: 69–78CrossRefGoogle Scholar
  66. Nag R, Wong K H, Fallside F 1986 Script recognition using Hidden Markov Models. InProc. 1986 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP’86), pp 2071–2074Google Scholar
  67. Nalwa V S 1997 Automatic on-line signature verification.Proc. IEEE 85: 215–239CrossRefGoogle Scholar
  68. Ng R T, Lakshmanan L V S, Han J, Pang A 1998 Exploratory mining and pruning optimizations of constrained associations rules. InProc. 1998 ACM SIGMOD Int. Conf. on Management of Data, (Seattle, Washington) pp 13–24Google Scholar
  69. Oates T, Firoiu L, Cohen P R 2001 Using dynamic time warping to bootstrap HMM-based clustering of time series. InLecture notes in computer science; Sequence learning: Paradigms, algorithms, and applications (eds) C L Giles, R Sun (Heidelberg: Springer-Verlag) vol. 1828, p. 35Google Scholar
  70. Osata Net al 2002 A computer-based method of selecting clones for a full-length cDNA project: Simulataneous collection of negligibly redundant and variant cDNAs.Genome Res. 12: 1127–1134CrossRefGoogle Scholar
  71. O’Shaughnessy D 2003Speech communications: Human and machine (Piscataway, NJ: IEEE Press)Google Scholar
  72. Ozden B, Ramaswamy S, Silberschatz A 1998 Cyclic association rules. InProc. 14th Int. Conf. on Data Engineering (ICDE’98), Orlando, Florida, pp 412–421Google Scholar
  73. Pasquier N, Bastide Y, Taouil R, Lakhal L 1999 Discovering frequent closed itemsets for association rules. InLecture notes in computer science; Proc. 7th Int. Conf. on Database Theory (ICDT99), Jerusalem, Israel, vol. 1540, pp 398–416Google Scholar
  74. Perng C-S, Wang H, Zhang S R, Parker D S 2000 Landmarks: A new model for similarity-based pattern querying in time series databases. In16th Int. Conf. on Data Engineering (ICDE00), p. 33, San Diego, CAGoogle Scholar
  75. Pevzner P A, Borodovski M Y, Mironov A A 1989 Linguistic of nucleotide sequences: The significance of deviation from mean statistical characteristics and prediction of the frequencies of occurrence of words.J. Biomol. Struct. Dynamics 6: 1013–1026Google Scholar
  76. Rabiner L R 1989 A tutorial on hidden Markov models and selected applications in speech recognition.Proc. IEEE 77: 257–286CrossRefGoogle Scholar
  77. Regnier M, Szpankowski W 1998 On pattern frequency occurrences in a Markovian sequence.Algorithmica 22: 631–649MATHCrossRefMathSciNetGoogle Scholar
  78. Schreiber T, Schmitz A 1997 Classification of time series data with nonlinear similarity measures.Phys. Rev. Lett. 79: 1475–1478CrossRefGoogle Scholar
  79. Sclaroff S, Kollios G, Betke M, Rosales R 2001 Motion mining. InLecture notes in computer science;Proc. 2nd Int. Workshop on Multimedia Databases and Image Communication (Heidelberg: Springer-Verlag)Google Scholar
  80. Sebastiani P, Ramoni M, Cohen P, Warwick J, Davis J 1999 Discovering dynamics using bayesian clustering. InLecture notes in computer science;Adv. in Intelligent Data Analysis: 3rd Int. Symp., IDA-99 (Heidelberg: Springer-Verlag) vol. 1642, p. 199Google Scholar
  81. Shintani T, Kitsuregawa M 1998 Mining algorithms for sequential patterns in parallel: Hash based approach. InProc. 2nd Pacific-Asia Conf. on Knowledge Discovery and Data Mining, pp 283–294Google Scholar
  82. Smyth P 1997 Clustering sequences with hidden Markov models.Adv. Neural Inf. Process. 9: 648–655MathSciNetGoogle Scholar
  83. Smyth P 2001 Data mining at the interface of computer science and statistics. InData mining for scientific and engineering applications. (eds) R L Grossman, C Kamath, P Kegelmeyer, V Kumar, R R Namburu (Dordrecht: Kluwer Academic)Google Scholar
  84. Srikanth R, Agrawal R 1996 Mining sequential patterns: Generalizations and performance improvements. InProc. 5th Int. Conf. on Extending Database Technology (EDBT), Avignon, FranceGoogle Scholar
  85. Starner T E, Pentland A 1995 Visual recognition of American sign language. InProc. 1995 Int. Workshop on Face and Gesture Recognition, ZurichGoogle Scholar
  86. Sutton R S 1988 Learning to predict by method of temporal differences.Machine Learning 3(1): 9–44Google Scholar
  87. Sze S H, Gelfand M S, Pevzner P A 2002 Finding weak motifs in DNA sequences. InProc. 2002 Pacific Symposium on Biocomputing, pp 235–246Google Scholar
  88. Tappert C C, Suen C Y, Wakahara T 1990 The state of the art in on-line handwriting recognition.IEEE Trans. Pattern Anal. Machine Intell. 12: 787–808CrossRefGoogle Scholar
  89. Tino P, Schittenkopf C, Dorffner G 2000 Temporal pattern recognition in noisy non-stationary time series based on quantization into symbolic streams: Lessons learned from financial volatility trading (url:citeseer.nj.nec.com/tino00temporal.html)Google Scholar
  90. Tronicek Z 2001 Episode matching. InProc. 12th Annu. Symp. on Combinatorial Pattern Matching (CPM 2001), Jerusalem, Israel, vol. 2089, pp 143–146Google Scholar
  91. Wan E A 1990 Temporal backpropagation for FIR neural networks. InInt. Joint Conf. on Neural Networks (1990 IJCNN), vol. 1, pp 575–580CrossRefGoogle Scholar
  92. Wang J T-L, Chirn G-W, Marr T G, Shapiro B, Shasha D, Zhang K 1994 Combinatorial pattern discovery for scientific data: some preliminary results. InProc. 1994 ACM SIGMOD Int. Conf. on Management of Data, Minneapolis, Minnesota, pp 115–125Google Scholar
  93. Wang J, Han J 2004 BIDE: Efficient mining of frequent closed sequences. In20th Int. Conf. on Data Engineering, Boston, MAGoogle Scholar
  94. Witten I H, Frank E 2000Data mining: Practical machine learning tools and techniques with JAVA implementations (San Fransisco, CA: Morgan Kaufmann)Google Scholar
  95. Wu C, Berry M, Shivakumar S, McLarty J 1995 Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition.Machine Learning, Special issue on applications in molecular biology 21(1–2): 177–193 Wu S, Manber U 1992 Fast text searching allowing errors.Commun. ACM 35(10): 83–91Google Scholar
  96. Wu Y-L, Agrawal D, Abbadi A E 2000 A comparison of DFT and DWT based similarity search in time series databases. InProc. Ninth Int. Conf. on Information and Knowledge Management, McLean, VA, pp 488–495Google Scholar
  97. Xiong Y, Yeung D Y 2002 Mixtures of ARMA models for model-based time series clustering. In2002 IEEE Int. Conf. on Data Mining, Maebashi City, Japan, pp 717–720Google Scholar
  98. Yamato J, Ohya J, Ishii K 1992 Recognizing human action in time-sequential images using Hidden Markov Model. InProc. 1992 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’92), Champaign, IL, pp 379–385Google Scholar
  99. Yan X, Han J, Afshar R 2003 CloSpan: Mining closed sequential patterns in large datasets. InProc. 2003 Int. SIAM Conf. on Data Mining (SDM03), San Fransisco, CAGoogle Scholar
  100. Yule G 1927 On a method of investigating periodicity in distributed series with special reference to Wolfer’s sunspot numbers.Philos. Trans. R. Soc. London A226Google Scholar
  101. Zaki M J 1998 Efficient enumeration of frequent sequences. InProc. ACM 7th Int. Conf. Information and Knowledge Management (CIKM) Google Scholar

Copyright information

© Indian Academy of Sciences 2006

Authors and Affiliations

  • Srivatsan Laxman
    • 1
  • P. S. Sastry
    • 1
  1. 1.Department of Electrical EngineeringIndian Institute of ScienceBangaloreIndia

Personalised recommendations