A survey of temporal data mining
 Srivatsan Laxman,
 P. S. Sastry
 … show all 2 hide
Rent the article at a discount
Rent now* Final gross prices may vary according to local VAT.
Get AccessAbstract
Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.
 Agrawal R, Srikant R 1994 Fast algorithms for mining association rules in large databases. InProc. 20th Int. Conf. on Very Large Data Bases, pp 487–499
 Agrawal R, Srikant R 1995 Mining sequential patterns. InProc. 11th Int. Conf. on Data Engineering, (Washington, DC: IEEE Comput. Soc.)
 Agrawal R, Imielinski T, Swami 1993 A Mining association rules between sets of items in large databases. InProc. ACM SIGMOD Conf. on Management of Data, pp 207–216
 Agrawal R, Lin K I, Sawhney H S, Shim K 1995a Fast similarity search in the presence of noise, scaling and translation in time series databases. InProc. 21st Int. Conf. on Very Large Data Bases (VLDB95), pp 490–501
 Agrawal R, Psaila G, Wimmers E L, Zait M 1995b Querying shapes of histories. InProc. 21st Int. Conf. on Very Large Databases, Zurich, Switzerland
 Alon J, Sclaroff S, Kollios G, Pavlovic V 2003 Discovering clusters in motion time series data. InProc. 2003 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition, pp I375–I381, Madison, Wisconsin
 Alur R, Dill D L 1994 A theory of timed automata.Theor. Comput. Sci. 126: 183–235 CrossRef
 Atallah M J, Gwadera R, Szpankowski W 2004 Detection of significant sets of episodes in event sequences. InProc. 4th IEEE Int. Conf. on Data Mining (ICDM 2004), pp 3–10, Brighton, UK
 BaezaYates R A 1991 Searching subsequences.Theor. Comput. Sci. 78: 363–376 CrossRef
 Baldi P, Chauvin Y, Hunkapiller T, McClure M 1994 Hidden Markov models of biological primary sequence information.Proc. Nat. Acad. Sci. USA 91: 1059–1063 CrossRef
 Bender E A, Kochman F 1993 The distribution of subword counts is usually normal.Eur. J. Combinatorics 14: 265–275 CrossRef
 Berberidis C, Vlahavas I P, Aref W G, Atallah M J, Elmagarmid A K 2002 On the discovery of weak periodicities in large time series. InLecture notes in computer science, Proc. 6th Eur. Conf. on Principles of Data Mining and Knowledge Discovery, vol. 2431, pp 51–61
 Bettini C, Wang X S, Jajodia S, Lin J L 1998 Discovering frequent event patterns with multiple granularities in time sequences.IEEE Trans. Knowledge Data Eng. 10: 222–237 CrossRef
 Box G E P, Jenkins G M, Reinsel G C 1994Time series analysis: Forecasting and control (Singapore: Pearson Education Inc.)
 Cadez I, Heckerman D, Meek C, Smyth P, White S 2000 Modelbased clustering and visualisation of navigation patterns on a web site. Technical Report CA 927173425, Dept. of Information and Computer Science, University of California, Irvine, CA
 Cao H, Cheung D W, Mamoulis N 2004 Discovering partial periodic patterns in discrete data sequences. InProc. 8th PacificAsia Conf. on Knowledge Discovery and Data Mining (PAKDD ’04), Sydney, pp 653–658
 CasasGarriga G 2003 Discovering unbounded episodes in sequential data. InProc. 7th Eur. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD’03), CavtatDubvrovnik, Croatia, pp 83–94
 Chang S F, Chen W, Men J, Sundaram H, Zhong D 1998 A fully automated content based video search engine supporting spatiotemporal queries.IEEE Trans. Circuits Syst. Video Technol. 8(5): 602–615 CrossRef
 Chatfield C 1996The analysis of time series 5th edn (New York, NY: Chapman and Hall)
 Chudova D, Smyth P 2002 Pattern discovery in sequences under a Markovian assumption. InProc. Eigth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada
 Cohen J 2004 Bioinformatics — an introduction for computer scientists.ACM Comput. Surv. 36(2): 122–158 CrossRef
 Corpet F 1988 Multiple sequence alignment with hierarchical clustering.Nucleic Acids Research, 16: 10881–10890 CrossRef
 Darrell T, Pentland A 1993 Spacetime gestures. InProc. 1993 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’93), pp 335–340
 Dietterich T G, Michalski R S 1985 Discovering patterns in sequences of events.Artif. Intell. 25: 187–232 CrossRef
 Duda R O, Hart P E, Stork D G 1997Pattern classification and scene analysis (New York: Wiley)
 Durbin R, Eddy S, Krogh A, Mitchison G 1998Biological sequence analysis (Cambridge: University Press)
 Ewens W J, Grant G R 2001Statistical methods in bioinformatics: An introduction (New York: SpringerVerlag)
 Fadili M J, Ruan S, Bloyet D, Mazoyer B 2000 A multistep unsupervised fuzzy clustering analysis of fMRI time series.Human Brain Mapping 10: 160–178 CrossRef
 Flajolet P, Guivarc’h Y, Szpankowski W, Vallee B 2001 Hidden pattern statistics. InLecture notes in computer science; Proc. 28th Int. Colloq. on Automata, Languages and Programming (London: SpringerVerlag) vol. 2076, pp 152–165
 Frenkel K A 1991 The human genome project and informatics.Commun. ACM 34(11): 40–51 CrossRef
 Garofalakis M, Rastogi R, Shim K 2002 Mining sequential patterns with regular expression constraints.IEEE Trans. Knowledge Data Eng. 14: 530–552 CrossRef
 Ghias A, Logan J, Chamberlin D, Smith B C 1995 Query by humming — musical information retrieval in an audio database. InProc. ACM Multimedia 95, San Fransisco, CA
 Gold B, Morgan N 2000Speech and audio signal processing: Processing and perception of speech and music (New York: John Wiley & Sons)
 Gray R M, Buzo A, Gray Jr. A H, Matsuyama Y 1980 Distortion measures for speech processing.IEEE Trans. Acoust., Speech Signal Process. 28: 367–376 CrossRef
 Gusfield D 1997A lgorithms on strings, trees and subsequences (New York: University of Cambridge Press)
 Gwadera R, Atallah M J, Szpankowski W 2003 Reliable detection of episodes in event sequences. InProc. 3rd IEEE Int. Conf. on Data Mining (ICDM 2003), pp 67–74
 Gwadera R, Atallah M J, Szpankowski W 2005 Markov models for identification of significant episodes. InProc. 2005 SIAM Int. Conf. on Data Mining (SDM05), Newport Beach, California
 Han J, Kamber M 2001Data mining: Concepts and techniques (San Fransisco, CA: Morgan Kauffmann)
 Han J, Gong W, Yin Y 1998 Mining segmentwise periodic patterns in timerelated databases. InProc. 4th Int. Conf. on Knowledge Discovery and Data Mining (KDD’98), New York, pp 214–218
 Han J, Dong G, Yin Y 1999 Efficient mining of partial periodic patterns in time series database. InProc. 15th Int. Conf. on Data Engineering, (ICDE’99), Sydney, pp 106–115
 Hand D, Mannila H, Smyth P 2001Principles of data mining (Cambridge, MA: MIT Press)
 Haselsteiner E, Pfurtscheller G 2000 Using timedependent neural networks for EEG classification.IEEE Trans. Rehab. Eng. 8: 457–463 CrossRef
 Hastie T, Tibshirani R, Friedman J 2001The elements of statistical learning: Data mining, inference and prediction (New York: SpringerVerlag)
 Haykin S 1992Neural networks: A comprehensive foundation (New York: Macmillan)
 Hirao M, Inenaga S, Shinohara A, Takeda M, Arikawa S 2001 A practical algorithm to find the best episode patterns.Lecture notes in computer science; Proc. 4th Int. Conf. on Discovery Science (DS 2001) Washington, DC, vol. 2226, pp 435–441, 25–28
 Juang B H, Rabiner L 1993Fundamentals of speech recognition. (Englewood Cliffs, NJ: Prentice Hall)
 Kalpakis K, Puttagunta D G V 2001 Distance measures for effective clustering of ARIMA time series. In2001 IEEE Int. Conf. on Data Mining (ICDM01), San Jose, CA
 Keogh E J, Pazzani M J 2000 Scaling up dynamic time warping for datamining applications. InProc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data mining, Boston, MA, pp 285–289, 20–23
 Koskela T, Lehtokangas M, Saarinen J, Kaski K 1996 Time series prediction with multilayer perceptron, FIR and Elman neural networks. InProc. World Congress on Neural Networks, pp 491–496
 Kruskal J B 1983 An overview of sequence comparison: Time warps, string edits and macromolecules.SIAM Rev. 21:201–237 CrossRef
 Kundu A, He Y, Bahl P 1988 Word recognition and word hypothesis generation for handwritten script: A Hidden Markov Model based approach. InProc. 1988 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’88), pp 457–462
 Law M H, Kwok J T 2000 Rival penalized competitive learning for modelbased sequence clustering. InProc. IEEE Int. Conf. on Pattern Recognition (ICPR00), Barcelona, Spain
 Laxman S, Sastry P S, Unnikrishnan K P 2002 Generalized frequent episodes in event sequences.Temporal Data Mining Workshop Notes, SIGKDD, (eds) K P Unnikrishnan, R Uthurusamy, Edmonton, Alberta, Canada
 Laxman S, Sastry P S, Unnikrishnan K P 2004a Fast algorithms for frequent episode discovery in event sequences. Technical Report CL200404/MSR, GM R&D Center, Warren
 Laxman S, Sastry P S, Unnikrishnan K P 2004b Fast algorithms for frequent episode discovery in event sequences. InProc. 3rd Workshop on Mining Temporal and Sequential Data, Seattle, WA
 Laxman S, Sastry P S, Unnikrishnan K P 2005 Discovering frequent episodes and learning hidden markov models: A formal connection.IEEE Trans. Knowledge Data Eng. 17: 1505–1517 CrossRef
 Lee CH, Chen MS, Lin CR 2003 Progressive pattern miner: An efficient algorithm for mining general temporal association rules.IEEE Trans. Knowledge Data Eng. 15: 1004–1017 CrossRef
 Levenshtein VI 1966 Binary codes capable of correcting deletions, insertions and reversals.Sov. Phys. Dokl. 10: 707–710
 Lin MY, Lee SY 2003 Improving the efficiency of interactive sequential pattern mining by incremental pattern discovery. InProc. IEEE 36th Annu. Hawaii Int. Conf. on System Sciences (HICSS03), Big Island, Hawaii
 Ma S, Hellerstein J L 2001 Mining partially periodic event patterns with unknown periods. InProc. 17th Int. Conf. on Data Eng. (ICDE’01), pp 205–214
 Mannila H, Rusakov D 2001 Decomposition of event sequences into independent components. InFirst SIAM Int. Conf. on Data Mining, Chicago, IL
 Mannila H, Toivonen H, Verkamo A I 1997 Discovery of frequent episodes in event sequences.Data Mining Knowledge Discovery 1: 259–289 CrossRef
 Meger N, Rigotti C 2004 Constraintbased mining of episode rules and optimal window sizes. InProc. 8th Eur. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD ’04), Pisa, Italy
 Miller R T, Christoffels A G, Gopalakrishnan C, Burke J, Ptitsyn A A, Broveak T R, Hide W A 1999 A comprehensive approach to clustering of expressed human gene sequence: The sequence tag alignment and consensus knowledge base.Genome Res. 9: 1143–1155 CrossRef
 Miller W, Schwartz S, Hardison R C 1994 A point of contact between computer science and molecular biology.IEEE Comput. Sci. Eng. 1: 69–78 CrossRef
 Nag R, Wong K H, Fallside F 1986 Script recognition using Hidden Markov Models. InProc. 1986 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP’86), pp 2071–2074
 Nalwa V S 1997 Automatic online signature verification.Proc. IEEE 85: 215–239 CrossRef
 Ng R T, Lakshmanan L V S, Han J, Pang A 1998 Exploratory mining and pruning optimizations of constrained associations rules. InProc. 1998 ACM SIGMOD Int. Conf. on Management of Data, (Seattle, Washington) pp 13–24
 Oates T, Firoiu L, Cohen P R 2001 Using dynamic time warping to bootstrap HMMbased clustering of time series. InLecture notes in computer science; Sequence learning: Paradigms, algorithms, and applications (eds) C L Giles, R Sun (Heidelberg: SpringerVerlag) vol. 1828, p. 35
 Osata Net al 2002 A computerbased method of selecting clones for a fulllength cDNA project: Simulataneous collection of negligibly redundant and variant cDNAs.Genome Res. 12: 1127–1134 CrossRef
 O’Shaughnessy D 2003Speech communications: Human and machine (Piscataway, NJ: IEEE Press)
 Ozden B, Ramaswamy S, Silberschatz A 1998 Cyclic association rules. InProc. 14th Int. Conf. on Data Engineering (ICDE’98), Orlando, Florida, pp 412–421
 Pasquier N, Bastide Y, Taouil R, Lakhal L 1999 Discovering frequent closed itemsets for association rules. InLecture notes in computer science; Proc. 7th Int. Conf. on Database Theory (ICDT99), Jerusalem, Israel, vol. 1540, pp 398–416
 Perng CS, Wang H, Zhang S R, Parker D S 2000 Landmarks: A new model for similaritybased pattern querying in time series databases. In16th Int. Conf. on Data Engineering (ICDE00), p. 33, San Diego, CA
 Pevzner P A, Borodovski M Y, Mironov A A 1989 Linguistic of nucleotide sequences: The significance of deviation from mean statistical characteristics and prediction of the frequencies of occurrence of words.J. Biomol. Struct. Dynamics 6: 1013–1026
 Rabiner L R 1989 A tutorial on hidden Markov models and selected applications in speech recognition.Proc. IEEE 77: 257–286 CrossRef
 Regnier M, Szpankowski W 1998 On pattern frequency occurrences in a Markovian sequence.Algorithmica 22: 631–649 CrossRef
 Schreiber T, Schmitz A 1997 Classification of time series data with nonlinear similarity measures.Phys. Rev. Lett. 79: 1475–1478 CrossRef
 Sclaroff S, Kollios G, Betke M, Rosales R 2001 Motion mining. InLecture notes in computer science;Proc. 2nd Int. Workshop on Multimedia Databases and Image Communication (Heidelberg: SpringerVerlag)
 Sebastiani P, Ramoni M, Cohen P, Warwick J, Davis J 1999 Discovering dynamics using bayesian clustering. InLecture notes in computer science;Adv. in Intelligent Data Analysis: 3rd Int. Symp., IDA99 (Heidelberg: SpringerVerlag) vol. 1642, p. 199
 Shintani T, Kitsuregawa M 1998 Mining algorithms for sequential patterns in parallel: Hash based approach. InProc. 2nd PacificAsia Conf. on Knowledge Discovery and Data Mining, pp 283–294
 Smyth P 1997 Clustering sequences with hidden Markov models.Adv. Neural Inf. Process. 9: 648–655
 Smyth P 2001 Data mining at the interface of computer science and statistics. InData mining for scientific and engineering applications. (eds) R L Grossman, C Kamath, P Kegelmeyer, V Kumar, R R Namburu (Dordrecht: Kluwer Academic)
 Srikanth R, Agrawal R 1996 Mining sequential patterns: Generalizations and performance improvements. InProc. 5th Int. Conf. on Extending Database Technology (EDBT), Avignon, France
 Starner T E, Pentland A 1995 Visual recognition of American sign language. InProc. 1995 Int. Workshop on Face and Gesture Recognition, Zurich
 Sutton R S 1988 Learning to predict by method of temporal differences.Machine Learning 3(1): 9–44
 Sze S H, Gelfand M S, Pevzner P A 2002 Finding weak motifs in DNA sequences. InProc. 2002 Pacific Symposium on Biocomputing, pp 235–246
 Tappert C C, Suen C Y, Wakahara T 1990 The state of the art in online handwriting recognition.IEEE Trans. Pattern Anal. Machine Intell. 12: 787–808 CrossRef
 Tino P, Schittenkopf C, Dorffner G 2000 Temporal pattern recognition in noisy nonstationary time series based on quantization into symbolic streams: Lessons learned from financial volatility trading (url:citeseer.nj.nec.com/tino00temporal.html)
 Tronicek Z 2001 Episode matching. InProc. 12th Annu. Symp. on Combinatorial Pattern Matching (CPM 2001), Jerusalem, Israel, vol. 2089, pp 143–146
 Wan E A 1990 Temporal backpropagation for FIR neural networks. InInt. Joint Conf. on Neural Networks (1990 IJCNN), vol. 1, pp 575–580 CrossRef
 Wang J TL, Chirn GW, Marr T G, Shapiro B, Shasha D, Zhang K 1994 Combinatorial pattern discovery for scientific data: some preliminary results. InProc. 1994 ACM SIGMOD Int. Conf. on Management of Data, Minneapolis, Minnesota, pp 115–125
 Wang J, Han J 2004 BIDE: Efficient mining of frequent closed sequences. In20th Int. Conf. on Data Engineering, Boston, MA
 Witten I H, Frank E 2000Data mining: Practical machine learning tools and techniques with JAVA implementations (San Fransisco, CA: Morgan Kaufmann)
 Wu C, Berry M, Shivakumar S, McLarty J 1995 Neural networks for fullscale protein sequence classification: Sequence encoding with singular value decomposition.Machine Learning, Special issue on applications in molecular biology 21(1–2): 177–193 Wu S, Manber U 1992 Fast text searching allowing errors.Commun. ACM 35(10): 83–91
 Wu YL, Agrawal D, Abbadi A E 2000 A comparison of DFT and DWT based similarity search in time series databases. InProc. Ninth Int. Conf. on Information and Knowledge Management, McLean, VA, pp 488–495
 Xiong Y, Yeung D Y 2002 Mixtures of ARMA models for modelbased time series clustering. In2002 IEEE Int. Conf. on Data Mining, Maebashi City, Japan, pp 717–720
 Yamato J, Ohya J, Ishii K 1992 Recognizing human action in timesequential images using Hidden Markov Model. InProc. 1992 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’92), Champaign, IL, pp 379–385
 Yan X, Han J, Afshar R 2003 CloSpan: Mining closed sequential patterns in large datasets. InProc. 2003 Int. SIAM Conf. on Data Mining (SDM03), San Fransisco, CA
 Yule G 1927 On a method of investigating periodicity in distributed series with special reference to Wolfer’s sunspot numbers.Philos. Trans. R. Soc. London A226
 Zaki M J 1998 Efficient enumeration of frequent sequences. InProc. ACM 7th Int. Conf. Information and Knowledge Management (CIKM)
 Title
 A survey of temporal data mining
 Journal

Sadhana
Volume 31, Issue 2 , pp 173198
 Cover Date
 20060401
 DOI
 10.1007/BF02719780
 Print ISSN
 02562499
 Online ISSN
 09737677
 Publisher
 Springer India
 Additional Links
 Topics
 Keywords

 Temporal data mining
 ordered data streams
 temporal interdependency
 pattern discovery
 Industry Sectors
 Authors

 Srivatsan Laxman ^{(1)}
 P. S. Sastry ^{(1)}
 Author Affiliations

 1. Department of Electrical Engineering, Indian Institute of Science, 560 012, Bangalore, India