Data Mining and Knowledge Discovery

Volume 15, Issue 2, pp 181–215

Efficient mining of understandable patterns from multivariate interval time series

Abstract

We present a new method for the understandable description of local temporal relationships in multivariate data, called Time Series Knowledge Mining (TSKM). We define the Time Series Knowledge Representation (TSKR) as a new language for expressing temporal knowledge in time interval data. The patterns have a hierarchical structure, with levels corresponding to the temporal concepts of duration, coincidence, and partial order. The patterns are very compact but offer details for each element on demand. In comparison with related approaches, the TSKR is shown to have advantages in robustness, expressivity, and comprehensibility. The search for coincidence and partial order in interval data can be formulated as instances of the well-known frequent itemset problem, and efficient algorithms for the discovery of the patterns are adapted accordingly. A novel form of search space pruning reduces the size of the mining result, which eases interpretation and speeds up the algorithms. Human interaction is used during the mining to analyze and validate partial results as early as possible and to guide further processing steps. The efficacy of the methods is demonstrated using two real-life data sets. In an application to sports medicine, the results were recognized as valid and useful by an expert in the field.
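The reduction of coincidence search to frequent itemset mining mentioned above can be illustrated with a minimal sketch. This is not the TSKM algorithm itself: the toy intervals, the function names `to_transactions` and `frequent_itemsets`, and the brute-force counting are all assumptions made for illustration. Each time point becomes a transaction containing the symbols whose intervals cover it, so the support of an itemset equals the number of time points at which those symbols coincide.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (symbol, start, end), inclusive integer time points.
intervals = [
    ("A", 0, 5),
    ("B", 2, 7),
    ("C", 4, 9),
    ("A", 10, 14),
    ("B", 11, 13),
]


def to_transactions(intervals):
    """Map each time point to the set of symbols whose interval covers it."""
    start = min(s for _, s, _ in intervals)
    end = max(e for _, _, e in intervals)
    return [
        frozenset(sym for sym, s, e in intervals if s <= t <= e)
        for t in range(start, end + 1)
    ]


def frequent_itemsets(transactions, min_support):
    """Brute-force count of every non-empty symbol subset.

    Keeps the subsets that occur in at least min_support transactions.
    Only suitable for tiny examples; real miners prune the search space.
    """
    counts = Counter()
    for items in transactions:
        for k in range(1, len(items) + 1):
            for subset in combinations(sorted(items), k):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


transactions = to_transactions(intervals)
result = frequent_itemsets(transactions, min_support=4)

for itemset, support in sorted(result.items()):
    print(itemset, support)
```

With `min_support=4`, the pair `("A", "B")` is kept because the two symbols overlap at seven time points, whereas `("A", "C")` is dropped because they overlap at only two. A practical implementation would replace the exhaustive enumeration with an efficient (for example, closed) itemset miner.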

Keywords

Knowledge discovery, Time series, Interval patterns, Allen’s relations

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Siemens Corporate Research, Princeton, USA
  2. Databionic Research Group, Philipps-University Marburg, Marburg, Germany
