Data Mining and Knowledge Discovery, Volume 29, Issue 5, pp 1178–1210

Finding the longest common sub-pattern in sequences of temporal intervals

Abstract

We study the problem of finding the longest common sub-pattern (LCSP) shared by two sequences of temporal intervals; specifically, we seek the LCSP of the arrangements that correspond to the two sequences. Arrangements of temporal intervals are a powerful way to encode multiple concurrent labeled events that have a time duration. Discovering commonalities among such arrangements is useful in a wide range of scientific fields and applications, as can be seen from the number and diversity of the datasets used in our experiments. In this paper, we define the LCSP problem and prove that it is NP-complete by demonstrating a connection between graphs and arrangements of temporal intervals. This connection leads to a series of interesting open problems. In addition, we provide an exact algorithm for the LCSP problem, and we propose and experimentally evaluate three under-approximation techniques that run in polynomial time and space. We also introduce two upper bounds for LCSP and study their suitability for speeding up 1-NN search. Experiments are performed on seven datasets taken from a wide range of real application domains, plus two synthetic datasets. Lastly, we describe several application cases that demonstrate the need for and suitability of LCSP.
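To make the ideas in the abstract concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a simplified reading of LCSP: the largest set of same-label interval pairs, one interval from each sequence, whose pairwise temporal relations agree in both sequences. The endpoint-order signature in `relation`, the brute-force search in `lcsp_size`, and the label-count bound in `label_upper_bound` are all illustrative assumptions; in particular, the bound is a hypothetical stand-in and not one of the two upper bounds the paper introduces.

```python
from collections import Counter


def relation(a, b):
    """Endpoint-order signature of intervals a and b, each (label, start, end).

    Assumption: two interval pairs are 'in the same relation' when all four
    endpoint comparisons have the same sign.
    """
    def sgn(x, y):
        return (x > y) - (x < y)
    _, s1, e1 = a
    _, s2, e2 = b
    return (sgn(s1, s2), sgn(s1, e2), sgn(e1, s2), sgn(e1, e2))


def lcsp_size(A, B):
    """Exhaustive search for the largest consistent matching (exponential time)."""
    pairs = [(i, j) for i, a in enumerate(A)
             for j, b in enumerate(B) if a[0] == b[0]]

    def compatible(p, q):
        (i, j), (k, l) = p, q
        return (i != k and j != l
                and relation(A[i], A[k]) == relation(B[j], B[l]))

    best = 0

    def extend(chosen, start):
        nonlocal best
        best = max(best, len(chosen))
        for idx in range(start, len(pairs)):
            # Stop when even taking every remaining pair cannot beat best.
            if len(chosen) + (len(pairs) - idx) <= best:
                break
            if all(compatible(pairs[idx], c) for c in chosen):
                extend(chosen + [pairs[idx]], idx + 1)

    extend([], 0)
    return best


def label_upper_bound(A, B):
    """Hypothetical cheap bound: size of the label multiset intersection."""
    ca = Counter(a[0] for a in A)
    cb = Counter(b[0] for b in B)
    return sum((ca & cb).values())


def nearest_neighbor(query, database):
    """1-NN search that skips exact computation when the bound cannot win."""
    bounds = [label_upper_bound(query, seq) for seq in database]
    order = sorted(range(len(database)), key=lambda k: -bounds[k])
    best_idx, best_sim = -1, -1
    for k in order:
        if bounds[k] <= best_sim:
            break  # all remaining candidates have bounds that are no larger
        sim = lcsp_size(query, database[k])
        if sim > best_sim:
            best_idx, best_sim = k, sim
    return best_idx, best_sim


query = [("A", 0, 5), ("B", 3, 8), ("C", 10, 12)]
database = [
    [("A", 1, 4), ("B", 5, 9)],                      # A ends before B starts
    [("A", 10, 15), ("B", 13, 18), ("C", 20, 22)],   # time-shifted copy of query
    [("D", 0, 1)],                                   # no shared labels
]
print(nearest_neighbor(query, database))  # (1, 3)
```

In this toy run only the second database entry needs an exact computation: once a similarity of 3 is found, the remaining candidates have bounds of 2 and 0 and are pruned. The sketch compares raw sub-pattern sizes; any normalization of the similarity is left out.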

Keywords

Temporal intervals, Longest common sub-pattern, Event-interval sequences

Copyright information

© The Author(s) 2015

Authors and Affiliations

  1. Aalto University, Espoo, Finland
  2. Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
