Finding the longest common sub-pattern in sequences of temporal intervals

Data Mining and Knowledge Discovery

Abstract

We study the problem of finding the longest common sub-pattern (LCSP) shared by two sequences of temporal intervals. In particular, we are interested in finding the LCSP of the corresponding arrangements. Arrangements of temporal intervals are a powerful way to encode multiple concurrent labeled events that have a time duration. Discovering commonalities among such arrangements is useful for a wide range of scientific fields and applications, as can be seen from the number and diversity of the datasets we use in our experiments. In this paper, we define the LCSP problem and prove that it is NP-complete by demonstrating a connection between graphs and arrangements of temporal intervals. This connection leads to a series of interesting open problems. In addition, we provide an exact algorithm to solve the LCSP problem, and we propose and experiment with three under-approximation techniques that run in polynomial time and space. Finally, we introduce two upper bounds for LCSP and study their suitability for speeding up 1-NN search. Experiments are performed on seven datasets taken from a wide range of real application domains, plus two synthetic datasets. We also describe several application cases that demonstrate the need for and suitability of LCSP.

Notes

  1. http://users.ics.aalto.fi/kostakis/software/lcsp/.

  2. http://www.ics.uci.edu/mlearn/MLRepository.html.

  3. http://users.ics.aalto.fi/kostakis/software/intgen_lcsp.zip.

Author information

Corresponding author

Correspondence to Panagiotis Papapetrou.

Additional information

Responsible editors: Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo.

Appendix: Proof of the properties described in Sect. 4.3.2

  1. Suppose that \(LCS(i,j)\) were composed of more than one interval. Then there must exist a pair of intervals with the same label in \(\{{S_A}_1,\ldots ,{S_A}_{i-1}\}\) and \(\{{S_B}_1,\ldots ,{S_B}_{j-1}\}\). This is a contradiction, since it would imply that not all previous sub-problems yield \(\emptyset \) as their solution.

  2. By applying the operation \(LCS(p,q)\otimes (i,j)\) or, equivalently, by selecting from \(LCS(p,q)\) only the intervals that induce similar relations to the corresponding interval of \(i\) and \(j\), we make sure that the interval corresponding to \(i\) and \(j\) has the same relations to the previous intervals in the produced arrangement. Conversely, the existing intervals have the same relations to the correspondent of \(i\) and \(j\). Additionally, pairs of existing intervals of \(LCS(p,q)\) have identical relations with their correspondents in \(\mathcal {A}\) and \(\mathcal {B}\); this was examined when each interval was added to the solution of the previous sub-problems.

  3. In other words, the \(\otimes \) operator does not discard extra intervals. Suppose that the maximal CSPs are correctly retrieved for all previous sub-problems \(LCS(p,q)\), but not for \(LCS(i,j)\). This would imply that an interval belonging to a maximal CSP of \(\{ {{E_S}_A}_1, \ldots , {{E_S}_A}_i\}\) and \(\{ {{E_S}_B}_1, \ldots , {{E_S}_B}_j\}\) (where \(A_i\) is matched to \(B_j\)) exists but was not selected for \(LCS(i,j)\). However, since the interval that was not selected belongs to a maximal CSP, it has the same relation to \({S_A}_i\) and \({S_B}_j\). Since the relations are the same, the interval would have been selected for \(LCS(i,j)\), which contradicts the assumption. Thus, the algorithm at point \((i,j)\) returns the maximal CSPs of \(\{ {{E_S}_A}_1, \ldots , {{E_S}_A}_i\}\) and \(\{ {{E_S}_B}_1, \ldots , {{E_S}_B}_j\}\) that match \({{E_S}_A}_i\) to \({{E_S}_B}_j\).

  4. Suppose there exists a maximal CSP that matches \(A_i\) to \(B_j\) but was not discovered. This would imply that by removing the interval corresponding to \(A_i\) and \(B_j\), one is left with a common sub-pattern \(s\). Then, either \(s\subseteq r\) for some \(r\in \mathcal {M}_{i-1,j-1}\), or not. In the first case, \(s\) must have been retrieved when performing \(r\otimes (i,j)\), so this cannot be. Hence, it can only be that \(s\) is maximal, but then it must hold that \(s\in \mathcal {M}_{i-1,j-1}\). Contradiction.

    Alternatively, in the Cartesian graph \(G_{AB}\) (see the proof of Theorem 2 for the exact definition), this corresponds to finding all maximal cliques containing the vertex \(u\) labeled \((i,j)\) by checking all previously found maximal cliques and, for each one, returning its intersection with the neighbors of \(u\).
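The clique view of this step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `maximal_cliques_with`, the plain-set graph representation, and the toy vertex labels are assumptions made for illustration, and the sketch ignores the ordering constraints that the dynamic program places on which sub-problems count as "previous". It only shows the set operation itself: intersect each previously found maximal clique with the neighbors of \(u\), add \(u\), and keep the results that are not contained in another result.

```python
from typing import FrozenSet, Hashable, Iterable, List, Set


def maximal_cliques_with(
    u: Hashable,
    neighbors: Set[Hashable],
    previous_cliques: Iterable[Set[Hashable]],
) -> List[FrozenSet[Hashable]]:
    """Return the maximal cliques that contain u (hypothetical helper).

    `previous_cliques` is assumed to hold all maximal cliques of the graph
    before u was added; `neighbors` holds the vertices adjacent to u.
    """
    # u on its own is always a clique; it survives only if u has no neighbors.
    candidates = {frozenset({u})}
    for clique in previous_cliques:
        # Keep only the members adjacent to u, then add u itself.
        candidates.add(frozenset(clique & neighbors) | {u})
    # Discard candidates that are proper subsets of another candidate.
    return [c for c in candidates if not any(c < other for other in candidates)]


# Toy example: vertices are (i, j) index pairs, as in the labeling of u above.
previous = [{(1, 1), (2, 2)}, {(1, 2)}]
result = maximal_cliques_with((3, 3), {(1, 1), (1, 2)}, previous)
```

In the toy example, the vertex \((3,3)\) is adjacent to \((1,1)\) and \((1,2)\) but not to \((2,2)\), so the first clique shrinks to \(\{(1,1)\}\) before \((3,3)\) is added, and the singleton \(\{(3,3)\}\) is dropped because it is contained in the other candidates.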

About this article

Cite this article

Kostakis, O., Papapetrou, P. Finding the longest common sub-pattern in sequences of temporal intervals. Data Min Knowl Disc 29, 1178–1210 (2015). https://doi.org/10.1007/s10618-015-0404-3

  • DOI: https://doi.org/10.1007/s10618-015-0404-3
