Finding the longest common sub-pattern in sequences of temporal intervals

Kostakis, Orestis; Papapetrou, Panagiotis

doi:10.1007/s10618-015-0404-3

Published: 19 February 2015

Volume 29, pages 1178–1210, (2015)

Data Mining and Knowledge Discovery

Abstract

We study the problem of finding the longest common sub-pattern (LCSP) shared by two sequences of temporal intervals. In particular, we are interested in finding the LCSP of the corresponding arrangements. Arrangements of temporal intervals are a powerful way to encode multiple concurrent labeled events that have a time duration. Discovering commonalities among such arrangements is useful for a wide range of scientific fields and applications, as can be seen from the number and diversity of the datasets we use in our experiments. In this paper, we define the LCSP problem and prove that it is NP-complete by demonstrating a connection between graphs and arrangements of temporal intervals. This connection leads to a series of interesting open problems. In addition, we provide an exact algorithm to solve the LCSP problem, and we propose and experiment with three under-approximation techniques that run in polynomial time and space. We also introduce two upper bounds for LCSP and study their suitability for speeding up 1-NN search. Experiments are performed on seven datasets taken from a wide range of real application domains, plus two synthetic datasets. Lastly, we describe several application cases that demonstrate the need for and suitability of LCSP.
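To make the notion of a common sub-pattern concrete, the following is a minimal sketch rather than the authors' formulation. It assumes that an interval is a (label, start, end) triple and that the qualitative relation between two intervals can be encoded by the signs of four endpoint comparisons; the names `Interval`, `relation`, and `is_common_subpattern` are hypothetical. Under these assumptions, a set of matched interval pairs is a common sub-pattern when labels agree and every pair of matched intervals stands in the same relation in both sequences.

```python
from itertools import combinations
from typing import NamedTuple


class Interval(NamedTuple):
    label: str
    start: float
    end: float


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def relation(a: Interval, b: Interval) -> tuple:
    """Qualitative relation of a to b (assumed encoding: signs of four
    endpoint comparisons, independent of the actual durations)."""
    return (_sign(a.start - b.start), _sign(a.start - b.end),
            _sign(a.end - b.start), _sign(a.end - b.end))


def is_common_subpattern(A, B, matching) -> bool:
    """matching is a list of (i, j) pairs meaning A[i] is matched to B[j]."""
    # Each interval may be used at most once.
    if len({i for i, _ in matching}) != len(matching):
        return False
    if len({j for _, j in matching}) != len(matching):
        return False
    # Matched intervals must carry the same label.
    if any(A[i].label != B[j].label for i, j in matching):
        return False
    # Every pair of matched intervals must have the same relation in A and B.
    return all(relation(A[i], A[p]) == relation(B[j], B[q])
               for (i, j), (p, q) in combinations(matching, 2))


A = [Interval("a", 0, 5), Interval("b", 3, 8), Interval("c", 9, 12)]
B = [Interval("a", 10, 20), Interval("b", 15, 30), Interval("c", 18, 25)]

print(is_common_subpattern(A, B, [(0, 0), (1, 1)]))          # a overlaps b in both
print(is_common_subpattern(A, B, [(0, 0), (1, 1), (2, 2)]))  # c relates differently
```

In this toy data, "a" overlaps "b" in both sequences, so the two-interval matching qualifies. Adding "c" breaks it, because "c" starts after "a" ends in the first sequence but overlaps "a" in the second.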

Author information

Authors and Affiliations

Aalto University, Konemiehentie 2, 02150, Espoo, Finland
Orestis Kostakis
Department of Computer and Systems Sciences, Stockholm University, Forum 100, 164 40, Kista, Sweden
Panagiotis Papapetrou

Corresponding author

Correspondence to Panagiotis Papapetrou.

Additional information

Responsible editors: Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo.

Appendix: Proof of the properties described in Sect. 4.3.2

1. Suppose that \(LCS(i,j)\) were composed of more than one interval. Then there must exist a pair of intervals with the same label in \(\{{S_A}_1,\ldots ,{S_A}_{i-1}\}\) and \(\{{S_B}_1,\ldots ,{S_B}_{j-1}\}\). That is a contradiction, since it would imply that not all previous sub-problems yield \(\emptyset \) as their solution.
2. By applying the operation \(LCS(p,q)\otimes (i,j)\) or, equivalently, selecting from \(LCS(p,q)\) only the intervals that induce similar relations to the corresponding interval of \(i\) and \(j\), we make sure that the interval corresponding to \(i\) and \(j\) has the same relations to the previous intervals in the produced arrangement. Conversely, the existing intervals have the same relations to the correspondent of \(i\) and \(j\). Additionally, pairs of existing intervals of \(LCS(p,q)\) have identical relations with their correspondents in \(\mathcal {A}\) and \(\mathcal {B}\); this was examined when each interval was added to the solution of the previous sub-problems.
3. In other words, the \(\otimes \) operator does not discard extra intervals. Suppose that the maximal CSPs are correctly retrieved for all previous sub-problems \(LCS(p,q)\), but not for \(LCS(i,j)\). This would imply that an interval belonging to a maximal CSP of \(\{ {{E_S}_A}_1, \ldots , {{E_S}_A}_i\}\) and \(\{ {{E_S}_B}_1, \ldots , {{E_S}_B}_j\}\) (where \(A_i\) is matched to \(B_j\)) exists but was not selected for \(LCS(i,j)\). But since the interval that was not selected belongs to a maximal CSP, it has the same relation to \({S_A}_i\) and \({S_B}_j\). Since the relations are the same, the interval would have been selected for \(LCS(i,j)\), which contradicts the assumption. Thus, the algorithm at point \((i,j)\) returns the maximal CSPs of \(\{ {{E_S}_A}_1, \ldots , {{E_S}_A}_i\}\) and \(\{ {{E_S}_B}_1, \ldots , {{E_S}_B}_j\}\) that match \({{E_S}_A}_i\) to \({{E_S}_B}_j\).
4. Suppose there exists a maximal CSP that matches \(A_i\) to \(B_j\) but was not discovered. This would imply that by removing the interval corresponding to \(A_i\) and \(B_j\), one is left with a common sub-pattern \(s\). Then either \(s\subseteq r\) for some \(r\in \mathcal {M}_{i-1,j-1}\), or not. In the first case, \(s\) must have been retrieved when performing \(r\otimes (i,j)\), so this cannot be. Hence \(s\) must itself be maximal, but then it must hold that \(s\in \mathcal {M}_{i-1,j-1}\), which is a contradiction.
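The selection step described in property 2 can be sketched as follows. This is one reading of the description above, not the definition from Sect. 4.3.2, which is not part of this excerpt. The function name `otimes`, the (label, start, end) interval representation, the sign-based `relation` encoding, and the choice to return an empty result on a label mismatch are all assumptions made for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    label: str
    start: float
    end: float


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def relation(a: Interval, b: Interval) -> tuple:
    """Assumed encoding of the qualitative relation of a to b."""
    return (_sign(a.start - b.start), _sign(a.start - b.end),
            _sign(a.end - b.start), _sign(a.end - b.end))


def otimes(csp, A, B, i, j):
    """Extend a common sub-pattern with the matched pair (i, j).

    csp is a list of (p, q) pairs meaning A[p] is matched to B[q].
    Only pairs whose relation to the new interval is identical in A and B
    are kept; the new pair is then appended.
    """
    if A[i].label != B[j].label:
        return []  # assumption: a label mismatch yields no pattern
    kept = [(p, q) for (p, q) in csp
            if relation(A[p], A[i]) == relation(B[q], B[j])]
    return kept + [(i, j)]


A = [Interval("a", 0, 5), Interval("b", 3, 8), Interval("c", 6, 12)]
B = [Interval("a", 10, 20), Interval("b", 15, 30), Interval("c", 40, 50)]

# "a" precedes "c" in both sequences, so (0, 0) is kept.
# "b" overlaps "c" in A but precedes it in B, so (1, 1) is dropped.
print(otimes([(0, 0), (1, 1)], A, B, 2, 2))
```

Because the filter only removes pairs, the pairs that remain keep the mutual relations they already had, which is the observation that property 2 relies on.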

Alternatively, in the Cartesian graph \(G_{AB}\) (see the proof of Theorem 2 for the exact definition), this corresponds to finding all maximal cliques containing the vertex \(u\) labeled \((i,j)\) by checking all previously found maximal cliques and, for each one, returning its intersection with the neighbors of \(u\).
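The clique view can be illustrated on a small generic graph. This is a hedged sketch only: it does not construct \(G_{AB}\) (its definition is in the proof of Theorem 2, outside this excerpt), the function name `cliques_through` and the vertex names are hypothetical, and the final filtering of non-maximal candidates is an added step that the text does not spell out. It assumes that `known_cliques` contains every maximal clique of the graph before \(u\) is added.

```python
def cliques_through(u, neighbors_of_u, known_cliques):
    """Maximal cliques containing u, derived from previously found cliques.

    Each known maximal clique is intersected with the neighbors of u,
    and u is added to the intersection.
    """
    candidates = {frozenset(c & neighbors_of_u) | {u} for c in known_cliques}
    candidates.add(frozenset({u}))  # covers the case of no usable neighbors
    # Added step (assumption): drop candidates strictly contained in another.
    return {c for c in candidates if not any(c < d for d in candidates)}


# Graph before adding u: a triangle v1-v2-v3 plus the edge v3-v4.
known = [frozenset({"v1", "v2", "v3"}), frozenset({"v3", "v4"})]

# The new vertex u is adjacent to v1, v2 and v4 (v1 and v4 are not adjacent).
result = cliques_through("u", {"v1", "v2", "v4"}, known)
print(sorted(sorted(c) for c in result))
```

Here the triangle contributes the clique {v1, v2, u} and the edge contributes {v4, u}; the singleton {u} is discarded because it is contained in both.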

About this article

Cite this article

Kostakis, O., Papapetrou, P. Finding the longest common sub-pattern in sequences of temporal intervals. Data Min Knowl Disc 29, 1178–1210 (2015). https://doi.org/10.1007/s10618-015-0404-3

Received: 02 March 2014
Accepted: 04 February 2015
Published: 19 February 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s10618-015-0404-3
