Data Mining and Knowledge Discovery, Volume 31, Issue 3, pp 809–850

On searching and indexing sequences of temporal intervals

  • Orestis Kostakis
  • Panagiotis Papapetrou
Article

Abstract

In several application domains, including sign language, sensor networks, and medicine, events are not necessarily instantaneous but may have a duration. Such events form sequences of temporal intervals, which may convey useful domain knowledge; thus, searching and indexing these sequences is crucial. We formulate the problem of comparing sequences of labeled temporal intervals and present a distance measure that can be computed in polynomial time. We prove that the distance measure is a metric and therefore satisfies the triangle inequality. To speed up search in large databases of sequences of temporal intervals, we propose an approximate indexing method that is based on embeddings. The proposed indexing framework is shown to be contractive and can guarantee no false dismissal. The distance measure is tested and benchmarked through rigorous experimentation on real data taken from several application domains, including American Sign Language annotated video recordings, robot sensor data, and Hepatitis patient data. In addition, the indexing scheme is tested on a large synthetic dataset. Our experiments show that speedups of over an order of magnitude can be achieved while maintaining high levels of accuracy. As a result of our work, it becomes possible to implement recommender systems, search engines, and assistive applications for the fields that employ sequences of temporal intervals.
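
The link between a metric distance, a contractive embedding, and the absence of false dismissals can be illustrated with a generic reference-based embedding. The sketch below is not the paper's distance measure or indexing scheme: it assumes Levenshtein distance over label tuples as a stand-in metric, and the names `edit_distance`, `embed`, `lower_bound`, and `range_search` are hypothetical. Each sequence is mapped to its vector of distances to a few reference sequences. By the triangle inequality, the largest coordinate-wise difference between two such vectors never exceeds the true distance, so pruning on that bound cannot discard a true match.

```python
from typing import Callable, List, Sequence, Tuple

Seq = Tuple[str, ...]              # a sequence of event labels (assumption)
Metric = Callable[[Seq, Seq], int]


def edit_distance(a: Seq, b: Seq) -> int:
    """Levenshtein distance: a stand-in metric, NOT the paper's measure."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def embed(s: Seq, refs: Sequence[Seq], dist: Metric) -> List[int]:
    """Map a sequence to its vector of distances to the reference sequences."""
    return [dist(s, r) for r in refs]


def lower_bound(u: Sequence[int], v: Sequence[int]) -> int:
    """L-infinity distance between embedded vectors.

    For a metric d, |d(x, r) - d(y, r)| <= d(x, y) for every reference r,
    so this value never exceeds the true distance (contractiveness).
    """
    return max((abs(p - q) for p, q in zip(u, v)), default=0)


def range_search(query: Seq, db: Sequence[Seq], refs: Sequence[Seq],
                 radius: int, dist: Metric) -> List[Seq]:
    """Filter-and-refine range query; returns every s with dist <= radius."""
    q_vec = embed(query, refs, dist)
    # In a real index these vectors would be precomputed offline.
    db_vecs = [embed(s, refs, dist) for s in db]
    hits = []
    for s, vec in zip(db, db_vecs):
        if lower_bound(q_vec, vec) > radius:
            continue                      # safe to prune: true distance > radius
        if dist(query, s) <= radius:      # refine with the exact distance
            hits.append(s)
    return hits
```

The filter step only saves time when the embedded vectors are computed ahead of the query, so that each query costs a handful of exact distance computations against the references plus cheap vector comparisons.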

Keywords

  • Temporal intervals
  • Event-interval sequences
  • Indexing temporal interval sequences
  • Embeddings

Notes

Acknowledgements

The work of Orestis Kostakis was supported in part by the Helsinki Doctoral Education Network in Information and Communications Technology (HICT). The work of Panagiotis Papapetrou was supported in part by the Stockholm City Council (Stockholms Läns Landsting).

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Aalto University, Espoo, Finland
  2. Stockholm University, Stockholm, Sweden
