Semigeometric Tiling of Event Sequences

  • Andreas Henelius
  • Isak Karlsson
  • Panagiotis Papapetrou
  • Antti Ukkonen
  • Kai Puolamäki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9851)

Abstract

Event sequences are ubiquitous, e.g., in finance, medicine, and social media. Often the same underlying phenomenon, such as television advertisements during the Super Bowl, is reflected in independent event sequences, such as the streams of different Twitter users. It is hence of interest to find combinations of temporal segments and subsets of sequences where an event of interest, like a particular hashtag, has an increased occurrence probability. Such patterns allow exploration of the event sequences in terms of their evolving temporal dynamics, and provide more fine-grained insights into the data than, for example, straightforward clustering can reveal. We formulate the task of finding such patterns as a novel matrix tiling problem, and propose two algorithms for solving it. Our first algorithm is a greedy set-cover heuristic, while in the second approach we view the problem as time-series segmentation. We apply the algorithms to real and artificial datasets and obtain promising results. The software related to this paper is available at https://github.com/bwrc/semigeom-r.
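
To make the tiling formulation concrete, the following is a minimal Python sketch of a greedy set-cover style heuristic on a binary matrix whose rows are sequences and whose columns are time steps. It is an illustration under stated assumptions, not the authors' implementation (their software is the R package linked above). It assumes that a tile is a contiguous column interval paired with an arbitrary subset of rows, and that a tile must consist entirely of ones; the abstract only asks for an increased occurrence probability, so the all-ones requirement is a simplification. The names candidate_tiles, cells, and greedy_tiling are hypothetical.

```python
def candidate_tiles(matrix, min_width=1):
    """Enumerate candidate tiles as (rows, start, end).

    Assumption: a tile is a contiguous column interval [start, end)
    together with every row that is all ones on that interval.
    """
    n_cols = len(matrix[0])
    tiles = []
    for start in range(n_cols):
        for end in range(start + min_width, n_cols + 1):
            rows = tuple(
                r for r, row in enumerate(matrix) if all(row[start:end])
            )
            if rows:
                tiles.append((rows, start, end))
    return tiles


def cells(tile):
    """Return the set of (row, column) cells that a tile covers."""
    rows, start, end = tile
    return {(r, c) for r in rows for c in range(start, end)}


def greedy_tiling(matrix, k):
    """Pick up to k tiles, each time taking the tile that covers the
    most cells not yet covered (the classic greedy cover rule)."""
    candidates = candidate_tiles(matrix)
    covered, chosen = set(), []
    for _ in range(k):
        best = max(candidates, key=lambda t: len(cells(t) - covered),
                   default=None)
        if best is None or not (cells(best) - covered):
            break  # nothing new can be covered
        chosen.append(best)
        covered |= cells(best)
    return chosen, covered


# Toy data: rows are sequences, columns are time steps, 1 marks an event.
X = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 1],
]
tiles, covered = greedy_tiling(X, k=2)
print(tiles)  # each tile is (row indices, first column, one past last column)
```

Enumerating every interval is quadratic in the number of time steps, which is acceptable for a toy example; a practical implementation would restrict the candidate set and would score tiles by event density rather than requiring exact ones.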

Keywords

Event sequences · Tiling · Covering · Binary matrices

Notes

Acknowledgements

AH, AU, and KP were supported by Tekes (Revolution of Knowledge Work project) and the Academy of Finland (decision 288814), and IK and PP by the Swedish Foundation for Strategic Research (grant IIS11-0053).

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Andreas Henelius (1)
  • Isak Karlsson (2)
  • Panagiotis Papapetrou (2)
  • Antti Ukkonen (1)
  • Kai Puolamäki (1)

  1. Finnish Institute of Occupational Health, Helsinki, Finland
  2. Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
