Mining Local Correlation Patterns in Sets of Sequences

Ukkonen, Antti

doi:10.1007/978-3-642-04747-3_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5808)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Given a set of (possibly infinite) sequences, we consider the problem of detecting events where a subset of the sequences is correlated for a short period. In other words, we want to find cases where a number of the sequences output exactly the same substring at the same time. Such substrings, together with the sequences in which they are contained, form a local correlation pattern. In practice we only want to find patterns that are longer than γ and appear in at least σ sequences.
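The definition above can be made concrete with a small check. The following is a minimal sketch, not code from the paper: the function name, the argument layout, and the toy sequences are all assumptions made for illustration, and "longer than γ" is read as a strict inequality on the substring length.

```python
def is_local_correlation_pattern(seqs, members, start, length, gamma, sigma):
    """Check one candidate pattern against the definition in the abstract.

    seqs    : list of finite sequences (strings or lists), aligned in time
    members : indices of the sequences claimed to take part in the pattern
    start   : time step at which the shared substring begins
    length  : length of the shared substring
    gamma   : the substring must be strictly longer than this
    sigma   : at least this many distinct sequences must take part
    (All names here are hypothetical; the paper may define things differently.)
    """
    if length <= gamma or len(set(members)) < sigma:
        return False
    end = start + length
    # Every member must actually cover the window [start, end).
    if start < 0 or any(end > len(seqs[i]) for i in members):
        return False
    # All members must emit exactly the same symbols at the same time steps.
    windows = {tuple(seqs[i][start:end]) for i in members}
    return len(windows) == 1


# Toy data: sequences 0, 1 and 2 all emit "bcab" at time steps 1..4.
seqs = ["abcabd", "xbcabz", "qbcaby", "abcxxx"]
print(is_local_correlation_pattern(seqs, [0, 1, 2], 1, 4, gamma=3, sigma=3))     # True
print(is_local_correlation_pattern(seqs, [0, 1, 2, 3], 1, 4, gamma=3, sigma=3))  # False
```

The second call fails because sequence 3 emits "bcxx" over the same time steps, so it cannot be part of the pattern.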

Our main contribution is an algorithm for mining such patterns in an online case, where the sequences are read in parallel one symbol at a time (no random access) and the patterns must be reported as soon as they occur.
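To illustrate the online setting only, here is a naive sliding-window baseline. It is explicitly not the algorithm proposed in the paper, whose details are not given in this abstract. The sketch assumes the input arrives as one tuple of symbols per time step (one symbol per sequence), keeps the last γ+1 symbols of each sequence, and reports a group as soon as at least σ sequences share the same window. The name `mine_online` and the output format are hypothetical.

```python
from collections import defaultdict, deque


def mine_online(stream, gamma, sigma):
    """Naive online baseline (an assumption for illustration, not the paper's method).

    stream yields, per time step, a tuple holding one symbol per sequence.
    Yields (end_time, substring, members) whenever at least `sigma` sequences
    have emitted the same gamma+1 most recent symbols.
    """
    width = gamma + 1          # shortest length that is "longer than gamma"
    windows = None
    for t, symbols in enumerate(stream):
        if windows is None:
            # One bounded buffer per sequence; old symbols fall off the left.
            windows = [deque(maxlen=width) for _ in symbols]
        groups = defaultdict(list)
        for i, sym in enumerate(symbols):
            windows[i].append(sym)
            if len(windows[i]) == width:
                groups[tuple(windows[i])].append(i)
        for content, members in groups.items():
            if len(members) >= sigma:
                yield t, content, members


# Sequences are consumed in parallel, one symbol at a time, with no random access.
toy = ["abcabd", "xbcabz", "qbcaby", "abcxxx"]
print(list(mine_online(zip(*toy), gamma=3, sigma=3)))
# [(4, ('b', 'c', 'a', 'b'), [0, 1, 2])]
```

This baseline does work proportional to the number of sequences times γ at every time step, and it reports each qualifying window separately rather than merging consecutive windows into one longer pattern. It is meant only to show the read-in-parallel, report-immediately constraint.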

We conduct experiments on both artificial and real data. The results show that the proposed algorithm scales well as the number of sequences increases. We also conduct a case study using a public EEG dataset. We show that the local correlation patterns capture essential features that can be used to automatically distinguish subjects diagnosed with a genetic predisposition to alcoholism from a control group.

Author information

Authors and Affiliations

Helsinki University of Technology & HIIT, Finland
Antti Ukkonen

Editor information

Editors and Affiliations

Faculty of Economics; Rua Dr. Roberto Frias, University of Porto, 4200-465, Porto, Portugal
João Gama
DCC-FC, Universidade do Porto, Portugal
Vítor Santos Costa
LIACC/FEP, Universidade do Porto, Portugal
Alípio Mário Jorge
LIAAD-INESC Porto L.A./Faculty of Economics, University of Porto, Rua de Ceuta, 118-6, 4050-190, Porto, Portugal
Pavel B. Brazdil

About this paper

Cite this paper

Ukkonen, A. (2009). Mining Local Correlation Patterns in Sets of Sequences. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds) Discovery Science. DS 2009. Lecture Notes in Computer Science, vol 5808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04747-3_27

DOI: https://doi.org/10.1007/978-3-642-04747-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04746-6
Online ISBN: 978-3-642-04747-3
eBook Packages: Computer Science, Computer Science (R0)