Mining Local Correlation Patterns in Sets of Sequences

  • Conference paper
Discovery Science (DS 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5808)

Abstract

Given a set of (possibly infinite) sequences, we consider the problem of detecting events where a subset of the sequences is correlated for a short period. In other words, we want to find cases where a number of the sequences output exactly the same substring at the same time. Such substrings, together with the sequences in which they are contained, form a local correlation pattern. In practice we only want to find patterns that are longer than γ and appear in at least σ sequences.
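As an illustration of the definition only, the following brute-force sketch lists windows of length γ + 1 (the shortest length that is "longer than γ") shared by at least σ sequences at the same positions. It assumes finite sequences given as Python strings; the function name `local_patterns` and the output format are hypothetical and are not taken from the paper, and this is not the paper's algorithm.

```python
from collections import defaultdict

def local_patterns(seqs, gamma, sigma):
    """Brute force: return (start, substring, sequence ids) for every
    window of length gamma + 1 shared by at least sigma sequences."""
    m = gamma + 1                      # shortest qualifying pattern length
    n = min(len(s) for s in seqs)      # only compare positions all sequences have
    found = []
    for t in range(n - m + 1):
        groups = defaultdict(list)     # substring at position t -> sequence ids
        for i, s in enumerate(seqs):
            groups[s[t:t + m]].append(i)
        for sub, ids in groups.items():
            if len(ids) >= sigma:
                found.append((t, sub, ids))
    return found

seqs = ["abcabd", "xbcabz", "abcxbd"]
for start, sub, ids in local_patterns(seqs, gamma=2, sigma=2):
    print(start, sub, ids)
```

In this sketch a pattern longer than γ + 1 symbols shows up as a run of overlapping windows with the same set of sequence ids, for example "bca" followed by "cab" in sequences 0 and 1.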

Our main contribution is an algorithm for mining such patterns in an online case, where the sequences are read in parallel one symbol at a time (no random access) and the patterns must be reported as soon as they occur.
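The online constraint can be illustrated with a naive baseline, which is not the algorithm proposed in the paper. The sketch assumes that each time step delivers one symbol per sequence as a tuple, keeps only the last γ + 1 symbols of every sequence, and reports a match in the same step in which it completes. The name `stream_patterns` and the shape of the reported tuples are hypothetical.

```python
from collections import defaultdict, deque

def stream_patterns(symbol_stream, gamma, sigma):
    """Yield (end_time, window, ids) as soon as at least sigma sequences
    agree on their most recent gamma + 1 symbols."""
    m = gamma + 1
    windows = None
    for t, column in enumerate(symbol_stream):
        if windows is None:
            # One bounded buffer per sequence; old symbols drop out automatically.
            windows = [deque(maxlen=m) for _ in column]
        groups = defaultdict(list)     # current window contents -> sequence ids
        for i, sym in enumerate(column):
            windows[i].append(sym)
            if len(windows[i]) == m:
                groups[tuple(windows[i])].append(i)
        for win, ids in groups.items():
            if len(ids) >= sigma:
                yield t, win, ids

# zip(*seqs) simulates reading the sequences in parallel, one symbol at a time.
seqs = ["abcabd", "xbcabz", "abcxbd"]
for end, win, ids in stream_patterns(zip(*seqs), gamma=2, sigma=2):
    print(end, "".join(win), ids)
```

This baseline never looks back at earlier input, so it respects the no-random-access restriction, but it regroups every sequence at every step. Its per-step cost therefore grows with both the number of sequences and γ.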

We conduct experiments on both artificial and real data. The results show that the proposed algorithm scales well as the number of sequences increases. We also conduct a case study using a public EEG dataset. We show that the local correlation patterns capture essential features that can be used to automatically distinguish subjects diagnosed with a genetic predisposition to alcoholism from a control group.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ukkonen, A. (2009). Mining Local Correlation Patterns in Sets of Sequences. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds) Discovery Science. DS 2009. Lecture Notes in Computer Science, vol 5808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04747-3_27

  • DOI: https://doi.org/10.1007/978-3-642-04747-3_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04746-6

  • Online ISBN: 978-3-642-04747-3

  • eBook Packages: Computer Science, Computer Science (R0)
