Mining Approximate Motifs in Time Series

  • Pedro G. Ferreira
  • Paulo J. Azevedo
  • Cândida G. Silva
  • Rui M. M. Brito
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4265)

Abstract

The problem of discovering previously unknown frequent patterns in time series, also called motifs, has recently been introduced. A motif is a subseries pattern that appears a significant number of times. Results demonstrate that motifs may provide valuable insights into the data and have a wide range of applications in data mining tasks. The main motivation for this study was the need to mine time series data from protein folding/unfolding simulations. We propose an algorithm that extracts approximate motifs, i.e. motifs that capture portions of time series with similar and possibly symmetric behavior. Preliminary results on the analysis of protein unfolding data support this proposal as a valuable tool. Additional experiments demonstrate that the utility of our algorithm is not limited to this particular problem; rather, it can be a useful tool for many real-world problems.
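To make the notion of an approximate motif concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper, which the abstract does not detail. It assumes that "similar" means a small Euclidean distance between z-normalised windows and that "symmetric" means the window matches the reference after mirroring about its mean; both readings, the function names, and the `radius` threshold are assumptions made for illustration only.

```python
from math import sqrt

def znorm(window):
    """Z-normalise a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]

def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def count_matches(series, start, length, radius, allow_symmetric=True):
    """Return start indices of windows that approximately match the
    reference window series[start:start + length].

    Windows overlapping the reference are skipped as trivial matches.
    With allow_symmetric=True (an assumed reading of "symmetric"), a
    window also matches if it resembles the mirrored reference.
    """
    ref = znorm(series[start:start + length])
    mirrored = [-x for x in ref]
    matches = []
    for i in range(len(series) - length + 1):
        if abs(i - start) < length:
            continue
        w = znorm(series[i:i + length])
        d = dist(ref, w)
        if allow_symmetric:
            d = min(d, dist(mirrored, w))
        if d <= radius:
            matches.append(i)
    return matches

# Toy series: a bump, a repeat of the bump, and an inverted bump.
series = [0, 1, 2, 1, 0, 5, 5, 0, 1, 2, 1, 0, 5, 5, 0, -1, -2, -1, 0]
with_symmetric = count_matches(series, start=0, length=5, radius=0.1)
without_symmetric = count_matches(series, 0, 5, 0.1, allow_symmetric=False)
```

Under these assumptions, a candidate window would count as a motif when the number of matches exceeds some user-chosen frequency threshold; allowing mirrored matches lets the inverted bump in the toy series be grouped with the two upright ones.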

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Pedro G. Ferreira (1)
  • Paulo J. Azevedo (1)
  • Cândida G. Silva (2)
  • Rui M. M. Brito (2)
  1. Department of Informatics, University of Minho, Braga, Portugal
  2. Chemistry Department, Faculty of Sciences and Technology, and Centre of Neurosciences of Coimbra, University of Coimbra, Coimbra, Portugal
