Features for Learning Local Patterns in Time-Stamped Data

  • Katharina Morik
  • Hanna Köpcke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3539)

Abstract

Time-stamped data occur frequently in real-world databases. The goal of analysing time-stamped data is very often to find a small group of objects (customers, machine parts, ...) that is important for the business at hand. In contrast, the majority of objects obey well-known rules and are not of interest for the analysis. In terms of a classification task, the small group means that there are very few positive examples and that, within them, there is some sort of structure such that the small group differs significantly from the majority. We may regard such a learning task as learning a local pattern.

Depending on the goal of the data analysis, different aspects of time are relevant, e.g., the particular date, the duration of a certain state, or the number of different states. From the given data, we may generate features that allow us to express the aspect of interest. Here, we investigate the aspect of state change and its representation for learning local patterns in time-stamped data. Besides a simple Boolean representation indicating a change, we use frequency features from information retrieval. We transfer Joachims’ theory for text classification to our task and investigate its fit to local pattern learning. The approach has been implemented within the MiningMart system and was successfully applied to real-world insurance data.
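To make the two representations of state change concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each object has a list of time-stamped records, that a "change" means an attribute value differing from the previous record, and that the frequency feature is a plain TF-IDF weight (number of changes of an attribute for one object, times the log of the inverse fraction of objects in which that attribute changes at all). The names `change_counts`, `change_features`, and the toy data are hypothetical; the paper may define its features differently.

```python
import math
from collections import defaultdict


def change_counts(history):
    """Count, per attribute, how often its value changes over time.

    `history` is a list of (timestamp, state) pairs, where `state` is a
    dict mapping attribute names to values (an assumed data layout).
    """
    states = [state for _, state in sorted(history, key=lambda rec: rec[0])]
    counts = defaultdict(int)
    for prev, curr in zip(states, states[1:]):
        for attr, value in curr.items():
            if attr in prev and prev[attr] != value:
                counts[attr] += 1
    return dict(counts)


def change_features(histories, attributes):
    """Return (boolean, tfidf): two dicts mapping object id to a feature list."""
    tf = {obj: change_counts(h) for obj, h in histories.items()}
    n = len(histories)
    # Document frequency: number of objects in which the attribute changes.
    df = {a: sum(1 for c in tf.values() if c.get(a, 0) > 0) for a in attributes}
    boolean, tfidf = {}, {}
    for obj, counts in tf.items():
        boolean[obj] = [int(counts.get(a, 0) > 0) for a in attributes]
        tfidf[obj] = [
            counts.get(a, 0) * math.log(n / df[a]) if df[a] else 0.0
            for a in attributes
        ]
    return boolean, tfidf


# Toy data (invented for illustration): three customers, two attributes.
histories = {
    "c1": [(1, {"tariff": "A", "region": 1}),
           (2, {"tariff": "B", "region": 1}),
           (3, {"tariff": "A", "region": 1})],
    "c2": [(1, {"tariff": "A", "region": 1}),
           (2, {"tariff": "A", "region": 2})],
    "c3": [(1, {"tariff": "A", "region": 3}),
           (2, {"tariff": "A", "region": 3})],
}
boolean, tfidf = change_features(histories, ["tariff", "region"])
```

In this sketch the Boolean vector only records whether an attribute ever changed, while the TF-IDF vector also reflects how often it changed and how rare such changes are across all objects. Either vector could then be passed to a classifier.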

Keywords

Knowledge Discovery, Local Pattern, Term Frequency, Binary Representation, Target Concept
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Katharina Morik (1)
  • Hanna Köpcke (1)
  1. Computer Science Department, LS VIII, Univ. Dortmund
