Data Mining and Knowledge Discovery

, Volume 1, Issue 3, pp 259–289

Discovery of Frequent Episodes in Event Sequences

  • Heikki Mannila
  • Hannu Toivonen
  • A. Inkeri Verkamo
Article

Abstract

Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a given partial order. We consider the problem of discovering frequently occurring episodes in a sequence. Once such episodes are known, one can produce rules for describing or predicting the behavior of the sequence. We give efficient algorithms for the discovery of all frequent episodes from a given class of episodes, and present detailed experimental results. The methods are in use in telecommunication alarm management.

event sequences frequent episodes sequence analysis 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '93). Washington, D.C., pp. 207–216.Google Scholar
  2. Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering (ICDE '95). Taipei, Taiwan, pp. 3–14.Google Scholar
  3. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A.I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining. U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.), Menlo Park, CA: AAAI Press, pp. 307–328.Google Scholar
  4. Bairoch, A., Bucher, P., and Hofmann, K. 1995. The PROSITE database, its status in 1995. Nucleic Acids Research, 24:189–196.Google Scholar
  5. Bettini, C., Wang, X.S., and Jajodia, S. 1996. Testing complex temporal relationships involving multiple granularities and its application to data mining. In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '96). Montréal, Canada, pp. 68–78.Google Scholar
  6. Das, G., Fleischer, R., Gasieniec, L., Gunopulos, D., and Kärkkäinen, J. 1997. Episode matching. In Proceedings of the 8th Symposium on Combinatorial Pattern Matching (CPM '97). Aarhus, Denmark, pp. 12–27.Google Scholar
  7. Denning, P.J. 1968. The working set model of program behavior. Communications of the ACM, 11(5):323–333.Google Scholar
  8. Dousson, C., Gaborit, P., and Ghallab, M. 1993. Situation recognition: Representation and algorithms. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93). Chambery, France, pp. 166–172.Google Scholar
  9. ExPASy Molecular Biology Server. Geneva University Hospital and University of Geneva, Switzerland. http://expasy.hcuge.ch/.Google Scholar
  10. Forgy, C.L. 1982. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19:17–37.Google Scholar
  11. Gehani, N., Jagadish, H., and Shmueli, O. 1992. Composite event specification in active databases. In Proceedings of the Eighteenth International Conference on Very Large Data Bases (VLDB '92). San Diego, CA, pp. 327–338.Google Scholar
  12. Goodman, R.M., Ambrose, B.E., Latin, H.W., and Ulmer, C.T. 1995. Noaa-An expert system managing the telephone network. In Integrated Network Management IV, A.S. Sethi, Y. Raynaud, and F. Faure-Vincent (Eds.), London, UK: Chapman and Hall, pp. 316–327.Google Scholar
  13. Grossi, R. and Luccio, F. 1989. Simple and efficient string matching with k mismatches. Information Processing Letters, 33:113–120.Google Scholar
  14. Han, J. and Fu, Y. 1995. Discovery of multiple-level association rules from large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 420–431.Google Scholar
  15. Hätönen, K., Klemettinen, M., Mannila, H., Ronkainen, P., and Toivonen, H. 1996a. Knowledge discovery from telecommunication network alarm databases. In 12th International Conference on Data Engineering (ICDE '96). New Orleans, LA, pp. 115–122.Google Scholar
  16. Hätönen, K., Klemettinen, M., Mannila, H., Ronkainen, P., and Toivonen, H. 1996b. TASA: Telecommunication alarm sequence analyzer, or how to enjoy faults in your network. In 1996 IEEE Network Operations and Management Symposium (NOMS '96). Kyoto, Japan, pp. 520–529.Google Scholar
  17. Holsheimer, M., Kersten, M., Mannila, H., and Toivonen, H. 1995. A perspective on databases and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 150–155.Google Scholar
  18. Howe, A.E. 1995. Finding dependencies in event streams using local search. In Preliminary Papers of the Fifth International Workshop on Artificial Intelligence and Statistics. Ft. Lauderdale, FL, pp. 271–277.Google Scholar
  19. Jakobson, G. and Weissman, M.D. 1993. Alarm correlation. IEEE Network, 7(6):52–59.Google Scholar
  20. Jonassen, I., Collins, J.F., and Higgins, D.G. 1995. Finding flexible patterns in unaligned protein sequences. Protein Science, 4(8):1587–1595.Google Scholar
  21. Kalbfleisch, J.D. and Prentice, R.L. 1980. The Statistical Analysis of Failure Time Data. New York, NY: John Wiley Inc.Google Scholar
  22. Laird, P. 1993. Identifying and using patterns in sequential data. In Algorithmic Learning Theory, 4th International Workshop (ALT 93). (Lecture Notes in Artificial Intelligence 744, Berlin: Springer-Verlag). Chofu, Japan, pp. 1–18.Google Scholar
  23. Mannila, H., Toivonen, H., and Verkamo, A.I. 1995. Discovering frequent episodes in sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 210–215.Google Scholar
  24. Mannila, H. and Toivonen, H. 1996. Discovering generalized episodes using minimal occurrences. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD '96). Portland, OR, pp. 146–151.Google Scholar
  25. Mannila, H. and Toivonen, H. 1997. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1(3):241–258.Google Scholar
  26. Milne, R., Nicol, C., Ghallab, M., Trave-Massuyes, L., Bousson, K., Dousson, C., Quevedo, J., Aguilar, J., and Guasch, A. 1994. TIGER: Real-time situation assessment of dynamic systems. Intelligent Systems Engineering, pp. 103–124.Google Scholar
  27. Morris, R.A., Shoaff, W.D., and Khatib, L. 1994. An algebraic formulation of temporal knowledge for reasoning about recurring events. In Proceedings of the InternationalWorkshop onTemporal Representation and Reasoning (TIME-94). Pensacola, FL, pp. 29–34.Google Scholar
  28. Muggleton, S. 1992. Inductive Logic Programming. London, UK: Academic Press.Google Scholar
  29. Oates, T. and Cohen, P.R. 1996. Searching for structure in multiple streams of data. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML '96). Bari, Italy, pp. 346–354.Google Scholar
  30. Padmanabhan, B. and Tuzhilin, A. 1996. Pattern discovery in temporal databases: A temporal logic approach. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD '96). Portland, OR, pp. 351–354.Google Scholar
  31. Seshadri, P., Livny, M., and Ramakrishnan, R. 1996. SEQ: Design and implementation of a sequence database system. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96). Mumbay, India, pp. 99–110.Google Scholar
  32. Srikant, R. and Agrawal, R. 1995. Mining generalized association rules. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 407–419.Google Scholar
  33. Srikant, R. and Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Advances in Database Technology-5th International Conference on Extending Database Technology (EDBT '96). Avignon, France, pp. 3–17.Google Scholar
  34. Wang, J.T.-L., Chirn, G.-W., Marr, T.G., Shapiro, B., Shasha, D., and Zhang, K. 1994. Combinatorial pattern discovery for scientific data: Some preliminary results. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '94). Minneapolis, MI, pp. 115–125.Google Scholar

Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Heikki Mannila
    • 1
  • Hannu Toivonen
    • 1
  • A. Inkeri Verkamo
    • 1
  1. 1.Department of Computer ScienceUniversity of HelsinkiFinland

Personalised recommendations