Mining Compressed Sequential Patterns

  • Lei Chang
  • Dongqing Yang
  • Shiwei Tang
  • Tengjiao Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4093)

Abstract

Current sequential pattern mining algorithms often produce a large number of patterns, which makes it difficult for a user to explore them and to get a global view of the patterns and the underlying data. In this paper, we examine the problem of how to compress a set of sequential patterns using only K SP-Features (Sequential Pattern Features). A novel similarity measure is proposed for clustering SP-Features, and an effective SP-Feature combination method is designed. We also present an efficient algorithm, called CSP (Compressing Sequential Patterns), to mine compressed sequential patterns based on the hierarchical clustering framework. A thorough experimental study with both real and synthetic datasets shows that CSP can compress sequential patterns effectively.
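To make the idea of compressing a pattern set into K groups by hierarchical clustering concrete, here is a minimal sketch. It is not the CSP algorithm: the abstract does not define the paper's similarity measure or its SP-Feature combination method, so the sketch assumes a stand-in similarity (longest common subsequence length divided by the longer pattern's length), average-linkage merging, and patterns that are non-empty sequences of single items. All function names are hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two item sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed stand-in measure (not the paper's): LCS length over the longer length."""
    return lcs_length(a, b) / max(len(a), len(b))


def cluster_similarity(c1, c2):
    """Average pairwise similarity between two clusters of patterns."""
    return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def compress_patterns(patterns, k):
    """Agglomerative clustering: merge the most similar pair until k clusters remain."""
    clusters = [[p] for p in patterns]
    while len(clusters) > k:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


patterns = [("a", "b", "c"), ("a", "b", "d"), ("x", "y"), ("x", "y", "z")]
groups = compress_patterns(patterns, 2)
```

In this toy input the two patterns starting with `a, b` end up in one group and the two starting with `x, y` in the other. A fuller treatment would also summarise each group with a single representative feature, which this sketch leaves out because the abstract does not specify how that combination is done.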

Keywords

Sequential Pattern; Synthetic Dataset; Frequent Itemsets; Sequential Pattern Mining; Closed Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lei Chang (1)
  • Dongqing Yang (1)
  • Shiwei Tang (1)
  • Tengjiao Wang (1)

  1. Department of Computer Science & Technology, Peking University, Beijing, China
