A Fast Algorithm to Find All the Maximal Frequent Sequences in a Text

  • René A. García-Hernández
  • José Fco. Martínez-Trinidad
  • Jesús Ariel Carrasco-Ochoa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3287)

Abstract

One of the problems in sequential pattern mining is to find the maximal frequent sequences in a database, given a support threshold β. In this paper, we propose a new algorithm that finds all the maximal frequent sequences in a text rather than in a database. Unlike typical sequential pattern mining algorithms, our algorithm avoids the joining, pruning and text scanning steps. Experiments show that all the maximal frequent sequences can be found in a few seconds for medium-sized texts.
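
To make the problem statement concrete, the following is a minimal brute-force sketch of what "maximal frequent sequence" can mean for a text. It is not the algorithm proposed in the paper, and it does perform the repeated text scans the paper says it avoids. The sketch assumes that a sequence is a run of consecutive words, that a sequence is frequent when it occurs at least β times (overlapping occurrences counted), and that it is maximal when no longer frequent sequence contains it. The function name, the whitespace tokenization and the lowercasing are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def maximal_frequent_sequences(text, beta):
    """Brute-force illustration only; NOT the paper's algorithm.

    Assumptions (not from the source): sequences are runs of consecutive
    words, tokens are lowercased and split on whitespace, and overlapping
    occurrences each count toward the support.
    """
    words = text.lower().split()
    n = len(words)
    frequent = set()

    # Grow the sequence length until no sequence reaches the threshold.
    # Any sub-run of a frequent run is itself frequent, so stopping at
    # the first empty level is safe.
    length = 1
    while True:
        counts = defaultdict(int)
        for i in range(n - length + 1):
            counts[tuple(words[i:i + length])] += 1
        level = {seq for seq, c in counts.items() if c >= beta}
        if not level:
            break
        frequent |= level
        length += 1

    def contains(longer, shorter):
        # True if `shorter` occurs as a consecutive run inside `longer`.
        k = len(shorter)
        return any(longer[i:i + k] == shorter
                   for i in range(len(longer) - k + 1))

    # Keep only the sequences not contained in a longer frequent sequence.
    return sorted(
        s for s in frequent
        if not any(len(t) > len(s) and contains(t, s) for t in frequent)
    )


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat sat on the sofa"
    for seq in maximal_frequent_sequences(sample, beta=2):
        print(" ".join(seq))
```

With β = 2 on the sample sentence, the only maximal frequent sequence under these assumptions is "the cat sat on the": every shorter frequent run, including the single word "the", is contained in it.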

Keywords

Fast Algorithm, Sequential Pattern, Text Mining, Position Node, Frequent Sequence
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • René A. García-Hernández (1)
  • José Fco. Martínez-Trinidad (1)
  • Jesús Ariel Carrasco-Ochoa (1)
  1. National Institute of Astrophysics, Optics and Electronics (INAOE), Puebla, México
