SPADE: An Efficient Algorithm for Mining Frequent Sequences

Abstract

In this paper we present SPADE, a new algorithm for fast discovery of sequential patterns. Existing solutions to this problem make repeated database scans and use complex hash structures that have poor locality. SPADE uses combinatorial properties to decompose the original problem into smaller sub-problems that can be solved independently in main memory using efficient lattice search techniques and simple join operations. All sequences are discovered in only three database scans. Experiments show that SPADE outperforms the best previous algorithm by a factor of two, and by an order of magnitude with some pre-processed data. It also scales linearly with the number of input sequences and with a number of other database parameters. Finally, we discuss how the results of sequence mining can be applied in a real application domain.
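The abstract does not spell out what the "simple join operations" look like, so the following is only a minimal sketch. It assumes a vertical layout in which each item carries an id-list of (sequence id, event id) pairs, and it joins two single-item id-lists either temporally ("A, then later B") or on equality ("A and B in the same event"). The function names, the toy database, and the restriction to single items are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict


def build_idlists(database):
    """Map each item to a sorted list of (sid, eid) pairs where it occurs."""
    idlists = defaultdict(set)
    for sid, events in database.items():
        for eid, itemset in events:
            for item in itemset:
                idlists[item].add((sid, eid))
    return {item: sorted(pairs) for item, pairs in idlists.items()}


def support(idlist):
    """Support is the number of distinct input sequences in the id-list."""
    return len({sid for sid, _ in idlist})


def temporal_join(first, second):
    """Id-list for 'first, then second': keep a (sid, eid) pair of `second`
    when `first` occurs at a strictly earlier eid in the same sequence."""
    earliest = {}
    for sid, eid in first:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in second
            if sid in earliest and earliest[sid] < eid]


def equality_join(first, second):
    """Id-list for 'first and second in the same event'."""
    return sorted(set(first) & set(second))


# Hypothetical toy database: sid -> list of (eid, itemset) events.
database = {
    1: [(10, {"A", "B"}), (20, {"B"}), (30, {"A", "C"})],
    2: [(15, {"B"}), (25, {"A"})],
    3: [(10, {"A"}), (40, {"B", "C"})],
}

idlists = build_idlists(database)
a_then_b = temporal_join(idlists["A"], idlists["B"])
ab_together = equality_join(idlists["A"], idlists["B"])

print(a_then_b, support(a_then_b))        # [(1, 20), (3, 40)] 2
print(ab_together, support(ab_together))  # [(1, 10)] 1
```

Because each join reads only the two id-lists involved, the support of a candidate can be counted in memory without rescanning the database, which is the property the abstract attributes to the decomposition into sub-problems.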

About this article

Cite this article

Zaki, M.J. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning 42, 31–60 (2001). https://doi.org/10.1023/A:1007652502315

  • DOI: https://doi.org/10.1023/A:1007652502315

Keywords

  • sequence mining
  • sequential patterns
  • frequent patterns
  • data mining
  • knowledge discovery