Machine Learning, Volume 42, Issue 1–2, pp 31–60

SPADE: An Efficient Algorithm for Mining Frequent Sequences

  • Mohammed J. Zaki

Abstract

In this paper we present SPADE, a new algorithm for fast discovery of sequential patterns. Existing solutions to this problem make repeated database scans and use complex hash structures that have poor locality. SPADE uses combinatorial properties to decompose the original problem into smaller sub-problems that can be solved independently in main memory using efficient lattice search techniques and simple join operations. All sequences are discovered in only three database scans. Experiments show that SPADE outperforms the best previous algorithm by a factor of two, and by an order of magnitude with some pre-processed data. It also scales linearly with the number of input sequences and with a number of other database parameters. Finally, we discuss how the results of sequence mining can be applied in a real application domain.
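As a rough illustration of the "simple join operations" mentioned in the abstract, the Python sketch below assumes a vertical layout in which each item keeps a list of (sequence id, event id) pairs, and counts how many input sequences contain one item followed later by another. The function names (`build_idlists`, `temporal_join`, `support`) and the toy database are hypothetical choices made for this example, not the paper's implementation, and the sketch covers only a single join between two items rather than the full lattice search.

```python
from collections import defaultdict

def build_idlists(db):
    """Map each item to its list of (sid, eid) occurrences (assumed vertical layout)."""
    idlists = defaultdict(list)
    for sid, events in db.items():
        for eid, items in events:
            for item in items:
                idlists[item].append((sid, eid))
    return idlists

def temporal_join(left, right):
    """Keep the (sid, eid) pairs of `right` that occur strictly after
    some occurrence of `left` within the same input sequence."""
    earliest = {}
    for sid, eid in left:
        if sid not in earliest or eid < earliest[sid]:
            earliest[sid] = eid
    return [(sid, eid) for sid, eid in right
            if sid in earliest and eid > earliest[sid]]

def support(idlist):
    """Number of distinct input sequences that appear in an id-list."""
    return len({sid for sid, _ in idlist})

# Hypothetical toy database: sid -> list of (eid, set of items).
db = {
    1: [(10, {"A", "B"}), (20, {"B"}), (30, {"A", "C"})],
    2: [(15, {"A"}), (25, {"C"})],
    3: [(5, {"C"}), (10, {"A"})],
}

idlists = build_idlists(db)
a_then_c = temporal_join(idlists["A"], idlists["C"])
print(support(a_then_c))  # sequences 1 and 2 contain A followed by C
```

Because the joined list is again a list of (sid, eid) pairs, it could in principle be joined with a further item's list to extend the pattern, so no additional pass over the original database is needed for that step in this sketch.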

Keywords: sequence mining, sequential patterns, frequent patterns, data mining, knowledge discovery

Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Mohammed J. Zaki (1)
  1. Computer Science Department, Rensselaer Polytechnic Institute, Troy
