Finding Frequent Patterns Using Length-Decreasing Support Constraints

Data Mining and Knowledge Discovery

Abstract

Finding prevalent patterns in large amounts of data has been one of the major problems in the area of data mining. In particular, the problem of finding frequent itemsets or sequential patterns in very large databases has been studied extensively over the years, and a variety of algorithms have been developed for each problem. The key feature of most of these algorithms is that they use a constant support constraint to control the inherently exponential complexity of these two problems. In general, patterns that contain only a few items tend to be interesting if they have high support, whereas long patterns can still be interesting even if their support is relatively small. Ideally, we want to find all the frequent patterns whose support decreases as a function of their length without having to find many uninteresting infrequent short patterns. Developing such algorithms is particularly challenging because the downward closure property of the constant support constraint cannot be used to prune short infrequent patterns.
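The loss of downward closure can be illustrated with a small sketch. The threshold function `min_support`, its parameters, and the toy transactions below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def min_support(length, high=4, low=2, cutoff=3):
    """Hypothetical length-decreasing threshold (an assumption for this sketch):
    patterns shorter than `cutoff` need `high` support, longer ones need `low`."""
    return high if length < cutoff else low


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_frequent(itemset, transactions):
    """True if the itemset meets the threshold for its own length."""
    return support(itemset, transactions) >= min_support(len(itemset))


# Toy itemset database (made up for this example).
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ad"),
    frozenset("bd"),
]

abc = frozenset("abc")
ab = frozenset("ab")

# The long pattern passes its (lower) threshold, but one of its subsets
# fails the (higher) threshold for short patterns.
print(is_frequent(abc, transactions))  # True
print(is_frequent(ab, transactions))   # False
```

Under a constant threshold, every subset of a frequent itemset is itself frequent, so an infrequent short pattern can be discarded together with all of its extensions. In this sketch `ab` is infrequent while its extension `abc` is frequent, so that pruning rule would wrongly discard `abc`.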

In this paper we present two algorithms, LPMiner and SLPMiner. Given a length-decreasing support constraint, LPMiner finds all the frequent itemset patterns from an itemset database, and SLPMiner finds all the frequent sequential patterns from a sequential database. Each of these two algorithms combines a well-studied efficient algorithm for constant-support-based pattern discovery with three effective database pruning methods that dramatically reduce the runtime. Our experimental evaluations show that both LPMiner and SLPMiner, by effectively exploiting the length-decreasing support constraint, are up to two orders of magnitude faster, and their runtime increases gradually as the average length of the input patterns increases.
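For contrast, one straightforward way to satisfy a length-decreasing constraint without any specialized pruning is to mine with a constant threshold equal to the smallest value of the constraint and then filter by length. The sketch below shows that naive approach only; it is not LPMiner or SLPMiner, the brute-force miner is a stand-in for a real constant-support algorithm, and all names and data are hypothetical.

```python
from itertools import combinations


def mine_constant_support(transactions, threshold):
    """Brute-force stand-in for a constant-support miner (an assumption of
    this sketch). Returns {itemset: support} for itemsets meeting `threshold`."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= threshold:
                result[candidate] = count
    return result


def mine_length_decreasing_naive(transactions, f):
    """Mine at the lowest threshold f can take, then keep only the patterns
    that meet the threshold for their own length."""
    max_len = max(len(t) for t in transactions)
    floor = min(f(length) for length in range(1, max_len + 1))
    candidates = mine_constant_support(transactions, floor)
    kept = {s: c for s, c in candidates.items() if c >= f(len(s))}
    return candidates, kept


# Hypothetical constraint and toy data, for illustration only.
f = lambda length: 4 if length < 3 else 2
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ad"), frozenset("bd")]

candidates, kept = mine_length_decreasing_naive(transactions, f)
print(len(candidates), len(kept))  # 8 1
```

In this toy run the constant-support pass produces eight itemsets and the length filter keeps one, which hints at the wasted work on short patterns that exploiting the length-decreasing constraint during mining is meant to avoid.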

Author information

Corresponding author

Correspondence to Masakazu Seno.

Additional information

This work was supported by NSF CCR-9972519, EIA-9986042, ACI-9982274, ACI-0133464, and by Army High Performance Computing Research Center contract number DA/DAAG55-98-1-0441. Access to computing facilities was provided by the Minnesota Supercomputing Institute.

Masakazu Seno has been a system software programmer at Hitachi Software Engineering Co., Ltd. in Japan for eight years. He joined Prof. George Karypis’s research team at the University of Minnesota in 2000 to work on data mining projects, and received his master’s degree in computer science there. He has since returned to the company and is currently involved in the development of a relational database management system.

George Karypis received his Ph.D. degree in computer science at the University of Minnesota and is currently an associate professor in the Department of Computer Science and Engineering at the University of Minnesota. His research interests span the areas of parallel algorithm design, data mining, bioinformatics, information retrieval, applications of parallel processing in scientific computing and optimization, sparse matrix computations, parallel preconditioners, and parallel programming languages and libraries. His research has resulted in the development of software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph partitioning (hMETIS), parallel Cholesky factorization (PSPASES), collaborative filtering-based recommendation algorithms (SUGGEST), clustering high-dimensional datasets (CLUTO), and finding frequent patterns in diverse datasets (PAFI). He has coauthored over ninety journal and conference papers on these topics and a book titled “Introduction to Parallel Computing” (Addison-Wesley, 2003, 2nd edition). In addition, he is serving on the program committees of many conferences and workshops on these topics and is an associate editor of the IEEE Transactions on Parallel and Distributed Systems.

About this article

Cite this article

Seno, M., Karypis, G. Finding Frequent Patterns Using Length-Decreasing Support Constraints. Data Min Knowl Disc 10, 197–228 (2005). https://doi.org/10.1007/s10618-005-0364-0

  • DOI: https://doi.org/10.1007/s10618-005-0364-0